Evaluating Autoencoders as a Dimensionality Reduction Mechanism to Support Clustering Brazilian Agricultural Diversity
Abstract
Brazilian agricultural production is highly diverse across space, which complicates the design of public policies. This article proposes an approach for grouping Brazilian municipalities according to their agricultural production. We combine feature extraction using autoencoders with clustering based on k-means and Self-Organizing Maps. We used panel data from IBGE’s annual estimates of the production value of permanent and temporary crops, animal products, aquaculture, plant extractivism, forestry, planted areas, and herd population between 1999 and 2018. We analyzed different structures of simple stacked and incomplete autoencoders, varying the number of layers and the number of neurons in each layer, and evaluated the asymmetric exponential linear loss function as a way to handle the sparse data. For comparison, we applied the Isomap, Kernel PCA, Truncated SVD, and MDS dimensionality reduction methods. Results showed that the autoencoders could extract features from the transformed raw data that allowed the clustering of municipalities to reveal regional and even intra-regional patterns. The autoencoders' performance relative to the other methods improved as the intrinsic dimensionality increased.
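To make the asymmetric exponential linear loss mentioned in the abstract more concrete, the sketch below implements the standard LINEX form, b * (exp(a*e) - a*e - 1), where e is the reconstruction error. It is only an illustration: the function name `linex_loss`, the error convention e = prediction - target, and the default values of `a` and `b` are assumptions, and the abstract does not state which parameter values or sign convention the article uses.

```python
import math


def linex_loss(y_true, y_pred, a=1.0, b=1.0):
    """Mean LINEX loss over paired values (illustrative sketch).

    Per-element loss: b * (exp(a * e) - a * e - 1), with e = y_pred - y_true.
    Assumption: a > 0 penalizes overestimation roughly exponentially and
    underestimation roughly linearly; a < 0 reverses the asymmetry.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    if a == 0:
        raise ValueError("a must be non-zero")

    total = 0.0
    for target, prediction in zip(y_true, y_pred):
        error = prediction - target
        total += b * (math.exp(a * error) - a * error - 1.0)
    return total / len(y_true)


# Sparse target vector (mostly zeros), as a stand-in for sparse production data.
target = [0.0, 0.0, 0.0, 2.0]
over = [0.5, 0.5, 0.5, 2.5]    # reconstruction that overshoots every entry
under = [-0.5, -0.5, -0.5, 1.5]  # reconstruction that undershoots every entry

loss_over = linex_loss(target, over, a=1.0)
loss_under = linex_loss(target, under, a=1.0)
```

With `a=1.0`, `loss_over` is larger than `loss_under` even though both reconstructions miss each entry by the same absolute amount, which is the asymmetry the loss is designed to provide. Flipping the sign of `a` reverses which direction of error is penalized more heavily.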
Article details
This work is licensed under a Creative Commons Attribution 3.0 Unported License.