Evaluating Autoencoders as a Dimensionality Reduction Mechanism to Support Clustering Brazilian Agricultural Diversity
Abstract
Brazilian agricultural production is highly diverse across space, which makes designing public policies challenging. This article proposes an approach for grouping Brazilian municipalities according to their agricultural production. We combine feature extraction using autoencoders with clustering based on k-means and Self-Organizing Maps. We used panel data from IBGE's annual estimates, for 1999 to 2018, of the production value of permanent and temporary crops, animal products, aquaculture, plant extractivism, and forestry, as well as planted area and herd size. We analyzed different structures of simple stacked and undercomplete autoencoders, varying the number of layers and the number of neurons per layer, and evaluated the asymmetric exponential linear (LINEX) loss function to handle the sparse data. For comparison, we also applied the Isomap, Kernel PCA, Truncated SVD, and MDS dimensionality reduction methods. The results showed that the autoencoders could extract features from the transformed raw data that allowed the clustering of municipalities to reveal regional and even intra-regional patterns. The autoencoders' performance relative to the other methods improved as the intrinsic dimensionality increased.
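The abstract names an asymmetric exponential linear (LINEX) loss but does not give its exact parameterisation. As a minimal sketch, assuming Varian's standard form L(d) = b·[exp(a·d) − a·d − 1] with d = x̂ − x (reconstruction minus target), the NumPy code below computes the mean loss and its gradient. The function names `linex_loss` and `linex_grad`, the parameters `a` and `b`, and the sign convention for d are illustrative assumptions, not the authors' code.

```python
import numpy as np


def linex_loss(x_true, x_pred, a=1.0, b=1.0):
    """Mean LINEX loss b*(exp(a*d) - a*d - 1), with d = x_pred - x_true.

    Assumed convention: for a > 0, over-predictions (d > 0) are penalised
    exponentially and under-predictions roughly linearly; a < 0 flips this.
    """
    if a == 0 or b <= 0:
        raise ValueError("LINEX requires a != 0 and b > 0")
    d = np.asarray(x_pred, dtype=float) - np.asarray(x_true, dtype=float)
    return float(np.mean(b * (np.exp(a * d) - a * d - 1.0)))


def linex_grad(x_true, x_pred, a=1.0, b=1.0):
    """Gradient of the mean LINEX loss with respect to x_pred.

    d/dd [b*(exp(a*d) - a*d - 1)] = a*b*(exp(a*d) - 1), averaged over entries.
    """
    d = np.asarray(x_pred, dtype=float) - np.asarray(x_true, dtype=float)
    return a * b * (np.exp(a * d) - 1.0) / d.size


if __name__ == "__main__":
    zero, one = np.zeros(1), np.ones(1)
    print("over-reconstruction :", linex_loss(zero, one, a=1.0))
    print("under-reconstruction:", linex_loss(zero, -one, a=1.0))
```

With a = 1, an over-reconstruction of +1 costs e − 2 ≈ 0.718 while an under-reconstruction of −1 costs e^(−1) ≈ 0.368, so the sign of `a` decides which error direction is punished harder. For small |a·d| the loss approaches (b·a²/2)·d², i.e. a scaled squared error. Using it to train an autoencoder would go through the chosen framework's custom-loss mechanism, which is not shown here.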
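The comparison step described in the abstract (reduce dimensionality, then cluster) can be sketched with the scikit-learn implementations of the four baseline methods. This is a hedged illustration, not the authors' pipeline: the data are a hypothetical synthetic stand-in for a sparse municipality × product matrix (not IBGE data), and the `log1p` transform, `n_neighbors=50`, the RBF kernel, k = 3, and the choice of silhouette and normalized mutual information (NMI) as scores are all assumptions. An autoencoder's encoder output would be scored the same way; the autoencoder itself is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import KernelPCA, TruncatedSVD
from sklearn.manifold import MDS, Isomap
from sklearn.metrics import normalized_mutual_info_score, silhouette_score


def make_sparse_panel(n_per_group=40, n_features=30, n_groups=3, seed=0):
    """Hypothetical stand-in data: each group of 'municipalities' produces a
    different random subset of products, and most cells are zero."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for g in range(n_groups):
        active = rng.choice(n_features, size=n_features // 3, replace=False)
        block = np.zeros((n_per_group, n_features))
        block[:, active] = rng.lognormal(3.0, 1.0, size=(n_per_group, active.size))
        # Randomly switch off ~30% of the active cells to add sparsity and noise.
        block *= rng.random(block.shape) > 0.3
        X.append(block)
        y.extend([g] * n_per_group)
    # Assumed transform: log1p tames the heavy right tail of production values.
    return np.log1p(np.vstack(X)), np.array(y)


def reducers(n_components=2, seed=0):
    """The four baselines named in the abstract; parameters are assumptions."""
    return {
        # n_neighbors > group size keeps the neighbour graph connected here.
        "Isomap": Isomap(n_neighbors=50, n_components=n_components),
        "KernelPCA": KernelPCA(n_components=n_components, kernel="rbf"),
        "TruncatedSVD": TruncatedSVD(n_components=n_components, random_state=seed),
        "MDS": MDS(n_components=n_components, n_init=1, random_state=seed),
    }


def compare(X, y, k=3, seed=0):
    """Embed with each method, cluster with k-means, and score the partition."""
    scores = {}
    for name, reducer in reducers(seed=seed).items():
        Z = reducer.fit_transform(X)
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Z)
        scores[name] = {
            "silhouette": float(silhouette_score(Z, labels)),
            "nmi": float(normalized_mutual_info_score(y, labels)),
        }
    return scores


if __name__ == "__main__":
    X, y = make_sparse_panel()
    for name, s in compare(X, y).items():
        print(f"{name:>12}: silhouette={s['silhouette']:.3f}  NMI={s['nmi']:.3f}")
```

Silhouette needs no ground truth, which matches the real setting where municipalities have no labels; NMI is only computable here because the synthetic groups are known.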
This work is licensed under a Creative Commons Attribution 3.0 Unported License.