Evaluating Autoencoders as a Dimensionality Reduction Mechanism to Support Clustering Brazilian Agricultural Diversity

Marcos Aurélio Santos da Silva
https://orcid.org/0000-0002-5367-2869
Leonardo Nogueira Matos
https://orcid.org/0000-0002-6302-3299
Gastão Florêncio Miranda Júnior
https://orcid.org/0000-0002-0967-6141
Flávio Emanuel de Oliveira Santos
https://orcid.org/0000-0002-7041-5581
Márcia Helena Galina Dompieri
https://orcid.org/0000-0001-7689-1602
Fábio Rodrigues de Moura
https://orcid.org/0000-0002-6532-110X
Fabrícia Karollyne Santos Resende
https://orcid.org/0000-0001-8010-6304

Abstract

Brazilian agricultural production is highly diverse across space, which complicates the design of public policies. This article proposes an approach for grouping Brazilian municipalities according to their agricultural production. We combine feature extraction using autoencoders with clustering based on k-means and Self-Organizing Maps. We used panel data from IBGE’s annual estimates of the production value of permanent and temporary crops, animal products, aquaculture, plant extractivism, forestry, planted areas, and herd population between 1999 and 2018. We analyzed different structures of simple stacked and incomplete autoencoders, varying the number of layers and the number of neurons in each layer, and evaluated the asymmetric exponential linear loss function as a way to handle the sparse data. For comparison, we applied the Isomap, Kernel PCA, Truncated SVD, and MDS dimensionality reduction methods. Results showed that the autoencoders could extract features from the transformed raw data that allow the clustering of municipalities to reveal regional and even intra-regional patterns. The comparative performance of the autoencoders improved as the intrinsic dimensionality increased.
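
The asymmetric loss mentioned above can be illustrated with a minimal sketch. It assumes the standard linear-exponential (LINEX) form, b * (exp(a * e) - a * e - 1) with e the reconstruction error; the abstract does not state the exact parametrization used in the article, so the function name linex_loss and the parameters a and b are illustrative assumptions, not the authors' implementation.

```python
import math


def linex_loss(y_true, y_pred, a=1.0, b=1.0):
    """Mean LINEX loss b * (exp(a*e) - a*e - 1), with e = y_pred - y_true.

    Assumed standard form; `a` sets the direction and strength of the
    asymmetry, `b` is a positive scale factor (both hypothetical defaults).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        e = p - t  # signed reconstruction error
        total += b * (math.exp(a * e) - a * e - 1.0)
    return total / len(y_true)


# With a > 0, overestimating a zero entry costs more than underestimating
# it by the same amount, which is the asymmetry of interest for sparse data.
over = linex_loss([0.0, 0.0], [0.5, 0.5])
under = linex_loss([0.0, 0.0], [-0.5, -0.5])
perfect = linex_loss([1.0, 2.0], [1.0, 2.0])
```

Under this form the loss grows roughly exponentially on one side of zero and roughly linearly on the other, so the sign of a decides which kind of reconstruction error is penalized more heavily.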

Article Details

How to Cite
SILVA, M. A. S. da; MATOS, L. N.; MIRANDA JÚNIOR, G. F.; SANTOS, F. E. de O.; DOMPIERI, M. H. G.; MOURA, F. R. de; RESENDE, F. K. S. Evaluating Autoencoders as a Dimensionality Reduction Mechanism to Support Clustering Brazilian Agricultural Diversity. Revista Brasileira de Cartografia, [S. l.], v. 75, 2023. DOI: 10.14393/rbcv75n0a-68733. Available at: https://seer.ufu.br/index.php/revistabrasileiracartografia/article/view/68733. Accessed: 22 Jul. 2024.
Section
Special Section "Brazilian Symposium on GeoInformatics"
