Optimization of Integration of Toponyms by Lexical Similarity

Lanna Kallen Parreiras
https://orcid.org/0000-0002-5983-6991
Fredy Sales Ribeiro
https://orcid.org/0000-0002-4255-6790
Vagner Braga Nunes Coelho
https://orcid.org/0000-0002-7512-2024

Abstract

Identifiable real-world features are instantiated in a Geographic Database (GD), through mapping functions, as representations of reality. Each representation is individualized by the specifier attributes of the mapped class, which include at least one geometry and an identifying name (toponym) associated with the primary key. However, different data producers interpret reality with slight discrepancies, so representations of the same feature can be similar without being identical. Toponyms in particular show small differences that result from changes over the years, from spelling variants, or from human error in data recording. Consequently, when different GDs are integrated through toponyms, the pairing is incomplete, because records that describe the same reality are not recognized as such. In the toponymy class, this occurs mainly because of typos introduced during data entry, especially the transposition of characters within a word. In this research, an improvement to the Dice Coefficient was developed and compared with the original method on three distinct GDs. The analysis was based on the frequencies of the characters and bigrams present in those databases. The proposed improvement rests on the hypothesis that inverted bigrams, such as 'αβ' and 'βα', may be accepted as similar under certain criteria. The analysis identified the most common characters and the most frequent bigrams in the databases; combining these with an analysis of key distances on a standard keyboard made it possible to identify a set of bigram pairs to be treated as similar. The proposal yielded an average increase of 0.58% in the total number of paired instances in the GDs tested.
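
To make the idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the usual bigram form of the Dice Coefficient, 2 * |shared bigrams| / (|bigrams of A| + |bigrams of B|), and adds a variant in which a leftover bigram may match its inversion. The function names and the `swappable` set are hypothetical; the abstract states that the admissible pairs were selected from character and bigram frequencies and keyboard distance, but it does not list them, so the set used here is purely illustrative.

```python
from collections import Counter


def bigrams(text):
    """Return the list of adjacent character pairs of a lowercased string."""
    s = text.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def dice(a, b):
    """Standard bigram Dice Coefficient between two strings."""
    ba, bb = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:  # both strings have fewer than two characters
        return 1.0 if a.lower() == b.lower() else 0.0
    overlap = sum((ba & bb).values())  # multiset intersection
    return 2 * overlap / total


def dice_with_inversions(a, b, swappable):
    """Dice variant in which an unmatched bigram may pair with its inversion.

    `swappable` is an assumed, illustrative set of bigrams whose inversion
    is accepted as similar (e.g. {"ia"} lets "ia" match "ai").
    """
    ba, bb = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        return 1.0 if a.lower() == b.lower() else 0.0
    common = ba & bb
    overlap = sum(common.values())
    # Bigrams left unmatched after exact matching.
    rest_a, rest_b = ba - common, bb - common
    for bg, count in rest_a.items():
        rev = bg[::-1]
        if rev == bg or not (bg in swappable or rev in swappable):
            continue
        matched = min(count, rest_b[rev])
        if matched:
            overlap += matched
            rest_b[rev] -= matched  # do not reuse the same bigram twice
    return 2 * overlap / total


# Hypothetical example: a transposition typo at the end of a toponym.
print(round(dice("Brasilia", "Brasilai"), 3))                          # 0.714
print(round(dice_with_inversions("Brasilia", "Brasilai", {"ia"}), 3))  # 0.857
```

In this sketch the exact matches are counted first, and only the leftover bigrams are tested against their inversions, so the variant never scores lower than the standard coefficient. With an empty `swappable` set it reduces to the original method.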

Article Details

How to Cite
PARREIRAS, L. K.; RIBEIRO, F. S.; COELHO, V. B. N. Optimization of Integration of Toponyms by Lexical Similarity. Brazilian Journal of Cartography, [S. l.], v. 74, n. 2, p. 290–304, 2022. DOI: 10.14393/rbcv74n2-64136. Available at: https://seer.ufu.br/index.php/revistabrasileiracartografia/article/view/64136. Accessed on: 22 Nov. 2024.
Section
Original Articles
