This paper is situated at the interface between Lexicography (PORTO DAPENA, 2002; HARTMANN, 2016), Dialectology (CARDOSO, 2010; CHAMBERS; TRUDGILL, 1994) and Computational Linguistics (HABERT, 2004; PÉREZ HERNÁNDEZ; MORENO ORTIZ, 2009; HAUSSER, 2014; KURDI, 2016). Its objective is to discuss a proposal for building a database in XML (Extensible Markup Language) and to explore the results obtained with NLP (Natural Language Processing). The XML file follows parameters of Dialectal Lexicography (ESQUERRA, 1997; NAVARRO CARRASCO, 1993) and is being populated with dialectal data from the Atlas Linguístico do Brasil (ALiB) project, documented in the country's Northern region. The jEdit software was used as the text editor and the BaseX program to manage the database. Linguistic information was extracted in BaseX from a sample of the data, using XQuery expressions. The following data manipulations were performed: i) locating a specific lexical unit; ii) viewing any microstructure data filtered by the variables gender, age, education and location; iii) selecting information from one of the 14 semantic areas into which the questions of the ALiB semantic-lexical questionnaire are organized. In summary, we conclude that building an XML database speeds up information extraction and provides the data compatibility needed to implement interfaces with other applications, for example the development of a lexicographic product to be published online.
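The three data manipulations listed above can be illustrated with a minimal sketch. It does not reproduce the authors' XML schema or their XQuery expressions: it uses Python's standard `xml.etree.ElementTree` module in place of BaseX, and every element name, attribute name and value (`entry`, `headword`, `occurrence`, `area`, `lexical-unit-a`, and so on) is a hypothetical placeholder chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the element names, attribute names and values
# below are illustrative assumptions, not the schema or data used in the paper.
SAMPLE = """
<database>
  <entry area="area-1">
    <headword>lexical-unit-a</headword>
    <occurrence gender="F" age="group-1" education="level-1" location="point-1"/>
    <occurrence gender="M" age="group-2" education="level-2" location="point-2"/>
  </entry>
  <entry area="area-2">
    <headword>lexical-unit-b</headword>
    <occurrence gender="M" age="group-1" education="level-1" location="point-1"/>
  </entry>
</database>
"""

root = ET.fromstring(SAMPLE)


def find_entry(root, headword):
    """(i) Locate the entry for a specific lexical unit, or None if absent."""
    for entry in root.iter("entry"):
        if entry.findtext("headword") == headword:
            return entry
    return None


def filter_occurrences(root, **criteria):
    """(ii) Return (headword, attributes) pairs for occurrences whose
    attributes match every given variable, e.g. gender="M"."""
    results = []
    for entry in root.iter("entry"):
        for occ in entry.iter("occurrence"):
            if all(occ.get(key) == value for key, value in criteria.items()):
                results.append((entry.findtext("headword"), dict(occ.attrib)))
    return results


def entries_in_area(root, area):
    """(iii) List the headwords recorded under one semantic area."""
    return [e.findtext("headword") for e in root.findall(f".//entry[@area='{area}']")]


if __name__ == "__main__":
    print(find_entry(root, "lexical-unit-a") is not None)
    print(filter_occurrences(root, gender="M", location="point-1"))
    print(entries_in_area(root, "area-1"))
```

In a BaseX setting, each helper would correspond to a short XQuery path expression with predicates over the same kind of elements and attributes; the Python version is used here only to keep the sketch self-contained.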


Jorge Luiz Nunes dos Santos Junior, UFMS/CPTL

Doctoral candidate in the Programa de Pós-Graduação em Letras at the Universidade Federal de Mato Grosso do Sul, Três Lagoas campus (UFMS/CPTL). CAPES scholarship holder.

Aparecida Negri Isquerdo, UFMS

PhD in Letters (Linguistics and Portuguese Language) from UNESP/Araraquara. Permanent faculty member in the stricto sensu graduate programs at UFMS: Estudos de Linguagens/FAALC and Letras/CPTL.


