Methodological reflections on datasets and corpus linguistics: a preliminary analysis of legislative data




Keywords: Text processing, Legal norms dataset, Diachronic analysis, Language and law


Computational tools and methods are increasingly important for research in the humanities, and they are particularly relevant for diachronic linguistic analysis. In this study, we discuss the use of corpora and datasets in linguistics, highlighting some strengths and limitations of these resources. To illustrate how a dataset can support linguistic research, we also present a preliminary study based on a dataset of Brazilian legal norms.


Author Biographies

Lúcia de Almeida Ferrari, Universidade Federal de Minas Gerais

PhD in Linguistic Studies from the Universidade Federal de Minas Gerais (UFMG). Professor at the Faculdade de Letras, UFMG.

Evandro Landulfo Teixeira Paradela Cunha, Universidade Federal de Minas Gerais

PhD in Linguistics from Universiteit Leiden and in Computer Science from the Universidade Federal de Minas Gerais (UFMG). Professor at the Faculdade de Letras, UFMG.


How to Cite

FERRARI, L. de A.; CUNHA, E. L. T. P. Methodological reflections on datasets and corpus linguistics: a preliminary analysis of legislative data. Domínios de Lingu@gem, Uberlândia, v. 16, n. 4, p. 1571–1607, 2022. DOI: 10.14393/DL52-v16n4a2022-12. Available at: Accessed: 22 Jul. 2024.