Portuguese for Academic Purposes with the contribution of Corpus Linguistics and Natural Language Processing

Authors

  • Cristina Becker Lopes Perna Pontifícia Universidade Católica do Rio Grande do Sul
  • Lucelene Lopes Pontifícia Universidade Católica do Rio Grande do Sul
  • Lucas Zambrano Rollsing Pontifícia Universidade Católica do Rio Grande do Sul

DOI:

https://doi.org/10.14393/DL29-v11n2a2017-6

Keywords:

Corpus Linguistics, Natural Language Processing, Portuguese for Academic Purposes

Abstract

This article presents an interface project between Corpus Linguistics and Natural Language Processing (NLP) in progress at the Pontifical Catholic University of Rio Grande do Sul (PUCRS). The project explores a written corpus of theses and dissertations in Linguistics from that institution's Postgraduate Program in Letters, using the software ExATO (LOPES, 2012). This NLP tool provides a series of linguistic resources that support further analysis, such as Concept Hierarchy detection, Extracted Term Lists, Term Concordances and Concept Clouds. On these resources we base a proposal for teaching Portuguese as an Additional Language aimed at proficiency in the academic genre for non-native speakers of Portuguese. We present a series of results already obtained with ExATO. These results foster discussion of the topics revealed by the data and promote the necessary interdisciplinarity between Linguistics and Computer Science in describing and explaining, on a large scale, the pragmatic bias of the Portuguese language (LP) within the limits of the university discursive register.
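The abstract lists term lists and term concordances among ExATO's outputs, but this page does not describe how ExATO computes them. To illustrate what these two kinds of resource look like, here is a minimal Python sketch. It assumes plain regex tokenization and a toy two-sentence corpus. The function names (`tokenize`, `term_frequencies`, `kwic`) and the sample sentences are hypothetical and are not ExATO's API. Real term extraction tools typically add linguistic annotation and relevance ranking, which this sketch omits.

```python
import re
from collections import Counter

# Hypothetical toy corpus; the actual project uses PUCRS theses and dissertations.
CORPUS = [
    "A linguística de corpus estuda a língua em uso.",
    "O corpus de teses permite extrair termos do gênero acadêmico.",
]

# \w matches Unicode letters in Python 3, so accented words stay whole.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Return lowercase word tokens; punctuation is dropped."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def term_frequencies(docs):
    """Naive single-word frequency list (no POS filtering, no stopword removal)."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def kwic(docs, keyword, width=3):
    """Keyword-in-context concordance: `width` tokens of left and right context."""
    keyword = keyword.lower()
    lines = []
    for doc in docs:
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((left, tok, right))
    return lines


if __name__ == "__main__":
    # Print each concordance line with the keyword aligned in a central column.
    for left, kw, right in kwic(CORPUS, "corpus"):
        print(f"{left:>30} [{kw}] {right}")
    print(term_frequencies(CORPUS).most_common(3))
```

Aligning the keyword in a fixed column is the usual concordance layout, because it lets a reader scan the left and right contexts of a term across many occurrences at once.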

Author Biography

Lucas Zambrano Rollsing, Pontifícia Universidade Católica do Rio Grande do Sul

Master's student in Linguistics at the School of Letters (FALE) of the Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS). Bachelor's degree in Letters, Portuguese/English (FALE - PUCRS).

References

BAKER, P.; HARDIE, A.; MCENERY, T. A Glossary of Corpus Linguistics. Edinburgh: Edinburgh University Press, 2006. 187 p.

BIBER, D.; CONRAD, S.; REPPEN, R. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press, 1998. https://doi.org/10.1017/CBO9780511804489

BICK, E. The parsing system PALAVRAS: automatic grammatical analysis of Portuguese in constraint grammar framework. 2000. Doctoral Thesis. Aarhus University, Aarhus, 2000.

DALE, R.; MOISL, H.; SOMERS, H. Handbook of Natural Language Processing (first edition). New York: Marcel Dekker, 2000.

FINATTO, M. J. B.; LOPES, L.; CIULLA, A. Processamento de Linguagem Natural, Linguística de Corpus e Estudos Linguísticos: uma parceria bem-sucedida. Domínios de Lingu@gem, v. 9, n. 5, Dec. 2015.

KENNEDY, G. An introduction to Corpus Linguistics. London & New York: Longman, 1998.

LOPES, L. Extração Automática de Conceitos a partir de Textos em Língua Portuguesa. 2012. Tese (Doutorado em Ciência da Computação) – Programa de Pós-Graduação em Ciência da Computação, Pontifícia Universidade Católica do Rio Grande do Sul, Porto Alegre, 2012.

LOPES, L.; VIEIRA, R. Processamento de Linguagem Natural e o Tratamento Computacional de Linguagens Científicas. In: PERNA, C.; DELGADO, H.; FINATTO, M. (org.). Linguagens Especializadas em Corpora: modos de dizer e interfaces de pesquisa. Porto Alegre: EdiPucrs, 2010. p. 183-201.

LOPES, L.; FERNANDES, P.; VIEIRA, R. ExATO - High Quality Term Extraction for Portuguese and English. In: 2016 IEEE/WIC/ACM International Conference on Web Intelligence, 2016, Omaha. Proceedings of International Conference on Web Intelligence. Omaha - Nebraska - USA, 2016. p. 1-6. https://doi.org/10.1109/WI.2016.0092

LOPES, L.; VIEIRA, R. Evaluation of cutoff policies for term extraction. Journal of the Brazilian Computer Society, v. 21(1), p. 1-9, Elsevier, 2015.

LOPES, L.; FERNANDES, P.; VIEIRA, R. Estimating term domain relevance through term frequency, disjoint corpora frequency - tf-dcf. Knowledge-Based Systems, v. 97, p. 237-249, Elsevier, 2016.

LOPES, L.; VIEIRA, R. Improving Portuguese Term Extraction. In: International Conference on Computational Processing of the Portuguese Language - PROPOR, 2012, Coimbra. Lecture Notes in Computer Science - Proceedings of PROPOR 2012. Heidelberg: Springer, 2012. v. 7243. p. 85-92. https://doi.org/10.1007/978-3-642-28885-2_9

LOPES, L.; FERNANDES, P.; VIEIRA, R.; FEDRIZZI, G. ExATOlp - An Automatic Tool for Term Extraction from Portuguese Language Corpora. In: LTC'09 - 4th Language and Technology Conference, 2009, Poznan. Proceedings of the Fourth Language and Technology Conference. Poznan: Adam Mickiewicz University, 2009. p. 427-431.

MANNING, C. D.; SCHÜTZE, H. Foundations of Statistical Natural Language Processing. Cambridge: The MIT Press, 1999.

MITKOV, R. (ed.). The Oxford Handbook of Computational Linguistics. Oxford: Oxford University Press, 2003.

MOLSING, K. V.; PERNA, C. B. L.-P. Research and Teaching in Portuguese for Specific Purposes. BELT - Brazilian English Language Teaching Journal, v. 5, n. 2, p. 1-7, 2015. https://doi.org/10.15448/2178-3640.2014.2.19701

TEUBERT, W.; ČERMÁKOVÁ, A. Corpus Linguistics: A Short Introduction. London: Continuum, 2007.

Published

2017-04-17

How to Cite

PERNA, C. B. L.; LOPES, L.; ROLLSING, L. Z. Portuguese for Academic Purposes with the contribution of Corpus Linguistics and Natural Language Processing. Domínios de Lingu@gem, Uberlândia, v. 11, n. 2, p. 379–393, 2017. DOI: 10.14393/DL29-v11n2a2017-6. Available at: https://seer.ufu.br/index.php/dominiosdelinguagem/article/view/36933. Accessed: 22 Nov. 2024.