Portuguese for Academic Purposes with the contribution of Corpus Linguistics and Natural Language Processing
DOI: https://doi.org/10.14393/DL29-v11n2a2017-6
Keywords: Corpus Linguistics, Natural Language Processing, Portuguese for Academic Purposes
Abstract
This article presents a project at the interface between Corpus Linguistics and Natural Language Processing (NLP), currently in progress at the Pontifical Catholic University of Rio Grande do Sul (PUCRS). The project explores a written corpus of theses and dissertations in Linguistics from the institution's Postgraduate Program in Letters, using a software tool called ExATO (LOPES, 2012). This NLP tool provides a series of linguistic resources that support further analysis, such as concept hierarchy detection, extracted term lists, term concordances, and concept clouds. These resources ground a proposal for teaching Portuguese as an Additional Language, aimed at developing proficiency in the academic genre among non-native speakers of Portuguese. We present a series of results already obtained with ExATO, which encourage discussion of the topics revealed by the data and foster the necessary interdisciplinarity between Linguistics and Computer Science in describing and explaining the pragmatic bias of the Portuguese language on a large scale, within the limits of the university discursive register.
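Among the resources listed above, a term concordance is usually displayed in keyword-in-context (KWIC) form: every occurrence of a search term shown with a fixed window of surrounding words. The following is a minimal Python sketch of that idea, not ExATO's implementation (which is not described here). It assumes a naive regular-expression tokenizer, single-word search terms, and a hypothetical `kwic` function with a `window` parameter; the Portuguese sample sentence is illustrative only.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Naive tokenizer (an assumption): lowercase runs of word characters.
    # Python's str regexes are Unicode-aware, so accented letters are kept.
    return re.findall(r"\w+", text.lower())


def kwic(text: str, term: str, window: int = 3) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: return (left context, keyword, right context)
    for each occurrence of a single-word term, using `window` words per side."""
    tokens = tokenize(text)
    target = term.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


if __name__ == "__main__":
    sample = ("A linguística de corpus estuda a língua em uso. "
              "O corpus reúne teses e dissertações.")
    # Print each occurrence with the keyword bracketed and left context right-aligned.
    for left, kw, right in kwic(sample, "corpus", window=2):
        print(f"{left:>20} [{kw}] {right}")
```

Aligning the keyword in a central column is the conventional KWIC layout, since it lets a reader scan the recurring left and right collocates of a term at a glance.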
References
BAKER, P.; HARDIE, A.; MCENERY, T. A Glossary of Corpus Linguistics. Edinburgh: Edinburgh University Press, 2006. 187p.
BIBER, D.; CONRAD, S.; REPPEN, R. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press, 1998. https://doi.org/10.1017/CBO9780511804489
BICK, E. The Parsing System PALAVRAS: Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. 2000. Thesis (Ph.D.) - Aarhus University, Aarhus, 2000.
DALE, R.; MOISL, H.; SOMERS, H. Handbook of Natural Language Processing (first edition). New York: Marcel Dekker, 2000.
FINATTO, M. J. B.; LOPES, L.; CIULLA, A. Processamento de Linguagem Natural, Linguística de Corpus e Estudos Linguísticos: uma parceria bem-sucedida. Domínios de Lingu@gem, v. 9, n. 5, Dec. 2015.
KENNEDY, G. An introduction to Corpus Linguistics. London & New York: Longman, 1998.
LOPES, L. Extração Automática de Conceitos a partir de Textos em Língua Portuguesa. 2012. Thesis (Ph.D. in Computer Science) - Graduate Program in Computer Science, Pontifícia Universidade Católica do Rio Grande do Sul, Porto Alegre, 2012.
LOPES, L.; VIEIRA, R. Processamento de Linguagem Natural e o Tratamento Computacional de Linguagens Científicas. In: PERNA, C.; DELGADO, H.; FINATTO, M. (eds.). Linguagens Especializadas em Corpora: modos de dizer e interfaces de pesquisa. Porto Alegre: EdiPucrs, 2010. p. 183-201.
LOPES, L.; FERNANDES, P.; VIEIRA, R. ExATO - High Quality Term Extraction for Portuguese and English. In: 2016 IEEE/WIC/ACM International Conference on Web Intelligence, 2016, Omaha. Proceedings of International Conference on Web Intelligence. Omaha - Nebraska - USA, 2016. p. 1-6. https://doi.org/10.1109/WI.2016.0092
LOPES, L.; VIEIRA, R. Evaluation of cutoff policies for term extraction. Journal of the Brazilian Computer Society, v. 21, n. 1, p. 1-9, Springer, 2015.
LOPES, L.; FERNANDES, P.; VIEIRA, R. Estimating term domain relevance through term frequency, disjoint corpora frequency - tf-dcf. Knowledge-Based Systems, v. 97, p. 237-249, Elsevier, 2016.
LOPES, L.; VIEIRA, R. Improving Portuguese Term Extraction. In: International Conference on Computational Processing of the Portuguese Language - PROPOR, 2012, Coimbra. Lecture Notes in Computer Science - Proceedings of PROPOR 2012. Heidelberg: Springer, 2012. v. 7243. p. 85-92. https://doi.org/10.1007/978-3-642-28885-2_9
LOPES, L.; FERNANDES, P.; VIEIRA, R.; FEDRIZZI, G. ExATOlp - An Automatic Tool for Term Extraction from Portuguese Language Corpora. In: LTC'09 - 4th Language and Technology Conference, 2009, Poznan. Proceedings of the Fourth Language and Technology Conference. Poznan: Adam Mickiewicz University, 2009. p. 427-431.
MANNING, C. D.; SCHÜTZE, H. Foundations of Statistical Natural Language Processing. Cambridge: The MIT Press, 1999.
MITKOV, R. (ed.). The Oxford Handbook of Computational Linguistics. Oxford: Oxford University Press, 2003.
MOLSING, K. V.; PERNA, C. B. L.-P. Research and Teaching in Portuguese for Specific Purposes. BELT - Brazilian English Language Teaching Journal, v. 5, n. 2, p. 1-7, 2015. https://doi.org/10.15448/2178-3640.2014.2.19701
TEUBERT, W.; CERMÁKOVÁ, A. Corpus Linguistics. A short introduction. London: Continuum, 2007.
License
Authors who publish in this journal agree to the following terms:
Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0), which allows the work to be shared with acknowledgment of authorship while preventing its commercial use.
Authors may enter into additional, separate agreements for the non-exclusive distribution of the version of the work published in this journal (e.g., publication in an institutional repository or as a book chapter), with acknowledgment of authorship and of initial publication in this journal.