Modeling verb valency in a computational grammar for Portuguese in the HPSG formalism

DOI:

https://doi.org/10.14393/DL52-v16n4a2022-6

Keywords:

Computational linguistics, Grammar engineering, Syntactic parsing, Valence, Computational semantics

Abstract

HPSG is a lexicalist grammatical theory that formalizes morphosyntactic and semantic structures in parallel. This work describes the computational implementation of verb valence in a new Portuguese grammar in this formalism. The grammar is relevant for text understanding applications and also contributes to the formal documentation of the structures of the language, standing out for its treatment of control and raising constructions. It has been implemented incrementally by applying it to increasingly comprehensive test suites. With 278 entries and a total of 215 lemmas, the verb lexicon is still small. However, the proposed type hierarchy models the properties of 118 valence classes, of which 57 are types that encode verb classes. The grammar analyzes 94% of a total of 581 grammatical sentences, while showing low overgeneration on a set of 167 ungrammatical examples.

Author Biographies

Leonel Figueiredo de Alencar, Universidade Federal do Ceará

Associate Professor in the Department of Foreign Languages and the Graduate Program in Linguistics at the Universidade Federal do Ceará (UFC). Collaborating researcher at the Group of Computer Networks, Software Engineering, and Systems (GREat) of the UFC Department of Computing. He holds a doctorate in Linguistics from Universität Konstanz, Germany, where he also completed postdoctoral research.

Alexandre Rademaker, Escola de Matemática Aplicada da FGV e IBM Research

Alexandre has a doctorate in Computer Science from PUC-Rio (2010). He is a research scientist at IBM Research (Brazil Lab) and an adjunct professor in the Applied Mathematics Department of the Getulio Vargas Foundation (Rio de Janeiro, Brazil). He has written more than 90 papers published in peer-reviewed journals and international conferences in his areas of expertise: logic, proof theory, knowledge representation and reasoning, language resources, computational linguistics, and type theory. He serves on the program committees of regular conferences such as ACL, COLING, and LREC. He is also a board member of the Global WordNet Association and coordinated the CE-PLN between 2017 and 2019. Alexandre collaborates on maintaining many language resources and on developing applications that use these resources for ‘deep’ linguistic processing of human languages.

During his Ph.D., Alexandre was an international fellow at Microsoft Research and SRI International. At MSR, in 2008, he worked with the Z3 SMT Solver team (Leonardo de Moura and Nikolaj Bjørner), developing a distributed environment for testing and optimizing Z3. At SRI International, in 2009, he worked under the supervision of Natarajan Shankar. Alexandre participated in several research projects, including MIST (using natural language processing and description logics for knowledge modeling), ANUBIS (database consistency checking), and Ontology and Context (investigating the problem of ontology alignment). In his thesis, written under the supervision of Edward Hermann Haeusler, he proposed new deduction systems for description logics. It was published by Springer in 2012 in the Springer Briefs series under the title A proof theory for Description Logics.

Published

2022-09-12

How to Cite

ALENCAR, L. F. de; RADEMAKER, A. Modeling verb valency in a computational grammar for Portuguese in the HPSG formalism. Domínios de Lingu@gem, Uberlândia, v. 16, n. 4, p. 1339–1400, 2022. DOI: 10.14393/DL52-v16n4a2022-6. Available at: https://seer.ufu.br/index.php/dominiosdelinguagem/article/view/64132. Accessed: 6 Oct. 2024.