Modeling verb valency in a computational grammar for Portuguese in the HPSG formalism
DOI: https://doi.org/10.14393/DL52-v16n4a2022-6
Keywords: Computational linguistics, Grammar engineering, Syntactic parsing, Valence, Computational semantics
Abstract
Head-driven Phrase Structure Grammar (HPSG) is a lexicalist grammatical theory that formalizes morphosyntactic and semantic structures in parallel. This work describes the computational implementation of verb valency in a new grammar of Portuguese in this formalism. The grammar is relevant to text understanding applications and also contributes to the formal documentation of the structures of the language; it stands out for its treatment of control and raising constructions. It has been implemented incrementally by applying it to increasingly comprehensive test suites. With 278 entries covering 215 lemmas, the verb lexicon is still small. However, the proposed type hierarchy models the properties of 118 valence classes, of which 57 are types that encode verb classes. The grammar analyzes 94% of 581 grammatical sentences and shows low overgeneration on a set of 167 ungrammatical examples.
License
Copyright (c) 2022 Leonel Figueiredo de Alencar, Alexandre Rademaker
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors who publish in this journal agree to the following terms:
Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives License (CC BY-NC-ND 4.0), which allows the work to be shared with acknowledgment of authorship and prohibits its commercial use.
Authors may enter into separate, additional contracts for the non-exclusive distribution of the version of the work published in this journal (for example, depositing it in an institutional repository or publishing it as a book chapter), with acknowledgment of authorship and of its initial publication in this journal.