Automated Phraseology Extraction and Cultural Factors: An Experiment
Abstract
This paper reports the results of an experiment with the Parseme 1.1 dataset for English. While the Parseme initiative represented a breakthrough in computational phraseology, it also raised a number of theoretical and practical issues. The experiment attempts to improve the results obtained for English by drawing on an external resource, a large web corpus. At the same time, attention is paid to the subtle interaction between linguistic tradition, culture and the manipulation of linguistic data in a supervised model for the automatic extraction of verbal multiword expressions. The results show that our algorithm, which takes an open-track approach using external linguistic data, achieves better recall, while deep learning systems achieve better precision. The experiment also shows that cultural factors play a crucial role at various stages of the supervised model.
Copyright (c) 2020 PHRASIS | Rivista di studi fraseologici e paremiologici
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 licence.
The journal is published under a Creative Commons Attribution-ShareAlike 4.0 International License.