Automated Phraseology Extraction and Cultural Factors: An Experiment

Jean-Pierre Colson

Université catholique de Louvain

Keywords: phraseology, automatic extraction, algorithm, culture

Abstract

This paper reports the results of an experiment with the PARSEME 1.1 dataset for English. While the PARSEME initiative represented a breakthrough in computational phraseology, it also raised a number of theoretical and practical issues. In this experiment, an attempt is made to improve the results obtained for English by drawing on external resources in the form of a large web corpus. At the same time, attention is paid to the subtle interaction between linguistic tradition, culture and the manipulation of linguistic data in a supervised model for the automatic extraction of verbal multiword expressions. The results show that our algorithm, which runs in the open track using external linguistic data, achieves better recall, whereas deep learning systems achieve better precision. The experiment also shows that cultural factors play a crucial role at various stages of the supervised model.
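The recall/precision trade-off mentioned above can be made concrete with a small sketch. The Python below scores predicted verbal MWEs against gold annotations by exact match, assuming each MWE occurrence is represented as a sentence id plus a set of token positions. This representation, the function name `precision_recall_f1` and the toy data are hypothetical illustrations; this is neither the official PARSEME evaluation script nor the author's system.

```python
from typing import FrozenSet, Set, Tuple

# Hypothetical representation: (sentence id, token positions of the MWE).
MWE = Tuple[int, FrozenSet[int]]


def precision_recall_f1(predicted: Set[MWE], gold: Set[MWE]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over sets of MWE occurrences."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy gold standard with four annotated verbal MWEs (invented data).
gold = {
    (1, frozenset({2, 4})),
    (2, frozenset({0, 1})),
    (3, frozenset({5, 6, 7})),
    (4, frozenset({1, 3})),
}

# A recall-oriented system finds every gold MWE but adds two false positives.
recall_oriented = gold | {(1, frozenset({3, 4})), (5, frozenset({0, 2}))}

# A precision-oriented system proposes only two MWEs, both correct.
precision_oriented = {(1, frozenset({2, 4})), (3, frozenset({5, 6, 7}))}

for name, system in [("recall-oriented", recall_oriented),
                     ("precision-oriented", precision_oriented)]:
    p, r, f = precision_recall_f1(system, gold)
    print(f"{name}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

On this invented data the recall-oriented system obtains R=1.00 but P=0.67, while the precision-oriented one obtains P=1.00 but R=0.50; F1 summarizes the balance between the two.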


Author biography

Jean-Pierre Colson, Université catholique de Louvain

Jean-Pierre Colson is a professor and chairman of the Department of Translation and Interpreting at the University of Louvain (Louvain-la-Neuve, Belgium). He is also a member of the Board of the European Association for Phraseology (Europhras) and has published many papers on phraseology, translation studies and computational linguistics. In recent years, his work has focused on the automatic processing of multiword units in large electronic corpora.
