Extraction of the Multi-word Lexical Units in the Perspective of the Wordnet Expansion

The paper focuses on selecting an optimal set of Multiword Expression extraction methods for use as a tool during wordnet expansion. Wordnet multiword lexical units form a broad class, and it is difficult to find a single extraction method that covers it. Many association measures for extraction were tested on very large corpora and a very large wordnet, namely plWordNet. Several new measures are proposed and compared with selected methods from the literature. Two ways of combining measures into ensembles were also analysed. We showed that the selection of methods and the tuning of their parameters can be transferred between two large corpora. A comparison of the extracted collocations with the very large set of plWordNet multiword lexical units revealed that the performance of the methods is far below the optimistic levels reported in the literature. However, a carefully selected and combined set of methods can be a valuable tool for lexicographers.
In Proceedings
multi-word expressions; wordnet; plWordNet; collocation extraction
Ruslan Mitkov and Galia Angelova and Kalina Boncheva
Proceedings of the International Conference Recent Advances in Natural Language Processing -- RANLP'2015
Hissar, Bulgaria
