Automatic construction of complex features in Conditional Random Fields for named entities recognition

Conditional Random Fields (CRFs) have proven very useful in many sequence labelling tasks in natural language processing, including named entity recognition (NER). The advantage of CRFs over other statistical models (such as Hidden Markov Models) is that they can utilize a large set of features describing a sequence of observations. On the other hand, the CRF potential function is defined as a linear combination of features, which means that it cannot model relationships between combinations of input features and output labels. This limitation can be overcome by defining the relationships between atomic features as complex features before training the CRF. In this paper we present experimental results on the automatic generation of complex features for the named entity recognition task for Polish. A rule-induction algorithm called RIPPER is used to generate a set of rules, which are later transformed into a set of complex features. The extended set of features is used to train a CRF model.
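The idea of turning induced rules into complex features can be illustrated with a minimal sketch. It assumes that each rule is a conjunction of tests on atomic token features and that a complex feature is a binary indicator of whether the rule fires on a token. The feature names, rule contents, and helper functions below are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List

# One token is described by its atomic features (name -> value).
Token = Dict[str, str]
# A rule is a conjunction of tests: every listed feature must have the given value.
Rule = Dict[str, str]

# Hypothetical rules of the kind a rule learner such as RIPPER might induce.
# Feature names and values are invented for illustration only.
RULES: List[Rule] = [
    {"starts_upper": "1", "prev_base": "pan"},
    {"starts_upper": "1", "in_city_gazetteer": "1"},
]


def rule_to_feature(rule: Rule) -> Callable[[Token], str]:
    """Turn a conjunctive rule into a binary complex feature."""
    def feature(token: Token) -> str:
        # The complex feature fires only if all atomic tests hold at once.
        fires = all(token.get(name) == value for name, value in rule.items())
        return "1" if fires else "0"
    return feature


def extend_features(tokens: List[Token], rules: List[Rule]) -> List[Token]:
    """Return copies of the tokens with one extra feature per rule."""
    features = [rule_to_feature(rule) for rule in rules]
    extended = []
    for token in tokens:
        new_token = dict(token)
        for index, feature in enumerate(features):
            new_token[f"rule_{index}"] = feature(token)
        extended.append(new_token)
    return extended


tokens = [
    {"starts_upper": "1", "prev_base": "pan", "in_city_gazetteer": "0"},
    {"starts_upper": "1", "prev_base": "w", "in_city_gazetteer": "1"},
]
extended = extend_features(tokens, RULES)
```

The extended token descriptions could then be passed to any CRF toolkit in place of the atomic features alone, so that a single weight is learned for each conjunction rather than only for its parts.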
Year:
2015
Type of Publication:
In Proceedings
Keywords:
named entities recognition; NER; CRF
Editor:
Galia Angelova, Kalina Bontcheva, Ruslan Mitkov
Book title:
Proceedings of International Conference Recent Advances in Natural Language Processing
Pages:
413–419
Month:
September
ISSN:
1313-8502