We are an academic research group bringing together people interested in linguistics, computational linguistics and natural language engineering, including language technologies. The G4.19 group consists of researchers, PhD students and students. Our team works mainly within the Artificial Intelligence Department at the Institute of Informatics, Wrocław University of Technology, but some of its members have other affiliations and work in Ottawa, Warsaw and Gdańsk. We are involved both in research and in the development and implementation of tools for computer-based natural language processing. We are especially interested in developing basic tools and resources for the Polish language.
Our interests include:
- the construction of a wordnet (a kind of electronic thesaurus) for the Polish language (Słowosieć, plWordNet) using a semi-automatic method applied to very large Polish language corpora (collections of documents written in Polish); the method is applied under the supervision of a team of linguists and with the help of tools we have designed for wordnet editing (WordnetLoom) and semi-automatic wordnet extension (WordnetWeaver),
- automatic extraction of lexical semantic knowledge from text corpora, including algorithms for automatically extracting semantic relations between words from large text corpora,
- morpho-syntactic analysis, especially so-called tagging, that is, disambiguating the morpho-syntactic descriptions of words in text; we have built and are continually developing a tagger for Polish called TaKIPI,
- shallow syntactic analysis: the development of tools for shallow syntactic analysis of Polish,
- word sense disambiguation,
- information extraction: recognizing identification units, relations and events in domain-specific documents,
- handwriting recognition, both at the level of image analysis and at the level of subsequent correction of the recognition results, relying on different types of language models built from text corpora.
We have completed, and are currently working on, a number of research projects financed by the Ministry of Science and Higher Education as well as by the European Union, for instance:
- "Półautomatyczna konstrukcja zasobów leksykalnych przez rozpoznawanie relacji semantycznych na podstawie danych morfo-syntaktycznych i semantycznych w korpusach tekstu" – 'Semi-automatic construction of lexical resources via semantic relations recognition based on morpho-syntactic and semantic data in text corpora'
- "Adaptacyjny system wspomagający rozwiązywanie problemów w oparciu o analizę treści dostępnych źródeł elektronicznych" – 'An adaptive system supporting problem solving based on content analysis of available electronic sources'
We offer access to a number of tools and language resources for the Polish language:
- Słowosieć (plWordNet) – a wordnet for Polish available at http://plwordnet.pwr.wroc.pl,
- TaKIPI – a tagger for Polish available under the GPL licence; it can be downloaded from http://nlp.pwr.wroc.pl/takipi/,
- Internet services: TaKIPI-WS, plWordNet-WS and SuperMatrix-WS.
Our goals include:
- research in computational linguistics and natural language engineering, with special emphasis on the peculiarities of the Polish language,
- the development and implementation of tools for computer-based processing of the Polish language,
- construction and dissemination of language resources for Polish,
- wide scientific cooperation in the construction of basic tools and resources for Polish,
- the popularization of knowledge about the applications of computational linguistics and natural language engineering methods in different areas of science.