Leader: Valentin Vydrin
This operation, which started in the first phase of the LABEX, is reaching maturity. During the first phase, the main electronic tools needed to run the parallel corpora, the syntactically annotated corpora, and the audio corpora were developed. These tools will, on the one hand, facilitate the expansion and improvement of the existing text corpora and, on the other hand, make it possible to quickly create text corpora for additional Mande languages and, eventually, for languages from other families.
During the new phase, the NLP aspects of the project will be emphasized, in particular: automatic (statistical) disambiguation; automatic clustering of the senses of polysemous words (based on vector analysis); and the development of OCR tools for the languages concerned. In addition, first attempts at speech recognition and machine translation will be made.