Contrary to popular belief, numerous African languages (and Mande languages in particular) already possess grammatical and lexicographical descriptions of the “traditional” type. However, modern descriptive linguistics requires work based on large-scale empirical data, i.e. big text corpora.

The availability of big annotated text corpora for numerous world languages has had a major impact on all linguistic disciplines. Unfortunately, corpus linguistics for African languages lags far behind. Nonetheless, there has been progress: in particular, since 2010 several annotated text corpora for Mande languages have been put online. The current version of the Bambara Reference Corpus contains more than 11 million words, and the Maninka Reference Corpus about 3.5 million. These and other corpora have been developed as part of the Corpora Mandeica project. A number of scholars have published works based on data from these corpora (see the list of references).

The main goal of the PhD project is to outline a corpus-driven linguistic study of Manding languages and to carry out one or more investigations in this field. Possible areas of focus include, but are by no means limited to:

  • grammatical semantics of certain affixes or auxiliary words;
  • analysis of the polysemy of content words, with an eye towards the compilation of a corpus-driven dictionary;
  • syntactic studies based on the syntactically annotated subcorpus;
  • particularities of different textual genres;
  • various linguistic phenomena based on parallel corpora.

This PhD project will require mastering the software tools of Corpora Mandeica: the Daba program; its Git repository; and the interfaces for integrating texts into the corpus (metadata attribution, disambiguation of automatically annotated texts, and syntactic annotation following the Universal Dependencies model). The PhD student will thus actively participate in the development of the Manding corpora.

References:

Facchinetti, R. 2007. Theoretical Description and Practical Applications of Linguistic Corpora. Verona: QuiEdit.

Fuß, Eric et al. (eds.). 2018. Grammar and Corpora 2016. Heidelberg: Heidelberg University Publishing.

Rovenchak, Andrij. 2011. Phoneme distribution, syllabic structure, and tonal patterns in Nko texts. Mandenkan 47. 77–96.

Rovenchak, Andrij. 2015. Quantitative studies in the corpus of Nko periodicals. In Arjuna Tuzzi, Martina Benešová & Ján Mačutek (eds.), Recent Contributions to Quantitative Linguistics, 125–138. Berlin–Boston: Mouton de Gruyter.

Rovenchak, Andrij. 2018. Texts for the corpus of Nko: collection, conversion, and open issues. Mandenkan 59. 57–66.

Rovenchak, Andrij & Solomija Buk. 2013. Masadennin (The Little Prince in Bamana). Mandenkan 50. 117–130. doi:10.4000/mandenkan.268.

Vydrin, Valentin. 2016. Perfekt v jazyke maninka Gvinei (Перфект в языке манинка Гвинеи) [The perfect in Guinean Maninka]. In Timur Majsak, Vladimir Plungian & Ksenia Semenova (eds.), Issledovanija po teorii grammatiki 7 (Исследования по теории грамматики 7) [Studies in the theory of grammar 7] (Acta Linguistica Petropolitana. Trudy Instituta lingvisticheskikh issledovanij RAN (ACTA LINGUISTICA PETROPOLITANA. Труды Института лингвистических исследований РАН) [Acta Linguistica Petropolitana. Transactions of the Institute for Linguistic Studies] 12 (2)), 709–741. St. Petersburg: Nauka.

Vydrin, Valentin. 2017a. New Electronic Resources for Texts in Manding Languages. In Daniela Merolla & Mark Turin (eds.), Searching For Sharing: Heritage and Multimedia in Africa, 109–121. Cambridge, UK: Open Book Publishers.

Vydrin, Valentin. 2017b. Korpusnyje slovari jazykov manden (Корпусные словари языков манден) [Towards corpus-driven dictionaries for Manding languages]. In Alexander Zheltov (ed.), African Collection – 2017, 342–357. St. Petersburg: Museum of Anthropology and Ethnography.

Vydrin, Valentin. 2017c. Vyrazhenie predikacii kachestva v gvinejskom maninka (Выражение предикации качества в гвинейском манинка) [Expression of the quality predication in the Maninka of Guinea]. In Valentin Vydrin & Anastasia Lyakhovich (eds.), V zheltoj zharkoj Afrike… K 50-letiju Aleksandra Zheltova (В жёлтой жаркой Африке… К 50-летию Александра Желтова) [In the hot yellow Africa… In honor of Alexander Zheltov on the occasion of his 50th birthday], 25–47. St. Petersburg: Nestor-Historia.

Vydrin, Valentin. 2018. Corpus-driven lexicography for African languages: Perspectives for Manding. In The 9th World Congress of African Linguistics: African languages in a global world: from description to state policies, 75. Rabat: Mohammad V University of Rabat.

Vydrin, Valentin, Andrij Rovenchak & Kirill Maslinsky. 2016. Maninka Reference Corpus: A Presentation. TALAf 2016 : Traitement automatique des langues africaines (écrit et parole) Atelier JEP-TALN-RECITAL 2016. Paris.

Required skills and experience:

The candidate must hold an MA degree in linguistics before the start of the contract. A good knowledge of linguistic typology is necessary, as well as at least the basics of corpus linguistics. Good proficiency in Bambara and/or Guinean Maninka is necessary.

Desirable skills:

  • knowledge of the Linux environment;
  • knowledge of the Python programming language;
  • knowledge of the Toolbox software (for dictionary compilation);
  • knowledge of the ELAN software;
  • corpus annotation experience.

Research context: Labex-EFL research project Corpora for Mande studies (coordinator: Valentin Vydrin).

Institutional context: INALCO & Laboratoire Langage, langues et cultures d’Afrique (LLACAN). Cooperation with the Équipe de Recherche Textes, Informatique, Multilinguisme (ERTIM, INALCO) is planned.

Contact: Valentin Vydrin (