The 16th Annual International AFRILEX Conference
UNAM, Windhoek, Namibia, 5-7 July 2011

[Abstract:] Sene Mongaba, B.: A global approach for a dictionary of Lingála: From localization of software to a lemmatization strategy

Our central argument

In DR Congo, French is the language of instruction. However, today's school population has an insufficient command of it. It is therefore desirable to use at least one of the country's four national languages (Lingála, Cilubà, Kikongo and Kiswahili) alongside French, in a diglossic arrangement, as a teaching strategy for better acquisition of knowledge and know-how.

In our sociolinguistic surveys, we observed that the national languages are steadily gaining ground in the linguistic market of DR Congo. As a result, teachers have to use national languages when teaching in order to be understood.

However, teachers are limited in their use of those languages by the lack of teaching aids such as schoolbooks, maps and dictionaries. Dictionaries are particularly important in this context: sociolinguistic realities show that bilingual dictionaries are needed to explain, in the national languages, the meaning of the scientific terms used in French in the classroom.

Lingála is the national language spoken in Kinshasa, the capital of the country. Although the language is clearly gaining status, i.e. it is now used in areas that used to be the preserve of French, such as the media, justice and schools, its progress is still hindered by the lack of terminological and lexicographic tools. Hence the need to provide monolingual or plurilingual dictionaries for this language, in response to a demand from society.

This is why we have undertaken this lexicographic research. It consists in producing lexicons as teaching aids and making them available to their primary users, i.e. teachers and students. For that purpose, the approach to lemmatization is crucial, since it determines how easily a word can be found in a dictionary.[1] For Lingála, three main aspects are involved: morphology, concordance rules and orthography. First of all, we have to establish how to present derived words formed from the same stem but carrying different prefixes indicating singular, plural, process, state and so on. As for the last two aspects, we must decide whether or not to record all the alternative versions of the same term observed in the corpus.
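The lemmatization choice described above can be pictured with a minimal sketch. It is only an illustration, not the strategy adopted in this research: the word forms are hand-segmented into prefix and stem, they are written without tone marks, and the function names and the sample list are assumptions introduced here. The sketch contrasts a word-based arrangement (every full form is its own headword) with a stem-based arrangement (forms sharing a stem are grouped under it).

```python
from collections import defaultdict

# Hand-segmented (prefix, stem) pairs, assumed for illustration only
# and written without tone marks.
FORMS = [
    ("mo", "to"), ("ba", "to"),
    ("e", "lamba"), ("bi", "lamba"),
    ("mo", "sali"), ("ba", "sali"),
]

def word_based(forms):
    """Each full form is a headword, sorted alphabetically."""
    return sorted(prefix + stem for prefix, stem in forms)

def stem_based(forms):
    """Forms that share a stem are grouped under that stem."""
    groups = defaultdict(list)
    for prefix, stem in forms:
        groups[stem].append(prefix + stem)
    return {stem: sorted(words) for stem, words in sorted(groups.items())}

print(word_based(FORMS))
print(stem_based(FORMS))
```

In the word-based list the singular and plural of the same noun end up far apart, whereas the stem-based grouping keeps them together but requires the user to identify the stem before looking a word up.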

How the study was conducted

We started by examining the traditional presentation in existing dictionaries and proposed some adaptations to achieve the best presentation in terms of contents, orthography and register.

We compiled our corpus from textbooks, existing Lingála dictionaries and web texts. The main material came from internet forums, where people write the language as it is actually used. We also used a recent version of the Bible and some existing novels and schoolbooks, as well as an oral corpus that we put together ourselves, drawing on everyday conversations and on the media (radio and television).

The tools used are, on the one hand, Unitex, a corpus processor, and, on the other hand, TshwaneLex, a lexicographic software program.

Unitex is a corpus processor to which one can add linguistic resources and then apply them to texts.[2] It allows one to install one's own language and process text in that language. We began by creating an alphabet file for the Lingála language. Then we created a dictionary of inflected forms and a dictionary of non-inflected forms. This process allowed us to obtain a wordlist, word frequencies, tagged words and collocations. We used these resources to select from the corpus the most frequent lemmas, examples and potential definitions for inclusion in the lexicon.
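As a rough illustration of the kind of output mentioned above (a wordlist with frequencies and candidate collocations), the following sketch uses plain Python and does not call Unitex. The letter inventory, the sample sentence and the function names are assumptions made for this example and do not reproduce the alphabet file or the corpus used in the study. Adjacent word pairs stand in for collocation candidates.

```python
import re
import unicodedata
from collections import Counter

# Assumed letter inventory, for illustration only.
LETTERS = "abcdefghijklmnopqrstuvwxyzɛɔáéíóúâêîôûǎěǐǒǔ"
# Combining acute, circumflex and caron, needed for tone marks on ɛ and ɔ.
COMBINING = "\u0301\u0302\u030c"
TOKEN = re.compile("[" + LETTERS + COMBINING + "]+")

def tokenize(text):
    """Lowercase, normalize to NFC and keep runs of alphabet characters."""
    text = unicodedata.normalize("NFC", text.lower())
    return TOKEN.findall(text)

def frequency_list(text):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokenize(text)).most_common()

def adjacent_pairs(tokens):
    """Count adjacent word pairs as simple collocation candidates."""
    return Counter(zip(tokens, tokens[1:]))

# Made-up sample text, not taken from the corpus.
sample = "Moto na moto akosála mosálá. Bato bakosála."
print(frequency_list(sample))
print(adjacent_pairs(tokenize(sample)).most_common(3))
```

Normalizing to NFC before matching means a tone-marked vowel is treated the same whether it was typed as one precomposed character or as a base letter followed by a combining accent.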

TshwaneLex, a lexicographic software program, allowed us to record data for the purpose of producing dictionaries. The first step consisted in localizing the software into Lingála, which enabled us to work in a Lingála environment. Localization means "that the entire lexicographic process, from initial compilation all the way to final product, may henceforth be conducted in any language of one's choice" (De Schryver 2006:223).[3]

Using TshwaneLex in a Lingála environment, we started recording entries and analyzing what would be the most satisfactory lexicographic approach (lemmatization, contents…) for recording entries in a Lingála dictionary.

Preliminary conclusions

At this stage, we have localized TshwaneLex and Unitex in Lingála. Working in a target-language environment (Lingála) helps us to gain a global view of what we want to propose as an entry and how to present it.

To illustrate our work, we have created a bilingual (French and Lingála) chemistry lexicon for secondary school students of chemistry and a bilingual (French and Lingála) general lexicon for primary and secondary school students.

[1] De Schryver G-M 2008, A new way to lemmatize adjectives in a user-friendly Zulu-English Dictionary. Lexikos 18 (AFRILEX-reeks/series 18: 2008), pp. 63-91.

[2] Unitex 2.0 2008, User Manual, p. 11.

[3] De Schryver G-M 2006, Internationalisation, Localisation and Customisation Aspects of TshwaneLex. Lexikos 16 (AFRILEX-reeks/series 16: 2006), p. 223.