The 16th Annual International AFRILEX Conference
UNAM, Windhoek, Namibia, 5-7 July 2011

[Abstract:] Mojela, V.M.: Purism and inadequacies: A case study of the effects of strict corpus-based dictionary writing in Sesotho sa Leboa

The Sesotho sa Leboa language currently has a corpus of about 6,8 million words (the UP corpus). This corpus is built from materials already recorded and published in the language. The major shortcoming of these published materials is that they are derived from only a few dialects of Sesotho sa Leboa, while the majority of the terminology does not form part of the written language. This explains why it would not be fair for Sesotho sa Leboa lexicographers to use the existing (insufficient) corpus as a reliable tool for compiling dictionaries in this language: almost half of the terminology would be left out of the record. The vocabulary might have been more than twice its present size had it not been disadvantaged by purism and by the inadequacies arising from the excessive exclusion policies of a stringent type of standardization.

Geographically, the standard Sesotho sa Leboa orthography (the written language) was built from the dialects spoken in the districts of Sekhukhune and Waterberg and in a section of the Capricorn district. The Sesotho sa Leboa dialects of the northern part of the Capricorn district, of the whole of the Mopani and Vhembe districts, and of the Mapulaneng district in Mpumalanga are not part of the written Sesotho sa Leboa language. As a result, terminology from these sidelined dialects does not form part of the present Northern Sotho (Sesotho sa Leboa) corpus of 6,8 million words, because these terms did not appear in the published materials from which the corpus was compiled.

The sidelined dialects include, inter alia, Selobedu, Sephalaborwa, Sekgaga (Maake & Mogoboya), Seroka, Setlokwa, Sehananwa and Sepulana. Words such as the following are not included in the written Northern Sotho language because their source dialects were sidelined; their inclusion would have increased the size of the Sesotho sa Leboa lexicon considerably:

Kheṱola (Selobedu & Seroka) ‘frog’, segwagwa (standard NS)

Khemake (Selobedu) ‘cat’, katse (standard NS)

Mokhope (Seroka, Selobedu) ‘marula beer’, morula (standard NS)

Lesalabu (Seroka, Selobedu) ‘watermelon’, legapu (standard NS)

Mphekwa (Selobedu, Seroka) ‘lizard’, mokgaditswa (standard NS)

Moṱanare (Selobedu, Seroka) ‘mopani tree’, mopani (standard NS)

Tsheṱa (Selobedu, Seroka) ‘greedy’, bojato (standard NS)

The standardization system in this language is still dominated by purism, selectiveness and destructive exclusion policies. Purism and selfish standardization arise because the most influential members of the language boards prefer to standardize their own dialects and to sideline all other dialects that are not represented on the standardizing committees. One of the major reasons why languages like English develop so quickly is that purism plays only a minimal role in them. It is interesting to note that English has lemmatized many lexical items from most South African indigenous languages, including slang and items from the sidelined or ‘stigmatized’ dialects of Northern Sotho, without fear of ‘contamination’. Dictionaries such as ‘A Dictionary of South African English on Historical Principles’ (OUP & DSAE) have lemmatized much of the South African indigenous-language terminology as loan words in English, and these terms are now part and parcel of the English corpus. The following are examples in this regard:

Moloi                          ‘witch’ or ‘wizard’ (1996:472)

Mampara                   ‘a fool’ or ‘fools’ (1996:472)

Mpimpi                      ‘an informer’ or ‘an evil collaborator’ (1996:481)

Potsotso                     ‘tight trousers worn by girls’ (1996:103)

Zola Budd                  ‘a type of minibus taxi’ (1996:808)

Significance for lexicographers

For the Sesotho sa Leboa lexicographer, relying on corpus-based lemmatization means omitting the bulk of the Northern Sotho terminology from the dictionaries, and so producing dictionaries that are not only one-sided but also inadequate for this language.

The main objectives of this research can be summarized as follows:

To show the inadequacies of corpus-based dictionary compilation in Sesotho sa Leboa, and to emphasize the importance of field research in bringing all the lexical items from the sidelined dialects into the Sesotho sa Leboa lexicon.


The Sesotho sa Leboa National Language Body, which replaced the former Northern Sotho Language Board as the standardizing authority for this language, should avoid and eradicate all forms of purism in the standardization process. The lexical items from all the Sesotho sa Leboa dialects should not only be incorporated into the vocabulary, but should also be standardized. Lexicographers, especially those working in Sesotho sa Leboa, should not rely solely on the existing corpus when compiling dictionaries, but should undertake intensive research and fieldwork to bring all the omitted lexical items on board, in order to have a full and accurate record of the vocabulary of the Northern Sotho language.


