8th International Conference of the

African Association for Lexicography

 

 

AFRILEX 2003

Bilingual Dictionaries

Programme & Abstracts

 

 


 

 

Dates:

7-9 July 2003

Host:

Department of Germanic & Romance Languages, University of Namibia, Windhoek, Namibia

Local Conference Organiser:

Mr. Herman Beyer

Abstract Reviewers:

Prof. Rufus H. Gouws, Prof. D.J. Prinsloo, Dr. Elsabé Taljard, Ms. Anneleen Van der Veken

Programme Committee:

Mr. Herman Beyer, Mr. Gilles-Maurice de Schryver, Prof. D.J. Prinsloo

 

 

edited by

Gilles-Maurice de Schryver

Organiser: AFRILEX

 

 

Copyright © 2003 by the African Association for Lexicography

ISBN 0-620-30795-1

Pretoria: (SF)2 Press

Cover Screenshots by David Joffe: “From TshwaneLex to Online Dictionary”
(david.joffe@africanlanguages.com | http://africanlanguages.com)

Cover Artwork by Giovanni Plozner
(info@giovanniplozner.com | http://www.giovanniplozner.com)

 

 

A FEW WORDS FROM THE CHAIRPERSON

 

 

Afrilex welcomes you to our 8th International Conference, which also marks our 8th year of existence. We are proud to be a member of the international –lex family and to present you with this Conference Abstract Booklet, once again meticulously compiled and edited by Gilles-Maurice de Schryver.

 

I wish to thank you for attending the Conference and for your loyal support for our Association and lexicography in Africa.

 

Afrilex greetings

 

D.J. Prinsloo

 

 

Table of Contents

 

 

Programme

 

Keynote papers

 

§        Ulrich Heid — The Handling of Collocations and Idiomatic Multiword Expressions: From Corpora to Dictionaries

§        Rufus H. Gouws — Outer Texts in Bilingual Dictionaries

§        Gwyneth Fox — Corpus Research and Lexicography

 

Parallel sessions

 

§        Thierry Afane Otsaga — Hybrid Dictionaries – The Future of Lexicography

§        Mariëtta Alberts — Lexicography and Terminology Training at University Level

§        Herman L. Beyer — Can We Quantify the Effects of Dictionary Use?

§        Emmanuel Chabata — Interviewer-Interviewee Interaction in Oral Interviews

§        Gilles-Maurice de Schryver — Concurrent Over- and Under-treatment in Dictionaries — The Woordeboek van die Afrikaanse Taal as a case in point

§        James D. Emejulu — Revisiting Equivalence in Bilingual Lexicography

§        James D. Emejulu, Yolande Nzang-Bie, Pierre Ondo-Mebiame & D. Franck Idiata — Le rôle des dictionnaires bilingues dans le développement des langues Gabonaises: Le cas du fang

§        Rachélle Gauton — Bilingual Dictionaries, the Lexicographer and the Translator

§        Wilfrid H.G. Haacke — A Khoekhoegowab Dictionary in the Making: Some Lexicographic Considerations in Retrospect

§        Samukele Hadebe — The Proposed Ndebele – Shona Dictionary: Prospects and Challenges

§        Kathy Kavanagh — English for New South African Bilingual Dictionaries

§        Langa Khumalo — From a General to an Advanced Ndebele Dictionary: An Outline

§        John M. Lubinda — The Incorporation and Handling of Metaphorical or Figurative Meaning in Bilingual Dictionaries

§        Matete Madiba, Lorna Mphahlele & Matlakala Kganyago — Capturing Cultural Glossaries. Case Study II: Medical Terms

§        Mandlenkosi Maphosa — The Users’ Perspectives on Isichazamazwi SeSiNdebele

§        Webster Mavhu — Bilingual versus Monolingual: A Comparative Analysis of Two Trends in Shona Lexicography

§        Gift Mheta — The Impact of Translation Activities on the Development of African Languages in Multilingual Societies: Shona – Ndebele – English Musical Terms Dictionary, a Case Study

§        Linkie Mohlala, Gilles-Maurice de Schryver & Rachélle Gauton — The Lexicographic Treatment of the Feminine/Augmentative Suffix ‑kazi in isiZulu

§        Nomalanga Mpofu — The ALRI Experience in the Compilation of a Dictionary of Biomedical Terms

§        Cornelias Ncube — Language Development or Language Corruption: A Case of Loanwords in Isichazamazwi SeSiNdebele

§        Salmina Nong & M.P. Mogodi — The Lexicographic Treatment of the Demonstrative Copulative in Sesotho sa Leboa – An Exercise in Multiple Cross-referencing

§        Thapelo J. Otlogetswe — Challenges to Representative and Balanced Corpora for African Lexicography

§        Annél Otto & Nerina Bosman — The User Perspective: Bible Reference Resources as Example

§        D.J. Prinsloo — The Lemmatisation of Adverbs in Northern Sotho

§        M.P. Rakgokong — Are the Setswana Mockery Words that Objectionable?

§        Mariza Steyn & Liezl Gouws — Woordeboek sonder Grense: A Typological and Communicative Bridge

§        P.H. Swanepoel — Dictionary Tailoring, SL Lexical Acquisition and Computer-Assisted Language Learning: The LINC Approach

§        Elsabé Taljard — On the Semi-automatic Extraction of Definitional Information: A Case Study for Northern Sotho

§        Dirk J. van Schalkwyk — Language Variation and the Lexicographer

 

Correspondence

 

 

Programme AFRILEX 2003

 


 

 

Keynote papers

 

The Handling of Collocations and Idiomatic Multiword Expressions: From Corpora to Dictionaries

 

Ulrich Heid

Institut für maschinelle Sprachverarbeitung – Computerlinguistik, Universität Stuttgart, Germany

 

Corpus query tools, such as WordSmith Tools or Qwick (Birmingham University), come with a function to extract collocations of a given word from a corpus. As a result, they provide lists of word pairs, often together with a measure indicating how strongly the two elements belong together. Years ago, a computational linguist told me in a discussion that, with these tool functions, the problem of collocations in corpus lexicography was solved. This talk is intended to show why this is not the case.

 

The above-mentioned collocation tools are based on statistical association measures that determine statistically significant co-occurrences of words. Examples of such association measures include the t-test (Church & Hanks 1992), the log-likelihood ratio test (Dunning 1993), the Mutual Information measure, etc. They are all used to reorder lists of collocation candidates, possibly extracted beforehand by means of corpus queries (e.g. for nouns and the verbs these nouns are objects of, as in “pay attention”, “ask a question”, etc.). Examples and a few well-known problems of the underlying statistics will be discussed; for example, Mutual Information unduly privileges low-frequency words, and log-likelihood seems to work well in particular for the upper half of the frequency spectrum, although it remains quite dependent on frequency.
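The measures named above can be illustrated with a minimal sketch. It uses the common textbook formulations of the t-score, pointwise Mutual Information and the log-likelihood ratio over a 2x2 contingency table; these are assumptions for illustration and not necessarily the exact variants implemented in WordSmith Tools or Qwick. The function name `association_scores` and its parameters are hypothetical.

```python
import math


def association_scores(pair_count, w1_count, w2_count, total):
    """Score one candidate word pair from raw corpus counts.

    pair_count: co-occurrences of word 1 and word 2
    w1_count, w2_count: occurrences of each word on its own
    total: number of candidate positions (e.g. bigrams) in the corpus
    Returns (t_score, mutual_information, log_likelihood_ratio).
    """
    if pair_count <= 0:
        raise ValueError("pair_count must be positive")

    # Co-occurrence count expected if the two words were independent.
    expected = w1_count * w2_count / total

    # t-score: difference between observed and expected counts,
    # scaled by an estimate of the standard deviation.
    t_score = (pair_count - expected) / math.sqrt(pair_count)

    # Pointwise Mutual Information: log ratio of observed to expected.
    mutual_information = math.log2(pair_count / expected)

    # Log-likelihood ratio over the full 2x2 contingency table.
    observed = [
        pair_count,                                  # w1 with w2
        w1_count - pair_count,                       # w1 without w2
        w2_count - pair_count,                       # w2 without w1
        total - w1_count - w2_count + pair_count,    # neither word
    ]
    rows = [w1_count, total - w1_count]
    cols = [w2_count, total - w2_count]
    expected_cells = [
        rows[0] * cols[0] / total,
        rows[0] * cols[1] / total,
        rows[1] * cols[0] / total,
        rows[1] * cols[1] / total,
    ]
    log_likelihood_ratio = 2 * sum(
        o * math.log(o / e)
        for o, e in zip(observed, expected_cells)
        if o > 0  # empty cells contribute nothing to the sum
    )
    return t_score, mutual_information, log_likelihood_ratio
```

With invented counts, a pair that occurs once and whose two words each occur only once in a million-word corpus receives a higher Mutual Information score than a pair that occurs 1,000 times, while the t-score and the log-likelihood ratio rank the frequent pair higher. This is the frequency sensitivity referred to above.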

         An analysis of some German and English data obtained in this way from corpora will show that the results of the statistical procedures, even though to some extent useful for lexicographic work, are far from homogeneous: they typically include a mixture of collocations and idiomatic word groups, as well as of trivial, lexicographically irrelevant, word combinations which may, for example, be artefacts of the corpus under analysis.

         We thus need additional linguistic criteria to further classify the material, but also, more importantly, to discover additional morphosyntactic, syntactic and semantic properties of the word combinations identified so far only in terms of the lexemes involved. It is not sufficient to know that “pay” and “attention” go together; we must also know that “pay attention” has no article, or that “former” and “time” typically come as a plural expression, often with a preposition: “in former times”. These aspects contribute to the partial idiomatisation of collocations, and a learner of a foreign language must memorise them along with the collocation. For German and English noun+verb combinations, an attempt will be made to provide a classified list of phenomena which need to be kept track of, beyond lexical co-occurrence, to arrive at a detailed description of the respective multiword items. The claim we would like to make is that collocations and idiomatic multiword expressions must be lexicographically described in as much detail as any single-word lemma; this means that information about the components of the collocation, as well as about the collocation as a whole, must be given with respect to morphosyntactic, syntactic (e.g. construction), semantic and pragmatic (e.g. style/register, frequency) properties. Furthermore, collocations tend to be combined, such that texts often include significant triples or quadruples of words (e.g. (pay+attention) + (careful+attention): pay careful attention). Along with the phenomena, a few suggestions for their corpus-based acquisition will be made (Heid & Zinsmeister 2003).

         In the third part, the question of the lexicographic data presentation will be discussed. Beyond the question of where to lemmatise collocations and idiomatic multiword groups, the detailed phenomena discussed above make the writing of an article somewhat more difficult, as they need to be kept track of. We look at this problem with bilingual (active) dictionaries in mind, printed as well as electronic. Inspiration for the article layout may come from experimental dictionaries such as Mel’čuk’s Explanatory Combinatorial Dictionaries, but also from printed dictionaries for general users, such as the Van Dale series of bilingual dictionaries in the Netherlands. Sample entries in different “styles” will be briefly discussed.

 


 

Outer Texts in Bilingual Dictionaries

 

Rufus H. Gouws

Department of Afrikaans and Dutch, University of Stellenbosch, South Africa

 

Metalexicographic research of recent years has been characterised by a growing interest in and focus on various aspects regarding the structure of dictionaries. In this regard both the mutual features and dictionary-specific features have received attention. Dictionary research no longer only includes attempts to describe and analyse the contents of dictionaries and the different data types on offer; the different structural components of dictionaries also fall within the scope of this field of research. As a carrier of text types a dictionary is not only regarded as a source of information displaying a variety of data types in the central list. A new emphasis shifts the attention away from a central list bias towards a more inclusive frame structure approach. This approach works with the assumption that the central list is complemented by front and back matter texts, constituting the outer texts of a dictionary.

Utilising the frame structure approach this paper focuses on the use of outer texts in bilingual dictionaries. The distinction between integrated and unintegrated outer texts is maintained and both these text types, their purpose and the role they play in devising the data distribution structure of a dictionary are examined. In using integrated outer texts it is shown that the data distribution does not have to focus exclusively on the default article in the central list although article stretches still accommodate the most typical data categories directed at the lemmata as guiding elements of articles and primary treatment units. It is shown how an interactive relation between the integrated outer texts and the central list can achieve an optimal realisation of the genuine purpose of a bilingual dictionary and can enhance the quality of dictionary consultation procedures.

As examples of unintegrated outer texts the use of alphabetically ordered equivalent registers, the listing of items representing the lemmata included in complex and synopsis articles as well as additional pedagogical data will be discussed. It is also shown how back matter texts can add a typological hybrid character to a dictionary by using alternative ordering systems, e.g. a thematic ordering as opposed to the alphabetical ordering of the central list. The way in which outer texts can ensure that a dictionary has a poly-accessible character that meets the needs of a user-driven project is also discussed. Looking at the user and usage situation, the role of dictionary functions in the planning of the outer texts should never be underestimated, and various aspects of the theory of lexicographic functions come to the fore in the discussion.

The successful use of outer texts demands a new look at the data distribution structure of bilingual dictionaries. Emphasis is yet again placed on the importance that each dictionary project should include a well-devised dictionary plan.

In this paper a dictionary is seen as a comprehensive container of knowledge and suggestions are made to improve the quality of the access structure to ensure an optimal retrieval of information by the intended target user.

 


 

Corpus Research and Lexicography

 

Gwyneth Fox

Macmillan Education: Publisher, Dictionaries

 

Work with corpora over the past 20 years has shown us a great deal about how we use English. In particular, there have been many revelations about the ways in which vocabulary patterns are surprisingly predictable, and these findings are now being reflected in learners’ dictionaries. This means that such dictionaries are probably the best record we have of the way in which English is now being used. Many examples will be given to justify this statement. But there is no reason why corpus research should not influence bilingual lexicography more than it presently seems to.

 

People are fascinated by language. And researchers have been studying it for centuries. But it is only in the past twenty years or so that we can be sure that the statements we make about the language are accurate. That is because the advent of computers has allowed us to build corpora, as large or as small as are appropriate for our particular needs, and analyse them for frequency, grammar, vocabulary, pragmatics, discourse functions, and so on. Perhaps the two areas where we have learned most are those of frequency and vocabulary.

         Although we always knew that some words were more frequent than others, we now know which words these are, and how often they are used and in what contexts. This must be important information for learners of a language: they need to know which words are worth expending effort on!

         We also realise that it is not enough just to look at words, however frequent they might be, in isolation. Collocation and colligation patterns stand out in the data, and force us to reassess the way in which we describe words, both in the classroom and in dictionaries. Collocation patterns range from the relatively fixed and difficult to decode, as in idioms and proverbs, through binomials and trinomials, through chunking, right down to those that are weak and perhaps not worth mentioning. The same is true of colligation. The phraseology of the language is much less random, much more predictable than we ever imagined.

         Another vocabulary ‘discovery’ is that of semantic prosody. Why is it that some words have attracted to them other words, either positive or negative, so that it is almost impossible to use them in any other way? Some of these words are obvious, others much less so. How could a learner know about their prosody if it were not pointed out to them?

         Corpus findings are now well known, and are expressed at their best in the new breed of learners’ dictionaries produced in the UK in the past fifteen or so years. This makes these dictionaries the best, most up-to-date, most accurate record of English as it is presently being used. Some bilingual dictionaries are now being compiled with the benefit of two, often parallel, corpora; but it seems to me that they are not yet as good (or as helpful) descriptions of the language as you find in monolingual learners’ dictionaries.

 


 

Parallel sessions

 

 

Hybrid Dictionaries – The Future of Lexicography

 

Thierry Afane Otsaga

Department of Afrikaans and Dutch, Stellenbosch, South Africa

 

Dictionaries have been compiled for several thousand years. The need for them arose when it became more difficult to read and understand religious texts. Dictionaries were therefore invented to assist in the understanding of these texts, which were written in a language that was no longer understood by the people interested in them. Nowadays, dictionaries are still produced because certain human linguistic and knowledge needs are observed in society, and they are compiled to satisfy these needs. Satisfying such needs is the main purpose of dictionaries.

         In order to always satisfy user needs, lexicographers have been trying to compile different types of dictionaries, according to different aspects: the users’ language competences, users’ general culture and knowledge, users’ respective subject fields, users’ translation needs, etc. In general, they have to take into account the objectives of users when these users are using dictionaries. In that regard, various types of dictionaries have been compiled, each to be used by a specific target user group. Indeed, some dictionaries are directed at the extra-linguistic features of the items treated (encyclopaedic dictionaries), while other dictionaries focus on the linguistic and pragmatic aspects (linguistic dictionaries). Some dictionaries focus on the origin, history and development of the treated language (diachronic dictionaries), while still others focus on the lexicon of a language at a specific time in its development (synchronic dictionaries). In the category of linguistic dictionaries, monolingual dictionaries can aim at a scholastic approach (school dictionaries), a learning approach (learners’ dictionaries), a normative approach (standard dictionaries), or a comprehensive approach (comprehensive dictionaries). Bilingual or multilingual dictionaries, in turn, can be compiled for a polyfunctional purpose (polyfunctional dictionaries); they can also be monoscopal or biscopal. All these various types of dictionaries were driven by the need to satisfy users’ needs.

The main objective of lexicographical works is to satisfy the needs of the users. When dealing with the methodology and even with the planning of a dictionary, one must first define the target user; otherwise the compilation will not be efficient. In every lexicographical work the main interest is in the dictionary user. In modern lexicography, the role and the place of the user are increasingly taken into account. The users are a powerful lobby and the publishing houses know it well: even if a dictionary is compiled according to a sound methodology, if a user does not find the information he/she needs, this dictionary will not be sold or used. Thus, the user appears to be the focal point of each element of the lexicographical process. Because user needs are increasing and because most people want knowledge regarding different aspects of life, it is becoming increasingly difficult to satisfy user needs in one specific type of dictionary. At the same time, users do not want to spend more time and money buying different dictionaries according to what they are looking for. The ideal solution for them would be to find most of the information they need in one single dictionary. Admittedly, it is not possible to satisfy all user needs in one dictionary, even in a multi-volume dictionary. Yet the lexicographer must try to come as close as possible to satisfying user needs. For that reason, the only solution could be the compilation of hybrid dictionaries. In fact, in modern-day lexicography hybrid dictionaries will be the solution of the future that will allow lexicographers to give the users what they are looking for in a dictionary. In that regard, some dictionaries will not have one specific purpose, but could include two, three, four, and even five functions. A bilingual dictionary, for instance, will not only give translation equivalents of lemmas; it will also give paraphrases of meaning, allowing the users to utilise the same dictionary not only to solve their translation problems but also to improve their knowledge of the same language. The main purpose of this paper is to show that, as a result of new and increasing user needs, the best way forward for future lexicography will be the compilation of hybrid dictionaries. Dictionaries focusing on one unique and specific aspect will no longer satisfy a public that needs knowledge about various aspects and domains.

 


 

Lexicography and Terminology Training at University Level

 

Mariëtta Alberts

Manager: Lexicography and Terminology Development, PanSALB, South Africa

 

The multilingual dispensation creates job opportunities for language practitioners. These language practitioners need training in various aspects regarding the language practice since lexicography, terminography, translation and editing (to name but a few) are practices that need highly skilled and knowledgeable practitioners.

Several of the focus areas of the Pan South African Language Board (PanSALB) concentrate to a certain extent on language development, such as terminology development, lexicography or aspects like translation and interpreting services. PanSALB is aware that all these language practices need skilled and highly trained personnel.

The Lexicography and Terminology Development (L&TD) focus area deals with the eleven National Lexicography Units (NLUs) and one national terminology office. The eleven national lexicography units were established and each is situated at a tertiary institution in the geolinguistic area where most of the mother-tongue speakers of the specific language are found. Unfortunately, there are only a few trained lexicographers available to work at these units. The only national terminology office in the country, the Terminology Coordination Section (TCS) is part of the National Language Service (NLS), Department of Arts and Culture (DAC). The terminologists receive in-house training on terminological and terminographical principles and practice. It is of the utmost importance to train language practitioners and students to be able to compile general as well as technical dictionaries for communication purposes.

The value of lexicography and terminology training cannot be stressed enough. The need might even be greater in South Africa than in other countries given the multilingual clause in the Constitution that provides for eleven official South African languages. Multilingual general as well as technical dictionaries are needed for proper communication between linguistic communities. Presently there are very few trained lexicographers and terminologists, especially in the African languages. Language practitioners, who are going to work on lexicographical or terminographical projects in future, need training as soon as possible.

 

This paper addresses the current situation regarding lexicography and terminology training. Suggestions are made regarding the utilisation of Schools of Languages as training venues for lexicography and terminology courses. The benefits for the Schools of Languages are spelled out. The value to other departments and faculties at the given university, the benefit to other students at other universities in the country and worldwide, and the benefit to language offices or language units receive attention. The process as described would train students in the theory, principles and practice of lexicography and terminology. It would be to the advantage of the NLUs as well as the TCS and the language units still to be established to appoint trained personnel rather than to devote time to in-house training. Production of general dictionaries as well as various technical dictionaries would show progress.

The various tertiary institutions such as the universities and technikons would benefit because they would train students and there would be positive and worthwhile outcomes.

The Human Language Technology virtual network would benefit by receiving multilingual general words and multilingual, polythematic terms into its database for dissemination to linguistic communities.

The language community would benefit since they would have words and terms available for better communication. Minority languages would be developed to become functional languages in the higher echelons of science and technology. Finally, the South African languages would be available as functional world languages on the Internet.

 


 

Can We Quantify the Effects of Dictionary Use?

 

Herman L. Beyer

Department of Germanic & Romance Languages, University of Namibia, Windhoek, Namibia

 

This paper aims to give an overview of the empirical research into the possibility of quantifying the effects of dictionary use among school learners, which has been conducted as a pilot study at the University of Namibia. The initial processes and results are explained, which provides insight into how the project may be amended to continue meaningfully.

 

The first data for this project were captured in 1997, while the researcher was a language teacher in Swakopmund, employed by the Ministry of Basic Education and Culture of Namibia. The aim was to test the working hypothesis that the use of dictionaries by school learners results in improved linguistic performance. One linguistic skill, that of spelling, was chosen for the experiment. The respondents comprised two classes of Grade 11 learners who took Afrikaans as a first language. One class group was labelled the test group, the other the control group. Both groups were given a series of four unannounced spelling tests, the intervals ranging from three days to as much as two months. Each test consisted of the same 25 items, chosen on the basis of the potential spelling difficulties they might pose for learners. The learners were not informed that the test would be repeated. They were, however, on each occasion advised that the tests did not contribute to their continuous assessment mark and were not designed to measure any aspect of intelligence. By doing this, it was hoped that conditions resembling normal class conditions as closely as possible could be created.

         The first spelling test was written by both groups under similar conditions: normal test conditions without the benefit of a dictionary.

During the second test each member of the test group was provided with a dictionary on his/her desk. The respondents were given the freedom to look up any item in the dictionary to make sure of its spelling, provided that they would indicate dictionary use. This would enable the researcher to identify those items that a particular respondent chose to look up. The control group wrote the second test under conditions identical to those during the first test, i.e. without the benefit of a dictionary. Unlike the test group, however, the control group members were given immediate feedback on their tests by having them marked after exchanging the scripts among the respondents (i.e. a respondent would not mark his/her own test). Respondents were instructed to clearly indicate mistakes on their fellow respondents’ scripts and to write down the correct form in full each time. After the respondents received their tests back, they were given about 30 seconds to take a look at the results, including the corrections made by their fellow respondents. The test group was given no feedback of any nature on their tests.

         The third and fourth tests were conducted under the same conditions as the first, i.e. normal test conditions without the benefit of a dictionary.

 

This experimental procedure provided the researcher with extensive data, from which it is hoped the following questions could be approached with quantitative support:

·           Does a respondent who looks up a word for spelling purposes remember its correct spelling later? If yes, for how long? If no, are any consistencies identifiable that may allow us insight into the reasons for the perceived failure to learn and perhaps into spelling rehabilitation?

·           Is a respondent who looks up a word for spelling purposes more likely to remember its correct spelling than a respondent who does not utilise a dictionary but who is provided with rehabilitative feedback in the ‘traditional’ way? If yes, what is the role of the dictionary in this case? If no, why has learning seemingly not taken place?

The above questions underlie the basic research question that this project aims to address: Does dictionary use result in quantifiable improved linguistic performance?
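One way such questions could be approached quantitatively is sketched below. The abstract does not describe the actual analysis, so everything here is an assumption for illustration: the function name `retention_rate` is hypothetical, each respondent's results are represented as sets of item labels, and the sample items are invented rather than taken from the study.

```python
def retention_rate(wrong_in_first, treated_in_second, correct_later):
    """Share of items that were misspelled in the first test, then
    'treated' in the second test (looked up in a dictionary by a test
    group member, or corrected by a peer for a control group member),
    and spelled correctly in a later test.

    All three arguments are sets of item labels for one respondent.
    Returns None when the respondent has no such items.
    """
    targets = wrong_in_first & treated_in_second
    if not targets:
        return None
    return len(targets & correct_later) / len(targets)


# Hypothetical respondent: three items wrong in test 1, two of them
# looked up in test 2, one of those spelled correctly in test 3.
rate = retention_rate(
    wrong_in_first={"item03", "item11", "item19"},
    treated_in_second={"item03", "item11"},
    correct_later={"item03", "item07"},
)
```

Computing this rate per respondent for the third and fourth tests, and then comparing the two groups, would give one possible measure of whether a dictionary look-up is remembered for longer than peer correction.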

 


 

Interviewer-Interviewee Interaction in Oral Interviews

 

Emmanuel Chabata

African Languages Research Institute, University of Zimbabwe, Harare, Zimbabwe

 

The intended presentation will be an analysis of language used by an interviewer and that of the interviewee during an oral interview. It will focus on the language of penetration by the interviewer, that is, the language somebody usually uses when he/she approaches a person for an interview in search of specific information. It will also look at the respondent’s language when he/she responds to different types of questions as well as that used by the people concerned in their subsequent conversation. The presentation will also look at the factors that may shape the respondent’s answers as well as the interviewer’s follow-up questions. It will furthermore look at the element of ‘misfiring’ by either of the parties and its consequences.

 

The intended presentation will focus on the strategies that an interviewer may use when he/she tries to get information from a respondent. In doing this, the presenter will be guided by the principle that each interview and each interview setting is different and needs different skills and also that each situation involves expectations and assumptions. He/She will also be guided by the assumption that whenever the sender of information, in this case the interviewer, sends a question, he/she hopes to be understood by the receiver/interviewee. However, the message may or may not go through. To see whether it has gone through or not, one has to assess the feedback that the sender gets. The presenter will also look at the interviewer’s challenges, some of which include the respondent’s attitude towards the interviewer or the subject under discussion, the environment of the interview, misfiring by the respondent, as well as lack of knowledge on the part of the interviewee.

         The presentation will also focus on what an interviewer needs to do before he/she sets out to conduct an interview. For example, the interviewer has to be thoroughly prepared. Being prepared means that one has to formulate one’s questions before starting an interview. One has to come up with questions that can prompt the respondent to say what he/she knows about the subject under discussion. For example, the questions have to be structured in a way that is most effective and friendly. Preparedness also entails getting the right person to interview. Depending on the purpose or subject of the interview, the interviewer has to get somebody who can supply the desired information. Besides knowing the subject, the person has to be willing to spare time for answering questions. This is an important dimension, especially given the fact that most people are usually busy. Thus, one may expect to obtain better results if one interviews a person who is prepared to give out information. The presenter will look at the common strategies that interviewers usually use to cultivate interest in the respondent.

         The presenter will also devote some time to the qualifications one should possess as a good interviewer. For one to be effective in getting information, one has to have the skill to ask questions. The assumption to be adopted here is that a skilled interviewer is better than one who is not. But this assumption also triggers a few questions. For example, how does one become skilled? Is it through training or not? How does personal character determine the end result?

         In trying to understand exactly what goes on between an interviewer and an interviewee, an analysis of their respective body languages will be part of the investigation. Here the assumption is that verbal communication should match what is implied by body language. This assumption rests on the fact that verbal and non-verbal messages are intertwined, with the non-verbal symbols usually complementing the verbal ones. However, the analyses will not be blind to the fact that non-verbal symbols may sometimes substitute for verbal ones, and that non-verbal symbols may be inconsistent with verbal ones. Body language is considered important in oral interviews because it has a direct impact on what either of the persons involved will say after observing the gesture(s).

 

The intended presentation was inspired by the writer’s experiences as an oral interviewer during data collection for the Shona linguistic corpus. As a result of this, some of the illustrative examples to be used in the presentation will be drawn from the oral Shona corpus, that is, from audiocassettes that were recorded during the mentioned exercise. Other examples will come from general observation, as well as from analyses of what one usually sees in television interviews.

 


 

Concurrent Over- and Under-treatment in Dictionaries — The Woordeboek van die Afrikaanse Taal as a case in point

 

Gilles-Maurice de Schryver

Department of African Languages and Cultures, Ghent University, Ghent, Belgium

Department of African Languages, University of Pretoria, Pretoria, South Africa

 

In Prinsloo & De Schryver (2002) a so-called multidimensional lexicographic Ruler was introduced. With this powerful instrument, measurements and predictions can be made on various macro- and microstructural dictionary levels. Three levels have received thorough treatment so far, viz. the relative size of each alphabetical stretch, the corresponding number of lemma signs, and compilation-time aspects. In this paper the interplay between these levels is studied, with a focus on ‘moving’ average article length and the correlated aspects of inclusion versus omission of lemma signs.

In its most basic form, a Ruler is simply an instrument to guide the relative alphabetical breakdown in semasiological dictionaries. Each alphabetical category is assigned a certain percentage, reflecting the relative size of that category. Different languages, and even different types of dictionaries for a specific language, have different Rulers. The Rulers themselves are built from statistics derived from electronic corpora, as well as from existing dictionary data. Like the physical rulers with which one measures, they can be made as fine-grained as one wishes, simply by breaking down the alphabetical categories into smaller sections. And like the human rulers who govern us, a multidimensional lexicographic Ruler can be called in to manage a project. To date, general-language Rulers have been designed for isiNdebele (De Schryver 2002), Afrikaans (Prinsloo & De Schryver 2003a), and Sesotho sa Leboa (Prinsloo & De Schryver 2003b), as well as for Tshivenda, Xitsonga, Setswana and Sesotho.
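The basic idea of assigning each alphabetical category a percentage can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `build_ruler` and the ten-word sample list are hypothetical, and the sketch counts initial letters in a single word list, whereas the Rulers described above draw on corpus statistics as well as existing dictionary data.

```python
from collections import Counter


def build_ruler(headwords):
    """Return the percentage share of each initial letter in a word list.

    Hypothetical helper: a simplified stand-in for the statistics from
    which a real Ruler is built.
    """
    counts = Counter(word[0].upper() for word in headwords if word)
    total = sum(counts.values())
    return {letter: 100.0 * n / total for letter, n in sorted(counts.items())}


# Hypothetical sample, far too small for real use.
sample = ["aap", "appel", "boom", "boek", "brood",
          "kat", "kind", "lig", "oog", "os"]
ruler = build_ruler(sample)

for letter, share in ruler.items():
    print(f"{letter}: {share:.1f}%")
```

A finer-grained Ruler would follow from keying on the first two or three letters instead of the first one.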

 

During the presentation it will be shown that the very same Ruler for a specific language can now also be used with regard to average article length. The value of this new dimension is well illustrated by an analysis of the huge multi-volume, overall-descriptive Woordeboek van die Afrikaanse Taal (WAT), in compilation for the past three-quarters of a century and published up to the letter O (volume XI). Comparing WAT with a so-called ‘Afrikaans AO-Ruler’ immediately reveals extreme inconsistencies with regard to average article length. For the letters A and B, for instance, it is clear that both the number of pages and the number of lemma signs are heavily under-treated in WAT. The under-treatment in terms of space allocation, however, is much more severe, which results in a very low average article length. Up to the letter J, the relative allocation of space is always smaller than the relative allocation of lemma signs. From K onwards this pattern is suddenly reversed, and it remains so up to O. Throughout K, both space allocation and number of lemma signs are extremely heavily over-treated compared to the AO-Ruler. It should not come as a surprise that, after having spent almost 30 years on the compilation of K, the editors at WAT decided to drastically reconsider their compilation strategies, and entered a ‘new’ era (cf. Botha 1994: 423). Page-wise the compilers indeed moved closer to the AO-Ruler, with L and M slightly above and N and O below the AO-Ruler. As far as the number of lemma signs is concerned, however, these have been consistently under-treated, with O an all-time low.
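The link between space allocation, number of lemma signs and average article length can be made explicit with a small Python sketch. If a letter receives a share p of all pages and a share l of all lemma signs, its average article length relative to the dictionary-wide average is p / l, so a page share below the lemma share implies shorter-than-average articles. All names and figures below are hypothetical and chosen only to mimic the patterns described; they are not WAT data, and the function `compare_with_ruler` is an illustrative assumption rather than the authors' procedure.

```python
def shares(counts):
    """Convert absolute counts per letter into percentage shares."""
    total = sum(counts.values())
    return {letter: 100.0 * n / total for letter, n in counts.items()}


def compare_with_ruler(pages, lemmas, ruler):
    """Compare page and lemma-sign shares per letter with a Ruler.

    Negative deviations mean under-treatment, positive ones over-treatment.
    rel_article_length is the letter's average article length divided by
    the dictionary-wide average (page share / lemma share).
    """
    page_share = shares(pages)
    lemma_share = shares(lemmas)
    return {
        letter: {
            "page_dev": page_share[letter] - ruler[letter],
            "lemma_dev": lemma_share[letter] - ruler[letter],
            "rel_article_length": page_share[letter] / lemma_share[letter],
        }
        for letter in ruler
    }


# Hypothetical three-letter dictionary; the Ruler values are percentages.
ruler = {"A": 40.0, "K": 35.0, "O": 25.0}
pages = {"A": 200, "K": 600, "O": 200}
lemmas = {"A": 3000, "K": 5000, "O": 2000}

report = compare_with_ruler(pages, lemmas, ruler)
for letter, row in report.items():
    print(letter, row)
```

In this toy data, A is under-treated on both counts but more severely in pages, which yields a below-average article length, while K is over-treated on both counts with an above-average article length.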

         Although everyone will agree that the compilation of K was unfortunate for WAT, a new negative trend might have started with the completion of L and then M, where one observes a concurrent over- and under-treatment, in terms of space allocation and number of articles respectively. One should guard against the temptation to move ever faster through the alphabet, as seems to be the case in the last volume, where space allocation is now also under-treated, the number of articles even more so, yet where this is masked by an ever-increasing average article length.

         In order to substantiate the latter claim, an in-depth comparison between WAT and the desktop Verklarende Handwoordeboek van die Afrikaanse Taal (HAT) will be presented for the category O. Given that the entire HAT is smaller than the single category O in WAT, it is logical to assume that every single lemma sign in HAT should in principle also be entered in WAT. Upon comparison, however, one has to conclude that as many as 499 o-initial lemma signs from HAT have not been lemmatised in WAT. Just one of these 499 has been treated as a sub-lemma in WAT, 40 can only be found as untreated sub-lemmas and 175 as untreated run-ons, while 211 have not been lemmatised and do not occur anywhere in the WAT text either. The remaining 72 have not been lemmatised in WAT – either as lemmas, as sub-lemmas or as run-ons – despite the fact that those very same items are used throughout the WAT text itself. Especially problematic are those missing items that are not only highly frequent in a 10-million-word Afrikaans corpus, but are moreover cross-referred to from other items in WAT. Numerous examples of such cases will be discussed.
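The comparison of lemma lists described above amounts to a set difference, and the reported breakdown can be checked arithmetically. The Python sketch below uses the five counts given in the text; the function `missing_lemmas`, the category labels and the tiny word lists are hypothetical and serve only as illustration.

```python
def missing_lemmas(desk_lemmas, reference_lemmas):
    """Return lemma signs in the desk dictionary that are absent from
    the reference dictionary's lemma list, in alphabetical order."""
    return sorted(set(desk_lemmas) - set(reference_lemmas))


# Counts as reported in the text; the labels are paraphrases.
breakdown = {
    "treated as sub-lemma": 1,
    "untreated sub-lemma": 40,
    "untreated run-on": 175,
    "absent from the text altogether": 211,
    "used in the text but not entered": 72,
}
total = sum(breakdown.values())
print(total)  # 1 + 40 + 175 + 211 + 72 = 499

# Hypothetical word lists, for illustration only.
desk = ["oog", "oor", "os"]
reference = ["oog"]
print(missing_lemmas(desk, reference))
```

The five categories add up to the 499 o-initial lemma signs mentioned, so the breakdown is internally consistent.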

Monitoring the compilation of especially a multi-volume dictionary project with an average article length Ruler is crucial if one is to avoid such major inconsistencies. A concurrent over- and under-treatment in t