The 16th Annual International AFRILEX Conference
UNAM, Windhoek, Namibia, 5-7 July 2011

[Abstract:] Almind, Richard: Flexible Database Model for Multiple Dictionaries

The most important thing to remember about a database for lexicographical use is that it is a vessel for data. Underlying the database is a description of how data elements (objects such as fields and tables) are related to each other, often called a relationship diagram. In a lexicographical context, for instance, it describes that data element A (“definition”) is related to data element B (“lemma”) through the key n. The interrelated data are usually alphanumeric, which is one among many possible data types, but they could also be pure numbers or binary data such as images, sound bites, or video clips. Exactly what is stored in a database is for the editor to decide and the programmer to prepare. The database is not a dictionary in itself, and neither is the editor application that is used to maintain the data.
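The relation between “lemma” and “definition” through a key can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module; the table names, column names and the sample entry are assumptions made for the example and are not taken from the database described in this abstract.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one lemma can have many definitions.
# The column definition.lemma_id plays the role of "the key n".
conn.executescript("""
CREATE TABLE lemma (
    id   INTEGER PRIMARY KEY,
    form TEXT NOT NULL
);
CREATE TABLE definition (
    id       INTEGER PRIMARY KEY,
    lemma_id INTEGER NOT NULL REFERENCES lemma(id),
    text     TEXT NOT NULL
);
""")

# Illustrative sample data, invented for this sketch.
conn.execute("INSERT INTO lemma (id, form) VALUES (1, 'asset')")
conn.execute(
    "INSERT INTO definition (lemma_id, text) VALUES (?, ?)",
    (1, "A resource controlled by an entity."),
)

# The lemma and its definition only meet when a query joins them.
rows = conn.execute(
    "SELECT l.form, d.text "
    "FROM lemma AS l JOIN definition AS d ON d.lemma_id = l.id"
).fetchall()
print(rows)
```

The two tables hold nothing but data; an "article" exists only as the result of a query such as the join above.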

The underlying database for any modern dictionary is usually flexible enough to allow for most publishing methods. It would be wrong to constrain oneself to thinking in terms of printed dictionaries alone: electronic options are more natural to databases, and the restrictions of fixed media like print hinder an understanding of the true possibilities of electronic dictionaries. Whether the output is print or electronic, the main reason to use a database is the possibility of acting swiftly on new insights, be they political, social, linguistic or other. Wikipedia is the preferred standard against which many online information systems are measured, and its underlying method of updating data almost as it is generated in real life is key to understanding database-driven information.

Unfortunately, two very large obstacles keep traditional lexicographers from understanding what database-driven lexicography can lead to. The first is their self-imposed limitation to linguistics. There is nothing wrong with setting fixed limits in a given field and using those limits to explore a set of possibilities, but in this case one quickly exhausts the insights that can be gained by using linguistic phenomena as a case study for lexicography. Lexicographical work based on dynamic media like the internet can lead to tools that go far beyond the dissemination of linguistic data and reach much further, and much more naturally, into the information sciences than expected. The second obstacle is somewhat harder to dispel. For those used to thinking in fixed media like print, it is difficult to understand that lexicographical data as defined in a database are interchangeable building blocks, and that the original definition of a dictionary article loses its meaning once the user aspect comes into play. In other words, the data in the database have no function until they become visible to the user. Keeping a fixed article in mind and applying that image to the design of a database is a restriction best overcome quickly, since this type of limitation hinders a proper design and sometimes even makes it useless.

Using the Dictionaries of Accounting as an example, it will be shown that flexibly designed databases or data collections extend the lexicographer’s possibilities, letting a dictionary evolve from a traditional type, where all data is shown all the time, to a more modern approach, where a small set of data is shown when and as it is needed. This approach is aptly summarised as “less is more”.
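A minimal sketch of the “less is more” idea, under stated assumptions: the field names, the two user situations ("reception" and "production") and the sample entry are hypothetical and are not taken from the Dictionaries of Accounting. The point is only that the same stored data can yield different, smaller views.

```python
# One stored entry with every available field (illustrative data).
ENTRY = {
    "lemma": "asset",
    "definition": "A resource controlled by an entity.",
    "collocations": ["current asset", "fixed asset"],
    "synonyms": ["resource"],
}

# Hypothetical mapping from a user situation to the fields worth showing.
VIEWS = {
    "reception": ("lemma", "definition"),
    "production": ("lemma", "definition", "collocations", "synonyms"),
}


def render(entry, situation):
    """Return only the fields that the given situation calls for."""
    return {name: entry[name] for name in VIEWS[situation] if name in entry}


short_view = render(ENTRY, "reception")
full_view = render(ENTRY, "production")
print(short_view)
```

Adding a new view means adding one line to the mapping; the stored entry itself is left unchanged.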

The resulting dictionaries focus specifically on different user needs; these, however, are not the focus of the presentation. The main focus is on the design of the underlying database itself, which opens up new uses not originally intended.

The database behind the Dictionaries of Accounting is designed as a one-to-many relationship between languages, with English at the hub and the other languages relating to the English definitions of international accounting terms. This allows for a dictionary in any language pair where English is one of the languages, for instance English-Spanish and Danish-English, but not Danish-Spanish. This limitation is not a severe hindrance, and it is necessary for various reasons.
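The hub design can be sketched roughly as follows, again with Python's built-in sqlite3 module. The schema, the function names `build_db` and `dictionary`, the language codes and the sample terms are all assumptions made for illustration; the actual database may be organised quite differently. The restriction to pairs that include English is written into the sketch as an explicit rule, mirroring the limitation described above.

```python
import sqlite3

HUB = "en"  # English sits at the hub in this sketch


def build_db():
    """Create a tiny hub-and-spoke database with illustrative data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE concept (
        id            INTEGER PRIMARY KEY,
        en_term       TEXT NOT NULL,
        en_definition TEXT NOT NULL
    );
    CREATE TABLE equivalent (
        concept_id INTEGER NOT NULL REFERENCES concept(id),
        language   TEXT NOT NULL,
        term       TEXT NOT NULL
    );
    """)
    conn.execute(
        "INSERT INTO concept VALUES (1, 'asset', 'A resource controlled by an entity.')"
    )
    conn.executemany(
        "INSERT INTO equivalent VALUES (?, ?, ?)",
        [(1, "da", "aktiv"), (1, "es", "activo")],
    )
    return conn


def dictionary(conn, source, target):
    """Return (source term, target term) pairs; one side must be the hub."""
    if source == target or HUB not in (source, target):
        raise ValueError(f"unsupported language pair: {source}-{target}")
    other = target if source == HUB else source
    rows = conn.execute(
        "SELECT c.en_term, e.term "
        "FROM concept AS c JOIN equivalent AS e ON e.concept_id = c.id "
        "WHERE e.language = ? ORDER BY c.en_term",
        (other,),
    ).fetchall()
    if source == HUB:
        return rows
    # Swap the columns when English is the target language.
    return [(term, en_term) for en_term, term in rows]


conn = build_db()
print(dictionary(conn, "en", "es"))
print(dictionary(conn, "da", "en"))
```

In this sketch a request for Danish-Spanish raises a `ValueError`, because every equivalent is attached to an English concept rather than to another spoke language.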

However flawed the design might look at first glance, it is useful in regions with many languages and a shared body of law and customs. Since the database has been designed to be flexible, its lexicographic objects or “building blocks”, such as collocation tables and synonym/antonym constructs, are interchangeable. It would therefore be possible to use the framework and editing facilities to create, for instance, a medical dictionary with Afrikaans or English at the hub and any number of other languages attached to it. It might in fact be possible to create two parallel versions, one in English and one in Afrikaans, and keep them synchronised with relatively few resources. The same could be done with other areas of specialised language such as law, biology, geography, etc.
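As a rough illustration of reusing the same framework with a different hub, the sketch below describes two dictionary projects that share one set of building blocks. The class name `DictionaryProject`, the block names and the language codes are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryProject:
    """A dictionary family: one hub language, any number of spokes."""

    domain: str
    hub: str
    spokes: tuple
    # The same interchangeable building blocks are reused by every project.
    blocks: tuple = ("definition", "collocation", "synonym", "antonym")

    def language_pairs(self):
        """List every supported pair; each one includes the hub."""
        pairs = []
        for spoke in self.spokes:
            pairs.append((self.hub, spoke))
            pairs.append((spoke, self.hub))
        return pairs


# Illustrative projects: only the domain, hub and spokes differ.
accounting = DictionaryProject("accounting", "en", ("da", "es"))
medical = DictionaryProject("medicine", "af", ("en", "zu"))

print(medical.language_pairs())
```

Only the configuration changes between the two projects; the building blocks and the logic for deriving language pairs stay the same.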