The conversion of ‘Histoloogiasõnastik’ to TBX

Siim Viiklaid
TBX (TermBase eXchange) is an XML-based standard for representing and exchanging terminological data between different computer environments. The objective of this thesis was to convert ‘Histoloogiasõnastik’ (a dictionary of histology compiled by Ülo Hussar) into a valid TBX document. The TBX format makes it easier to transform terminological data into other presentation forms, for instance an HTML glossary. A further objective was to generate cross-references between the entries of ‘Histoloogiasõnastik’, which makes the dictionary more convenient for the end user.

A TBX document is a termbase (terminological database). A termbase should be designed according to a particular model that allows it to be converted to different formats and prevents systematic errors while the database is being created. ‘Histoloogiasõnastik’ follows this model, which makes it possible to convert it to TBX. The original data are in TeX format; the entries are rather simple in structure but contain many variations and exceptions.

The conversion methodology was cyclic, consisting of four main stages:
• parsing the original files of the dictionary and outputting an XML representation of the data
• finding cross-references and forming the TBX structure, which yields the end product of the conversion
• transforming the TBX document into an HTML document, so that the result can be inspected easily for errors and for exceptions overlooked in the original data
• correcting mistakes in the conversion process and eliminating exceptions in the original data

The end result of the conversion is an XML document that conforms to the TBX specification and satisfies the main principles of termbase design. The converted dictionary will be published in Keeleveeb, a portal that, along with other linguistic resources, features technical dictionaries similar to ‘Histoloogiasõnastik’.
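The second stage described above (finding cross-references and forming the TBX structure) can be illustrated with a minimal Python sketch. It is not the code used in the thesis: the sample entries, the ID scheme, the function names and the whole-word matching rule are all assumptions made for illustration. The element names (`martif`, `termEntry`, `langSet`, `tig`, `term`, `descrip`, `ref`) follow the general shape of a TBX document, but the output omits the header and has not been validated against the TBX specification.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample entries (headword -> definition), NOT taken from the dictionary.
ENTRIES = {
    "cell": "basic structural unit of a tissue",
    "tissue": "group formed by more than one cell",
    "nucleus": "organelle found inside a cell",
}


def make_ids(entries):
    """Assign a stable ID to every headword (assumed scheme: e1, e2, ...)."""
    return {term: f"e{i}" for i, term in enumerate(sorted(entries), start=1)}


def find_cross_references(entries):
    """Map each headword to the other headwords that occur in its definition.

    Assumption: a cross-reference is a whole-word, case-insensitive match.
    Real data would also need handling of inflected forms.
    """
    refs = {}
    for term, definition in entries.items():
        refs[term] = [
            other
            for other in sorted(entries)
            if other != term
            and re.search(rf"\b{re.escape(other)}\b", definition, re.IGNORECASE)
        ]
    return refs


def build_tbx(entries, lang="en"):
    """Build a simplified TBX-like tree (no martifHeader, so not a valid TBX file)."""
    ids = make_ids(entries)
    refs = find_cross_references(entries)
    martif = ET.Element("martif", {"type": "TBX", "xml:lang": lang})
    body = ET.SubElement(ET.SubElement(martif, "text"), "body")
    for term in sorted(entries):
        entry = ET.SubElement(body, "termEntry", {"id": ids[term]})
        definition = ET.SubElement(entry, "descrip", {"type": "definition"})
        definition.text = entries[term]
        # One <ref> per cross-reference, pointing at the target entry's ID.
        for target in refs[term]:
            ref = ET.SubElement(
                entry, "ref", {"type": "crossReference", "target": ids[target]}
            )
            ref.text = target
        lang_set = ET.SubElement(entry, "langSet", {"xml:lang": lang})
        tig = ET.SubElement(lang_set, "tig")
        ET.SubElement(tig, "term").text = term
    return martif


if __name__ == "__main__":
    print(ET.tostring(build_tbx(ENTRIES), encoding="unicode"))
```

Because every cross-reference is stored as a target ID rather than as plain text, a later transformation (for example to HTML) could turn each `ref` into a hyperlink to the corresponding entry.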
Bachelor - Computer Science
Heiki-Jaan Kaalep
