Glossaries are files created and updated manually for use in OmegaT.
If an OmegaT project contains one or more glossaries, any terms in the glossary which are also found in the current segment will be automatically displayed in the Glossary viewer.
To use an existing glossary, simply place it in the /glossary
folder after creating the
project. OmegaT automatically detects glossary
files in this folder when a project is opened. Terms in the current
segment which OmegaT finds in the glossary
file(s) are displayed in the Glossary pane:
The word before the = sign is the source term, and its translation is (or are) the words after =. The vocabulary entry can have a comment added (see "transitive verb" for the second item). The glossary function only finds exact matches with the glossary entry (e.g. does not find inflected forms etc.). New terms can be added manually to the glossary file(s) during translation (for example in a text editor), but newly added terms will not be recognized until the project is reloaded.
The source term does not have to be a single-word item, as the next example shows:
The underlined item "new preview screenshot", consists of three words and can be found in the glossary pane as "nov predogled posnetka zaslona". Note that parts of the multi-term items ("preview" in the above example is also recognized on its own as "predogled") are recognized as well, but ranked lower.
Glossary files are simple plain text files containing three-column, tab-delimited lists with the source and target terms in the first and second columns respectively. The third column can be used for additional information. Glossary files can be either in system default encoding (and indicated by the extension .tab) or in UTF-8 (the extension .utf8). The Unicode encoding (UTF8) is preferred for obvious reasons.
Also supported is the CSV format. This format is the same as the tab separated one: source term, target term. Comment fields are separated by a comma ','. Strings can be enclosed by quotes ", which allows having a comma inside a string:
"This is a source term, which contains a comma","c'est un
terme, qui contient une virgule"
In addition to the plain text format, TBX format is also supported. TBX - Term Base eXchange (TBX) is the open, XML-based standard for exchanging structured terminological data that has been approved as an international standard by LISA and ISO. If you have an existing terminology handling system - MultiTerm for example - it is quite possible it offers the export of terminology data via TBX format.
The method here is foolproof, when followed in a reasonably careful fashion. You need OpenOffice.org Writer for it, so - if you haven't already done so - download and install OpenOffice.org. Launch OpenOffice.org and open a new text document or launch "OpenOffice.org Writer".
In the empty document, enter your glossary terms as follows: a source-language term, tab space, the target-language term , tab space, a comment or explanation for the item, and a Return. Tab space is the tabulator key on the left hand side of the keyboard. If you do not wish to add any comments, you can drop the second tab space. A "term" can be a single word or a whole phrase. On the second line, enter the second term and its translation.
When you have finished entering the terms, you will have two "columns" of terms, source-language terms on the left and their target-language terms on the right, and possibly a third column, containing you comments and explanations, The tab space (→ in the example below) and Enter (¶) characters can be made visible by clicking on the ¶ icon in the Writer's Standard writer bar. Here are a few lines of an English -German glossary)
word →Wort→das (-/e/s, Wörter/-e)¶
small house→Häuschen→das, (pl Häuschen)¶
dog →Hund→m, f Hündin ¶
horse→Pferd→n, m Hengst f Stute n Fohlen¶
Do NOT use OpenOffice.org's "columns" function to create columns: just separate each source and target-language term pair with a single tab space.
When you are finished with the entries, save the file as Unicode-encoded file as follows:
Select File > Save As
In the "File location" box, enter the name for your glossary file.
For "Filter", select "Text Encoded (.txt.)"
Make sure "Automatic file name extension" and "Edit filter settings" are unchecked.
Confirm with OK.
After creating an OmegaT project, copy or move this file into the project's \glossary folder. If the project is already open, reload it after copying the glossary file into the project. You can make changes to a glossary file while it is being used in a project. Glossary changes are detected approximately once every second and modifications load transparently in the background, so there's no need to reload the project after saving the new glossary file.
When a segment containing a source-text term is opened, the glossary pane will display glossary entries for those items in the source segment that can be found in the glossary (or glossaries - you can have more than one available, and they can be stored in subfolders of glossary as well).
Note: Of course there are other
ways and means to create a simple file with tab delimited entries, and
they are all simpler, a lot of them much simpler, than the above
suggestion. For instance, one can export the contents above as a
CSV
, instead of a UTF8
text
file. Note, however, that the above suggestion works for any target
system, be it Windows, OS X or Linux. Nothing speaks against using
Notepad++ on Windows or GEdit on Linux for instance: any text editor, that
can handle UTF8 and that can show white space (so that you do not miss the
required TAB characters) can be used.
The contents of glossary files are kept in memory and are loaded when the project is opened or reloaded. Updating a glossary file is thus rather simple:
keep the file open in your selected editor
When you come across a new term that you would like to add to your glossary, enter the new term, its translation and any comments you may have (ensuring you press tab between the fields) and save the file. The contents of the glossary pane will be updated accordingly.
Data exported from Trados MultiTerm can be used as
OmegaT glossaries without further modification,
provided they are given the file extension .tab
and
the source and target term fields are the first two fields
respectively.
If you export using the system option "Tab-delimited export", you will need to delete the first 5 columns (Seq. Nr, Date created etc). The newer versions of MultiTerm support exporting to TBX format.
Problem: No glossary terms are displayed - possible causes:
No glossary file found in the "glossary" folder.
The glossary file is empty.
The items are not separated with a TAB character.
The glossary file does not have the correct extension (.tab or .utf8).
There is no EXACT match between the glossary entry and the source text in your document - for instance plurals.
The glossary file does not have the correct encoding.
There are no terms in the current segment which match any terms in the glossary.
One or more of the above problems may have been fixed, but the project has not been reloaded.
Problem: In the glossary pane, some characters are not displayed properly
...but the same characters are displayed properly in the Editing pane: the extension and the file encoding do not match.