OmegaT projects can have translation memory files - i.e. files with the extension tmx - in four different places:
The omegat folder contains the project_save.tmx file and possibly a number of backup TMX files. The project_save.tmx file contains all the segments that have been recorded in memory since you started the project. This file always exists in the project. Its contents are always sorted alphabetically by source segment.
The main project folder contains three tmx files: project_name-omegat.tmx, project_name-level1.tmx and project_name-level2.tmx (project_name being the name of your project).
The level1 file contains only textual information.
The level2 file encapsulates OmegaT-specific tags in correct tmx tags, so that the file can be used with its formatting information in a translation tool that supports tmx level 2 memories, or in OmegaT itself.
The OmegaT file includes OmegaT-specific formatting tags, so that the file can be used in other OmegaT projects.
These files are copies of the file project_save.tmx, i.e. of the project's main translation memory, excluding the so-called orphan segments. They carry appropriately changed names, so that their contents remain identifiable when used elsewhere, for instance in the tm subfolder of some other project (see below).
tm folder
The /tm/ folder can contain any number of ancillary translation memories, i.e. tmx files. Such files can be created in any of the three varieties indicated above. Note that other CAT tools can export (and import as well) tmx files, usually in all three forms. The best thing of course is to use OmegaT-specific TMX files (see above), so that the in-line formatting within the segment is retained.
The contents of translation memories in the tm subfolder serve to generate suggestions for the text(s) to be translated. Any text, already translated and stored in those files, will appear among the fuzzy matches, if it is sufficiently similar to the text currently being translated.
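As a rough illustration of what "sufficiently similar" means, the sketch below ranks stored source segments against the current segment with Python's difflib module. This is not OmegaT's own matching algorithm: the function name fuzzy_matches, the 0.6 threshold and the sample segments are assumptions made for this example only.

```python
from difflib import SequenceMatcher


def fuzzy_matches(segment, memory, threshold=0.6):
    """Return (score, source, target) tuples from memory, best match first.

    memory maps stored source segments to their translations.
    The threshold is an arbitrary value chosen for this illustration.
    """
    hits = []
    for source, target in memory.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if score >= threshold:
            hits.append((score, source, target))
    return sorted(hits, reverse=True)


# Hypothetical content of an ancillary TM in the tm subfolder.
memory = {
    "The file is saved automatically.": "Le fichier est enregistré automatiquement.",
    "Completely different sentence here.": "Une phrase complètement différente.",
}

for score, source, target in fuzzy_matches("The file is saved.", memory):
    print(f"{score:.2f}  {source} -> {target}")
```

Only the first stored segment is close enough to be offered as a match here; the unrelated one falls below the threshold and is left out.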
If the source segment in one of the ancillary TMs is identical to the text being translated, OmegaT acts as defined in the Editing Behavior dialogue window. For instance (if the default is accepted), the translation from the ancillary TM is accepted and prefixed with [fuzzy], so that the translator can review the translations at a later stage and check whether the segments tagged this way have been translated correctly (see the Editing behavior chapter).
If it is clear from the very start that translations in a given TM (or TMs) are all correct, one can put them into the tm/auto folder and avoid confirming a lot of [fuzzy] cases. This will effectively pre-translate the source text: all the segments in the source text for which translations can be found in those "auto" TMs will land in the main TM of the project without any user intervention.
Optionally, you can let OmegaT have an additional tmx file (OmegaT-style) anywhere you specify, containing all translatable segments of the project. See pseudo-translated memory below.
Note that all the translation memories are loaded into memory when the project is opened. Back-ups of the project translation memory are produced regularly (see next chapter), and project_save.tmx is also saved/updated when the project is closed or loaded again. This means, for instance, that you do not need to exit a project you are currently working on if you decide to add another ancillary TM to it: you simply reload the project, and the changes you have made will be included.
The locations of the various translation memories for a given project are user-defined (see Project dialog window in Instant start guide).
Depending on the situation, different strategies are thus possible, for instance:
several projects on the same subject: keep the project structure, and change source and target directories (source = source/order1, target = target/order1 etc.). Note that segments from order1 that are not present in order2 and other subsequent jobs will be tagged as orphan segments; however, they will still be useful for getting fuzzy matches.
several translators working on the same project: split the source files into source/Alice, source/Bob... and allocate them to team members (Alice, Bob ...). They can then create their own projects and deliver their own project_save.tmx when finished or when a given milestone has been reached. The project_save.tmx files are then collected and possible conflicts, for instance as regards terminology, get resolved. A new version of the master TM is then created, either to be put in team members' tm/auto subdirectories or to replace their project_save.tmx files. The team can also use the same subfolder structure for the target files. This allows them, for instance, to check at any moment whether the target version for the complete project is still OK.
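The collection step in the team scenario can be partly automated. The following sketch reads several project_save.tmx files and reports source segments that team members translated differently. It assumes a simplified TMX layout (one tuv per language in each tu, with the text inside seg); the function names and the language codes passed in are illustrative and not part of OmegaT.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_pairs(tmx_path, source_lang, target_lang):
    """Yield (source, target) text pairs from one TMX file."""
    for tu in ET.parse(tmx_path).getroot().iter("tu"):
        texts = {}
        for tuv in tu.findall("tuv"):
            # Newer TMX files use xml:lang; older ones use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                texts[lang] = "".join(seg.itertext())
        if source_lang.lower() in texts and target_lang.lower() in texts:
            yield texts[source_lang.lower()], texts[target_lang.lower()]


def find_conflicts(tmx_paths, source_lang, target_lang):
    """Map each source segment to its translations when they differ."""
    translations = defaultdict(set)
    for path in tmx_paths:
        for source, target in read_pairs(path, source_lang, target_lang):
            translations[source].add(target)
    return {s: sorted(t) for s, t in translations.items() if len(t) > 1}
```

A call such as find_conflicts(["alice/project_save.tmx", "bob/project_save.tmx"], "EN-US", "FR") would return only the segments that need a decision before the master TM is built; the paths shown are placeholders.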
As you translate your files, OmegaT stores your work continually in project_save.tmx in the project's /omegat subdirectory.
OmegaT also backs up the translation memory to project_save.tmx.YEARMMDDHHNN.bak in the same subfolder whenever a project is opened or reloaded. YEAR is the 4-digit year, MM the month, DD the day of the month, and HH and NN the hours and minutes when the previous translation memory was saved.
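The naming scheme lends itself to scripting. The sketch below builds and parses backup file names of the form just described; the helper names are made up for this illustration, and the pattern is taken from the description here rather than from OmegaT's source code.

```python
from datetime import datetime

PREFIX = "project_save.tmx."
SUFFIX = ".bak"
STAMP = "%Y%m%d%H%M"  # YEAR MM DD HH NN


def backup_name(saved_at):
    """Return the backup file name for a given save time."""
    return f"{PREFIX}{saved_at.strftime(STAMP)}{SUFFIX}"


def backup_time(name):
    """Return the time encoded in a backup file name, or None if it does not fit."""
    if not (name.startswith(PREFIX) and name.endswith(SUFFIX)):
        return None
    stamp = name[len(PREFIX):-len(SUFFIX)]
    try:
        return datetime.strptime(stamp, STAMP)
    except ValueError:
        return None


print(backup_name(datetime(2011, 3, 15, 14, 30)))
```

Because every field has a fixed width, sorting such names alphabetically also sorts them chronologically.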
If you believe you have lost translation data, follow this procedure:
Close the project
Rename the current project_save.tmx file (e.g. to project_save.tmx.temporary)
Select the backup translation memory that is most likely to contain the data you are looking for (e.g. the most recent one, or the last version from the day before)
Copy it to project_save.tmx
Open the project
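For users who prefer to script the recovery, the steps above could look roughly as follows. This is a minimal sketch, not an OmegaT feature: the function name is hypothetical, it always picks the most recent backup, and it must only be run while the project is closed.

```python
import shutil
from pathlib import Path


def restore_latest_backup(omegat_dir):
    """Replace project_save.tmx with the most recent backup and return that backup.

    Run this only while the project is closed in OmegaT.
    """
    omegat_dir = Path(omegat_dir)
    current = omegat_dir / "project_save.tmx"
    # The timestamp has a fixed width, so sorting by name sorts by date.
    backups = sorted(omegat_dir.glob("project_save.tmx.*.bak"))
    if not backups:
        raise FileNotFoundError(f"no backup files found in {omegat_dir}")
    if current.exists():
        # Keep the current file aside (an older .temporary file is overwritten).
        current.replace(omegat_dir / "project_save.tmx.temporary")
    shutil.copy2(backups[-1], current)
    return backups[-1]
```

If the most recent backup is not the right one, the same copy step can be repeated by hand with an older file, since the backups themselves are left untouched.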
tmx files contain translation units, made of a number of equivalent segments in several languages. A translation unit comprises at least two translation unit variants (tuv). Either can be used as the source or target.
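To make this structure concrete, the sketch below parses a minimal, hand-written TMX fragment with Python's standard library and lists the tuv entries of each translation unit. The sample content and the function name are invented for this illustration; real files carry more header information and attributes.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="EN-US" datatype="plaintext" segtype="sentence"
          adminlang="EN-US" creationtool="example" creationtoolversion="1"
          o-tmf="example"/>
  <body>
    <tu>
      <tuv xml:lang="EN-US"><seg>Hello world</seg></tuv>
      <tuv xml:lang="FR"><seg>Bonjour le monde</seg></tuv>
    </tu>
  </body>
</tmx>"""


def translation_units(tmx_text):
    """Return one {language code: segment text} dict per translation unit."""
    units = []
    for tu in ET.fromstring(tmx_text).iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            text = "".join(seg.itertext()) if seg is not None else ""
            unit[tuv.get(XML_LANG)] = text
        units.append(unit)
    return units


print(translation_units(SAMPLE_TMX))
```

Nothing in the unit itself marks one tuv as the source; that choice comes from the project settings.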
The settings in your project indicate which is the source and which the target language. OmegaT thus takes the tuv segments corresponding to the project's source and target language codes and uses them as the source and target segments respectively. OmegaT recognizes the language codes using the following two standard conventions:
2 letters (e.g. JA for Japanese), or
2- or 3-letter language code followed by the 2-letter country code (e.g. EN-US - See Appendix B, Languages - ISO 639 code list for a partial list of language and country codes).
If the project language codes and the tmx language codes fully match, the segments are loaded in memory. If the languages match but not the countries, the segments still get loaded. If neither the language code nor the country code matches, the segments will be ignored.
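The loading rule can be expressed as a small function. The sketch below is an interpretation of the rule as worded here, not OmegaT's implementation; the function names are hypothetical, and treating codes as case-insensitive is an assumption.

```python
def split_code(code):
    """Split 'EN-US' into ('en', 'us'); the country part may be empty."""
    language, _, country = code.partition("-")
    return language.lower(), country.lower()  # case-insensitivity is assumed


def match_kind(project_code, tmx_code):
    """Classify a tmx language code against the project's language code."""
    project_lang, project_country = split_code(project_code)
    tmx_lang, tmx_country = split_code(tmx_code)
    if project_lang != tmx_lang:
        return "ignored"
    if project_country == tmx_country:
        return "full match"      # segments are loaded
    return "language only"       # segments are still loaded


for tmx_code in ("EN-US", "EN-GB", "EN", "FR-FR"):
    print(tmx_code, "->", match_kind("EN-US", tmx_code))
```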
The file project_save.tmx
contains all the
segments that have been translated since you started the project. If you
modify the project segmentation or delete files from the source, some
matches may appear as orphan strings in
the Match Viewer: such matches refer to segments that do not exist any
more in the source documents, as they correspond to segments translated
and recorded before the modifications took place.
Initially, that is when the project is created, the main TM of the
project, project_save.tmx, is empty. This TM gradually
fills up during the translation. To speed up this process, existing
translations can be reused. If a given sentence has already been
translated once, and translated correctly, there is no need for it to be
retranslated. Translation memories may also contain reference
translations: multinational legislation, such as that of the European
Community, is a typical example.
When you create the target documents in an OmegaT project, the translation memory of the project is output in the form of three files in the root folder of your OmegaT project (see the above description). You can regard these three tmx files (-omegat.tmx, -level1.tmx and -level2.tmx) as an "export translation memory", i.e. as an export of your current project's content in bilingual form.
Should you wish to reuse a translation memory from a previous project (for example because the new project is similar to the previous project, or uses terminology which might have been used before), you can use these translation memories as "input translation memories", i.e. for import into your new project. In this case, place the translation memories you wish to use in the \tm or \tm\auto folder of your new project: in the former case you will get hits from these translation memories in the fuzzy matches viewer, and in the latter case these TMs will be used to pre-translate your source text.
By default, the \tm folder is below the project's root folder (e.g. ...\MyProject\tm), but you can choose a different folder in the project properties dialog if you wish. This is useful if you frequently use translation memories produced in the past, for example because they are on the same subject or for the same customer. In this case, a useful procedure would be:
Create a folder (a "repository folder") in a convenient location on your hard drive for the translation memories for a particular customer or subject.
Whenever you finish a project, copy one of the three "export" translation memory files from the root folder of the project to the repository folder.
When you begin a new project on the same subject or for the same customer, navigate to the repository folder in the project properties dialog and select it as the translation memory folder.
Note that all the tmx files in the /tm repository are parsed when the project is opened, so putting all the different TMs you may have on hand into this folder may unnecessarily slow OmegaT down. You may even consider removing those that are no longer required, once you have used their contents to fill up the project_save.tmx file.
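The second step of the repository procedure (copying an export TM into the repository folder at the end of a project) could be scripted along these lines. The function name is hypothetical, and the sketch assumes that project_name in the export file names equals the name of the project folder.

```python
import shutil
from pathlib import Path


def archive_export_tm(project_dir, repository_dir, variant="omegat"):
    """Copy <project_name>-<variant>.tmx from the project root into the repository.

    variant is one of "omegat", "level1" or "level2".
    Assumes project_name is the name of the project folder.
    """
    project_dir = Path(project_dir)
    repository_dir = Path(repository_dir)
    export = project_dir / f"{project_dir.name}-{variant}.tmx"
    if not export.is_file():
        raise FileNotFoundError(f"{export} not found; create the target documents first")
    repository_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(export, repository_dir))
```

Keeping one file per finished project in the repository makes it easy to remove individual memories later if loading becomes slow.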
OmegaT supports imported tmx versions 1.1-1.4b (both level 1 and level 2). This enables the translation memories produced by other tools to be read by OmegaT. However, OmegaT does not fully support imported level 2 tmx files (these store not only the translation, but also the formatting). Level 2 tmx files will still be imported and their textual content can be seen in OmegaT, but the quality of fuzzy matches will be somewhat lower.
OmegaT follows very strict procedures when loading translation memory (tmx) files. If an error is found in such a file, OmegaT will indicate the position within the defective file at which the error is located.
Some tools are known to produce invalid tmx files under certain conditions. If you wish to use such files as reference translations in OmegaT, they must be repaired, or OmegaT will report an error and fail to load them. Fixes are trivial operations and OmegaT assists troubleshooting with the related error message. You can ask the user group for advice if you have problems.
OmegaT exports version 1.4 TMX files (both level 1 and level 2). The level 2 export is not fully compliant with the level 2 standard, but is sufficiently close and will generate correct matches in other translation memory tools supporting TMX Level 2. If you only need textual information (and not formatting information), use the level 1 file that OmegaT has created.
Of interest for advanced users only!
Before segments get translated, you may wish to pre-process them or address them in some other way than is possible with OmegaT. For example, if you wish to create a pseudo-translation for testing purposes, OmegaT enables you to create an additional tmx file that contains all segments of the project. The translation in this tmx can be either
translation equals source (default)
translation segment is empty
The tmx file can be given any name you specify. A pseudo-translated memory can be generated with the following command line parameters:
java -jar omegat.jar --pseudotranslatetmx=<filename>
[--pseudotranslatetype=[equal|empty]]
Replace <filename> with the name of the file you wish to create, either absolute or relative to the working directory (the directory you start OmegaT from). The second argument, --pseudotranslatetype, is optional. Its value is either equal (default value, for source=target) or empty (target segment is empty). You can process the generated tmx with any tool you want. To reuse it in OmegaT, rename it to project_save.tmx and place it in the omegat folder of your project.
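If the pseudo-translation is part of an automated test setup, the command line can be assembled in a script. The sketch below only uses the two parameters named above; the function names and the jar default are assumptions, and depending on the OmegaT version further arguments may be needed that are not covered here.

```python
import subprocess


def pseudotranslate_command(tmx_file, kind=None, jar="omegat.jar"):
    """Build the argument list for the pseudo-translation command line."""
    if kind not in (None, "equal", "empty"):
        raise ValueError("kind must be 'equal' or 'empty'")
    command = ["java", "-jar", jar, f"--pseudotranslatetmx={tmx_file}"]
    if kind is not None:
        # Optional; OmegaT falls back to 'equal' when it is left out.
        command.append(f"--pseudotranslatetype={kind}")
    return command


def create_pseudotranslated_tmx(tmx_file, kind=None, cwd=None):
    """Run OmegaT; relative file names are resolved against cwd."""
    subprocess.run(pseudotranslate_command(tmx_file, kind), cwd=cwd, check=True)


print(" ".join(pseudotranslate_command("pseudo.tmx", "empty")))
```

Passing the arguments as a list avoids quoting problems when the file name contains spaces.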
Very early versions of OmegaT were capable of segmenting source files into paragraphs only and were inconsistent when numbering formatting tags in HTML and Open Document files. OmegaT 2.3 can detect and upgrade such tmx files on the fly to increase fuzzy matching quality and leverage your existing translation better, saving you the work of doing this manually.
A project's tmx will be upgraded only once, and will be written in upgraded form into project_save.tmx; legacy tmx files will be upgraded on the fly each time the project is loaded. Note that in some cases changes in file filters in OmegaT 2.3 may lead to totally different segmentation; in such rare cases, you will have to upgrade your translation manually.