Chapter 7. File Filters

1. File filters dialog
2. Filter options
3. Edit filter dialog
3.1. Source file type, filename pattern
3.2. Source and Target file encoding
3.3. Target filename

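OmegaT features highly customizable file filters. File filters are pieces of code capable of reading documents in a specific file format, extracting their translatable text, and writing the translated documents, optionally under a different filename.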

Most users will find the default file filter options sufficient. If this is not the case, open the main dialog by selecting Options → File Filters... from the main menu.

Warning! Should you change filter options whilst a project is open, you must reload the project in order for the changes to take effect.

1. File filters dialog

This dialog lists available file filters. Should you wish not to use OmegaT to translate files of a certain type, you can turn off the corresponding filter by unticking the check box beside its name. OmegaT will then omit the appropriate files while loading projects, and will copy them unmodified when creating target documents. When you wish to use the filter again, just tick the check box. Click Defaults to reset the file filters to the default settings. To edit which files in which encodings the filter is to process, select the filter from the list and click Edit.

2. Filter options

Six filters (Text files, PO files, XHTML files, Microsoft Office Open XML files, HTML and XHTML files, and OpenDocument/OpenOffice.org files) have one or more specific options. To modify the options, select the filter from the list and click Options. The available options are:

Text files

  • Paragraph segmentation on line breaks, empty lines or never: if sentence segmentation rules are active, the text will further be segmented according to the option selected here.
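
The three paragraph segmentation modes can be illustrated with a short Python sketch. This is not OmegaT's code: the function name `split_paragraphs` and the mode names are hypothetical, and dropping blank paragraphs is an assumption made to keep the example simple.

```python
import re

def split_paragraphs(text, mode):
    """Split plain text into paragraphs (illustrative only, not OmegaT's code).

    mode is a hypothetical name for one of the three choices:
      "line_breaks" - every line break starts a new paragraph
      "empty_lines" - only blank lines separate paragraphs
      "never"       - the whole file is a single paragraph
    """
    if mode == "line_breaks":
        parts = text.split("\n")
    elif mode == "empty_lines":
        # One or more blank (or whitespace-only) lines act as a separator.
        parts = re.split(r"\n\s*\n", text)
    elif mode == "never":
        parts = [text]
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Assumption: paragraphs containing only whitespace are dropped.
    return [p for p in parts if p.strip()]

sample = "First line\nSecond line\n\nThird line"
line_break_paragraphs = split_paragraphs(sample, "line_breaks")
empty_line_paragraphs = split_paragraphs(sample, "empty_lines")
```

With sentence segmentation active, each paragraph produced this way would then be split further into sentences.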

PO files

  • Allow blank translations in the target file: If on, when a PO segment (which may be a whole paragraph) is not translated, the translation will be empty in the target file. Technically, msgstr will be empty. As this is the standard behavior for PO files, it is on by default. If the option is off, the source text will be copied to the target segment.

XHTML Files

  • Translate the following attributes: the selected attributes will appear as segments in the Editor window.

  • Start a new paragraph on: the <br> HTML tag will constitute a paragraph for segmentation purposes.

  • Skip text matching regular expression: The text matching the regular expression will be skipped.

  • Do not translate the content attribute of meta-tags ... : the attribute key-value pairs, separated by commas, will be left untranslated.

Microsoft Office Open XML files

You can select which elements are to be translated. They will appear as separate segments in the translation.

  • Word: Non-visible Instruction Text, Comments, Footnotes, Endnotes, Footers

  • Excel: Comments, Sheet names

  • PowerPoint: Slide Comments, Slide Masters, Slide Layouts

  • Global: Charts, Diagrams, Drawings, WordArt

  • Other Options:

    • Aggregate tags: If checked, tags without translatable text between them will be aggregated into single tags.

    • Preserve spaces for all tags: If checked, "white space" (i.e., spaces and newlines) will be preserved, even if the document does not technically request it.

HTML and XHTML files

  • Add or rewrite encoding declaration in HTML and XHTML files

    Always (default), Only if (X)HTML file has a header, Only if (X)HTML file has an encoding declaration, Never

  • Translate the following attributes: the selected attributes will appear as segments in the Editor window.

  • Start a new paragraph on: the <br> HTML tag will constitute a paragraph for segmentation purposes.

  • Skip text matching regular expression: The text matching the regular expression will be skipped.

  • Do not translate the content attribute of meta-tags ... : the attribute key-value pairs, separated by commas, will be left untranslated.

OpenDocument/OpenOffice.org files

  • You can select which of the following items are to be translated:

    Index entries, Bookmarks, Bookmark references, Notes, Comments, Presentation notes, Links (URL), Sheet names

3. Edit filter dialog

This dialog enables you to set up the source filename patterns of files to be processed by the filter, customize the filenames of translated files, and select which encodings should be used for loading the file and saving its translated counterpart. To modify a file filter pattern, either modify the fields directly or click Edit. To add a new file filter pattern, click Add. The same dialog is used to add a pattern or to edit a particular pattern. The dialog is useful because it includes a special target filename pattern editor with which you can customize the names of output files.

3.1. Source file type, filename pattern

When OmegaT encounters a file in its source folder, it attempts to select the filter based upon the file's extension. More precisely, OmegaT attempts to match each filter's source filename patterns against the filename. For example, the pattern *.xhtml matches any file with the .xhtml extension. If an appropriate filter is found, the file is assigned to it for processing. For example, by default, the XHTML filter is used for processing files with the .xhtml extension. You can change or add filename patterns for the files to be handled by each filter. Source filename patterns use wildcard characters similar to those used in Searches. The '*' character matches zero or more characters. The '?' character matches exactly one character. All other characters represent themselves. For example, if you wish the text filter to handle readme files (readme, read.me, and readme.txt), you should use the pattern read*.
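
The wildcard rules described above can be sketched in a few lines of Python. This is an illustration only, not OmegaT's implementation: the function names `pattern_to_regex` and `matches_pattern` are hypothetical, and matching the whole filename case-sensitively is an assumption.

```python
import re

def pattern_to_regex(pattern):
    """Translate a filename pattern into a compiled regular expression.

    '*' matches zero or more characters, '?' matches exactly one character,
    and every other character represents itself.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            # Escape so that characters such as '.' are taken literally.
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def matches_pattern(filename, pattern):
    """Return True if the whole filename matches the pattern."""
    # Assumption: the match is case-sensitive and must cover the full name.
    return pattern_to_regex(pattern).fullmatch(filename) is not None

readme_names = ["readme", "read.me", "readme.txt"]
all_readmes_match = all(matches_pattern(name, "read*") for name in readme_names)
```

Note that a broad pattern such as read* also matches unrelated names that happen to start with "read", so a narrower pattern may be preferable in a folder with mixed content.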

3.2. Source and Target file encoding

Only a limited number of file formats specify a mandatory encoding. File formats that do not specify their encoding will use the encoding you set up for the extension that matches their name. For example, by default .txt files will be loaded using the default encoding of your operating system. You may change the source encoding for each different source filename pattern. Such files may also be written out in any encoding. By default, the translated file encoding is the same as the source file encoding. Source and target encoding fields use combo boxes with all supported encodings included. <auto> leaves the encoding choice to OmegaT. This is how it works:

  • OmegaT identifies the source file encoding by using its encoding declaration, if present (HTML files, XML-based files).

  • OmegaT is instructed to use a mandatory encoding for certain file formats (Java properties, etc.).

  • OmegaT uses the default encoding of the operating system for text files.
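
The <auto> behavior listed above can be pictured as a simple fallback chain. The following Python sketch is only a model of that description: the function name `resolve_auto_encoding`, the `MANDATORY_ENCODINGS` table and its single entry, and the order in which the three rules are tried are all assumptions, not OmegaT internals.

```python
import locale

# Hypothetical table of formats with a fixed encoding; the entry is illustrative.
MANDATORY_ENCODINGS = {".properties": "ISO-8859-1"}

def resolve_auto_encoding(extension, declared_encoding=None):
    """Pick an encoding for a source file when the filter is set to <auto>.

    extension         -- file extension including the dot, e.g. ".html"
    declared_encoding -- encoding found in the file's own declaration, if any
    """
    # Rule 1: an encoding declaration inside the file (HTML, XML-based files).
    if declared_encoding:
        return declared_encoding
    # Rule 2: formats that prescribe one encoding.
    if extension in MANDATORY_ENCODINGS:
        return MANDATORY_ENCODINGS[extension]
    # Rule 3: fall back to the operating system's default encoding.
    return locale.getpreferredencoding(False)

html_encoding = resolve_auto_encoding(".html", declared_encoding="UTF-8")
text_encoding = resolve_auto_encoding(".txt")
```

Because the last rule depends on the operating system, the same .txt file may be read differently on two machines; setting an explicit source encoding for the pattern avoids that.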

3.3. Target filename

Sometimes you may wish to rename the files you translate automatically, for example by adding a language code after the file name. The target filename pattern uses a special syntax, so the easiest way to edit this field is to click Edit... and use the Edit Pattern Dialog. You may also modify the name directly in the target filename pattern field of the file filters dialog. If you wish to revert to the default configuration of the filter, click Defaults. The Edit Pattern Dialog offers the following options:

  • Default is ${filename}: the full filename of the source file, with extension. In this case the name of the translated file is the same as that of the source file.

  • ${nameOnly}: the name of the source file without the extension.

  • ${targetLocale}: the target locale code (in the form "xx_YY").

  • ${targetLanguage}: the target language and country code together (in the form "XX-YY").

  • ${targetLanguageCode}: the target language only ("XX").

  • ${targetCountryCode}: the target country only ("YY").
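
To see how such a pattern turns into a filename, the variables above can be modeled with Python's `string.Template`, which happens to use the same ${name} syntax. This is a sketch under assumptions, not OmegaT's code: the function `expand_target_name` is hypothetical, and the letter case of each variable simply follows the forms quoted in the list above.

```python
import os
from string import Template

def expand_target_name(pattern, source_name, language_code, country_code):
    """Expand a target filename pattern for one source file (illustrative)."""
    # Split "index.html" into "index" and ".html".
    name_only, _extension = os.path.splitext(source_name)
    lang = language_code.upper()
    country = country_code.upper()
    variables = {
        "filename": source_name,                        # name with extension
        "nameOnly": name_only,                          # name without extension
        "targetLocale": f"{lang.lower()}_{country}",    # "xx_YY"
        "targetLanguage": f"{lang}-{country}",          # "XX-YY"
        "targetLanguageCode": lang,                     # "XX"
        "targetCountryCode": country,                   # "YY"
    }
    # substitute() raises KeyError if the pattern uses an unknown variable.
    return Template(pattern).substitute(variables)

default_name = expand_target_name("${filename}", "index.html", "fr", "CA")
localized_name = expand_target_name(
    "${nameOnly}_${targetLocale}.html", "index.html", "fr", "CA"
)
```

Here `default_name` keeps the source name unchanged, while `localized_name` inserts the locale code between the name and the extension.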