com.lowagie.text.pdf.hyphenation

Class HyphenationTree

public class HyphenationTree extends TernaryTree implements PatternConsumer

This tree structure stores the hyphenation patterns in an efficient way for fast lookup. It also provides the method to hyphenate a word.

Author: Carlos Villegas

Field Summary
protected TernaryTree classmap
This map stores the character classes
TernaryTree ivalues
Temporary map to store interletter values on pattern loading.
static long serialVersionUID
protected HashMap stoplist
This map stores hyphenation exceptions
protected ByteVector vspace
value space: stores the interletter values
Constructor Summary
HyphenationTree()
Method Summary
void addClass(String chargroup)
Add a character class to the tree.
void addException(String word, ArrayList hyphenatedword)
Add an exception to the tree.
void addPattern(String pattern, String ivalue)
Add a pattern to the tree.
String findPattern(String pat)
protected byte[] getValues(int k)
protected int hstrcmp(char[] s, int si, char[] t, int ti)
String compare, returns 0 if equal or t is a substring of s
Hyphenation hyphenate(String word, int remainCharCount, int pushCharCount)
Hyphenate word and return a Hyphenation object.
Hyphenation hyphenate(char[] w, int offset, int len, int remainCharCount, int pushCharCount)
Hyphenate word and return a Hyphenation object holding the hyphenation points.
void loadSimplePatterns(InputStream stream)
protected int packValues(String values)
Packs the values by storing them in 4 bits, two values into a byte. Values range is from 0 to 9.
void printStats()
protected void searchPatterns(char[] word, int index, byte[] il)
Search for all possible partial matches of word starting at index and update interletter values.
protected String unpackValues(int k)

Field Detail

classmap

protected TernaryTree classmap
This map stores the character classes

ivalues

private transient TernaryTree ivalues
Temporary map to store interletter values on pattern loading.

serialVersionUID

private static final long serialVersionUID

stoplist

protected HashMap stoplist
This map stores hyphenation exceptions

vspace

protected ByteVector vspace
value space: stores the interletter values

Constructor Detail

HyphenationTree

public HyphenationTree()

Method Detail

addClass

public void addClass(String chargroup)
Add a character class to the tree. SimplePatternParser uses this method as a callback to add character classes. Character classes define the valid word characters for hyphenation: if a word contains a character that is not defined in any class, the word is not hyphenated. A class also defines how characters are normalized so that they can be compared with the stored patterns. Pattern files usually use only lowercase characters; in that case the class for the letter 'a', for example, should be defined as "aA", the first character being the normalization character.
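The normalization rule described above can be illustrated with a small stand-alone sketch. It does not use HyphenationTree itself: the class name CharClassSketch, the HashMap storage, and the normalize helper are assumptions made for illustration (the real class keeps its classes in a TernaryTree).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the library's implementation.
class CharClassSketch {
    // Maps every character of a class to the first (normalization) character of that class.
    private final Map<Character, Character> classes = new HashMap<>();

    // Registers a group such as "aA": both 'a' and 'A' normalize to 'a'.
    void addClass(String chargroup) {
        if (chargroup.isEmpty()) {
            return;
        }
        char norm = chargroup.charAt(0);
        for (int i = 0; i < chargroup.length(); i++) {
            classes.put(chargroup.charAt(i), norm);
        }
    }

    // Returns the normalized word, or null if some character belongs to no class
    // (such a word would not be hyphenated).
    String normalize(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            Character norm = classes.get(word.charAt(i));
            if (norm == null) {
                return null;
            }
            sb.append(norm);
        }
        return sb.toString();
    }
}
```

With the classes "aA" and "bB" registered, normalize("AB") yields "ab", while a word containing a digit or any other unregistered character yields null.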

addException

public void addException(String word, ArrayList hyphenatedword)
Add an exception to the tree. SimplePatternParser uses this method as a callback to store the hyphenation exceptions.

Parameters:
word - normalized word
hyphenatedword - a list of alternating strings and hyphen objects.

addPattern

public void addPattern(String pattern, String ivalue)
Add a pattern to the tree. Mainly used by SimplePatternParser as a callback to add a pattern to the tree.

Parameters:
pattern - the hyphenation pattern
ivalue - interletter weight values indicating the desirability and priority of hyphenating at a given point within the pattern. It should contain only digit characters ('0' to '9').
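As an illustration of how a pattern and its ivalue string relate, the sketch below splits a raw pattern written in TeX-style notation (letters with embedded digits, such as "hy3ph") into the two arguments. The raw notation, the class name PatternSplitSketch, and both helper methods are assumptions for illustration; the page above only states that ivalue consists of digits.

```java
// Hypothetical illustration only; not the library's parser.
class PatternSplitSketch {
    // Letters of the raw pattern with the digits removed, e.g. "hy3ph" -> "hyph".
    static String letters(String raw) {
        StringBuilder sb = new StringBuilder();
        for (char c : raw.toCharArray()) {
            if (!Character.isDigit(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // One digit per interletter position (before, between, and after the letters).
    // Positions that carry no digit in the raw pattern get '0'.
    static String interletterValues(String raw) {
        StringBuilder sb = new StringBuilder();
        boolean digitSeen = false; // true if the current position already has a digit
        for (char c : raw.toCharArray()) {
            if (Character.isDigit(c)) {
                sb.append(c);
                digitSeen = true;
            } else {
                if (!digitSeen) {
                    sb.append('0');
                }
                digitSeen = false;
            }
        }
        if (!digitSeen) {
            sb.append('0'); // position after the last letter
        }
        return sb.toString();
    }
}
```

Under these assumptions, "hy3ph" splits into the pattern "hyph" and the ivalue "00300", which has one more digit than the pattern has letters.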

findPattern

public String findPattern(String pat)

getValues

protected byte[] getValues(int k)

hstrcmp

protected int hstrcmp(char[] s, int si, char[] t, int ti)
String compare, returns 0 if equal or t is a substring of s
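A minimal sketch of such a comparison follows. It assumes that both arrays are terminated by a 0 character and reads "t is a substring of s" as "t matches the beginning of s from position si"; both points are assumptions, and the class name HstrcmpSketch is hypothetical.

```java
// Hypothetical illustration only; assumes null-terminated char arrays.
class HstrcmpSketch {
    static int hstrcmp(char[] s, int si, char[] t, int ti) {
        // Advance while the characters match; equal strings end at the shared terminator.
        for (; s[si] == t[ti]; si++, ti++) {
            if (s[si] == 0) {
                return 0;
            }
        }
        // t ended first: t matched the start of s.
        if (t[ti] == 0) {
            return 0;
        }
        // Otherwise return the difference of the first mismatching characters.
        return s[si] - t[ti];
    }
}
```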

hyphenate

public Hyphenation hyphenate(String word, int remainCharCount, int pushCharCount)
Hyphenate word and return a Hyphenation object.

Parameters:
word - the word to be hyphenated
remainCharCount - Minimum number of characters allowed before the hyphenation point.
pushCharCount - Minimum number of characters allowed after the hyphenation point.

Returns: a Hyphenation object representing the hyphenated word or null if word is not hyphenated.

hyphenate

public Hyphenation hyphenate(char[] w, int offset, int len, int remainCharCount, int pushCharCount)
Hyphenate word and return a Hyphenation object holding the hyphenation points.

Parameters:
w - char array that contains the word
offset - Offset to first character in word
len - Length of word
remainCharCount - Minimum number of characters allowed before the hyphenation point.
pushCharCount - Minimum number of characters allowed after the hyphenation point.

Returns: a Hyphenation object representing the hyphenated word or null if word is not hyphenated.
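The effect of remainCharCount and pushCharCount can be sketched independently of the library. The example assumes the usual convention of Liang-style hyphenation that an odd interletter value marks an allowed break; that convention, the class name BreakPointSketch, and the sample values are assumptions, not something stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the library's implementation.
class BreakPointSketch {
    // values[p] is the interletter value at the position with p characters before it.
    // Returns the positions where a break is allowed under the two minimum counts.
    static List<Integer> breakPoints(int[] values, int len,
                                     int remainCharCount, int pushCharCount) {
        List<Integer> points = new ArrayList<>();
        for (int p = 1; p < len; p++) {
            boolean odd = (values[p] & 1) == 1;        // assumed: odd value = break allowed
            boolean enoughBefore = p >= remainCharCount;
            boolean enoughAfter = len - p >= pushCharCount;
            if (odd && enoughBefore && enoughAfter) {
                points.add(p);
            }
        }
        return points;
    }
}
```

For an 11-letter word with odd values at positions 2, 6, and 10, calling breakPoints with remainCharCount = 2 and pushCharCount = 2 keeps positions 2 and 6 and drops position 10, because only one character would follow the break.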

loadSimplePatterns

public void loadSimplePatterns(InputStream stream)

packValues

protected int packValues(String values)
Packs the values by storing them in 4 bits, two values into a byte. Values range from 0 to 9. Zero is used as the terminator, so 1 is added to each value before it is stored.

Parameters: values a string of digits from '0' to '9' representing the interletter values.

Returns: the index into the vspace array where the packed values are stored.
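The packing scheme can be sketched as follows. This stand-alone version returns a fresh byte array instead of writing into vspace and returning an index, and it places the first value of each pair in the high nibble; the nibble order, the class name PackSketch, and the unpack helper are assumptions for illustration.

```java
// Hypothetical illustration only; not the library's implementation.
class PackSketch {
    // Packs a digit string two values per byte, storing value + 1 in each nibble.
    static byte[] pack(String values) {
        int n = values.length();
        // n / 2 + 1 bytes always leave at least one zero nibble as terminator.
        byte[] out = new byte[n / 2 + 1];
        for (int i = 0; i < n; i++) {
            int v = values.charAt(i) - '0' + 1;       // 1..10, so 0 stays free
            if ((i & 1) == 0) {
                out[i / 2] = (byte) (v << 4);         // first value of the pair: high nibble
            } else {
                out[i / 2] |= (byte) v;               // second value of the pair: low nibble
            }
        }
        return out;
    }

    // Reverses pack(): reads nibbles until the zero terminator.
    static String unpack(byte[] packed) {
        StringBuilder sb = new StringBuilder();
        for (byte b : packed) {
            int hi = (b >> 4) & 0x0f;
            if (hi == 0) {
                break;
            }
            sb.append((char) (hi - 1 + '0'));
            int lo = b & 0x0f;
            if (lo == 0) {
                break;
            }
            sb.append((char) (lo - 1 + '0'));
        }
        return sb.toString();
    }
}
```

In this sketch the string "00300" packs into the three bytes 0x11, 0x41, 0x10, where the empty low nibble of the last byte acts as the terminator.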

printStats

public void printStats()

searchPatterns

protected void searchPatterns(char[] word, int index, byte[] il)

Search for all possible partial matches of word starting at index and update interletter values. In other words, it does something like:

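for (i = 0; i < patterns.length; i++) {
    if (word.substring(index).startsWith(patterns[i]))
        update_interletter(patterns[i]);
}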

But it is done efficiently because the patterns are stored in a ternary tree. In fact, this is the whole purpose of having the tree: doing this search without having to test every single pattern. The number of patterns for languages such as English ranges from 4000 to 10000, so doing thousands of string comparisons for each word to hyphenate would be really slow without the tree. The tradeoff is memory, but using a ternary tree instead of a trie almost halves the memory used by Lout or TeX. It is also faster than using a hash table.

Parameters:
word - null-terminated word to match
index - start index in word
il - interletter values array to update
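For comparison, the naive search that the ternary tree avoids might look like the sketch below. It tests every pattern against the word at the given index and keeps the largest value seen at each interletter position. The max rule follows the usual Liang-style convention and is an assumption here, as are the class name NaiveSearchSketch and the use of a Map from pattern letters to ivalue strings.

```java
import java.util.Map;

// Hypothetical illustration only; the real method walks a ternary tree instead.
class NaiveSearchSketch {
    // Updates il with the values of every pattern that matches word at index.
    static void searchPatterns(String word, int index,
                               Map<String, String> patterns, int[] il) {
        String rest = word.substring(index);
        for (Map.Entry<String, String> e : patterns.entrySet()) {
            if (!rest.startsWith(e.getKey())) {
                continue;                              // one string comparison per pattern
            }
            String ivalue = e.getValue();
            for (int k = 0; k < ivalue.length(); k++) {
                int v = ivalue.charAt(k) - '0';
                // Assumed rule: the highest value at a position wins.
                if (index + k < il.length && v > il[index + k]) {
                    il[index + k] = v;
                }
            }
        }
    }
}
```

Every call performs one startsWith test per stored pattern, which is the cost that grows into thousands of comparisons per word when the pattern set is large.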

unpackValues

protected String unpackValues(int k)