Converts any Grammar into its RELAX NG XML representation through SAX1 events.
How it works
A Grammar object can be thought of as a (possibly) cyclic graph made from
Expression objects. For example, the following simple TREX pattern will be
represented as the following AGM.
[Figure: example TREX pattern, containing the string value "abc", and its AGM]
Note that
- subexpressions are shared (see the <string> expression).
- there is a cycle in the graph.
- several syntax elements are replaced by others
(e.g., <optional>P</optional> -> <choice><empty/>P</choice>)
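The last point can be illustrated with a minimal sketch. The Pattern class
and the normalize method below are hypothetical stand-ins invented for this
example; they are not the library's Expression classes, and the real
replacement presumably happens while the grammar is being built.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical mini pattern tree, used only to illustrate the rewrite.
final class OptionalRewriteSketch {
    static final class Pattern {
        final String tag;
        final List<Pattern> children;

        Pattern(String tag, Pattern... children) {
            this.tag = tag;
            this.children = Arrays.asList(children);
        }

        // Prints the tree as compact XML-like text.
        @Override public String toString() {
            if (children.isEmpty()) return "<" + tag + "/>";
            StringBuilder sb = new StringBuilder("<" + tag + ">");
            for (Pattern c : children) sb.append(c);
            return sb.append("</" + tag + ">").toString();
        }
    }

    // Rewrites every <optional>P</optional> as <choice><empty/>P</choice>.
    static Pattern normalize(Pattern p) {
        Pattern[] kids = new Pattern[p.children.size()];
        for (int i = 0; i < kids.length; i++) {
            kids[i] = normalize(p.children.get(i));
        }
        if (p.tag.equals("optional") && kids.length == 1) {
            return new Pattern("choice", new Pattern("empty"), kids[0]);
        }
        return new Pattern(p.tag, kids);
    }
}
```

Because of this replacement, a writer that works from the AGM cannot tell
whether the source used <optional> or an explicit <choice>.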
To write these expressions into the TREX XML representation,
we have to take care of cycles, since a cyclic reference cannot be written
into XML without first cutting it and using a <ref>/<define> pair.
First, this algorithm splits the grammar into "islands". An island is a tree
of expressions; it has a head expression and, most importantly, it contains
no cycles. Every member of an island can always be reached from its head.
TREXWriter makes every ElementExp and every ReferenceExp the head of its own
island.
It is guaranteed that this split will always give islands without inner cycles.
Several islands can form a cycle, but one island can never have a cycle in it.
This is because there is always at least one ElementExp in any cycle.
Note that since expressions are shared, one expression can be
a member of several islands (although this isn't depicted in the figure above).
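The split described above can be sketched as follows. This is a simplified
model under stated assumptions, not the writer's actual code: Node, Kind,
split, fill, labels and example are all hypothetical names, with Kind.ELEMENT
and Kind.REF standing in for ElementExp and ReferenceExp.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class IslandSplitSketch {
    enum Kind { ELEMENT, REF, OTHER }

    // Hypothetical graph node; identity-based equals/hashCode is intended.
    static final class Node {
        final String label;
        final Kind kind;
        final List<Node> children = new ArrayList<>();
        Node(String label, Kind kind) { this.label = label; this.kind = kind; }
        boolean isHead() { return kind != Kind.OTHER; }
        Node add(Node... kids) { children.addAll(Arrays.asList(kids)); return this; }
    }

    // Maps every head reachable from root to the members of its island.
    static Map<Node, Set<Node>> split(Node root) {
        Map<Node, Set<Node>> islands = new LinkedHashMap<>();
        Set<Node> visited = new HashSet<>();
        Deque<Node> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {               // find all heads, cycle-safe
            Node n = todo.pop();
            if (!visited.add(n)) continue;
            if (n.isHead()) islands.put(n, new LinkedHashSet<Node>());
            for (Node c : n.children) todo.push(c);
        }
        for (Map.Entry<Node, Set<Node>> e : islands.entrySet()) {
            fill(e.getKey(), e.getValue());
        }
        return islands;
    }

    // Adds n and its descendants to the island, stopping at other heads.
    static void fill(Node n, Set<Node> members) {
        if (!members.add(n)) return;
        for (Node c : n.children) {
            if (!c.isHead()) fill(c, members);
        }
    }

    // Converts the result to sorted labels so it is easy to inspect.
    static Map<String, Set<String>> labels(Map<Node, Set<Node>> islands) {
        Map<String, Set<String>> out = new TreeMap<>();
        for (Map.Entry<Node, Set<Node>> e : islands.entrySet()) {
            Set<String> names = new TreeSet<>();
            for (Node n : e.getValue()) names.add(n.label);
            out.put(e.getKey().label, names);
        }
        return out;
    }

    // Invented graph: X -> foo -> Y -> bar -> foo, with one shared string.
    static Node example() {
        Node x = new Node("X", Kind.REF), y = new Node("Y", Kind.REF);
        Node foo = new Node("foo", Kind.ELEMENT), bar = new Node("bar", Kind.ELEMENT);
        Node shared = new Node("string", Kind.OTHER);
        x.add(foo);
        foo.add(new Node("choice1", Kind.OTHER).add(shared, y));
        y.add(bar);
        bar.add(new Node("choice2", Kind.OTHER).add(shared, foo));
        return x;
    }
}
```

In the invented example graph, the cycle foo, Y, bar, foo spans three islands,
while the shared string node ends up as a member of both the foo island and
the bar island.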
Then, this algorithm merges some islands. For example, island E is
referenced only once (from island D). This means that there is no need to
give a name to this pattern. Instead, island E can simply be written as a
subordinate of island D.
In other words, any island that is referenced at most once is merged
into its referrer. This step makes the output more compact.
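The merge decision comes down to counting inter-island references. A minimal
sketch, assuming islands are identified by plain strings and that the start
island is always kept (both are assumptions of this example, as are the names
namedIslands and refs):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class IslandMergeSketch {
    // refs maps each island to the islands it references; a target listed
    // twice counts as two references. Returns the islands that keep a name.
    // Every other island is referenced at most once and would be written
    // inline, as a subordinate of its single referrer.
    static Set<String> namedIslands(String start, Map<String, List<String>> refs) {
        Map<String, Integer> count = new HashMap<>();
        for (List<String> targets : refs.values()) {
            for (String t : targets) {
                count.merge(t, 1, Integer::sum);
            }
        }
        Set<String> named = new TreeSet<>();
        named.add(start); // assumption: the start island is never merged away
        for (String island : refs.keySet()) {
            if (count.getOrDefault(island, 0) > 1) {
                named.add(island);
            }
        }
        return named;
    }
}
```

For instance, with hypothetical islands where A references B and C, both B
and C reference D, and D references E, only A and D keep a name; B, C and E
are each referenced once and are merged into their referrers.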
Next, TREXWriter assigns a name to each island. It tries to use the name of
the head expression. If the head is an anonymous ReferenceExp (a ReferenceExp
whose name field is null) or there is a name conflict, TREXWriter adds a
suffix to make the name unique.
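A minimal sketch of such a naming step is shown below. The placeholder base
name "anonymous" and the "-2", "-3" suffix format are assumptions made for
this example; the text above does not specify what the writer actually uses.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class IslandNamingSketch {
    // preferred holds the head name of each island, or null when the head
    // is anonymous. Returns one unique name per island, in the same order.
    static List<String> assignNames(List<String> preferred) {
        Set<String> used = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String p : preferred) {
            String base = (p == null) ? "anonymous" : p; // assumed placeholder
            String name = base;
            int suffix = 1;
            while (!used.add(name)) {   // name already taken: try next suffix
                suffix++;
                name = base + "-" + suffix;
            }
            result.add(name);
        }
        return result;
    }
}
```

The loop keeps trying suffixes until Set.add succeeds, so the result is
unique even if a preferred name happens to collide with a generated one.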
Finally, each island is written as one named pattern under a <define>
element. All inter-island references are replaced by <ref> elements.
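The final step can be sketched as a recursive walk that emits a <ref>
whenever it reaches the head of another named island. For brevity this
example builds a string rather than firing SAX events, and it omits the
<start> element; Node, write and body are hypothetical names. It assumes
every cycle passes through a named head, otherwise the recursion would not
terminate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class DefineRefSketch {
    // Hypothetical pattern node; children may be added later to form cycles.
    static final class Node {
        final String tag;
        final List<Node> children = new ArrayList<>();
        Node(String tag) { this.tag = tag; }
        Node add(Node... kids) { children.addAll(Arrays.asList(kids)); return this; }
    }

    // names maps each island head to its assigned name.
    static String write(Map<Node, String> names) {
        StringBuilder out = new StringBuilder("<grammar>");
        for (Map.Entry<Node, String> e : names.entrySet()) {
            out.append("<define name=\"").append(e.getValue()).append("\">");
            body(e.getKey(), true, names, out);
            out.append("</define>");
        }
        return out.append("</grammar>").toString();
    }

    // Writes n inline, or as a <ref> when n is the head of another island.
    static void body(Node n, boolean isHead, Map<Node, String> names, StringBuilder out) {
        if (!isHead && names.containsKey(n)) {
            out.append("<ref name=\"").append(names.get(n)).append("\"/>");
            return;
        }
        if (n.children.isEmpty()) {
            out.append('<').append(n.tag).append("/>");
            return;
        }
        out.append('<').append(n.tag).append('>');
        for (Node c : n.children) body(c, false, names, out);
        out.append("</").append(n.tag).append('>');
    }
}
```

Passing a LinkedHashMap as names keeps the <define> elements in insertion
order, which makes the output deterministic.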
Why SAX1?
Because of bugs in, and insufficient support for, serialization through SAX2,
the decision was made to use SAX1. SAX1 allows us to control namespace prefix
mappings better than SAX2.
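In SAX1, element names are passed as raw qualified names and a namespace
declaration is just an ordinary attribute, so the producer chooses the prefix
itself. The sketch below shows this with the (deprecated) SAX1 interfaces
that ship with the JDK. The emit and render methods and the tiny
grammar/start/empty document are invented for illustration; this is not the
writer's actual event sequence.

```java
import org.xml.sax.AttributeList;
import org.xml.sax.DocumentHandler;
import org.xml.sax.HandlerBase;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributeListImpl;

@SuppressWarnings("deprecation") // the SAX1 API is deprecated but still available
final class Sax1Sketch {
    static final String RNG_NS = "http://relaxng.org/ns/structure/1.0";

    // Fires SAX1 events for a tiny document, using the prefix the caller picks.
    static void emit(DocumentHandler h, String prefix) throws SAXException {
        String p = prefix.isEmpty() ? "" : prefix + ":";
        AttributeListImpl ns = new AttributeListImpl();
        // The namespace declaration is written as a plain attribute.
        ns.addAttribute(prefix.isEmpty() ? "xmlns" : "xmlns:" + prefix, "CDATA", RNG_NS);
        AttributeListImpl none = new AttributeListImpl();
        h.startDocument();
        h.startElement(p + "grammar", ns);
        h.startElement(p + "start", none);
        h.startElement(p + "empty", none);
        h.endElement(p + "empty");
        h.endElement(p + "start");
        h.endElement(p + "grammar");
        h.endDocument();
    }

    // Collects the events into a string so the chosen prefix is visible.
    static String render(String prefix) {
        final StringBuilder sb = new StringBuilder();
        try {
            emit(new HandlerBase() {
                @Override public void startElement(String name, AttributeList a) {
                    sb.append('<').append(name);
                    for (int i = 0; i < a.getLength(); i++) {
                        sb.append(' ').append(a.getName(i))
                          .append("=\"").append(a.getValue(i)).append('"');
                    }
                    sb.append('>');
                }
                @Override public void endElement(String name) {
                    sb.append("</").append(name).append('>');
                }
            }, prefix);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }
}
```

Calling render("rng") produces rng:-prefixed element names, while render("")
uses the default namespace; in both cases the choice is made entirely by the
code that fires the events.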