dk.brics.automaton

Class RegExp

public class RegExp extends Object

Regular Expression extension to Automaton.

Regular expressions are built from the following abstract syntax:

regexp        ::=  unionexp
                |
unionexp      ::=  interexp | unionexp                         (union)
                |  interexp
interexp      ::=  concatexp & interexp                        (intersection)  [OPTIONAL]
                |  concatexp
concatexp     ::=  repeatexp concatexp                         (concatenation)
                |  repeatexp
repeatexp     ::=  repeatexp ?                                 (zero or one occurrence)
                |  repeatexp *                                 (zero or more occurrences)
                |  repeatexp +                                 (one or more occurrences)
                |  repeatexp {n}                               (n occurrences)
                |  repeatexp {n,}                              (n or more occurrences)
                |  repeatexp {n,m}                             (n to m occurrences, including both)
                |  complexp
complexp      ::=  ~ complexp                                  (complement)  [OPTIONAL]
                |  charclassexp
charclassexp  ::=  [ charclasses ]                             (character class)
                |  [^ charclasses ]                            (negated character class)
                |  simpleexp
charclasses   ::=  charclass charclasses
                |  charclass
charclass     ::=  charexp - charexp                           (character range, including end-points)
                |  charexp
simpleexp     ::=  charexp
                |  .                                           (any single character)
                |  #                                           (the empty language)  [OPTIONAL]
                |  @                                           (any string)  [OPTIONAL]
                |  " <Unicode string without double-quotes> "  (a string)
                |  ( )                                         (the empty string)
                |  ( unionexp )                                (precedence override)
                |  < <identifier> >                            (named automaton)  [OPTIONAL]
                |  <n-m>                                       (numerical interval)  [OPTIONAL]
charexp       ::=  <Unicode character>                         (a single non-reserved character)
                |  \ <Unicode character>                       (a single character)

The productions marked [OPTIONAL] are only allowed if specified by the syntax flags passed to the RegExp constructor. The reserved characters used in the (enabled) syntax must be escaped with backslash (\) or double-quotes ("..."). (In contrast to other regexp syntaxes, this is also required in character classes.) Be aware that dash (-) has a special meaning in charclass expressions. An identifier is a string not containing right angle bracket (>) or dash (-). Numerical intervals are specified by non-negative decimal integers and include both end points. If n and m have the same number of digits, then the conforming strings must have that length (i.e., shorter numbers are prefixed by 0's).
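The escaping rule can be illustrated with a small helper. This is only a sketch and not part of the library: the class name RegExpQuoting and the method quote are hypothetical. It relies on the charexp production above, under which a backslash followed by any character denotes that character, so it conservatively escapes everything that is not a letter or digit instead of tracking which reserved characters are currently enabled.

```java
/** Hypothetical helper (not part of dk.brics.automaton) for quoting literal text. */
final class RegExpQuoting {
    private RegExpQuoting() {}

    /**
     * Returns the literal with a backslash in front of every character
     * that is not a letter or digit. Per the grammar, "\c" denotes the
     * single character c, so over-escaping is harmless.
     */
    static String quote(String literal) {
        StringBuilder sb = new StringBuilder(literal.length() * 2);
        for (char c : literal.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The dash and the dot would otherwise be read as regexp syntax.
        System.out.println(quote("a-b.c")); // prints a\-b\.c
    }
}
```

The quoted string can then be embedded in a larger pattern before it is passed to the RegExp constructor. When the literal contains no double-quote, wrapping it in "..." is the alternative the description above mentions.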

Author: Anders Møller <amoeller@brics.dk>

Field Summary
static int ALL
Syntax flag, enables all optional regexp syntax.
static int ANYSTRING
Syntax flag, enables anystring (@).
static int AUTOMATON
Syntax flag, enables named automata (<identifier>).
static int COMPLEMENT
Syntax flag, enables complement (~).
static int EMPTY
Syntax flag, enables empty language (#).
static int INTERSECTION
Syntax flag, enables intersection (&).
static int INTERVAL
Syntax flag, enables numerical intervals (<n-m>).
static int NONE
Syntax flag, enables no optional regexp syntax.

Constructor Summary
RegExp(String s)
Constructs new RegExp from a string.
RegExp(String s, int syntax_flags)
Constructs new RegExp from a string.

Method Summary
Set<String> getIdentifiers()
Returns set of automaton identifiers that occur in this regular expression.
boolean setAllowMutate(boolean flag)
Sets or resets allow mutate flag.
Automaton toAutomaton()
Constructs new Automaton from this RegExp.
Automaton toAutomaton(AutomatonProvider automaton_provider)
Constructs new Automaton from this RegExp.
Automaton toAutomaton(Map<String,Automaton> automata)
Constructs new Automaton from this RegExp.

Field Detail

ALL

public static final int ALL
Syntax flag, enables all optional regexp syntax.

ANYSTRING

public static final int ANYSTRING
Syntax flag, enables anystring (@).

AUTOMATON

public static final int AUTOMATON
Syntax flag, enables named automata (<identifier>).

COMPLEMENT

public static final int COMPLEMENT
Syntax flag, enables complement (~).

EMPTY

public static final int EMPTY
Syntax flag, enables empty language (#).

INTERSECTION

public static final int INTERSECTION
Syntax flag, enables intersection (&).

INTERVAL

public static final int INTERVAL
Syntax flag, enables numerical intervals (<n-m>).
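The padding rule for numerical intervals given in the class description can be made concrete with a plain-Java membership check. This is a sketch of the documented semantics, not the library's automaton construction; the class IntervalSketch and its method matches are hypothetical. When n and m differ in length the documentation states no length requirement, so the sketch accepts any digit string whose value lies in the range, which is an assumption.

```java
import java.math.BigInteger;

/** Hypothetical model of the <n-m> rule; not part of dk.brics.automaton. */
final class IntervalSketch {
    private IntervalSketch() {}

    /** Returns true if s would conform to the interval written as <n-m>. */
    static boolean matches(String s, String n, String m) {
        if (!isDigits(s) || !isDigits(n) || !isDigits(m)) {
            return false;
        }
        // Documented rule: equal-length bounds fix the length of the strings.
        if (n.length() == m.length() && s.length() != n.length()) {
            return false;
        }
        // Assumption: otherwise only the numeric value is checked.
        BigInteger value = new BigInteger(s);
        return value.compareTo(new BigInteger(n)) >= 0
            && value.compareTo(new BigInteger(m)) <= 0;
    }

    private static boolean isDigits(String t) {
        return !t.isEmpty() && t.chars().allMatch(c -> c >= '0' && c <= '9');
    }
}
```

Under this model, <000-123> accepts "007" but rejects "7", because both bounds have three digits.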

NONE

public static final int NONE
Syntax flag, enables no optional regexp syntax.

Constructor Detail

RegExp

public RegExp(String s)
Constructs new RegExp from a string. Same as RegExp(s, ALL).

Parameters: s - regexp string

Throws: IllegalArgumentException if an error occurred while parsing the regular expression

RegExp

public RegExp(String s, int syntax_flags)
Constructs new RegExp from a string.

Parameters: s - regexp string; syntax_flags - bitwise 'or' of the optional syntax constructs to be enabled

Throws: IllegalArgumentException if an error occurred while parsing the regular expression
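Combining syntax flags with 'or' can be sketched in isolation as follows. The constant names mirror the fields documented above, but the numeric values and the class SyntaxFlagsSketch are assumptions made for illustration only; code that uses the library should refer to the RegExp constants themselves, for example RegExp.INTERSECTION | RegExp.INTERVAL.

```java
/** Hypothetical stand-in for the RegExp syntax flags; values are assumed. */
final class SyntaxFlagsSketch {
    private SyntaxFlagsSketch() {}

    // Assumed bit values: one distinct bit per optional construct.
    static final int NONE         = 0x0000;
    static final int INTERSECTION = 0x0001;
    static final int COMPLEMENT   = 0x0002;
    static final int EMPTY        = 0x0004;
    static final int ANYSTRING    = 0x0008;
    static final int AUTOMATON    = 0x0010;
    static final int INTERVAL     = 0x0020;
    static final int ALL          = 0xffff;

    /** Returns true if the given construct is enabled in syntaxFlags. */
    static boolean enabled(int syntaxFlags, int flag) {
        return (syntaxFlags & flag) != 0;
    }

    public static void main(String[] args) {
        // Enable only intersection (&) and numerical intervals (<n-m>).
        int flags = INTERSECTION | INTERVAL;
        System.out.println(enabled(flags, INTERVAL));   // prints true
        System.out.println(enabled(flags, COMPLEMENT)); // prints false
    }
}
```

With a flag set like this, characters such as ~, # and @ are no longer reserved, so they would not need escaping in the regexp string.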

Method Detail

getIdentifiers

public Set<String> getIdentifiers()
Returns set of automaton identifiers that occur in this regular expression.

setAllowMutate

public boolean setAllowMutate(boolean flag)
Sets or resets allow mutate flag. If this flag is set, then automata construction uses mutable automata, which is slightly faster but not thread safe. By default, the flag is not set.

Parameters: flag - if true, the flag is set

Returns: previous value of the flag

toAutomaton

public Automaton toAutomaton()
Constructs new Automaton from this RegExp. Same as toAutomaton(null) (empty automaton map).

toAutomaton

public Automaton toAutomaton(AutomatonProvider automaton_provider)
Constructs new Automaton from this RegExp. The constructed automaton is minimal and deterministic and has no transitions to dead states.

Parameters: automaton_provider - provider of automata for named identifiers

Throws: IllegalArgumentException if this regular expression uses a named identifier that is not available from the automaton provider

toAutomaton

public Automaton toAutomaton(Map<String,Automaton> automata)
Constructs new Automaton from this RegExp. The constructed automaton is minimal and deterministic and has no transitions to dead states.

Parameters: automata - a map from automaton identifiers to automata (of type Automaton)

Throws: IllegalArgumentException if this regular expression uses a named identifier that does not occur in the automaton map
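Because a missing identifier causes an IllegalArgumentException, a caller may prefer to compare the result of getIdentifiers() with the keys of the map before calling toAutomaton. The following is a minimal sketch of such a pre-check; the class IdentifierCheck and the method missing are hypothetical, and the map value type is left generic so that the example does not depend on the library.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical pre-check helper; not part of dk.brics.automaton. */
final class IdentifierCheck {
    private IdentifierCheck() {}

    /**
     * Returns the identifiers that have no entry in the automaton map,
     * in sorted order. A null map is treated as empty.
     */
    static Set<String> missing(Set<String> identifiers, Map<String, ?> automata) {
        Set<String> result = new TreeSet<>(identifiers);
        if (automata != null) {
            result.removeAll(automata.keySet());
        }
        return result;
    }
}
```

With the library on the classpath, the first argument would typically be the set returned by getIdentifiers() and the second the Map<String,Automaton> about to be passed to toAutomaton. An empty result means every named automaton can be resolved.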

Copyright © 2001-2008 Anders Møller.