org.htmlparser.nodes

Class TagNode

public class TagNode extends AbstractNode implements Tag

TagNode represents a generic tag. If no scanner is registered for a given tag name, this is what you get. This is also the base class for all tags created by the parser.
Field Summary
protected static Hashtable breakTags
Set of tags that break the flow.
protected Vector mAttributes
The tag attributes.
protected static Scanner mDefaultScanner
The default scanner for non-composite tags.
Constructor Summary
TagNode()
Create an empty tag.
TagNode(Page page, int start, int end, Vector attributes)
Create a tag with the location and attributes provided.
TagNode(TagNode tag, TagScanner scanner)
Create a tag like the one provided.
Method Summary
void accept(NodeVisitor visitor)
Default tag visiting code.
boolean breaksFlow()
Determines if the given tag breaks the flow of text.
String getAttribute(String name)
Returns the value of an attribute.
Attribute getAttributeEx(String name)
Returns the attribute with the given name.
Vector getAttributesEx()
Gets the attributes in the tag.
String[] getEnders()
Return the set of tag names that cause this tag to finish.
int getEndingLineNumber()
Get the line number where this tag ends.
Tag getEndTag()
Get the end tag for this (composite) tag.
String[] getEndTagEnders()
Return the set of end tag names that cause this tag to finish.
String[] getIds()
Return the set of names handled by this tag.
String getRawTagName()
Return the name of this tag.
int getStartingLineNumber()
Get the line number where this tag starts.
int getTagBegin()
Gets the nodeBegin.
int getTagEnd()
Gets the nodeEnd.
String getTagName()
Return the name of this tag.
String getText()
Return the text contained in this tag.
Scanner getThisScanner()
Return the scanner associated with this tag.
boolean isEmptyXmlTag()
Is this an empty xml tag of the form <tag/>.
boolean isEndTag()
Predicate to determine if this tag is an end tag (e.g. </HTML>).
void removeAttribute(String key)
Remove the attribute with the given key, if it exists.
void setAttribute(String key, String value)
Set attribute with given key, value pair.
void setAttribute(String key, String value, char quote)
Set attribute with given key, value pair where the value is quoted by quote.
void setAttribute(Attribute attribute)
Set an attribute.
void setAttributeEx(Attribute attribute)
Set an attribute.
void setAttributesEx(Vector attribs)
Sets the attributes.
void setEmptyXmlTag(boolean emptyXmlTag)
Set this tag to be an empty xml node, or not.
void setEndTag(Tag end)
Set the end tag for this (composite) tag.
void setTagBegin(int tagBegin)
Sets the nodeBegin.
void setTagEnd(int tagEnd)
Sets the nodeEnd.
void setTagName(String name)
Set the name of this tag.
void setText(String text)
Parses the given text to create the tag contents.
void setThisScanner(Scanner scanner)
Set the scanner associated with this tag.
String toHtml(boolean verbatim)
Render the tag as HTML.
String toPlainTextString()
Get the plain text from this node.
String toString()
Print the contents of the tag.

Field Detail

breakTags

protected static Hashtable breakTags
Set of tags that break the flow.

mAttributes

protected Vector mAttributes
The tag attributes, stored as objects of type Attribute. The first element is the tag name; subsequent elements are either whitespace or real attributes.

mDefaultScanner

protected static final Scanner mDefaultScanner
The default scanner for non-composite tags.

Constructor Detail

TagNode

public TagNode()
Create an empty tag.

TagNode

public TagNode(Page page, int start, int end, Vector attributes)
Create a tag with the location and attributes provided.

Parameters:
page - The page this tag was read from.
start - The starting offset of this node within the page.
end - The ending offset of this node within the page.
attributes - The list of attributes that were parsed in this tag.

See Also: Attribute

TagNode

public TagNode(TagNode tag, TagScanner scanner)
Create a tag like the one provided.

Parameters:
tag - The tag to emulate.
scanner - The scanner for this tag.

Method Detail

accept

public void accept(NodeVisitor visitor)
Default tag visiting code. Based on isEndTag(), calls either visitTag() or visitEndTag().

Parameters: visitor The visitor that is visiting this node.
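
The dispatch described for accept() can be sketched without the library. The Visitor interface and SketchTag class below are hypothetical stand-ins for NodeVisitor and TagNode, not the real types; only the branch on isEndTag() is taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for NodeVisitor and TagNode; illustration only.
class AcceptSketch {
    interface Visitor {
        void visitTag(SketchTag tag);
        void visitEndTag(SketchTag tag);
    }

    static class SketchTag {
        private final String name;
        private final boolean endTag;

        SketchTag(String name, boolean endTag) {
            this.name = name;
            this.endTag = endTag;
        }

        String getName() { return name; }

        boolean isEndTag() { return endTag; }

        // Branch on isEndTag(), as the description of accept() states.
        void accept(Visitor visitor) {
            if (isEndTag()) {
                visitor.visitEndTag(this);
            } else {
                visitor.visitTag(this);
            }
        }
    }

    // Visits each tag in order and records which callback fired.
    static List<String> trace(List<SketchTag> tags) {
        final List<String> log = new ArrayList<>();
        Visitor recorder = new Visitor() {
            public void visitTag(SketchTag tag) { log.add("start:" + tag.getName()); }
            public void visitEndTag(SketchTag tag) { log.add("end:" + tag.getName()); }
        };
        for (SketchTag tag : tags) {
            tag.accept(recorder);
        }
        return log;
    }

    public static void main(String[] args) {
        List<SketchTag> tags = new ArrayList<>();
        tags.add(new SketchTag("P", false));
        tags.add(new SketchTag("P", true));
        System.out.println(trace(tags)); // prints [start:P, end:P]
    }
}
```

Because the branch lives in the tag, a visitor never has to test isEndTag() itself.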

breaksFlow

public boolean breaksFlow()
Determines if the given tag breaks the flow of text.

Returns: true if following text would start on a new line, false otherwise.

getAttribute

public String getAttribute(String name)
Returns the value of an attribute.

Parameters: name Name of attribute, case insensitive.

Returns: The value associated with the attribute, or null if it does not exist or is a stand-alone attribute.

getAttributeEx

public Attribute getAttributeEx(String name)
Returns the attribute with the given name.

Parameters: name Name of attribute, case insensitive.

Returns: The attribute or null if it does not exist.
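
A case-insensitive lookup over an attribute list of the shape described for mAttributes might look like the sketch below. The Attr class is a hypothetical stand-in for Attribute. Skipping element 0 (the tag name) and giving whitespace entries a null name are assumptions of this sketch, not confirmed library behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only; Attr is a hypothetical stand-in for Attribute.
class AttributeLookupSketch {
    static class Attr {
        final String name;   // null for whitespace entries (assumption)
        final String value;

        Attr(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    // Returns the first attribute whose name matches, ignoring case,
    // or null if there is none. Element 0 is treated as the tag name
    // and skipped (assumption).
    static Attr find(List<Attr> attributes, String name) {
        for (int i = 1; i < attributes.size(); i++) {
            Attr candidate = attributes.get(i);
            if (candidate.name != null && candidate.name.equalsIgnoreCase(name)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Attr> attrs = new ArrayList<>();
        attrs.add(new Attr("a", null));             // element 0: the tag name
        attrs.add(new Attr(null, " "));             // whitespace
        attrs.add(new Attr("HREF", "index.html"));  // a real attribute
        System.out.println(find(attrs, "href").value); // prints index.html
        System.out.println(find(attrs, "id"));         // prints null
    }
}
```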

getAttributesEx

public Vector getAttributesEx()
Gets the attributes in the tag.

Returns: The list of Attributes in the tag. The first element is the tag name; subsequent elements are either whitespace or real attributes.

getEnders

public String[] getEnders()
Return the set of tag names that cause this tag to finish. These are the normal (non-end) tags that, if encountered while scanning a composite tag, cause the generation of a virtual tag. Since this is a non-composite tag, the default is no enders.

Returns: The names of following tags that stop further scanning.

getEndingLineNumber

public int getEndingLineNumber()
Get the line number where this tag ends.

Returns: The (zero based) line number in the page where this tag ends.

getEndTag

public Tag getEndTag()
Get the end tag for this (composite) tag. For a non-composite tag this always returns null.

Returns: The tag that terminates this composite tag, e.g. </HTML>.

getEndTagEnders

public String[] getEndTagEnders()
Return the set of end tag names that cause this tag to finish. These are the end tags that, if encountered while scanning a composite tag, cause the generation of a virtual tag. Since this is a non-composite tag, it has no end tag enders.

Returns: The names of following end tags that stop further scanning.

getIds

public String[] getIds()
Return the set of names handled by this tag. Since this is a generic tag, it has no ids.

Returns: The names to be matched that create tags of this type.

getRawTagName

public String getRawTagName()
Return the name of this tag.

Returns: The tag name or null if this tag contains nothing or only whitespace.

getStartingLineNumber

public int getStartingLineNumber()
Get the line number where this tag starts.

Returns: The (zero based) line number in the page where this tag starts.

getTagBegin

public int getTagBegin()
Gets the nodeBegin.

Returns: The nodeBegin value.

getTagEnd

public int getTagEnd()
Gets the nodeEnd.

Returns: The nodeEnd value.

getTagName

public String getTagName()
Return the name of this tag.

Note: This value is converted to uppercase and does not begin with "/" if it is an end tag, nor does it end with a slash in the case of an XML type tag. To get the original text of the tag name, use getRawTagName(). The conversion to uppercase is performed with an ENGLISH locale.

Returns: The tag name.
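
The normalization described in the note can be approximated in plain Java. This is a minimal sketch of the stated rules (drop a leading "/", drop a trailing slash, uppercase with the ENGLISH locale), not the library's implementation; the class and method names are hypothetical.

```java
import java.util.Locale;

// Illustration only; not the library's implementation of getTagName().
class TagNameSketch {
    // Applies the three rules from the note to a raw tag name.
    static String normalize(String rawName) {
        if (rawName == null) {
            return null;
        }
        String name = rawName;
        if (name.startsWith("/")) {
            name = name.substring(1);                     // end tag marker
        }
        if (name.endsWith("/")) {
            name = name.substring(0, name.length() - 1);  // XML empty-tag slash
        }
        // A fixed locale keeps the result independent of the JVM default.
        return name.toUpperCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        System.out.println(normalize("/html"));  // prints HTML
        System.out.println(normalize("br/"));    // prints BR
        System.out.println(normalize("title"));  // prints TITLE
    }
}
```

The fixed locale matters because String.toUpperCase() without an argument uses the default locale. Under a Turkish default locale, "title" would become "TİTLE" with a dotted capital I, which would no longer match "TITLE".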

getText

public String getText()
Return the text contained in this tag.

Returns: The complete contents of the tag (within the angle brackets).

getThisScanner

public Scanner getThisScanner()
Return the scanner associated with this tag.

Returns: The scanner associated with this tag.

isEmptyXmlTag

public boolean isEmptyXmlTag()
Is this an empty xml tag of the form <tag/>.

Returns: true if the last character of the last attribute is a '/'.
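
The stated return condition can be sketched as follows. Representing the attribute list as plain strings is an assumption made for illustration, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only; attributes are modeled as their raw text.
class EmptyXmlSketch {
    // True if the last character of the last attribute text is '/'.
    static boolean isEmptyXml(List<String> attributeTexts) {
        if (attributeTexts.isEmpty()) {
            return false;
        }
        String last = attributeTexts.get(attributeTexts.size() - 1);
        return !last.isEmpty() && last.charAt(last.length() - 1) == '/';
    }

    public static void main(String[] args) {
        System.out.println(isEmptyXml(Arrays.asList("br", "/")));                      // prints true
        System.out.println(isEmptyXml(Arrays.asList("img", " ", "src=\"a.png\"")));    // prints false
        System.out.println(isEmptyXml(Collections.<String>emptyList()));               // prints false
    }
}
```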

isEndTag

public boolean isEndTag()
Predicate to determine if this tag is an end tag (e.g. </HTML>).

Returns: true if this tag is an end tag.

removeAttribute

public void removeAttribute(String key)
Remove the attribute with the given key, if it exists.

Parameters: key The name of the attribute.

setAttribute

public void setAttribute(String key, String value)
Set attribute with given key, value pair. Figures out a quote character to use if necessary.

Parameters:
key - The name of the attribute.
value - The value of the attribute.

setAttribute

public void setAttribute(String key, String value, char quote)
Set attribute with given key, value pair where the value is quoted by quote.

Parameters:
key - The name of the attribute.
value - The value of the attribute.
quote - The quote character to be used around value. If zero, it is an unquoted value.
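
The role of the quote parameter, including the zero case, can be shown with a small rendering sketch. The render method is hypothetical and only shows how a quote character of zero could translate to an unquoted value; it is not how the library serializes attributes.

```java
// Illustration only; render() is a hypothetical helper, not a library method.
class AttributeQuoteSketch {
    // Builds key=value, wrapping the value in quote unless quote is zero.
    static String render(String key, String value, char quote) {
        StringBuilder text = new StringBuilder(key).append('=');
        if (quote != 0) {
            text.append(quote);
        }
        text.append(value);
        if (quote != 0) {
            text.append(quote);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("href", "a.html", '"'));    // prints href="a.html"
        System.out.println(render("alt", "my dog", '\''));    // prints alt='my dog'
        System.out.println(render("width", "100", (char) 0)); // prints width=100
    }
}
```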

setAttribute

public void setAttribute(Attribute attribute)
Set an attribute. This replaces an attribute of the same name. To set the zeroth attribute (the tag name), use setTagName().

Parameters: attribute The attribute to set.

setAttributeEx

public void setAttributeEx(Attribute attribute)
Set an attribute.

Parameters: attribute The attribute to set.

See Also: setAttribute

setAttributesEx

public void setAttributesEx(Vector attribs)
Sets the attributes. NOTE: Values of the extended hashtable are two element arrays of String, with the first element being the original name (not uppercased), and the second element being the value.

Parameters: attribs The attribute collection to set.

setEmptyXmlTag

public void setEmptyXmlTag(boolean emptyXmlTag)
Set this tag to be an empty xml node, or not. Adds or removes an ending slash on the tag.

Parameters: emptyXmlTag If true, ensures there is an ending slash in the node, i.e. <tag/>, otherwise removes it.

setEndTag

public void setEndTag(Tag end)
Set the end tag for this (composite) tag. For a non-composite tag this is a no-op.

Parameters: end The tag that terminates this composite tag, e.g. </HTML>.

setTagBegin

public void setTagBegin(int tagBegin)
Sets the nodeBegin.

Parameters: tagBegin The nodeBegin to set.

setTagEnd

public void setTagEnd(int tagEnd)
Sets the nodeEnd.

Parameters: tagEnd The nodeEnd to set.

setTagName

public void setTagName(String name)
Set the name of this tag. This creates or replaces the first attribute of the tag (the zeroth element of the attribute vector).

Parameters: name The tag name.

setText

public void setText(String text)
Parses the given text to create the tag contents.

Parameters: text A string of the form <TAGNAME xx="yy">.

setThisScanner

public void setThisScanner(Scanner scanner)
Set the scanner associated with this tag.

Parameters: scanner The scanner for this tag.

toHtml

public String toHtml(boolean verbatim)
Render the tag as HTML. A call to a tag's toHtml() method will render it in HTML.

Parameters: verbatim If true return as close to the original page text as possible.

Returns: The tag as an HTML fragment.

See Also: toHtml

toPlainTextString

public String toPlainTextString()
Get the plain text from this node.

Returns: An empty string (tag contents do not display in a browser). If you want this tag's HTML equivalent, use toHtml().

toString

public String toString()
Print the contents of the tag.

Returns: A string describing the tag. For text that looks like HTML, use toHtml().

HTML Parser is an open source library released under LGPL. SourceForge.net