org.htmlparser.tags
public class CompositeTag extends TagNode
Field Summary

Modifier and Type | Field and Description
---|---
protected static CompositeTagScanner | mDefaultCompositeScanner: The default scanner for composite tags.
protected Tag | mEndTag: The tag that causes this tag to finish.
Constructor Summary

Constructor | Description
---|---
CompositeTag() | Create a composite tag.
Method Summary

Modifier and Type | Method and Description
---|---
void | accept(NodeVisitor visitor): Tag visiting code.
Node | childAt(int index): Get the child at the given index.
SimpleNodeIterator | children(): Get an iterator over the children of this node.
void | collectInto(NodeList list, NodeFilter filter): Collect this node and its child nodes (if applicable) into the list parameter, provided the node satisfies the filtering criteria.
Text[] | digupStringNode(String searchText): Finds the text nodes containing the given text, however deeply embedded they are, and returns them.
SimpleNodeIterator | elements(): Return the child tags as an iterator.
int | findPositionOf(String text): Returns the node number of the first node containing the given text.
int | findPositionOf(String text, Locale locale): Returns the node number of the first node containing the given text.
int | findPositionOf(Node searchNode): Returns the node number of a child node given the node object.
Node | getChild(int index): Get the child of this node at the given position.
int | getChildCount(): Return the number of child nodes in this tag.
Node[] | getChildrenAsNodeArray(): Get the children as an array of Node objects.
String | getChildrenHTML(): Return the HTML code for the children of this tag.
Tag | getEndTag(): Get the end tag for this tag.
String | getStringText(): Return the text between the start tag and the end tag.
String | getText(): Return the text contained in this tag.
protected void | putChildrenInto(StringBuffer sb, boolean verbatim): Add the textual contents of the children of this node to the buffer.
protected void | putEndTagInto(StringBuffer sb, boolean verbatim): Add the textual contents of the end tag of this node to the buffer.
void | removeChild(int i): Remove the child at the given position.
Tag | searchByName(String name): Searches all children for a tag whose name attribute matches the given name.
NodeList | searchFor(String searchString): Searches for all nodes whose text representation contains the search string.
NodeList | searchFor(String searchString, boolean caseSensitive): Searches for all nodes whose text representation contains the search string.
NodeList | searchFor(String searchString, boolean caseSensitive, Locale locale): Searches for all nodes whose text representation contains the search string.
NodeList | searchFor(Class classType, boolean recursive): Collect all objects that are of a certain type. Note that this does not check for parent types; child tags are searched only when recursive is true.
void | setEndTag(Tag tag): Set the end tag for this tag.
String | toHtml(boolean verbatim): Return this tag as HTML code.
String | toPlainTextString(): Return the textual contents of this tag and its children.
String | toString(): Return a string representation of the contents of this tag, its children and its end tag, suitable for debugging.
void | toString(int level, StringBuffer buffer): Append a string representation of the contents of this tag, its children and its end tag, suitable for debugging, to the buffer.
accept(NodeVisitor visitor)
Tag visiting code. Invokes accept() on the start tag and then walks the child list invoking accept() on each of the children, finishing up with an accept() call on the end tag. If shouldRecurseSelf() returns true, it then asks the visitor to visit itself.
Parameters: visitor The NodeVisitor object to be signalled for each child and possibly this tag.
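The visiting order described above can be modelled with a small self-contained sketch. Everything in it is hypothetical: `ToyTag`, `ToyText` and `RecordingVisitor` are illustration-only stand-ins for the HTML Parser node and visitor types, and the visitor simply logs each call. The order follows the paragraph above literally (start tag, children, end tag, then the tag itself when `shouldRecurseSelf()` is true); the library source remains the authority on the exact sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the visiting order described for CompositeTag.accept().
// ToyNode, ToyText, ToyTag and RecordingVisitor are hypothetical stand-ins,
// not HTML Parser classes; the visitor just records the calls it receives.
class AcceptOrderSketch {

    interface ToyNode {
        void accept(RecordingVisitor visitor);
    }

    static class RecordingVisitor {
        final List<String> log = new ArrayList<>();
        private final boolean recurseSelf;

        RecordingVisitor(boolean recurseSelf) {
            this.recurseSelf = recurseSelf;
        }

        boolean shouldRecurseSelf() {
            return recurseSelf;
        }
    }

    static class ToyText implements ToyNode {
        private final String text;

        ToyText(String text) {
            this.text = text;
        }

        public void accept(RecordingVisitor visitor) {
            visitor.log.add("text:" + text);
        }
    }

    static class ToyTag implements ToyNode {
        private final String name;
        private final List<ToyNode> children;

        ToyTag(String name, ToyNode... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }

        public void accept(RecordingVisitor visitor) {
            visitor.log.add("start:" + name);      // 1. the start tag
            for (ToyNode child : children) {       // 2. each child, in order
                child.accept(visitor);
            }
            visitor.log.add("end:" + name);        // 3. the end tag
            if (visitor.shouldRecurseSelf()) {     // 4. optionally the tag itself
                visitor.log.add("self:" + name);
            }
        }
    }

    // Runs a fresh visitor over the tree and returns the recorded call order.
    static List<String> visit(ToyNode root, boolean recurseSelf) {
        RecordingVisitor visitor = new RecordingVisitor(recurseSelf);
        root.accept(visitor);
        return visitor.log;
    }

    public static void main(String[] args) {
        ToyTag form = new ToyTag("FORM", new ToyText("hi"),
                new ToyTag("B", new ToyText("x")));
        System.out.println(visit(form, true));
    }
}
```

For the FORM example with recurseSelf set to true, the recorded order is start:FORM, text:hi, start:B, text:x, end:B, self:B, end:FORM, self:FORM.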
childAt(int index)
Parameters: index The index into the child node list.
Returns: Node The child node at the given index, or null if there is none.
children()
Returns: An iterator over the children of this node.
collectInto(NodeList list, NodeFilter filter)
This mechanism makes it easy to write powerful filtering code without having to collect embedded tags separately. For example, when we try to get all the links on a page, we cannot get them all at the top level, because many tags (like form tags) can contain links embedded in them. We could get the links out by checking whether the current node is a CompositeTag and going through its children; this method provides a convenient way to do that.
Using collectInto(), programs get a lot shorter. The code to extract all links from a page would look like:

```java
NodeList list = new NodeList();
NodeFilter filter = new TagNameFilter("A");
for (NodeIterator e = parser.elements(); e.hasMoreNodes();)
    e.nextNode().collectInto(list, filter);
```

Thus, list will hold all the link nodes, irrespective of how deeply the links are embedded.
Another way to accomplish the same objective is:

```java
NodeList list = new NodeList();
NodeFilter filter = new NodeClassFilter(LinkTag.class);
for (NodeIterator e = parser.elements(); e.hasMoreNodes();)
    e.nextNode().collectInto(list, filter);
```

This is slightly less specific, because the LinkTag class may be registered for more than one node name, e.g. <LINK> tags too.
Parameters: list The list to add nodes to. filter The filter to apply.
See Also: org.htmlparser.filters
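To show why one recursive call reaches links at any depth, here is a toy model of the collectInto() pattern. It is a sketch under stated assumptions, not the library's code: `ToyNode` is a hypothetical node class, a `java.util.function.Predicate` stands in for `NodeFilter`, and the `label` field exists only so the result is easy to read. In this sketch a node is added before its descendants.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Toy model of collectInto(list, filter): each node adds itself to the list if
// it passes the filter, then asks its children to do the same. ToyNode is
// hypothetical, and java.util.function.Predicate stands in for NodeFilter.
class CollectIntoSketch {

    static class ToyNode {
        final String name;     // tag name, e.g. "A" or "FORM"
        final String label;    // illustrative payload, e.g. a link's target
        final List<ToyNode> children;

        ToyNode(String name, String label, ToyNode... children) {
            this.name = name;
            this.label = label;
            this.children = Arrays.asList(children);
        }

        void collectInto(List<ToyNode> list, Predicate<ToyNode> filter) {
            if (filter.test(this)) {
                list.add(this);                  // this node first...
            }
            for (ToyNode child : children) {
                child.collectInto(list, filter); // ...then everything below it
            }
        }
    }

    // Mirrors the loop in the text: call collectInto on every top-level node.
    static List<String> linkLabels(List<ToyNode> topLevel) {
        List<ToyNode> list = new ArrayList<>();
        Predicate<ToyNode> isLink = node -> node.name.equalsIgnoreCase("A");
        for (ToyNode node : topLevel) {
            node.collectInto(list, isLink);
        }
        List<String> labels = new ArrayList<>();
        for (ToyNode link : list) {
            labels.add(link.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<ToyNode> page = Arrays.asList(
                new ToyNode("A", "home"),
                new ToyNode("FORM", null,
                        new ToyNode("TABLE", null,
                                new ToyNode("A", "deep"))),
                new ToyNode("P", null));
        System.out.println(linkLabels(page));
    }
}
```

On the sample page in main, linkLabels yields [home, deep]: the second link is found even though it sits inside FORM and TABLE.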
digupStringNode(String searchText)
Parameters: searchText The text to search for.
Returns: The list of text nodes (recursively) found.
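A minimal sketch of the digupStringNode() idea: descend through every nested tag and keep the text nodes that contain the search text. The classes are hypothetical, the result is an array of strings rather than Text[], and the case-sensitive contains() test is an assumption of the sketch rather than a statement about the library's matching rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of digupStringNode(searchText): walk down through nested tags and
// return every text node whose text contains searchText. ToyNode, ToyText and
// ToyTag are hypothetical; the case-sensitive contains() test is an assumption.
class DigupSketch {

    interface ToyNode { }

    static class ToyText implements ToyNode {
        final String text;
        ToyText(String text) { this.text = text; }
    }

    static class ToyTag implements ToyNode {
        final List<ToyNode> children;
        ToyTag(ToyNode... children) { this.children = Arrays.asList(children); }
    }

    static String[] digupStringNode(ToyTag tag, String searchText) {
        List<String> found = new ArrayList<>();
        collect(tag, searchText, found);
        return found.toArray(new String[0]);   // an array, like the Text[] result
    }

    private static void collect(ToyTag tag, String searchText, List<String> found) {
        for (ToyNode child : tag.children) {
            if (child instanceof ToyText) {
                String text = ((ToyText) child).text;
                if (text.contains(searchText)) {
                    found.add(text);
                }
            } else if (child instanceof ToyTag) {
                collect((ToyTag) child, searchText, found);  // however deep it is
            }
        }
    }

    public static void main(String[] args) {
        ToyTag table = new ToyTag(
                new ToyTag(new ToyTag(new ToyText("Price: 10"))),
                new ToyText("Price list"),
                new ToyText("Other"));
        System.out.println(Arrays.toString(digupStringNode(table, "Price")));
    }
}
```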
elements()
Returns: An iterator over the children.
findPositionOf(String text)
Parameters: text The text to search for.
Returns: int The node index in the children list of the node containing the text, or -1 if not found.
See Also: CompositeTag
findPositionOf(String text, Locale locale)
Parameters: text The text to search for. locale The locale to use in converting to uppercase.
Returns: int The node index in the children list of the node containing the text, or -1 if not found.
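The locale parameter matters because upper-casing is locale-dependent. This sketch uses hypothetical names and reduces each child to its text. It upper-cases both sides with the given locale and returns the index of the first match, or -1, as the Returns line above describes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy model of findPositionOf(text, locale): each child is represented only by
// its text, and both the child text and the search text are upper-cased with
// the given locale before a substring test. Hypothetical names, not the API.
class FindPositionSketch {

    static int findPositionOf(List<String> childTexts, String text, Locale locale) {
        String needle = text.toUpperCase(locale);
        for (int i = 0; i < childTexts.size(); i++) {
            if (childTexts.get(i).toUpperCase(locale).contains(needle)) {
                return i;   // index of the first child containing the text
            }
        }
        return -1;          // no child contains the text
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList("Welcome", "Edit the title", "Footer");
        // English: "title" upper-cases to "TITLE", so child 1 matches.
        System.out.println(findPositionOf(children, "TITLE", Locale.ENGLISH));
        // Turkish: "i" upper-cases to a dotted capital I, so "TITLE" is not found.
        System.out.println(findPositionOf(children, "TITLE", Locale.forLanguageTag("tr")));
    }
}
```

The second call shows the effect: under Turkish rules the lowercase i in "title" upper-cases to İ (U+0130). As a result, a search that finds child 1 in English finds nothing here.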
findPositionOf(Node searchNode)
Parameters: searchNode The child node to find.
Returns: The offset of the child tag, or -1 if it was not found.
getChild(int index)
Parameters: index The index of the child in the node list.
Returns: The child at that index.
getChildCount()
Returns: The child node count.
getChildrenAsNodeArray()
Get the children as an array of Node objects.
Returns: The children in an array.
getChildrenHTML()
Returns: A string with the HTML code for the contents of this tag.
getStringText()
Returns: The contents of the CompositeTag.
getText()
Returns: The complete contents of the tag (within the angle brackets).
putChildrenInto(StringBuffer sb, boolean verbatim)
Parameters: sb The buffer to append to. verbatim If true, produce output as close to the original page text as possible.
putEndTagInto(StringBuffer sb, boolean verbatim)
Parameters: sb The buffer to append to. verbatim If true, produce output as close to the original page text as possible.
removeChild(int i)
Parameters: i The index of the child to remove.
searchByName(String name)
Parameters: name The value of the name attribute to match.
Returns: Tag The tag whose name attribute matches.
searchFor(String searchString)
Example:

```java
NodeList nodeList = formTag.searchFor("Hello World");
```

Parameters: searchString Search criterion.
Returns: A collection of nodes whose string contents or representation contain the searchString.
searchFor(String searchString, boolean caseSensitive)
Example:

```java
NodeList nodeList = formTag.searchFor("Hello World", true);
```

Parameters: searchString Search criterion. caseSensitive If true, this search is case sensitive. Otherwise, the search string and the node text are converted to uppercase using an English locale.
Returns: A collection of nodes whose string contents or representation contain the searchString.
searchFor(String searchString, boolean caseSensitive, Locale locale)
Example:

```java
NodeList nodeList = formTag.searchFor("Hello World", false, Locale.ENGLISH);
```

Parameters: searchString Search criterion. caseSensitive If true, this search is case sensitive. Otherwise, the search string and the node text are converted to uppercase using the locale provided. locale The locale for uppercase conversion.
Returns: A collection of nodes whose string contents or representation contain the searchString.
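The three searchFor(String ...) variants differ only in how text is compared. The sketch below models the three-argument form over a list of child texts. Its names are hypothetical, and it is not the library implementation. Per the parameter descriptions above, the two-argument form with caseSensitive false corresponds to passing an English locale.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy model of searchFor(searchString, caseSensitive, locale) over a list of
// child texts. When the search is case-insensitive, both sides are upper-cased
// with the locale before the substring test. Hypothetical names, not the API.
class SearchForSketch {

    static List<String> searchFor(List<String> childTexts, String searchString,
                                  boolean caseSensitive, Locale locale) {
        String needle = caseSensitive ? searchString : searchString.toUpperCase(locale);
        List<String> matches = new ArrayList<>();
        for (String text : childTexts) {
            String haystack = caseSensitive ? text : text.toUpperCase(locale);
            if (haystack.contains(needle)) {
                matches.add(text);   // keep the child's original text
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList("Hello World", "say hello world!", "Goodbye");
        System.out.println(searchFor(children, "Hello World", true, Locale.ENGLISH));
        System.out.println(searchFor(children, "Hello World", false, Locale.ENGLISH));
    }
}
```

The case-sensitive call returns only "Hello World". The case-insensitive call also returns "say hello world!".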
searchFor(Class classType, boolean recursive)
Parameters: classType The class to search for. recursive If true, recursively search through the children.
Returns: A list of children found.
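A toy model of searchFor(Class, boolean) showing what the recursive flag changes. The node classes are hypothetical. This sketch matches on the exact runtime class, so a search for a parent type does not match its subclasses. That is its own reading of the note that parent types are not checked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of searchFor(classType, recursive). The node classes are
// hypothetical. Matching on the exact runtime class is an assumption based on
// the note that parent types are not checked.
class SearchByClassSketch {

    static class ToyNode {
        final List<ToyNode> children;
        ToyNode(ToyNode... children) { this.children = Arrays.asList(children); }
    }

    static class ToyForm extends ToyNode { ToyForm(ToyNode... c) { super(c); } }
    static class ToyDiv extends ToyNode { ToyDiv(ToyNode... c) { super(c); } }
    static class ToyInput extends ToyNode { ToyInput() { super(); } }

    static List<ToyNode> searchFor(ToyNode parent, Class<?> classType, boolean recursive) {
        List<ToyNode> found = new ArrayList<>();
        for (ToyNode child : parent.children) {
            if (child.getClass() == classType) {   // exact class, not subclasses
                found.add(child);
            }
            if (recursive) {
                found.addAll(searchFor(child, classType, true));  // descend into child tags
            }
        }
        return found;
    }

    public static void main(String[] args) {
        ToyForm form = new ToyForm(new ToyInput(),
                new ToyDiv(new ToyInput(), new ToyInput()));
        System.out.println(searchFor(form, ToyInput.class, false).size());
        System.out.println(searchFor(form, ToyInput.class, true).size());
    }
}
```

For the sample form, the non-recursive search finds 1 input (the direct child). The recursive one finds 3.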
setEndTag(Tag tag)
Parameters: tag The new end tag for this tag. Note: no checking is performed, so you can generate bad HTML by setting an end tag whose name differs from the name of the start tag.
toHtml(boolean verbatim)
Parameters: verbatim If true, return output as close to the original page text as possible.
Returns: This tag, its contents (children) and the end tag as HTML code.
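One way to picture how toHtml() relates to putChildrenInto() and putEndTagInto() is that the start tag, the children and the end tag are appended to one buffer in turn. This is a hypothetical sketch. It omits the verbatim flag and uses StringBuffer only to mirror the signatures above.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of how toHtml() can be assembled from the start tag,
// putChildrenInto() and putEndTagInto(). Hypothetical classes; the verbatim
// flag is omitted, and StringBuffer is used only to mirror the signatures above.
class ToHtmlSketch {

    interface ToyNode {
        String toHtml();
    }

    static class ToyText implements ToyNode {
        private final String text;
        ToyText(String text) { this.text = text; }
        public String toHtml() { return text; }
    }

    static class ToyTag implements ToyNode {
        private final String name;
        private final List<ToyNode> children;

        ToyTag(String name, ToyNode... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }

        public String toHtml() {
            StringBuffer sb = new StringBuffer();
            sb.append('<').append(name).append('>');   // start tag
            putChildrenInto(sb);                       // children, recursively
            putEndTagInto(sb);                         // matching end tag
            return sb.toString();
        }

        void putChildrenInto(StringBuffer sb) {
            for (ToyNode child : children) {
                sb.append(child.toHtml());
            }
        }

        void putEndTagInto(StringBuffer sb) {
            sb.append("</").append(name).append('>');
        }
    }

    public static void main(String[] args) {
        ToyTag p = new ToyTag("P", new ToyText("Hi "), new ToyTag("B", new ToyText("there")));
        System.out.println(p.toHtml());
    }
}
```

The sample P tag renders as <P>Hi <B>there</B></P>.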
toPlainTextString()
Returns: The 'browser' text contents of this tag.
toString()
Returns: A textual representation of the tag.
toString(int level, StringBuffer buffer)
Parameters: level The indentation level to use. buffer The buffer to append to.
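Here is a sketch of the indented debugging dump produced through toString(int level, StringBuffer buffer). Each node writes a line at its own depth and passes level + 1 to its children, and the no-argument toString() starts at level 0. The classes, the two-space indent and the Tag/Txt/End labels are assumptions, not the library's actual output format.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of toString(level, buffer): each node appends one line indented by
// its depth, a tag then appends its children one level deeper, and finally a
// line for its end tag. The two-space indent and line labels are assumptions.
class DebugDumpSketch {

    abstract static class ToyNode {
        abstract void toString(int level, StringBuffer buffer);

        static void indent(int level, StringBuffer buffer) {
            for (int i = 0; i < level; i++) {
                buffer.append("  ");
            }
        }

        @Override
        public String toString() {
            StringBuffer buffer = new StringBuffer();
            toString(0, buffer);   // the no-argument form starts at level 0
            return buffer.toString();
        }
    }

    static class ToyText extends ToyNode {
        private final String text;
        ToyText(String text) { this.text = text; }

        void toString(int level, StringBuffer buffer) {
            indent(level, buffer);
            buffer.append("Txt: ").append(text).append('\n');
        }
    }

    static class ToyTag extends ToyNode {
        private final String name;
        private final List<ToyNode> children;

        ToyTag(String name, ToyNode... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }

        void toString(int level, StringBuffer buffer) {
            indent(level, buffer);
            buffer.append("Tag: ").append(name).append('\n');
            for (ToyNode child : children) {
                child.toString(level + 1, buffer);   // children one level deeper
            }
            indent(level, buffer);
            buffer.append("End: /").append(name).append('\n');
        }
    }

    public static void main(String[] args) {
        ToyNode form = new ToyTag("FORM", new ToyText("hi"), new ToyTag("B", new ToyText("x")));
        System.out.print(form);
    }
}
```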
HTML Parser is an open source library released under LGPL.