org.htmlparser.nodes
public abstract class AbstractNode extends Object implements Node, Serializable
Field Summary | | |
---|---|---|
protected NodeList | children | The children of this node. |
protected Page | mPage | The page this node came from. |
protected int | nodeBegin | The beginning position of the tag in the line. |
protected int | nodeEnd | The ending position of the tag in the line. |
protected Node | parent | The parent of this node. |
Constructor Summary | |
---|---|
AbstractNode(Page page, int start, int end) | Create an abstract node with the page positions given. |
Method Summary | | |
---|---|---|
abstract void | accept(NodeVisitor visitor) | Visit this node. |
Object | clone() | Clone this object. |
void | collectInto(NodeList list, NodeFilter filter) | Collect this node and its child nodes (if applicable) into the list parameter, provided the node satisfies the filtering criteria. |
void | doSemanticAction() | Perform the meaning of this tag. |
NodeList | getChildren() | Get the children of this node. |
int | getEndPosition() | Gets the ending position of the node. |
Node | getFirstChild() | Get the first child of this node. |
Node | getLastChild() | Get the last child of this node. |
Node | getNextSibling() | Get the next sibling to this node. |
Page | getPage() | Get the page this node came from. |
Node | getParent() | Get the parent of this node. |
Node | getPreviousSibling() | Get the previous sibling to this node. |
int | getStartPosition() | Gets the starting position of the node. |
String | getText() | Returns the text of the node. |
void | setChildren(NodeList children) | Set the children of this node. |
void | setEndPosition(int position) | Sets the ending position of the node. |
void | setPage(Page page) | Set the page this node came from. |
void | setParent(Node node) | Sets the parent of this node. |
void | setStartPosition(int position) | Sets the starting position of the node. |
void | setText(String text) | Sets the string contents of the node. |
String | toHtml() | Return the HTML for this node. |
abstract String | toHtml(boolean verbatim) | Return the HTML for this node. |
abstract String | toPlainTextString() | Returns a string representation of the node. |
abstract String | toString() | Return a string representation of the node. |
Parameters: page The page this tag was read from. start The starting offset of this node within the page. end The ending offset of this node within the page.
Parameters: visitor The visitor that is visiting this node.
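The accept(NodeVisitor) contract is a visitor-style double dispatch: each concrete node decides which visitor callback to invoke. The following is a minimal sketch of that pattern only. SketchVisitor, SketchNode, SketchText, SketchTag and AcceptSketch are hypothetical stand-ins invented for illustration and are not the org.htmlparser types or their real callback names.

```java
// Hypothetical stand-ins for illustration; NOT the org.htmlparser API.
interface SketchVisitor {
    void visitText(SketchText text);
    void visitTag(SketchTag tag);
}

abstract class SketchNode {
    // Same shape as accept(NodeVisitor): each subclass picks the callback.
    abstract void accept(SketchVisitor visitor);
}

class SketchText extends SketchNode {
    final String text;
    SketchText(String text) { this.text = text; }
    @Override void accept(SketchVisitor visitor) { visitor.visitText(this); }
}

class SketchTag extends SketchNode {
    final String name;
    SketchTag(String name) { this.name = name; }
    @Override void accept(SketchVisitor visitor) { visitor.visitTag(this); }
}

class AcceptSketch {
    // Describe a node without any instanceof checks: the node dispatches for us.
    static String describe(SketchNode node) {
        StringBuilder out = new StringBuilder();
        node.accept(new SketchVisitor() {
            @Override public void visitText(SketchText t) { out.append("text:").append(t.text); }
            @Override public void visitTag(SketchTag t) { out.append("tag:").append(t.name); }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(new SketchTag("A")));
        System.out.println(describe(new SketchText("hello")));
    }
}
```

Because accept is abstract here, every concrete node type has to state explicitly how it presents itself to a visitor.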
Returns: A clone of this object.
Throws: CloneNotSupportedException This shouldn't be thrown since the Node interface extends Cloneable.
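The note on CloneNotSupportedException follows from how Object.clone() behaves: it throws only when the runtime class does not implement Cloneable. A small sketch of that behaviour, assuming a hypothetical CloneSketchNode class that stands in for a node type (it is not the library class):

```java
// Hypothetical stand-in for a node type; not the org.htmlparser class.
class CloneSketchNode implements Cloneable {
    int nodeBegin;
    int nodeEnd;

    CloneSketchNode(int begin, int end) {
        this.nodeBegin = begin;
        this.nodeEnd = end;
    }

    // Object.clone() throws CloneNotSupportedException only if the class does
    // not implement Cloneable, so the declared exception is not expected here.
    @Override
    public Object clone() throws CloneNotSupportedException {
        return super.clone();
    }
}

class CloneSketch {
    // Returns true when the clone is a distinct object with copied field values.
    static boolean copyIsIndependent() {
        try {
            CloneSketchNode a = new CloneSketchNode(3, 9);
            CloneSketchNode b = (CloneSketchNode) a.clone();
            b.nodeEnd = 12; // changing the copy must not affect the original
            return a != b && a.nodeEnd == 9 && b.nodeBegin == 3;
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException("unexpected for a Cloneable class", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(copyIsIndependent());
    }
}
```

Note that Object.clone() makes a shallow copy, so reference fields in a real node would be shared between original and clone unless the class copies them itself.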
This mechanism allows powerful filtering code to be written very easily, without handling the collection of embedded tags separately. For example, when collecting all the links on a page, they cannot all be found at the top level, because many tags (such as form tags) can contain embedded links. The links could be extracted by checking whether the current node is an org.htmlparser.tags.CompositeTag and iterating over its children; this method provides a convenient way to do that.
Using collectInto(), programs get a lot shorter. Now, the code to extract all links from a page would look like:
    NodeList collectionList = new NodeList();
    NodeFilter filter = new TagNameFilter("A");
    for (NodeIterator e = parser.elements(); e.hasMoreNodes();)
        e.nextNode().collectInto(collectionList, filter);
Thus, collectionList will hold all the link nodes, irrespective of how deep the links are embedded.
Another way to accomplish the same objective is:
    NodeList collectionList = new NodeList();
    NodeFilter filter = new TagClassFilter(LinkTag.class);
    for (NodeIterator e = parser.elements(); e.hasMoreNodes();)
        e.nextNode().collectInto(collectionList, filter);
This is slightly less specific because the LinkTag class may be registered for more than one node name, e.g. <LINK> tags too.
Parameters: list The node list to collect acceptable nodes into. filter The filter to determine which nodes are retained.
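To illustrate why collectInto() finds links regardless of nesting depth, here is a self-contained sketch of the recursive idea. It assumes a simplified model: MiniNode and CollectIntoSketch are hypothetical classes, java.util.List stands in for NodeList, and java.util.function.Predicate stands in for NodeFilter. It is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Simplified model, not org.htmlparser: a named node with optional children.
class MiniNode {
    final String name;
    final List<MiniNode> children = new ArrayList<>();

    MiniNode(String name, MiniNode... kids) {
        this.name = name;
        for (MiniNode kid : kids) {
            children.add(kid);
        }
    }

    // Add this node if the filter accepts it, then recurse into the children,
    // so matches are found however deeply they are embedded.
    void collectInto(List<MiniNode> list, Predicate<MiniNode> filter) {
        if (filter.test(this)) {
            list.add(this);
        }
        for (MiniNode child : children) {
            child.collectInto(list, filter);
        }
    }
}

class CollectIntoSketch {
    // Count nodes named "A" anywhere under root (root included).
    static int countLinks(MiniNode root) {
        List<MiniNode> found = new ArrayList<>();
        root.collectInto(found, n -> n.name.equals("A"));
        return found.size();
    }

    public static void main(String[] args) {
        MiniNode page = new MiniNode("BODY",
            new MiniNode("A"),
            new MiniNode("FORM", new MiniNode("INPUT"), new MiniNode("A")));
        System.out.println(countLinks(page)); // one top-level link, one inside the form
    }
}
```

The caller never inspects node types or walks children itself; the recursion lives in one place, which is the convenience the description above refers to.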
Throws: ParserException Not used. Provides for subclasses that may want to indicate an exceptional condition.
Returns: The list of children contained by this node, if it's been set, null otherwise.
Returns: The end position.
Returns: The first child in the list of children contained by this node, null otherwise.
Returns: The last child in the list of children contained by this node, null otherwise.
Returns: The next sibling to this node if one exists, null otherwise.
Returns: The page that supplied this node.
Returns: The parent of this node, if it's been set, null otherwise. When set, the parent is a CompositeTag.
Returns: The previous sibling to this node if one exists, null otherwise.
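The child and sibling accessors all return null when there is nothing to return. One plausible way to derive siblings is to look up this node's index in its parent's child list. The sketch below assumes exactly that; SibNode is a hypothetical class using java.util.List in place of NodeList, and the library's own lookup may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of parent/child links; not the org.htmlparser classes.
class SibNode {
    final String label;
    SibNode parent;
    final List<SibNode> children = new ArrayList<>();

    SibNode(String label) { this.label = label; }

    // Attach a child and record this node as its parent.
    void add(SibNode child) {
        child.parent = this;
        children.add(child);
    }

    SibNode getFirstChild() {
        return children.isEmpty() ? null : children.get(0);
    }

    SibNode getLastChild() {
        return children.isEmpty() ? null : children.get(children.size() - 1);
    }

    // Siblings are found via the parent's child list; no parent means no siblings.
    SibNode getNextSibling() {
        if (parent == null) return null;
        int i = parent.children.indexOf(this);
        return (i >= 0 && i + 1 < parent.children.size()) ? parent.children.get(i + 1) : null;
    }

    SibNode getPreviousSibling() {
        if (parent == null) return null;
        int i = parent.children.indexOf(this);
        return (i > 0) ? parent.children.get(i - 1) : null;
    }
}

class SiblingSketch {
    public static void main(String[] args) {
        SibNode ul = new SibNode("UL");
        SibNode first = new SibNode("LI-1");
        SibNode second = new SibNode("LI-2");
        ul.add(first);
        ul.add(second);
        System.out.println(first.getNextSibling().label);
        System.out.println(first.getPreviousSibling() == null);
    }
}
```

Callers should null-check every result, since both ends of a child list and any node without a parent yield null.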
Returns: The start position.
Returns: The text of this node. The default is null.
Parameters: children The new list of children this node contains.
Parameters: position The new end position.
Parameters: page The page that supplied this node.
Parameters: node The node that contains this node. Must be a CompositeTag.
Parameters: position The new start position.
Parameters: text The new text for the node.
Returns: The sequence of characters that would cause this node to be returned by the parser or lexer.
Parameters: verbatim If true return as close to the original page text as possible.
Returns: The (exact) sequence of characters that would cause this node to be returned by the parser or lexer.
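The two toHtml overloads can be read as a small template: a concrete no-argument form and an abstract form that subclasses implement. The sketch below assumes the no-argument form delegates with verbatim set to false; that delegation, the HtmlSketchNode and SketchTagNode classes, and the way the non-verbatim text is regenerated are all assumptions made for illustration, not confirmed library behaviour.

```java
// Hypothetical base class mirroring the two toHtml overloads.
abstract class HtmlSketchNode {
    // Assumption: the no-argument form asks for non-verbatim output.
    String toHtml() {
        return toHtml(false);
    }

    abstract String toHtml(boolean verbatim);
}

// Hypothetical tag that keeps its source text but can also regenerate it.
class SketchTagNode extends HtmlSketchNode {
    private final String name;
    private final String sourceText;

    SketchTagNode(String name, String sourceText) {
        this.name = name;
        this.sourceText = sourceText;
    }

    @Override
    String toHtml(boolean verbatim) {
        // verbatim: echo the page text as read; otherwise rebuild from the tag name.
        return verbatim ? sourceText : "<" + name + ">";
    }
}

class ToHtmlSketch {
    public static void main(String[] args) {
        HtmlSketchNode br = new SketchTagNode("BR", "<br   >");
        System.out.println(br.toHtml(true));
        System.out.println(br.toHtml());
    }
}
```

Keeping the no-argument form in the base class means subclasses implement only one method while callers still get a convenient default.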
Typical application code:
    Node node;
    for (NodeIterator e = parser.elements(); e.hasMoreNodes(); ) {
        node = e.nextNode();
        System.out.println(node.toPlainTextString());
        // or do whatever processing you wish with the plain text string
    }
Returns: The 'browser' content of this node.
Typically used as: System.out.println(node)
Returns: A textual representation of the node suitable for debugging
HTML Parser is an open source library released under LGPL.