org.htmlparser

Interface NodeFactory

public interface NodeFactory

This interface defines the methods needed to create new nodes.

The factory is used during lexing to generate the nodes passed back to the caller. By implementing this interface and setting that concrete object as the node factory for the lexer (org.htmlparser.lexer.Lexer.setNodeFactory), perhaps via the parser (Parser.setNodeFactory), the way that nodes are generated can be customized.

In general, replacing the factory with a custom one is not required, because PrototypicalNodeFactory is already flexible.

Creation of Text and Remark nodes is straightforward, because they are essentially just sequences of characters extracted from the page. Creation of a Tag node additionally requires that the attributes from the tag be remembered.
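As an illustration of customizing node generation, the following is a minimal sketch of a factory that counts the nodes it creates and delegates the actual creation to another factory. To keep the sketch compilable on its own, the nested Page, Text, Remark, Tag and NodeFactory types are simplified stand-ins, not the real org.htmlparser types; the names CountingFactoryDemo, SimpleFactory and CountingNodeFactory are hypothetical, the attributes are modelled as plain strings, and the ParserException clause is left out.

```java
import java.util.Vector;

class CountingFactoryDemo {
    // Simplified stand-ins (assumptions, NOT the real org.htmlparser types).
    interface Page { String getText(int start, int end); }
    interface Text { String getText(); }
    interface Remark { String getText(); }
    interface Tag { String getText(); }

    // Mirrors the three documented methods; "throws ParserException" is omitted.
    interface NodeFactory {
        Text createStringNode(Page page, int start, int end);
        Remark createRemarkNode(Page page, int start, int end);
        Tag createTagNode(Page page, int start, int end, Vector<String> attributes);
    }

    // Baseline factory: every node simply exposes its characters from the page.
    static class SimpleFactory implements NodeFactory {
        public Text createStringNode(Page page, int start, int end) {
            String s = page.getText(start, end);
            return () -> s;
        }
        public Remark createRemarkNode(Page page, int start, int end) {
            String s = page.getText(start, end);
            return () -> s;
        }
        public Tag createTagNode(Page page, int start, int end, Vector<String> attributes) {
            String s = page.getText(start, end);
            return () -> s;
        }
    }

    // Custom factory: counts each kind of node, then delegates creation.
    static class CountingNodeFactory implements NodeFactory {
        private final NodeFactory delegate;
        int texts, remarks, tags;

        CountingNodeFactory(NodeFactory delegate) { this.delegate = delegate; }

        public Text createStringNode(Page page, int start, int end) {
            texts++;
            return delegate.createStringNode(page, start, end);
        }
        public Remark createRemarkNode(Page page, int start, int end) {
            remarks++;
            return delegate.createRemarkNode(page, start, end);
        }
        public Tag createTagNode(Page page, int start, int end, Vector<String> attributes) {
            tags++;
            return delegate.createTagNode(page, start, end, attributes);
        }
    }

    public static void main(String[] args) {
        String source = "<p>hi</p><!--x-->";
        Page page = (start, end) -> source.substring(start, end);
        CountingNodeFactory factory = new CountingNodeFactory(new SimpleFactory());

        Vector<String> attributes = new Vector<>();
        attributes.add("p"); // tag name at position zero

        factory.createTagNode(page, 0, 3, attributes);   // "<p>"
        factory.createStringNode(page, 3, 5);            // "hi"
        factory.createRemarkNode(page, 9, 17);           // "<!--x-->"
        System.out.println(factory.tags + " tag, " + factory.texts + " text, "
                + factory.remarks + " remark"); // prints "1 tag, 1 text, 1 remark"
    }
}
```

With the real library, an implementation of this interface would be installed through the lexer's or parser's setNodeFactory method, as described above.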

See Also: PrototypicalNodeFactory

Method Summary
Remark createRemarkNode(Page page, int start, int end)
Create a new remark node.
Text createStringNode(Page page, int start, int end)
Create a new text node.
Tag createTagNode(Page page, int start, int end, Vector attributes)
Create a new tag node.

Method Detail

createRemarkNode

public Remark createRemarkNode(Page page, int start, int end)
Create a new remark node.

Parameters:
page - The page the node is on.
start - The beginning position of the remark.
end - The ending position of the remark.

Returns: A remark node comprising the indicated characters from the page.

Throws: ParserException - If a problem is encountered when creating the node.

createStringNode

public Text createStringNode(Page page, int start, int end)
Create a new text node.

Parameters:
page - The page the node is on.
start - The beginning position of the string.
end - The ending position of the string.

Returns: A text node comprising the indicated characters from the page.

Throws: ParserException - If a problem is encountered when creating the node.

createTagNode

public Tag createTagNode(Page page, int start, int end, Vector attributes)
Create a new tag node. Note that the attributes vector contains at least one element, which is the tag name (standalone attribute) at position zero. This can be used to decide which type of node to create, or gate other processing that may be appropriate.

Parameters:
page - The page the node is on.
start - The beginning position of the tag.
end - The ending position of the tag.
attributes - The attributes contained in this tag.

Returns: A tag node comprising the indicated characters from the page.

Throws: ParserException - If a problem is encountered when creating the node.
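The note about the tag name sitting at position zero of the attributes vector can be sketched as follows. This is only an illustration under stated assumptions: Page, Attribute, GenericTag, LinkTag and ImageTag are hypothetical stand-ins defined inside the sketch (not the library's own classes), the Attribute stand-in models nothing but a name, and the page, start and end arguments are accepted but unused.

```java
import java.util.Locale;
import java.util.Vector;

class TagDispatchDemo {
    // Hypothetical stand-ins so the sketch compiles on its own.
    interface Page { }

    static class Attribute {
        private final String name;
        Attribute(String name) { this.name = name; }
        String getName() { return name; }
    }

    static class GenericTag {
        final String name;
        GenericTag(String name) { this.name = name; }
        String kind() { return "generic"; }
    }

    static class LinkTag extends GenericTag {
        LinkTag(String name) { super(name); }
        @Override String kind() { return "link"; }
    }

    static class ImageTag extends GenericTag {
        ImageTag(String name) { super(name); }
        @Override String kind() { return "image"; }
    }

    // Same parameter shape as the documented createTagNode.
    static GenericTag createTagNode(Page page, int start, int end,
                                    Vector<Attribute> attributes) {
        // Element zero is the tag name, per the method description above.
        String name = attributes.elementAt(0).getName().toUpperCase(Locale.ROOT);
        switch (name) {
            case "A":   return new LinkTag(name);
            case "IMG": return new ImageTag(name);
            default:    return new GenericTag(name);
        }
    }

    public static void main(String[] args) {
        Vector<Attribute> attributes = new Vector<>();
        attributes.add(new Attribute("a"));    // tag name at position zero
        attributes.add(new Attribute("href")); // an ordinary attribute
        GenericTag tag = createTagNode(null, 0, 0, attributes);
        System.out.println(tag.name + " -> " + tag.kind()); // prints "A -> link"
    }
}
```

Upper-casing the name before the comparison is a design choice of this sketch so that `<a>` and `<A>` select the same node type.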

HTML Parser is an open source library released under LGPL. SourceForge.net