The basic API classes which will be used by most developers when working with the HTML Parser.
Interface Summary | |
---|---|
Node | Specifies the minimum requirements for nodes returned by the Lexer or Parser. |
NodeFactory | This interface defines the methods needed to create new nodes. |
NodeFilter | Implement this interface to select particular nodes. |
Remark | This interface represents a comment in the HTML document. |
Tag | This interface represents a tag (<xxx yyy="zzz">) in the HTML document. |
Text | This interface represents a piece of the content of the HTML document. |
Class Summary | |
---|---|
Attribute | An attribute within a tag. |
Parser | The main parser class. |
PrototypicalNodeFactory | A node factory based on the prototype pattern. |
The {@link org.htmlparser.Parser} class is the main high-level class that provides simplified access to the contents of an HTML page. A wide range of methods is available to customize the operation of the Parser, as well as to access specific pieces of the page as {@link org.htmlparser.Node Nodes}.
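As a rough illustration of that high-level access, the sketch below parses an in-memory string and collects the text of every anchor tag. It assumes the HTML Parser jar is on the classpath; the class name `ParserSketch`, the method `linkTexts` and the sample markup are hypothetical, and the library calls used (`Parser.createParser`, `extractAllNodesThatMatch`, `TagNameFilter`, `NodeList`) should be checked against the Javadoc of the release in use.

```java
import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the library.
class ParserSketch {
    // Returns the plain text inside every <a> element of the given HTML.
    static List<String> linkTexts(String html) throws ParserException {
        // Build a Parser over a string instead of a URL or file.
        Parser parser = Parser.createParser(html, "UTF-8");
        // Walk the whole page and keep only the nodes accepted by the filter.
        NodeList links = parser.extractAllNodesThatMatch(new TagNameFilter("a"));
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < links.size(); i++) {
            Node node = links.elementAt(i);
            texts.add(node.toPlainTextString());
        }
        return texts;
    }
}
```

A call such as `ParserSketch.linkTexts(pageSource)` would then yield one string per link on the page.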
The {@link org.htmlparser.NodeFactory} interface specifies the requirements for a developer to have the Parser or Lexer generate nodes. Three types of nodes are required: {@link org.htmlparser.Text}, {@link org.htmlparser.Remark} and {@link org.htmlparser.Tag Tags}. Tags contain lists of child nodes and {@link org.htmlparser.Attribute attributes}.
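To make the three node types concrete, the following sketch runs a Lexer over a small fragment and labels each node it returns, then reads one attribute from a tag. It is only a sketch under stated assumptions: `NodeKindsSketch`, `kinds` and `firstTagAttribute` are hypothetical names, and it relies on the `Lexer(String)` constructor, `nextNode()`, `Tag.isEndTag()`, `Tag.getTagName()` and `Tag.getAttribute(String)` behaving as in the releases I am familiar with.

```java
import org.htmlparser.Node;
import org.htmlparser.Remark;
import org.htmlparser.Tag;
import org.htmlparser.Text;
import org.htmlparser.lexer.Lexer;
import org.htmlparser.util.ParserException;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the library.
class NodeKindsSketch {
    // Labels each node the Lexer produces as a tag, end tag, remark or text.
    static List<String> kinds(String html) throws ParserException {
        Lexer lexer = new Lexer(html);
        List<String> kinds = new ArrayList<>();
        Node node;
        // nextNode() returns null once the input is exhausted.
        while ((node = lexer.nextNode()) != null) {
            if (node instanceof Tag) {
                Tag tag = (Tag) node;
                kinds.add((tag.isEndTag() ? "END_TAG:" : "TAG:") + tag.getTagName());
            } else if (node instanceof Remark) {
                kinds.add("REMARK");
            } else if (node instanceof Text) {
                kinds.add("TEXT");
            }
        }
        return kinds;
    }

    // Returns the named attribute of the first node if it is a tag, else null.
    static String firstTagAttribute(String html, String name) throws ParserException {
        Node node = new Lexer(html).nextNode();
        return (node instanceof Tag) ? ((Tag) node).getAttribute(name) : null;
    }
}
```

Because the Lexer returns a flat stream, start and end tags appear as separate nodes here; it is the Parser that nests child nodes inside their enclosing tags.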
The only provided implementation of the NodeFactory interface is the {@link org.htmlparser.PrototypicalNodeFactory}, which operates by holding example nodes and cloning them as needed to satisfy the Parser's requests for nodes. By default, a Lexer is its own NodeFactory, returning new {@link org.htmlparser.nodes.TextNode}, {@link org.htmlparser.nodes.RemarkNode} and undifferentiated {@link org.htmlparser.nodes.TagNode TagNodes} (see the {@link org.htmlparser.nodes nodes} package). When the Parser uses a Lexer, however, it replaces this behaviour with a PrototypicalNodeFactory in order to return a rich set of specific tags (see the {@link org.htmlparser.tags tags} package).
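The effect of the node factory can be observed by parsing the same markup with two differently configured factories. The sketch below is hypothetical (`FactorySketch` and `firstAnchorClass` are made-up names) and assumes that `PrototypicalNodeFactory(boolean)` creates a factory with no registered tag prototypes when passed `true`, and that `Parser.setNodeFactory` installs the factory before parsing begins.

```java
import org.htmlparser.NodeFactory;
import org.htmlparser.Parser;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Hypothetical helper class, not part of the library.
class FactorySketch {
    // Parses the HTML with the given factory and reports the class of the first <a> node.
    static String firstAnchorClass(String html, NodeFactory factory) throws ParserException {
        Parser parser = Parser.createParser(html, "UTF-8");
        // Replace the factory the Parser installed by default.
        parser.setNodeFactory(factory);
        NodeList anchors = parser.extractAllNodesThatMatch(new TagNameFilter("a"));
        return anchors.elementAt(0).getClass().getName();
    }
}
```

Passing `new PrototypicalNodeFactory()` should report a specific tag class from the tags package, while `new PrototypicalNodeFactory(true)` should fall back to the generic tag node, since no prototype is registered for the anchor tag.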
The {@link org.htmlparser.NodeFilter} interface is used by the filtering code to determine whether a node meets certain criteria. Some generic examples of filters can be found in the {@link org.htmlparser.filters filters} package.
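A custom filter only needs to implement the single `accept` method. The sketch below is one possible filter, not one shipped with the library: `AbsoluteLinkFilter` and `FilterSketch` are hypothetical names, and the rule it applies (an `href` attribute starting with `http`) is chosen purely for illustration.

```java
import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.Tag;
import org.htmlparser.util.ParserException;

// Hypothetical filter: accepts start tags whose href attribute begins with "http".
class AbsoluteLinkFilter implements NodeFilter {
    private static final long serialVersionUID = 1L;

    public boolean accept(Node node) {
        if (!(node instanceof Tag)) {
            return false; // text and remark nodes never match
        }
        Tag tag = (Tag) node;
        if (tag.isEndTag()) {
            return false;
        }
        String href = tag.getAttribute("href");
        return href != null && href.startsWith("http");
    }
}

// Hypothetical helper showing the filter in use.
class FilterSketch {
    // Counts the nodes in the page that the custom filter accepts.
    static int countAbsoluteLinks(String html) throws ParserException {
        Parser parser = Parser.createParser(html, "UTF-8");
        return parser.extractAllNodesThatMatch(new AbsoluteLinkFilter()).size();
    }
}
```

The same filter object can be handed to any library method that takes a NodeFilter, or combined with the stock filters in the filters package.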
HTML Parser is an open source library released under LGPL.