org.htmlparser.parserapplications

Class StringExtractor

public class StringExtractor extends Object

Extracts plain-text strings from a web page. This illustrative program gathers the textual contents of a page, using a StringBean (org.htmlparser.beans.StringBean) to accumulate the user-visible text (what a browser would display) into a single string.
Constructor Summary
StringExtractor(String resource)
Construct a StringExtractor to read from the given resource.
Method Summary
String extractStrings(boolean links)
Extract the text from a page.
static void main(String[] args)
Mainline (the program entry point).

Constructor Detail

StringExtractor

public StringExtractor(String resource)
Construct a StringExtractor to read from the given resource.

Parameters: resource - Either a URL or a file name.

Method Detail

extractStrings

public String extractStrings(boolean links)
Extract the text from a page.

Parameters: links - If true, include hyperlinks in the output.

Returns: The textual contents of the page.

Throws: ParserException - If a parse error occurs.
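
The constructor and extractStrings can be combined as in the following minimal sketch. It assumes the HTML Parser jar is on the classpath; the class name ExtractTextExample, the helper textOf, and the default URL are hypothetical names introduced only for illustration.

```java
import org.htmlparser.parserapplications.StringExtractor;
import org.htmlparser.util.ParserException;

public class ExtractTextExample {

    /**
     * Returns the user-visible text of the resource (a URL or a file name),
     * or null if the page could not be parsed.
     */
    static String textOf(String resource, boolean links) {
        StringExtractor extractor = new StringExtractor(resource);
        try {
            // links == true asks for hyperlinks to be included in the output
            return extractor.extractStrings(links);
        } catch (ParserException e) {
            System.err.println("Could not parse " + resource + ": " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        // Hypothetical default; pass a URL or file name as the first argument.
        String resource = args.length > 0 ? args[0] : "http://example.com/";
        String text = textOf(resource, true);
        if (text != null) {
            System.out.println(text);
        }
    }
}
```

Catching ParserException at the call site keeps a single bad page from terminating a program that processes several resources in turn.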

main

public static void main(String[] args)
Mainline (the program entry point).

Parameters: args - The command line arguments.
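
This page does not say which arguments main accepts. The sketch below shows one way such a mainline might interpret them, assuming a "-links" flag plus a single resource argument. The flag name, the class MainlineArgsSketch, and the Options holder are all assumptions made for illustration, not the documented behavior of StringExtractor.

```java
public class MainlineArgsSketch {

    /** Parsed command line: the resource to read and whether to include links. */
    static final class Options {
        final String resource; // null when no resource argument was given
        final boolean links;

        Options(String resource, boolean links) {
            this.resource = resource;
            this.links = links;
        }
    }

    /** Separates the assumed "-links" flag from the resource argument. */
    static Options parse(String[] args) {
        String resource = null;
        boolean links = false;
        for (String arg : args) {
            if (arg.equalsIgnoreCase("-links")) { // assumed flag name
                links = true;
            } else {
                resource = arg; // the last non-flag argument wins
            }
        }
        return new Options(resource, links);
    }

    public static void main(String[] args) {
        Options options = parse(args);
        if (options.resource == null) {
            System.out.println("usage: MainlineArgsSketch [-links] <url-or-file>");
            return;
        }
        // A real mainline would pass these values to
        // new StringExtractor(resource).extractStrings(links).
        System.out.println("resource=" + options.resource + " links=" + options.links);
    }
}
```

Keeping the parsing in its own method means the flag handling can be checked without fetching a page.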

HTML Parser is an open source library released under the LGPL.