org.htmlparser.parserapplications

Class WikiCapturer

public class WikiCapturer extends SiteCapturer

Saves a WikiWikiWeb (wiki) site locally. This is an illustrative program rather than a general-purpose tool.
Constructor Summary
WikiCapturer()
Create a wikicapturer.
Method Summary
protected boolean isToBeCaptured(String link)
Returns true if the link is one we are interested in.
static void main(String[] args)
Mainline to capture a web site locally.

Constructor Detail

WikiCapturer

public WikiCapturer()
Create a wikicapturer.

Method Detail

isToBeCaptured

protected boolean isToBeCaptured(String link)
Returns true if the link is one we are interested in.

Parameters: link The link to be checked.

Returns: true if the link has the source URL as a prefix and contains neither '?' nor '#'. Links containing '?' are rejected because server-side queries cannot be handled in the static target directory structure. Links containing '#' are rejected because the full page that the reference points into has presumably already been captured. The prefix comparison is case insensitive, which is a shortcut rather than a strictly correct test, but it is cheap.
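
The rule described above can be sketched in a few lines. This is only an illustration of the documented behavior, not the library's source: the class name LinkFilterSketch, the explicit source parameter, and the example URLs are assumptions made for this sketch, and the real method may apply further checks.

```java
public class LinkFilterSketch {

    /**
     * Hypothetical stand-in for the documented rule: accept a link only if it
     * starts with the source URL (ignoring case) and has no '?' or '#'.
     */
    static boolean isToBeCaptured(String source, String link) {
        // Case-insensitive prefix test; false when the link is shorter than the source.
        boolean hasPrefix = link.regionMatches(true, 0, source, 0, source.length());
        // '?' marks a server-side query, '#' marks a reference into a page.
        boolean hasQuery = link.indexOf('?') != -1;
        boolean hasFragment = link.indexOf('#') != -1;
        return hasPrefix && !hasQuery && !hasFragment;
    }

    public static void main(String[] args) {
        String source = "http://example.com/wiki/"; // assumed example site
        String[] links = {
            "http://EXAMPLE.com/wiki/FrontPage",
            "http://example.com/wiki/FrontPage?action=edit",
            "http://example.com/wiki/FrontPage#history",
            "http://other.example.org/wiki/FrontPage"
        };
        for (String link : links) {
            System.out.println(link + " -> " + isToBeCaptured(source, link));
        }
    }
}
```

Only the first link in the example is accepted: it differs from the source URL only in case, while the others carry a query, carry a fragment, or belong to a different site.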

main

public static void main(String[] args)
Mainline to capture a web site locally.

Parameters: args The command line arguments. There are three arguments: the web site to capture, the local directory to save it to, and a flag (true or false) indicating whether resources such as images and video are to be captured as well. Any argument that is not supplied is requested via a dialog box.

Throws:
MalformedURLException If the supplied URL is invalid.
IOException If an error occurs reading the pages or resources.
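
The argument handling described for main can be pictured roughly as follows. This is a sketch under stated assumptions, not the library's code: CaptureArgsSketch, CaptureArgs, and parse are hypothetical names, and a null field here simply marks a value that the real program would ask for in a dialog box.

```java
public class CaptureArgsSketch {

    /** Hypothetical holder for the three arguments; null means "not supplied". */
    static final class CaptureArgs {
        final String site;               // web site to capture
        final String targetDir;          // local directory to save it to
        final Boolean captureResources;  // whether images, video, etc. are captured

        CaptureArgs(String site, String targetDir, Boolean captureResources) {
            this.site = site;
            this.targetDir = targetDir;
            this.captureResources = captureResources;
        }
    }

    /** Reads up to three positional arguments, leaving missing ones as null. */
    static CaptureArgs parse(String[] args) {
        String site = args.length > 0 ? args[0] : null;
        String targetDir = args.length > 1 ? args[1] : null;
        // Boolean.valueOf yields true only for "true" (ignoring case).
        Boolean resources = args.length > 2 ? Boolean.valueOf(args[2]) : null;
        return new CaptureArgs(site, targetDir, resources);
    }

    public static void main(String[] args) {
        CaptureArgs parsed = parse(args);
        // A real program would prompt the user for each null value at this point.
        System.out.println("site: " + (parsed.site == null ? "<ask user>" : parsed.site));
        System.out.println("directory: " + (parsed.targetDir == null ? "<ask user>" : parsed.targetDir));
        System.out.println("resources: " + (parsed.captureResources == null ? "<ask user>" : parsed.captureResources));
    }
}
```

Keeping the parsing separate from the prompting makes the three-argument contract easy to check without opening any dialogs.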

HTML Parser is an open source library released under LGPL.