org.htmlparser.http

Class ConnectionManager

public class ConnectionManager extends Object

Handles proxies, password-protected URLs and request properties, including cookies.
Field Summary
protected Hashtable mCookieJar
Cookie storage, a hashtable (by site or host) of vectors of Cookies.
protected static Hashtable mDefaultRequestProperties
Default Request header fields.
protected static SimpleDateFormat mFormat
Cookie expiry date format for parsing.
protected ConnectionMonitor mMonitor
The object to be notified prior to and after each connection.
protected String mPassword
The user password for accessing the URL.
protected String mProxyHost
The proxy server name.
protected String mProxyPassword
The proxy user password.
protected int mProxyPort
The proxy port number.
protected String mProxyUser
The proxy user name.
protected boolean mRedirectionProcessingEnabled
Flag determining if redirection processing is being handled manually.
protected Hashtable mRequestProperties
Request header fields.
protected String mUser
The user name for accessing the URL.
Constructor Summary
ConnectionManager()
Create a connection manager.
ConnectionManager(Hashtable properties)
Create a connection manager with the given connection properties.
Method Summary
void addCookies(URLConnection connection)
Generate an HTTP cookie header value string from the cookie jar.
protected Vector addCookies(Vector cookies, String path, Vector list)
Add qualified cookies from cookies into list.
static String encode(byte[] array)
Encodes a byte array into BASE64 in accordance with RFC 2045.
String fixSpaces(String url)
Turn spaces into %20.
protected String generateCookieProperty(Vector cookies)
Creates the cookie request property value from the list of valid cookies for the domain.
boolean getCookieProcessingEnabled()
Predicate to determine if cookie processing is currently enabled.
static Hashtable getDefaultRequestProperties()
Get the current default request header properties.
protected String getDomain(String host)
Get the domain from a host.
protected String getLocation(HttpURLConnection http)
Get the Location field if any.
ConnectionMonitor getMonitor()
Get the monitoring object, if any.
String getPassword()
Get the URL user's password.
String getProxyHost()
Get the proxy host name, if any.
String getProxyPassword()
Get the proxy user's password.
int getProxyPort()
Get the proxy port number.
String getProxyUser()
Get the user name for proxy authorization, if any.
boolean getRedirectionProcessingEnabled()
Predicate to determine if URL redirection processing is currently enabled.
Hashtable getRequestProperties()
Get the current request header properties.
String getUser()
Get the user name to access the URL.
URLConnection openConnection(URL url)
Opens a connection using the given URL.
URLConnection openConnection(String string)
Opens a connection based on a given string.
void parseCookies(URLConnection connection)
Check for cookies and parse them into the cookie jar.
protected void saveCookies(Vector list, URLConnection connection)
Save the cookies received in the response header.
void setCookie(Cookie cookie, String domain)
Adds a cookie to the cookie jar.
void setCookieProcessingEnabled(boolean enable)
Enables or disables cookie processing.
static void setDefaultRequestProperties(Hashtable properties)
Set the default request header properties.
void setMonitor(ConnectionMonitor monitor)
Set the monitoring object.
void setPassword(String password)
Set the URL user's password.
void setProxyHost(String host)
Set the proxy host to use.
void setProxyPassword(String password)
Set the proxy user's password.
void setProxyPort(int port)
Set the proxy port number.
void setProxyUser(String user)
Set the user name for proxy authorization.
void setRedirectionProcessingEnabled(boolean enabled)
Enables or disables manual redirection handling.
void setRequestProperties(Hashtable properties)
Set the current request properties.
void setUser(String user)
Set the user name to access the URL.

Field Detail

mCookieJar

protected Hashtable mCookieJar
Cookie storage, a hashtable (by site or host) of vectors of Cookies. This will be null if cookie processing is disabled (default).

mDefaultRequestProperties

protected static Hashtable mDefaultRequestProperties
Default Request header fields. So far this is just "User-Agent" and "Accept-Encoding".

mFormat

protected static SimpleDateFormat mFormat
Cookie expiry date format for parsing.

mMonitor

protected ConnectionMonitor mMonitor
The object to be notified prior to and after each connection.

mPassword

protected String mPassword
The user password for accessing the URL.

mProxyHost

protected String mProxyHost
The proxy server name.

mProxyPassword

protected String mProxyPassword
The proxy user password.

mProxyPort

protected int mProxyPort
The proxy port number.

mProxyUser

protected String mProxyUser
The proxy user name.

mRedirectionProcessingEnabled

protected boolean mRedirectionProcessingEnabled
Flag determining if redirection processing is being handled manually.

mRequestProperties

protected Hashtable mRequestProperties
Request header fields.

mUser

protected String mUser
The user name for accessing the URL.

Constructor Detail

ConnectionManager

public ConnectionManager()
Create a connection manager.

ConnectionManager

public ConnectionManager(Hashtable properties)
Create a connection manager with the given connection properties.

Parameters: properties Name/value pairs to be added to the HTTP request.

Method Detail

addCookies

public void addCookies(URLConnection connection)
Generate an HTTP cookie header value string from the cookie jar.
The syntax for the header is:

    cookie          =       "Cookie:" cookie-version
                            1*((";" | ",") cookie-value)
    cookie-value    =       NAME "=" VALUE [";" path] [";" domain]
    cookie-version  =       "$Version" "=" value
    NAME            =       attr
    VALUE           =       value
    path            =       "$Path" "=" value
    domain          =       "$Domain" "=" value

 

Parameters: connection The connection being accessed.

See Also: RFC 2109, RFC 2396
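
The grammar above can be illustrated with a minimal sketch that assembles the text following "Cookie:". This is not the library's implementation: the class CookieHeaderSketch and its SimpleCookie type are hypothetical stand-ins, values are assumed to be plain tokens that need no quoting, and ";" is always used as the separator although the grammar also permits ",".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of org.htmlparser. */
final class CookieHeaderSketch {

    /** Minimal stand-in for a cookie; path and domain may be null. */
    static final class SimpleCookie {
        final String name;
        final String value;
        final String path;
        final String domain;

        SimpleCookie(String name, String value, String path, String domain) {
            this.name = name;
            this.value = value;
            this.path = path;
            this.domain = domain;
        }
    }

    /** Builds cookie-version followed by one cookie-value per cookie. */
    static String cookieHeaderValue(int version, List<SimpleCookie> cookies) {
        StringBuilder sb = new StringBuilder();
        sb.append("$Version=").append(version);
        for (SimpleCookie c : cookies) {
            // NAME "=" VALUE
            sb.append("; ").append(c.name).append('=').append(c.value);
            // optional [";" path] and [";" domain]
            if (c.path != null) {
                sb.append("; $Path=").append(c.path);
            }
            if (c.domain != null) {
                sb.append("; $Domain=").append(c.domain);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<SimpleCookie> jar = new ArrayList<>();
        jar.add(new SimpleCookie("session", "abc", "/app", null));
        jar.add(new SimpleCookie("theme", "dark", null, ".example.com"));
        System.out.println("Cookie: " + cookieHeaderValue(1, jar));
    }
}
```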

addCookies

protected Vector addCookies(Vector cookies, String path, Vector list)
Add qualified cookies from cookies into list.

Parameters: cookies The list of cookies to check (may be null). path The path being accessed. list The list of qualified cookies.

Returns: The list of qualified cookies.

encode

public static final String encode(byte[] array)
Encodes a byte array into BASE64 in accordance with RFC 2045.

Parameters: array The bytes to convert.

Returns: A BASE64 encoded string.
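
A typical use for BASE64 in this context is building an Authorization header value from a user name and password. The sketch below assumes the HTTP Basic scheme and uses the JDK's java.util.Base64 rather than this class's encode method; BasicAuthSketch is a hypothetical name. The JDK's basic encoder never inserts line breaks, whereas an RFC 2045 (MIME) encoder breaks lines after 76 characters, so the two agree for short credentials.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper for illustration; not part of org.htmlparser. */
final class BasicAuthSketch {

    /** Returns the value for an "Authorization" header under the Basic scheme. */
    static String basicCredentials(String user, String password) {
        // Basic credentials are "user:password" encoded as BASE64.
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.ISO_8859_1);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        System.out.println("Authorization: " + basicCredentials("Aladdin", "open sesame"));
    }
}
```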

fixSpaces

public String fixSpaces(String url)
Turn spaces into %20. ToDo: make this more generic (see RFE #1010593 provide URL encoding/decoding utilities).

Parameters: url The URL containing spaces.

Returns: The URL with spaces as %20 sequences.
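
The described behavior amounts to a plain string substitution. A minimal sketch, assuming only literal space characters are replaced and nothing else is escaped (FixSpacesSketch is a hypothetical name, not the library's code):

```java
/** Hypothetical helper for illustration; not part of org.htmlparser. */
final class FixSpacesSketch {

    /** Replaces every space character with the %20 escape sequence. */
    static String fixSpaces(String url) {
        return url.replace(" ", "%20");
    }

    public static void main(String[] args) {
        System.out.println(fixSpaces("http://example.com/my docs/read me.html"));
    }
}
```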

generateCookieProperty

protected String generateCookieProperty(Vector cookies)
Creates the cookie request property value from the list of valid cookies for the domain.

Parameters: cookies The list of valid cookies to be encoded in the request.

Returns: A string suitable for inclusion as the value of the "Cookie:" request property.

getCookieProcessingEnabled

public boolean getCookieProcessingEnabled()
Predicate to determine if cookie processing is currently enabled.

Returns: true if cookies are being processed.

getDefaultRequestProperties

public static Hashtable getDefaultRequestProperties()
Get the current default request header properties. A String-to-String map of header keys and values. These fields are set by the parser when creating a connection.

Returns: The default set of request header properties that will currently be used.

See Also: mDefaultRequestProperties, ConnectionManager

getDomain

protected String getDomain(String host)
Get the domain from a host.

Parameters: host The supposed host name.

Returns: The domain (with the leading dot), or null if the domain cannot be determined.

getLocation

protected String getLocation(HttpURLConnection http)
Get the Location field if any.

Parameters: http The connection to get the location from.

getMonitor

public ConnectionMonitor getMonitor()
Get the monitoring object, if any.

Returns: The monitor, or null if none has been assigned.

getPassword

public String getPassword()
Get the URL user's password.

Returns: The URL password.

getProxyHost

public String getProxyHost()
Get the proxy host name, if any.

Returns: The proxy host.

getProxyPassword

public String getProxyPassword()
Get the proxy user's password.

Returns: The proxy password.

getProxyPort

public int getProxyPort()
Get the proxy port number.

Returns: The proxy port.

getProxyUser

public String getProxyUser()
Get the user name for proxy authorization, if any.

Returns: The proxy user, or null if no proxy authorization is required.

getRedirectionProcessingEnabled

public boolean getRedirectionProcessingEnabled()
Predicate to determine if URL redirection processing is currently enabled.

Returns: true if redirection is being processed manually.

See Also: ConnectionManager

getRequestProperties

public Hashtable getRequestProperties()
Get the current request header properties. A String-to-String map of header keys and values, excluding proxy items, cookies and URL authorization.

Returns: The request header properties for this connection manager.

getUser

public String getUser()
Get the user name to access the URL.

Returns: The user name that will be used to access the URL, or null if no authorization is required.

openConnection

public URLConnection openConnection(URL url)
Opens a connection using the given URL.

Parameters: url The URL to open.

Returns: The connection.

Throws: ParserException if an I/O exception occurs accessing the URL.

openConnection

public URLConnection openConnection(String string)
Opens a connection based on a given string. The string is either a file, in which case file://localhost is prepended to a canonical path derived from the string, or a URL that begins with one of the known protocol strings, e.g. http://. Embedded spaces are silently converted to %20 sequences.

Parameters: string The name of a file or a URL.

Returns: The connection.

Throws: ParserException if the string is not a valid URL or file.
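
The string handling described above can be sketched as follows. This is an approximation under stated assumptions, not the library's code: UrlStringSketch and its PREFIXES list are hypothetical (the library's set of known protocol strings may differ), and the sketch only produces the URL text rather than opening a connection.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of org.htmlparser. */
final class UrlStringSketch {

    // Assumed list of known protocol strings; the library's list may differ.
    private static final String[] PREFIXES = {"http://", "https://", "ftp://", "file:"};

    /** Turns a file name or URL string into URL text with spaces escaped. */
    static String toUrlString(String string) {
        String lower = string.toLowerCase(Locale.ROOT);
        for (String prefix : PREFIXES) {
            if (lower.startsWith(prefix)) {
                // Already a URL: only the embedded spaces need fixing.
                return string.replace(" ", "%20");
            }
        }
        try {
            // Otherwise treat it as a file and derive a canonical path.
            String path = new File(string).getCanonicalPath().replace(File.separatorChar, '/');
            if (!path.startsWith("/")) {
                path = "/" + path; // e.g. a Windows drive letter such as C:/
            }
            return ("file://localhost" + path).replace(" ", "%20");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toUrlString("http://example.com/a page.html"));
        System.out.println(toUrlString("my notes.html"));
    }
}
```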

parseCookies

public void parseCookies(URLConnection connection)
Check for cookies and parse them into the cookie jar.

Parameters: connection The connection to extract cookie information from.
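
As a rough illustration of what parsing a cookie involves, the sketch below splits a single Set-Cookie response header value into its name, value and attributes. It is an assumption-laden simplification, not the library's parser: SetCookieSketch is a hypothetical name, quoted values are not handled, and the result is a plain map in which the cookie's own name and value are stored under the keys "name" and "value".

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration; not part of org.htmlparser. */
final class SetCookieSketch {

    /** Parses e.g. "sid=abc; Path=/app; Secure" into a map of its parts. */
    static Map<String, String> parse(String header) {
        Map<String, String> out = new LinkedHashMap<>();
        String[] parts = header.split(";");
        for (int i = 0; i < parts.length; i++) {
            String part = parts[i].trim();
            if (part.isEmpty()) {
                continue;
            }
            int eq = part.indexOf('=');
            String key = eq < 0 ? part : part.substring(0, eq).trim();
            String val = eq < 0 ? "" : part.substring(eq + 1).trim();
            if (i == 0) {
                // The first pair is the cookie itself.
                out.put("name", key);
                out.put("value", val);
            } else {
                // Later pairs are attributes; names are case-insensitive.
                out.put(key.toLowerCase(Locale.ROOT), val);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("sid=abc123; Path=/app; Domain=.example.com; Secure"));
    }
}
```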

saveCookies

protected void saveCookies(Vector list, URLConnection connection)
Save the cookies received in the response header.

Parameters: list The list of cookies extracted from the response header. connection The connection (used when a cookie has no domain).

setCookie

public void setCookie(Cookie cookie, String domain)
Adds a cookie to the cookie jar.

Parameters: cookie The cookie to add. domain The domain to use in case the cookie has no domain attribute.

setCookieProcessingEnabled

public void setCookieProcessingEnabled(boolean enable)
Enables or disables cookie processing.

Parameters: enable If true, cookie processing will occur; otherwise cookie processing will be turned off.

setDefaultRequestProperties

public static void setDefaultRequestProperties(Hashtable properties)
Set the default request header properties. A String-to-String map of header keys and values. These fields are set by the parser when creating a connection. Some of these can be set directly on a URLConnection, e.g. If-Modified-Since is set with setIfModifiedSince(long), but since the parser transparently opens the connection on behalf of the developer, these properties are not available before the connection is fetched. Setting these request header fields affects all subsequent connections opened by the parser. For more direct control, create a URLConnection, massage it the way you want, and then set it on the parser.

From RFC 2616 Hypertext Transfer Protocol -- HTTP/1.1:

 5.3 Request Header Fields

    The request-header fields allow the client to pass additional
    information about the request, and about the client itself, to the
    server. These fields act as request modifiers, with semantics
    equivalent to the parameters on a programming language method
    invocation.

        request-header = Accept                   ; Section 14.1
                       | Accept-Charset           ; Section 14.2
                       | Accept-Encoding          ; Section 14.3
                       | Accept-Language          ; Section 14.4
                       | Authorization            ; Section 14.8
                       | Expect                   ; Section 14.20
                       | From                     ; Section 14.22
                       | Host                     ; Section 14.23
                       | If-Match                 ; Section 14.24
                       | If-Modified-Since        ; Section 14.25
                       | If-None-Match            ; Section 14.26
                       | If-Range                 ; Section 14.27
                       | If-Unmodified-Since      ; Section 14.28
                       | Max-Forwards             ; Section 14.31
                       | Proxy-Authorization      ; Section 14.34
                       | Range                    ; Section 14.35
                       | Referer                  ; Section 14.36
                       | TE                       ; Section 14.39
                       | User-Agent               ; Section 14.43

    Request-header field names can be extended reliably only in
    combination with a change in the protocol version. However, new or
    experimental header fields MAY be given the semantics of request-
    header fields if all parties in the communication recognize them to
    be request-header fields. Unrecognized header fields are treated as
    entity-header fields.
 

Parameters: properties The new set of default request header properties to use. This affects all subsequently created connections.

See Also: mDefaultRequestProperties, ConnectionManager
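
The "more direct control" route mentioned above can be sketched with the JDK alone: open a URLConnection (which does not contact the server yet), apply a String-to-String table of header fields, and adjust anything else before it is used. RequestPropertiesSketch and the header values are hypothetical, and the final step of handing the connection to the parser is omitted because it requires the library.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;
import java.util.Hashtable;
import java.util.Map;

/** Hypothetical helper for illustration; not part of org.htmlparser. */
final class RequestPropertiesSketch {

    /** Opens (but does not connect) a URLConnection and applies the header fields. */
    static URLConnection prepare(String url, Hashtable<String, String> properties) {
        try {
            // openConnection() only creates the object; no request is sent yet.
            URLConnection connection = URI.create(url).toURL().openConnection();
            for (Map.Entry<String, String> entry : properties.entrySet()) {
                connection.setRequestProperty(entry.getKey(), entry.getValue());
            }
            return connection;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        Hashtable<String, String> properties = new Hashtable<>();
        properties.put("User-Agent", "ExampleAgent/1.0");   // made-up agent string
        properties.put("Accept-Language", "en");
        URLConnection connection = prepare("http://example.com/", properties);
        // Some fields have dedicated setters, as noted above.
        connection.setIfModifiedSince(0L);
        System.out.println(connection.getRequestProperty("User-Agent"));
    }
}
```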

setMonitor

public void setMonitor(ConnectionMonitor monitor)
Set the monitoring object.

Parameters: monitor The monitor to set.

setPassword

public void setPassword(String password)
Set the URL user's password.

Parameters: password The password for the URL.

setProxyHost

public void setProxyHost(String host)
Set the proxy host to use.

Parameters: host The host to use for proxy access. Note: You must also set the proxy port (see setProxyPort).

setProxyPassword

public void setProxyPassword(String password)
Set the proxy user's password.

Parameters: password The password for the proxy user. Note: You must also set the proxy user (see setProxyUser).

setProxyPort

public void setProxyPort(int port)
Set the proxy port number.

Parameters: port The proxy port. Note: You must also set the proxy host (see setProxyHost).

setProxyUser

public void setProxyUser(String user)
Set the user name for proxy authorization.

Parameters: user The proxy user name. Note: You must also set the proxy password (see setProxyPassword).

setRedirectionProcessingEnabled

public void setRedirectionProcessingEnabled(boolean enabled)
Enables or disables manual redirection handling. Normally the HttpURLConnection follows redirections (HTTP response code 3xx) automatically if the followRedirects property is true. With this flag set, the ConnectionManager performs the redirection processing itself. The advantage is that cookies (if enabled) are passed in subsequent requests.

Parameters: enabled The new state of the redirectionProcessingEnabled property.
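
Handling a redirect manually involves two small decisions: whether a status code is a redirect at all, and what absolute URL the Location field refers to. The sketch below shows both using only the JDK. RedirectSketch is a hypothetical name, the set of status codes treated as redirects is an assumption, and the actual request loop (and cookie handling) is left out.

```java
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper for illustration; not part of org.htmlparser. */
final class RedirectSketch {

    /** True for the 3xx codes that carry a Location to follow (assumed set). */
    static boolean isRedirect(int status) {
        return status == HttpURLConnection.HTTP_MOVED_PERM   // 301
            || status == HttpURLConnection.HTTP_MOVED_TEMP   // 302
            || status == HttpURLConnection.HTTP_SEE_OTHER    // 303
            || status == 307
            || status == 308;
    }

    /** Resolves a possibly relative Location value against the request URL. */
    static String resolveLocation(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location).toString();
    }

    public static void main(String[] args) {
        System.out.println(isRedirect(302));
        System.out.println(resolveLocation("http://example.com/a/b.html", "/login"));
    }
}
```

With automatic following switched off on the connection (HttpURLConnection.setInstanceFollowRedirects(false)), a caller can inspect each response, attach cookies, and issue the next request to the resolved URL itself.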

setRequestProperties

public void setRequestProperties(Hashtable properties)
Set the current request properties. Replaces the current set of fixed request properties with the given set. This does not replace the Proxy-Authorization property, which is constructed from the setProxyUser and setProxyPassword values, or the Authorization property, which is constructed from the setUser and setPassword values. Nor does it replace the Cookie property, which is constructed from the current cookie jar.

Parameters: properties The new fixed properties.

setUser

public void setUser(String user)
Set the user name to access the URL.

Parameters: user The user name for accessing the URL. Note: You must also set the password (see setPassword).

HTML Parser is an open source library released under LGPL. SourceForge.net