sunlabs.brazil.util.http

Class HttpRequest

public class HttpRequest extends Object

Sends an HTTP request to some target host and gets the answer back. Similar to the URLConnection class.

Caches connections to hosts and reuses them when possible. Talks HTTP/1.1 to the hosts in order to keep connections alive as much as possible.

The sequence of events for using an HttpRequest is similar to how URLConnection is used:

  1. A new HttpRequest object is constructed.
  2. The setup parameters are modified.
  3. The host (or proxy) is contacted and the HTTP request is issued.
  4. The response headers and body are examined.
  5. The connection is closed.

In the common case, all the setup parameters are initialized to sensible values and won't need to be modified. Most users will only need to construct a new HttpRequest object and then call getInputStream to read the contents. The rest of the member variables and methods are only needed for advanced behavior.

The HttpRequest class is intended to be a replacement for the URLConnection class. It operates at a lower level and makes fewer decisions on behavior. Some differences between the HttpRequest class and the URLConnection class follow:

A number of the fields in the HttpRequest object are public, by design. Most of the methods mentioned above are convenience methods; the underlying data fields are meant to be accessed for more complicated operations, such as changing the socket factory or accessing the raw HTTP response line. Note, however, that the order of the methods described above is important. For instance, the user cannot examine the response headers (by calling getResponseHeader or by examining the variable responseHeaders) without first having connected to the host.

When sending the request, the HttpRequest uses the values of a number of variables and automatically sets some HTTP headers. A user who wants to modify the default behavior can change these settings up until the time connect is called, as follows:

variable version
By default, the HttpRequest issues HTTP/1.1 requests. The user can set version to change this to HTTP/1.0.
variable method
If method is null (the default), the HttpRequest decides what the HTTP request method should be as follows: If the user has called getOutputStream, then the method will be "POST", otherwise the method will be "GET".
variable proxyHost
If the proxy host is specified, the HTTP request will be sent via the specified proxy:
  • connect opens a connection to the proxy.
  • uses the "Proxy-Connection" header to keep alive the connection.
  • sends a fully qualified URL in the request line, for example "http://www.foo.com/index.html". The fully qualified URL tells the proxy to forward the request to the specified host.
Otherwise, the HTTP request will go directly to the host:
  • connect opens a connection to the remote host.
  • uses the "Connection" header to keep alive the connection.
  • sends a host-relative URL in the request line, for example "/index.html". The relative URL is derived from the fully qualified URL used to construct this HttpRequest.
header "Connection" or "Proxy-Connection"
The HttpRequest sets the appropriate connection header to "Keep-Alive" to keep alive the connection to the host or proxy (respectively). By setting the appropriate connection header, the user can control whether the HttpRequest tries to use Keep-Alives.
header "Host"
The HTTP/1.1 protocol requires that the "Host" header be set to the name of the machine being contacted. By default, this is derived from the URL used to construct the HttpRequest, and is set automatically if the user does not set it.
header "Content-Length"
If the user calls getOutputStream and writes some data to it, the "Content-Length" header will be set to the amount of data that has been written at the time that connect is called.
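
The defaults described above can be previewed with a small stand-alone sketch. It is not the Brazil implementation: the class name RequestHeadSketch, the build method, and the choice to append an explicit port to the "Host" value are all assumptions made for illustration. A non-null body stands in for "the user has called getOutputStream and written data".

```java
import java.net.URI;

/** Illustrative only: mirrors the documented defaults, not the Brazil source. */
final class RequestHeadSketch {
    /**
     * Builds the request line and the automatically set headers.
     * proxyHost may be null (direct connection), method may be null
     * (choose a default), body may be null (no upload data).
     */
    static String build(String url, String proxyHost, String method, byte[] body) {
        URI u = URI.create(url);

        // Default method: POST if upload data exists, otherwise GET.
        String m = (method != null) ? method : (body != null ? "POST" : "GET");

        // Host-relative form for direct requests; "/" if the URL has no path.
        String path = u.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (u.getRawQuery() != null) {
            path += "?" + u.getRawQuery();
        }

        boolean viaProxy = (proxyHost != null);
        String target = viaProxy ? url : path;  // proxies get the full URL
        String keepAlive = viaProxy ? "Proxy-Connection" : "Connection";

        // Assumption: include the port in "Host" only when the URL names one.
        String host = (u.getPort() == -1) ? u.getHost() : u.getHost() + ":" + u.getPort();

        StringBuilder sb = new StringBuilder();
        sb.append(m).append(' ').append(target).append(" HTTP/1.1\r\n");
        sb.append(keepAlive).append(": Keep-Alive\r\n");
        sb.append("Host: ").append(host).append("\r\n");
        if (body != null) {
            sb.append("Content-Length: ").append(body.length).append("\r\n");
        }
        return sb.append("\r\n").toString();
    }
}
```

Calling build with a null proxy yields a host-relative request line and a "Connection" header; passing a proxy host switches to the fully qualified URL and "Proxy-Connection".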

Once all data has been read from the remote host, the underlying socket may be automatically recycled and used again for subsequent requests to the same remote host. If the user is not planning on reading all the data from the remote host, the user should call close to release the socket. Although it happens under the covers, the user should be aware that close is called automatically if an IOException occurs or once all data has been read normally from the remote host. This ensures that as few sockets as possible are left open at any time.

The input stream that getInputStream provides automatically hides whether the remote host is providing HTTP/1.1 "chunked" encoding or regular streaming data. The user can simply read until reaching the end of the input stream, which signifies that all the available data from this request has been read. If reading from a "chunked" source, the data is automatically de-chunked as it is presented to the user. Currently, no access is provided to the underlying raw input stream.
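
To make "de-chunked" concrete, here is a minimal sketch of decoding an HTTP/1.1 chunked body held in memory. It is an independent illustration, not the decoder inside HttpInputStream; the names DechunkSketch and dechunk are hypothetical, trailers are ignored, and InputStream.readNBytes requires Java 11 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Illustrative only: decodes a complete chunked body held in memory. */
final class DechunkSketch {
    /** Reads one line and returns it without the trailing CRLF. */
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    /** Returns the payload bytes with all chunk framing removed. */
    static byte[] dechunk(byte[] raw) {
        try {
            InputStream in = new ByteArrayInputStream(raw);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            while (true) {
                String sizeLine = readLine(in);
                int semi = sizeLine.indexOf(';');       // drop chunk extensions
                if (semi >= 0) {
                    sizeLine = sizeLine.substring(0, semi);
                }
                int size = Integer.parseInt(sizeLine.trim(), 16);
                if (size == 0) {
                    break;                              // last chunk; trailers may follow
                }
                byte[] chunk = in.readNBytes(size);
                if (chunk.length < size) {
                    throw new IOException("truncated chunk");
                }
                out.write(chunk);
                readLine(in);                           // consume the CRLF after the data
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The caller sees only the payload bytes, which is the behavior the paragraph above describes for the stream returned by getInputStream.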

Version: 2.7

Author: Colin Stevens (colin.stevens@sun.com)

Field Summary
protected boolean connected
static String defaultHTTPVersion
The default HTTP version string to send to the remote host when issuing requests.
static String defaultProxyHost
The default proxy host for HTTP requests.
static int defaultProxyPort
The default proxy port for HTTP requests.
static boolean displayAllHeaders
Setting this to true causes all HTTP headers to be printed on the standard error stream; useful for debugging client/server interactions.
static int DRAIN_TIMEOUT
Timeout (in msec) to drain an input stream that has been closed before the entire HTTP response has been read.
String host
The host extracted from the URL used to construct this HttpRequest.
static int LINE_LIMIT
Maximum length of a line in the HTTP response headers (sanity check).
String method
The HTTP method, such as "GET", "POST", or "HEAD".
static HttpSocketPool pool
The cache of idle sockets.
int port
The port extracted from the URL used to construct this HttpRequest.
String proxyHost
If non-null, sends this HTTP request via the specified proxy host and port.
int proxyPort
The proxy port.
MimeHeaders requestHeaders
The headers for the HTTP request.
MimeHeaders responseHeaders
The headers that were present in the HTTP response.
MimeHeaders responseTrailers
An artifact of HTTP/1.1 chunked encoding.
static SocketFactory socketFactory
The factory for constructing new Socket objects used to connect to remote hosts when issuing HTTP requests.
String status
The status line from the HTTP response.
URL url
The URL used to construct this HttpRequest.
String version
The HTTP version string.
Constructor Summary
HttpRequest(URL url)
Creates a new HttpRequest object that will send an HTTP request to fetch the resource represented by the URL.
HttpRequest(String url)
Creates a new HttpRequest object that will send an HTTP request to fetch the resource represented by the URL.
Method Summary
int addHeaders(String tokens, Properties props)
Convenience method for adding request headers by looking them up in a properties object.
void close()
Gracefully closes this HTTP request when the user is done with it.
void connect()
Connect to the target host (or proxy), send the request, and read the response headers.
void disconnect()
Interrupts this HTTP request.
String getContent(String encoding)
Get the content as a string.
String getContent()
Return the content as a string.
int getContentLength()
Convenience method to get the "Content-Length" header from the HTTP response.
String getEncoding()
HttpInputStream getInputStream()
Gets an input stream that can be used to read the body of the HTTP response.
OutputStream getOutputStream()
Gets an output stream that can be used for uploading data to the host.
int getResponseCode()
Gets the HTTP response status code.
String getResponseHeader(String key)
Gets the value associated with the given case-insensitive header name from the HTTP response.
static void main(String[] args)
Grab HTTP document(s) and save them in the filesystem.
static void removePointToPointHeaders(MimeHeaders headers, boolean response)
Removes all the point-to-point (hop-by-hop) headers from the given mime headers.
void setMethod(String method)
Sets the HTTP method to the specified value.
void setProxy(String proxyHost, int proxyPort)
Sets the proxy for this request.
void setRequestHeader(String key, String value)
Sets a request header in the HTTP request that will be issued.

Field Detail

connected

protected boolean connected

defaultHTTPVersion

public static String defaultHTTPVersion
The default HTTP version string to send to the remote host when issuing requests.

The default value can be overridden on a per-request basis by setting the version instance variable.

Default value is "HTTP/1.1".

See Also: version

defaultProxyHost

public static String defaultProxyHost
The default proxy host for HTTP requests. If non-null, then all new HTTP requests will be sent via this proxy. If null, then all new HTTP requests are sent directly to the host specified when the HttpRequest object was constructed.

The default value can be overridden on a per-request basis by calling the setProxy method or setting the proxyHost instance variables.

Default value is null.

See Also: defaultProxyPort proxyHost HttpRequest

defaultProxyPort

public static int defaultProxyPort
The default proxy port for HTTP requests.

Default value is 80.

See Also: defaultProxyHost proxyPort

displayAllHeaders

public static boolean displayAllHeaders
Setting this to true causes all HTTP headers to be printed on the standard error stream; useful for debugging client/server interactions.

DRAIN_TIMEOUT

public static int DRAIN_TIMEOUT
Timeout (in msec) to drain an input stream that has been closed before the entire HTTP response has been read.

If the user closes the HttpRequest before reading all of the data, but the remote host has agreed to keep this socket alive, we need to read and discard the rest of the response before issuing a new request. If it takes longer than DRAIN_TIMEOUT to read and discard the data, we will just forcefully close the connection to the remote host rather than waiting to read any more.

Default value is 10000.
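
The read-and-discard step can be sketched as follows. This is an assumption-laden illustration, not the library's code: DrainSketch and drain are hypothetical names, and the deadline is checked only between reads, so a real socket would also need a read timeout to avoid blocking inside a single read call.

```java
import java.io.IOException;
import java.io.InputStream;

/** Illustrative only: discards remaining response data within a time budget. */
final class DrainSketch {
    /**
     * Reads and discards data until end of stream.
     * Returns true if the stream was fully drained, or false if the time
     * budget ran out or an I/O error occurred. On false, the caller would
     * forcefully close the connection instead of recycling it.
     */
    static boolean drain(InputStream in, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        byte[] scratch = new byte[4096];
        try {
            while (in.read(scratch) != -1) {
                if (System.nanoTime() - deadline > 0) {
                    return false;   // took too long; give up on reuse
                }
            }
            return true;            // reached the end; socket could be reused
        } catch (IOException e) {
            return false;
        }
    }
}
```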

host

public String host
The host extracted from the URL used to construct this HttpRequest.

See Also: url

LINE_LIMIT

public static int LINE_LIMIT
Maximum length of a line in the HTTP response headers (sanity check).

If an HTTP response line is longer than this, the response is considered to be malformed.

Default value is 1000.
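
A line-length sanity check of this kind might look like the sketch below. The names LineLimitSketch and readLine are hypothetical, and reporting the failure as an UncheckedIOException is a choice made for the example, not necessarily how HttpRequest signals a malformed response.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Illustrative only: reads one header line, rejecting overlong lines. */
final class LineLimitSketch {
    /** Returns the next line without its CRLF, or fails if it exceeds limit. */
    static String readLine(InputStream in, int limit) {
        try {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = in.read()) != -1 && c != '\n') {
                if (c == '\r') {
                    continue;       // skip the CR of a CRLF pair
                }
                if (sb.length() >= limit) {
                    throw new IOException("header line exceeds " + limit + " characters");
                }
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```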

method

public String method
The HTTP method, such as "GET", "POST", or "HEAD".

May be set by the user at any time up until the HTTP request is actually sent.

pool

public static HttpSocketPool pool
The cache of idle sockets. Once a request has been handled, the now-idle socket can be remembered and reused later if another HTTP request is made to the same remote host.
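
The idea of remembering idle connections per remote host can be illustrated with a tiny generic cache. This sketch does not reflect the actual HttpSocketPool API, which is not described here; IdlePoolSketch, release, and acquire are hypothetical names, and the "host:port" key is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative only: a cache of idle connections keyed by "host:port". */
final class IdlePoolSketch<C> {
    private final Map<String, Deque<C>> idle = new HashMap<>();

    private static String key(String host, int port) {
        return host.toLowerCase(Locale.ROOT) + ":" + port;
    }

    /** Remembers a now-idle connection for later reuse. */
    synchronized void release(String host, int port, C conn) {
        idle.computeIfAbsent(key(host, port), k -> new ArrayDeque<>()).push(conn);
    }

    /** Returns an idle connection to the host, or null if none is cached. */
    synchronized C acquire(String host, int port) {
        Deque<C> q = idle.get(key(host, port));
        return (q == null || q.isEmpty()) ? null : q.pop();
    }
}
```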

port

public int port
The port extracted from the URL used to construct this HttpRequest.

See Also: url

proxyHost

public String proxyHost
If non-null, sends this HTTP request via the specified proxy host and port.

Initialized from defaultProxyHost, but may be changed by the user at any time up until the HTTP request is actually sent.

See Also: defaultProxyHost proxyPort HttpRequest

proxyPort

public int proxyPort
The proxy port.

See Also: proxyHost

requestHeaders

public MimeHeaders requestHeaders
The headers for the HTTP request. All of these headers will be sent when the connection is actually made.

responseHeaders

public MimeHeaders responseHeaders
The headers that were present in the HTTP response. This field is not valid until after connect has been called and the HTTP response has been read.

responseTrailers

public MimeHeaders responseTrailers
An artifact of HTTP/1.1 chunked encoding. At the end of an HTTP/1.1 chunked response, there may be more MimeHeaders. It is only possible to access these MimeHeaders after all the data from the input stream returned by getInputStream has been read. At that point, this field will automatically be initialized to the set of any headers that were found. If not reading from an HTTP/1.1 chunked source, then this field is irrelevant and will remain null.

socketFactory

public static SocketFactory socketFactory
The factory for constructing new Socket objects used to connect to remote hosts when issuing HTTP requests. The user can set this to provide a new type of socket, such as SSL sockets.

Default value is null, which signifies plain sockets.

status

public String status
The status line from the HTTP response. This field is not valid until after connect has been called and the HTTP response has been read.

url

public URL url
The URL used to construct this HttpRequest.

version

public String version
The HTTP version string.

Initialized from defaultHTTPVersion, but may be changed by the user at any time up until the HTTP request is actually sent.

Constructor Detail

HttpRequest

public HttpRequest(URL url)
Creates a new HttpRequest object that will send an HTTP request to fetch the resource represented by the URL.

The host specified by the URL is not contacted at this time.

Parameters: url A fully qualified "http:" URL.

Throws: IllegalArgumentException if url is not an "http:" URL.

HttpRequest

public HttpRequest(String url)
Creates a new HttpRequest object that will send an HTTP request to fetch the resource represented by the URL.

The host specified by the URL is not contacted at this time.

Parameters: url A string representing a fully qualified "http:" URL.

Throws: IllegalArgumentException if url is not a well-formed "http:" URL.

Method Detail

addHeaders

public int addHeaders(String tokens, Properties props)
Convenience method for adding request headers by looking them up in a properties object.

Parameters:
  tokens - A whitespace-delimited set of tokens that refer to headers that will be added to the HTTP request.
  props - Keys of the form [token].name and [token].value are used to look up additional HTTP headers to be added to the request.

Returns: The number of headers added to the request

See Also: HttpRequest
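
The token lookup described for addHeaders can be sketched against a plain map. This is one reading of the description, not the library's code: AddHeadersSketch is a hypothetical name, a Map stands in for the request headers, and skipping tokens that lack either key is an assumption.

```java
import java.util.Map;
import java.util.Properties;
import java.util.StringTokenizer;

/** Illustrative only: looks up header names and values by token. */
final class AddHeadersSketch {
    /**
     * For each whitespace-delimited token, reads "<token>.name" and
     * "<token>.value" from props and stores the pair in headers.
     * Returns the number of headers added.
     */
    static int addHeaders(Map<String, String> headers, String tokens, Properties props) {
        int added = 0;
        StringTokenizer st = new StringTokenizer(tokens);
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            String name = props.getProperty(token + ".name");
            String value = props.getProperty(token + ".value");
            if (name != null && value != null) {    // assumption: skip incomplete entries
                headers.put(name, value);
                added++;
            }
        }
        return added;
    }
}
```

With properties ua.name=User-Agent and ua.value=Sketch/1.0, the token "ua" adds a single "User-Agent" header.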

close

public void close()
Gracefully closes this HTTP request when the user is done with it.

The user can either call this method or close on the input stream obtained from the getInputStream method -- the results are the same.

When all the response data is read from the input stream, the input stream is automatically closed (recycled). If the user is not going to read all the response data from input stream, the user must call close to release the resources associated with the open request. Otherwise the program may consume all available sockets, waiting forever for the user to finish reading.

Note that the input stream is automatically closed if the input stream throws an exception while reading.

In order to interrupt a pending I/O operation in another thread (for example, to stop a request that is taking too long), the user should call disconnect or interrupt the blocked thread. The user should not call close in this case because close will not interrupt the pending I/O operation.

Closing the request multiple times is allowed.

In order to make sure that open sockets are not left lying around the user should use code similar to the following:

 OutputStream out = ...
 HttpRequest http = new HttpRequest("http://bob.com/index.html");
 try {
     HttpInputStream in = http.getInputStream();
     in.copyTo(out);
 } finally {
     // Copying to "out" could have failed.  Close "http" in case
     // not all the data has been read from it yet.
     http.close();
 }
 

connect

public void connect()
Connect to the target host (or proxy), send the request, and read the response headers. Any setup routines must be called before the call to this method, and routines to examine the result must be called after this method.

Throws:
  UnknownHostException - if the target host (or proxy) could not be contacted.
  IOException - if there is a problem writing the HTTP request or reading the HTTP response headers.

disconnect

public void disconnect()
Interrupts this HTTP request. Can be used to halt an in-progress HTTP request from another thread, by causing it to throw an InterruptedIOException during the connect or while reading from the input stream, depending upon what state this HTTP request is in when it is disconnected.

See Also: HttpRequest

getContent

public String getContent(String encoding)
Get the content as a string. Uses the character encoding specified in the HTTP headers if available. Otherwise the supplied encoding is used, or (if encoding is null), the platform default encoding.

Parameters: encoding The ISO character encoding to use, if the encoding can't be determined by context.

Returns: The content as a string.
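
The fallback order (header encoding, then the supplied encoding, then the platform default) can be sketched as below. It assumes the "encoding specified in the HTTP headers" is the charset parameter of the "Content-Type" header, which this page does not state explicitly; CharsetSketch and pick are hypothetical names.

```java
import java.nio.charset.Charset;
import java.util.Locale;

/** Illustrative only: chooses a charset using the documented fallback order. */
final class CharsetSketch {
    /** contentType and fallback may each be null. */
    static Charset pick(String contentType, String fallback) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Unknown or malformed charset name: fall through.
                    }
                }
            }
        }
        return (fallback != null) ? Charset.forName(fallback) : Charset.defaultCharset();
    }
}
```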

getContent

public String getContent()
Return the content as a string.

getContentLength

public int getContentLength()
Convenience method to get the "Content-Length" header from the HTTP response.

If this method is called, it must be called after connect has been called. Otherwise the information is not available and this method will return -1.

Returns: The content length specified in the response headers, or -1 if the length was not specified or malformed (not a number).

See Also: HttpRequest

getEncoding

public String getEncoding()

getInputStream

public HttpInputStream getInputStream()
Gets an input stream that can be used to read the body of the HTTP response. Unlike the other convenience methods for accessing the HTTP response, this one automatically connects to the target host if not already connected.

The input stream that getInputStream provides automatically hides the differences between "Content-Length", no "Content-Length", and "chunked" for HTTP/1.0 and HTTP/1.1 responses. In all cases, the user can simply read until reaching the end of the input stream, which signifies that all the available data from this request has been read. (If reading from a "chunked" source, the data is automatically de-chunked as it is presented to the user. There is no way to access the raw underlying stream that contains the HTTP/1.1 chunking packets.)

Throws: IOException if there is problem connecting to the target.

See Also: HttpRequest

getOutputStream

public OutputStream getOutputStream()
Gets an output stream that can be used for uploading data to the host.

If this method is called, it must be called before connect is called. Otherwise it will have no effect.

Currently the implementation is not as good as it could be. The user should avoid uploading huge amounts of data, for some definition of huge.

getResponseCode

public int getResponseCode()
Gets the HTTP response status code. From responses like:
 HTTP/1.0 200 OK
 HTTP/1.0 401 Unauthorized
 
this method extracts the integers 200 and 401 respectively. Returns -1 if the response status code was malformed.

If this method is called, it must be called after connect has been called. Otherwise the information is not yet available and this method will return -1.

For advanced features, the user can directly access the status variable.

Returns: The integer status code from the HTTP response.

See Also: HttpRequest status
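
The extraction of the integer code from a status line can be illustrated with a few lines of stand-alone Java. This is a sketch of the documented behavior, not the library's parser; StatusLineSketch and responseCode are hypothetical names.

```java
/** Illustrative only: pulls the numeric code out of an HTTP status line. */
final class StatusLineSketch {
    /** Returns the status code, or -1 if the line is missing or malformed. */
    static int responseCode(String statusLine) {
        if (statusLine == null) {
            return -1;              // nothing read yet
        }
        String[] parts = statusLine.trim().split("\\s+");
        if (parts.length < 2) {
            return -1;              // no code field after the version
        }
        try {
            return Integer.parseInt(parts[1]);
        } catch (NumberFormatException e) {
            return -1;              // code field is not a number
        }
    }
}
```

For code that needs the reason phrase or the server's HTTP version, the raw status variable remains the place to look.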

getResponseHeader

public String getResponseHeader(String key)
Gets the value associated with the given case-insensitive header name from the HTTP response.

If this method is called, it must be called after connect has been called. Otherwise the information is not available and this method will return null.

For advanced features, such as enumerating over all response headers, the user should directly access the responseHeaders variable.

Parameters: key The case-insensitive name of the response header.

Returns: The value associated with the given name, or null if there is no such header in the response.

See Also: HttpRequest responseHeaders

main

public static void main(String[] args)
Grab http document(s) and save them in the filesystem. This is a simple batch HTTP url fetcher. Usage:
 java ... sunlabs.brazil.util.http.HttpRequest [-v(erbose)] [-h(eaders)] [-p] url...
 
-v
Verbose. Print the target URL and destination file on stderr.
-h
Print all the HTTP headers on stderr.
-phttp://proxyhost:port
The following URLs are to be fetched via a proxy.

The options and URLs may be given in any order. Use "-p" by itself to disable the proxy for all following requests.

There are many limitations: only HTTP GET requests are supported; the output filename is derived automatically from the URL and can't be overridden; and if a destination file already exists, it is overwritten.

removePointToPointHeaders

public static void removePointToPointHeaders(MimeHeaders headers, boolean response)
Removes all the point-to-point (hop-by-hop) headers from the given mime headers.

Parameters:
  headers - The mime headers to be modified.
  response - true to remove the point-to-point response headers, false to remove the point-to-point request headers.

See Also: RFC 2068
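
As a rough illustration of hop-by-hop filtering, the sketch below strips the headers that RFC 2616 (the successor to RFC 2068) lists as hop-by-hop. The exact set this method removes, and how it differs between requests and responses, is not given on this page, so the list here is an assumption; HopByHopSketch and strip are hypothetical names, and a Map stands in for MimeHeaders.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: removes hop-by-hop headers as listed in RFC 2616. */
final class HopByHopSketch {
    private static final String[] HOP_BY_HOP = {
        "Connection", "Keep-Alive", "Proxy-Authenticate", "Proxy-Authorization",
        "TE", "Trailers", "Transfer-Encoding", "Upgrade"
    };

    /** Returns a case-insensitive copy of headers without hop-by-hop entries. */
    static Map<String, String> strip(Map<String, String> headers) {
        Map<String, String> out = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        out.putAll(headers);
        for (String name : HOP_BY_HOP) {
            out.remove(name);       // matches regardless of letter case
        }
        return out;
    }
}
```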

setMethod

public void setMethod(String method)
Sets the HTTP method to the specified value. Some of the normal HTTP methods are "GET", "POST", "HEAD", "PUT", "DELETE", but the user can set the method to any value desired.

If this method is called, it must be called before connect is called. Otherwise it will have no effect.

Parameters: method The string for the HTTP method, or null to allow this HttpRequest to pick the method for itself.

setProxy

public void setProxy(String proxyHost, int proxyPort)
Sets the proxy for this request. The HTTP proxy request will be sent to the specified proxy host.

If this method is called, it must be called before connect is called. Otherwise it will have no effect.

Parameters:
  proxyHost - The proxy that will handle the request, or null to not use a proxy.
  proxyPort - The port on the proxy, for the proxy request. Ignored if proxyHost is null.

setRequestHeader

public void setRequestHeader(String key, String value)
Sets a request header in the HTTP request that will be issued. In order to do fancier things like appending a value to an existing request header, the user may directly access the requestHeaders variable.

If this method is called, it must be called before connect is called. Otherwise it will have no effect.

Parameters:
  key - The header name.
  value - The value for the request header.

See Also: requestHeaders