org.htmlparser.lexer

Class InputStreamSource

public class InputStreamSource extends Source

A source of characters based on an InputStream, such as one obtained from a URLConnection.
Field Summary
static int BUFFER_SIZE
An initial buffer size.
protected char[] mBuffer
The characters read so far.
protected String mEncoding
The character set in use.
protected int mLevel
The number of valid characters in the buffer.
protected int mMark
The bookmark set by mark().
protected int mOffset
The offset of the next character returned by read().
protected InputStreamReader mReader
The converter from bytes to characters.
protected InputStream mStream
The stream of bytes.
Constructor Summary
InputStreamSource(InputStream stream)
Create a source of characters using the default character set.
InputStreamSource(InputStream stream, String charset)
Create a source of characters.
InputStreamSource(InputStream stream, String charset, int size)
Create a source of characters.
Method Summary
int available()
Get the number of available characters.
void close()
Does nothing.
void destroy()
Close the source.
protected void fill(int min)
Fetch more characters from the underlying reader.
char getCharacter(int offset)
Retrieve a character again.
void getCharacters(char[] array, int offset, int start, int end)
Retrieve characters again.
void getCharacters(StringBuffer buffer, int offset, int length)
Append characters already read into a StringBuffer.
String getEncoding()
Get the encoding being used to convert characters.
InputStream getStream()
Get the input stream being used.
String getString(int offset, int length)
Retrieve a string.
void mark(int readAheadLimit)
Mark the present position in the source.
boolean markSupported()
Tell whether this source supports the mark() operation.
int offset()
Get the position (in characters).
int read()
Read a single character.
int read(char[] cbuf, int off, int len)
Read characters into a portion of an array.
int read(char[] cbuf)
Read characters into an array.
boolean ready()
Tell whether this source is ready to be read.
void reset()
Reset the source.
void setEncoding(String character_set)
Begins reading from the source with the given character set.
long skip(long n)
Skip characters.
void unread()
Undo the read of a single character.

Field Detail

BUFFER_SIZE

public static int BUFFER_SIZE
An initial buffer size. Has a default value of 16384.

mBuffer

protected char[] mBuffer
The characters read so far.

mEncoding

protected String mEncoding
The character set in use.

mLevel

protected int mLevel
The number of valid characters in the buffer.

mMark

protected int mMark
The bookmark set by mark().

mOffset

protected int mOffset
The offset of the next character returned by read().

mReader

protected transient InputStreamReader mReader
The converter from bytes to characters.

mStream

protected transient InputStream mStream
The stream of bytes. Set to null when the source is closed.

Constructor Detail

InputStreamSource

public InputStreamSource(InputStream stream)
Create a source of characters using the default character set.

Parameters: stream The stream of bytes to use.

Throws: UnsupportedEncodingException If the default character set is unsupported.

InputStreamSource

public InputStreamSource(InputStream stream, String charset)
Create a source of characters.

Parameters: stream The stream of bytes to use. charset The character set used in encoding the stream.

Throws: UnsupportedEncodingException If the character set is unsupported.

InputStreamSource

public InputStreamSource(InputStream stream, String charset, int size)
Create a source of characters.

Parameters: stream The stream of bytes to use. charset The character set used in encoding the stream. size The initial character buffer size.

Throws: UnsupportedEncodingException If the character set is unsupported.
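The constructors presumably wrap the stream in an InputStreamReader (the mReader field), and the UnsupportedEncodingException in their Throws clauses matches what the JDK's InputStreamReader(InputStream, String) constructor throws for an unknown charset name. The sketch below shows only that JDK behavior; ReaderSetupSketch and its methods are hypothetical helpers, not part of HTML Parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;

// Hypothetical helper illustrating the byte-to-character conversion layer.
// Not HTML Parser code.
class ReaderSetupSketch {
    // The JDK constructor throws UnsupportedEncodingException for an unknown name.
    static InputStreamReader open(InputStream stream, String charset)
            throws UnsupportedEncodingException {
        return new InputStreamReader(stream, charset);
    }

    // Decodes the whole stream, reading in chunks of the documented BUFFER_SIZE default.
    static String readAll(InputStreamReader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[16384];
        int n;
        while ((n = reader.read(buf)) != -1)
            sb.append(buf, 0, n);
        return sb.toString();
    }
}
```

Passing a name such as "no-such-charset" fails at construction time, before any byte is read.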

Method Detail

available

public int available()
Get the number of available characters.

Returns: The number of characters that can be read without blocking, or zero if the source is closed.

close

public void close()
Does nothing. It is supposed to close the source, but destroy() should be used instead.

Throws: IOException Not used.

See Also: destroy()

destroy

public void destroy()
Close the source. Once a source has been closed, further read(), ready(), mark(), reset(), skip(), unread(), getCharacter() or getString() invocations will throw an IOException. Closing a previously closed source, however, has no effect.

Throws: IOException If an I/O error occurs
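The split between close() and destroy() can be modeled with a hypothetical class, ClosableSourceSketch (not part of HTML Parser). It assumes, following the mStream field description, that a null stream marks a closed source: close() does nothing, destroy() closes and drops the stream, a repeated destroy() is harmless, and later reads fail with an IOException. Bytes stand in for characters to keep it short.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of the close()/destroy() contract; not HTML Parser code.
class ClosableSourceSketch {
    private InputStream stream; // null once destroyed, like the mStream field

    ClosableSourceSketch(InputStream stream) {
        this.stream = stream;
    }

    // Deliberately a no-op, matching the documented close().
    void close() {
    }

    // Closes the underlying stream; calling it again has no effect.
    void destroy() throws IOException {
        if (stream != null) {
            InputStream s = stream;
            stream = null;
            s.close();
        }
    }

    // Reads after destroy() report the closed state with an IOException.
    int read() throws IOException {
        if (stream == null)
            throw new IOException("source is closed");
        return stream.read();
    }

    boolean isClosed() {
        return stream == null;
    }
}
```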

fill

protected void fill(int min)
Fetch more characters from the underlying reader. Has no effect if the underlying reader has been drained.

Parameters: min The minimum number of characters to read.

Throws: IOException If the underlying reader's read() throws one.
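A fill(min) step of this kind can be sketched as: grow the character buffer if it cannot hold min more characters, then read until at least min more are buffered or the reader is drained. FillSketch, its fields, and the doubling growth policy are assumptions for illustration, not HTML Parser's implementation.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Arrays;

// Hypothetical sketch of buffer filling; not HTML Parser code.
class FillSketch {
    char[] buffer;
    int level;     // number of valid characters in buffer
    Reader reader; // set to null once drained

    FillSketch(Reader reader, int initialSize) {
        this.reader = reader;
        this.buffer = new char[initialSize];
    }

    void fill(int min) throws IOException {
        if (reader == null)
            return; // already drained: no effect
        if (buffer.length - level < min) {
            // Grow to at least level + min; doubling keeps reallocations rare.
            int size = Math.max(buffer.length * 2, level + min);
            buffer = Arrays.copyOf(buffer, size);
        }
        int wanted = level + min;
        while (level < wanted) {
            int n = reader.read(buffer, level, buffer.length - level);
            if (n == -1) {
                reader.close();
                reader = null; // drained
                break;
            }
            level += n;
        }
    }
}
```

A single read may return more than min characters; the loop only guarantees the lower bound.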

getCharacter

public char getCharacter(int offset)
Retrieve a character again.

Parameters: offset The offset of the character.

Returns: The character at offset.

Throws: IOException If the offset is beyond offset() or the source is closed.

getCharacters

public void getCharacters(char[] array, int offset, int start, int end)
Retrieve characters again.

Parameters: array The array of characters. offset The starting position in the array where characters are to be placed. start The starting position, zero based. end The ending position (exclusive, i.e. the character at the ending position is not included), zero based.

Throws: IOException If the start or end is beyond offset() or the source is closed.

getCharacters

public void getCharacters(StringBuffer buffer, int offset, int length)
Append characters already read into a StringBuffer.

Parameters: buffer The buffer to append to. offset The offset of the first character. length The number of characters to retrieve.

Throws: IOException If the offset or (offset + length) is beyond offset() or the source is closed.

getEncoding

public String getEncoding()
Get the encoding being used to convert characters.

Returns: The current encoding.

getStream

public InputStream getStream()
Get the input stream being used.

Returns: The current input stream.

getString

public String getString(int offset, int length)
Retrieve a string.

Parameters: offset The offset of the first character. length The number of characters to retrieve.

Returns: A string containing the length characters starting at offset.

Throws: IOException If the offset or (offset + length) is beyond offset() or the source is closed.
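The "retrieve again" methods work because characters already read stay in the buffer, so they can be copied out by position as long as the requested range ends at or before the current read offset. RetrieveSketch below is a hypothetical in-memory model of that idea (not HTML Parser code); it omits the closed-source check and uses exclusive end positions as documented for getCharacters(char[], int, int, int).

```java
import java.io.IOException;

// Hypothetical sketch of re-reading buffered characters; not HTML Parser code.
class RetrieveSketch {
    private final char[] buffer;
    private int offset; // number of characters read so far

    RetrieveSketch(String text) {
        buffer = text.toCharArray();
    }

    int read() {
        return offset < buffer.length ? buffer[offset++] : -1;
    }

    char getCharacter(int at) throws IOException {
        if (at < 0 || at >= offset)
            throw new IOException("character " + at + " has not been read yet");
        return buffer[at];
    }

    // end is exclusive: the character at position end is not copied.
    void getCharacters(char[] array, int arrayOffset, int start, int end) throws IOException {
        if (start < 0 || end < start || end > offset)
            throw new IOException("range [" + start + ", " + end + ") not yet read");
        System.arraycopy(buffer, start, array, arrayOffset, end - start);
    }

    void getCharacters(StringBuffer out, int at, int length) throws IOException {
        if (at < 0 || length < 0 || at + length > offset)
            throw new IOException("range beyond characters read");
        out.append(buffer, at, length);
    }

    String getString(int at, int length) throws IOException {
        StringBuffer sb = new StringBuffer();
        getCharacters(sb, at, length);
        return sb.toString();
    }
}
```

For example, after four read() calls on "<html>", getString(1, 3) returns "htm", while getString(2, 3) fails because position 4 has not been read.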

mark

public void mark(int readAheadLimit)
Mark the present position in the source. Subsequent calls to reset() will attempt to reposition the source to this point.

Parameters: readAheadLimit Not used.

Throws: IOException If the source is closed.

markSupported

public boolean markSupported()
Tell whether this source supports the mark() operation.

Returns: true.

offset

public int offset()
Get the position (in characters).

Returns: The number of characters that have already been read, or EOF if the source is closed.

read

public int read()
Read a single character. This method will block until a character is available, an I/O error occurs, or the end of the stream is reached.

Returns: The character read, as an integer in the range 0 to 65535 (0x00-0xffff), or EOF if the end of the stream has been reached.

Throws: IOException If an I/O error occurs.

read

public int read(char[] cbuf, int off, int len)
Read characters into a portion of an array. This method will block until some input is available, an I/O error occurs, or the end of the stream is reached.

Parameters: cbuf Destination buffer. off Offset at which to start storing characters. len Maximum number of characters to read.

Returns: The number of characters read, or EOF if the end of the stream has been reached.

Throws: IOException If an I/O error occurs.

read

public int read(char[] cbuf)
Read characters into an array. This method will block until some input is available, an I/O error occurs, or the end of the stream is reached.

Parameters: cbuf Destination buffer.

Returns: The number of characters read, or EOF if the end of the stream has been reached.

Throws: IOException If an I/O error occurs.
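The read methods follow the familiar java.io.Reader pattern, so a caller typically drains a source with a loop that stops at EOF. The sketch below assumes EOF is -1 (the java.io.Reader convention) and uses a StringReader as a stand-in for the source; ReadLoopSketch and drain() are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;

// Drains a Reader-style source with read(char[], int, int) until EOF.
// EOF = -1 is an assumption matching java.io.Reader.
class ReadLoopSketch {
    static final int EOF = -1;

    // chunk must be positive.
    static String drain(Reader source, int chunk) throws IOException {
        StringBuilder out = new StringBuilder();
        char[] cbuf = new char[chunk];
        int n;
        while ((n = source.read(cbuf, 0, cbuf.length)) != EOF)
            out.append(cbuf, 0, n); // append only the n characters actually read
        return out.toString();
    }
}
```

Appending cbuf in full rather than the first n characters is a common bug, since the final read usually fills only part of the array.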

ready

public boolean ready()
Tell whether this source is ready to be read.

Returns: true if the next read() is guaranteed not to block for input, false otherwise. Note that returning false does not guarantee that the next read will block.

Throws: IOException If the source is closed.

reset

public void reset()
Reset the source. Repositions the read point to the position saved by the most recent mark() call, or to zero if no mark has been set.

Throws: IllegalStateException If the source has been closed.
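The mark() and reset() descriptions can be modeled with a hypothetical in-memory class, MarkResetSketch (not HTML Parser code). It assumes reset() returns to the most recent mark and falls back to offset zero when no mark has been set. Because every character read is retained, the readAheadLimit argument can be ignored, as it is documented to be. The closed-source check is omitted.

```java
// Hypothetical sketch of mark()/reset() over a retained buffer; not HTML Parser code.
class MarkResetSketch {
    private final char[] buffer;
    private int offset;
    private int mark = -1; // -1 means no mark has been set

    MarkResetSketch(String text) {
        buffer = text.toCharArray();
    }

    int read() {
        return offset < buffer.length ? buffer[offset++] : -1;
    }

    void mark(int readAheadLimit) {
        mark = offset; // readAheadLimit is unused: nothing is ever discarded
    }

    // Return to the mark, or to the start if mark() was never called.
    void reset() {
        offset = (mark == -1) ? 0 : mark;
    }

    int offset() {
        return offset;
    }
}
```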

setEncoding

public void setEncoding(String character_set)
Begins reading from the source with the given character set. If the current encoding is the same as the requested encoding, this method is a no-op. Otherwise, any subsequent characters read from this source will be decoded using the given character set.

If characters have already been consumed from this source, extra work is needed to obtain this result. Since a Reader cannot be dynamically altered to use a different character set, the underlying stream is reset, a new Source is constructed, and the characters read so far are compared with the newly decoded characters up to the current position. If a difference is encountered, or some other problem occurs, an exception is thrown.

Parameters: character_set The character set to use to convert bytes into characters.

Throws: ParserException If a character mismatch occurs between characters already provided and those that would have been returned had the new character set been in effect from the beginning. An exception is also thrown if the underlying stream cannot be reset and re-read.
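The consistency check behind setEncoding() can be sketched by decoding the same bytes with both character sets and comparing the characters already handed out. This is a simplified, hypothetical model: EncodingSwitchSketch holds all bytes in memory instead of resetting a stream, and it throws IllegalStateException where the real method is documented to throw ParserException.

```java
import java.nio.charset.Charset;

// Hypothetical sketch of the setEncoding() comparison; not HTML Parser code.
class EncodingSwitchSketch {
    // Returns raw decoded with newCharset, or throws if any of the first
    // `consumed` characters differ from those produced with oldCharset.
    static String switchEncoding(byte[] raw, String oldCharset, String newCharset, int consumed) {
        String before = new String(raw, Charset.forName(oldCharset));
        if (Charset.forName(oldCharset).equals(Charset.forName(newCharset)))
            return before; // same encoding: nothing to do
        String after = new String(raw, Charset.forName(newCharset));
        if (consumed > before.length() || consumed > after.length())
            throw new IllegalStateException("fewer than " + consumed + " characters available");
        for (int i = 0; i < consumed; i++) {
            if (before.charAt(i) != after.charAt(i))
                throw new IllegalStateException("character mismatch at offset " + i);
        }
        return after;
    }
}
```

This is why switching encodings after reading an ASCII-only META tag usually succeeds: the bytes consumed so far decode identically under ISO-8859-1 and UTF-8, and only later, non-ASCII bytes differ.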

skip

public long skip(long n)
Skip characters. This method will block until some characters are available, an I/O error occurs, or the end of the stream is reached. Note: n is treated as an int.

Parameters: n The number of characters to skip.

Returns: The number of characters actually skipped.

Throws: IllegalArgumentException If n is negative. IOException If an I/O error occurs.
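The skip() contract (reject negative counts, narrow n to an int, return how many characters were actually skipped) can be illustrated with a hypothetical in-memory class, SkipSketch. Clamping n to Integer.MAX_VALUE before narrowing is an assumption of this sketch; the documentation only says n is treated as an int.

```java
// Hypothetical sketch of skip(n) over buffered characters; not HTML Parser code.
class SkipSketch {
    private final char[] buffer;
    private int offset;

    SkipSketch(String text) {
        buffer = text.toCharArray();
    }

    long skip(long n) {
        if (n < 0)
            throw new IllegalArgumentException("negative skip: " + n);
        int count = (int) Math.min(n, Integer.MAX_VALUE); // treat n as an int
        int skipped = Math.min(count, buffer.length - offset); // stop at end of input
        offset += skipped;
        return skipped;
    }

    int offset() {
        return offset;
    }
}
```

Near the end of input the return value is smaller than n, so callers should use it rather than assume n characters were skipped.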

unread

public void unread()
Undo the read of a single character.

Throws: IOException If the source is closed or no characters have been read.
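Since characters are retained, unread() can simply step the read offset back by one, so the next read() returns the previous character again; this is handy for a lexer that peeks one character ahead. UnreadSketch is a hypothetical in-memory model (not HTML Parser code) and omits the closed-source check.

```java
import java.io.IOException;

// Hypothetical sketch of unread(); not HTML Parser code.
class UnreadSketch {
    private final char[] buffer;
    private int offset;

    UnreadSketch(String text) {
        buffer = text.toCharArray();
    }

    int read() {
        return offset < buffer.length ? buffer[offset++] : -1;
    }

    // Step back one character; fails if nothing has been read.
    void unread() throws IOException {
        if (offset <= 0)
            throw new IOException("no characters have been read");
        offset--;
    }
}
```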

HTML Parser is an open source library released under LGPL.