org.htmlparser.lexer

Class PageAttribute

public class PageAttribute extends Attribute

An attribute within a tag on a page. This attribute is similar to Attribute but 'lazy loaded' from the Page by providing the page and cursor offsets into the page for the name and value. This is done for speed, since if the name and value are not needed we can avoid the cost and memory overhead of creating the strings.

Thus the property getters defer to the base class unless the property is null, in which case an attempt is made to read it from the underlying page. Optimizations in the predicates and the length calculation defer the actual instantiation of strings until absolutely needed.
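The lazy-loading idea described above can be sketched as follows. This is a minimal illustration and not the library's implementation: the class LazyNameSketch, its materializations counter, the use of a CharSequence in place of Page, and the treatment of the end offset as exclusive are all assumptions made for the example.

```java
/** Hypothetical stand-in for a lazily loaded attribute; not the library's class. */
final class LazyNameSketch {
    private final CharSequence page; // stands in for the Page (assumption)
    private final int nameStart;     // a negative start means "no name"
    private final int nameEnd;       // treated as an exclusive end offset (assumption)
    private String name;             // cached once it has been materialized
    int materializations = 0;        // counts how often a string was actually built

    LazyNameSketch(CharSequence page, int nameStart, int nameEnd) {
        this.page = page;
        this.nameStart = nameStart;
        this.nameEnd = nameEnd;
    }

    /** Returns the cached name, reading it from the page only on first use. */
    String getName() {
        if (name == null && page != null && nameStart >= 0) {
            name = page.subSequence(nameStart, nameEnd).toString();
            materializations++;
        }
        return name;
    }

    public static void main(String[] args) {
        String html = "<a href=\"index.html\">";
        LazyNameSketch attribute = new LazyNameSketch(html, 3, 7);
        System.out.println(attribute.getName());       // first call builds the string
        System.out.println(attribute.getName());       // second call reuses the cache
        System.out.println(attribute.materializations);
    }
}
```

If getName() is never called, no string is created for the name at all, which is the saving the class description refers to.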

Field Summary
protected int mNameEnd
The ending offset of the name within the page.
protected int mNameStart
The starting offset of the name within the page.
protected Page mPage
The page this attribute is extracted from.
protected int mValueEnd
The ending offset of the value within the page.
protected int mValueStart
The starting offset of the value within the page.
Constructor Summary
PageAttribute(Page page, int name_start, int name_end, int value_start, int value_end, char quote)
Create an attribute.
PageAttribute(String name, String assignment, String value, char quote)
Create an attribute with the name, assignment string, value and quote given.
PageAttribute(String name, String value, char quote)
Create an attribute with the name, value and quote given.
PageAttribute(String value)
Create a whitespace attribute with the value given.
PageAttribute(String name, String value)
Create an attribute with the name and value given.
PageAttribute(String name, String assignment, String value)
Create an attribute with the name, assignment string and value given.
PageAttribute()
Create an empty attribute.
Method Summary
String getAssignment()
Get the assignment string of this attribute.
void getAssignment(StringBuffer buffer)
Get the assignment string of this attribute.
int getLength()
Get the length of the string value of this attribute.
String getName()
Get the name of this attribute.
void getName(StringBuffer buffer)
Get the name of this attribute.
int getNameEndPosition()
Get the ending position of the attribute name.
int getNameStartPosition()
Get the starting position of the attribute name.
Page getPage()
Get the page this attribute is anchored to, if any.
String getRawValue()
Get the raw value of the attribute.
void getRawValue(StringBuffer buffer)
Get the raw value of the attribute.
String getValue()
Get the value of the attribute.
void getValue(StringBuffer buffer)
Get the value of the attribute.
int getValueEndPosition()
Get the ending position of the attribute value.
int getValueStartPosition()
Get the starting position of the attribute value.
boolean isEmpty()
Predicate to determine if this attribute has an equals sign but no value.
boolean isStandAlone()
Predicate to determine if this attribute has no equals sign (or value).
boolean isValued()
Predicate to determine if this attribute has a value.
boolean isWhitespace()
Predicate to determine if this attribute is whitespace.
void setNameEndPosition(int end)
Set the ending position of the attribute name.
void setNameStartPosition(int start)
Set the starting position of the attribute name.
void setPage(Page page)
Set the page this attribute is anchored to.
void setValueEndPosition(int end)
Set the ending position of the attribute value.
void setValueStartPosition(int start)
Set the starting position of the attribute value.

Field Detail

mNameEnd

protected int mNameEnd
The ending offset of the name within the page.

mNameStart

protected int mNameStart
The starting offset of the name within the page. If negative, the name is considered null.

mPage

protected Page mPage
The page this attribute is extracted from.

mValueEnd

protected int mValueEnd
The ending offset of the value within the page.

mValueStart

protected int mValueStart
The starting offset of the value within the page. If negative, the value is considered null.

Constructor Detail

PageAttribute

public PageAttribute(Page page, int name_start, int name_end, int value_start, int value_end, char quote)
Create an attribute.

Parameters:
page - The page containing the attribute.
name_start - The starting offset of the name within the page. If this is negative, the name is considered null.
name_end - The ending offset of the name within the page.
value_start - The starting offset of the value within the page. If this is negative, the value is considered null.
value_end - The ending offset of the value within the page.
quote - The quote, if any, surrounding the value of the attribute (i.e. ' or "), or zero if none.

PageAttribute

public PageAttribute(String name, String assignment, String value, char quote)
Create an attribute with the name, assignment string, value and quote given. If the quote value is zero, assigns the value using setRawValue, which sets the quote character to a proper value if necessary.

Parameters:
name - The name of this attribute.
assignment - The assignment string of this attribute.
value - The value of this attribute.
quote - The quote around the value of this attribute.

PageAttribute

public PageAttribute(String name, String value, char quote)
Create an attribute with the name, value and quote given. Uses an equals sign as the assignment string if the value is not null, and calls setRawValue to get the correct quoting if quote is zero.

Parameters:
name - The name of this attribute.
value - The value of this attribute.
quote - The quote around the value of this attribute.

PageAttribute

public PageAttribute(String value)
Create a whitespace attribute with the value given.

Parameters: value - The value of this attribute.

Throws: IllegalArgumentException - if the value contains anything other than whitespace. To set a real value use PageAttribute(String, String).
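The whitespace-only rule for this constructor could be enforced along the following lines. This is a hedged sketch: WhitespaceCheckSketch and requireWhitespace are hypothetical names, and using Character.isWhitespace as the definition of whitespace is an assumption, since the documentation does not say which characters the library accepts. A null argument is not handled here.

```java
/** Hypothetical helper illustrating a whitespace-only check; not library code. */
final class WhitespaceCheckSketch {
    /** Returns the value unchanged, or throws if any character is not whitespace. */
    static String requireWhitespace(String value) {
        for (int i = 0; i < value.length(); i++) {
            // Character.isWhitespace is an assumed definition of "whitespace".
            if (!Character.isWhitespace(value.charAt(i))) {
                throw new IllegalArgumentException(
                    "non-whitespace character at index " + i);
            }
        }
        return value;
    }

    public static void main(String[] args) {
        requireWhitespace(" \t\n");      // accepted
        try {
            requireWhitespace(" a ");    // rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```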

PageAttribute

public PageAttribute(String name, String value)
Create an attribute with the name and value given. Uses an equals sign as the assignment string if the value is not null, and calls setRawValue to get the correct quoting.

Parameters:
name - The name of this attribute.
value - The value of this attribute.
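The constructors that take no quote character rely on setRawValue to choose one. The documentation does not spell out that policy, so the sketch below shows only one plausible way to pick a quote. QuoteChoiceSketch, chooseQuote and quote are hypothetical names, and the set of characters that force quoting, as well as the &quot; replacement, are assumptions.

```java
/** Hypothetical quoting policy; one plausible choice, not the library's documented rule. */
final class QuoteChoiceSketch {
    /** Returns 0 if no quote is needed, otherwise '"' or '\''. */
    static char chooseQuote(String value) {
        boolean needsQuote = value.isEmpty(); // an empty value must be written as ""
        boolean hasSingle = false;
        boolean hasDouble = false;
        for (char c : value.toCharArray()) {
            if (c == '\'') {
                hasSingle = true;
                needsQuote = true;
            } else if (c == '"') {
                hasDouble = true;
                needsQuote = true;
            } else if (Character.isWhitespace(c) || c == '<' || c == '>' || c == '=') {
                needsQuote = true; // assumed set of characters that cannot appear unquoted
            }
        }
        if (!needsQuote) {
            return 0;
        }
        // Prefer double quotes unless the value contains only double quotes.
        return (hasDouble && !hasSingle) ? '\'' : '"';
    }

    /** Wraps the value in the chosen quote, escaping double quotes when both kinds occur. */
    static String quote(String value) {
        char q = chooseQuote(value);
        if (q == 0) {
            return value;
        }
        if (q == '"') {
            value = value.replace("\"", "&quot;"); // no-op when there are no double quotes
        }
        return q + value + q;
    }

    public static void main(String[] args) {
        System.out.println(quote("index.html"));
        System.out.println(quote("two words"));
        System.out.println(quote("say \"hi\""));
    }
}
```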

PageAttribute

public PageAttribute(String name, String assignment, String value)
Create an attribute with the name, assignment string and value given. Calls setRawValue to get the correct quoting.

Parameters:
name - The name of this attribute.
assignment - The assignment string of this attribute.
value - The value of this attribute.

PageAttribute

public PageAttribute()
Create an empty attribute. This yields an empty string ("") from the toString() and toString(StringBuffer) methods.

Method Detail

getAssignment

public String getAssignment()
Get the assignment string of this attribute. This is usually just an equals sign, but in poorly formed attributes it can include whitespace on either or both sides of an equals sign.

Returns: The assignment string.

getAssignment

public void getAssignment(StringBuffer buffer)
Get the assignment string of this attribute.

Parameters: buffer - The buffer to place the assignment string in.

See Also: getAssignment

getLength

public int getLength()
Get the length of the string value of this attribute.

Returns: The number of characters required to express this attribute.
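Because the name and value are known by their offsets, the length can in principle be computed without creating any strings. The sketch below illustrates that idea only. OffsetLengthSketch is a hypothetical helper, and it assumes that end offsets are exclusive and that the value offsets do not include the surrounding quotes. Neither is confirmed by this page.

```java
/** Hypothetical length calculation from offsets; not the library's implementation. */
final class OffsetLengthSketch {
    static int length(int nameStart, int nameEnd, String assignment,
                      int valueStart, int valueEnd, char quote) {
        int length = 0;
        if (nameStart >= 0) {
            length += nameEnd - nameStart;   // name, measured without building a String
        }
        if (assignment != null) {
            length += assignment.length();   // usually just "="
        }
        if (valueStart >= 0) {
            length += valueEnd - valueStart; // value, again from offsets only
        }
        if (quote != 0) {
            length += 2;                     // opening and closing quote (assumption)
        }
        return length;
    }

    public static void main(String[] args) {
        // Offsets refer to: <a href="index.html">
        System.out.println(length(3, 7, "=", 9, 19, '"'));
    }
}
```

For the tag `<a href="index.html">` the attribute text `href="index.html"` is 17 characters long, which matches 4 + 1 + 10 + 2 under the stated assumptions.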

getName

public String getName()
Get the name of this attribute: the part before the equals sign, or the contents of a stand-alone attribute.

Returns: The name, or null if it's just a whitespace 'attribute'.

getName

public void getName(StringBuffer buffer)
Get the name of this attribute.

Parameters: buffer - The buffer to place the name in.

See Also: getName

getNameEndPosition

public int getNameEndPosition()
Get the ending position of the attribute name.

Returns: The offset into the page at which the name ends.

getNameStartPosition

public int getNameStartPosition()
Get the starting position of the attribute name.

Returns: The offset into the page at which the name begins.

getPage

public Page getPage()
Get the page this attribute is anchored to, if any.

Returns: The page used to construct this attribute, or null if this is just a regular attribute.

getRawValue

public String getRawValue()
Get the raw value of the attribute: the part after the equals sign, or the text if it's just a whitespace 'attribute'. This includes the quotes around the value, if any.

Returns: The value, or null if it's a stand-alone attribute, or the text if it's just a whitespace 'attribute'.

getRawValue

public void getRawValue(StringBuffer buffer)
Get the raw value of the attribute: the part after the equals sign, or the text if it's just a whitespace 'attribute'. This includes the quotes around the value, if any.

Parameters: buffer - The string buffer to append the attribute value to.

See Also: getRawValue

getValue

public String getValue()
Get the value of the attribute: the part after the equals sign, or the text if it's just a whitespace 'attribute'. NOTE: This does not include any quotes that may have enclosed the value when it was read. To get the un-stripped value use getRawValue().

Returns: The value, or null if it's a stand-alone or empty attribute, or the text if it's just a whitespace 'attribute'.
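The relationship between the value and the raw value, and between the String and StringBuffer overloads, can be illustrated with a small sketch. RawValueSketch and its methods are hypothetical and only model the behaviour described on this page: the raw value is the value with its surrounding quote character restored, and the buffer variant appends instead of returning.

```java
/** Hypothetical model of value versus raw value; not the library's code. */
final class RawValueSketch {
    /** Returns the value wrapped in its quote, or the value itself if quote is zero. */
    static String rawValue(String value, char quote) {
        if (value == null) {
            return null;          // stand-alone attribute: no value at all
        }
        if (quote == 0) {
            return value;         // unquoted value
        }
        return quote + value + quote;
    }

    /** Appends the raw value to the buffer, avoiding an intermediate String. */
    static void appendRawValue(StringBuffer buffer, String value, char quote) {
        if (value == null) {
            return;
        }
        if (quote != 0) {
            buffer.append(quote);
        }
        buffer.append(value);
        if (quote != 0) {
            buffer.append(quote);
        }
    }

    public static void main(String[] args) {
        System.out.println(rawValue("index.html", '"'));
        StringBuffer buffer = new StringBuffer("href=");
        appendRawValue(buffer, "index.html", '"');
        System.out.println(buffer);
    }
}
```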

getValue

public void getValue(StringBuffer buffer)
Get the value of the attribute.

Parameters: buffer - The buffer to place the value in.

See Also: getValue

getValueEndPosition

public int getValueEndPosition()
Get the ending position of the attribute value.

Returns: The offset into the page at which the value ends.

getValueStartPosition

public int getValueStartPosition()
Get the starting position of the attribute value.

Returns: The offset into the page at which the value begins.

isEmpty

public boolean isEmpty()
Predicate to determine if this attribute has an equals sign but no value.

Returns: true if this attribute is an empty attribute, false if it has an equals sign and a value.

isStandAlone

public boolean isStandAlone()
Predicate to determine if this attribute has no equals sign (or value).

Returns: true if this attribute is a standalone attribute, false if it has an equals sign.

isValued

public boolean isValued()
Predicate to determine if this attribute has a value.

Returns: true if this attribute has a value. false if it is empty or standalone.

isWhitespace

public boolean isWhitespace()
Predicate to determine if this attribute is whitespace.

Returns: true if this attribute is whitespace, false if it is a real attribute.
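Taken together, the four predicates partition attributes by which of the name, assignment and value are present. The sketch below expresses that reading of the descriptions above. AttributeKindSketch, Kind and classify are hypothetical names, and deciding purely on null checks is an assumption about how the predicates behave.

```java
/** Hypothetical classification mirroring the four predicates; not library code. */
final class AttributeKindSketch {
    enum Kind { WHITESPACE, STAND_ALONE, EMPTY, VALUED }

    static Kind classify(String name, String assignment, String value) {
        if (name == null) {
            return Kind.WHITESPACE;   // no name: just whitespace between attributes
        }
        if (assignment == null) {
            return Kind.STAND_ALONE;  // e.g. selected in <option selected>
        }
        if (value == null) {
            return Kind.EMPTY;        // equals sign present but nothing after it
        }
        return Kind.VALUED;           // e.g. href="index.html"
    }

    public static void main(String[] args) {
        System.out.println(classify(null, null, " "));
        System.out.println(classify("selected", null, null));
        System.out.println(classify("value", "=", null));
        System.out.println(classify("href", "=", "index.html"));
    }
}
```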

setNameEndPosition

public void setNameEndPosition(int end)
Set the ending position of the attribute name.

Parameters: end - The new offset into the page at which the name ends.

setNameStartPosition

public void setNameStartPosition(int start)
Set the starting position of the attribute name.

Parameters: start - The new offset into the page at which the name begins.

setPage

public void setPage(Page page)
Set the page this attribute is anchored to.

Parameters: page - The page to be used to construct this attribute. Note: If you set this, you probably also want to uncache the property values by setting them to null.

setValueEndPosition

public void setValueEndPosition(int end)
Set the ending position of the attribute value.

Parameters: end - The new offset into the page at which the value ends.

setValueStartPosition

public void setValueStartPosition(int start)
Set the starting position of the attribute value.

Parameters: start - The new offset into the page at which the value begins.

HTML Parser is an open source library released under LGPL.