Package org.cyberneko.html
Class HTMLScanner.ContentScanner
java.lang.Object
org.cyberneko.html.HTMLScanner.ContentScanner
- All Implemented Interfaces:
HTMLScanner.Scanner
- Enclosing class:
HTMLScanner
The primary HTML document scanner.
- Author:
- Andy Clark
-
Constructor Summary
Constructors
ContentScanner()
-
Method Summary
Modifier and Type / Method / Description

protected void
addLocationItem(org.apache.xerces.xni.XMLAttributes attributes, int index)
Adds location augmentations to the specified attribute.

protected String
nextContent(int len)
Reads the next characters WITHOUT impacting the buffer content up to the current offset.

boolean
scan(boolean complete)
Scan.

protected boolean
scanAttribute(org.apache.xerces.util.XMLAttributesImpl attributes, boolean[] empty)
Scans a real attribute.

protected boolean
scanAttribute(org.apache.xerces.util.XMLAttributesImpl attributes, boolean[] empty, char endc)
Scans an attribute, pseudo or real.

protected void
scanCDATA()
Scans a CDATA section.

protected void
scanCharacters()
Scans characters.

protected void
scanComment()
Scans a comment.

protected void
scanEndElement()
Scans an end element.

protected boolean
scanMarkupContent(org.apache.xerces.util.XMLStringBuffer buffer, char cend)
Scans markup content.

protected void
scanPI()
Scans a processing instruction.

protected boolean
scanPseudoAttribute(org.apache.xerces.util.XMLAttributesImpl attributes)
Scans a pseudo attribute.

protected String
scanStartElement(boolean[] empty)
Scans a start element.
-
Constructor Details
-
ContentScanner
public ContentScanner()
-
-
Method Details
-
scan
public boolean scan(boolean complete) throws IOException
Scan.
- Specified by:
scan in interface HTMLScanner.Scanner
- Parameters:
complete - True if the scanner should not return until scanning is complete.
- Returns:
True if additional scanning is required.
- Throws:
IOException - Thrown if an I/O error occurs.
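The contract described above (a true return value means more scanning is required, and complete controls whether the call runs to the end) can be illustrated with a small driver loop. This is only a sketch: StepScanner and CountingScanner are hypothetical stand-ins written for this example, not NekoHTML classes, and the "work" is a simple countdown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class ScanLoopSketch {

    /** Hypothetical stand-in with the same shape as scan(boolean). */
    interface StepScanner {
        boolean scan(boolean complete) throws IOException;
    }

    /** Toy scanner that needs a fixed number of steps to finish. */
    static final class CountingScanner implements StepScanner {
        private int remaining;

        CountingScanner(int steps) {
            this.remaining = steps;
        }

        @Override
        public boolean scan(boolean complete) throws IOException {
            if (complete) {
                remaining = 0;          // run to the end before returning
            } else if (remaining > 0) {
                remaining--;            // do one unit of work, then return
            }
            return remaining > 0;       // true: additional scanning is required
        }
    }

    /** Calls scan(false) until it reports that no more scanning is required. */
    static int driveIncrementally(StepScanner scanner) {
        try {
            int calls = 1;
            while (scanner.scan(false)) {
                calls++;
            }
            return calls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Calls scan(true) once and reports whether more scanning is required. */
    static boolean driveToCompletion(StepScanner scanner) {
        try {
            return scanner.scan(true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(driveIncrementally(new CountingScanner(3))); // 3
        System.out.println(driveToCompletion(new CountingScanner(3)));  // false
    }
}
```

With complete set to false the caller keeps control between steps, which suits pull-style parsing; with complete set to true a single call is enough.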
-
nextContent
protected String nextContent(int len) throws IOException
Reads the next characters WITHOUT impacting the buffer content up to the current offset.
- Parameters:
len - the number of characters to read
- Returns:
the read string (length may be smaller if EOF is encountered)
- Throws:
IOException
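The look-ahead behaviour described above can be approximated with the standard java.io.Reader mark/reset calls. This is a sketch of the general technique only: the scanner manages its own buffer, and the class and method names here (PeekSketch, peek, peekThenRead) are made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class PeekSketch {

    /**
     * Returns up to len upcoming characters and then rewinds the reader,
     * so the next regular read sees the same characters again.
     * The reader must support mark/reset (StringReader does).
     */
    static String peek(Reader in, int len) throws IOException {
        in.mark(len);
        char[] buf = new char[len];
        int total = 0;
        while (total < len) {
            int n = in.read(buf, total, len - total);
            if (n < 0) {
                break;                  // EOF: result is shorter than len
            }
            total += n;
        }
        in.reset();
        return new String(buf, 0, total);
    }

    /** Peeks len characters, then reads one character normally. */
    static String peekThenRead(String text, int len) {
        try {
            Reader in = new StringReader(text);
            String ahead = peek(in, len);
            char first = (char) in.read();
            return ahead + "|" + first;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(peekThenRead("<!--x", 4)); // <!--|<
        System.out.println(peekThenRead("ab", 5));    // ab|a
    }
}
```

The second call shows the EOF case mentioned under Returns: fewer characters than requested come back, and the read position is still unchanged.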
-
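scanCharacters
protected void scanCharacters() throws IOException
Scans characters.
- Throws:
IOException
-
scanCDATA
protected void scanCDATA() throws IOException
Scans a CDATA section.
- Throws:
IOException
-
scanComment
protected void scanComment() throws IOException
Scans a comment.
- Throws:
IOException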
-
scanMarkupContent
protected boolean scanMarkupContent(org.apache.xerces.util.XMLStringBuffer buffer, char cend) throws IOException
Scans markup content.
- Throws:
IOException
-
scanPI
protected void scanPI() throws IOException
Scans a processing instruction.
- Throws:
IOException
-
scanStartElement
protected String scanStartElement(boolean[] empty) throws IOException
Scans a start element.
- Parameters:
empty - Used as a second return value: indicates whether the start element tag is empty (e.g. ends with "/>").
- Throws:
IOException
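The empty parameter follows a common Java idiom: a one-element boolean array serves as an out-parameter, so a method can hand back a flag in addition to its normal return value. The sketch below shows the idiom on a toy tag parser. OutParamSketch and startTagName are hypothetical names for this example, and the parsing is deliberately simplistic; it is not how the scanner itself works.

```java
public class OutParamSketch {

    /**
     * Returns the element name of a start tag such as "<br/>" or "<div id='x'>".
     * Writes into empty[0] whether the tag is self-closing (ends with "/>").
     */
    static String startTagName(String tag, boolean[] empty) {
        empty[0] = tag.endsWith("/>");  // the "second return value"
        int end = 1;                    // skip the leading '<'
        while (end < tag.length() && Character.isLetterOrDigit(tag.charAt(end))) {
            end++;
        }
        return tag.substring(1, end);
    }

    public static void main(String[] args) {
        boolean[] empty = new boolean[1];

        String name = startTagName("<br/>", empty);
        System.out.println(name + " " + empty[0]);  // br true

        name = startTagName("<div class='x'>", empty);
        System.out.println(name + " " + empty[0]);  // div false
    }
}
```

The caller allocates the array once and can reuse it across calls, which avoids creating a result object for every tag.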
-
scanAttribute
protected boolean scanAttribute(org.apache.xerces.util.XMLAttributesImpl attributes, boolean[] empty) throws IOException
Scans a real attribute.
- Parameters:
attributes - The list of attributes.
empty - Used as a second return value: indicates whether the start element tag is empty (e.g. ends with "/>").
- Throws:
IOException
-
scanPseudoAttribute
protected boolean scanPseudoAttribute(org.apache.xerces.util.XMLAttributesImpl attributes) throws IOException
Scans a pseudo attribute.
- Parameters:
attributes - The list of attributes.
- Throws:
IOException
-
scanAttribute
protected boolean scanAttribute(org.apache.xerces.util.XMLAttributesImpl attributes, boolean[] empty, char endc) throws IOException
Scans an attribute, pseudo or real.
- Parameters:
attributes - The list of attributes.
empty - Used as a second return value: indicates whether the start element tag is empty (e.g. ends with "/>").
endc - The end character that appears before the closing angle bracket ('>').
- Throws:
IOException
-
addLocationItem
protected void addLocationItem(org.apache.xerces.xni.XMLAttributes attributes, int index)
Adds location augmentations to the specified attribute.
-
scanEndElement
protected void scanEndElement() throws IOException
Scans an end element.
- Throws:
IOException
-