Class HtmlParser

java.lang.Object
nu.validator.htmlparser.sax.HtmlParser
All Implemented Interfaces:
XMLReader
Direct Known Subclasses:
InfosetCoercingHtmlParser

public class HtmlParser extends Object implements XMLReader
This class implements an HTML5 parser that exposes data through the SAX2 interface.
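
Because the class implements XMLReader, code written against the standard SAX2 interfaces can consume it without HTML-specific glue. The following is a minimal sketch, not part of the library: SaxElementLister and elementNames are hypothetical names, and only JDK SAX types are used so that the helper accepts any XMLReader.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; it is not part of the htmlparser library.
final class SaxElementLister {

    private SaxElementLister() {
    }

    // Parses the markup with the supplied reader and returns the element
    // names in the order their start events arrive.
    static List<String> elementNames(XMLReader reader, String markup)
            throws IOException, SAXException {
        final List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                    String qName, Attributes atts) {
                // Prefer the qualified name; fall back to the local name
                // when the reader leaves qName empty.
                names.add(qName.isEmpty() ? localName : qName);
            }
        });
        reader.parse(new InputSource(new StringReader(markup)));
        return names;
    }
}
```

Assuming the htmlparser jar is on the classpath, a call such as SaxElementLister.elementNames(new HtmlParser(), "<p>hi") would hand the HTML parser to the same helper. The reported names would then be expected to include any elements that the HTML parsing algorithm implies in addition to those written in the markup.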

By default, when using the constructor without arguments, this parser coerces XML 1.0-incompatible infosets into XML 1.0-compatible infosets. This corresponds to ALTER_INFOSET as the general XML violation policy. To make the parser support non-conforming HTML fully per the HTML 5 spec, at the cost of potentially violating the SAX2 API contract, set the general XML violation policy to ALLOW. To treat XML 1.0 infoset violations as fatal, set the general XML violation policy to FATAL.

By default, this parser does not do true streaming; it buffers everything first. The parser can be made truly streaming by calling setStreamabilityViolationPolicy(XmlViolationPolicy.FATAL). As a consequence, errors that require non-streamable recovery are treated as fatal.
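
When the streamability violation policy is FATAL, a caller will usually want to know which error stopped the parse. The sketch below records fatal errors through the standard SAX ErrorHandler interface. FatalErrorCollector is a hypothetical name, and the sketch assumes the parser reports these errors to the registered ErrorHandler, as SAX2 readers conventionally do.

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper for illustration; it is not part of the htmlparser library.
//
// Assumed wiring, with the htmlparser jar on the classpath:
//   HtmlParser parser = new HtmlParser();
//   parser.setStreamabilityViolationPolicy(XmlViolationPolicy.FATAL);
//   parser.setErrorHandler(new FatalErrorCollector());
final class FatalErrorCollector implements ErrorHandler {

    private final List<String> fatalMessages = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        // Warnings are ignored in this sketch.
    }

    @Override
    public void error(SAXParseException e) {
        // Recoverable errors are ignored in this sketch.
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        // Record "line:column message", then rethrow so that parsing stops.
        fatalMessages.add(e.getLineNumber() + ":" + e.getColumnNumber()
                + " " + e.getMessage());
        throw e;
    }

    List<String> fatalMessages() {
        return fatalMessages;
    }
}
```

Rethrowing from fatalError keeps the usual SAX behaviour of aborting the parse, while the recorded messages remain available to the caller afterwards.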

By default, in order to make the parse events emulate the parse events for a DTDless XML document, the parser does not report the doctype through LexicalHandler. Doctype reporting through LexicalHandler can be turned on by calling setReportingDoctype(true).
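
Once doctype reporting is turned on, the doctype arrives as a startDTD event on the registered LexicalHandler. The following is a minimal sketch of such a handler. DoctypeRecorder and attach are hypothetical names, and attach assumes the reader accepts the standard SAX2 lexical-handler property.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.LexicalHandler;

// Hypothetical helper for illustration; it is not part of the htmlparser library.
//
// Assumed wiring, with the htmlparser jar on the classpath:
//   HtmlParser parser = new HtmlParser();
//   parser.setReportingDoctype(true);
//   DoctypeRecorder.attach(parser, recorder);
final class DoctypeRecorder implements LexicalHandler {

    private String name;
    private String publicId;
    private String systemId;

    // Registers the handler through the standard SAX2 property name.
    static void attach(XMLReader reader, LexicalHandler handler)
            throws SAXException {
        reader.setProperty(
                "http://xml.org/sax/properties/lexical-handler", handler);
    }

    @Override
    public void startDTD(String name, String publicId, String systemId) {
        // Remember the doctype exactly as the parser reported it.
        this.name = name;
        this.publicId = publicId;
        this.systemId = systemId;
    }

    // The remaining lexical events are not needed for this sketch.
    @Override
    public void endDTD() {
    }

    @Override
    public void startEntity(String name) {
    }

    @Override
    public void endEntity(String name) {
    }

    @Override
    public void startCDATA() {
    }

    @Override
    public void endCDATA() {
    }

    @Override
    public void comment(char[] ch, int start, int length) {
    }

    boolean sawDoctype() {
        return name != null;
    }

    String name() {
        return name;
    }

    String publicId() {
        return publicId;
    }

    String systemId() {
        return systemId;
    }
}
```

With the default setting, where the doctype is not reported, sawDoctype() would be expected to stay false after a parse.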

Version:
$Id$
Author:
hsivonen
  • Constructor Details

    • HtmlParser

      public HtmlParser()
      Instantiates the parser with a fatal XML violation policy.
    • HtmlParser

      public HtmlParser(XmlViolationPolicy xmlPolicy)
      Instantiates the parser with a specific XML violation policy.
      Parameters:
      xmlPolicy - the XML violation policy to use (the class description names ALTER_INFOSET, ALLOW, and FATAL)
  • Method Details