Class DOMNormalizer

  • All Implemented Interfaces:
    org.apache.xerces.xni.XMLDocumentHandler

    public class DOMNormalizer
    extends java.lang.Object
    implements org.apache.xerces.xni.XMLDocumentHandler
    This class implements the normalizeDocument method. It acts as if the document were going through a save and load cycle, putting the document in a "normal" form. The actual result depends on which features are set, since they govern which operations take place; see setNormalizationFeature for details. Notably, this method normalizes Text nodes; makes the document "namespace well-formed", according to the algorithm described below in pseudocode, by adding missing namespace declaration attributes and adding or changing namespace prefixes; updates the replacement tree of EntityReference nodes; and normalizes attribute values. Mutation events, when supported, are generated to reflect the changes occurring in the document. See Namespace normalization for details on how namespace declaration attributes and prefixes are normalized.

    NOTE: There is initial support for DOM revalidation with XML Schema as a grammar. The tree might not be validated correctly if EntityReference nodes or CDATA sections are present in the tree. PSVI information is not exposed, and normalized data (including element default content) is not available.
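
    DOMNormalizer is an internal class whose normalizeDocument entry point is protected, so applications normally reach this behavior through the standard DOM Level 3 method Document.normalizeDocument(). The following is a minimal sketch, assuming a JAXP parser whose DOM implementation supports DOM Level 3 Core (such as the Xerces-derived implementation bundled with the JDK). The class name NormalizeDocumentSketch and its helper methods are illustrative and not part of Xerces.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class NormalizeDocumentSketch {

    /** Creates an empty DOM document using the default JAXP implementation. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds a small tree, normalizes it, and returns the root element. */
    static Element buildAndNormalize() {
        Document doc = newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);

        // Two adjacent Text nodes and a Comment: a shape that a save/load cycle would not produce.
        root.appendChild(doc.createTextNode("Hello, "));
        root.appendChild(doc.createTextNode("world"));
        root.appendChild(doc.createComment("note"));

        // The configured features decide which operations take place.
        DOMConfiguration config = doc.getDomConfig();
        config.setParameter("comments", Boolean.FALSE); // discard Comment nodes

        doc.normalizeDocument();
        return root;
    }

    public static void main(String[] args) {
        Element root = buildAndNormalize();
        System.out.println("children: " + root.getChildNodes().getLength());
        System.out.println("text: " + root.getTextContent());
    }
}
```

    With the "comments" parameter set to false the Comment node should be discarded, and the two adjacent Text nodes should be merged, leaving root with a single Text child.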

    EXPERIMENTAL:

    This class should not be considered stable. It is likely to be altered or replaced in the future.
    Version:
    $Id: DOMNormalizer.java 1710695 2015-10-26 20:48:54Z mrglavas $
    Author:
    Elena Litani, IBM, Neeraj Bajaj, Sun Microsystems, Inc.
    • Constructor Summary

      Constructors 
      Constructor Description
      DOMNormalizer()  
    • Method Summary

      Modifier and Type Method Description
      protected void addNamespaceDecl​(java.lang.String prefix, java.lang.String uri, ElementImpl element)
      Adds a namespace declaration attribute, or replaces the value of an existing one, for the given prefix and namespace URI.
      void characters​(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs)
      Character content.
      void comment​(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs)
      A comment.
      void doctypeDecl​(java.lang.String rootElement, java.lang.String publicId, java.lang.String systemId, org.apache.xerces.xni.Augmentations augs)
      Notifies of the presence of the DOCTYPE line in the document.
      void emptyElement​(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes, org.apache.xerces.xni.Augmentations augs)
      An empty element.
      void endCDATA​(org.apache.xerces.xni.Augmentations augs)
      The end of a CDATA section.
      void endDocument​(org.apache.xerces.xni.Augmentations augs)
      The end of the document.
      void endElement​(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs)
      The end of an element.
      void endGeneralEntity​(java.lang.String name, org.apache.xerces.xni.Augmentations augs)
      Notifies the handler of the end of a general entity.
      protected void expandEntityRef​(org.w3c.dom.Node parent, org.w3c.dom.Node reference)  
      org.apache.xerces.xni.parser.XMLDocumentSource getDocumentSource()
      Returns the document source.
      void ignorableWhitespace​(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs)
      Ignorable whitespace.
      static void isAttrValueWF​(org.w3c.dom.DOMErrorHandler errorHandler, DOMErrorImpl error, DOMLocatorImpl locator, org.w3c.dom.NamedNodeMap attributes, org.w3c.dom.Attr a, java.lang.String value, boolean xml11Version)
      NON-DOM: check if attribute value is well-formed
      static void isCDataWF​(org.w3c.dom.DOMErrorHandler errorHandler, DOMErrorImpl error, DOMLocatorImpl locator, java.lang.String datavalue, boolean isXML11Version)
      Check if CDATA section is well-formed
      static void isCommentWF​(org.w3c.dom.DOMErrorHandler errorHandler, DOMErrorImpl error, DOMLocatorImpl locator, java.lang.String datavalue, boolean isXML11Version)
      NON-DOM: check if value of the comment is well-formed
      static void isXMLCharWF​(org.w3c.dom.DOMErrorHandler errorHandler, DOMErrorImpl error, DOMLocatorImpl locator, java.lang.String datavalue, boolean isXML11Version)
      NON-DOM: check for valid XML characters as per the XML version
      protected void namespaceFixUp​(ElementImpl element, AttributeMap attributes)  
      protected void normalizeDocument​(CoreDocumentImpl document, DOMConfigurationImpl config)
      Normalizes document.
      protected org.w3c.dom.Node normalizeNode​(org.w3c.dom.Node node)
      This method acts as if the document was going through a save and load cycle, putting the document in a "normal" form.
      void processingInstruction​(java.lang.String target, org.apache.xerces.xni.XMLString data, org.apache.xerces.xni.Augmentations augs)
      A processing instruction.
      static void reportDOMError​(org.w3c.dom.DOMErrorHandler errorHandler, DOMErrorImpl error, DOMLocatorImpl locator, java.lang.String message, short severity, java.lang.String type)
      Reports a DOM error to the user handler.
      void setDocumentSource​(org.apache.xerces.xni.parser.XMLDocumentSource source)
      Sets the document source.
      void startCDATA​(org.apache.xerces.xni.Augmentations augs)
      The start of a CDATA section.
      void startDocument​(org.apache.xerces.xni.XMLLocator locator, java.lang.String encoding, org.apache.xerces.xni.NamespaceContext namespaceContext, org.apache.xerces.xni.Augmentations augs)
      The start of the document.
      void startElement​(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes, org.apache.xerces.xni.Augmentations augs)
      The start of an element.
      void startGeneralEntity​(java.lang.String name, org.apache.xerces.xni.XMLResourceIdentifier identifier, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs)
      Notifies the handler of the start of a general entity.
      void textDecl​(java.lang.String version, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs)
      Notifies of the presence of a TextDecl line in an entity.
      protected void updateQName​(org.w3c.dom.Node node, org.apache.xerces.xni.QName qname)  
      void xmlDecl​(java.lang.String version, java.lang.String encoding, java.lang.String standalone, org.apache.xerces.xni.Augmentations augs)
      Notifies of the presence of an XMLDecl line in the document.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • DEBUG_ND

        protected static final boolean DEBUG_ND
        Debug normalize document
        See Also:
        Constant Field Values
      • DEBUG

        protected static final boolean DEBUG
        Debug namespace fix up algorithm
        See Also:
        Constant Field Values
      • DEBUG_EVENTS

        protected static final boolean DEBUG_EVENTS
        Debug document handler events
        See Also:
        Constant Field Values
      • PREFIX

        protected static final java.lang.String PREFIX
        Prefix added by the namespace fixup algorithm. Generated prefixes follow the pattern "NS" + index.
        See Also:
        Constant Field Values
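
        The "NS" + index pattern can be illustrated with a short sketch. This is not the Xerces code: the class GeneratedPrefixSketch, the method nextFreePrefix, the use of a Set to model the prefixes already bound on the element, and the starting index of 1 are all assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Set;

final class GeneratedPrefixSketch {

    // Assumed base of generated prefixes, following the "NS" + index pattern.
    static final String PREFIX = "NS";

    /**
     * Returns the first prefix of the form "NS" + index that is not yet bound.
     * The starting index of 1 is an assumption for this sketch.
     */
    static String nextFreePrefix(Set<String> boundPrefixes) {
        int index = 1;
        String candidate = PREFIX + index;
        while (boundPrefixes.contains(candidate)) {
            index++;
            candidate = PREFIX + index;
        }
        return candidate;
    }

    public static void main(String[] args) {
        Set<String> bound = new HashSet<>();
        bound.add("NS1");
        bound.add("xsi");
        System.out.println(nextFreePrefix(bound));
    }
}
```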
      • fQName

        protected final org.apache.xerces.xni.QName fQName
      • fValidationHandler

        protected RevalidationHandler fValidationHandler
        Validation handler represents validator instance.
      • fSymbolTable

        protected SymbolTable fSymbolTable
        symbol table
      • fErrorHandler

        protected org.w3c.dom.DOMErrorHandler fErrorHandler
        Error handler; may be null.
      • fNamespaceValidation

        protected boolean fNamespaceValidation
      • fPSVI

        protected boolean fPSVI
      • fNamespaceContext

        protected final org.apache.xerces.xni.NamespaceContext fNamespaceContext
        The namespace context of this document: stores namespaces in scope
      • fLocalNSBinder

        protected final org.apache.xerces.xni.NamespaceContext fLocalNSBinder
        Stores all namespace bindings on the current element
      • fAttributeList

        protected final java.util.ArrayList fAttributeList
        list of attributes
      • fLocator

        protected final DOMLocatorImpl fLocator
        DOM Locator - for namespace fixup algorithm
      • fCurrentNode

        protected org.w3c.dom.Node fCurrentNode
        for setting the PSVI
      • abort

        public static final java.lang.RuntimeException abort
        If the user stops the process, this exception will be thrown.
      • EMPTY_STRING

        public static final org.apache.xerces.xni.XMLString EMPTY_STRING
        Empty string to pass to the validator.
    • Constructor Detail

      • DOMNormalizer

        public DOMNormalizer()
    • Method Detail

      • normalizeDocument

        protected void normalizeDocument​(CoreDocumentImpl document,
                                         DOMConfigurationImpl config)
        Normalizes the document. Note: reset() must be called before this method.
      • normalizeNode

        protected org.w3c.dom.Node normalizeNode​(org.w3c.dom.Node node)
        This method acts as if the document were going through a save and load cycle, putting the document in a "normal" form. The actual result depends on which features are set, since they govern which operations take place; see setNormalizationFeature for details. Notably, this method normalizes Text nodes; makes the document "namespace well-formed", according to the algorithm described below in pseudocode, by adding missing namespace declaration attributes and adding or changing namespace prefixes; updates the replacement tree of EntityReference nodes; and normalizes attribute values.
        Parameters:
        node - The node to normalize.
        Returns:
        The modified node, or null. If a node is returned, normalization must be run again starting from the returned node.
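
        The "normalize again starting on the node returned" contract implies a driver loop that revisits a node whenever the tree around it has changed. The sketch below illustrates that pattern with plain org.w3c.dom calls. It is not the Xerces implementation: RestartLoopSketch, normalizeOne, and normalizeChildren are hypothetical names, and normalizeOne only merges adjacent Text nodes.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;

final class RestartLoopSketch {

    /**
     * Hypothetical stand-in for a per-node normalization step.
     * Returns the node itself if the tree was modified and the node must be
     * visited again, or null if the caller can move on to the next sibling.
     */
    static Node normalizeOne(Node node) {
        Node next = node.getNextSibling();
        if (node.getNodeType() == Node.TEXT_NODE
                && next != null && next.getNodeType() == Node.TEXT_NODE) {
            ((Text) node).appendData(next.getNodeValue());
            node.getParentNode().removeChild(next);
            return node; // do not advance: another Text sibling may follow
        }
        return null;
    }

    /** Driver loop: restarts from the returned node when one is returned. */
    static void normalizeChildren(Node parent) {
        Node next;
        for (Node kid = parent.getFirstChild(); kid != null; kid = next) {
            next = kid.getNextSibling();
            Node modified = normalizeOne(kid);
            if (modified != null) {
                next = modified;
            }
        }
    }

    /** Builds an element with three adjacent Text children: "a", "b", "c". */
    static Element sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            root.appendChild(doc.createTextNode("a"));
            root.appendChild(doc.createTextNode("b"));
            root.appendChild(doc.createTextNode("c"));
            return root;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = sample();
        normalizeChildren(root);
        System.out.println(root.getChildNodes().getLength() + " " + root.getTextContent());
    }
}
```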
      • expandEntityRef

        protected final void expandEntityRef​(org.w3c.dom.Node parent,
                                             org.w3c.dom.Node reference)
      • addNamespaceDecl

        protected final void addNamespaceDecl​(java.lang.String prefix,
                                              java.lang.String uri,
                                              ElementImpl element)
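        Adds a namespace declaration attribute, or replaces the value of an existing one, for the given prefix and namespace URI. If prefix is empty, the default namespace declaration is added or updated.
        Parameters:
        prefix - The namespace prefix to declare; empty for the default namespace.
        uri - The namespace URI to bind to the prefix.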
        Throws:
        java.io.IOException
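
        Because this method is protected and final, its effect is most easily observed through Document.normalizeDocument(). The sketch below creates an element in a namespace without declaring its prefix and then checks which declaration attribute appears after normalization. It assumes a DOM Level 3 implementation with the "namespaces" parameter left at its default of true; NamespaceFixupSketch and the example URI are illustrative.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class NamespaceFixupSketch {

    // Illustrative namespace URI, not taken from the Xerces documentation.
    static final String EXAMPLE_NS = "http://example.com/ns";

    /** Builds <p:root> without an xmlns:p attribute, normalizes, and returns the root. */
    static Element buildAndNormalize() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        // The element carries a namespace URI and prefix, but no declaration attribute yet.
        Element root = doc.createElementNS(EXAMPLE_NS, "p:root");
        doc.appendChild(root);

        // Namespace normalization is expected to add the missing declaration.
        doc.normalizeDocument();
        return root;
    }

    /** Reads the value of the xmlns:prefix declaration on the element ("" if absent). */
    static String declaredUri(Element element, String prefix) {
        return element.getAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, prefix);
    }

    public static void main(String[] args) {
        Element root = buildAndNormalize();
        System.out.println("xmlns:p = " + declaredUri(root, "p"));
    }
}
```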
      • isCDataWF

        public static final void isCDataWF​(org.w3c.dom.DOMErrorHandler errorHandler,
                                           DOMErrorImpl error,
                                           DOMLocatorImpl locator,
                                           java.lang.String datavalue,
                                           boolean isXML11Version)
        Check if CDATA section is well-formed
        Parameters:
        datavalue - The content of the CDATA section to check.
        isXML11Version - true if the document is XML 1.1.
      • isXMLCharWF

        public static final void isXMLCharWF​(org.w3c.dom.DOMErrorHandler errorHandler,
                                             DOMErrorImpl error,
                                             DOMLocatorImpl locator,
                                             java.lang.String datavalue,
                                             boolean isXML11Version)
        NON-DOM: check for valid XML characters as per the XML version
        Parameters:
        datavalue - The character data to check.
        isXML11Version - true if the document is XML 1.1.
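
        The version-dependent check can be illustrated with the Char productions from the XML 1.0 and XML 1.1 specifications. The sketch below is a standalone illustration of those productions, not the Xerces implementation (which reports problems through the error handler rather than returning an index); XmlCharSketch and its method names are hypothetical.

```java
final class XmlCharSketch {

    /** XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** XML 1.1 Char production: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. */
    static boolean isXml11Char(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Returns the index of the first char that starts an invalid code point,
     * or -1 if every code point is allowed for the given XML version.
     * Unpaired surrogates are reported as invalid.
     */
    static int firstInvalidIndex(String data, boolean isXml11Version) {
        for (int i = 0; i < data.length(); ) {
            int cp = data.codePointAt(i);
            boolean valid = isXml11Version ? isXml11Char(cp) : isXml10Char(cp);
            if (!valid) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        String data = "ab\u0001c";
        System.out.println("XML 1.0: " + firstInvalidIndex(data, false));
        System.out.println("XML 1.1: " + firstInvalidIndex(data, true));
    }
}
```

        The control character U+0001 is outside the XML 1.0 Char production but inside the XML 1.1 one, which is why the version flag changes the result.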
      • isCommentWF

        public static final void isCommentWF​(org.w3c.dom.DOMErrorHandler errorHandler,
                                             DOMErrorImpl error,
                                             DOMLocatorImpl locator,
                                             java.lang.String datavalue,
                                             boolean isXML11Version)
        NON-DOM: check if value of the comment is well-formed
        Parameters:
        datavalue - The comment data to check.
        isXML11Version - true if the document is XML 1.1.
      • isAttrValueWF

        public static final void isAttrValueWF​(org.w3c.dom.DOMErrorHandler errorHandler,
                                               DOMErrorImpl error,
                                               DOMLocatorImpl locator,
                                               org.w3c.dom.NamedNodeMap attributes,
                                               org.w3c.dom.Attr a,
                                               java.lang.String value,
                                               boolean xml11Version)
        NON-DOM: check if attribute value is well-formed
        Parameters:
        attributes -
        a - The attribute to check.
        value - The attribute value to check.
      • reportDOMError

        public static final void reportDOMError​(org.w3c.dom.DOMErrorHandler errorHandler,
                                                DOMErrorImpl error,
                                                DOMLocatorImpl locator,
                                                java.lang.String message,
                                                short severity,
                                                java.lang.String type)
        Reports a DOM error to the user handler. If the error is fatal, processing is always aborted.
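
        From application code, the "user handler" is the DOMErrorHandler registered under the standard "error-handler" parameter of the document's DOMConfiguration. The sketch below registers one and normalizes a document containing a comment with a double hyphen, which is not well-formed XML. It assumes a DOM Level 3 implementation with the "well-formed" parameter left at its default of true; ErrorHandlerSketch is an illustrative name, and the exact error type and message text depend on the implementation.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class ErrorHandlerSketch {

    /** Normalizes a document with an ill-formed comment and returns the reported errors. */
    static List<String> collectErrors() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        doc.appendChild(root);
        // The DOM API accepts this data, but "--" is not allowed inside an XML comment.
        root.appendChild(doc.createComment("a--b"));

        List<String> seen = new ArrayList<>();
        DOMErrorHandler handler = error -> {
            seen.add(error.getSeverity() + " " + error.getType() + ": " + error.getMessage());
            return true; // true asks the implementation to continue where possible
        };
        doc.getDomConfig().setParameter("error-handler", handler);

        doc.normalizeDocument();
        return seen;
    }

    public static void main(String[] args) {
        for (String line : collectErrors()) {
            System.out.println(line);
        }
    }
}
```

        Returning false from handleError asks the implementation to stop processing; for fatal errors processing stops regardless of the return value.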
      • updateQName

        protected final void updateQName​(org.w3c.dom.Node node,
                                         org.apache.xerces.xni.QName qname)
      • startDocument

        public void startDocument​(org.apache.xerces.xni.XMLLocator locator,
                                  java.lang.String encoding,
                                  org.apache.xerces.xni.NamespaceContext namespaceContext,
                                  org.apache.xerces.xni.Augmentations augs)
                           throws org.apache.xerces.xni.XNIException
        The start of the document.
        Specified by:
        startDocument in interface org.apache.xerces.xni.XMLDocumentHandler
        Parameters:
        locator - The document locator, or null if the document location cannot be reported during the parsing of this document. However, it is strongly recommended that a locator be supplied that can at least report the system identifier of the document.
        encoding - The auto-detected IANA encoding name of the entity stream. This value will be null in those situations where the entity encoding is not auto-detected (e.g. internal entities or a document entity that is parsed from a java.io.Reader).
        namespaceContext - The namespace context in effect at the start of this document. This object represents the current context. Implementors of this class are responsible for copying the namespace bindings from the current context (and its parent contexts) if that information is important.
        augs - Additional information that may include infoset augmentations
        Throws:
        org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
      • xmlDecl

        public void xmlDecl​(java.lang.String version,
                            java.lang.String encoding,
                            java.lang.String standalone,
                            org.apache.xerces.xni.Augmentations augs)
                     throws org.apache.xerces.xni.XNIException
        Notifies of the presence of an XMLDecl line in the document. If present, this method will be called immediately following the startDocument call.
        Specified by:
        xmlDecl in interface org.apache.xerces.xni.XMLDocumentHandler
        Parameters:
        version - The XML version.
        encoding - The IANA encoding name of the document, or null if not specified.
        standalone - The standalone value, or null if not specified.
        augs - Additional information that may include infoset augmentations
        Throws:
        org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
      • doctypeDecl

        public void doctypeDecl​(java.lang.String rootElement,
                                java.lang.String publicId,
                                java.lang.String systemId,
                                org.apache.xerces.xni.Augmentations augs)
                         throws org.apache.xerces.xni.XNIException
        Notifies of the presence of the DOCTYPE line in the document.
        Specified by:
        doctypeDecl in interface org.apache.xerces.xni.XMLDocumentHandler
        Parameters:
        rootElement - The name of the root element.
        publicId - The public identifier if an external DTD or null if the external DTD is specified using SYSTEM.
        systemId - The system identifier if an external DTD, null otherwise.
        augs - Additional information that may include infoset augmentations
        Throws:
        org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
      • comment

        public void comment​(org.apache.xerces.xni.XMLString text,
                            org.apache.xerces.xni.Augmentations augs)
                     throws org.apache.xerces.xni.XNIException
        A comment.
        Specified by:
        comment in interface org.apache.xerces.xni.XMLDocumentHandler
        Parameters:
        text - The text in the comment.
        augs - Additional information that may include infoset augmentations
        Throws:
        org.apache.xerces.xni.XNIException - Thrown by application to signal an error.
      • processingInstruction

        public void processingInstruction​(java.lang.String target,
                                          org.apache.xerces.xni.XMLString data,
                                          org.apache.xerces.xni.Augmentations augs)
                                   throws org.apache.xerces.xni.XNIException
        A processing instruction. Processing instructions consist of a target name and, optionally, text data. The data is only meaningful to the application.

        Typically, a processing instruction's data will contain a series of pseudo-attributes. These pseudo-attributes follow the form of element attributes but are not parsed or presented to the application as anything other than text. The application is responsible for parsing the data.

        Specified by:
        processingInstruction in interface org.apache.xerces.xni.XMLDocumentHandler
        Parameters:
        target - The target.
        data - The data or null if none specified.
        augs - Additional information that may include infoset augmentations
        Throws:
        org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
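
        Since pseudo-attributes reach the application only as text, the application has to extract them itself. The sketch below shows one simplified way to do so with a regular expression. PseudoAttributeSketch and parse are hypothetical names, and the pattern deliberately ignores details such as character references and non-ASCII names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PseudoAttributeSketch {

    // name = "value" or name = 'value'; a simplification of real attribute syntax.
    private static final Pattern PSEUDO_ATTR =
            Pattern.compile("([A-Za-z_][\\w.-]*)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    /** Extracts pseudo-attributes from processing instruction data, in document order. */
    static Map<String, String> parse(String data) {
        Map<String, String> result = new LinkedHashMap<>();
        if (data == null) {
            return result; // a processing instruction may carry no data at all
        }
        Matcher m = PSEUDO_ATTR.matcher(data);
        while (m.find()) {
            // Group 3 holds a double-quoted value, group 4 a single-quoted one.
            String value = m.group(3) != null ? m.group(3) : m.group(4);
            result.put(m.group(1), value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Data as it might appear in <?xml-stylesheet type="text/xsl" href='style.xsl'?>
        System.out.println(parse("type=\"text/xsl\" href='style.xsl'"));
    }
}
```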
      • startElement

        public void startElement​(org.apache.xerces.xni.QName element,
                                 org.apache.xerces.xni.XMLAttributes attributes,
                                 org.apache.xerces.xni.Augmentations augs)
                          throws org.apache.xerces.xni.XNIException
        The start of an element.
        Specified by:
        startElement in interface org.apache.xerces.xni.XMLDocumentHandler
        Parameters:
        element - The name of the element.
        attributes - The element attributes.
        augs - Additional information that may include infoset augmentations
        Throws:
        org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
      • emptyElement

        public void emptyElement​(org.apache.xerces.xni.QName element,
                                 org.apache.xerces.xni.XMLAttributes attributes,
                                 org.apache.xerces.xni.Augmentations augs)
                          throws org.apache.xerces.xni.XNIException
        An empty element.
        Specified by:
        emptyElement in interface org.apache.xerces.xni.XMLDocumentHandler
        Parameters:
        element - The name of the element.
        attributes - The element attributes.
        augs - Additional information that may include infoset augmentations
        Throws:
        org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
      • startGeneralEntity

        public void startGeneralEntity​(java.lang.String name,
                                       org.apache.xerces.xni.XMLResourceIdentifier identifier,
                                       java.lang.String encoding,
                                       org.apache.xerces.xni.Augmentations augs)
                                throws org.apache.xerces.xni.XNIException
        Notifies the handler of the start of a general entity.

        Note: This method is not called for entity references appearing as part of attribute values.

        Specified by:
        startGeneralEntity in interface org.apache.xerces.xni.XMLDocumentHandler
        Parameters:
        name - The name of the general entity.
        identifier - The resource identifier.
        encoding - The auto-detected IANA encoding name of the entity stream. This value will be null in those situations where the entity encoding is not auto-detected (e.g. internal entities or a document entity that is parsed from a java.io.Reader).
        augs - Additional information that may include infoset augmentations
        Throws:
        org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
      • textDecl

        public void textDecl​(java.lang.String version,
                             java.lang.String encoding,
                             org.apache.xerces.xni.Augmentations augs)
                      throws org.apache.xerces.xni.XNIException
        Notifies of the presence of a TextDecl line in an entity. If present, this method will be called immediately following the startGeneralEntity call.

        Note: This method will never be called for the document entity; it is only called for external general entities referenced in document content.

        Note: This method is not called for entity references appearing as part of attribute values.

        Specified by:
        textDecl in interface org.apache.xerces.xni.XMLDocumentHandler
        Parameters:
        version - The XML version, or null if not specified.
        encoding - The IANA encoding name of the entity.
        augs - Additional information that may include infoset augmentations
        Throws:
        org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
      • endGeneralEntity

        public void endGeneralEntity​(java.lang.String name,
                                     org.apache.xerces.xni.Augmentations augs)
                              throws org.apache.xerces.xni.XNIException
        Notifies the handler of the end of a general entity.

        Note: This method is not called for entity references appearing as part of attribute values.

        Specified by:
        endGeneralEntity in interface org.apache.xerces.xni.XMLDocumentHandler
        Parameters:
        name - The name of the entity.
        augs - Additional information that may include infoset augmentations
        Throws:
        org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
      • characters

        public void characters​(org.apache.xerces.xni.XMLString text,
                               org.apache.xerces.xni.Augmentations augs)
                        throws org.apache.xerces.xni.XNIException
        Character content.
        Specified by:
        characters in interface org.apache.xerces.xni.XMLDocumentHandler
        Parameters:
        text - The content.
        augs - Additional information that may include infoset augmentations
        Throws:
        org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
      • ignorableWhitespace

        public void ignorableWhitespace​(org.apache.xerces.xni.XMLString text,
                                        org.apache.xerces.xni.Augmentations augs)
                                 throws org.apache.xerces.xni.XNIException
        Ignorable whitespace. For this method to be called, the document source must have some way of determining that text containing only whitespace characters should be considered ignorable. For example, the validator can determine whether a run of whitespace characters in the document is ignorable based on the element content model.
        Specified by:
        ignorableWhitespace in interface org.apache.xerces.xni.XMLDocumentHandler
        Parameters:
        text - The ignorable whitespace.
        augs - Additional information that may include infoset augmentations
        Throws:
        org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
      • endElement

        public void endElement​(org.apache.xerces.xni.QName element,
                               org.apache.xerces.xni.Augmentations augs)
                        throws org.apache.xerces.xni.XNIException
        The end of an element.
        Specified by:
        endElement in interface org.apache.xerces.xni.XMLDocumentHandler
        Parameters:
        element - The name of the element.
        augs - Additional information that may include infoset augmentations
        Throws:
        org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
      • startCDATA

        public void startCDATA​(org.apache.xerces.xni.Augmentations augs)
                        throws org.apache.xerces.xni.XNIException
        The start of a CDATA section.
        Specified by:
        startCDATA in interface org.apache.xerces.xni.XMLDocumentHandler
        Parameters:
        augs - Additional information that may include infoset augmentations
        Throws:
        org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
      • endCDATA

        public void endCDATA​(org.apache.xerces.xni.Augmentations augs)
                      throws org.apache.xerces.xni.XNIException
        The end of a CDATA section.
        Specified by:
        endCDATA in interface org.apache.xerces.xni.XMLDocumentHandler
        Parameters:
        augs - Additional information that may include infoset augmentations
        Throws:
        org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
      • endDocument

        public void endDocument​(org.apache.xerces.xni.Augmentations augs)
                         throws org.apache.xerces.xni.XNIException
        The end of the document.
        Specified by:
        endDocument in interface org.apache.xerces.xni.XMLDocumentHandler
        Parameters:
        augs - Additional information that may include infoset augmentations
        Throws:
        org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
      • setDocumentSource

        public void setDocumentSource​(org.apache.xerces.xni.parser.XMLDocumentSource source)
        Sets the document source.
        Specified by:
        setDocumentSource in interface org.apache.xerces.xni.XMLDocumentHandler
      • getDocumentSource

        public org.apache.xerces.xni.parser.XMLDocumentSource getDocumentSource()
        Returns the document source.
        Specified by:
        getDocumentSource in interface org.apache.xerces.xni.XMLDocumentHandler