module Kramdown::Parser::Html::Parser
Contains the parsing methods. This module can be mixed into any parser to add HTML parsing functionality. The including class only needs to provide two instance variables: @stack, which stores the needed state, and @src (an instance of StringScanner), which is used for the actual parsing.
Public Instance Methods
Process the HTML start tag that has already been scanned/checked via @src.
Does the common processing steps and then yields to the caller for further processing. The block receives three parameters: the created element; a flag that is true if the HTML element is already closed, i.e. contains no body; and a flag that specifies whether the body, and the end tag, still need to be handled when closed is false.
    # File lib/kramdown/parser/html.rb, line 87
    def handle_html_start_tag(line = nil) # :yields: el, closed, handle_body
      name = @src[1]
      name.downcase! if HTML_ELEMENT[name.downcase]
      closed = !@src[4].nil?
      attrs = parse_html_attributes(@src[2], line, HTML_ELEMENT[name])

      el = Element.new(:html_element, name, attrs, category: :block)
      el.options[:location] = line if line
      @tree.children << el

      if !closed && HTML_ELEMENTS_WITHOUT_BODY.include?(el.value)
        closed = true
      end
      if name == 'script' || name == 'style'
        handle_raw_html_tag(name)
        yield(el, false, false)
      else
        yield(el, closed, true)
      end
    end
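The three block parameters can be easier to follow as a small decision table. The following is a minimal, standalone sketch of that contract only; it does not use kramdown. The names sketch_start_tag, sketch_describe, SKETCH_VOID_ELEMENTS and SKETCH_RAW_ELEMENTS are hypothetical, and the void element list is an assumed subset standing in for HTML_ELEMENTS_WITHOUT_BODY.

```ruby
# Assumed subset standing in for HTML_ELEMENTS_WITHOUT_BODY (hypothetical name).
SKETCH_VOID_ELEMENTS = %w[br hr img input].freeze
# Elements whose body is taken verbatim, as in the listing above.
SKETCH_RAW_ELEMENTS = %w[script style].freeze

# Mirrors only the closed/handle_body decision of the start tag handler.
# Yields (name, closed, handle_body) and returns the block's result.
def sketch_start_tag(name, self_closed)
  closed = self_closed || SKETCH_VOID_ELEMENTS.include?(name)
  if SKETCH_RAW_ELEMENTS.include?(name)
    # The raw body has already been consumed, so nothing is left to handle.
    yield(name, false, false)
  else
    yield(name, closed, true)
  end
end

# Example caller: decides what to do from the two flags.
def sketch_describe(name, self_closed = false)
  sketch_start_tag(name, self_closed) do |el, closed, handle_body|
    if !handle_body
      "#{el}: body already consumed as raw text"
    elsif closed
      "#{el}: no body to parse"
    else
      "#{el}: parse body until </#{el}>"
    end
  end
end

puts sketch_describe('br')     # br: no body to parse
puts sketch_describe('div')    # div: parse body until </div>
puts sketch_describe('script') # script: body already consumed as raw text
```

A caller therefore needs to check handle_body first: when it is false, the closed flag carries no further information.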
Handle the raw HTML tag at the current position.
    # File lib/kramdown/parser/html.rb, line 127
    def handle_raw_html_tag(name)
      curpos = @src.pos
      if @src.scan_until(/(?=<\/#{name}\s*>)/mi)
        add_text(extract_string(curpos...@src.pos, @src), @tree.children.last, :raw)
        @src.scan(HTML_TAG_CLOSE_RE)
      else
        add_text(@src.rest, @tree.children.last, :raw)
        @src.terminate
        warning("Found no end tag for '#{name}' - auto-closing it")
      end
    end
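The lookahead pattern is the key step here: scan_until with a zero-width (?=...) match advances the scanner up to, but not past, the end tag. The sketch below shows that technique in isolation using only Ruby's strscan library. The helper sketch_scan_raw_body and its closing tag pattern are hypothetical stand-ins, not kramdown's extract_string or HTML_TAG_CLOSE_RE.

```ruby
require 'strscan'

# Returns [raw_body, closed]. closed is false when no end tag was found
# and the rest of the input was taken as the body (auto-closing).
def sketch_scan_raw_body(scanner, name)
  tag = Regexp.escape(name)
  # The lookahead matches with zero width, so the returned string is
  # exactly the text before the end tag.
  body = scanner.scan_until(/(?=<\/#{tag}\s*>)/mi)
  if body
    scanner.scan(/<\/#{tag}\s*>/mi) # consume the end tag itself
    [body, true]
  else
    body = scanner.rest
    scanner.terminate
    [body, false]
  end
end

closed_src = StringScanner.new('if (a < b) { x(); }</SCRIPT><p>rest')
p sketch_scan_raw_body(closed_src, 'script') # ["if (a < b) { x(); }", true]
p closed_src.rest                            # "<p>rest"

open_src = StringScanner.new('body { color: red }')
p sketch_scan_raw_body(open_src, 'style')    # ["body { color: red }", false]
```

Because the body is never tokenized, characters such as < inside a script are kept as they are.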
Parses the given string for HTML attributes and returns the resulting hash.
If the optional line parameter is supplied, it is used in warning messages.
If the optional in_html_tag parameter is set to false, attribute names are not converted to lowercase.
    # File lib/kramdown/parser/html.rb, line 114
    def parse_html_attributes(str, line = nil, in_html_tag = true)
      attrs = {}
      str.scan(HTML_ATTRIBUTE_RE).each do |attr, val, _sep, quoted_val|
        attr.downcase! if in_html_tag
        if attrs.key?(attr)
          warning("Duplicate HTML attribute '#{attr}' on line #{line || '?'} - overwriting previous one")
        end
        attrs[attr] = val || quoted_val || ""
      end
      attrs
    end
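To illustrate the effect of in_html_tag and the duplicate handling, here is a standalone sketch that follows the same steps. SKETCH_ATTRIBUTE_RE is a simplified pattern written for this example and is only assumed to resemble HTML_ATTRIBUTE_RE; sketch_parse_attributes is likewise hypothetical and collects warnings in an array instead of calling warning.

```ruby
# Simplified stand-in for HTML_ATTRIBUTE_RE (an assumption, not the real pattern).
# Captures: attribute name, unquoted value, quote character, quoted value.
SKETCH_ATTRIBUTE_RE = /\s+([\w:.-]+)(?:\s*=\s*(?:([^\s"'=<>`]+)|(["'])(.*?)\3))?/m

# Returns [attrs, warnings]. Later duplicates overwrite earlier values.
def sketch_parse_attributes(str, lowercase_names = true)
  attrs = {}
  warnings = []
  str.scan(SKETCH_ATTRIBUTE_RE).each do |name, val, _quote, quoted_val|
    name = name.downcase if lowercase_names
    warnings << "Duplicate HTML attribute '#{name}'" if attrs.key?(name)
    # Attributes without a value, such as `hidden`, map to the empty string.
    attrs[name] = val || quoted_val || ""
  end
  [attrs, warnings]
end

input = %( ID="a" class=x id='b' hidden)

attrs, warnings = sketch_parse_attributes(input)
p attrs    # {"id"=>"b", "class"=>"x", "hidden"=>""}
p warnings # ["Duplicate HTML attribute 'id'"]

attrs, warnings = sketch_parse_attributes(input, false)
p attrs.keys # ["ID", "class", "id", "hidden"]
p warnings   # []
```

With lowercasing disabled, ID and id are distinct keys, so no duplicate is reported.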
Parse raw HTML from the current source position, storing the found elements in el. Parsing continues until one of the following criteria is fulfilled:
- The end of the document is reached.
- The matching end tag for the element el is found (only used if el is an HTML element).
When an HTML start tag is found, processing is deferred to handle_html_start_tag, passing along the block given to this method.
    # File lib/kramdown/parser/html.rb, line 150
    def parse_raw_html(el, &block)
      @stack.push(@tree)
      @tree = el

      done = false
      while !@src.eos? && !done
        if (result = @src.scan_until(HTML_RAW_START))
          add_text(result, @tree, :text)
          line = @src.current_line_number
          if (result = @src.scan(HTML_COMMENT_RE))
            @tree.children << Element.new(:xml_comment, result, nil, category: :block, location: line)
          elsif (result = @src.scan(HTML_INSTRUCTION_RE))
            @tree.children << Element.new(:xml_pi, result, nil, category: :block, location: line)
          elsif @src.scan(HTML_CDATA_RE)
            @tree.children << Element.new(:text, @src[1], nil, cdata: true, location: line)
          elsif @src.scan(HTML_TAG_RE)
            if method(:handle_html_start_tag).arity.abs >= 1
              handle_html_start_tag(line, &block)
            else
              handle_html_start_tag(&block) # DEPRECATED: method needs to accept line number in 2.0
            end
          elsif @src.scan(HTML_TAG_CLOSE_RE)
            if @tree.value == (HTML_ELEMENT[@tree.value] ? @src[1].downcase : @src[1])
              done = true
            else
              add_text(@src.matched, @tree, :text)
              warning("Found invalidly used HTML closing tag for '#{@src[1]}' on " \
                      "line #{line} - ignoring it")
            end
          else
            add_text(@src.getch, @tree, :text)
          end
        else
          add_text(@src.rest, @tree, :text)
          @src.terminate
          if @tree.type == :html_element
            warning("Found no end tag for '#{@tree.value}' on line " \
                    "#{@tree.options[:location]} - auto-closing it")
          end
          done = true
        end
      end

      @tree = @stack.pop
    end
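The control flow above (push the current tree, descend, stop on the matching end tag or at the end of input, then pop) can be tried out in a much reduced form. The sketch below is an illustration under simplifying assumptions, not kramdown's implementation: it handles only attribute-free start tags and end tags, and SketchRawParser, SketchNode and the patterns are hypothetical.

```ruby
require 'strscan'

SketchNode = Struct.new(:type, :value, :children)

class SketchRawParser
  attr_reader :warnings

  def initialize(source)
    @src = StringScanner.new(source)
    @stack = []
    @tree = nil
    @warnings = []
  end

  def parse
    root = SketchNode.new(:root, nil, [])
    parse_raw(root)
    root
  end

  private

  def parse_raw(el)
    @stack.push(@tree) # remember the enclosing element
    @tree = el
    done = false
    while !@src.eos? && !done
      if (text = @src.scan_until(/(?=<)/))
        add_text(text)
        if @src.scan(/<(\w+)>/)
          child = SketchNode.new(:html_element, @src[1].downcase, [])
          @tree.children << child
          parse_raw(child)
        elsif @src.scan(/<\/(\w+)\s*>/)
          if @tree.value == @src[1].downcase
            done = true # matching end tag found
          else
            add_text(@src.matched)
            @warnings << "Found invalidly used HTML closing tag for '#{@src[1]}'"
          end
        else
          add_text(@src.getch) # a lone '<' is ordinary text
        end
      else
        add_text(@src.rest)
        @src.terminate
        @warnings << "Found no end tag for '#{@tree.value}'" if @tree.type == :html_element
        done = true
      end
    end
    @tree = @stack.pop # restore the enclosing element
  end

  # Appends text to the current element, merging adjacent text nodes.
  def add_text(text)
    return if text.empty?
    last = @tree.children.last
    if last && last.type == :text
      last.value << text
    else
      @tree.children << SketchNode.new(:text, text.dup, [])
    end
  end
end

parser = SketchRawParser.new('a<b>bold</b>c</i><em>open')
root = parser.parse
p root.children.map(&:value) # ["a", "b", "c</i>", "em"]
p parser.warnings
```

For this input the stray </i> is kept as text and the unclosed em element is auto-closed at the end of the document, each with one warning.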