Parsing with tinycss
Quickstart
Import tinycss, make a parser object with the features you want, and parse a stylesheet:
>>> import tinycss
>>> parser = tinycss.make_parser('page3')
>>> stylesheet = parser.parse_stylesheet_bytes(b'''@import "foo.css";
... p.error { color: red } @lorem-ipsum;
... @page tables { size: landscape }''')
>>> stylesheet.rules
[<ImportRule 1:1 foo.css>, <RuleSet at 2:5 p.error>, <PageRule 3:5 ('tables', None)>]
>>> stylesheet.errors
[ParseError('Parse error at 2:29, unknown at-rule in stylesheet context: @lorem-ipsum',)]
You’ll get a Stylesheet object which contains all the parsed content as well as a list of encountered errors.
Parsers
Parsers are subclasses of tinycss.css21.CSS21Parser. Various subclasses add support for more syntax. You can choose which features to enable by making a new parser class with multiple inheritance, but there is also a convenience function to do that:
- tinycss.make_parser(*features, **kwargs)
Make a parser object with the chosen features.
- Parameters:
features – Positional arguments are base classes the new parser class will extend. The string 'page3' is accepted as short for CSSPage3Parser. The string 'fonts3' is accepted as short for CSSFonts3Parser.
kwargs – Keyword arguments are passed to the parser’s constructor.
- Returns:
An instance of a new subclass of CSS21Parser.
Parsing a stylesheet
Parser classes have three different methods to parse a CSS stylesheet, depending on whether you have a file, a byte string, or a Unicode string.
- class tinycss.css21.CSS21Parser
Parser for CSS 2.1.
This parser supports the core CSS syntax as well as @import, @media, @page and !important.
Note that property values are not parsed, as user agents (UAs) using this parser may only support some properties or some values.
Currently the parser holds no state. It is a class only so that it can be subclassed and its methods overridden.
- parse_stylesheet(css_unicode, encoding=None)
Parse a stylesheet from a Unicode string.
- Parameters:
css_unicode – A CSS stylesheet as a Unicode string.
encoding – The character encoding used to decode the stylesheet from bytes, if any.
- Returns:
A Stylesheet.
- parse_stylesheet_bytes(css_bytes, protocol_encoding=None, linking_encoding=None, document_encoding=None)
Parse a stylesheet from a byte string.
The character encoding is determined from the passed metadata and the @charset rule in the stylesheet (if any). If no encoding information is available or decoding fails, decoding defaults to UTF-8 and then falls back on ISO-8859-1.
- Parameters:
css_bytes – A CSS stylesheet as a byte string.
protocol_encoding – The “charset” parameter of a “Content-Type” HTTP header (if any), or similar metadata for other protocols.
linking_encoding – <link charset=""> or other metadata from the linking mechanism (if any).
document_encoding – Encoding of the referring style sheet or document (if any).
- Returns:
A Stylesheet.
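The fallback order described above can be illustrated in plain Python. This is a simplified sketch, not tinycss's actual implementation: decode_with_fallback is a hypothetical helper, and its single declared_encoding argument stands in for the protocol, linking, document and @charset sources.

```python
def decode_with_fallback(css_bytes, declared_encoding=None):
    """Decode CSS bytes and return (text, encoding_used).

    Hypothetical helper: tries the declared encoding (if any), then UTF-8,
    then ISO-8859-1.
    """
    candidates = [declared_encoding] if declared_encoding else []
    candidates += ['utf-8', 'iso-8859-1']
    for encoding in candidates:
        try:
            return css_bytes.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: try the next candidate.
            continue
    # Not reached in practice: ISO-8859-1 maps every byte to a character.
    raise ValueError('could not decode stylesheet')


# The lone 0xE9 byte is not valid UTF-8, so the last candidate is used.
text, used = decode_with_fallback(b'p { content: "\xe9" }')
print(used)
```

Because ISO-8859-1 assigns a character to every possible byte, the last candidate always succeeds, which is why it works as a final fallback.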
- parse_stylesheet_file(css_file, protocol_encoding=None, linking_encoding=None, document_encoding=None)
Parse a stylesheet from a file or filename.
Character encoding-related parameters and behavior are the same as in parse_stylesheet_bytes().
- Parameters:
css_file – Either a file (any object with a read() method) or a filename.
- Returns:
A Stylesheet.
Parsing a style attribute
- CSS21Parser.parse_style_attr(css_source)
Parse a “style” attribute (e.g. of an HTML element).
This method only accepts Unicode, since the source (HTML) document is expected to handle the character encoding.
- Parameters:
css_source – The attribute value, as a Unicode string.
- Returns:
A tuple of a list of valid Declaration objects and a list of ParseError objects.
Parsed objects
These data structures make up the results of the various parsing methods.
- class tinycss.parsing.ParseError
Details about a CSS syntax error. Usually indicates that something (a rule or a declaration) was ignored and will not appear as a parsed object.
This exception is typically logged in a list rather than being propagated to the user API.
- line
Source line where the error occurred.
- column
Column in the source line where the error occurred.
- reason
What happened (a string).
- class tinycss.css21.Stylesheet
A parsed CSS stylesheet.
- rules
A mixed list, in source order, of RuleSet and various at-rules such as ImportRule, MediaRule and PageRule. Use their at_keyword attribute to distinguish them.
- errors
A list of ParseError. Invalid rules and declarations are ignored, with the details logged in this list.
- encoding
The character encoding that was used to decode the stylesheet from bytes, or None for Unicode stylesheets.
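As a small illustration of telling rules apart by at_keyword, the sketch below groups a mixed list of rules by that attribute. The helper group_by_at_keyword is hypothetical, and the rules are SimpleNamespace stand-ins that carry only the attributes used here, so the example runs without tinycss installed. With a real Stylesheet you would pass stylesheet.rules instead.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_at_keyword(rules):
    """Map each at_keyword (None for rulesets) to its rules, in source order."""
    groups = defaultdict(list)
    for rule in rules:
        groups[rule.at_keyword].append(rule)
    return dict(groups)


# Stand-ins for parsed rules, not real tinycss objects.
rules = [
    SimpleNamespace(at_keyword='@import', uri='foo.css'),
    SimpleNamespace(at_keyword=None),
    SimpleNamespace(at_keyword='@page'),
    SimpleNamespace(at_keyword=None),
]
groups = group_by_at_keyword(rules)
print(len(groups[None]), 'rulesets;', len(groups['@import']), 'import rule')
```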
Note
All subsequent objects have line and column attributes (not repeated every time for brevity) that indicate where in the CSS source this object was read.
- class tinycss.css21.RuleSet
A ruleset.
- at_keyword
Always None. Helps to tell rulesets apart from at-rules.
- selector
The selector as a TokenList. In CSS 3, this is actually called a selector group. rule.selector.as_css() gives the selector as a string. This string can be used with cssselect, see Selectors 3.
- declarations
The list of Declaration, in source order.
- class tinycss.css21.ImportRule
A parsed @import rule.
- at_keyword
Always '@import'.
- uri
The URI to be imported, as read from the stylesheet. (URIs are not made absolute.)
- media
For CSS 2.1 without media queries: the media types as a list of strings. This attribute is explicitly ['all'] if the media was omitted in the source.
- class tinycss.css21.MediaRule
A parsed @media rule.
- at_keyword
Always '@media'.
- media
For CSS 2.1 without media queries: the media types as a list of strings.
- class tinycss.css21.PageRule
A parsed CSS 2.1 @page rule.
- at_keyword
Always '@page'.
- selector
The page selector. In CSS 2.1 this is either None (no selector), or the string 'first', 'left' or 'right' for the pseudo class of the same name.
- specificity
Specificity of the page selector. This is a tuple of four integers, but these tuples are mostly meant to be compared to each other.
- declarations
A list of Declaration, in source order.
- at_rules
The list of parsed at-rules inside the @page block, in source order. Always empty for CSS 2.1.
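Since specificity tuples are meant to be compared with one another, ordinary Python tuple comparison is enough to order page rules. The sketch below uses stand-in objects with a hypothetical label attribute and made-up specificity values. It does not show what tinycss computes for any particular selector.

```python
from types import SimpleNamespace

# Stand-ins for PageRule objects; labels and specificity values are made up.
page_rules = [
    SimpleNamespace(label='A', specificity=(0, 0, 1, 0)),
    SimpleNamespace(label='B', specificity=(0, 0, 0, 0)),
    SimpleNamespace(label='C', specificity=(0, 0, 0, 1)),
]

# Tuples compare element by element, left to right. sorted() is stable,
# so rules with equal specificity keep their source order.
by_specificity = sorted(page_rules, key=lambda rule: rule.specificity)
print([rule.label for rule in by_specificity])
```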
- class tinycss.css21.Declaration
A property declaration.
- name
The property name as a normalized (lower-case) string.
- value
The property value as a TokenList.
The value is not parsed. UAs using tinycss may only support some properties or some values and tinycss does not know which. They need to parse values themselves and ignore declarations with unknown or unsupported properties or values, and fall back on any previous declaration.
tinycss.color3 parses color values, but other values will need specific parsing/validation code.
- priority
Either the string 'important' or None.
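The "ignore unsupported declarations and fall back on any previous declaration" behavior can be sketched as follows. Both effective_declarations and is_supported are hypothetical names, the declarations are stand-ins whose values are plain strings rather than TokenList objects, and priority ('important') is deliberately left out to keep the sketch short.

```python
from types import SimpleNamespace


def effective_declarations(declarations, is_supported):
    """Return {property name: declaration}, keeping the last supported one."""
    effective = {}
    for declaration in declarations:
        if is_supported(declaration):
            # A later supported declaration replaces an earlier one.
            effective[declaration.name] = declaration
        # Unsupported declarations are skipped, so any earlier supported
        # declaration for the same property stays in effect.
    return effective


# Stand-ins for Declaration objects, not real tinycss objects.
declarations = [
    SimpleNamespace(name='color', value='red'),
    SimpleNamespace(name='color', value='not-a-color'),
    SimpleNamespace(name='width', value='10px'),
]


def is_supported(declaration):
    # Hypothetical validator: accepts only two hard-coded declarations.
    return (declaration.name, declaration.value) in {
        ('color', 'red'), ('width', '10px')}


result = effective_declarations(declarations, is_supported)
print({name: decl.value for name, decl in result.items()})
```

A real validator would inspect the tokens in declaration.value instead of comparing strings.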
Tokens
Some parts of a stylesheet (such as selectors in CSS 2.1 or property values) are not parsed by tinycss. They appear as tokens instead.
- class tinycss.token_data.TokenList
A mixed list of Token and ContainerToken objects.
This is a subclass of the builtin list type. It can be iterated, indexed and sliced as usual, but also has some additional API:
- property line
The line number in the CSS source of the first token.
- property column
The column number (inside a source line) of the first token.
- class tinycss.token_data.Token
A single atomic token.
- is_container
Always False. Helps to tell Token apart from ContainerToken.
- type
The type of token as a string:
S
A sequence of white space
IDENT
An identifier: a name that does not start with a digit. A name is a sequence of letters, digits, _, -, escaped characters and non-ASCII characters. E.g. margin-left
HASH
# followed immediately by a name. E.g. #ff8800
ATKEYWORD
@ followed immediately by an identifier. E.g. @page
URI
E.g. url(foo). The content may or may not be quoted.
UNICODE-RANGE
U+ followed by one or two hexadecimal Unicode codepoints. E.g. U+20-00FF
INTEGER
An integer with an optional + or - sign
NUMBER
A non-integer number with an optional + or - sign
DIMENSION
An integer or number followed immediately by an identifier (the unit). E.g. 12px
PERCENTAGE
An integer or number followed immediately by %
STRING
A string, quoted with " or '
: or ;
That character.
DELIM
A single character not matched in another token. E.g. ,
See the source of the token_data module for the precise regular expressions that match various tokens.
Note that other token types exist in the early tokenization steps, but these are ignored, are syntax errors, or are later transformed into ContainerToken or FunctionToken.
- value
The parsed value:
INTEGER, NUMBER, PERCENTAGE or DIMENSION tokens: the numeric value as an int or float.
STRING tokens: the unescaped string without quotes.
URI tokens: the unescaped URI without quotes or url( and ) markers.
IDENT, ATKEYWORD or HASH tokens: the unescaped token, with @ or # markers left as-is.
Other tokens: same as as_css.
Unescaped refers to the various escaping methods based on the backslash \ character in CSS syntax.
- unit
DIMENSION tokens: the normalized (unescaped, lower-case) unit name as a string, e.g. 'px'.
PERCENTAGE tokens: the string '%'.
Other tokens: None.
- line
The line number in the CSS source of the start of this token.
- column
The column number (inside a source line) of the start of this token.
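To show how type, value and unit fit together, the sketch below pulls the numeric parts out of a token sequence. FakeToken and numeric_parts are hypothetical. The stand-in tokens are written by hand from the attribute descriptions above and are not produced by the tinycss tokenizer.

```python
from collections import namedtuple

# Stand-in for tinycss tokens, limited to three documented attributes.
FakeToken = namedtuple('FakeToken', 'type value unit')


def numeric_parts(tokens):
    """Yield (value, unit) for every numeric token, skipping the rest."""
    numeric_types = {'INTEGER', 'NUMBER', 'DIMENSION', 'PERCENTAGE'}
    for token in tokens:
        if token.type in numeric_types:
            yield token.value, token.unit


# Hand-written approximation of the tokens for the value `12px 50% auto`.
tokens = [
    FakeToken('DIMENSION', 12, 'px'),
    FakeToken('S', ' ', None),
    FakeToken('PERCENTAGE', 50, '%'),
    FakeToken('S', ' ', None),
    FakeToken('IDENT', 'auto', None),
]
print(list(numeric_parts(tokens)))
```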
- class tinycss.token_data.ContainerToken
A token that contains other (nested) tokens.
- is_container
Always True. Helps to tell ContainerToken apart from Token.
- type
The type of token as a string. One of {, (, [ or FUNCTION. For FUNCTION, the object is actually a FunctionToken.
- unit
Always None. Included to make ContainerToken behave more like Token.
- content
A list of Token or nested ContainerToken, not including the opening or closing token.
- line
The line number in the CSS source of the start of this token.
- column
The column number (inside a source line) of the start of this token.
- class tinycss.token_data.FunctionToken
A specialized ContainerToken for a FUNCTION group. Has an additional attribute:
- function_name
The unescaped name of the function, with the ( marker removed.
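Because containers can nest, code that needs every atomic token usually walks the structure recursively, using is_container to decide when to descend into content. The sketch below does this over stand-in objects. iter_leaf_tokens and leaf are hypothetical names, and the hand-written token list is a simplified approximation of `rgb(255, 0, 0) solid` with the white space inside the parentheses left out.

```python
from types import SimpleNamespace


def iter_leaf_tokens(tokens):
    """Yield every non-container token, descending into nested containers."""
    for token in tokens:
        if token.is_container:
            yield from iter_leaf_tokens(token.content)
        else:
            yield token


def leaf(type_, value):
    # Stand-in for a Token.
    return SimpleNamespace(is_container=False, type=type_, value=value)


# A FUNCTION container followed by two plain tokens (all stand-ins).
tokens = [
    SimpleNamespace(is_container=True, type='FUNCTION', function_name='rgb',
                    content=[leaf('INTEGER', 255), leaf('DELIM', ','),
                             leaf('INTEGER', 0), leaf('DELIM', ','),
                             leaf('INTEGER', 0)]),
    leaf('S', ' '),
    leaf('IDENT', 'solid'),
]
print([token.value for token in iter_leaf_tokens(tokens)])
```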