Parsing with tinycss

Quickstart

Import tinycss, make a parser object with the features you want, and parse a stylesheet:

>>> import tinycss
>>> parser = tinycss.make_parser('page3')
>>> stylesheet = parser.parse_stylesheet_bytes(b'''@import "foo.css";
...     p.error { color: red }  @lorem-ipsum;
...     @page tables { size: landscape }''')
>>> stylesheet.rules
[<ImportRule 1:1 foo.css>, <RuleSet at 2:5 p.error>, <PageRule 3:5 ('tables', None)>]
>>> stylesheet.errors
[ParseError('Parse error at 2:29, unknown at-rule in stylesheet context: @lorem-ipsum',)]

You’ll get a Stylesheet object which contains all the parsed content as well as a list of encountered errors.

Parsers

Parsers are subclasses of tinycss.css21.CSS21Parser. Various subclasses add support for more syntax. You can choose which features to enable by making a new parser class with multiple inheritance, but there is also a convenience function to do that:

tinycss.make_parser(*features, **kwargs)

Make a parser object with the chosen features.

Parameters:
  • features – Positional arguments are base classes the new parser class will extend. The string 'page3' is accepted as short for CSSPage3Parser. The string 'fonts3' is accepted as short for CSSFonts3Parser.

  • kwargs – Keyword arguments are passed to the parser’s constructor.

Returns:

An instance of a new subclass of CSS21Parser
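As a rough illustration of the two routes described above, the sketch below builds a parser class by hand and then asks make_parser for an equivalent object. It assumes the tinycss.page3 module exposes CSSPage3Parser (the class the 'page3' shorthand stands for); the class name PagedParser is made up for the example.

```python
import tinycss
from tinycss.css21 import CSS21Parser
from tinycss.page3 import CSSPage3Parser


# Hypothetical class name; only the base classes matter.
class PagedParser(CSSPage3Parser, CSS21Parser):
    """CSS 2.1 parser extended with paged media syntax."""


by_hand = PagedParser()
by_helper = tinycss.make_parser('page3')

# Both objects are CSS21Parser instances with the page3 feature mixed in.
for parser in (by_hand, by_helper):
    sheet = parser.parse_stylesheet('@page tables { size: landscape }')
    print(type(parser).__name__, sheet.rules, sheet.errors)
```

Writing the class by hand is mostly useful when you also want to override parser methods; otherwise make_parser saves the boilerplate.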

Parsing a stylesheet

Parser classes have three different methods to parse a CSS stylesheet, depending on whether you have a file, a byte string, or a Unicode string.

class tinycss.css21.CSS21Parser

Parser for CSS 2.1

This parser supports the core CSS syntax as well as @import, @media, @page and !important.

Note that property values are still not parsed, as UAs using this parser may only support some properties or some values.

Currently the parser holds no state. It is a class only so that it can be subclassed and its methods overridden.

parse_stylesheet(css_unicode, encoding=None)

Parse a stylesheet from a Unicode string.

Parameters:
  • css_unicode – A CSS stylesheet as a Unicode string.

  • encoding – The character encoding used to decode the stylesheet from bytes, if any.

Returns:

A Stylesheet.

parse_stylesheet_bytes(css_bytes, protocol_encoding=None, linking_encoding=None, document_encoding=None)

Parse a stylesheet from a byte string.

The character encoding is determined from the passed metadata and the @charset rule in the stylesheet (if any). If no encoding information is available or decoding fails, decoding defaults to UTF-8 and then falls back on ISO-8859-1.

Parameters:
  • css_bytes – A CSS stylesheet as a byte string.

  • protocol_encoding – The “charset” parameter of a “Content-Type” HTTP header (if any), or similar metadata for other protocols.

  • linking_encoding – <link charset=""> or other metadata from the linking mechanism (if any)

  • document_encoding – Encoding of the referring style sheet or document (if any)

Returns:

A Stylesheet.
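The fallback order described for parse_stylesheet_bytes can be sketched with the standard library alone. This is a simplified stand-in, not tinycss's own decoding code: the helper name decode_with_fallback is hypothetical, and @charset and byte-order-mark detection are deliberately left out.

```python
def decode_with_fallback(css_bytes, *declared_encodings):
    """Try each declared encoding, then UTF-8, then ISO-8859-1.

    Simplified sketch: @charset rules and byte order marks are ignored.
    """
    candidates = [enc for enc in declared_encodings if enc] + ['utf-8']
    for encoding in candidates:
        try:
            return css_bytes.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: move on to the next candidate.
            continue
    # ISO-8859-1 maps every byte to a code point, so this cannot fail.
    return css_bytes.decode('iso-8859-1'), 'iso-8859-1'


text, used = decode_with_fallback('p { content: "\xe9" }'.encode('utf-8'))
print(used)  # utf-8
text, used = decode_with_fallback(b'p { content: "\xe9" }')
print(used)  # iso-8859-1
```

The last step is safe because ISO-8859-1 assigns a character to every possible byte value, which is why it works as a fallback of last resort.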

parse_stylesheet_file(css_file, protocol_encoding=None, linking_encoding=None, document_encoding=None)

Parse a stylesheet from a file or filename.

Character encoding-related parameters and behavior are the same as in parse_stylesheet_bytes().

Parameters:

css_file – Either a file (any object with a read() method) or a filename.

Returns:

A Stylesheet.

Parsing a style attribute

CSS21Parser.parse_style_attr(css_source)

Parse a “style” attribute (e.g. of an HTML element).

This method only accepts Unicode, as the source (HTML) document is supposed to handle the character encoding.

Parameters:

css_source – The attribute value, as a Unicode string.

Returns:

A tuple of the list of valid Declaration objects and the list of ParseError objects.
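A minimal sketch of unpacking that return value, assuming a plain CSS 2.1 parser from make_parser(). The attribute string is made up; its last item has no colon, so it should end up in the error list rather than among the declarations.

```python
import tinycss

parser = tinycss.make_parser()
declarations, errors = parser.parse_style_attr(
    'color: red; margin: 0 !important; oops')

# Valid declarations, in source order.
for declaration in declarations:
    print(declaration.name, declaration.value.as_css(), declaration.priority)

# Anything that could not be parsed is reported here instead.
for error in errors:
    print(error.line, error.column, error.reason)
```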

Parsed objects

These data structures make up the results of the various parsing methods.

class tinycss.parsing.ParseError

Details about a CSS syntax error. Usually indicates that something (a rule or a declaration) was ignored and will not appear as a parsed object.

This exception is typically logged in a list rather than being propagated to the user API.

line

Source line where the error occurred.

column

Column in the source line where the error occurred.

reason

What happened (a string).
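For instance, the errors collected on a stylesheet could be reported along these lines. This is only a sketch: the stylesheet text is invented, and the unknown at-rule is placed at the very start so that its position is easy to check by eye.

```python
import tinycss

parser = tinycss.make_parser()
stylesheet = parser.parse_stylesheet('@lorem-ipsum;\np { color: red }')

# The unknown at-rule is dropped and logged; the ruleset is still parsed.
for error in stylesheet.errors:
    print('line %d, column %d: %s' % (error.line, error.column, error.reason))
print(stylesheet.rules)
```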

class tinycss.css21.Stylesheet

A parsed CSS stylesheet.

rules

A mixed list, in source order, of RuleSet and various at-rules such as ImportRule, MediaRule and PageRule. Use their at_keyword attribute to distinguish them.

errors

A list of ParseError. Invalid rules and declarations are ignored, with the details logged in this list.

encoding

The character encoding that was used to decode the stylesheet from bytes, or None for Unicode stylesheets.
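One way to walk the mixed rules list is to branch on at_keyword, as in the sketch below. The stylesheet text is made up for the example, and only attributes documented on this page are used.

```python
import tinycss

parser = tinycss.make_parser()
stylesheet = parser.parse_stylesheet(
    '@import "foo.css";\n'
    '@media print { p { color: black } }\n'
    'p.error { color: red }\n')

for rule in stylesheet.rules:
    if rule.at_keyword is None:
        # A RuleSet: it has a selector and declarations.
        print('ruleset', rule.selector.as_css())
    elif rule.at_keyword == '@import':
        print('import', rule.uri, rule.media)
    elif rule.at_keyword == '@media':
        print('media', rule.media, len(rule.rules))

# None here, because the source was already a Unicode string.
print(stylesheet.encoding)
```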

Note

All subsequent objects have line and column attributes (not repeated every time for brevity) that indicate where in the CSS source this object was read.

class tinycss.css21.RuleSet

A ruleset.

at_keyword

Always None. Helps to tell rulesets apart from at-rules.

selector

The selector as a TokenList. In CSS 3, this is actually called a selector group.

rule.selector.as_css() gives the selector as a string. This string can be used with cssselect, see Selectors 3.

declarations

The list of Declaration, in source order.

class tinycss.css21.ImportRule

A parsed @import rule.

at_keyword

Always '@import'

uri

The URI to be imported, as read from the stylesheet. (URIs are not made absolute.)

media

For CSS 2.1 without media queries: the media types as a list of strings. This attribute is explicitly ['all'] if the media was omitted in the source.

class tinycss.css21.MediaRule

A parsed @media rule.

at_keyword

Always '@media'

media

For CSS 2.1 without media queries: the media types as a list of strings.

rules

The list of RuleSet and various at-rules inside the @media block, in source order.

class tinycss.css21.PageRule

A parsed CSS 2.1 @page rule.

at_keyword

Always '@page'

selector

The page selector. In CSS 2.1 this is either None (no selector), or the string 'first', 'left' or 'right' for the pseudo class of the same name.

specificity

Specificity of the page selector. This is a tuple of four integers, but these tuples are mostly meant to be compared to each other.
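Comparing such tuples needs nothing beyond Python's ordinary tuple ordering, which goes element by element from left to right. The values and labels below are invented purely to show the comparison; they are not the specificities tinycss assigns to any particular selector.

```python
# Hypothetical (specificity, label) pairs; the numbers are illustrative only.
candidates = [
    ((0, 0, 0, 0), 'rule A'),
    ((0, 0, 1, 0), 'rule B'),
    ((0, 0, 0, 1), 'rule C'),
]

# Tuples compare element by element, so (0, 0, 1, 0) beats (0, 0, 0, 1).
most_specific = max(candidates, key=lambda pair: pair[0])
print(most_specific[1])  # rule B
```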

declarations

A list of Declaration, in source order.

at_rules

The list of parsed at-rules inside the @page block, in source order. Always empty for CSS 2.1.

class tinycss.css21.Declaration

A property declaration.

name

The property name as a normalized (lower-case) string.

value

The property value as a TokenList.

The value is not parsed. UAs using tinycss may only support some properties or some values and tinycss does not know which. They need to parse values themselves and ignore declarations with unknown or unsupported properties or values, and fall back on any previous declaration.

tinycss.color3 parses color values, but other values will need specific parsing/validation code.
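The "ignore what you do not support, keep the last valid value" approach might look like this for a single property. It is a sketch that assumes tinycss.color3.parse_color takes one token and returns None for values it does not recognize; the property name frobnicate and the value not-a-color are made up.

```python
import tinycss
from tinycss.color3 import parse_color

parser = tinycss.make_parser()
declarations, errors = parser.parse_style_attr(
    'color: red; color: not-a-color; frobnicate: 1')

color = None
for declaration in declarations:
    if declaration.name != 'color':
        continue  # unknown or unsupported property: ignore it
    if len(declaration.value) != 1:
        continue  # this sketch only handles single-token color values
    parsed = parse_color(declaration.value[0])
    if parsed is not None:
        color = parsed  # a later valid declaration overrides an earlier one

# The invalid second declaration is skipped, so the first one still applies.
print(color)
```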

priority

Either the string 'important' or None.

Tokens

Some parts of a stylesheet (such as selectors in CSS 2.1 or property values) are not parsed by tinycss. They appear as tokens instead.

class tinycss.token_data.TokenList

A mixed list of Token and ContainerToken objects.

This is a subclass of the builtin list type. It can be iterated, indexed and sliced as usual, but also has some additional API:

property line

The line number in the CSS source of the first token.

property column

The column number (inside a source line) of the first token.

as_css()

Return as a Unicode string the CSS representation of the tokens, as parsed in the source.
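A short sketch of the list behaviour and the extra API, using the selector of a made-up ruleset as the TokenList:

```python
import tinycss

parser = tinycss.make_parser()
stylesheet = parser.parse_stylesheet('ul li.item > a { color: blue }')
selector = stylesheet.rules[0].selector

print(len(selector))                    # ordinary list behaviour
print(selector[0].type, selector[0].value)
print(selector.line, selector.column)   # position of the first token
print(selector.as_css())                # back to CSS source text
```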

class tinycss.token_data.Token

A single atomic token.

is_container

Always False. Helps to tell Token apart from ContainerToken.

type

The type of token as a string:

S

A sequence of white space

IDENT

An identifier: a name that does not start with a digit. A name is a sequence of letters, digits, _, -, escaped characters and non-ASCII characters. Eg: margin-left

HASH

# followed immediately by a name. Eg: #ff8800

ATKEYWORD

@ followed immediately by an identifier. Eg: @page

URI

Eg: url(foo) The content may or may not be quoted.

UNICODE-RANGE

U+ followed by one or two hexadecimal Unicode codepoints. Eg: U+20-00FF

INTEGER

An integer with an optional + or - sign

NUMBER

A non-integer number with an optional + or - sign

DIMENSION

An integer or number followed immediately by an identifier (the unit). Eg: 12px

PERCENTAGE

An integer or number followed immediately by %

STRING

A string, quoted with " or '

: or ;

That character.

DELIM

A single character not matched in another token. Eg: ,

See the source of the token_data module for the precise regular expressions that match various tokens.

Note that other token types exist in the early tokenization steps, but these are ignored, are syntax errors, or are later transformed into ContainerToken or FunctionToken.

value

The parsed value:

  • INTEGER, NUMBER, PERCENTAGE or DIMENSION tokens: the numeric value as an int or float.

  • STRING tokens: the unescaped string without quotes

  • URI tokens: the unescaped URI without quotes or url( and ) markers.

  • IDENT, ATKEYWORD or HASH tokens: the unescaped token, with @ or # markers left as-is

  • Other tokens: same as as_css

Unescaped refers to the various escaping methods based on the backslash \ character in CSS syntax.

unit
  • DIMENSION tokens: the normalized (unescaped, lower-case) unit name as a string. eg. 'px'

  • PERCENTAGE tokens: the string '%'

  • Other tokens: None
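To see type, value and unit side by side, one could print them for each token of a property value. The declaration below is made up for the example; white space tokens are skipped.

```python
import tinycss

parser = tinycss.make_parser()
declarations, errors = parser.parse_style_attr('margin: 12px 50% 0 auto')

for token in declarations[0].value:
    if token.type == 'S':
        continue  # skip white space tokens
    print(token.type, repr(token.value), repr(token.unit), token.as_css())
```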

line

The line number in the CSS source of the start of this token.

column

The column number (inside a source line) of the start of this token.

as_css()

Return as a Unicode string the CSS representation of the token, as parsed in the source.

class tinycss.token_data.ContainerToken

A token that contains other (nested) tokens.

is_container

Always True. Helps to tell ContainerToken apart from Token.

type

The type of token as a string. One of {, (, [ or FUNCTION. For FUNCTION, the object is actually a FunctionToken.

unit

Always None. Included to make ContainerToken behave more like Token.

content

A list of Token or nested ContainerToken, not including the opening or closing token.

line

The line number in the CSS source of the start of this token.

column

The column number (inside a source line) of the start of this token.

as_css()

Return as a Unicode string the CSS representation of the token, as parsed in the source.

class tinycss.token_data.FunctionToken

A specialized ContainerToken for a FUNCTION group. Has an additional attribute:

function_name

The unescaped name of the function, with the ( marker removed.
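Because containers nest, code that inspects tokens often needs to recurse. The helper below (flatten is a hypothetical name, not part of tinycss) uses is_container and content to reach every atomic token, and the made-up rgb() value shows function_name on a FunctionToken.

```python
import tinycss


def flatten(tokens):
    """Yield every atomic token, descending into containers."""
    for token in tokens:
        if token.is_container:
            yield from flatten(token.content)
        else:
            yield token


parser = tinycss.make_parser()
declarations, errors = parser.parse_style_attr('color: rgb(255, 0, 0)')

function = declarations[0].value[0]
print(function.type, function.function_name)

# Collect the integer arguments found anywhere inside the value.
integers = [t.value for t in flatten(declarations[0].value)
            if t.type == 'INTEGER']
print(integers)
```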