Commit Parsing¶
One of the core components of Python Semantic Release (PSR) is the commit parser. The commit parser is responsible for parsing a Project’s Git Repository commit history to extract insights about project changes and make decisions based on this insight.
The primary decision that PSR makes based on the commit history is whether or not to release a new version of the project, and if so, what version number to release. This decision is made based on the commit message descriptions of the change impact introduced by the commit. The change impact describes the impact to the end consumers of the project. Depending on the type of change, the version number will be incremented according to the Semantic Versioning specification (semver). It is the commit parser’s job to extract the change impact from the commit message to determine the severity of the changes and then subsequently determine the semver level that the version should be bumped to for the next release.
The commit parser is also responsible for interpreting other aspects of the commit message which can be used to generate a helpful and detailed changelog. This includes extracting the type of change, the scope of the change, any breaking change descriptions, any linked pull/merge request numbers, and any linked issue numbers.
PSR provides several built-in commit parsers to handle a variety of different commit message styles. If the built-in parsers do not meet your needs, you can write your own custom parser to handle your specific commit message style.
Warning
PSR’s built-in commit parsers are designed to be flexible enough to provide a convenient way to generate the most effective changelogs we can, which means some features are added beyond the scope of the original commit message style guidelines.
Other tools may not follow the same conventions as PSR’s guideline extensions, so if you plan to use any similar programs in tadem with PSR, you should be aware of the differences in feature support and fall back to the official format guidelines if necessary.
Built-in Commit Parsers¶
The following parsers are built in to Python Semantic Release:
TagCommitParser (deprecated in v9.12.0)
Angular Commit Parser¶
A parser that is designed to parse commits formatted according to the Angular Commit Style Guidelines. The parser is implemented with the following logic in relation to how PSR’s core features:
Version Bump Determination: This parser extracts the commit type from the subject line of the commit (the first line of a commit messsage). This type is matched against the configuration mapping to determine the level bump for the specific commit. If the commit type is not found in the configuration mapping, the commit is considered a non-conformative commit and will return it as a ParseError object and ultimately a commit of type
"unknown"
. The configuration mapping contains lists of commit types that correspond to the level bump for each commit type. Some commit types are still valid do not trigger a level bump, such as"chore"
or"docs"
. You can also configure the default level bump commit_parser_options.default_level_bump if desired. To trigger a major release, the commit message body must contain a paragraph that begins withBREAKING CHANGE:
. This will override the level bump determined by the commit type.Changelog Generation: PSR will group commits in the changelog by the commit type used in the commit message. The commit type shorthand is converted to a more human-friendly section heading and then used as the version section title of the changelog and release notes. Under the section title, the parsed commit descriptions are listed out in full. If the commit includes an optional scope, then the scope is prefixed on to the first line of the commit description. If a commit has any breaking change prefixed paragraphs in the commit message body, those paragraphs are separated out into a “Breaking Changes” section in the changelog (Breaking Changes section is available from the default changelog in v9.15.0). Each breaking change paragraph is listed in a bulleted list format across the entire version. A single commit is allowed to have more than one breaking change prefixed paragraph (as opposed to the Angular Commit Style Guidelines). Commits with an optional scope and a breaking change will have the scope prefixed on to the breaking change paragraph. Parsing errors will return a ParseError object and ultimately a commit of type
"unknown"
. Unknown commits are consolidated into an “Unknown” section in the changelog by the default template. To remove unwanted commits from the changelog that normally are placed in the “unknown” section, consider the use of the configuration option changelog.exclude_commit_patterns to ignore those commit styles.Pull/Merge Request Identifier Detection: This parser implements PSR’s Common Linked Merge Request Detection to identify and extract pull/merge request numbers. The parser will return a string value if a pull/merge request number is found in the commit message. If no pull/merge request number is found, the parser will return an empty string. Feature available in v9.13.0+.
Linked Issue Identifier Detection: This parser implements PSR’s Common Issue Identifier Detection to identify and extract issue numbers. The parser will return a tuple of issue numbers as strings if any are found in the commit message. If no issue numbers are found, the parser will return an empty tuple. Feature available in v9.15.0+.
Limitations:
Squash commits are not currently supported. This means that the level bump for a squash commit is only determined by the subject line of the squash commit. Our default changelog template currently writes out the entire commit message body in the changelog in order to provide the full detail of the changes. Track the implementation of this feature with the issues #733, #1085, and PR#1112.
Commits with the
revert
type are not currently supported. Track the implementation of this feature in the issue #402.
If no commit parser options are provided via the configuration, the parser will use PSR’s
built-in defaults
.
Emoji Commit Parser¶
A parser that is designed to parse commits formatted to the Gitmoji Specification with a few additional features that the specification does not cover but provide similar functionality expected from a Semantic Release tool. As the Gitmoji Specification describes, the emojis can be specified in either the unicode format or the shortcode text format. See the Gitmoji Specification for the pros and cons for which format to use, but regardless, the configuration options must match the format used in the commit messages. The parser is implemented with the following logic in relation to how PSR’s core features:
Version Bump Determination: This parser only looks for emojis in the subject line of the commit (the first line of a commit messsage). If more than one emoji is found, the emoji configured with the highest priority is selected for the change impact for the specific commit. The emoji with the highest priority is the one configured in the
major
configuration option, followed by theminor
, andpatch
in descending priority order. If no emoji is found in the subject line, the commit is classified as other and will default to the level bump defined by the configuration option commit_parser_options.default_level_bump.Changelog Generation: PSR will group commits in the changelog by the emoji type used in the commit message. The emoji is used as the version section title and the commit descriptions are listed under that section. No emojis are removed from the commit message so each will be listed in the changelog and release notes. When more than one emoji is found in the subject line of a commit, the emoji with the highest priority is the one that will influence the grouping of the commit in the changelog. Commits containing no emojis or non-configured emojis are consolidated into an “Other” section. To remove unwanted commits from the changelog that would normally be added into the “other” section, consider the use of the configuration option changelog.exclude_commit_patterns to ignore those commit styles.
Pull/Merge Request Identifier Detection: This parser implements PSR’s Common Linked Merge Request Detection to identify and extract pull/merge request numbers. The parser will return a string value if a pull/merge request number is found in the commit message. If no pull/merge request number is found, the parser will return an empty string. Feature available in v9.13.0+.
Linked Issue Identifier Detection: [Disabled by default] This parser implements PSR’s Common Issue Identifier Detection to identify and extract issue numbers. The parser will return a tuple of issue numbers as strings if any are found in the commit message. If no issue numbers are found, the parser will return an empty tuple. This feature is disabled by default since it is not a part of the Gitmoji Specification but can be enabled by setting the configuration option
commit_parser_options.parse_linked_issues
totrue
. Feature available in v9.15.0+.
If no commit parser options are provided via the configuration, the parser will use PSR’s
built-in defaults
.
Scipy Commit Parser¶
A parser that is designed to parse commits formatted according to the Scipy Commit Style Guidlines. This is essentially a variation of the Angular Commit Style Guidelines with all different commit types. Because of this small variance, this parser only extends our Angular Commit Parser parser with pre-defined scipy commit types in the default Scipy Parser Options and all other features are inherited.
If no commit parser options are provided via the configuration, the parser will use PSR’s
built-in defaults
.
Tag Commit Parser¶
Warning
This parser was deprecated in v9.12.0
. It will be removed in a future release.
The original parser from v1.0.0 of Python Semantic Release. Similar to the emoji parser above, but with less features.
If no commit parser options are provided via the configuration, the parser will use PSR’s
built-in defaults
.
Common Linked Merge Request Detection¶
Introduced in v9.13.0
All of the PSR built-in parsers implement common pull/merge request identifier detection logic to extract pull/merge request numbers from the commit message regardless of the VCS platform. The parsers evaluate the subject line for a paranthesis-enclosed number at the end of the line. PSR’s parsers will return a string value if a pull/merge request number is found in the commit message. If no pull/merge request number is found, the parsers will return an empty string.
Examples:
All of the following will extract a MR number of “x123”, where ‘x’ is the character prefix
BitBucket:
Merged in feat/my-awesome-feature (pull request #123)
GitHub:
feat: add new feature (#123)
GitLab:
fix: resolve an issue (!123)
Common Issue Identifier Detection¶
Introduced in v9.15.0
All of the PSR built-in parsers implement common issue identifier detection logic,
which is similar to many VCS platforms such as GitHub, GitLab, and BitBucket. The
parsers will look for common issue closure text prefixes in the Git Trailer format
in the commit message to identify and extract issue numbers. The detection logic is
not strict to any specific issue tracker as we try to provide a flexible approach
to identifying issue numbers but in order to be flexible, it is required to the
use the Git Trailer format with a colon (:
) as the token separator.
PSR attempts to support all variants of issue closure text prefixes, but not all will work for your VCS. PSR supports the following case-insensitive prefixes and their conjugations (plural, present, & past tense):
close (closes, closing, closed)
fix (fixes, fixing, fixed)
resolve (resolves, resolving, resolved)
implement (implements, implementing, implemented)
PSR also allows for a more flexible approach to identifying more than one issue number without the need of extra git trailors (although PSR does support multiple git trailors). PSR support various list formats which can be used to identify more than one issue in a list. This format will not necessarily work on your VCS. PSR currently support the following list formats:
comma-separated (ex.
Closes: #123, #456, #789
)space-separated (ex.
resolve: #123 #456 #789
)semicolon-separated (ex.
Fixes: #123; #456; #789
)slash-separated (ex.
close: #123/#456/#789
)ampersand-separated (ex.
Implement: #123 & #789
)and-separated (ex.
Resolve: #123 and #456 and #789
)mixed (ex.
Closed: #123, #456, and #789
orFixes: #123, #456 & #789
)
All the examples above use the most common issue number prefix (#
) but PSR is flexible
to support other prefixes used by VCS platforms or issue trackers such as JIRA (ex. ABC-###
).
The parsers will return a tuple of issue numbers as strings if any are found in the commit
message. Strings are returned to ensure that the any issue number prefix characters are
preserved (ex. #123
or ABC-123
). If no issue numbers are found, the parsers will
return an empty tuple.
References:
Customization¶
Each of the built-in parsers can be customized by providing overrides in the commit_parser_options setting of the configuration file. This can be used to toggle parsing features on and off or to add, modify, or remove the commit types that are used to determine the level bump for a commit. Review the API documentation for the specific parser’s options class to see what changes to the default behavior can be made.
Custom Parsers¶
Custom parsers can be written to handle commit message styles that are not covered by the built-in parsers or by option customization of the built-in parsers.
Python Semantic Release provides several building blocks to help you write your parser. To maintain compatibility with how Python Semantic Release will invoke your parser, you should use the appropriate object as described below, or create your own object as a subclass of the original which maintains the same interface. Type parameters are defined where appropriate to assist with static type-checking.
The commit_parser option, if set to a string which
does not match one of Python Semantic Release’s built-in commit parsers, will be
used to attempt to dynamically import a custom commit parser class. As such you will
need to ensure that your custom commit parser is import-able from the environment in
which you are running Python Semantic Release. The string should be structured in the
standard module:attr
format; for example, to import the class MyCommitParser
from the file custom_parser.py
at the root of your repository, you should specify
"commit_parser=custom_parser:MyCommitParser"
in your configuration, and run the
semantic-release
command line interface from the root of your repository. Equally
you can ensure that the module containing your parser class is installed in the same
virtual environment as semantic-release. If you can run
python -c "from $MODULE import $CLASS"
successfully, specifying
commit_parser="$MODULE:$CLASS"
is sufficient. You may need to set the
PYTHONPATH
environment variable to the directory containing the module with
your commit parser.
Tokens¶
The tokens built into Python Semantic Release’s commit parsing mechanism are inspired
by both the error-handling mechanism in Rust’s error handling and its
implementation in black. It is documented that catching exceptions in Python is
slower than the equivalent guard implemented using if/else
checking when
exceptions are actually caught, so although try/except
blocks are cheap if no
exception is raised, commit parsers should always return an object such as
ParseError
instead of raising an error immediately. This is to avoid catching a potentially large
number of parsing errors being caught as the commit history of a repository is being
parsed. Python Semantic Release does not raise an exception if a commit cannot be parsed.
Python Semantic Release uses ParsedCommit
as the return type of a successful parse operation, and
ParseError
as the return type from an unsuccessful parse of a commit. You should review the API
documentation linked to understand the fields available on each of these objects.
It is important to note, the ParseError
implements an additional method, raise_error
. This method raises a
CommitParseError
with the message
contained in the error
field, as a convenience.
In Python Semantic Release, the type semantic_release.commit_parser.token.ParseResult
is defined as ParseResultType[ParsedCommit, ParseError]
, as a convenient shorthand.
ParseResultType
is a
generic type, which is the Union
of its two type parameters. One of the types in this
union should be the type returned on a successful parse of the commit
, while the other
should be the type returned on an unsuccessful parse of the commit
.
A custom parser result type, therefore, could be implemented as follows:
MyParsedCommit
subclassesParsedCommit
MyParseError
subclassesParseError
MyParseResult = ParseResultType[MyParsedCommit, MyParseError]
Internally, Python Semantic Release uses isinstance()
to determine if the result
of parsing a commit was a success or not, so you should check that your custom result
and error types return True
from isinstance(<object>, ParsedCommit)
and
isinstance(<object>, ParseError)
respectively.
While it’s not advisable to remove any of the fields that are available in the built-in
token types, currently only the bump
field of the successful result type is used to
determine how the version should be incremented as part of this release. However, it’s
perfectly possible to add additional fields to your tokens which can be populated by
your parser; these fields will then be available on each commit in your
changelog template, so you can make additional information
available.
Parser Options¶
To provide options to the commit parser which is configured in the configuration file, Python Semantic Release includes a
ParserOptions
class. Each parser built into Python Semantic Release has a corresponding “options” class, which
subclasses ParserOptions
.
The configuration in commit_parser_options is passed to the “options” class which is specified by the configured commit_parser - more information on how this is specified is below.
The “options” class is used to validate the options which are configured in the repository, and to provide default values for these options where appropriate.
If you are writing your own parser, you should accompany it with an “options” class
which accepts the appropriate keyword arguments. This class’ __init__
method should
store the values that are needed for parsing appropriately.
Commit Parsers¶
The commit parsers that are built into Python Semantic Release implement an instance
method called parse
, which takes a single parameter commit
of type
git.objects.commit.Commit, and returns the type
ParseResultType
.
To be compatible with Python Semantic Release, a commit parser must subclass
CommitParser
.
A subclass must implement the following:
A class-level attribute
parser_options
, which must be set toParserOptions
or a subclass of this.An
__init__
method which takes a single parameter,options
, that should be of the same type as the class’parser_options
attribute.A method,
parse
, which takes a single parametercommit
that is of type git.objects.commit.Commit, and returnsParseResult
, or a subclass of this.
By default, the constructor for CommitParser
will set the options
parameter on the options
attribute of the parser, so there
is no need to override this in order to access self.options
during the parse
method. However, if you have any parsing logic that needs to be done only once, it may
be a good idea to perform this logic during parser instantiation rather than inside the
parse
method. The parse method will be called once per commit in the repository’s
history during parsing, so the effect of slow parsing logic within the parse
method
will be magnified significantly for projects with sizeable Git histories.
Commit Parsers have two type parameters, “TokenType” and “OptionsType”. The first
is the type which is returned by the parse
method, and the second is the type
of the “options” class for this parser.
Therefore, a custom commit parser could be implemented via:
class MyParserOptions(semantic_release.ParserOptions):
def __init__(self, message_prefix: str) -> None:
self.prefix = message_prefix * 2
class MyCommitParser(
semantic_release.CommitParser[semantic_release.ParseResult, MyParserOptions]
):
def parse(self, commit: git.objects.commit.Commit) -> semantic_release.ParseResult:
...