deb_control_files:
- control
- md5sums
- postinst
- prerm
deb_fields:
Architecture: all
Depends: python3-lxml, python3:any
Description: |-
extract text from HTML.
How is html_text different from .xpath('//text()') from lxml or .get_text()
from Beautiful Soup?
.
* Text extracted with html_text does not contain inline styles,
javascript, comments and other text that is not normally visible to
users;
* html_text normalizes whitespace, but in a way smarter than
.xpath('normalize-space()'), adding spaces around inline elements (which
are often used as block elements in HTML markup), and trying to avoid
adding extra spaces for punctuation;
* html_text can add newlines (e.g. after headers or paragraphs), so that
the output text looks more like how it is rendered in browsers.
Homepage: https://github.com/TeamHG-Memex/html-text
Installed-Size: '38'
Maintainer: Christian Marillat <marillat@debian.org>
Package: python3-html-text
Priority: optional
Section: python
Source: html-text
Version: 0.5.2-2
srcpkg_name: html-text
srcpkg_version: 0.5.2-2