Artifact python3-html-text_0.5.2-2_all

Metadata

deb_control_files:
- control
- md5sums
- postinst
- prerm
deb_fields:
  Architecture: all
  Depends: python3-lxml, python3:any
  Description: |-
    extract text from HTML.
     How is html_text different from .xpath('//text()') from lxml or .get_text()
     from Beautiful Soup?
     .
      * Text extracted with html_text does not contain inline styles,
        javascript, comments and other text that is not normally visible to
        users;
      * html_text normalizes whitespace, but in a way smarter than
        .xpath('normalize-space()'), adding spaces around inline elements (which
        are often used as block elements in HTML markup), and trying to avoid
        adding extra spaces for punctuation;
      * html-text can add newlines (e.g. after headers or paragraphs), so that
        the output text looks more like how it is rendered in browsers.
  Homepage: https://github.com/TeamHG-Memex/html-text
  Installed-Size: '38'
  Maintainer: Christian Marillat <marillat@debian.org>
  Package: python3-html-text
  Priority: optional
  Section: python
  Source: html-text
  Version: 0.5.2-2
srcpkg_name: html-text
srcpkg_version: 0.5.2-2
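
Illustration

The Description field above lists three behaviours: skipping text that is not
visible to users, normalizing whitespace, and adding newlines after headers or
paragraphs. The sketch below illustrates those ideas using only the Python
standard library. It is not the package's implementation (the package depends
on python3-lxml). The names HIDDEN_TAGS, BLOCK_TAGS, VisibleTextParser and
extract_visible_text are hypothetical, and both tag sets are assumed, partial
lists chosen for the example.

```python
import re
from html.parser import HTMLParser

# Assumption: a partial list of tags whose content is not shown to users.
HIDDEN_TAGS = {"script", "style", "head"}
# Assumption: a partial list of tags that end a visual block.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class VisibleTextParser(HTMLParser):
    """Collects visible text; comments are dropped because handle_comment is not overridden."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._hidden_depth = 0  # > 0 while inside a hidden element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self._hidden_depth += 1
        elif tag == "br":
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS:
            self._hidden_depth = max(0, self._hidden_depth - 1)
        elif tag in BLOCK_TAGS:
            # Emit a line break after headers, paragraphs and similar blocks.
            self._chunks.append("\n")

    def handle_data(self, data):
        if self._hidden_depth == 0:
            # Collapse source whitespace (including newlines) to single spaces.
            self._chunks.append(re.sub(r"\s+", " ", data))

    def text(self):
        lines = "".join(self._chunks).split("\n")
        cleaned = (re.sub(r" +", " ", line).strip() for line in lines)
        return "\n".join(line for line in cleaned if line)


def extract_visible_text(html):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><title>T</title><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><!-- note --><p>Hello   <b>world</b>!\n  Bye</p>"
    "<script>var x = 1;</script></body></html>"
)
print(extract_visible_text(sample))
```

The sketch does not attempt the finer points named in the description, such as
adding spaces around inline elements or avoiding extra spaces before
punctuation.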

File

python3-html-text_0.5.2-2_all.deb

Relations

Relation	Direction	Type	Name
built-using		Source package	html-text_0.5.2-2