deb_control_files:
- control
- md5sums
deb_fields:
Architecture: all
Depends: libnekohtml-java, libxerces2-java
Description: |-
Boilerplate removal and fulltext extraction from HTML pages
The boilerpipe library provides algorithms to detect and remove the surplus
"clutter" (boilerplate, templates) around the main textual content of a web
page.
.
The library already provides specific strategies for common tasks (for example,
news article extraction) and can easily be extended for individual problem
settings.
.
Extraction is very fast (milliseconds), needs only the input document (no
global or site-level information) and is usually quite accurate.
Homepage: https://github.com/kohlschutter/boilerpipe
Installed-Size: '132'
Maintainer: Debian Java Maintainers <pkg-java-maintainers@lists.alioth.debian.org>
Package: libboilerpipe-java
Priority: optional
Section: java
Source: boilerpipe
Version: 1.2.0-2
srcpkg_name: boilerpipe
srcpkg_version: 1.2.0-2