Example using the default setup
Parsing HTML using the default setup is as easy as creating a PDF in five steps:
1. Create a Document object.
2. Get a PdfWriter instance.
3. Open the Document.
4. Invoke XMLWorkerHelper.getInstance().parseXHtml().
5. Close the Document.
Let's take a look at a code snippet that converts the walden.html file to PDF.
In this snippet we use the XMLWorkerHelper class and its parseXHtml() method to do all the work:
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document,
    new FileOutputStream("results/walden1.pdf"));
document.open();
XMLWorkerHelper.getInstance().parseXHtml(writer, document,
    HTMLParsingDefault.class.getResourceAsStream("/html/walden.html"), null);
document.close();
See HTMLParsingDefault and the resulting PDF, walden1.pdf.
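The snippet relies on two conditions it does not check: the results directory must already exist, because FileOutputStream does not create missing directories, and /html/walden.html must be on the classpath, because getResourceAsStream() returns null when the resource is missing. Below is a minimal sketch of two helpers that guard both conditions. It uses only the Java standard library, and the class and method names (ParsingPreconditions, ensureParentDirectory, requireResource) are hypothetical names chosen for illustration, not part of iText or XML Worker.

```java
import java.io.File;
import java.io.InputStream;

// Hypothetical helper class for illustration; not part of iText or XML Worker.
class ParsingPreconditions {

    // Returns a File for the given path after making sure its parent
    // directory exists, so that a FileOutputStream can be opened on it.
    static File ensureParentDirectory(String file) {
        File target = new File(file);
        File parent = target.getAbsoluteFile().getParentFile();
        if (parent != null && !parent.isDirectory() && !parent.mkdirs()) {
            throw new IllegalStateException("Could not create directory: " + parent);
        }
        return target;
    }

    // Looks up a classpath resource relative to the given class and fails
    // with a clear message instead of handing a null stream to the parser.
    static InputStream requireResource(Class<?> anchor, String name) {
        InputStream in = anchor.getResourceAsStream(name);
        if (in == null) {
            throw new IllegalStateException("Resource not found on classpath: " + name);
        }
        return in;
    }
}
```

With these in place, the output stream could be opened as new FileOutputStream(ParsingPreconditions.ensureParentDirectory("results/walden1.pdf")), and the input stream obtained with ParsingPreconditions.requireResource(HTMLParsingDefault.class, "/html/walden.html"). Neither helper changes how the HTML is parsed.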
The HTML was taken from Project Gutenberg. It's a book by H.D. Thoreau: Walden, or Life in the Woods.
When we look at the first page generated by iText, we see that something went wrong: the first lines of the HTML are rendered as a line of gibberish. What went wrong and how can we fix it?