HTMLDOC is a program that reads HTML and Markdown source files or web pages and
generates corresponding EPUB, HTML, PostScript, or PDF files with an optional
table of contents. HTMLDOC is provided under version 2 of the GNU General Public
HTMLDOC was developed in the 1990’s as a documentation generator for my previous
company, and has since seen a lot of usage as a report generator embedded in web
servers. However, it does not support many things in “the modern web”, such as:
- Cascading Style Sheets (CSS): While I have experimented with adding CSS
support to HTMLDOC, proper CSS support is non-trivial especially for paged
output (which is not well supported by CSS)
- Encryption: HTMLDOC currently supports the older (and very insecure) PDF 1.4
(128-bit RC4) encryption. I am investigating supporting AES (256-bit)
- Forms: HTML forms and PDF forms are very different things. While I have had
many requests to add support for PDF forms over the years, I have not found a
satisfactory way to do so.
- Tables: HTMLDOC supports HTML 3.2 tables with basic support for TBODY and
THEAD from HTML 4.0.
- Unicode: While HTMLDOC does support UTF-8 for “Western” languages, there is
absolutely no support for languages that require dynamic rewriting or
right-to-left text formatting. Basically this means you can’t use HTMLDOC to
format Arabic, Chinese, Hebrew, Japanese, or other languages that are not
based on latin-based alphabets that read left-to-right.
- Emoji: The fonts bundled with HTMLDOC do not include Unicode Emoji characters.
My focus is on addressing security, build, and formatting issues and not on
extending HTMLDOC to support CSS or full Unicode.