hocr2pdf - hOCR to PDF converter of the ExactImage toolkit
Copyright
This manual page was written for the Debian system (and may be used by others).
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General
Public License, Version 2 or (at your option) any later version published by the Free Software
Foundation.
On Debian systems, the complete text of the GNU General Public License can be found in
/usr/share/common-licenses/GPL-2.
hocr2pdf 02/18/2015 HOCR2PDF(1)
Description
ExactImage is a fast C++ image processing library. Unlike many other library frameworks, it operates
natively in several color spaces and bit depths, which keeps memory and computational requirements
low.
hocr2pdf creates well-laid-out, searchable PDF files from hOCR (annotated HTML) input obtained from an
OCR system.
Example
$ hocr2pdf -i scan.tiff -o test.pdf < cuneiform-out.hocr
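The invocation above handles a single page. As a minimal sketch for several pages, the loop below assumes a hypothetical naming scheme in which each scanned image NAME.tiff has its OCR result in NAME.hocr in the same directory; the function name hocr2pdf_batch is likewise invented for illustration. It uses only the -i and -o options and feeds the hOCR on standard input, as in the example above.

```shell
# Sketch: convert every NAME.tiff that has a matching NAME.hocr into NAME.pdf.
# The NAME.tiff / NAME.hocr pairing is an assumption, not an hocr2pdf convention.
# Plain POSIX sh has no local variables, so img, base and hocr remain set afterwards.
hocr2pdf_batch() {
    dir=${1:-.}
    for img in "$dir"/*.tiff; do
        [ -e "$img" ] || continue      # no match: the glob pattern stays literal
        base=${img%.tiff}
        hocr=$base.hocr
        if [ ! -f "$hocr" ]; then
            echo "skipping $img: $hocr not found" >&2
            continue
        fi
        # The image is passed with -i, the hOCR arrives on standard input.
        hocr2pdf -i "$img" -o "$base.pdf" < "$hocr" || return 1
    done
}
```

Calling hocr2pdf_batch scans/ would then write one PDF next to each image/hOCR pair and stop at the first failing conversion.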
Name
hocr2pdf - hOCR to PDF converter of the ExactImage toolkit
Options
-i file, --input file
Read image from the specified file. Note that input hOCR is read from the standard input.
-o file, --output file
Save output PDF to the specified file.
-n, --no-image
Don't place the image over the text. By default the text layer is hidden behind the image.
-s, --sloppy-text
Place text sloppily: group words instead of drawing single glyphs.
-r n, --resolution n
Override resolution of the input image to n dpi. The default resolution (if not specified in the
input file) is 300 dpi.
--quality
Quality setting used for writing compressed images. Integer in the range 0-100; the default is 75.
--compress
Compression method for writing images, e.g. ascii85, hex, flate, jpeg, jpeg2000, ... The default
depends on the bit depth.
-h, --help
Display help text and exit.
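The options above can be combined in one call. The following sketch wraps such a call in a hypothetical helper, hocr2pdf_jpeg, which checks the documented 0-100 quality range before running the converter. It assumes that option values are passed as separate arguments (for example --quality 85); this page does not spell out that syntax, so verify it against hocr2pdf --help.

```shell
# Sketch: searchable PDF with JPEG-compressed page image.
# Usage: hocr2pdf_jpeg IMAGE HOCR OUTPUT [DPI] [QUALITY]
# hocr2pdf_jpeg is a hypothetical helper, not part of ExactImage.
hocr2pdf_jpeg() {
    img=$1
    hocr=$2
    out=$3
    dpi=${4:-}           # empty: keep the resolution stored in the image
    quality=${5:-75}     # 75 is the documented default
    case $quality in
        ''|*[!0-9]*) echo "quality must be an integer" >&2; return 2 ;;
    esac
    if [ "$quality" -gt 100 ]; then
        echo "quality must be in the range 0-100" >&2
        return 2
    fi
    # Assumption: values follow their option as separate arguments.
    set -- -i "$img" -o "$out" --compress jpeg --quality "$quality"
    if [ -n "$dpi" ]; then
        set -- "$@" -r "$dpi"    # only override the resolution on request
    fi
    hocr2pdf "$@" < "$hocr"
}
```

For instance, hocr2pdf_jpeg scan.tiff scan.hocr scan.pdf 600 85 would request a 600 dpi override and JPEG quality 85. Leaving the DPI argument empty omits -r, so the resolution recorded in the image (or the 300 dpi default) applies.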
See Also
exactimage(7)
Synopsis
hocr2pdf [option...] {-i | --input} input-file {-o | --output} output-file
hocr2pdf {-h | --help}
