cuneiform - multi-language OCR system
Description
Cuneiform is an OCR system. In addition to text recognition it also does layout analysis and text format
recognition. Cuneiform supports several languages.
Homepage
More information about cuneiform can be found at <http://launchpad.net/cuneiform-linux/>.
Input Format
Cuneiform can process any single-page image that GraphicsMagick knows how to open. Please consult the
gm(1) manual page for the comprehensive list of supported image formats.
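Because each run handles one single-page image, a multi-page job is usually scripted as a loop. The
following is a minimal sketch, not part of Cuneiform itself: the helper name ocr_page, the DRY_RUN
variable, the scans/ directory and the choice of German plain-text output are all assumptions made
for illustration. Only the -l, -f and -o switches come from this manual page.

```shell
#!/bin/sh
# Hypothetical helper: OCR one single-page image into a .txt file next to it.
# Set DRY_RUN=1 to print the command instead of running it.
ocr_page() {
    lang=$1
    input=$2
    output="${input%.*}.txt"    # replace the image extension with .txt
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo cuneiform -l "$lang" -f text -o "$output" "$input"
    else
        cuneiform -l "$lang" -f text -o "$output" "$input"
    fi
}

# Cuneiform takes one single-page image per run, so loop over the pages.
# The scans/ directory is an assumed location for the page images.
for page in scans/*.png; do
    [ -e "$page" ] || continue  # skip if the glob matched nothing
    ocr_page ger "$page"
done
```

Running the script with DRY_RUN=1 first is a cheap way to check the generated file names before any
recognition is done.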
Name
cuneiform - multi-language OCR system
Options
--dotmatrix
Use recognition mode optimized for text printed with a dot matrix printer.
--fax
Use recognition mode optimized for text that has been faxed.
--singlecolumn
Disable page layout analysis and assume that the image consists of only one column of text.
-f format
Select output format. The following formats are available: html (HTML format), hocr (hOCR HTML
format), native (native Cuneiform 2000), rtf (RTF format), smarttext (plain text with TeX
paragraphs), text (plain text). The default is plain text.
-l language
By default Cuneiform recognizes English text. To change the language use the command line switch -l
followed by a language code (typically an ISO 639-2 three-letter code). The following languages are
supported:
bul Bulgarian
cze Czech
dan Danish
dut Dutch
eng English
est Estonian
fra French
ger German
hrv Croatian
hun Hungarian
ita Italian
lav Latvian
lit Lithuanian
pol Polish
por Portuguese
rum Romanian
rus Russian
ruseng mixed Russian/English
slv Slovenian
spa Spanish
srp Serbian
swe Swedish
tur Turkish
ukr Ukrainian
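Several of these codes differ from what a script might guess (for example ger rather than deu, and
the combined code ruseng), so a wrapper may want to check the code before starting a run. The
following is a sketch under stated assumptions: is_supported_lang is a hypothetical helper, not part
of Cuneiform, and the file name page.png is a placeholder. The list of codes is copied from the
table above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if $1 is one of the language codes listed above.
is_supported_lang() {
    case $1 in
        bul|cze|dan|dut|eng|est|fra|ger|hrv|hun|ita|lav|lit|pol|por|rum|rus|ruseng|slv|spa|srp|swe|tur|ukr)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

lang=${1:-eng}    # fall back to English, Cuneiform's default language
if is_supported_lang "$lang"; then
    # This sketch only prints the command; page.png is a placeholder file name.
    echo "would run: cuneiform -l $lang page.png"
else
    echo "unsupported language code: $lang" >&2
fi
```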
-o output
Write the result to the file output. If you do not specify an output file with the -o switch,
Cuneiform writes the result to a file named ‘cuneiform-out.format’, where the file extension
depends on the selected output format.
Synopsis
cuneiform [--dotmatrix] [--fax] [--singlecolumn] [-f format] [-l language] [-o output] input
