gocr - command line text recognition tool

Author

       Joerg Schulenburg (see http://www-e.uni-magdeburg.de/jschulen/ocr/ for EMAIL)
       First version of man page by Tim Waugh <twaugh@redhat.com>

Description

       gocr is an optical character recognition program that can be used from the command line. It takes input
       in PNM, PGM, PBM, PPM, or PCX format and writes the recognized text to stdout. If pnm-file is a single
       dash, PNM data is read from stdin. If gzip, bzip2 and netpbm are installed and your system supports
       popen(3), the formats pnm.gz, pnm.bz2, png, jpg, jpeg, tiff, gif, bmp, ps (single pages only) and eps are
       also supported as input files (but not as an input stream), where pnm can be replaced by ppm, pgm or pbm.
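
       The popen(3) route amounts to running an external converter and reading its PNM output. The Python
       sketch below illustrates that dispatch by file extension. The converter commands (gzip -cd, bzip2 -cd
       and netpbm tools such as pngtopnm, jpegtopnm and pstopnm) are assumptions about a typical installation,
       not gocr's internal command table, and input_command is a hypothetical helper.

```python
import shlex
from pathlib import Path

# Assumed converters that write PNM to stdout; gocr's own command table may differ.
CONVERTERS = {
    ".gz": "gzip -cd",
    ".bz2": "bzip2 -cd",
    ".png": "pngtopnm",
    ".jpg": "jpegtopnm",
    ".jpeg": "jpegtopnm",
    ".tiff": "tifftopnm",
    ".gif": "giftopnm",
    ".bmp": "bmptopnm",
    ".ps": "pstopnm -stdout",  # single pages only
    ".eps": "pstopnm -stdout",
}
PNM_SUFFIXES = {".pnm", ".ppm", ".pgm", ".pbm"}
NATIVE_SUFFIXES = PNM_SUFFIXES | {".pcx"}  # read directly, no converter needed


def input_command(filename):
    """Hypothetical helper: return a shell command that writes filename as PNM
    to stdout, or None if gocr can read the file directly."""
    path = Path(filename)
    suffix = path.suffix.lower()
    if suffix in NATIVE_SUFFIXES:
        return None
    if suffix not in CONVERTERS:
        raise ValueError(f"unsupported input file: {filename}")
    if suffix in (".gz", ".bz2"):
        # Only compressed PNM files (pnm.gz, pgm.bz2, ...) are listed as supported.
        if path.with_suffix("").suffix.lower() not in PNM_SUFFIXES:
            raise ValueError(f"unsupported compressed file: {filename}")
    # Quote the file name so names containing spaces survive the shell.
    return f"{CONVERTERS[suffix]} {shlex.quote(filename)}"
```

       For example, input_command("scan.pgm.gz") yields "gzip -cd scan.pgm.gz", whose output is PNM data.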

Examples

       gocr -v 33 text1.pbm
              output verbose information; out30.png is created to show details of the recognition process

       gocr -v 7 -c _YV text1.pbm
              verbose output for unknown chars and for the chars Y and V

       djpeg -pnm -gray text.jpg | gocr -
              convert a jpeg file to pnm format and pass it to gocr via a pipe
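
       The third example can also be driven from a script. This Python sketch wires the same two commands
       together with subprocess; it assumes djpeg (from libjpeg) and gocr are on PATH, and the names
       pipeline_argv and ocr_jpeg are hypothetical.

```python
import subprocess


def pipeline_argv(jpeg_path):
    """Argument lists for the shell pipeline: djpeg -pnm -gray <jpeg_path> | gocr -"""
    return ["djpeg", "-pnm", "-gray", jpeg_path], ["gocr", "-"]


def ocr_jpeg(jpeg_path):
    """Run the pipeline and return the recognized text (needs djpeg and gocr on PATH)."""
    djpeg_argv, gocr_argv = pipeline_argv(jpeg_path)
    djpeg = subprocess.Popen(djpeg_argv, stdout=subprocess.PIPE)
    # The single-dash input file makes gocr read PNM data from stdin.
    gocr = subprocess.Popen(gocr_argv, stdin=djpeg.stdout, stdout=subprocess.PIPE, text=True)
    djpeg.stdout.close()  # so djpeg receives SIGPIPE if gocr exits early
    text, _ = gocr.communicate()
    djpeg.wait()
    if djpeg.returncode != 0:
        raise subprocess.CalledProcessError(djpeg.returncode, djpeg_argv)
    if gocr.returncode != 0:
        raise subprocess.CalledProcessError(gocr.returncode, gocr_argv)
    return text
```

       Passing argument lists instead of a shell string avoids quoting problems with unusual file names.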

Linux                                              20 Sep 2018                                           GOCR(1)

Name

       gocr - command line text recognition tool

Options

       -h     show usage information

       -V     show version information

       -i file
              read input from file (or stdin if file is a single dash)

       -o file
              send output to file instead of stdout

       -e file
              send errors to file instead of stderr, or to stdout if file is a dash

       -x file
              progress output to file (file can be a file name, a fifo name or a file descriptor 1...255); this
              is useful for GUI developers to show the OCR progress. The file descriptor argument is only
              available if compiled with __USE_POSIX defined

       -p path
              database path; a final slash must be included (default: ./db/). This path will be populated with
              images of learned characters

       -f format
              output format of the recognized text (ISO8859_1, TeX, HTML, XML, UTF8, ASCII); XML also outputs
              position and probability data

       -l level
              set grey level to level (0 < level <= 255, e.g. 160; default: 0 for autodetect); darker pixels
              belong to characters, brighter pixels are interpreted as the background of the input image

       -d size
              set dust size in pixels (clusters smaller than this are removed); 0 means no clusters are removed,
              the default is -1 for autodetection

       -s num
              set the space width between words in units of dots (default: 0 for autodetect); wider gaps are
              interpreted as word spaces, narrower ones as character spaces

       -v verbosity
              be verbose to stderr; verbosity is a bitfield

       -c string
              restrict verbose output to the characters in string; more output is written to stderr for all
              characters within the string, where the underscore stands for unknown chars. This is useful to
              limit debug information to what is needed

       -C string
              only recognise characters from string; this is a filter for cases where only part of the
              character alphabet is of interest. You can use 0-9 or a-z to specify ranges; use -- to detect
              the minus sign

       -a certainty
              set the certainty threshold for recognition (0..100; default: 95); characters with a higher
              certainty are accepted, characters with a lower certainty are treated as unknown (not recognized).
              Set higher values if you only want characters that were recognized with more certainty

       -u string
              output this string for every unrecognized character (default is "_")

       -m mode
              set operational mode; mode is a bitfield (default: 0)

       -n bool
              if bool is non-zero, only recognise numbers (this is now obsolete; use -C "0123456789" instead)
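
       The -C string mixes single characters, ranges and the -- escape. The Python sketch below shows one
       plausible reading of that syntax, where x-y is an inclusive range and -- stands for the minus sign;
       gocr's actual parser may treat edge cases differently, and expand_char_filter is a hypothetical name.

```python
def expand_char_filter(spec):
    """Hypothetical reading of a -C string: return the set of allowed characters."""
    allowed = set()
    i = 0
    while i < len(spec):
        if spec.startswith("--", i):
            # "--" stands for the minus sign itself
            allowed.add("-")
            i += 2
        elif i + 2 < len(spec) and spec[i + 1] == "-" and spec[i + 2] != "-":
            # "x-y" is an inclusive range of code points, e.g. 0-9 or a-z
            lo, hi = spec[i], spec[i + 2]
            allowed.update(chr(c) for c in range(ord(lo), ord(hi) + 1))
            i += 3
        else:
            # any other character is accepted literally
            allowed.add(spec[i])
            i += 1
    return allowed
```

       Under this reading, gocr -C "0-9--" text1.pbm would only accept digits and the minus sign.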

       The verbosity is specified as a bitfield:

       1         print more info

       2         list shapes of boxes (see -c) to stderr

       4         list pattern of boxes (see -c) to stderr

       8         print pattern after recognition for debugging

       16        print debug information about recognition of lines to stderr

       32        create outXX.png files with boxes and lines marked at each general OCR step
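
       The bits can be added or ORed together: the -v 33 of the first example is 32 + 1, and -v 7 is
       1 + 2 + 4. A minimal Python sketch of composing and decoding such a value, with hypothetical member
       names for the bits listed above:

```python
from enum import IntFlag


class Verbosity(IntFlag):
    """Bits of the -v value as listed above (member names are hypothetical)."""
    MORE_INFO = 1
    BOX_SHAPES = 2
    BOX_PATTERNS = 4
    PATTERN_AFTER_RECOGNITION = 8
    LINE_DEBUG = 16
    OUT_PNG = 32


def describe(value):
    """Return the names of the bits set in a -v value, lowest bit first."""
    return [flag.name for flag in Verbosity if value & flag]


# The two verbose examples: -v 33 and -v 7
print(int(Verbosity.MORE_INFO | Verbosity.OUT_PNG))  # 33
print(describe(7))  # ['MORE_INFO', 'BOX_SHAPES', 'BOX_PATTERNS']
```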

       The operation modes are:

       2         use the database to recognize characters that are not recognized by other algorithms (early
                 development)

       4         switching on layout analysis or zoning (development)

       8         don't compare unrecognized characters to recognized ones

       16        don't try to divide overlapping characters into two or three single characters

       32        don't do context correction

       64        character packing: before recognition starts, similar characters are searched for and only one of
                 these characters is sent to the recognition engine (development)

       130       extend the database: prompts the user for unidentified characters and extends the database with the
                 user's answers (128+2, early development)

       256       switch off the recognition engine (makes sense together with -m 2)
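
       Because the mode is a bitfield, combined values are sums of the listed bits, and two entries above
       describe pairings (130 = 128 + 2, and 256 together with 2). The Python sketch below composes mode
       values and checks those two documented pairings; the constant names and the mode_warnings helper are
       hypothetical, and gocr itself may not warn about these combinations.

```python
# Mode bits as listed above; the constant names are hypothetical.
DB_RECOGNITION = 2
LAYOUT_ANALYSIS = 4
NO_COMPARE_UNRECOGNIZED = 8
NO_DIVIDE_OVERLAPPING = 16
NO_CONTEXT_CORRECTION = 32
CHAR_PACKING = 64
EXTEND_DB = 128  # documented only in combination, as 130 = 128 + 2
NO_ENGINE = 256


def mode_warnings(mode):
    """Return notes on combinations the descriptions above advise against."""
    warnings = []
    if (mode & NO_ENGINE) and not (mode & DB_RECOGNITION):
        warnings.append("256 switches off the engine; combine it with 2 (database)")
    if (mode & EXTEND_DB) and not (mode & DB_RECOGNITION):
        warnings.append("128 is documented only as part of 130 (128 + 2)")
    return warnings


print(EXTEND_DB | DB_RECOGNITION)  # 130
print(len(mode_warnings(NO_ENGINE)))  # 1
print(mode_warnings(NO_ENGINE | DB_RECOGNITION))  # []
```

       Under this reading, -m 258 (256 + 2) would presumably rely on the database alone for recognition.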

Reporting Bugs

       Report bugs to Joerg Schulenburg

Synopsis

gocr [OPTION] [-i] pnm-file

Version Information

       This man page documents gocr, version 0.52.

See Also