
mftraining - feature training for Tesseract

Author

       The Tesseract OCR engine was written by Ray Smith and his research groups at Hewlett Packard (1985-1995)
       and Google (2006-2018).

                                                   01/19/2025                                      MFTRAINING(1)

Copying

       Copyright (C) Hewlett-Packard Company, 1988. Licensed under the Apache License, Version 2.0.

Description

       mftraining takes a list of .tr files, from which it generates the files inttemp (the shape prototypes),
       shapetable, and pffmtable (the number of expected features for each character). (A fourth file called
       Microfeat is also written by this program, but it is not used.)
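       After a run, one quick sanity check is to confirm that the three files described above exist. The
       sketch below assumes they appear under exactly these names in the directory given with -D;
       missing_outputs is a hypothetical helper, not part of Tesseract. Microfeat is skipped because it is
       not used.

```python
from pathlib import Path

# Output files named in the Description; Microfeat is left out because it is unused.
# Assumption: they are written under these exact names in the -D directory.
EXPECTED_OUTPUTS = ("inttemp", "shapetable", "pffmtable")


def missing_outputs(output_dir: str) -> list[str]:
    """Return the expected mftraining outputs not found in output_dir (hypothetical helper)."""
    directory = Path(output_dir)
    return [name for name in EXPECTED_OUTPUTS if not (directory / name).is_file()]
```

       An empty list means every expected file is present.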

Name

       mftraining - feature training for Tesseract

Options

       -U FILE
           (Input) The unicharset generated by unicharset_extractor(1).

       -F font_properties_file
           (Input) A font properties file. Each line has the following form, where every field other than
           the font name is 0 or 1:

               *font_name* *italic* *bold* *fixed_pitch* *serif* *fraktur*
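           A minimal sketch for writing and checking font_properties lines, assuming fields are separated
           by whitespace and font names contain no spaces. FontProperties and parse_line are hypothetical
           helpers, and "myfont" is a placeholder font name.

```python
from dataclasses import dataclass

# Flag columns in the order documented above, after font_name.
FLAG_NAMES = ("italic", "bold", "fixed_pitch", "serif", "fraktur")


@dataclass
class FontProperties:
    font_name: str
    italic: int
    bold: int
    fixed_pitch: int
    serif: int
    fraktur: int

    def to_line(self) -> str:
        """Format one font_properties line, rejecting flags that are not 0 or 1."""
        flags = [getattr(self, name) for name in FLAG_NAMES]
        for name, value in zip(FLAG_NAMES, flags):
            if value not in (0, 1):
                raise ValueError(f"{name} must be 0 or 1, got {value!r}")
        return " ".join([self.font_name, *map(str, flags)])


def parse_line(line: str) -> FontProperties:
    """Parse and validate one whitespace-separated font_properties line."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    name, *raw_flags = fields
    props = FontProperties(name, *(int(flag) for flag in raw_flags))
    props.to_line()  # reuse the 0/1 validation
    return props
```

           For example, FontProperties("myfont", 1, 0, 0, 1, 0).to_line() describes an italic serif font
           and yields "myfont 1 0 0 1 0".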

       -X xheights_file
           (Input) An x heights file. Each line has the following form, where xheight is the x height, in
           pixels, of a character drawn at 32pt on 300 dpi. [That is, 32pt at 300 dpi is about 133 pixels
           (32 / 72 × 300); if base x height + ascenders + descenders = 133 pixels, how many pixels is the
           x height?]

               *font_name* *xheight*
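           The conversion behind the 133-pixel figure can be sketched as follows. The xheight_ratio input
           (the x height divided by the full ascender-to-descender height) is a hypothetical per-font
           measurement you would supply yourself, and rounding to whole pixels is an assumption about
           what the file expects.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 inch


def points_to_pixels(size_pt: float, dpi: float) -> float:
    """Convert a type size in points to pixels at the given resolution."""
    return size_pt * dpi / POINTS_PER_INCH


def estimate_xheight_px(xheight_ratio: float, size_pt: float = 32, dpi: float = 300) -> int:
    """Estimate the x height in whole pixels (rounding is an assumption).

    xheight_ratio is a hypothetical, font-specific measurement: the x height
    divided by the base x height + ascenders + descenders.
    """
    if not 0 < xheight_ratio < 1:
        raise ValueError("xheight_ratio must be between 0 and 1")
    return round(xheight_ratio * points_to_pixels(size_pt, dpi))


def xheight_line(font_name: str, xheight_px: int) -> str:
    """Format one '<font_name> <xheight>' line of the -X file."""
    return f"{font_name} {xheight_px}"
```

           With a hypothetical ratio of 0.48, estimate_xheight_px(0.48) gives 64, so the line for a
           placeholder font would be xheight_line("myfont", 64), i.e. "myfont 64".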

       -D dir
           Directory to write output files to.

       -O FILE
           (Output) The output unicharset that will be given to combine_tessdata(1).

See Also

       tesseract(1), cntraining(1), unicharset_extractor(1), combine_tessdata(1), shapeclustering(1),
       unicharset(5)

       https://tesseract-ocr.github.io/tessdoc/Training-Tesseract.html

Synopsis

       mftraining -U unicharset -O lang.unicharset FILE...
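       A minimal sketch that assembles the command line from the Synopsis and the options above, placing
       all options before the .tr files. mftraining_argv is a hypothetical helper, and every file name in
       the usage note is a placeholder.

```python
import shlex
from typing import Iterable, Optional


def mftraining_argv(unicharset: str, output_unicharset: str, tr_files: Iterable[str],
                    font_properties: Optional[str] = None, xheights: Optional[str] = None,
                    output_dir: Optional[str] = None) -> list[str]:
    """Build an mftraining argument list: options first, then the .tr files."""
    files = list(tr_files)
    if not files:
        raise ValueError("mftraining needs at least one .tr file")
    argv = ["mftraining", "-U", unicharset, "-O", output_unicharset]
    if font_properties:
        argv += ["-F", font_properties]
    if xheights:
        argv += ["-X", xheights]
    if output_dir:
        argv += ["-D", output_dir]
    return argv + files


def as_shell_command(argv: list[str]) -> str:
    """Quote the argument list for display in a shell."""
    return shlex.join(argv)
```

       For example, as_shell_command(mftraining_argv("unicharset", "lang.unicharset", ["a.tr"],
       font_properties="font_properties")) gives "mftraining -U unicharset -O lang.unicharset -F
       font_properties a.tr"; passing the list to subprocess.run(argv, check=True) would run it.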
