text2image - generate OCR training pages.

Author

       The Tesseract OCR engine was written by Ray Smith and his research groups at Hewlett Packard (1985-1995)
       and Google (2006-2018).

                                                   01/19/2025                                      TEXT2IMAGE(1)

Copying

       Copyright (C) 2012 Google, Inc. Licensed under the Apache License, Version 2.0.

Description

       text2image(1) generates OCR training pages. Given a text file, it renders the text to an image using
       the specified font and, optionally, image degradation.

History

text2image(1) was first made available for tesseract 3.03.

Name

       text2image - generate OCR training pages.

Options

       --text FILE
           File name of text input to use for creating synthetic training data. (type:string default:)

       --outputbase FILE
           Basename for output image/box file (type:string default:)

       --fontconfig_tmpdir PATH
           Overrides fontconfig default temporary dir (type:string default:/tmp)

       --fonts_dir PATH
           If empty, the system default font location is used. Otherwise, this overrides the system default
           font location (type:string default:)

       --font FONTNAME
           Font description name to use (type:string default:Arial)

       --writing_mode MODE
           Specify one of the following writing modes. (type:string default:horizontal)

           horizontal : Render regular horizontal text. (default)

           vertical : Render vertical text. Glyph orientation is selected by Pango.

           vertical-upright : Render vertical text. Glyph orientation is set to be upright.

       --tlog_level INT
           Minimum logging level for tlog() output (type:int default:0)

       --max_pages INT
           Maximum number of pages to output (0=unlimited) (type:int default:0)

       --degrade_image BOOL
           Degrade rendered image with speckle noise, dilation/erosion and rotation (type:bool default:true)

       --rotate_image BOOL
           Rotate the image in a random way. (type:bool default:true)

       --strip_unrenderable_words BOOL
           Remove unrenderable words from source text (type:bool default:true)

       --ligatures BOOL
           Rebuild and render ligatures (type:bool default:false)

       --exposure INT
           Exposure level in photocopier (type:int default:0)

       --resolution INT
           Pixels per inch (type:int default:300)

       --xsize INT
           Width of output image (type:int default:3600)

       --ysize INT
           Height of output image (type:int default:4800)

       --margin INT
           Margin around edges of image (type:int default:100)

       --ptsize INT
           Size of printed text (type:int default:12)

       --leading INT
           Inter-line space (in pixels) (type:int default:12)

       --box_padding INT
           Padding around produced bounding boxes (type:int default:0)

       --char_spacing DOUBLE
           Inter-character space in ems (type:double default:0)

       --underline_start_prob DOUBLE
           Fraction of words to underline (value in [0,1]) (type:double default:0)

       --underline_continuation_prob DOUBLE
           Probability that an underline, once started, continues onto the next word (value in [0,1])
           (type:double default:0)

       --render_ngrams BOOL
           Put each space-separated entity from the input file into one bounding box. The ngrams in the input
           file will be randomly permuted before rendering (so that there is sufficient variety of characters on
           each line). (type:bool default:false)

       --output_word_boxes BOOL
           Output word bounding boxes instead of character boxes. This is used for Cube training, and implied by
           --render_ngrams. (type:bool default:false)

       --unicharset_file FILE
           File with characters in the unicharset. If --render_ngrams is true and --unicharset_file is
           specified, ngrams with characters that are not in unicharset will be omitted (type:string default:)

       --bidirectional_rotation BOOL
           Rotate the generated characters both ways. (type:bool default:false)

       --only_extract_font_properties BOOL
           Assumes that the input file contains a list of ngrams. Renders each ngram, extracts spacing
           properties and records them in output_base/[font_name].fontinfo file. (type:bool default:false)
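
The geometry options are related by simple arithmetic. The sketch below works out the page size implied by the default values. It assumes that --xsize, --ysize and --margin are measured in pixels and that the margin applies on each side (the option text states neither), and it uses the standard conversion of 72 points per inch.

```shell
# Default values taken from the option list above.
xsize=3600
ysize=4800
resolution=300   # pixels per inch
margin=100       # assumed to be in pixels, applied on each side
ptsize=12        # points

# Page size in inches (integer arithmetic; the defaults divide evenly).
width_in=$(( xsize / resolution ))
height_in=$(( ysize / resolution ))

# Horizontal space left for text after subtracting both margins.
usable_width=$(( xsize - 2 * margin ))

# Nominal font size in pixels: points * pixels-per-inch / 72.
ptsize_px=$(( ptsize * resolution / 72 ))

echo "page: ${width_in}in x ${height_in}in"
echo "usable width: ${usable_width}px"
echo "font size: ${ptsize_px}px"
```

With the defaults this gives a 12in x 16in page, 3400 pixels of usable width and a nominal font size of 50 pixels.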

Resources

       Main web site: https://github.com/tesseract-ocr

       Information on training tesseract LSTM:
       https://tesseract-ocr.github.io/tessdoc/TrainingTesseract-4.00.html

Single Options

       --list_available_fonts BOOL
           List available fonts and quit. (type:bool default:false)

Synopsis

text2image --text FILE --outputbase PATH --fonts_dir PATH [OPTION]
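
As an illustration of the synopsis, the following sketch assembles a basic invocation and runs it only if the tool and the input file are present. The file names `eng.training_text` and `eng.Arial.exp0` and the font directory are hypothetical placeholders, not values required by text2image.

```shell
# Hypothetical inputs; replace them with real paths.
text_file="eng.training_text"
out_base="eng.Arial.exp0"
fonts_dir="/usr/share/fonts"

# Collect the arguments once so the command can be inspected before it runs.
set -- --text "$text_file" --outputbase "$out_base" --fonts_dir "$fonts_dir" --font "Arial"
cmd_line="text2image $*"
echo "$cmd_line"

# Run only when text2image is installed and the input text exists.
if command -v text2image >/dev/null 2>&1 && [ -f "$text_file" ]; then
    text2image "$@" || echo "text2image exited with an error"
fi
```

According to the --outputbase description, the image and box files are written using the given basename.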

Use These Flags To Find Fonts That Can Render A Given Text

       --find_fonts BOOL
           Search for all fonts that can render the text (type:bool default:false)

       --render_per_font BOOL
           If find_fonts==true, render each font to its own image. Image filenames are of the form
           output_name.font_name.tif (type:bool default:true)

       --min_coverage DOUBLE
           If find_fonts==true, the minimum coverage the font has of the characters in the text file to include
           it, between 0 and 1. (type:double default:1)

       Example Usage:

       ```
       text2image --find_fonts \
       --fonts_dir /usr/share/fonts \
       --text ../langdata/hin/hin.training_text \
       --min_coverage .9 \
       --render_per_font \
       --outputbase ../langdata/hin/hin \
       |& grep raw | sed -e 's/:.*/"\\/g' | sed -e 's/^/"/' >../langdata/hin/fontslist.txt
       ```

Use These Flags To Output Zero-Padded, Square Individual Character Images

       --output_individual_glyph_images BOOL
           If true, also outputs individual character images (type:bool default:false)

       --glyph_resized_size INT
           Each glyph is square with this side length in pixels (type:int default:0)

       --glyph_num_border_pixels_to_pad INT
           Final size = glyph_resized_size + 2 * glyph_num_border_pixels_to_pad (type:int default:0)
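
A short worked example of the final-size formula may help. The values 32 and 2 below are hypothetical choices, not defaults.

```shell
# Hypothetical settings for the two glyph options.
glyph_resized_size=32
glyph_num_border_pixels_to_pad=2

# The border is added on both sides of the square glyph.
final_size=$(( glyph_resized_size + 2 * glyph_num_border_pixels_to_pad ))
echo "final glyph image: ${final_size}x${final_size} pixels"
```

With these values each glyph image is 36 pixels on a side.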