
gendict - Compiles word list into ICU string trie dictionary

Authors

       Maxime Serrano

Caveats

       The input-file is assumed to be encoded in UTF-8.  The integers in the input-file that are used as values
       must be made up of ASCII digits. They may be specified either in  hex,  by  using  a  0x  prefix,  or  in
       decimal.  Either --bytes or --uchars must be specified.
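
The value rules above can be illustrated with a minimal Python sketch. It is not gendict's own parser: the function name parse_value is hypothetical, and accepting only a lowercase 0x prefix, accepting hex digits in either case, and rejecting signs are all assumptions made for the illustration.

```python
import re

# Hypothetical helper, not part of ICU: accepts a decimal number or a
# 0x-prefixed hex number made up of ASCII digits only.
_VALUE_RE = re.compile(r"0x[0-9A-Fa-f]+|[0-9]+")


def parse_value(text):
    """Return the integer written in text, or raise ValueError."""
    if _VALUE_RE.fullmatch(text) is None:
        raise ValueError("not an ASCII decimal or 0x-prefixed hex value: %r" % text)
    if text.startswith("0x"):
        return int(text[2:], 16)
    return int(text, 10)


if __name__ == "__main__":
    print(parse_value("0x1F"))  # 31
    print(parse_value("42"))    # 42
```

The explicit character classes are used because Python's int() on its own would also accept non-ASCII digits, underscores, and signs.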

Description

gendict reads the word list from input-file and creates a string trie dictionary file. Normally this
       data file has the .dict extension.

       Words begin at the beginning of a line and are terminated by the first whitespace.  Lines that begin with
       whitespace are ignored.
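
As a rough illustration of the line rules above, the following Python sketch extracts the words from a list of lines. The function name extract_words is hypothetical, and using Python's definition of whitespace is an assumption; the page does not say which characters gendict treats as whitespace.

```python
def extract_words(lines):
    """Collect the word at the start of each line.

    A line that is empty or begins with whitespace is ignored; otherwise the
    word runs from the start of the line to the first whitespace character.
    """
    words = []
    for line in lines:
        if not line or line[0].isspace():
            continue
        words.append(line.split(None, 1)[0])
    return words


if __name__ == "__main__":
    sample = ["apple 3\n", "  ignored\n", "pear\n", "\n"]
    print(extract_words(sample))  # ['apple', 'pear']
```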

Environment

ICU_DATA  Specifies the directory containing ICU data. Defaults to ${prefix}/share/icu/76.1/.  Some tools
                 in  ICU depend on the presence of the trailing slash. It is thus important to make sure that it
                 is present if ICU_DATA is set.
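
A wrapper script can guard against a missing trailing slash before launching any ICU tool. The sketch below is one way to do that in Python; the function name icu_data_env and the example path are hypothetical.

```python
import os


def icu_data_env(directory, base_env=None):
    """Return a copy of the environment with ICU_DATA ending in a slash."""
    env = dict(os.environ if base_env is None else base_env)
    env["ICU_DATA"] = directory if directory.endswith("/") else directory + "/"
    return env


if __name__ == "__main__":
    # "/opt/icu/data" is a placeholder path used only for illustration.
    print(icu_data_env("/opt/icu/data", {})["ICU_DATA"])  # /opt/icu/data/
```

The returned dictionary can be passed as the env argument of subprocess.run.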

Name

gendict - Compiles word list into ICU string trie dictionary

Options

-h, -?, --help
              Print help about usage and exit.

       -V, --version
              Print the version of gendict and exit.

       -c, --copyright
              Embeds the standard ICU copyright into the output-file.

       -v, --verbose
              Display extra informative messages during execution.

       -i, --icudatadir directory
              Look for any necessary ICU data files in directory.  For example,  the  file  pnames.icu  must  be
              located  when  ICU's  data  is  not  built as a shared library.  The default ICU data directory is
              specified by the environment variable ICU_DATA.  Most configurations of ICU do  not  require  this
              argument.

       --uchars
              Set the output trie type to UChar. Mutually exclusive with --bytes.

       --bytes
              Set the output trie type to Bytes. Mutually exclusive with --uchars.

       --transform transform
              Set  the  transform  type.  Should only be specified with --bytes.  Currently supported transforms
              are: offset-<hex-number>, which specifies an offset to subtract from  all  input  characters.   It
              should be noted that the offset transform also maps U+200D to 0xFF and U+200C to 0xFE, in order to
              offer compatibility to languages that require these characters.  A transform must be specified for
              a  bytes  trie, and when applied to the non-value characters in the input-file must produce output
              between 0x00 and 0xFF.

        input-file
              The source file to read.

        output-file
              The file to write the output dictionary to.
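
The offset transform described under --transform can be sketched in Python as follows. This is an illustration built only from the description on this page, not ICU's implementation: the function name offset_transform is hypothetical, and the page does not say what happens when a subtracted value lands on 0xFE or 0xFF, so the sketch does not guard against that collision.

```python
def offset_transform(ch, offset):
    """Map one input character to a byte value under an offset transform."""
    code_point = ord(ch)
    if code_point == 0x200D:  # ZERO WIDTH JOINER
        return 0xFF
    if code_point == 0x200C:  # ZERO WIDTH NON-JOINER
        return 0xFE
    result = code_point - offset
    if not 0x00 <= result <= 0xFF:
        raise ValueError("U+%04X does not fit in a byte with offset 0x%X"
                         % (code_point, offset))
    return result


if __name__ == "__main__":
    # U+0E01 with an offset of 0x0E00; the offset is only an example value.
    print(offset_transform("\u0e01", 0x0E00))  # 1
    print(offset_transform("\u200d", 0x0E00))  # 255
```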

See Also

Synopsis

gendict [ --uchars | --bytes --transform transform ] [ -h, -?, --help ] [ -V, --version ] [ -c,
       --copyright ] [ -v, --verbose ] [ -i, --icudatadir directory ] input-file output-file
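
When gendict is driven from a build script, the constraints in the synopsis can be checked before the tool is launched. The Python sketch below builds an argument list using only the options listed on this page. The function name gendict_argv and the file names are hypothetical, and the sketch only constructs the list; it does not run gendict.

```python
def gendict_argv(input_file, output_file, trie, transform=None,
                 icudatadir=None, verbose=False):
    """Build a gendict command line as a list of arguments."""
    if trie not in ("uchars", "bytes"):
        raise ValueError("trie must be 'uchars' or 'bytes'")
    if trie == "bytes" and transform is None:
        raise ValueError("a bytes trie requires a transform")
    if trie == "uchars" and transform is not None:
        raise ValueError("--transform should only be used with --bytes")
    argv = ["gendict", "--" + trie]
    if transform is not None:
        argv += ["--transform", transform]
    if verbose:
        argv.append("--verbose")
    if icudatadir is not None:
        argv += ["--icudatadir", icudatadir]
    argv += [input_file, output_file]
    return argv


if __name__ == "__main__":
    # words.txt and words.dict are placeholder file names.
    print(gendict_argv("words.txt", "words.dict", "uchars"))
```

The resulting list can be handed to subprocess.run on a system where gendict is installed.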

Version

       1.0
