
eoconv - Convert text files between various Esperanto encodings

Author

       Tristan Miller <psychonaut@nothingisreal.com>

Bugs And Limitations

       Because the postfix-h and postfix-H notations are inherently ambiguous, conversion from postfix-h or
       postfix-H text is unlikely to produce correct text.  Use at your own risk, and proofread the results
       carefully.

       Report bugs to <psychonaut@nothingisreal.com>.

Description

       eoconv reads the given input files (or standard input if no files are specified) containing Esperanto
       text in the encoding specified by --from, and outputs the text in the encoding specified by --to.

Esperanto Orthography

       Esperanto is written in an alphabet of 28 letters, but only 22 of them are found in the standard ASCII
       character set.  The remaining six (`c', `g', `h', `j', and `s' with circumflex, and `u' with breve) are
       not available in ASCII, nor are they among the characters of the common 8-bit ISO-8859-1 character
       encoding.  Therefore, while these six letters pose no problem for handwritten texts, they were
       impossible to represent on standard typewriters and remain somewhat problematic even on modern-day
       computers.  Various encoding systems have been developed to represent them in printed and typed text.
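       To make this concrete, the following Python sketch (an illustration only, not part of eoconv) lists
       the six accented letters in both cases with their Unicode code points, and checks that none of them
       can be encoded in ASCII or ISO-8859-1.  The helper name fits() is hypothetical.

```python
import unicodedata

# The six accented letters of the Esperanto alphabet, in lowercase and uppercase.
ACCENTED = "ĉĝĥĵŝŭĈĜĤĴŜŬ"


def fits(text: str, codec: str) -> bool:
    """Return True if every character of text can be encoded with the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


for ch in ACCENTED:
    # ord() gives the Unicode code point; unicodedata.name() gives the character's name.
    print(f"U+{ord(ch):04X}  {ch}  {unicodedata.name(ch)}")

print(fits(ACCENTED, "ascii"), fits(ACCENTED, "latin-1"))  # False False
```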

   POSTFIX-h NOTATION
       This was the solution proposed by the creator of Esperanto, L. L. Zamenhof.  He recommended using `u'
       for `u' with breve and appending an `h' to a letter to indicate that it should have a circumflex.
       However, the letters `u' and `h' are already part of the Esperanto alphabet, so using them for another
       purpose invites ambiguity and mispronunciation.  It also makes conversion of Esperanto text to
       postfix-h notation `lossy', or one-way: it is generally not possible to convert text back from
       postfix-h notation by automated means.  This notation has the additional drawback that the text
       cannot be sorted with standard rules for ASCII text.
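       The following Python sketch illustrates both problems, assuming the mapping described above (`h'
       after c, g, h, j, and s; plain `u' for u-breve).  The function names are hypothetical and this is not
       eoconv's implementation.  Encoding is trivial, but a naive decoder that turns every such pair back
       into an accented letter corrupts ordinary words such as flughaveno (`airport', flug- + haveno), and
       the breve dropped from `u' cannot be restored at all.

```python
# Postfix-h: a trailing "h" marks a circumflex, and u-breve is written as a
# plain "u" (so the breve is simply lost).
TO_POSTFIX_H = {
    "ĉ": "ch", "ĝ": "gh", "ĥ": "hh", "ĵ": "jh", "ŝ": "sh", "ŭ": "u",
    "Ĉ": "Ch", "Ĝ": "Gh", "Ĥ": "Hh", "Ĵ": "Jh", "Ŝ": "Sh", "Ŭ": "U",
}
# Reverse table for the letter+h pairs only; "u" cannot be mapped back.
FROM_PAIR = {v: k for k, v in TO_POSTFIX_H.items() if v.endswith("h")}


def to_postfix_h(text: str) -> str:
    return "".join(TO_POSTFIX_H.get(ch, ch) for ch in text)


def naive_from_postfix_h(text: str) -> str:
    """Treat every c/g/h/j/s + h pair as an accented letter; this guesses wrong on real words."""
    out, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in FROM_PAIR:
            out.append(FROM_PAIR[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(to_postfix_h("aŭto"))                # auto  (the breve cannot be recovered)
print(naive_from_postfix_h("flughaveno"))  # fluĝaveno  (wrong: flug + haveno)
```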

   POSTFIX-H NOTATION
       This is the same as postfix-h notation, except that `H' is used instead of `h' following a capital
       letter.
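       A minimal Python sketch of this case rule, assuming that the marker takes the case of the base
       letter it follows and that u-breve is still written as plain `u' or `U'.  The helper to_postfix_H()
       is hypothetical, not part of eoconv.

```python
# Circumflexed letters and their ASCII base letters.
BASES = {"ĉ": "c", "ĝ": "g", "ĥ": "h", "ĵ": "j", "ŝ": "s",
         "Ĉ": "C", "Ĝ": "G", "Ĥ": "H", "Ĵ": "J", "Ŝ": "S"}


def to_postfix_H(text: str) -> str:
    out = []
    for ch in text:
        if ch in BASES:
            base = BASES[ch]
            # The marker's case follows the base letter: "CH" for Ĉ, "ch" for ĉ.
            out.append(base + ("H" if base.isupper() else "h"))
        elif ch in ("ŭ", "Ŭ"):
            # As in postfix-h, u-breve is written without any marker.
            out.append("u" if ch == "ŭ" else "U")
        else:
            out.append(ch)
    return "".join(out)


print(to_postfix_H("Ĉu Ŝi?"))  # CHu SHi?
```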

   POSTFIX-x NOTATION
       This is the most common ASCII notation encountered today.  It involves appending an `x' to a letter to
       indicate that it should have an accent (be it circumflex or breve).  Since `x' is not a letter of the
       Esperanto alphabet, no ambiguity results.  However, ASCII sorting algorithms still fail with postfix-x
       text.
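       Because `x' never occurs in native Esperanto words, postfix-x can be decoded mechanically.  The
       Python sketch below (hypothetical function names; not eoconv's implementation) encodes text and
       decodes it back.  The decoder also accepts `X' as the marker, and it assumes the input contains no
       foreign words in which c, g, h, j, s, or u is legitimately followed by `x'.

```python
import re

# Base letter -> accented letter, and the reverse.
ACCENT = {"c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ",
          "C": "Ĉ", "G": "Ĝ", "H": "Ĥ", "J": "Ĵ", "S": "Ŝ", "U": "Ŭ"}
UNACCENT = {v: k for k, v in ACCENT.items()}


def to_postfix_x(text: str) -> str:
    # Each accented letter becomes its base letter followed by "x".
    return "".join(UNACCENT[ch] + "x" if ch in UNACCENT else ch for ch in text)


def from_postfix_x(text: str) -> str:
    # A base letter followed by "x" (or "X") becomes the accented letter.
    return re.sub(r"([cghjsuCGHJSU])[xX]", lambda m: ACCENT[m.group(1)], text)


encoded = to_postfix_x("Ĉu vi ŝatas ĝin? Aŭ ne.")
print(encoded)                  # Cxu vi sxatas gxin? Aux ne.
print(from_postfix_x(encoded))  # Ĉu vi ŝatas ĝin? Aŭ ne.
```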

   POSTFIX-X NOTATION
       This is the same as postfix-x notation, except that `X' is used instead of `x' following a capital
       letter.
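       A short Python sketch of the postfix-X rule, assuming that `X' is chosen whenever the base letter
       itself is uppercase.  The helper to_postfix_X() is hypothetical, not part of eoconv.

```python
ACCENT_BASE = {"ĉ": "c", "ĝ": "g", "ĥ": "h", "ĵ": "j", "ŝ": "s", "ŭ": "u",
               "Ĉ": "C", "Ĝ": "G", "Ĥ": "H", "Ĵ": "J", "Ŝ": "S", "Ŭ": "U"}


def to_postfix_X(text: str) -> str:
    out = []
    for ch in text:
        base = ACCENT_BASE.get(ch)
        if base is None:
            out.append(ch)  # not an accented Esperanto letter
        else:
            # Uppercase base letters take "X"; lowercase ones take "x".
            out.append(base + ("X" if base.isupper() else "x"))
    return "".join(out)


print(to_postfix_X("Ĉu aŭ ŜI"))  # CXu aux SXI
```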

   PREFIX- AND POSTFIX-CARET NOTATION
       Two slightly less popular ASCII notations prepend or append a caret (`^') to a letter to indicate that
       it should have an accent.
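       A minimal Python sketch of both caret notations.  It assumes the caret marks u-breve as well as the
       circumflexed letters, since the text above says only that it marks an accent; to_caret() is a
       hypothetical helper, not eoconv's code.

```python
ACCENT_BASE = {"ĉ": "c", "ĝ": "g", "ĥ": "h", "ĵ": "j", "ŝ": "s", "ŭ": "u",
               "Ĉ": "C", "Ĝ": "G", "Ĥ": "H", "Ĵ": "J", "Ŝ": "S", "Ŭ": "U"}


def to_caret(text: str, prefix: bool) -> str:
    """Replace each accented letter by its base letter with a caret before or after it."""
    out = []
    for ch in text:
        base = ACCENT_BASE.get(ch)
        if base is None:
            out.append(ch)          # not an accented Esperanto letter
        elif prefix:
            out.append("^" + base)  # pre-caret: ^c
        else:
            out.append(base + "^")  # post-caret: c^
    return "".join(out)


print(to_caret("ĉu aŭ", prefix=True))   # ^cu a^u
print(to_caret("ĉu aŭ", prefix=False))  # c^u au^
```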

   ISO-8859-3 (LATIN-3)
       ISO 8859-3, also known as Latin-3 or South European, is an 8-bit character encoding that covers
       Esperanto.  Characters with the high bit set are used to encode the accented Esperanto letters.
       ISO-8859-3 can also be used for encoding English, Finnish, German, Italian, Latin, Maltese, Turkish,
       and Portuguese, making it useful for texts that mix Esperanto with one or more of these languages.
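       Python ships a codec for this encoding, so a round trip can be sketched with the standard library
       alone.  The sample sentence is arbitrary; the point is that every character, accented or not, takes
       exactly one byte, and the accented letters use bytes with the high bit set.

```python
text = "Ĉu ŝi manĝas?"

# "iso-8859-3" (alias "latin3") is one of Python's standard codecs.
latin3 = text.encode("iso-8859-3")

# One byte per character, accented or not.
assert len(latin3) == len(text)

# The three accented letters (Ĉ, ŝ, ĝ) are the only bytes with the high bit set.
high = [b for b in latin3 if b >= 0x80]
print(len(high))  # 3

# Decoding with the same codec restores the original text.
assert latin3.decode("iso-8859-3") == text
```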

   UNICODE (ISO/IEC 10646)
       Unicode is a standard that assigns a unique number, or code point, to every character of every human
       language.  The methods for mapping code points to bytes are known as Unicode Transformation Formats
       (UTF).  Among them are UTF-32, UTF-16, UTF-8, and UTF-7, where the number indicates the size in bits
       of one code unit.
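       The difference between these formats can be seen by encoding a single Esperanto letter with Python's
       built-in codecs.  This sketch only illustrates the formats themselves; it says nothing about how
       eoconv writes them (for example, whether it emits a byte-order mark).

```python
ch = "ĉ"  # U+0109 LATIN SMALL LETTER C WITH CIRCUMFLEX

# The "-le" variants are used so that Python does not prepend a byte-order mark.
for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    data = ch.encode(codec)
    print(codec, data.hex(" "), len(data))
# utf-8 c4 89 2
# utf-16-le 09 01 2
# utf-32-le 09 01 00 00 4

# UTF-7 output uses only 7-bit ASCII bytes, escaping non-ASCII text in modified Base64.
utf7 = "Ĉu".encode("utf-7")
print(all(b < 0x80 for b in utf7), utf7.decode("utf-7"))  # True Ĉu
```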

   LaTeX SEQUENCES
       The popular LaTeX typesetting system is capable of representing virtually any accented character.
       Note that conversion from LaTeX sequences assumes that characters to be accented are enclosed in
       braces; for example, `\^{C}' will be recognized as `C' with circumflex, but `\^C' will not be.
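       The braces requirement can be illustrated with a small regular-expression decoder.  This is a
       hypothetical sketch, not eoconv's code: it assumes `\^{...}' for the circumflex and LaTeX's `\u{...}'
       (breve) for u-breve, and it deliberately ignores unbraced forms such as `\^C'.  It also does not
       handle \^{\j}, the dotless-j form that LaTeX sources may use for j with circumflex.

```python
import re
import unicodedata

# Combining marks: COMBINING CIRCUMFLEX ACCENT for \^{...}, COMBINING BREVE for \u{...}.
COMBINING = {"^": "\u0302", "u": "\u0306"}
# Matches only the braced forms, e.g. \^{C} or \u{u}; unbraced \^C is not matched.
BRACED = re.compile(r"\\([\^u])\{([A-Za-z])\}")


def from_latex(text: str) -> str:
    def repl(m: re.Match) -> str:
        # NFC composes base letter + combining mark into one precomposed character.
        return unicodedata.normalize("NFC", m.group(2) + COMBINING[m.group(1)])
    return BRACED.sub(repl, text)


print(from_latex(r"\^{C}u kaj a\u{u}"))  # Ĉu kaj aŭ
print(from_latex(r"\^Cu"))               # \^Cu  (unbraced: left unchanged)
```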

   HTML ENTITIES
       The Unicode code points of Esperanto characters can be escaped in HTML documents by using HTML
       entities (numeric character references).  The code points can be written in either decimal (base-10)
       or hexadecimal (base-16) notation; the two forms are functionally equivalent.
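       For example, `ĉ' is code point U+0109, which is 265 in decimal, so it can be written as &#265; or
       &#x109;.  The Python sketch below produces both forms with a hypothetical helper, to_html(), and
       checks them with the standard library's html.unescape().

```python
import html

ACCENTED = "ĉĝĥĵŝŭĈĜĤĴŜŬ"


def to_html(text: str, hexadecimal: bool = False) -> str:
    """Replace each accented Esperanto letter with a numeric character reference."""
    def ref(ch: str) -> str:
        if ch not in ACCENTED:
            return ch
        return f"&#x{ord(ch):X};" if hexadecimal else f"&#{ord(ch)};"
    return "".join(ref(ch) for ch in text)


print(to_html("ĉu"))                    # &#265;u
print(to_html("ĉu", hexadecimal=True))  # &#x109;u
print(html.unescape(to_html("ĉu")))     # ĉu
```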

Name

       eoconv - Convert text files between various Esperanto encodings

Options

       --from=encoding  Specify character encoding for input

       --to=encoding    Specify character encoding for output

       -q, --quiet      Suppress non-essential warning messages

       -?, --help       Print a brief help message and exit.

       --man            Print the manual page and exit.

       --version        Print version information and exit.

   CHARACTER ENCODINGS
       post-h           ASCII postfix h notation

       post-H           ASCII postfix H notation

       post-x           ASCII postfix x notation

       post-X           ASCII postfix X notation

       post-caret       ASCII postfix caret (^) notation

       pre-caret        ASCII prefix caret (^) notation

       latex, LaTeX     ASCII LaTeX sequences

       html-hex, HTML-hex
                        ASCII HTML hexadecimal entities

       html-dec, HTML-dec
                        ASCII HTML decimal entities

       iso-8859-3, ISO-8859-3, latin3, latin-3, Latin3, Latin-3
                        ISO-8859-3

       utf-7, UTF-7, utf7, UTF7
                        Unicode UTF-7

       utf-8, UTF-8, utf8, UTF8
                        Unicode UTF-8

       utf-16, UTF-16, utf16, UTF16
                        Unicode UTF-16

       utf-32, UTF-32, utf32, UTF32
                        Unicode UTF-32
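       Several of the encodings above accept more than one spelling.  A lookup table like the following
       hypothetical Python sketch could normalize them.  Note that it must stay case-sensitive, because
       post-h and post-H (and post-x and post-X) name different encodings.  Whether eoconv accepts spellings
       beyond those listed is not stated here, so the sketch rejects anything else.

```python
# Hypothetical alias table built from the CHARACTER ENCODINGS list.
_SPELLINGS = {
    "post-h": ["post-h"], "post-H": ["post-H"],
    "post-x": ["post-x"], "post-X": ["post-X"],
    "post-caret": ["post-caret"], "pre-caret": ["pre-caret"],
    "latex": ["latex", "LaTeX"],
    "html-hex": ["html-hex", "HTML-hex"],
    "html-dec": ["html-dec", "HTML-dec"],
    "iso-8859-3": ["iso-8859-3", "ISO-8859-3", "latin3", "latin-3", "Latin3", "Latin-3"],
    "utf-7": ["utf-7", "UTF-7", "utf7", "UTF7"],
    "utf-8": ["utf-8", "UTF-8", "utf8", "UTF8"],
    "utf-16": ["utf-16", "UTF-16", "utf16", "UTF16"],
    "utf-32": ["utf-32", "UTF-32", "utf32", "UTF32"],
}
# Flatten into alias -> canonical name; no case folding, since post-h != post-H.
CANONICAL = {alias: name for name, aliases in _SPELLINGS.items() for alias in aliases}


def canonical_encoding(name: str) -> str:
    """Map an accepted spelling to its canonical name; raise ValueError otherwise."""
    try:
        return CANONICAL[name]
    except KeyError:
        raise ValueError(f"unknown encoding: {name!r}") from None


print(canonical_encoding("Latin-3"))  # iso-8859-3
```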

See Also

charsets(7), ascii(7), iso_8859-3(7), unicode(7), utf-8(7), latex(1)

Usage

       eoconv [-q] --from=encoding --to=encoding [file ...]

        Options:
          --from       specify input encoding (see below)
          --to         specify output encoding (see below)
          -q, --quiet  suppress warnings

          --help       detailed help message
          --man        full documentation
          --version    display version information

        Valid encodings:
          post-h post-H post-x post-X post-caret pre-caret latex
          html-hex html-dec iso-8859-3 utf-7 utf-8 utf-16 utf-32
