
Lingua::Translit - transliterates text between writing systems

Adding New Transliterations

       In case you want to add your own transliteration tables to Lingua::Translit, have a look at the developer
       documentation at <https://www.netzum-sorglos.de/software/lingua-translit/developer-documentation.html>.

       A template of a transliteration table is provided as well (xml/template.xml) so you can easily start
       developing.

Authors

Bugs

       None known.

       Please report bugs using CPAN's request tracker at
       <https://rt.cpan.org/Public/Dist/Display.html?Name=Lingua-Translit>.

Credits

       Thanks to Dr. Daniel Eiwen, Romanisches Seminar, Universitaet Koeln for his help on Romanian
       transliteration.

       Thanks to Dmitry Smal and Rusar Publishing for contributing the "ALA-LC RUS" transliteration table.

       Thanks to Ahmed Elsheshtawy for his help implementing the "Common ARA" Arabic transliteration.

       Thanks to Dusan Vuckovic for contributing the "ISO/R 9" transliteration table.

       Thanks to Ștefan Suciu for contributing the "ISO 8859-16 RON" transliteration table.

       Thanks to Philip Kime for contributing the "IAST Devanagari" and "Devanagari IAST" transliteration
       tables.

       Thanks to Nikola Lečić for contributing the "BGN/PCGN RUS Standard" and "BGN/PCGN RUS Strict"
       transliteration tables.

Description

       Lingua::Translit can be used to convert text from one writing system to another, based on national or
       international transliteration tables.  Where possible a reverse transliteration is supported.

       The term "transliteration" describes the conversion of text from one writing system or alphabet to
       another.  Ideally the conversion is unique, mapping one character to exactly one character, so that the
       original spelling can be reconstructed.  In practice this is not always the case, and a single letter of
       the original alphabet may be transcribed as two, three or even more letters.

       Furthermore, there is often more than one transliteration scheme for a given writing system.  To
       exchange data reliably and to be able to reconstruct the original text, it is therefore essential to
       know which scheme will be or has been used to transliterate it.

       Reconstruction is a problem for non-unique transliterations, however, if no language-specific knowledge
       is available, because the resulting clusters of letters may be ambiguous.  For example, the Greek
       character "PSI" maps to "ps", but "ps" could also result from the sequence "PI", "SIGMA", since "PI"
       maps to "p" and "SIGMA" maps to "s".  If a transliteration table leads to ambiguous conversions, the
       table cannot be used in reverse.
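
       The ambiguity can be sketched with a toy table.  The three-entry hash and the toy_translit() helper
       below are hypothetical and exist only for illustration; they are not the module's ISO 843 data or API.

```perl
use strict;
use warnings;

# Hypothetical three-entry table for illustration only; this is NOT the
# module's own data, just the PSI/PI/SIGMA example from the text.
my %to_latin = (
    "\x{03C8}" => 'ps',   # GREEK SMALL LETTER PSI
    "\x{03C0}" => 'p',    # GREEK SMALL LETTER PI
    "\x{03C3}" => 's',    # GREEK SMALL LETTER SIGMA
);

# Map each character through the table, leaving unknown characters as-is.
sub toy_translit {
    my ($text) = @_;
    return join '', map { $to_latin{$_} // $_ } split //, $text;
}

my $from_psi      = toy_translit("\x{03C8}");            # psi
my $from_pi_sigma = toy_translit("\x{03C0}\x{03C3}");    # pi + sigma

# Both inputs collapse to the same Latin string, so the mapping cannot
# be inverted without additional knowledge about the language.
print "ambiguous\n" if $from_psi eq $from_pi_sigma;
```

       Given only "ps", a reverse mapping has no way to tell which of the two inputs produced it.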

       Otherwise the table can be used in both directions, if desired.  For example, ISO 9 was originally
       created to convert Cyrillic letters to the Latin alphabet, so the reverse transliteration transforms
       Latin letters to Cyrillic.

Methods

   new("name of table")
       Initializes an object with the specific transliteration table, e.g. "ISO 9".

   translit("character oriented string")
       Transliterates the given text according to the object's transliteration table.  Returns the
       transliterated text.

   translit_reverse("character oriented string")
       Transliterates the given text according to the object's transliteration table, but applies it in the
       opposite direction.  For example, ISO 9 is a transliteration scheme for converting Cyrillic letters to
       the Latin alphabet, so when used in reverse, Latin letters are mapped to Cyrillic ones.

       Returns the transliterated text.

   can_reverse()
       Returns true (1) if reverse transliteration is possible, false (0) otherwise.

   name()
       Returns the name of the chosen transliteration table, e.g. "ISO 9".

   desc()
       Returns a description for the transliteration, e.g. "ISO 9:1995, Cyrillic to Latin".
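
       The methods above might be combined as in the following sketch.  It assumes Lingua::Translit is
       installed and that the source file is saved as UTF-8; the sample word and the variable names are
       illustrative only.

```perl
use strict;
use warnings;
use utf8;                                # string literals below are UTF-8
use Lingua::Translit;

binmode(STDOUT, ':encoding(UTF-8)');     # avoid "wide character" warnings

# Create an object bound to one transliteration table.
my $tr = Lingua::Translit->new("ISO 9");

# Inspect the chosen table.
print $tr->name(), "\n";
print $tr->desc(), "\n";

# Forward direction: Cyrillic to Latin.
my $cyrillic = "Москва";
my $latin    = $tr->translit($cyrillic);
print "$cyrillic -> $latin\n";

# Reverse direction, only attempted if the table supports it.
if ($tr->can_reverse()) {
    my $back = $tr->translit_reverse($latin);
    print "$latin -> $back\n";
}
```

       Checking can_reverse() first keeps the same code usable with tables that are documented as not
       reversible.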

Name

       Lingua::Translit - transliterates text between writing systems

Restrictions

       Lingua::Translit is designed to handle Unicode and uses comparisons and regular expressions that rely
       on code points.  Therefore, any input is expected to be character oriented ("use utf8;", ...) instead
       of byte oriented.

       However, if your data is byte oriented, be sure to pass it UTF-8 encoded to translit() and/or
       translit_reverse() - it will be converted internally.
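
       The difference between byte oriented and character oriented data can be seen with the core Encode
       module alone.  The sketch below decodes raw UTF-8 bytes explicitly, which is one possible way to
       obtain a character string before calling translit(); the sample bytes and variable names are
       illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes of a four-letter Greek word, as they might arrive from
# a file or socket that was opened without an encoding layer.
my $bytes = "\xCF\x88\xCF\x85\xCF\x87\xCE\xAE";

# Decode the bytes into a character string (one element per code point).
my $chars = decode('UTF-8', $bytes);

# Prints "bytes: 8, characters: 4"
printf "bytes: %d, characters: %d\n", length($bytes), length($chars);

# $chars is now character oriented and could be handed to translit().
```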

See Also

       Lingua::Translit::Tables, Encode, perlunicode

       "translit"'s manpage

       <http://www.netzum-sorglos.de/software/lingua-translit/>

Supported Transliterations

       Cyrillic
           ALA-LC RUS, not reversible, ALA-LC:1997, Cyrillic to Latin, Russian

           ISO 9, reversible, ISO 9:1995, Cyrillic to Latin

           ISO/R 9, reversible, ISO 9:1954, Cyrillic to Latin

           DIN 1460 RUS, reversible, DIN 1460:1982, Cyrillic to Latin, Russian

           DIN 1460 UKR, reversible, DIN 1460:1982, Cyrillic to Latin, Ukrainian

           DIN 1460 BUL, reversible, DIN 1460:1982, Cyrillic to Latin, Bulgarian

           Streamlined System BUL, not reversible, The Streamlined System: 2006, Cyrillic to Latin, Bulgarian

           GOST 7.79 RUS, reversible, GOST 7.79:2000 (table B), Cyrillic to Latin, Russian

           GOST 7.79 RUS OLD, not reversible, GOST 7.79:2000 (table B), Cyrillic to Latin with support for Old
           Russian (pre 1918), Russian

           GOST 7.79 UKR, reversible, GOST 7.79:2000 (table B), Cyrillic to Latin, Ukrainian

           BGN/PCGN RUS Standard, not reversible, BGN/PCGN:1947 (Standard Variant), Cyrillic to Latin, Russian

           BGN/PCGN RUS Strict, not reversible, BGN/PCGN:1947 (Strict Variant), Cyrillic to Latin, Russian

       Greek
           ISO 843, not reversible, ISO 843:1997, Greek to Latin

           DIN 31634, not reversible, DIN 31634:1982, Greek to Latin

           Greeklish, not reversible, Greeklish (Phonetic), Greek to Latin

       Latin
           Common CES, not reversible, Czech without diacritics

           Common DEU, not reversible, German without umlauts

           Common POL, not reversible, Unaccented Polish

           Common RON, not reversible, Romanian without diacritics as commonly used

           Common SLK, not reversible, Slovak without diacritics

           Common SLV, not reversible, Slovenian without diacritics

           ISO 8859-16 RON, reversible, Romanian with appropriate diacritics

       Arabic
           Common ARA, not reversible, Common Romanization of Arabic

       Sanskrit
           IAST Devanagari, not reversible, IAST Romanization to Devanāgarī

           Devanagari IAST, not reversible, Devanāgarī to IAST Romanization
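
       Reversibility can also be checked at run time rather than read from the list.  The sketch below
       assumes Lingua::Translit is installed; reversibility() is a hypothetical helper, and the eval guard is
       an assumption that covers an installed version which lacks one of the tables.

```perl
use strict;
use warnings;
use Lingua::Translit;

# Report whether a table is reversible.  The eval guard is an assumption:
# it covers an installed version that lacks the table, whether the
# constructor dies or returns undef in that case.
sub reversibility {
    my ($name) = @_;
    my $tr = eval { Lingua::Translit->new($name) };
    return "not available" unless $tr;
    return $tr->can_reverse() ? "reversible" : "not reversible";
}

# Table names as they are quoted elsewhere in this document.
for my $name ("ISO 9", "ISO 843", "ALA-LC RUS", "ISO 8859-16 RON") {
    printf "%-16s %s\n", $name, reversibility($name);
}
```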

Synopsis

         use Lingua::Translit;

         my $tr = Lingua::Translit->new("ISO 843");

         my $text_tr = $tr->translit("character oriented string");

         if ($tr->can_reverse()) {
           $text_tr = $tr->translit_reverse("character oriented string");
         }
