Lingua::Translit - transliterates text between writing systems
Adding New Transliterations
In case you want to add your own transliteration tables to Lingua::Translit, have a look at the developer
documentation at <https://www.netzum-sorglos.de/software/lingua-translit/developer-documentation.html>.
A template of a transliteration table is provided as well (xml/template.xml) so you can easily start
developing.
Bugs
None known.
Please report bugs using CPAN's request tracker at
<https://rt.cpan.org/Public/Dist/Display.html?Name=Lingua-Translit>.
Credits
Thanks to Dr. Daniel Eiwen, Romanisches Seminar, Universitaet Koeln for his help on Romanian
transliteration.
Thanks to Dmitry Smal and Rusar Publishing for contributing the "ALA-LC RUS" transliteration table.
Thanks to Ahmed Elsheshtawy for his help implementing the "Common ARA" Arabic transliteration.
Thanks to Dusan Vuckovic for contributing the "ISO/R 9" transliteration table.
Thanks to Ștefan Suciu for contributing the "ISO 8859-16 RON" transliteration table.
Thanks to Philip Kime for contributing the "IAST Devanagari" and "Devanagari IAST" transliteration
tables.
Thanks to Nikola Lečić for contributing the "BGN/PCGN RUS Standard" and "BGN/PCGN RUS Strict"
transliteration tables.
Description
Lingua::Translit can be used to convert text from one writing system to another, based on national or
international transliteration tables. Where possible a reverse transliteration is supported.
The term "transliteration" describes the conversion of text from one writing system or alphabet to
another one. The conversion is ideally unique, mapping one character to exactly one character, so the
original spelling can be reconstructed. In practice this is not always the case: a single letter of
the original alphabet can be transcribed as two, three or even more letters.
Furthermore, there is often more than one transliteration scheme for a given writing system. It is
therefore important to record which scheme will be or has been used to transliterate a text, so that
data can be exchanged consistently and the original text can be reconstructed.
Reconstruction is a problem for non-unique transliterations, though, if no language-specific knowledge
is available, as the resulting clusters of letters may be ambiguous. For example, the Greek character
"PSI" maps to "ps", but "ps" could also result from the sequence "PI", "SIGMA", since "PI" maps to "p"
and "SIGMA" maps to "s". If a transliteration table leads to ambiguous conversions, the provided table
cannot be used in reverse.
Otherwise the table can be used in both directions, if desired. For example, ISO 9 was originally
created to convert Cyrillic letters to the Latin alphabet, so its reverse transliteration transforms
Latin letters to Cyrillic.
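The ambiguity described above can be sketched in a few lines of Perl. The three-entry table and the helper toy_translit() are assumptions made up for this illustration. They are not the data or code that Lingua::Translit uses internally.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Toy forward table (hypothetical, for illustration only):
# three Greek letters and the Latin strings they map to.
my %forward = (
    "\x{03C8}" => 'ps',    # GREEK SMALL LETTER PSI
    "\x{03C0}" => 'p',     # GREEK SMALL LETTER PI
    "\x{03C3}" => 's',     # GREEK SMALL LETTER SIGMA
);

# Transliterate character by character using the toy table;
# characters without an entry are passed through unchanged.
sub toy_translit {
    my ($text) = @_;
    return join '', map { $forward{$_} // $_ } split //, $text;
}

my $psi      = "\x{03C8}";            # one letter
my $pi_sigma = "\x{03C0}\x{03C3}";    # two letters

# Both inputs collapse to the same Latin string, so the original
# spelling cannot be recovered from the output alone.
printf "%s -> %s\n", $psi,      toy_translit($psi);
printf "%s -> %s\n", $pi_sigma, toy_translit($pi_sigma);
print "ambiguous\n" if toy_translit($psi) eq toy_translit($pi_sigma);
```

A reverse mapping for this toy table would have to decide whether "ps" stands for one letter or two, which is the kind of decision that needs language-specific knowledge.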
License And Copyright
Copyright (C) 2007-2008 Alex Linke and Rona Linke
Copyright (C) 2009-2016 Lingua-Systems Software GmbH
Copyright (C) 2016-2017 Netzum Sorglos, Lingua-Systems Software GmbH
Copyright (C) 2017-2022 Netzum Sorglos Software GmbH
This module is free software; you can redistribute it and/or modify it under the same terms as Perl
itself.
perl v5.36.0 2022-10-13 Lingua::Translit(3pm)
Methods
new("name of table")
Initializes an object with the specific transliteration table, e.g. "ISO 9".
translit("character oriented string")
Transliterates the given text according to the object's transliteration table. Returns the
transliterated text.
translit_reverse("character oriented string")
Transliterates the given text according to the object's transliteration table, but uses it the other way
round. For example, table ISO 9 is a transliteration scheme for the conversion of Cyrillic letters to
the Latin alphabet, so if it is used in reverse, Latin letters will be mapped to Cyrillic ones.
Returns the transliterated text.
can_reverse()
Returns true (1) if and only if reverse transliteration is possible, false (0) otherwise.
name()
Returns the name of the chosen transliteration table, e.g. "ISO 9".
desc()
Returns a description for the transliteration, e.g. "ISO 9:1995, Cyrillic to Latin".
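The methods above might be combined as in the following sketch. It assumes Lingua::Translit is installed and uses the "ISO 9" table, which is listed as reversible. The sample word and the variable names are chosen for illustration only.

```perl
use strict;
use warnings;
use Lingua::Translit;

binmode STDOUT, ':encoding(UTF-8)';

# "ISO 9" is listed as a reversible Cyrillic-to-Latin table.
my $tr = Lingua::Translit->new("ISO 9");

printf "table: %s (%s)\n", $tr->name(), $tr->desc();

# A character-oriented sample string: the Russian name of Moscow,
# written with code point escapes so the source file stays ASCII.
my $cyrillic = "\x{041C}\x{043E}\x{0441}\x{043A}\x{0432}\x{0430}";

my $latin = $tr->translit($cyrillic);
print "forward: $latin\n";

# Only attempt the reverse direction if the table supports it.
if ($tr->can_reverse()) {
    my $back = $tr->translit_reverse($latin);
    print "reverse: $back\n";
    print "round trip ", ($back eq $cyrillic ? "ok" : "differs"), "\n";
}
```

Checking can_reverse() before calling translit_reverse() keeps the same code usable with tables that are listed as not reversible.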
Name
Lingua::Translit - transliterates text between writing systems
Restrictions
Lingua::Translit is designed to handle Unicode and uses comparisons and regular expressions that rely
on code points. Therefore, any input is expected to be character oriented ("use utf8;", ...) rather than
byte oriented.
However, if your data is byte oriented, be sure to pass it UTF-8 encoded to translit() and/or
translit_reverse(); it will be converted internally.
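The difference between byte-oriented and character-oriented input can be sketched as follows. The example assumes Lingua::Translit is installed and uses decode() from the core Encode module. The sample bytes and variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Lingua::Translit;

binmode STDOUT, ':encoding(UTF-8)';

# Byte-oriented input: the UTF-8 encoding of a four-letter Greek word
# (psi, upsilon, chi, eta with tonos), as it might arrive from a file
# or socket that was opened without a decoding layer.
my $bytes = "\xCF\x88\xCF\x85\xCF\x87\xCE\xAE";

# Decode the bytes into a character-oriented string.
my $chars = decode('UTF-8', $bytes);

# Eight bytes encode four characters.
printf "bytes: %d, characters: %d\n", length($bytes), length($chars);

# Pass the decoded, character-oriented string to translit().
my $tr = Lingua::Translit->new("ISO 843");
print $tr->translit($chars), "\n";
```

Decoding explicitly at the input boundary is one way to keep all strings inside the program character oriented, so code that relies on code points sees whole characters.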
See Also
Lingua::Translit::Tables, Encode, perlunicode
"translit"'s manpage
<http://www.netzum-sorglos.de/software/lingua-translit/>
Supported Transliterations
Cyrillic
ALA-LC RUS, not reversible, ALA-LC:1997, Cyrillic to Latin, Russian
ISO 9, reversible, ISO 9:1995, Cyrillic to Latin
ISO/R 9, reversible, ISO 9:1954, Cyrillic to Latin
DIN 1460 RUS, reversible, DIN 1460:1982, Cyrillic to Latin, Russian
DIN 1460 UKR, reversible, DIN 1460:1982, Cyrillic to Latin, Ukrainian
DIN 1460 BUL, reversible, DIN 1460:1982, Cyrillic to Latin, Bulgarian
Streamlined System BUL, not reversible, The Streamlined System: 2006, Cyrillic to Latin, Bulgarian
GOST 7.79 RUS, reversible, GOST 7.79:2000 (table B), Cyrillic to Latin, Russian
GOST 7.79 RUS OLD, not reversible, GOST 7.79:2000 (table B), Cyrillic to Latin with support for Old
Russian (pre 1918), Russian
GOST 7.79 UKR, reversible, GOST 7.79:2000 (table B), Cyrillic to Latin, Ukrainian
BGN/PCGN RUS Standard, not reversible, BGN/PCGN:1947 (Standard Variant), Cyrillic to Latin, Russian
BGN/PCGN RUS Strict, not reversible, BGN/PCGN:1947 (Strict Variant), Cyrillic to Latin, Russian
Greek
ISO 843, not reversible, ISO 843:1997, Greek to Latin
DIN 31634, not reversible, DIN 31634:1982, Greek to Latin
Greeklish, not reversible, Greeklish (Phonetic), Greek to Latin
Latin
Common CES, not reversible, Czech without diacritics
Common DEU, not reversible, German without umlauts
Common POL, not reversible, Unaccented Polish
Common RON, not reversible, Romanian without diacritics as commonly used
Common SLK, not reversible, Slovak without diacritics
Common SLV, not reversible, Slovenian without diacritics
ISO 8859-16 RON, reversible, Romanian with appropriate diacritics
Arabic
Common ARA, not reversible, Common Romanization of Arabic
Sanskrit
IAST Devanagari, not reversible, IAST Romanization to Devanāgarī
Devanagari IAST, not reversible, Devanāgarī to IAST Romanization
Synopsis
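use Lingua::Translit;

# Create a transliterator for the "ISO 843" table (Greek to Latin)
my $tr = Lingua::Translit->new("ISO 843");

my $text_tr = $tr->translit("character oriented string");

# Reverse transliteration is only available for reversible tables
if ($tr->can_reverse()) {
    $text_tr = $tr->translit_reverse("character oriented string");
}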
