Text::Undiacritic - remove diacritics from a string

Author

       Helmut Wollmersdorfer <WOLLMERS@cpan.org>

Bugs And Limitations

       It is not yet known whether this module gives useful results for scripts other than Latin.

Configuration And Environment

Dependencies

       •   version

       •   charnames

       •   Unicode::Normalize

Description

       Changes characters with diacritics into their base characters.

       It also converts characters to their base characters in cases where Unicode does not provide a decomposition.

       For example, characters named '... WITH STROKE', such as 'LATIN SMALL LETTER L WITH STROKE' (ł), have no
       decomposition; for these the result is the bare base character, here 'LATIN SMALL LETTER L' (l).
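       The two cases above can be sketched in a few lines of Perl. The sketch assumes the approach suggested by the
       dependency list: canonical decomposition with Unicode::Normalize, removal of combining marks, and a charnames
       lookup that drops the ' WITH ...' part of a character name. The function strip_diacritics_sketch is
       hypothetical. It is not part of Text::Undiacritic and is not the module's actual implementation.

```perl
use strict;
use warnings;
use utf8;
use charnames ();
use Unicode::Normalize qw(NFD);

# Hypothetical illustration only, not Text::Undiacritic's code.
sub strip_diacritics_sketch {
    my ($string) = @_;

    # Step 1: canonical decomposition, e.g. "č" becomes "c" + COMBINING CARON.
    my $decomposed = NFD($string);

    # Step 2: drop nonspacing combining marks.
    $decomposed =~ s/\p{Mn}//g;

    # Step 3: characters without a decomposition (e.g. "ł") are mapped
    # through their Unicode name: "X WITH Y" is looked up as "X".
    my $result = '';
    for my $char (split //, $decomposed) {
        my $name = charnames::viacode(ord $char);
        if (defined $name && $name =~ /^(.+?) WITH /) {
            my $base = charnames::vianame($1);    # code point as a number
            if (defined $base) {
                $result .= chr $base;
                next;
            }
        }
        $result .= $char;    # no base character found: keep as-is
    }
    return $result;
}
```

       With this sketch, 'łódź' becomes 'lodz'. The acute accents on 'ó' and 'ź' are removed by the NFD step,
       and 'ł' is mapped to 'l' by the name lookup.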

       Removing diacritics is useful for matching text independent of spelling variants.
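       Matching text regardless of accents can be sketched with a comparison key. The helper match_key below is
       hypothetical and performs only canonical decomposition and mark removal. Unlike undiacritic, it would leave
       letters such as 'ł' unchanged. It folds case with lc. On Perl 5.16 and later, fc is the more thorough choice.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD);

# Hypothetical match key: decompose, drop combining marks, fold case.
sub match_key {
    my ($text) = @_;
    my $key = NFD($text);
    $key =~ s/\p{Mn}//g;
    return lc $key;
}

# Find every name that matches "dvorak" regardless of accents and case.
my @names   = ('Dvořák', 'Dvorak', 'Novák');
my @matches = grep { match_key($_) eq match_key('dvorak') } @names;
```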

Diagnostics

Incompatibilities

Name

       Text::Undiacritic - remove diacritics from a string

Subroutines/Methods

undiacritic
           $ascii_string = undiacritic( $characters );

       Removes diacritics from $characters and returns a simplified character string.

       The input must be a character string (i.e. decoded Unicode code points), not a string of encoded bytes.
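       Because the input must consist of characters, encoded bytes should be decoded first, for example with
       Encode::decode. Such bytes may come from a file read without an :encoding layer. The sketch below shows what
       goes wrong otherwise: the two UTF-8 octets of 'š' are read as the Latin-1 characters 'Å' and '¡'. It uses a
       hypothetical strip_marks helper that performs only the decomposition step. It does not call undiacritic.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

# Hypothetical helper: the NFD-and-strip step only, not undiacritic itself.
sub strip_marks {
    my ($text) = @_;
    my $stripped = NFD($text);
    $stripped =~ s/\p{Mn}//g;
    return $stripped;
}

# Raw UTF-8 octets for "š" (U+0161), as read without a decoding layer.
my $bytes = "\xC5\xA1";

# Decoding turns the two octets into one character.
my $chars = decode('UTF-8', $bytes);

my $from_chars = strip_marks($chars);   # "s"
my $from_bytes = strip_marks($bytes);   # "A\x{A1}": each octet treated as a Latin-1 character
```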

Synopsis

           use utf8;
           use Text::Undiacritic qw(undiacritic);
           my $czech_string = 'Příliš žluťoučký kůň';
           my $ascii_string = undiacritic( $czech_string );

Version

       This document describes Text::Undiacritic 0.01

See Also