confusable_homoglyphs - confusable_homoglyphs Documentation
Contents
API Documentation
confusable_homoglyphs package
Submodules
confusable_homoglyphs.categories module
confusable_homoglyphs.categories.alias(chr)
Retrieves the script block alias for a unicode character.
>>> categories.alias('A')
'LATIN'
>>> categories.alias('τ')
'GREEK'
>>> categories.alias('-')
'COMMON'
Parameters
chr (str) – A unicode character
Returns
The script block alias.
Return type
str
confusable_homoglyphs.categories.aliases_categories(chr)
Retrieves the script block alias and unicode category for a unicode character.
>>> categories.aliases_categories('A')
('LATIN', 'L')
>>> categories.aliases_categories('τ')
('GREEK', 'L')
>>> categories.aliases_categories('-')
('COMMON', 'Pd')
Parameters
chr (str) – A unicode character
Returns
The script block alias and unicode category for a unicode character.
Return type
(str, str)
confusable_homoglyphs.categories.category(chr)
Retrieves the unicode category for a unicode character.
>>> categories.category('A')
'L'
>>> categories.category('τ')
'L'
>>> categories.category('-')
'Pd'
Parameters
chr (str) – A unicode character
Returns
The unicode category for a unicode character.
Return type
str
confusable_homoglyphs.categories.unique_aliases(string)
Retrieves all unique script block aliases used in a unicode string.
>>> categories.unique_aliases('ABC')
{'LATIN'}
>>> categories.unique_aliases('ρAτ-')
{'GREEK', 'LATIN', 'COMMON'}
Parameters
string (str) – A unicode string
Returns
A set of the script block aliases used in a unicode string.
Return type
set(str)
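As a rough illustration of how the per-character lookup and the per-string summary relate, the sketch below counts characters per script alias. The names script_histogram and toy_alias are hypothetical and not part of the package; toy_alias is a stand-in that only knows ASCII letters, so that the block runs without the data files.

```python
from collections import Counter


def script_histogram(text, alias_of):
    """Count the characters of `text` per script alias.

    `alias_of` is any callable mapping one character to its alias,
    for example `categories.alias` from this package.
    """
    return Counter(alias_of(ch) for ch in text)


def toy_alias(ch):
    # Stand-in lookup for illustration only: ASCII letters are LATIN,
    # everything else is lumped into COMMON. Not the package's data.
    return "LATIN" if ch.isascii() and ch.isalpha() else "COMMON"


print(script_histogram("B.C", toy_alias))
```

With the package installed, passing categories.alias instead of toy_alias should give counts whose keys match what unique_aliases returns for the same string.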
confusable_homoglyphs.cli module
confusable_homoglyphs.cli.generate_categories()
Generates the categories JSON data file from the unicode specification.
Returns
True for success, raises otherwise.
Return type
bool
confusable_homoglyphs.cli.generate_confusables()
Generates the confusables JSON data file from the unicode specification.
Returns
True for success, raises otherwise.
Return type
bool
confusable_homoglyphs.confusables module
exception confusable_homoglyphs.confusables.Found
Bases: Exception
confusable_homoglyphs.confusables.is_confusable(string, greedy=False, preferred_aliases=[])
Checks if string contains characters which might be confusable with characters from
preferred_aliases.
If greedy=False, it returns only the first confusable character found, without looking at the
rest of the string; greedy=True returns all of them.
preferred_aliases takes a list of unicode block aliases to be treated as your ‘base’ unicode
blocks. Considering paρa:
• with preferred_aliases=['latin'], the 3rd character ρ would be returned because this Greek
letter can be confused with Latin p.
• with preferred_aliases=['greek'], the 1st character p would be returned because this Latin
letter can be confused with Greek ρ.
• with preferred_aliases=[] and greedy=True, you’ll discover the 29 characters that can be
confused with p, the 23 characters that look like a, and the one that looks like ρ (which is,
of course, p aka LATIN SMALL LETTER P).
>>> confusables.is_confusable('paρa', preferred_aliases=['latin'])[0]['character']
'ρ'
>>> confusables.is_confusable('paρa', preferred_aliases=['greek'])[0]['character']
'p'
>>> confusables.is_confusable('Abç', preferred_aliases=['latin'])
False
>>> confusables.is_confusable('AlloΓ', preferred_aliases=['latin'])
False
>>> confusables.is_confusable('ρττ', preferred_aliases=['greek'])
False
>>> confusables.is_confusable('ρτ.τ', preferred_aliases=['greek', 'common'])
False
>>> confusables.is_confusable('ρττp')
[{'homoglyphs': [{'c': 'p', 'n': 'LATIN SMALL LETTER P'}], 'alias': 'GREEK', 'character': 'ρ'}]
Parameters
• string (str) – A unicode string
• greedy (bool) – Don’t stop on finding one confusable character - find all of them.
• preferred_aliases (list(str)) – Script block aliases which we don’t want string’s
characters to be confused with.
Returns
False if nothing is confusable; otherwise a list of the confusable characters and the
characters they can be confused with.
Return type
bool or list
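Because the return value is either False or a list of dictionaries, callers usually need a small amount of unpacking. The following is a minimal sketch that formats such a result for display; describe_confusables is a hypothetical helper name, and the sample value is copied from the last doctest above rather than produced by a library call.

```python
def describe_confusables(result):
    """Turn the return value of is_confusable() into readable lines."""
    if not result:  # False means nothing confusable was found
        return []
    lines = []
    for entry in result:
        # Each homoglyph carries the character ('c') and its unicode name ('n').
        lookalikes = ", ".join(
            "{} ({})".format(h["c"], h["n"]) for h in entry["homoglyphs"]
        )
        lines.append(
            "{} [{}] looks like: {}".format(
                entry["character"], entry["alias"], lookalikes
            )
        )
    return lines


# Sample value copied from the doctest above; no library call is made here.
sample = [{'homoglyphs': [{'c': 'p', 'n': 'LATIN SMALL LETTER P'}],
           'alias': 'GREEK', 'character': 'ρ'}]
for line in describe_confusables(sample):
    print(line)
```

Checking `if not result` first keeps the False case and the list case on separate paths, which avoids iterating over a boolean by accident.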
confusable_homoglyphs.confusables.is_dangerous(string, preferred_aliases=[])
Checks if string can be dangerous, i.e. whether it is not only mixed-script but also contains
characters from scripts other than those in preferred_aliases that might be confused with
characters from scripts in preferred_aliases.
For preferred_aliases examples, see the is_confusable docstring.
>>> bool(confusables.is_dangerous('Allo'))
False
>>> bool(confusables.is_dangerous('AlloΓ', preferred_aliases=['latin']))
False
>>> bool(confusables.is_dangerous('Alloρ'))
True
>>> bool(confusables.is_dangerous('AlaskaJazz'))
False
>>> bool(confusables.is_dangerous('ΑlaskaJazz'))
True
Parameters
• string (str) – A unicode string
• preferred_aliases (list(str)) – Script block aliases which we don’t want string’s
characters to be confused with.
Returns
Whether string is dangerous.
Return type
bool
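The rule described here combines the two other checks in this module. The sketch below spells out that combination under the assumption that "dangerous" means exactly "mixed-script and confusable"; it is not necessarily how the library implements it. The function name dangerous is hypothetical, and the two checks are passed in as parameters so the logic can be read on its own.

```python
def dangerous(text, is_mixed_script, is_confusable, preferred_aliases=()):
    """Sketch of the documented rule: mixed-script AND confusable."""
    if not is_mixed_script(text):
        # Single-script strings are never flagged, whatever they contain.
        return False
    # is_confusable returns False or a non-empty list; collapse that to a bool.
    return bool(is_confusable(text, preferred_aliases=list(preferred_aliases)))
```

With the package installed, one would pass confusables.is_mixed_script and confusables.is_confusable as the two callables. In practice, calling confusables.is_dangerous directly is the simpler choice.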
confusable_homoglyphs.confusables.is_mixed_script(string, allowed_aliases=['COMMON'])
Checks if string contains mixed-script content, excluding the script block aliases in
allowed_aliases.
E.g. B.C is not considered mixed-script by default: it contains characters from Latin and
Common, but Common is excluded by default.
>>> confusables.is_mixed_script('Abç')
False
>>> confusables.is_mixed_script('ρτ.τ')
False
>>> confusables.is_mixed_script('ρτ.τ', allowed_aliases=[])
True
>>> confusables.is_mixed_script('Alloτ')
True
Parameters
• string (str) – A unicode string
• allowed_aliases (list(str)) – Script block aliases not to consider.
Returns
Whether string is considered mixed-script or not.
Return type
bool
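One way to read the definition above is as a set difference: remove the allowed aliases from the aliases found in the string and see whether more than one script is left. The sketch below works on a set of aliases such as the one unique_aliases returns. The name mixed_script_from_aliases is hypothetical, and comparing aliases in upper case is an assumption of this sketch.

```python
def mixed_script_from_aliases(aliases, allowed_aliases=("COMMON",)):
    """Return True when more than one script remains after removing allowed ones."""
    # Assumption of this sketch: aliases are compared in upper case.
    allowed = {alias.upper() for alias in allowed_aliases}
    return len(set(aliases) - allowed) > 1


# Alias sets written out by hand, following the examples above.
print(mixed_script_from_aliases({"LATIN", "COMMON"}))                      # aliases of 'B.C'
print(mixed_script_from_aliases({"GREEK", "COMMON"}, allowed_aliases=()))  # 'ρτ.τ', nothing allowed
print(mixed_script_from_aliases({"LATIN", "GREEK"}))                       # aliases of 'Alloτ'
```

The three calls mirror the doctests: the first is not mixed because Common is ignored by default, while the other two leave two scripts in play.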
confusable_homoglyphs.utils module
confusable_homoglyphs.utils.delete(filename)
Deletes a JSON data file if it exists.
confusable_homoglyphs.utils.dump(filename, data)
confusable_homoglyphs.utils.get(url, timeout=None)
confusable_homoglyphs.utils.load(filename)
Loads a JSON data file.
Returns
A dict.
Return type
dict
confusable_homoglyphs.utils.path(filename)
Returns a file path relative to the data directory.
This is the package directory by default, or the env variable CONFUSABLE_DATA if set.
Returns
A file path string.
Return type
str
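The lookup order described above (environment variable first, package directory otherwise) can be sketched as follows. This is an illustration of the documented behaviour, not the library's own code: data_path and its package_dir parameter are hypothetical, and treating an empty CONFUSABLE_DATA as unset is an assumption.

```python
import os


def data_path(filename, package_dir):
    """Resolve `filename` against CONFUSABLE_DATA if set, else `package_dir`."""
    # Assumption: an empty CONFUSABLE_DATA value counts as "not set".
    base = os.environ.get("CONFUSABLE_DATA") or package_dir
    return os.path.join(base, filename)


# The directory below is a placeholder, not the real installation path.
print(data_path("confusables.json", "/hypothetical/confusable_homoglyphs"))
```

Setting CONFUSABLE_DATA before the data files are loaded is what lets a deployment keep updated JSON files outside the installed package.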
confusable_homoglyphs.utils.u(x)
Module contents
Confusable_Homoglyphs
This project has been adopted from the original confusable_homoglyphs by Victor Felder.
A homoglyph is one of two or more graphemes, characters, or glyphs with shapes that appear
identical or very similar (Wikipedia: Homoglyph).
Unicode homoglyphs can be a nuisance on the web. Your most popular client, AlaskaJazz, might be upset to
be impersonated by a trickster who deliberately chose the username ΑlaskaJazz.
• AlaskaJazz is single script: only Latin characters.
• ΑlaskaJazz is mixed-script: the first character is a Greek letter.
You might also want to avoid people being tricked into entering their password on www.microsоft.com or
www.faϲebook.com instead of www.microsoft.com or www.facebook.com. Here is a utility to play with these
confusable homoglyphs.
Not all mixed-script strings have to be ruled out, though: you could exclude only mixed-script strings
containing characters that might be confused with a character from some unicode blocks of your choosing.
• Allo and ρττ are fine: single script.
• AlloΓ is fine when our preferred script alias is ‘latin’: mixed script, but Γ is not confusable.
• Alloρ is dangerous: mixed script and ρ could be confused with p.
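The policy in the list above can be applied to a batch of candidate usernames roughly as follows. This is a sketch, assuming Latin is the preferred script; reject_lookalikes is a hypothetical helper, while confusables.is_dangerous and its preferred_aliases parameter are the ones documented in the API section.

```python
def reject_lookalikes(candidates, preferred_aliases=("latin",)):
    """Split candidate usernames into (accepted, rejected) lists."""
    # Imported here so that loading this sketch does not itself need the package.
    from confusable_homoglyphs import confusables

    accepted, rejected = [], []
    for name in candidates:
        if confusables.is_dangerous(name, preferred_aliases=list(preferred_aliases)):
            rejected.append(name)
        else:
            accepted.append(name)
    return accepted, rejected
```

Going by the examples above, calling it with Allo, AlloΓ and Alloρ should accept the first two and reject the last.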
This library is compatible with Python 3.
API documentation
Is the data up to date?
Yep.
The unicode block aliases and names for each character are extracted from this file provided by the
unicode consortium.
The matrix of which character can be confused with which other characters is built using this file
provided by the unicode consortium.
This data is stored in two JSON files: categories.json and confusables.json. If you delete them, they
will both be recreated by downloading and parsing the two above-mentioned files and stored as JSON files
again.
Contributing
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will
always be given.
You can contribute in many ways:
Types of Contributions
Report Bugs
Report bugs at https://todo.sr.ht/~valhalla/confusable_homoglyphs
If you are reporting a bug, please include:
• Any details about your local setup that might be helpful in troubleshooting.
• Detailed steps to reproduce the bug.
Fix Bugs
Look through the sourcehut tickets for bugs. Anything tagged with “bug” is open to whoever wants to
implement it.
Implement Features
Look through the sourcehut tickets for features. Anything tagged with “feature” is open to whoever wants
to implement it.
Write Documentation
confusable_homoglyphs could always use more documentation, whether as part of the official
confusable_homoglyphs docs, in docstrings, or even on the web in blog posts, articles, and such.
Submit Feedback
The best way to send feedback is to file an issue at https://todo.sr.ht/~valhalla/confusable_homoglyphs.
If you are proposing a feature:
• Explain in detail how it would work.
• Keep the scope as narrow as possible, to make it easier to implement.
• Remember that this is a volunteer-driven project, and that contributions are welcome :)
Get Started!
Ready to contribute? Here’s how to set up confusable_homoglyphs for local development.
1. Clone the git repository from sourcehut:
$ git clone https://git.sr.ht/~valhalla/confusable_homoglyphs
2. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how
you set up your clone for local development:
$ mkvirtualenv confusable_homoglyphs
$ cd confusable_homoglyphs/
$ python setup.py develop
3. Create a branch for local development:
$ git checkout -b name-of-your-bugfix-or-feature
Now you can make your changes locally.
4. When you’re done making changes, check that your changes pass flake8 and the tests, including testing
other Python versions with tox:
$ flake8 confusable_homoglyphs tests
$ python setup.py test
$ tox
To get flake8 and tox, just pip install them into your virtualenv.
5. Commit your changes:
$ git add .
$ git commit -m "Your detailed description of your changes."
6. Send the patch to mailto:~valhalla/confusable_homoglyphs-devel@lists.sr.ht:
$ git send-email \
--to="mailto:~valhalla/confusable_homoglyphs-devel@lists.sr.ht" \
HEAD^
See https://git-send-email.io/ for details on how to install and configure git-send-email.
Pull Request Guidelines
Before you submit a pull request, check that it meets these guidelines:
1. The pull request should include tests.
2. If the pull request adds functionality, the docs should be updated. Put your new functionality into a
function with a docstring, and add the feature to the list in README.rst.
3. The pull request should work for all supported Python versions.
Copyright
2024, Victor Felder
3.3.1 Jan 30, 2024 CONFUSABLE_HOMOGLYPHS(1)
Credits
Original Author and Former Maintainer
• Victor Felder <victorfelder@gmail.com>
Current Maintainer
• Elena “of Valhalla” Grandi <valhalla@trueelena.org>
Contributors
• Ryan P Kilby <rpkilby@ncsu.edu>
History
1.0.0
Initial release.
2.0.0
• allowed_categories renamed to allowed_aliases
2.0.1
• Fix a TypeError: https://github.com/vhf/confusable_homoglyphs/pull/2
3.0.0
Courtesy of Ryan P Kilby, via https://github.com/vhf/confusable_homoglyphs/pull/6 :
• Changed file paths to be relative to the confusable_homoglyphs package directory instead of the user’s
current working directory.
• Data files are now distributed with the packaging.
• Fixes tests so that they use the installed distribution instead of the local files. (Originally, the
data files were erroneously showing up during testing, despite not being included in the distribution).
• Moves the data file generation into a simple CLI. This way, users have a method for controlling when
the data files are updated.
• Since the data files are now included in the distribution, the CLI is made optional. Its dependencies
can be installed with the cli bundle, e.g. pip install confusable_homoglyphs[cli].
3.1.0
• Update unicode data
3.1.1
• Update unicode data (via ftp)
3.2.0
• Drop support for Python 3.3
• Fix #11: work as expected when char not found in datafiles
3.3.0
• Drop support for Python 2
• Drop support for Python < 3.7, add support for Python up to 3.12
• Allow using data files from a custom location set with the CONFUSABLE_DATA environment variable.
• Fix the return value of confusables.is_dangerous() to match the documented API of a boolean value. It
used to return either False or the list output of confusables.is_confusable().
• Added a check command for command line use.
3.3.1
• Update unicode data
Installation
If available, install an appropriate package from your distribution.
Otherwise you can install from PyPI, at the command line:
$ easy_install confusable_homoglyphs
or, if you have virtualenvwrapper installed:
$ mkvirtualenv confusable_homoglyphs
$ pip install confusable_homoglyphs
Usage
To use confusable_homoglyphs in a project:
pip install confusable_homoglyphs
import confusable_homoglyphs
To update the data files, you first need to install the “cli” bundle, then run the “update” command:
pip install confusable_homoglyphs[cli]
confusable_homoglyphs update
