unihist generates a histogram of the characters in its input, which must be encoded in UTF-8. By
default, for each character it prints the frequency of the character as a percentage of the total, the
absolute number of occurrences (tokens) in the input, the UTF-32 code in hexadecimal, and, if the
character is displayable, the glyph itself in UTF-8. Command-line flags allow unwanted information to
be suppressed. In particular, suppressing the percentages and counts yields a list of the unique
characters in the input.
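
The computation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not unihist's own code: the function names char_histogram and format_row, the column layout, and the use of str.isprintable() to decide whether a glyph is displayable are all assumptions made for the example.

```python
from collections import Counter

def char_histogram(text):
    """Return (percentage, count, code point, character) rows, ordered by code point."""
    counts = Counter(text)
    total = sum(counts.values())
    # Sorting the (character, count) pairs orders the rows by code point.
    return [(100.0 * n / total, n, ord(ch), ch) for ch, n in sorted(counts.items())]

def format_row(pct, n, code, ch):
    """Format one row; the glyph is omitted when the character is not printable."""
    glyph = ch if ch.isprintable() else ""
    return f"{pct:7.3f}\t{n}\t0x{code:04X}\t{glyph}"

if __name__ == "__main__":
    # Hypothetical sample input; a real tool would read UTF-8 from standard input.
    for row in char_histogram("hello, world\n"):
        print(format_row(*row))
```

Dropping the first two fields of each row leaves just the code and glyph, which corresponds to the list of unique characters mentioned above.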
Output is ordered by character code. To sort it in descending order of frequency instead, pipe the
output into the command:

    sort -k1 -n -r
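
For readers who post-process the output in a script rather than with sort, the same reordering can be sketched as follows. The sample lines are hypothetical and only assume that the first whitespace-separated field of each line is numeric, as it is when percentages are printed; the function name sort_by_frequency is likewise made up for the example.

```python
def sort_by_frequency(lines):
    """Sort histogram lines by their first whitespace-separated field, largest first."""
    return sorted(lines, key=lambda line: float(line.split()[0]), reverse=True)

if __name__ == "__main__":
    # Hypothetical histogram lines: percentage, count, code, glyph.
    sample = [
        "25.000\t1\t0x0061\ta",
        "50.000\t2\t0x0062\tb",
        "25.000\t1\t0x0063\tc",
    ]
    for line in sort_by_frequency(sample):
        print(line)
```

Lines with equal frequency keep their original relative order here, which may differ from how sort breaks ties.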
By default, unihist handles all of Unicode. To reduce memory usage and increase speed, it can be
compiled to handle only the Basic Multilingual Plane (plane 0) by defining BMPONLY at compile time.
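
The memory saving can be illustrated with a sketch that assumes counts are kept in a table indexed by code point; the text does not say how unihist stores its counts, so this is an assumption. All of Unicode spans 0x110000 code points, while plane 0 spans 0x10000, so a BMP-only table has one seventeenth as many entries. How unihist treats characters outside the BMP in a BMPONLY build is not stated; the sketch simply tallies them as skipped.

```python
from array import array

BMP_SIZE = 0x10000       # code points U+0000..U+FFFF (plane 0)
UNICODE_SIZE = 0x110000  # code points U+0000..U+10FFFF (all 17 planes)

def count_code_points(text, bmp_only=False):
    """Count characters in a fixed-size table indexed by code point.

    Returns the table and the number of characters that fell outside it.
    """
    size = BMP_SIZE if bmp_only else UNICODE_SIZE
    counts = array("L", [0]) * size  # one unsigned counter per code point
    skipped = 0
    for ch in text:
        cp = ord(ch)
        if cp < size:
            counts[cp] += 1
        else:
            skipped += 1  # assumption: out-of-range characters are only tallied
    return counts, skipped

if __name__ == "__main__":
    full, _ = count_code_points("a\U0001F600a")
    bmp, skipped = count_code_points("a\U0001F600a", bmp_only=True)
    print(len(full), len(bmp), skipped)
```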