Sort Command Examples
Understanding the Linux Sort Command
The sort command in Linux is a powerful utility used to
sort lines of text files. It can arrange data alphabetically,
numerically, or based on specific fields within each line. This
makes it indispensable for data processing and analysis directly
from the command line. Understanding its various options allows for
flexible manipulation of text-based data.
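As a minimal sketch of sorting on specific fields, the example below uses the standard -t (field separator) and -k (key) options. The colon-separated records are hypothetical sample data, and LC_ALL=C is set only so that the result does not depend on the reader's locale.

```shell
# Hypothetical sample records in the form name:shell:uid
records='alice:zsh:1002
bob:fish:1000
carol:bash:1001'

# -t: sets ":" as the field separator.
# -k3,3n limits the key to field 3 and compares it numerically.
by_uid=$(printf '%s\n' "$records" | LC_ALL=C sort -t: -k3,3n)

# -k2,2 limits the key to field 2 and compares it as text.
by_shell=$(printf '%s\n' "$records" | LC_ALL=C sort -t: -k2,2)

printf '%s\n' "$by_uid" "$by_shell"
```

Writing the key as -k3,3 rather than -k3 stops the key at the end of field 3; without the second number, the key runs to the end of the line.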
Sorting Lines Alphabetically and Reverse
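By default, sort compares whole lines using the current locale's
collation rules; in the C locale (LC_ALL=C) this is plain byte
order, which matches the ASCII table. The -r flag reverses the sort
order, which is useful for displaying data from largest to smallest
or Z to A.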
# sort
# Sort lines of text files
# Return the contents of the British English dictionary, in reverse order.
sort -r /usr/share/dict/british-english
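The dictionary file above may not be installed on every system, so here is a small self-contained sketch of the same idea. The word list is made up for illustration, and LC_ALL=C is an assumption added to force byte-order comparison so the output is the same everywhere.

```shell
# Hypothetical sample words, one of them capitalized.
words='banana
Cherry
apple'

# In the C locale, uppercase letters sort before lowercase ones.
ascending=$(printf '%s\n' "$words" | LC_ALL=C sort)

# -r reverses the result of the comparison.
descending=$(printf '%s\n' "$words" | LC_ALL=C sort -r)

printf '%s\n' "$ascending" "$descending"
```

In other locales the capitalized word would typically be placed between the other two, which is why the locale matters when the order has to be reproducible.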
Filtering Adjacent Duplicate Lines with Sort
While uniq is the dedicated tool for removing adjacent duplicate
lines, sort with the -u flag sorts the input and then keeps only
one line from each run of equal lines. For more complex uniqueness
checks, or when sort's capabilities are insufficient, consider
uniq's additional options or tools like AWK.
# The GNU sort(1) command can also filter out adjacent duplicate lines and can
# therefore overlap with the uniq(1) command. However, uniq(1) has some options
# that sort(1) lacks, so refer to the man page for your situation if you
# require something beyond a basic uniqueness check. In addition, there is the
# potential for parallelizing the processing by piping sort(1) into uniq(1) for
# non-trivial tasks.
#
# By default, sort(1) compares lines or fields using the current locale's
# collation order (plain ASCII byte order in the C locale). Sorting places
# identical words next to one another, so the duplicates become adjacent and
# are removed.
#
# If you need better uniq-ing, you could refer to AWK & its associative arrays.
printf '%s\n' this is a list of of random words with duplicate words | sort -u
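To make the overlap and the differences concrete, the sketch below runs three variants over the same hypothetical word list: sort -u, sort piped into uniq -c (counting occurrences, which sort alone cannot do), and an AWK associative array that drops duplicates while keeping the original order. The awk reformatting of the uniq -c output is only there to strip its leading padding.

```shell
# Hypothetical sample input with repeated words.
words='of
list
of
a
list
of'

# Sort, then keep one line per run of equal lines.
unique=$(printf '%s\n' "$words" | LC_ALL=C sort -u)

# Sort, then count how many times each line occurs.
counts=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq -c | awk '{print $1, $2}')

# AWK associative array: print a line only the first time it is seen,
# preserving input order and requiring no sort at all.
first_seen=$(printf '%s\n' "$words" | awk '!seen[$0]++')

printf '%s\n' "$unique" "$counts" "$first_seen"
```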
Numerical Sorting
When sorting numbers, it's crucial to use the -n flag.
Without it, sort treats numbers as strings, leading to
incorrect ordering (e.g., 1, 10, 11, 2). The -n flag
ensures proper numerical comparison.
# Sort numerically. If you don't provide the `-n` flag, sort(1) will instead
# compare the lines as text, as mentioned above, meaning it'll display as
# 1, 10, 11, 2, 3, 4, etc.
printf '%d\n' {1..9} 10 11 | sort -n
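The difference is easiest to see side by side. This sketch sorts the same made-up numbers with and without -n; paste is used only to join the output onto one line, and LC_ALL=C is an assumption that keeps the text comparison predictable.

```shell
# Hypothetical sample numbers in arbitrary order.
numbers='1
2
10
11
3'

# Without -n the lines are compared character by character, as text.
lexical=$(printf '%s\n' "$numbers" | LC_ALL=C sort | paste -sd ' ' -)

# With -n the lines are compared by numeric value.
numeric=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n | paste -sd ' ' -)

printf 'text order:    %s\n' "$lexical"
printf 'numeric order: %s\n' "$numeric"
```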
Sorting Human-Readable Sizes
The sort command can also compare human-readable sizes (values
with unit suffixes such as 512M, 20G, or 1.5T) using the -h flag,
which is a GNU extension. Combined with the -k flag to specify
the sort key (column) and -r for reverse order, it's excellent for
analyzing disk usage from commands like df.
# You can even sort human-readable sizes. In this example, the 2nd column
# (Size in df(1) output) is being sorted, thanks to the use of the `-k` flag,
# and the sorting is reversed, so that the largest filesystems reported by
# df(1) are displayed first.
df -ht ext4 /dev/sd[a-z][1-9]* | sed '1d' | sort -rhk 2
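Because the df output depends on the machine, here is a self-contained sketch of the same technique on made-up data. The device names and sizes are hypothetical, and it assumes a sort implementation that supports -h, such as GNU sort. The second command shows what goes wrong when -n is used instead: the unit suffixes are ignored.

```shell
# Hypothetical two-column listing: device and size.
listing='/dev/sda1 20G
/dev/sdb1 512M
/dev/sdc1 1.5T
/dev/sdd1 900K'

# -h understands the K/M/G/T suffixes; -k2,2 limits the key to column 2.
by_size=$(printf '%s\n' "$listing" | LC_ALL=C sort -rh -k2,2)

# -n only reads the leading number, so 900K outranks 1.5T.
by_number=$(printf '%s\n' "$listing" | LC_ALL=C sort -rn -k2,2)

printf '%s\n' "$by_size" "$by_number"
```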