Uniq Command
Uniq: Report or Omit Repeated Lines
The uniq command is a command-line utility that filters
text by reporting or omitting repeated lines.
It is useful for data cleaning and analysis. Because uniq compares
only adjacent lines, the input generally needs to be sorted first
so that all identical lines sit next to each other.
Key Features and Options
The uniq command offers several options to customize
its behavior:
- -c: Show repetition counts
  This option prefixes each output line with the number of times it occurred consecutively in the input.
- -d: Print only repeated lines
  With this flag, uniq outputs only lines that appear more than once in a row, and each such line is printed only once.
- -u: Print only unique lines
  This option prints only lines that are not immediately repeated; with sorted input, these are the lines that appear exactly once.
- -i: Case-insensitive comparison
  This flag makes the comparison of lines case-insensitive, treating 'Apple' and 'apple' as identical.
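The options above can be tried on a small sample. The following is a minimal sketch rather than a definitive recipe: the fruit names and variable names are made-up for illustration, and the input is written with identical lines already adjacent, so no sort step is needed. The column padding printed by -c differs between uniq implementations, so its exact output is not shown.

```shell
# Hypothetical sample data; identical lines are already adjacent.
input='apple
apple
banana
cherry
cherry
cherry'

# -c: prefix each distinct line with its repetition count.
counts=$(printf '%s\n' "$input" | uniq -c)

# -d: only lines that are repeated, printed once each (apple, cherry).
repeated=$(printf '%s\n' "$input" | uniq -d)

# -u: only lines that are not repeated (banana).
singles=$(printf '%s\n' "$input" | uniq -u)

# -i: ignore case when comparing; the first line of each run is kept (Apple).
folded=$(printf '%s\n' Apple apple APPLE | uniq -i)

printf 'counts:\n%s\n' "$counts"
printf 'repeated:\n%s\n' "$repeated"
printf 'singles:\n%s\n' "$singles"
printf 'folded:\n%s\n' "$folded"
```

Capturing each result in a variable is only a convenience for printing labeled output; in interactive use the commands would normally be run directly on a file.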
Usage Example
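To see how many times each line appears in a file named
sorted_data.txt, you would use:
sort sorted_data.txt | uniq -c
If the file is already sorted, uniq -c sorted_data.txt is enough; piping it
through sort first is harmless and guards against unsorted input.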
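A common extension of this pipeline ranks lines by how often they occur. The sketch below is an illustration under stated assumptions: the temporary file and its contents are made-up sample data, and the trailing sort -rn is a widely used idiom rather than something uniq itself provides.

```shell
# Create a small, unsorted sample file (hypothetical data for illustration).
tmpfile=$(mktemp)
printf '%s\n' pear apple pear fig apple pear > "$tmpfile"

# sort groups identical lines together, uniq -c counts each group,
# and the final sort -rn orders the groups by count, highest first.
ranked=$(sort "$tmpfile" | uniq -c | sort -rn)
printf '%s\n' "$ranked"

# Clean up the temporary file.
rm -f "$tmpfile"
```

With this sample, pear (3 occurrences) is listed first, followed by apple (2) and fig (1).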
Understanding Sorted Input
It is crucial to remember that uniq only compares
adjacent lines. If your data is not sorted, you might get unexpected
results. For instance, if a line appears twice but with other lines
in between, uniq will not consider them as duplicates
unless the file is sorted first.
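The effect of unsorted input can be seen with a three-line sample. This is a minimal sketch; the data and variable names are hypothetical and chosen only to make the difference visible.

```shell
# Hypothetical input where the duplicate "apple" lines are not adjacent.
input='apple
banana
apple'

# Without sorting, no two adjacent lines match, so nothing is collapsed.
unsorted_result=$(printf '%s\n' "$input" | uniq)

# After sorting, the two "apple" lines are adjacent and collapse into one.
sorted_result=$(printf '%s\n' "$input" | sort | uniq)

printf 'unsorted:\n%s\n\nsorted:\n%s\n' "$unsorted_result" "$sorted_result"
```

The unsorted run returns all three lines unchanged, while the sorted run returns only apple and banana.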
Option Summary
| Option | Description |
|---|---|
| -c | Show how many times each line is repeated. |
| -d | Print only the repeated lines, once each. |
| -u | Print only the lines that are not repeated. |
| -i | Compare lines case-insensitively. |