Comm Command
The comm command is a standard utility in Unix-like operating systems that compares two sorted files line by line. Its output has three columns: lines found only in the first file, lines found only in the second file, and lines found in both files. Column 2 is indented by one tab and column 3 by two tabs. This makes it useful for finding differences and overlaps between datasets, configuration files, or any other line-oriented text.
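A minimal sketch of the three-column layout, assuming bash and two small hypothetical files (list1.txt and list2.txt) created in a temporary directory. LC_ALL=C is set as an assumption so that the byte-wise order of the sample data matches the order comm expects.

```shell
#!/usr/bin/env bash
# Demo of comm's three-column output. File names and contents are hypothetical.
set -euo pipefail

dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/list1.txt"   # already sorted
printf '%s\n' banana cherry date  > "$dir/list2.txt"   # already sorted

# Column 1: only in list1.txt
# Column 2 (one tab): only in list2.txt
# Column 3 (two tabs): in both files
out=$(LC_ALL=C comm "$dir/list1.txt" "$dir/list2.txt")
printf '%s\n' "$out"

rm -r "$dir"
```

Here apple appears in column 1, date is indented by one tab (column 2), and banana and cherry are indented by two tabs (column 3).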
Understanding Comm Command Options
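The comm command requires both input files to be sorted in the same collating order that comm itself uses (the current locale's LC_COLLATE setting). If they are not, the output is unreliable; GNU comm also warns when it detects unsorted input. You can suppress any of the three output columns using these flags: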
-1: Suppress lines unique to file 1.
-2: Suppress lines unique to file 2.
-3: Suppress lines common to both files.
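A flag can also be used on its own. As a sketch with hypothetical data (assuming bash), -3 alone drops the common column and leaves only the lines that appear in exactly one file. Entries from file 2 keep their one-tab indent; tr can strip it, assuming the lines themselves contain no tabs.

```shell
#!/usr/bin/env bash
# -3 suppresses column 3, leaving lines unique to either file.
# Data is hypothetical; LC_ALL=C keeps sort order byte-wise.
set -euo pipefail

dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/list1.txt"
printf '%s\n' banana cherry date  > "$dir/list2.txt"

sym=$(LC_ALL=C comm -3 "$dir/list1.txt" "$dir/list2.txt")
printf '%s\n' "$sym"

# Remove the column-2 indent to get a flat list (assumes no tabs in the data).
flat=$(printf '%s\n' "$sym" | tr -d '\t')
printf '%s\n' "$flat"

rm -r "$dir"
```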
Common Comm Command Use Cases
Finding Common Lines Between Two Files
To display only the lines that appear in both file1.csv and file2.csv, you would suppress the unique lines from each file:
comm -12 <(sort file1.csv) <(sort file2.csv)
This example uses process substitution, <(...), to sort each file on the fly before passing it to comm. Process substitution is supported by bash, zsh, and ksh, but not by a plain POSIX sh.
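A runnable sketch of the same command on two small unsorted CSV files. The file contents are hypothetical, and LC_ALL=C is exported as an assumption so that sort and comm agree on the collating order.

```shell
#!/usr/bin/env bash
# Common lines between two unsorted files via process substitution (bash).
set -euo pipefail
export LC_ALL=C   # sort and comm use the same byte-wise collation

dir=$(mktemp -d)
printf '%s\n' 'id,name' '3,carol' '1,alice' '2,bob'  > "$dir/file1.csv"
printf '%s\n' 'id,name' '2,bob' '4,dave' '1,alice'   > "$dir/file2.csv"

# -12 suppresses columns 1 and 2, so only shared lines remain.
common=$(comm -12 <(sort "$dir/file1.csv") <(sort "$dir/file2.csv"))
printf '%s\n' "$common"

rm -r "$dir"
```

The header row is compared like any other line. After sorting it lands wherever the collation puts it; in the C locale digits sort before letters, so id,name comes last.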
Finding Lines Unique to the First File
To show only the lines that are present in file1.csv but not in file2.csv, suppress the lines unique to file 2 and the common lines:
comm -23 <(sort file1.csv) <(sort file2.csv)
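One detail worth checking with -23 is how duplicates are handled: comm pairs matching lines one-to-one, so an extra copy of a line in file 1 is reported as unique to file 1. The sketch below uses hypothetical data and assumes bash; sort -u removes the duplicates if only distinct lines matter.

```shell
#!/usr/bin/env bash
# Lines only in the first file, with and without duplicate removal.
set -euo pipefail
export LC_ALL=C

dir=$(mktemp -d)
printf '%s\n' b a a c > "$dir/file1.txt"   # hypothetical; "a" appears twice
printf '%s\n' a b     > "$dir/file2.txt"

# The second "a" in file1 has no partner in file2, so it is reported.
with_dups=$(comm -23 <(sort "$dir/file1.txt") <(sort "$dir/file2.txt"))
# With sort -u, each line is considered only once.
deduped=$(comm -23 <(sort -u "$dir/file1.txt") <(sort -u "$dir/file2.txt"))

printf 'with duplicates:\n%s\n' "$with_dups"
printf 'deduplicated:\n%s\n' "$deduped"

rm -r "$dir"
```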
Finding Lines Unique to the Second File
Conversely, to display only the lines unique to file2.csv, suppress the lines unique to file 1 and the common lines:
comm -13 <(sort file1.csv) <(sort file2.csv)
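A short sketch with hypothetical, already-sorted files (old.txt and new.txt), assuming bash. It lists the lines that appear only in the second file and checks that comm -13 A B gives the same result as comm -23 B A, since swapping the arguments swaps columns 1 and 2.

```shell
#!/usr/bin/env bash
# Lines only in the second file, plus the argument-swap equivalence.
set -euo pipefail
export LC_ALL=C

dir=$(mktemp -d)
printf '%s\n' alpha beta gamma        > "$dir/old.txt"   # hypothetical, sorted
printf '%s\n' beta delta gamma omega  > "$dir/new.txt"   # hypothetical, sorted

added=$(comm -13 "$dir/old.txt" "$dir/new.txt")
swapped=$(comm -23 "$dir/new.txt" "$dir/old.txt")

printf '%s\n' "$added"
[ "$added" = "$swapped" ] && echo "same result"

rm -r "$dir"
```

To count the new lines rather than list them, pipe the comm output to wc -l.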
Advanced File Comparison
For more complex comparisons, especially of large CSV files exported from databases, a specialized tool may be a better fit. For diffing CSV exports from a database, consider CSVDiff.
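Because comm compares entire lines, it has no notion of CSV columns or row keys. In the hypothetical sketch below (assuming bash), one row changes in a single field, and comm reports the old version as unique to the first file and the new version as unique to the second, rather than as one modified row.

```shell
#!/usr/bin/env bash
# Whole-line comparison on CSV data: a one-field change shows up twice.
set -euo pipefail
export LC_ALL=C

dir=$(mktemp -d)
printf '%s\n' '1,alice' '2,bob'    > "$dir/before.csv"   # hypothetical export
printf '%s\n' '1,alice' '2,robert' > "$dir/after.csv"    # row 2 renamed

removed=$(comm -23 "$dir/before.csv" "$dir/after.csv")
added=$(comm -13 "$dir/before.csv" "$dir/after.csv")

printf 'only in before: %s\n' "$removed"
printf 'only in after:  %s\n' "$added"

rm -r "$dir"
```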
The comm command is a fundamental tool for text file comparison, offering a straightforward way to find differences and overlaps between sorted datasets.