Awk Command Examples - Process Text Files with Awk

Explore practical Awk command examples for processing text files, pattern scanning, and data manipulation. Learn to sum integers, filter lines, and more with Awk.

Awk is a powerful text-processing utility that scans files line by line and performs actions based on specified patterns. It's widely used in Unix-like operating systems for data extraction, reporting, and manipulation.
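
Every Awk program is a series of pattern-action pairs of the form 'pattern { action }': for each input line that matches the pattern, the associated action runs. As a minimal illustration (the regex and file name below are placeholders), the following prints the line number and contents of every line containing "error":

# Print the line number and contents of every matching line.
awk '/error/ {print NR, $0}' file.txt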

Sum Integers and Process Fields

Learn how to sum integers from a file or standard input, and how to process fields using different delimiters.

# Sum integers from a file or STDIN, one integer per line.
printf '1\n2\n3\n' | awk '{sum += $1} END {print sum}'

# Use a specific character as the field separator when summing integers from a file or STDIN.
printf '1:2:3\n' | awk -F ":" '{print $1+$2+$3}'
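
# The same idea extends to other delimiters; for instance, sum the second
# column of a comma-separated file (data.csv here is a hypothetical file).
awk -F ',' '{sum += $2} END {print sum}' data.csv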

Filtering Lines with Patterns

Discover how to select specific lines from a file based on line numbers, regular expressions, or field values.

# Print line number 12 of file.txt
awk 'NR==12' file.txt

# Print first field for each record in file.txt
awk '{print $1}' file.txt

# Print only lines that match regex in file.txt
awk '/regex/' file.txt

# Print only lines that do not match regex in file.txt
awk '!/regex/' file.txt

# Print any line where field 2 is equal to "foo" in file.txt
awk '$2 == "foo"' file.txt

# Print lines where field 2 is NOT equal to "foo" in file.txt
awk '$2 != "foo"' file.txt

# Print line if field 1 matches regex in file.txt
awk '$1 ~ /regex/' file.txt

# Print line if field 1 does NOT match regex in file.txt
awk '$1 !~ /regex/' file.txt

# Print lines with more than 80 characters in file.txt
awk 'length > 80' file.txt
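
# Patterns and actions can also be combined; for instance, print fields 1 and
# 3 of every line whose second field equals "foo" (field numbers illustrative).
awk '$2 == "foo" {print $1, $3}' file.txt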

Advanced Awk Techniques

Explore more advanced Awk features such as custom output separators, paragraph processing, and accessing environment variables.

# Print a multiplication table.
awk -v RS='' '
    {
        for(i=1;i<=NF;i++){
            printf("%dx%d=%d%s", i, NR, i*NR, i==NR?"\n":"\t")
        }
    }
' <<< "$(seq 9 | sed 'H;g')"

# Specify output separator character.
printf '1 2 3' | awk 'BEGIN {OFS=":"}; {print $1,$2,$3}'

# Search each blank-line-separated paragraph for the given REGEX match.
# Matching paragraphs are printed run together, with no blank line between them.
awk -v RS='' '/42B/' file

# Search each paragraph for the given REGEX match.
# Matching paragraphs will be separated by a blank line in the output.
awk -v RS='' -v ORS='\n\n' '/42B/' file

# Display only the first field of text taken from STDIN.
echo 'Field_1  Field_2  Field_3' | awk '{print $1}'
# Note that in this case, you're far better off using cut(1).
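
# For instance, one way to do the same with cut(1):
echo 'Field_1  Field_2  Field_3' | cut -d ' ' -f 1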

# Use AWK on its own, without needing any input via STDIN.
awk 'BEGIN {print("Example text.")}'

# Access environment variables from within AWK.
awk 'BEGIN {print ENVIRON["LS_COLORS"]}'

# Count number of lines taken from STDIN.
free | awk '{L++} END {print(L)}'
# Cleaner, more efficient approach to the above.
free | awk 'END {print(NR)}'

# Output unique list of available sections under which to create a DEB package.
awk '!A[$1]++ {print($1)}' <<< "$(dpkg-query --show -f='${Section}\n')"

# Using process substitution (`<()`, which is NOT command substitution) and
# AWK's associative arrays, we can print just column 2 of the line whose
# first column is "Mem:".
awk 'NR != 1 {A[$1] = $2} END {print(A["Mem:"])}' <(free -h)
# While the line below is simpler, it is not equivalent: it prints as soon as
# a line matches, whereas the version above stores every row for later use in
# the END block, which is preferable when more than one value is needed.
awk '/^Mem:/ {print($2)}' <(free -h)
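
# Storing every row first, as above, also lets you combine values from
# different lines; for example, print both the "Mem:" and "Swap:" totals.
awk 'NR != 1 {A[$1] = $2} END {print(A["Mem:"], A["Swap:"])}' <(free -h)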

# Output a unique list of the uppercase, `$`-prefixed variable names used in
# FILE, with the leading `$` (the sigil) stripped.
awk '
    {
        for(F=1; F<=NF; F++){
            if($F ~ /^\$[A-Z_]+$/){
                A[$F]++
            }
        }
    }
    END {
        for(I in A){
            printf("%s\n", substr(I, 2))
        }
    }
' FILE

# Output only lines from FILE between PATTERN_1 and PATTERN_2. Good for logs.
awk '/PATTERN_1/,/PATTERN_2/ {print}' FILE
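
# For example, pull a single time window out of a log (the timestamps and the
# LOGFILE name are illustrative placeholders).
awk '/10:00:00/,/10:05:00/' LOGFILE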

# Pretty-print a table summarizing the non-system users on the system.
awk -F ':' '
    BEGIN {
        printf("%-17s %-4s %-4s %-s\n", "NAME", "UID", "GID", "SHELL")
    }
    $3 >= 1000 && $1 != "nobody" {
        printf("%-17s %-d %-d %-s\n", $1, $3, $4, $7)
    }
' /etc/passwd

# Display the machine's total amount of RAM in MiB, read from /proc/meminfo.
# The awkward shell quoting embeds a literal single-quote so that the %'d
# format groups the digits with a thousands separator (locale-dependent),
# much as Bash's own `printf` built-in can do.
awk '/^MemTotal:/ {printf("%'"'"'dMiB\n", $2 / 1024)}' /proc/meminfo

# It's possible to sort and uniq-ify strings in AWK, replacing separate
# sort(1) and uniq(1) calls with a single AWK invocation. Granted, `sort -u`
# does both, but AWK offers much more flexibility. Note that uniq(1) only
# removes adjacent duplicates, whereas the array approach below catches
# duplicate lines anywhere in the input. The asorti() function requires gawk.
awk '
    {
        Lines[$0]++
    }
    END {
        Count = asorti(Lines, Sorted)
        for (I = 1; I <= Count; I++) {
            print(Sorted[I])
        }
    }
' FILE

# Remove duplicate lines
awk '!seen[$0]++' file.txt

# Remove all empty lines
awk 'NF > 0' file.txt

Further Resources

For more in-depth information and advanced usage of Awk, refer to the official documentation and community resources.