Awk Command Line Utility - Text Processing & Transformation - cmd Cheatsheets

Understanding the Awk Command Line Utility

awk is a powerful command-line utility designed for finding, processing, and transforming text files. It operates on lines of text, treating each line as a record and breaking it down into fields. Its fundamental syntax revolves around the `pattern { action }` structure, where a pattern is matched against each input line, and if it matches, a specified action is executed.
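The `pattern { action }` structure can take three forms, sketched below. The sample input and the shell variable names are made up for illustration; they are not taken from any real log format.

```shell
# Hypothetical sample input: three log-like lines.
sample='GET /index.html 200
POST /login 302
GET /missing 404'

# Pattern only: the default action prints every matching record.
pattern_only=$(printf '%s\n' "$sample" | awk '/^GET/')

# Action only: the action runs for every record; here it prints field 1.
action_only=$(printf '%s\n' "$sample" | awk '{ print $1 }')

# Pattern and action: print the path of records whose third field is 404.
both=$(printf '%s\n' "$sample" | awk '$3 == 404 { print $2 }')

printf '%s\n' "$pattern_only" "$action_only" "$both"
```

The pattern does not have to be a regular expression; the last command uses a comparison on a field instead.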

Core Awk Concepts

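Pattern Matching: A pattern is typically a regular expression written between slashes, though any expression (such as $3 > 100) can serve as one. When a pattern matches an input line, the associated action is performed. If no pattern is provided, the action is applied to every line; if no action is provided, the matching line is printed. For example, /^HTTP/ {print} prints every line that starts with "HTTP".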
BEGIN and END Patterns: Awk supports two special patterns:
- BEGIN: Specifies actions to be performed before any input line is read.
- END: Specifies actions to be performed after all input lines have been read.
- Example: awk 'BEGIN{print "start"} {print} END{print "end"}' file.txt will first print "start", then all lines of file.txt, and finally "end".
Fields and Records: Awk interprets each line as a record. By default, one or more consecutive spaces or tabs act as delimiters between fields. Fields are accessed using $1, $2, etc., representing the first, second field, and so on. $0 represents the entire input line.
Data Types: Awk supports two primary data types: strings and numbers. It automatically converts variables based on the context.
Associative Arrays: Awk supports associative arrays, allowing you to store data using keys, like var[key] = value.
Arithmetic Operations: The arithmetic operators +, -, *, /, %, and ^ (exponentiation) work on numbers, along with the increment (++) and decrement (--) operators.
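As a rough illustration of fields, automatic string-to-number conversion, associative arrays, and arithmetic working together, the sketch below sums amounts per name. The input data and the names `orders`, `total`, and `count` are assumptions made up for this example.

```shell
# Hypothetical input: "name amount" pairs separated by whitespace.
orders='alice 10
bob 5
alice 2.5
bob 7'

# total[$1] and count[$1] are associative arrays keyed by the first field.
# $2 is read as text and converted to a number by the += operator.
# The result is piped through sort because for-in order is unspecified.
totals=$(printf '%s\n' "$orders" |
  awk '{ total[$1] += $2; count[$1]++ }
       END { for (name in total) print name, count[name], total[name] }' |
  sort)

printf '%s\n' "$totals"
# alice 2 12.5
# bob 2 12
```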

Awk Actions and Control Flow

Awk provides a set of actions to manipulate text data:

Action	Description
`{ print $0; }`	Prints the entire current record (line).
`{ exit; }`	Terminates the Awk program immediately.
`{ next; }`	Skips the rest of the actions for the current line and proceeds to the next line.
`{ a=$1; b="X" }`	Assigns the value of the first field to variable 'a' and the string "X" to variable 'b'.
`{ c[$1] = $2 }`	Assigns the value of the second field to an element in the associative array 'c', using the first field as the key.
`{if (condition) { action } else if (condition) { action } else { action }}`	Implements conditional logic with if-else if-else statements.
`{ for (i=1; i < x; i++) { action } }`	Runs the action with i counting from 1 up to x-1; use i <= x to include x itself.
`{ for (item in c) { action } }`	Iterates over the keys of an associative array; the value is c[item]. The iteration order is unspecified.
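The control-flow actions in the table can be combined as in the following sketch. The score thresholds, the input values, and the variable names are hypothetical and only serve the illustration.

```shell
# Hypothetical input: one numeric score per line.
scores='91
78
55
83'

# if / else if / else classifies each record; for-in walks the tally keys.
grades=$(printf '%s\n' "$scores" |
  awk '{
         if ($1 >= 90)      grade = "A"
         else if ($1 >= 70) grade = "B"
         else               grade = "C"
         tally[grade]++
       }
       END { for (g in tally) print g, tally[g] }' |
  sort)

# A counted for loop over the fields of one record, printed in reverse.
reversed=$(echo 'one two three' |
  awk '{ for (i = NF; i >= 1; i--) printf "%s%s", $i, (i > 1 ? " " : "\n") }')

printf '%s\n' "$grades" "$reversed"
```

Sorting the for-in output is a deliberate choice here, since awk does not promise any particular key order.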

Awk Special Variables for Text Processing

Awk utilizes several special variables to manage input and output processing:

Variable	Description
`FS`	Input Field Separator. This can be modified to change how fields are delimited (e.g., comma, colon).
`RS`	Input Record Separator. Defaults to a newline character. Can be modified to process data with different record structures.
`OFS`	Output Field Separator. Defaults to a single space. Inserted between the comma-separated arguments of print.
`ORS`	Output Record Separator. Defaults to a newline character. Controls the separator between output records.
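`NF`	Number of fields in the current record. Awk sets it for each record; it can also be assigned, which changes the field count of the current record.
`NR`	Number of records (lines, by default) read so far. Awk increments it for each record; it can be assigned, although that is rarely useful.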

Note: The -F option can be used to directly set the input field separator on the command line, for example: awk -F":" '{ print $1 }' file.txt.
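A small sketch of these variables in use follows. The input mimics the layout of /etc/passwd but is invented for this example, and the " | " output separator is an arbitrary choice.

```shell
# Hypothetical colon-separated input in the style of /etc/passwd.
users='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# FS splits input on ":", OFS joins the printed fields with " | ".
# NR is the record number, NF the field count, $NF the last field.
report=$(printf '%s\n' "$users" |
  awk 'BEGIN { FS = ":"; OFS = " | " } { print NR, NF, $1, $NF }')

# RS set to ";" makes each semicolon-delimited chunk its own record.
records=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')

printf '%s\n' "$report" "$records"
# 1 | 7 | root | /bin/bash
# 2 | 7 | daemon | /usr/sbin/nologin
# 1: a
# 2: b
# 3: c
```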

Practical Awk Examples

Split a comma-separated file and print the second field:
awk -F"," '{print $2}' file.txt
Print the third field of a CSV if the second field exists and is not empty:
awk -F"," '{if ($2 != "") print $3}' file.txt
Print the last field in each line of a comma-separated file:
awk -F"," '{ print $NF }' file.txt
Print the line immediately following a line matching a specific pattern:
awk '/pattern/ { i=1; next; } {if(i) {i--; print;}}' file.txt
Print a line and the two lines following it after matching a regular expression:
awk '/regexp/ {i=3;} { if(i) {i--; print;}}' file.txt
Print lines from a file starting at a line matching "start" until a line matching "stop":
awk '/start/,/stop/' file.txt
Count the total number of lines in a file (equivalent to wc -l):
awk 'END{print NR}' file.txt
Print lines matching a pattern (equivalent to grep):
awk '/pattern/' file.txt
Print lines that do not match a pattern (equivalent to grep -v):
awk '!/pattern/' file.txt
Remove duplicate consecutive lines (equivalent to uniq); appending "" to $0 forces a string comparison rather than a numeric one:
awk 'NR == 1 || $0 != a {print}; {a = $0 ""}' file.txt
Print the first 10 lines of a file (equivalent to head):
awk 'NR < 11' file.txt
Print the last 10 lines of a file (equivalent to tail), or every line if the file has fewer than 10:
awk '{vect[NR]=$0;} END{for(i=(NR>10?NR-9:1);i<=NR;i++) {print vect[i];}}' file.txt
Calculate the total number of bytes used by files listed in ls -l output:
ls -l | awk '{ x += $5 } END { print "Total bytes: " x }'
Read a CSV file and print the first and third fields, separated by a semicolon:
awk 'BEGIN{FS=",";OFS=";"}{print $1, $3}' file.txt
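To see how the CSV one-liners behave on concrete data, the sketch below runs two of them against a small made-up table read from standard input instead of file.txt. The column names and values are hypothetical, and the added NR > 1 condition (to skip the header line) is an assumption about the data, not part of the commands above.

```shell
# Hypothetical CSV with a header line; the second record has an empty name.
csv='id,name,city
1,Ada,London
2,,Paris
3,Linus,Helsinki'

# Third field only when the second field is non-empty, skipping the header.
cities=$(printf '%s\n' "$csv" |
  awk -F',' 'NR > 1 && $2 != "" { print $3 }')

# First and third fields rejoined with a semicolon.
pairs=$(printf '%s\n' "$csv" |
  awk 'BEGIN { FS = ","; OFS = ";" } NR > 1 { print $1, $3 }')

printf '%s\n' "$cities" "$pairs"
# London
# Helsinki
# 1;London
# 2;Paris
# 3;Helsinki
```

Splitting on a bare comma like this does not handle quoted fields that themselves contain commas, so it only suits simple CSV data.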

For more advanced text processing and data manipulation on the command line, Awk is an indispensable tool for developers and system administrators.