Awk Command Line Utility
Understanding the Awk Command Line Utility
Awk is a command-line utility for searching, processing, and transforming text. It reads input line by line, treating each line as a record and splitting it into fields. Its fundamental syntax is the `pattern { action }` structure: the pattern is tested against each input line, and if it matches, the associated action is executed.
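As a minimal sketch of the `pattern { action }` structure, the snippet below pipes three made-up lines into Awk. The sample text and the function name `status_codes` are illustrative assumptions, not taken from a real log.

```shell
# status_codes is a hypothetical helper; it reads text on stdin.
status_codes() {
  # Pattern: lines starting with "HTTP". Action: print the second field.
  awk '/^HTTP/ { print $2 }'
}

# Made-up sample input: only the two lines starting with "HTTP" match,
# so this prints 200 and 404 on separate lines.
printf '%s\n' 'HTTP/1.1 200 OK' 'Content-Type: text/html' 'HTTP/1.1 404 Not Found' | status_codes
```

The middle line does not match the pattern, so no action runs for it and nothing is printed.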
Core Awk Concepts
- Pattern Matching: Patterns can be regular expressions. When a pattern matches an input line, the associated action is performed. If no pattern is provided, the action is applied to every line. For example, `/^HTTP/ {print}` prints every line that starts with "HTTP".
- BEGIN and END Patterns: Awk supports two special patterns:
  - `BEGIN`: Specifies actions to be performed before any input line is read.
  - `END`: Specifies actions to be performed after all input lines have been read.
  - Example: `awk 'BEGIN{print "start"} {print} END{print "end"}' file.txt` will first print "start", then all lines of `file.txt`, and finally "end".
- Fields and Records: Awk interprets each line as a record. By default, one or more consecutive spaces or tabs act as delimiters between fields. Fields are accessed using `$1`, `$2`, etc., representing the first field, the second field, and so on. `$0` represents the entire input line.
- Data Types: Awk supports two primary data types: strings and numbers. It automatically converts values between them based on the context.
- Associative Arrays: Awk supports associative arrays, allowing you to store data using keys, like `var[key] = value`.
- Arithmetic Operations: Basic arithmetic operations (+, -, *, /, %) are supported on numbers, along with the increment (++) and decrement (--) operators.
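The concepts above can be combined in a few lines. The following is a sketch under stated assumptions: the helper name `totals_by_key` and the fruit data are invented for illustration, and the output is piped through `sort` because the order in which Awk visits array keys is unspecified.

```shell
# totals_by_key is a hypothetical helper: sums column 2 grouped by column 1.
totals_by_key() {
  awk '
    BEGIN { OFS = "=" }            # runs once, before any input is read
    { sum[$1] += $2 }              # associative array keyed by field 1;
                                   # the text in $2 is converted to a number
    END {                          # runs once, after the last input line
      for (k in sum) print k, sum[k]
    }
  ' | sort                         # for-in order is unspecified, so sort it
}

# Made-up sample input with a repeated key.
printf '%s\n' 'apple 3' 'pear 5' 'apple 4' | totals_by_key
```

With this input the two `apple` records are added together, giving `apple=7` and `pear=5`.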
Awk Actions and Control Flow
Awk provides a set of actions to manipulate text data:
| Action | Description |
|---|---|
| `{ print $0; }` | Prints the entire current record (line). |
| `{ exit; }` | Terminates the Awk program immediately. |
| `{ next; }` | Skips the rest of the actions for the current line and proceeds to the next line. |
| `{ a=$1; b="X" }` | Assigns the value of the first field to variable 'a' and the string "X" to variable 'b'. |
| `{ c[$1] = $2 }` | Assigns the value of the second field to an element in the associative array 'c', using the first field as the key. |
| `{ if (condition) { action } else if (condition) { action } else { action } }` | Implements conditional logic with if-else if-else statements. |
| `{ for (i=1; i < x; i++) { action } }` | Executes the action once for each value of i from 1 to x-1. |
| `{ for (item in c) { action } }` | Iterates over the keys of an associative array. |
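To show how several of these actions interact, here is a small sketch that uses `next`, `exit`, and an if-else chain together. The helper name `classify`, the `stop` marker, and the sample input are assumptions made up for this example.

```shell
# classify is a hypothetical helper that labels numbers read from stdin.
classify() {
  awk '
    /^#/ { next }                  # skip comment lines: no later rule runs
    $1 == "stop" { exit }          # stop reading any further input
    {
      if ($1 < 0)       label = "negative"
      else if ($1 == 0) label = "zero"
      else              label = "positive"
      print $1, label
    }
  '
}

# Made-up sample input; the 9 after "stop" is never processed.
printf '%s\n' '# header' '-2' '0' '7' 'stop' '9' | classify
```

Rules are evaluated in order for every line, which is why placing the `next` and `exit` rules first keeps comment lines and everything after `stop` out of the final block.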
Awk Special Variables for Text Processing
Awk utilizes several special variables to manage input and output processing:
| Variable | Description |
|---|---|
| `FS` | Input Field Separator. By default, fields are split on runs of spaces and tabs. This can be modified to change how fields are delimited (e.g., comma, colon). |
| `RS` | Input Record Separator. Defaults to a newline character. Can be modified to process data with different record structures. |
| `OFS` | Output Field Separator. Defaults to a single space. Determines the separator used when `print` is given multiple comma-separated arguments. |
| `ORS` | Output Record Separator. Defaults to a newline character. Controls the separator between output records. |
| `NF` | Number of fields in the current record (line). Awk sets it automatically for each record; it can be assigned to, but scripts rarely need to. |
| `NR` | Number of records (lines, by default) read so far. Awk updates it automatically; it can be assigned to, but scripts rarely need to. |
Note: The -F option can be used to directly set the input field separator on the command line, for example: awk -F":" '{ print $1 }' file.txt.
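The following sketch shows `FS`, `OFS`, `NR`, and `NF` working together. The helper name `summarize_fields` and the two passwd-style sample records are invented for illustration.

```shell
# summarize_fields is a hypothetical helper; input is colon-separated text.
summarize_fields() {
  awk '
    BEGIN { FS = ":"; OFS = "," }  # split input on colons, join output with commas
    { print NR, NF, $1, $NF }      # record number, field count, first and last field
  '
}

# Made-up sample records in /etc/passwd style (7 fields each).
printf '%s\n' 'root:x:0:0:root:/root:/bin/sh' 'guest:x:1000:1000::/home/guest:/bin/false' | summarize_fields
```

For this input the output is `1,7,root,/bin/sh` followed by `2,7,guest,/bin/false`. The empty fifth field in the second record still counts toward `NF`, because a single-character `FS` such as `:` does not merge adjacent separators.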
Practical Awk Examples
- Split a comma-separated file and print the second field: `awk -F"," '{print $2}' file.txt`
- Print the third field of a CSV if the second field exists and is not empty (comparing against `""` so that a value of `0` still counts as non-empty): `awk -F"," '{if ($2 != "") print $3}' file.txt`
- Print the last field in each line of a comma-separated file: `awk -F"," '{ print $NF }' file.txt`
- Print the line immediately following a line matching a specific pattern: `awk '/pattern/ { i=1; next; } {if(i) {i--; print;}}' file.txt`
- Print each line matching a regular expression and the two lines following it: `awk '/regexp/ {i=3;} { if(i) {i--; print;}}' file.txt`
- Print lines from a file starting at a line matching "start" until a line matching "stop": `awk '/start/,/stop/' file.txt`
- Count the total number of lines in a file (equivalent to `wc -l`): `awk 'END{print NR}' file.txt`
- Print lines matching a pattern (equivalent to `grep`): `awk '/pattern/' file.txt`
- Print lines that do not match a pattern (equivalent to `grep -v`): `awk '!/pattern/' file.txt`
- Remove duplicate consecutive lines (equivalent to `uniq`), comparing each line with the previous one rather than using it as a regular expression: `awk 'NR == 1 || $0 != a {print} {a=$0}' file.txt`
- Print the first 10 lines of a file (equivalent to `head`): `awk 'NR < 11' file.txt`
- Print the last 10 lines of a file (equivalent to `tail`), starting at line 1 when the file has fewer than 10 lines: `awk '{vect[NR]=$0;} END{for(i=(NR>10?NR-9:1);i<=NR;i++) {print vect[i];}}' file.txt`
- Calculate the total number of bytes used by files listed in `ls -l` output: `ls -l | awk '{ x += $5 } END { print "Total bytes: " x }'`
- Read a CSV file and print the first and third fields, separated by a semicolon: `awk 'BEGIN{FS=",";OFS=";"}{print $1, $3}' file.txt`
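The countdown-counter idea used in the "lines following a match" examples can be generalized. The sketch below is one possible way to do it, assuming a hypothetical helper `after_match` that takes the regular expression and the number of trailing lines as arguments and passes them to Awk with `-v`.

```shell
# after_match is a hypothetical helper: prints each line matching the regex
# in $1 plus the next $2 lines. Input is read from stdin.
after_match() {
  awk -v re="$1" -v n="$2" '
    $0 ~ re  { left = n + 1 }      # (re)start the countdown on every match
    left > 0 { left--; print }     # print while the countdown is running
  '
}

# Made-up sample input: prints ERROR, b, and c, but not a or d.
printf '%s\n' a ERROR b c d | after_match ERROR 2
```

Because the counter is reset on every match, overlapping matches extend the printed window instead of being dropped.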
For more advanced text processing and data manipulation on the command line, Awk is an indispensable tool for developers and system administrators.