Awk Command Line Utility - Text Processing & Transformation

Learn Awk, a powerful command-line utility for finding, processing, and transforming text files. Explore patterns, actions, special variables, and practical examples.

Understanding the Awk Command Line Utility

Awk is a powerful command-line utility designed for finding, processing, and transforming text files. It reads input one line at a time, treating each line as a record and splitting it into fields. A program is a series of `pattern { action }` rules: each pattern is tested against every input line, and when it matches, the associated action is executed.
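As a minimal sketch of a single `pattern { action }` rule, the following pipes three made-up lines into awk. The sample text and the shell variable name `out` are illustrative assumptions.

```shell
# Hypothetical sample input: three lines, two of which contain "error".
out=$(printf 'error disk full\ninfo all good\nerror net down\n' |
  awk '/error/ { print $3 }')   # pattern: /error/   action: print the third field
echo "$out"
```

Only the two lines matching /error/ reach the action, so the third field of each of those lines is printed.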

Core Awk Concepts

  • Pattern Matching: A pattern is usually a regular expression, but it can be any expression, such as a comparison like NR < 11. When a pattern matches an input line, the associated action is performed. If no pattern is given, the action applies to every line; if no action is given, the matching line is printed. For example, /^HTTP/ {print} prints every line that starts with "HTTP".
  • BEGIN and END Patterns: Awk supports two special patterns:
    • BEGIN: Specifies actions to be performed before any input line is read.
    • END: Specifies actions to be performed after all input lines have been read.
    • Example: awk 'BEGIN{print "start"} {print} END{print "end"}' file.txt will first print "start", then all lines of file.txt, and finally "end".
  • Fields and Records: Awk interprets each line as a record. By default, one or more consecutive spaces or tabs act as the delimiter between fields. Fields are accessed using $1, $2, and so on, for the first field, the second field, etc. $0 represents the entire input line.
  • Data Types: Awk supports two primary data types: strings and numbers. It automatically converts variables based on the context.
  • Associative Arrays: Awk supports associative arrays, allowing you to store data using keys, like var[key] = value.
  • Arithmetic Operations: Basic arithmetic operations (+, -, *, /, %) are supported on numbers, along with autoincrement (++) and decrement (--) operators.
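The concepts above (fields, automatic string-to-number conversion, associative arrays, and arithmetic) can be combined in one short sketch. The "name amount" input and the names `total`, `n`, and `out` are assumptions chosen for illustration.

```shell
# Hypothetical sample input: "name amount" pairs, with one name repeated.
out=$(printf 'alice 10\nbob 5\nalice 2.5\n' |
  awk '{ total[$1] += $2    # $2 arrives as text and is converted to a number by +=
         n++ }              # count the records seen
       END { print total["alice"], total["bob"], n }')
echo "$out"
```

The array `total` is keyed by the first field, so both "alice" lines accumulate into the same element.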

Awk Actions and Control Flow

Awk provides a set of actions to manipulate text data:

Action Description
{ print $0; } Prints the entire current record (line).
{ exit; } Stops reading input and ends the Awk program. If an END action exists, it still runs before Awk exits.
{ next; } Skips the rest of the actions for the current line and proceeds to the next line.
{ a=$1; b="X" } Assigns the value of the first field to variable 'a' and the string "X" to variable 'b'.
{ c[$1] = $2 } Assigns the value of the second field to an element in the associative array 'c', using the first field as the key.
{if (condition) { action } else if (condition) { action } else { action }} Implements conditional logic with if-else if-else statements.
{ for (i=1; i < x; i++) { action } } Runs the action once for each value of i from 1 up to x-1.
{ for (item in c) { action } } Iterates over the elements of an associative array.
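A short sketch that exercises several of these actions together: next, exit, if/else, array assignment, and for-in. The input lines, the threshold of 10, and the names `size`, `count`, `k`, and `total` are assumptions made for this example.

```shell
# Hypothetical input: a comment line, two numbers, a "stop" marker, and a trailing number.
out=$(printf '# header\n5\n12\nstop\n99\n' |
  awk '/^#/   { next }      # skip comment lines
       /stop/ { exit }      # stop reading input here
       { if ($1 > 10) { size = "big" } else { size = "small" }
         count[size]++ }
       END { for (k in count) total += count[k]
             print count["small"], count["big"], total }')
echo "$out"
```

The line "99" is never processed because exit fires on the "stop" line first. The for-in loop only sums the counts, so the unspecified iteration order of awk arrays does not affect the result.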

Awk Special Variables for Text Processing

Awk utilizes several special variables to manage input and output processing:

Variable Description
FS Input Field Separator. This can be modified to change how fields are delimited (e.g., comma, colon).
RS Input Record Separator. Defaults to a newline character. Can be modified to process data with different record structures.
OFS Output Field Separator. Determines the separator used when printing multiple fields.
ORS Output Record Separator. Defaults to a newline character. Controls the separator between output records.
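NF Number of Fields in the current line (record). It is set automatically for each record. It can also be assigned, and in most current implementations doing so rebuilds $0 with the new number of fields.
NR Number of records (lines, by default) read so far. It is updated automatically and can also be assigned, although this is rarely useful.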

Note: The -F option can be used to directly set the input field separator on the command line, for example: awk -F":" '{ print $1 }' file.txt.
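The following sketch shows FS, OFS, NR, and NF working together, once with the separators set in a BEGIN block and once with the -F and -v command-line options. The colon-separated sample lines and the shell variable names `input`, `a`, and `b` are assumptions for illustration.

```shell
# Hypothetical colon-separated input with three fields per line.
input='root:x:0
daemon:x:1'

# Separators set inside the program.
a=$(printf '%s\n' "$input" |
  awk 'BEGIN { FS = ":"; OFS = " | " } { print NR, NF, $1 }')

# The same separators set on the command line.
b=$(printf '%s\n' "$input" |
  awk -F: -v OFS=' | ' '{ print NR, NF, $1 }')

echo "$a"
```

Both forms give the same output: each line shows the record number, the field count, and the first field, joined by the output field separator.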

Practical Awk Examples

  • Split a comma-separated file and print the second field:
    awk -F"," '{print $2}' file.txt
  • Print the third field of a CSV if the second field exists and is not empty:
    awk -F"," '{if ($2 != "") print $3}' file.txt
  • Print the last field in each line of a comma-separated file:
    awk -F"," '{ print $NF }' file.txt
  • Print the line immediately following a line matching a specific pattern:
    awk '/pattern/ { i=1; next; } {if(i) {i--; print;}}' file.txt
  • Print a line and the two lines following it after matching a regular expression:
    awk '/regexp/ {i=3;} { if(i) {i--; print;}}' file.txt
  • Print lines from a file starting at a line matching "start" until a line matching "stop":
    awk '/start/,/stop/' file.txt
  • Count the total number of lines in a file (equivalent to wc -l):
    awk 'END{print NR}' file.txt
  • Print lines matching a pattern (equivalent to grep):
    awk '/pattern/' file.txt
  • Print lines that do not match a pattern (equivalent to grep -v):
    awk '!/pattern/' file.txt
  • Remove duplicate consecutive lines (equivalent to uniq):
    awk 'NR == 1 || ($0 "") != prev { print } { prev = $0 }' file.txt
  • Print the first 10 lines of a file (equivalent to head):
    awk 'NR < 11' file.txt
  • Print the last 10 lines of a file (equivalent to tail):
    awk '{vect[NR]=$0;} END{start=(NR>10)?NR-9:1; for(i=start;i<=NR;i++) {print vect[i];}}' file.txt
  • Calculate the total number of bytes used by files listed in ls -l output:
    ls -l | awk '{ x += $5 } END { print "Total bytes: " x }'
  • Read a CSV file and print the first and third fields, separated by a semicolon:
    awk 'BEGIN{FS=",";OFS=";"}{print $1, $3}' file.txt
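To see what a few of these one-liners produce, here is a sketch that runs three of them against a small made-up data set. The sample data and the shell variable names `data`, `last`, `range`, and `count` are assumptions. The input is piped in rather than read from file.txt, so the example is self-contained.

```shell
# Hypothetical sample data; any small comma-separated text would do.
data='a,b,c
start,1,2
x,3,4
stop,5,6
z,7,8'

last=$(printf '%s\n' "$data" | awk -F"," '{ print $NF }')   # last field of each line
range=$(printf '%s\n' "$data" | awk '/start/,/stop/')       # "start" line through "stop" line
count=$(printf '%s\n' "$data" | awk 'END{print NR}')        # number of lines

printf '%s\n---\n%s\n---\n%s\n' "$last" "$range" "$count"
```

The range pattern prints both boundary lines as well as everything between them.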

For more advanced text processing and data manipulation on the command line, Awk is an indispensable tool for developers and system administrators.