The place I most frequently see
awk used is in
pipelines --
one-liners, rather than true
scripts. One great way to use
awk is to split up
variable-length fields
delimited by some
character, or by
whitespace (the default). For example, the following:
who | awk '{print $1}'
prints a list of users currently
logged on the
system. In a longer pipeline, it can be more useful:
who | awk '{print $1}' | sort | uniq
This sorts the list of users alphabetically, then removes the
duplicates. (The
sort is a
prerequisite for the
uniq.)
The single-quotes around the awk statement are necessary to escape it from the shell. The curly braces with nothing before them indicate an awk statement which is to be executed for every line of the input file (stdin, in this case); the print $1 prints the first field on each line.
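Since who's output varies from system to system, here is the same pipeline fed with made-up login records (the usernames and terminals below are invented), so you can see exactly what each stage does:

```shell
# Stand-in for who: two made-up login records for alice, one for bob.
# awk keeps the first whitespace-delimited field, sort groups the
# duplicates together, and uniq drops the adjacent repeats.
printf 'alice  tty1\nbob    tty2\nalice  pts/0\n' |
    awk '{print $1}' |
    sort |
    uniq
# prints:
#   alice
#   bob
```

Note that uniq only removes adjacent duplicates, which is exactly why the sort has to come first.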
Fields are not always delimited by whitespace, however; the password file, for example, uses colons. Not to worry; awk -F changes the delimiter.
cat /etc/passwd | awk -F: '{print $1}'
This will print a list of all
users who
exist on the system; it simply prints the first field of each line of the password file, where fields are delimited by colons,
not whitespace.
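You can watch -F do its work on a single passwd-style line without touching the real file (the entry below is made up, not from an actual system):

```shell
# One made-up line in /etc/passwd format: seven colon-separated fields.
# With -F: the first field is the username, colons and all.
printf 'games:x:12:100:games:/usr/games:/usr/sbin/nologin\n' |
    awk -F: '{print $1}'
# prints: games
```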
Awk is very convenient for all the little programs of the form
something | awk '{something else}'
which perform some
operation unconditionally on every line of the first program's output (or a
file). This power is increased a hundredfold by another
simple addition:
regular expressions. I will not get into the gory details of constructing one here, but it is extremely
powerful to write a program of the form
something | awk '/expression/ {something 1}; {something 2}'
which will do "something 1" to matching lines and "something 2" to all lines. For example, to
comment out any line in a
perl program containing the word 'excrement', the following awk one-liner suffices:
cat program.old | awk '/excrement/ {$0 = "#" $0}; {print $0}' | cat > program.new
OR, more succinctly,
awk '/excrement/ {$0 = "#" $0}; {print $0}' < program.old > program.new
(NB: $0 represents the whole line.)
This has the effect of going,
line-by-line, through program.old and printing out each line, but for matching lines, first
prepending a "#". It looks funny, but try it --
it works.
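If you don't have a program.old handy, you can try it on a few made-up lines piped straight in:

```shell
# Three made-up lines of a perl program; only the middle one matches,
# so only it gets "#" prepended (via $0) before being printed.
printf 'print "ok";\nprint "excrement";\nexit;\n' |
    awk '/excrement/ {$0 = "#" $0}; {print $0}'
# prints:
#   print "ok";
#   #print "excrement";
#   exit;
```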
Getting into the realm of Programs That Shouldn't Really Be One-Liners, we find many uses of awk in pipelines which are occasionally useful, but more often just fun to write. Most of these involve BEGIN and END expressions. Without wasting too much more of your precious time, a BEGIN expression is written 'BEGIN {something}' and an END expression is written 'END {something}'. Note the similarity to regular expression lines -- BEGIN matches before the program starts, and END matches after it is done. This allows things like:
cat file | awk 'BEGIN {lines = 0}; {lines++}; END {print lines}'
which is a fancy line-counter, not useful as such, but
unique in that it is the first program we have seen
thus far which keeps internal
state in the form of the lines
variable. The BEGIN is not strictly
necessary here -- awk variables start out at zero -- and that is often the case; END,
on the other hand, is very useful for summarizing results, such as line counts, word counts, and the like. Now go write the hardest awk one-liners you can think of! It's a great mental exercise, and if you like a
challenge, you'll enjoy it. If you are really interested, read the awk manpage for more ways to match lines (before the {}) and more ways to manipulate them (inside the {}). I have only scratched the surface.
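In the same spirit as the line-counter, an END block can summarize anything you accumulate along the way. Here is a column-summer over some made-up numbers; since awk variables start at zero, no BEGIN is needed:

```shell
# Accumulate the first field of every line; print the total once at END.
# (The input numbers are invented for the example.)
printf '3\n5\n7\n' | awk '{sum += $1}; END {print sum}'
# prints: 15
```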