The place I most frequently see
awk used is in
pipelines --
one-liners, rather than true
scripts. One great way to use
awk is to split up
variable-length fields
delimited by some
character, or by
whitespace (the default). For example, the following:
who | awk '{print $1}'
prints a list of users currently
logged on the
system. In a longer pipeline, it can be more useful:
who | awk '{print $1}' | sort | uniq
This sorts the list of users alphabetically, then removes the
duplicates. (The
sort is a
prerequisite for the
uniq.)
The single-quotes around the awk statement are necessary to escape it from the shell. The curly braces with nothing before them indicate an awk statement which is to be executed for every line of the input file (stdin, in this case); the print $1 prints the first field on each line.
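Since who's output varies from system to system, here is the same pipeline fed with made-up login records (the usernames and terminals below are invented), so you can see exactly what each stage does:

```shell
# Stand-in for who: two made-up login records for alice, one for bob.
# awk keeps the first whitespace-delimited field, sort groups the
# duplicates together, and uniq drops the adjacent repeats.
printf 'alice  tty1\nbob    tty2\nalice  pts/0\n' |
    awk '{print $1}' |
    sort |
    uniq
# prints:
#   alice
#   bob
```

Note that uniq only removes adjacent duplicates, which is exactly why the sort has to come first.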
Fields are not always delimited by whitespace, however; the password file, for example, uses colons. Not to worry; awk -F changes the delimiter.
cat /etc/passwd | awk -F: '{print $1}'
This will print a list of all
users who
exist on the system; it simply prints the first field of each line of the password file, where fields are delimited by colons,
not whitespace.
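You can watch -F do its work on a single passwd-style line without touching the real file (the entry below is made up, not from an actual system):

```shell
# One made-up line in /etc/passwd format: seven colon-separated fields.
# With -F: the first field is the username, colons and all.
printf 'games:x:12:100:games:/usr/games:/usr/sbin/nologin\n' |
    awk -F: '{print $1}'
# prints: games
```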
Awk is very convenient for all the little programs of the form
something | awk '{something else}'
which perform some
operation unconditionally on every line of the first program's output (or a
file). This power is increased a hundredfold by another
simple addition:
regular expressions. I will not get into the gory details of constructing one here, but it is extremely
powerful to write a program of the form
something | awk '/expression/ {something 1}; {something 2}'
which will do "something 1" to matching lines and "something 2" to all lines. For example, to
comment out any line in a
perl program containing the word 'excrement', the following awk one-liner suffices:
cat program.old | awk '/excrement/ {$0 = "#" $0}; {print $0}' | cat > program.new
OR, more succinctly,
awk '/excrement/ {$0 = "#" $0}; {print $0}' < program.old > program.new
(NB: $0 represents the whole line.)
This has the effect of going,
line-by-line, through program.old and printing out each line, but for matching lines, first
prepending a "#". It looks funny, but try it --
it works.
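If you don't have a program.old handy, you can try it on a few made-up lines piped straight in:

```shell
# Three made-up lines of a perl program; only the middle one matches,
# so only it gets "#" prepended (via $0) before being printed.
printf 'print "ok";\nprint "excrement";\nexit;\n' |
    awk '/excrement/ {$0 = "#" $0}; {print $0}'
# prints:
#   print "ok";
#   #print "excrement";
#   exit;
```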
Getting into the realm of Programs That Shouldn't Really Be One-Liners, we find many uses of awk in pipelines which are occasionally useful, but more often just fun to write. Most of these involve BEGIN and END expressions. Without wasting too much more of your precious time, a BEGIN expression is written 'BEGIN {something}' and an END expression is written 'END {something}'. Note the similarity to regular expression lines -- BEGIN matches before the program starts, and END matches after it is done. This allows things like:
cat file | awk 'BEGIN {lines = 0}; {lines++}; END {print lines}'
which is a fancy line-counter, not useful as such, but
unique in that it is the first program we have seen
thus far which keeps internal
state in the form of the lines
variable. The BEGIN is not strictly
necessary here -- awk variables start out at zero -- and that is often the case; END,
on the other hand, is very useful for summarizing results, such as line counts, word counts, and the like. Now go write the hardest awk one-liners you can think of! It's a great mental exercise, and if you like a
challenge, you'll enjoy it. If you are really interested, read the awk manpage for more ways to match lines (before the {}) and more ways to manipulate them (inside the {}). I have only scratched the surface.
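In the same spirit as the line-counter, an END block can summarize anything you accumulate along the way. Here is a column-summer over some made-up numbers; since awk variables start at zero, no BEGIN is needed:

```shell
# Accumulate the first field of every line; print the total once at END.
# (The input numbers are invented for the example.)
printf '3\n5\n7\n' | awk '{sum += $1}; END {print sum}'
# prints: 15
```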