Today we will work with the AWK command-line tool in Unix. This article gives you a basic understanding of how awk works and of some of its internal structure.
Awk is a simple and elegant pattern-scanning and processing language. It was created in the late 1970s, and its name is composed of the initial letters of its three original authors: Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan. It is commonly used as a command-line filter in pipelines to reformat the output of other commands.
Some other features of awk are:
Its ability to view a text file as a textual database made up of records and fields.
Its use of variables to manipulate the database.
Its use of arithmetic and string operators.
Its use of common programming constructs such as loops and conditionals.
Its ability to generate formatted reports.
awk takes two inputs: the data file and the awk program (the commands to run). The program can be kept in a file of its own, but that file is optional: more often the commands are passed directly on the command line as an argument.
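Here is a minimal sketch of the two ways to supply the program. The file names people.txt and names.awk are hypothetical and chosen only for this illustration. -f is awk's standard option for reading the program from a file.

```shell
# Hypothetical sample input: two records, two fields each
printf '%s\n' 'jagadesh 10' 'kiran 30' > people.txt

# 1) Program passed directly on the command line
awk '{ print $1 }' people.txt        # jagadesh, then kiran

# 2) The same program stored in a file and loaded with -f
printf '%s\n' '{ print $1 }' > names.awk
awk -f names.awk people.txt          # same output
```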
An awk command line takes the form
awk 'pattern { action }' filename
The pattern describes what awk is looking for in the data, and the action is a series of commands executed when a line matching the pattern is found. The curly braces surround the action. They are not always required (a pattern on its own needs none), but they group a series of instructions to run for a specific pattern.
Understanding Fields: a common use of awk is to process files by formatting them and then displaying the data you need. awk splits each input file into records. A record is simply a single line of input, and each record contains multiple fields. By default, fields are separated by spaces or tabs, and the field separator can be changed.
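The sketch below shows how fields are split. It pipes single lines into awk instead of reading a file. It assumes a POSIX-compatible awk, where -F (or an assignment to FS in a BEGIN block) sets the field separator.

```shell
# Default separator: any run of spaces and tabs splits the fields
printf 'kiran   30\t1997 k\n' | awk '{ print $2 }'              # 30

# -F sets a different separator, here a comma
echo 'kiran,30,1997,k' | awk -F, '{ print $3 }'                 # 1997

# Equivalent: assign FS in a BEGIN block
echo 'kiran,30,1997,k' | awk 'BEGIN { FS = "," } { print $4 }'  # k
```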
Let's look at a simple text file to illustrate awk. Create a file named student with the following data:
jagadesh 10 2010 h
kiran 30 1997 k
pavan 123 10000 n
pavan 345 2009 j
jagan 345 400 p
madan 345 2007 k
naren 1009 1200 l
gagan 234 100 m
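One convenient way to create this file from the shell is a here-document. This sketch assumes each record sits on one line with four fields (a name, two numbers and a letter), which is how the later examples treat the data.

```shell
# Write the sample data to a file named "student"
cat > student <<'EOF'
jagadesh 10 2010 h
kiran 30 1997 k
pavan 123 10000 n
pavan 345 2009 j
jagan 345 400 p
madan 345 2007 k
naren 1009 1200 l
gagan 234 100 m
EOF

# Quick check: NR still holds the record count inside END
awk 'END { print NR " records" }' student   # 8 records
```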
Now we will try a simple awk command:
awk '{print $1 " " $2}' student
This command prints the first and second fields of every record, separated by a space. We used awk with an action and a file name, both passed as arguments.
Try this:
awk '{print "First name is Mr. " $1 " " $2}' student
Working With Patterns: an awk program can contain a pattern and a procedure:
pattern { procedure }
Both are optional. If the pattern is missing, { procedure } is applied to every line. If { procedure } is missing, each matching line is printed.
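The three combinations can be compared side by side using a small inline sample: two records from the student file, held in a shell variable named data purely for this illustration.

```shell
data='jagadesh 10 2010 h
kiran 30 1997 k'

# Pattern only: matching lines are printed unchanged
printf '%s\n' "$data" | awk '/kiran/'               # kiran 30 1997 k

# Procedure only: applied to every line
printf '%s\n' "$data" | awk '{ print $2 }'          # 10, then 30

# Both: the procedure runs only on matching lines
printf '%s\n' "$data" | awk '/kiran/ { print $3 }'  # 1997
```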
A pattern can take the following forms:
/regular expression/
Relational expression
Pattern-matching expression
BEGIN and END
Regular Expression: let's search for a string in the file above. To use a regular expression, we write it between slashes, as /string to search/. Here is a simple command that searches for "jagadesh":
awk '/jagadesh/' student
This command searches for the string jagadesh and prints every record that contains it.
Relational expression: we can use a relational expression to retrieve results. Here is a simple one, which prints the records whose second field equals 10:
awk '$2==10' student
We can also print just the fields we are interested in:
awk '/kiran/ {print $2}' student
This prints the second field of each record that matches the pattern.
Multiple commands for the same set of data can be separated with a ; as in
awk '/10/ {print $1; print $2}' student
This prints the first field and then the second field, each on its own line, for every record in the student data file that matches the pattern.
We can also insert escape sequences such as newline '\n' and tab '\t' into the output to display data in a suitable layout.
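The commands below are a sketch that uses \t and \n inside awk string constants. The first lays out the first two fields separated by a tab. The second puts them on separate lines.

```shell
# Tab between the name and the second field
printf '%s\n' 'jagadesh 10 2010 h' 'kiran 30 1997 k' |
  awk '{ print $1 "\t" $2 }'

# Newline between them: each field on its own line
echo 'kiran 30 1997 k' | awk '{ print $1 "\n" $2 }'   # kiran, then 30
```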
awk can also search data with multiple patterns. Separate the alternatives with a '|' inside the pattern. Here '|' is the regular-expression alternation operator, not a shell pipe:
awk '/jagadesh|2010/' student
Here we are searching for records that contain jagadesh or 2010.
Now let's try a more advanced example, searching for k in the file:
awk '/k/' student
This returns every record with 'k' anywhere in it. From the data file above we get kiran and madan (madan only because its last field is k). But suppose I want only the records whose first field contains 'k'. This is where pattern-matching expressions come into play.
Pattern Matching Expressions: as said above, to match our pattern against a particular field, we use a pattern-matching expression:
awk '$1 ~ /k/' student
The "~" (tilde) operator makes sure that k is searched for in the first field only. This gives a single record: kiran.
Similarly, we can search for a 4 in the third field, which returns only jagan's record (400):
awk '$3 ~ /4/' student
The opposite of the tilde operator is the negation operator '!~'. It returns all the records except the ones matching the pattern:
awk '$1 !~ /k/' student
This displays all the records that don't have 'k' in their first field.
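The searches in this section can be checked against a copy of the student data held in a shell variable (the variable name data is only for this illustration). The last command counts the non-matching records instead of listing them.

```shell
data='jagadesh 10 2010 h
kiran 30 1997 k
pavan 123 10000 n
pavan 345 2009 j
jagan 345 400 p
madan 345 2007 k
naren 1009 1200 l
gagan 234 100 m'

printf '%s\n' "$data" | awk '/k/'          # kiran and madan (k anywhere)
printf '%s\n' "$data" | awk '$1 ~ /k/'     # kiran only
printf '%s\n' "$data" | awk '$3 ~ /4/'     # jagan only (400)

# Count the records whose first field does NOT contain k
printf '%s\n' "$data" | awk '$1 !~ /k/ { n++ } END { print n }'   # 7
```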
Before moving on to the remaining patterns, BEGIN and END, we will look at awk's built-in variables and the operators that awk supports.
Built-in Variables: awk provides some built-in variables that can be used while searching a data file. These are the built-in variables available in awk:
FS: field separator
NF: number of fields in the current record
NR: number of the current record (records read so far)
OFMT: output format for numbers (default "%.6g"); older awks also used it for number-to-string conversion, which POSIX awk handles with CONVFMT
OFS: output field separator
ORS: output record separator
RS: record separator
$0: entire input record
$n: nth field in the current record
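The following short sketch shows several of these variables at work on two records of the sample data. Setting OFS or ORS in a BEGIN block changes what print inserts between fields and after each record. $NF is the last field of the record.

```shell
data='jagadesh 10 2010 h
kiran 30 1997 k'

# NR = record number, NF = field count; OFS joins the printed items
printf '%s\n' "$data" | awk 'BEGIN { OFS = ":" } { print NR, NF, $1 }'
# 1:4:jagadesh
# 2:4:kiran

# ORS replaces the newline written after each record
printf '%s\n' "$data" | awk 'BEGIN { ORS = "; " } { print $1 }'   # jagadesh; kiran;

# $NF is the last field
printf '%s\n' "$data" | awk '{ print $NF }'                       # h, then k
```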
We can use awk's built-in variables to get better results. To select data by the number of fields, we can write
awk 'NF==4' student
which gets all the records that have 4 fields.
awk 'NF==4 && /jagadesh/' student
retrieves all the records that have 4 fields and contain 'jagadesh'.
Operators: the following operators are available in awk:
= += -= *= /= %= ^= **=    Assignment
||                         Logical OR (short-circuit)
&&                         Logical AND (short-circuit)
~ !~                       Match regular expression, and its negation
< <= > >= != ==            Relational operators
(blank)                    String concatenation
+ -                        Addition, subtraction
* / %                      Multiplication, division, and modulus (remainder)
+ - !                      Unary plus and minus, and logical negation
^ **                       Exponentiation
++ --                      Increment and decrement, either prefix or postfix
$                          Field reference
The ** and **= forms are not part of POSIX awk. They are extensions supported by some implementations, such as gawk.
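You can try the operators without any input file by putting them in a BEGIN block. This is a small illustrative sketch. The variable names a, b and s are arbitrary, and it uses the portable ^ rather than **.

```shell
awk 'BEGIN {
  a = 17; b = 5
  print a + b, a - b, a * b, a / b   # 22 12 85 3.4
  print a % b, 2 ^ 10                # 2 1024
  s = "jag" "adesh"                  # a blank between operands concatenates
  print s, (s ~ /^jag/)              # jagadesh 1
  a += 3                             # assignment operator: a is now 20
  a++                                # increment: a is now 21
  print a, !0, -b                    # 21 1 -5
}'
```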
BEGIN and END: BEGIN and END pattern rules can be applied to get better results. A BEGIN rule is executed before the first record is read, and an END rule is executed after all records have been read. Normally, awk executes each block of your script's code once for each input line. However, there are many programming situations where you may need to execute initialization code before awk begins processing the text from the input file. For such situations, awk allows you to define a BEGIN block. Because the BEGIN block is evaluated before awk starts processing the input file, it's an excellent place to initialize the FS (field separator) variable, print a heading, or initialize other global variables that you'll reference later in the program.
Awk also provides another
special block, called the END block. Awk executes this block after all lines in
the input file have been processed. Typically, the END block is used to perform
final calculations or print summaries that should appear at the end of the
output stream.
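The command below sketches a typical BEGIN/END pairing. BEGIN sets FS to a colon and prints a heading. The main rule adds each record's second field to a running total. END prints a summary. The colon-separated input is made up for this example.

```shell
printf '%s\n' 'jagadesh:10' 'kiran:30' 'madan:20' |
  awk 'BEGIN { FS = ":"; print "Name  Score" }   # runs once, before any input
       { total += $2; print $1, $2 }             # runs for every record
       END { print "Total:", total, "Average:", total / NR }'
# Name  Score
# jagadesh 10
# kiran 30
# madan 20
# Total: 60 Average: 20
```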
Let's see a simple example:
awk '
BEGIN { print "jagadesh" }
/jagadesh/
END { print "done" }' student
Here I am searching for the pattern /jagadesh/. Before reading the first record, awk prints "jagadesh". It then searches for the string and prints the matching records. After all the records have been read, it prints "done".
An
awk program may have multiple BEGIN and/or END rules. They are executed in the
order in which they appear: all the BEGIN rules at startup and all the END
rules at termination. BEGIN and END rules may be intermixed with other rules. Multiple BEGIN and END rules are useful for writing
library functions, because each library file can have its own BEGIN and/or END
rule to do its own initialization and/or cleanup. The order in which library
functions are named on the command line controls the order in which their BEGIN
and END rules are executed.
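You can observe the ordering rule directly. In this sketch the BEGIN and END rules are deliberately interleaved with a normal rule. Both BEGIN rules still run first, and the END rules run last, in the order they appear.

```shell
echo 'jagadesh 10 2010 h' |
  awk 'BEGIN { print "first BEGIN" }
       END   { print "first END" }
       { print "record:", $1 }
       BEGIN { print "second BEGIN" }
       END   { print "second END" }'
# first BEGIN
# second BEGIN
# record: jagadesh
# first END
# second END
```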
Some examples:
Print the first names in the data file:
awk '
BEGIN { print "First Names" }
{ print $1 }' student
Display the sum of the values in columns 2, 3 and 4. The fourth column holds letters, which count as 0 in arithmetic:
awk '
BEGIN { print "Sum" }
BEGIN { print "------" }
{ print $2+$3+$4 }' student
Count the records that contain jagadesh:
awk '/jagadesh/ {++x} END {print x}' student
Add up the values in the second column:
awk '{total += $2} END {print total}' student
There are many more examples to explore.
Empty Pattern: an empty pattern matches every record in the file.
awk '{ print $0 }' student
Variables: variables in awk are assigned with the "=" operator, as in
FS=","
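Besides assignments inside the program, you can set a variable from the shell with the standard -v option before awk reads any input. The threshold name min is arbitrary, and the data is a few records from the student file. The second command shows that a variable never assigned acts as 0 in arithmetic and as an empty string in string context.

```shell
data='kiran 30 1997 k
pavan 123 10000 n
madan 345 2007 k'

# -v assigns min before the program (and any BEGIN rule) starts
printf '%s\n' "$data" | awk -v min=100 '$2 > min { print $1 }'   # pavan, then madan

# Unassigned variables: 0 in arithmetic, "" as a string
awk 'BEGIN { print count + 1, "[" name "]" }'                   # 1 []
```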
Arrays: arrays in awk are associative arrays. Each element pairs an index with an associated value. For example, an array might contain these elements:
Element 3    Value 30
Element 1    Value "foo"
Element 0    Value 8
Element 2    Value ""
The pairs are shown in jumbled order because their order is irrelevant. One advantage of associative arrays is that elements can be added at any time.
An array element can be created with an assignment such as
arr[0]="jagadesh"
or in a loop:
for (i = 0; i < 5; i++)
    arr[i] = i
Iterating over arrays: awk has a handy mechanism for iterating over arrays, the for ... in construct:
for (x in myarray)
    print myarray[x]
The order in which the indices are visited is unspecified.
Elements in an array can be deleted with the delete statement:
delete myarray[1]
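The sketch below puts these pieces together. It counts how often each first name appears in part of the student data, deletes one element, and then iterates over the rest. Because for (x in array) visits indices in an unspecified order, the output is piped through sort. The array name count is arbitrary.

```shell
data='jagadesh 10 2010 h
kiran 30 1997 k
pavan 123 10000 n
pavan 345 2009 j
madan 345 2007 k'

printf '%s\n' "$data" |
  awk '{ count[$1]++ }                         # index = first name
       END {
         delete count["madan"]                 # remove one element
         for (name in count) print name, count[name]
       }' | sort
# jagadesh 1
# kiran 1
# pavan 2
```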
Escape Sequences: within string and regular expression constants, the following escape sequences may be used. Note: the \x escape sequence is a common extension.
Sequence   Meaning
\a         Alert (bell)
\b         Backspace
\f         Form feed
\n         Newline
\r         Carriage return
\t         Tab
\v         Vertical tab
\\         Literal backslash
\nnn       Octal value nnn
\xnn       Hexadecimal value nn
\"         Literal double quote (in strings)
\/         Literal slash (in regular expressions)
Functions: let's move on to the more advanced concept of using functions and writing our own. There are two types of functions available:
Built-in
User-defined
Built-in functions fall into three groups: I/O, string, and math. To call one of awk's built-in functions, write the name of the function followed by its arguments in parentheses. A simple example is
awk 'BEGIN { print sqrt(16) }'
Because the call is in a BEGIN rule, no input file is needed. Written as awk '{ print sqrt(16) }' student, it would print 4 once for every record. awk provides numeric functions such as sin(x), cos(x), and sqrt(x). It also provides string functions for tasks like getting the length of a string or splitting it. gawk also offers internationalization (i18n) functions and functions for working with time and dates.
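The sketch below calls several standard built-in functions (length, substr, toupper, split, index, sqrt, int) and defines one user-defined function. The function name max is chosen for this example and is not a built-in.

```shell
awk '
# User-defined function: returns the larger of two numbers
function max(a, b) {
  return a > b ? a : b
}
BEGIN {
  s = "jagadesh kiran"
  print length(s)                 # 14
  print substr(s, 1, 3)           # jag
  print toupper(substr(s, 10))    # KIRAN
  n = split(s, parts, " ")        # " " means split on runs of blanks
  print n, parts[2]               # 2 kiran
  print index(s, "kiran")         # 10
  print sqrt(16), int(7.9)        # 4 7
  print max(3, 12)                # 12
}'
```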
These are the basics of awk. More detailed information can be found at
http://www.gnu.org/manual/gawk/html_node/index.html