Writing shell scripts
is an important way for an administrator to automate various tasks. It is just as
important to write code that does not hurt performance. Linux offers many other
programming languages, such as Perl, Python and PHP, which provide more benefits
when processing data. In this article we will look at two important Linux tools,
sed and awk, and how they affect performance when used in scripts.
Sed – Stream Editor
Sed is a stream editor, that is, a non-interactive text editor. It is called a
stream editor because it works on streams of characters on a per-line basis. It
has a primitive programming language structure, with go-to style branching and
simple conditionals. It also has pattern matching features. There are only two
variables available: the pattern space and the hold space. All commands in a sed
script are applied, in order, to each input line.
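As a minimal sketch of "commands applied in order to each line", the snippet below runs two commands on some made-up input (the sample words and the variable name result are only for illustration). The address /beta/ acts as a simple conditional: d runs only on lines that match it.

```shell
# Hypothetical sample input; every -e command is tried, in order, on each line.
# /beta/d   deletes lines matching "beta" (a simple conditional)
# s/a$/A/   upper-cases a trailing "a" on the lines that remain
result=$(printf 'alpha\nbeta\ngamma\n' | sed -e '/beta/d' -e 's/a$/A/')
printf '%s\n' "$result"
# prints:
# alphA
# gammA
```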
One important element that we need to understand with sed is the difference
between the pattern space and the hold space.
When sed reads a file line by line, the line that is currently being read is
placed in the pattern space. This is much like a temporary buffer where the
current line is stored, whereas the hold space is long-term storage: we can
store something there and use it later, when sed is processing another line.
One important thing here is that we cannot process the hold space directly;
rather, we need to copy or append its contents to the current pattern space if
we want to process them.
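A small sketch of the hold space as long-term storage, assuming some made-up CSV-like input (the data and the variable name result are hypothetical). The first line is saved with h, and on the last line it is brought back with x and G so that it can be printed next to that line.

```shell
# Hypothetical input: a header line followed by data lines.
# 1h   on line 1, copy the pattern space (the header) into the hold space
# $    on the last line:
#   x  swap pattern and hold space (pattern = header, hold = last line)
#   G  append the hold space (the last line) to the pattern space
#   p  print the result; -n suppresses all other output
result=$(printf 'name,qty\napple,3\npear,5\n' | sed -n -e '1h' -e '${x;G;p;}')
printf '%s\n' "$result"
# prints:
# name,qty
# pear,5
```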
Consider an example that behaves much
like the tac command in Linux:
sed -n '1!G;h;$p'
In this case,
there are 3 commands: "1!G", "h" and "$p". In "1!G", the address 1 selects the
first line, but "!" negates it, so G is not applied to the first line, only
from the second line onwards. So let's see how this works:
- The first line is read and inserted
automatically into the pattern space.
- On the first line, the first command (G) is
not executed; h copies the first line into the hold space.
- Now the second line replaces whatever
was in the pattern space.
- On the second line, first we execute G,
appending the contents of the hold space to the pattern space,
separated by a newline. The pattern space now contains the second
line, a newline, and the first line.
- Then the h command copies the
concatenated contents of the pattern space into the hold space, which now
holds lines two and one in reversed order.
- We proceed to line number three and repeat
from the third step above.
- On the last line, $p prints the pattern space, which by then contains
all the lines in reverse order. The -n option suppresses the automatic
printing of every other line.
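To check the walk-through above, here is a short run of the same one-liner on three made-up lines (the sample words and the variable name reversed are just for illustration):

```shell
# Hypothetical three-line input, reversed by the sed one-liner from the article.
reversed=$(printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"
# prints:
# three
# two
# one
```

Note that the whole input ends up in the hold space, so this approach is mainly meant as an illustration of the two buffers.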
[Figure: Sed architecture]
Advantages
1) Sed is considered fast
2) Sed supports regular expressions
Drawbacks
1) Cannot easily remember text from one line to another (only through the hold space)
2) Not possible to go backward in a file
3) Math can be extremely hard
4) Syntax can be cumbersome sometimes
AWK
Awk is oriented
towards fields on a per-line basis. It has much better programming
constructs, including if/else, while, do/while and for loops.
Awk has support for
variables and arrays. Mathematical operations resemble those in C. It has printf and
functions.
GNU awk (gawk)
has numerous extensions, including true multidimensional arrays in the latest
version. There are other variations of awk, including mawk and nawk.
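The following sketch shows fields, an array, a for loop and printf working together, on made-up input (the names, numbers and the variable totals are hypothetical). It adds up the second field per name; the output is piped through sort because awk does not guarantee the order of a for-in loop.

```shell
# Hypothetical input: "name amount" per line.
# total[$1] += $2   accumulate the second field in an array keyed by the first
# END { ... }       after the last line, loop over the array and print with printf
totals=$(printf 'alice 10\nbob 20\nalice 5\n' |
  awk '{ total[$1] += $2 }
       END { for (name in total) printf "%s %d\n", name, total[name] }' |
  sort)
printf '%s\n' "$totals"
# prints:
# alice 15
# bob 20
```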
Advantages
1) awk can maintain state and can operate using multiple passes over the same data
2) awk has a better programming structure, with support for variables and arrays
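One common way to get two passes, sketched here with a temporary file of made-up numbers (the file contents and the variable names data and shares are assumptions), is to name the same file twice on the command line. While awk reads the first copy, NR equals FNR, so that condition can be used to tell the passes apart.

```shell
# Hypothetical sample data: one number per line, stored in a temporary file.
data=$(mktemp)
printf '1\n1\n2\n' > "$data"

# Pass 1 (NR == FNR is true only while reading the first file argument): sum the values.
# Pass 2 (the same file, named a second time): print each value and its share of the total.
shares=$(awk 'NR == FNR { sum += $1; next }
              { printf "%s %d%%\n", $1, 100 * $1 / sum }' "$data" "$data")
printf '%s\n' "$shares"
# prints:
# 1 25%
# 1 25%
# 2 50%
rm -f "$data"
```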
When to use?
1) Both programs use regular expressions for selecting and processing text.
2) I would tend to use sed where there are patterns in the text, and awk when
the text looks more like rows and columns or, as awk refers to them,
"records" and "fields".
3) One main difference is that an awk program can maintain state and can operate
using multiple passes over the same data. A sed invocation is essentially a
single pass, with only the hold space to carry state between lines, because
sed (Stream EDitor) is inherently stream-oriented.
4) One important difference between the utilities is how shell scripts pass
arguments to them: a shell variable can be expanded directly inside a
double-quoted sed expression, while for awk it is more cumbersome, since the
value usually has to be handed over explicitly (for example with the -v option).
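As a rough illustration of the last point, the sketch below filters the same made-up input with both tools, using a shell variable (the names pattern, from_sed and from_awk are hypothetical). With sed the variable is simply expanded inside double quotes; with awk it is passed in through -v and used as an awk variable.

```shell
# Hypothetical search term held in a shell variable.
pattern="error"

# sed: the shell expands $pattern inside the double-quoted expression.
from_sed=$(printf 'ok\nerror: disk\n' | sed -n "/$pattern/p")

# awk: the value is handed over with -v and matched as a dynamic regular expression.
from_awk=$(printf 'ok\nerror: disk\n' | awk -v pat="$pattern" '$0 ~ pat')

printf '%s\n%s\n' "$from_sed" "$from_awk"
# prints:
# error: disk
# error: disk
```

The sed form is shorter, but the variable becomes part of the sed program itself, so characters such as "/" in the value need care.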
More to come, happy learning.