Tuesday, June 2, 2015

Sed vs Awk

Writing shell scripts is important for an administrator who wants to automate tasks. It is just as important to write code that does not hurt performance. Other programming languages available on Linux, such as Perl, Python and PHP, offer more when processing data. In this article we look at two important Linux tools, sed and awk, and how they affect performance when used in scripts.

Sed – Stream Editor
Sed is a non-interactive stream editor. It is called a stream editor because it works on streams of characters on a per-line basis. It has a primitive programming language structure with goto-style loops and simple conditionals, and it supports pattern matching. It has only two "variables": the pattern space and the hold space. All commands in a sed script are applied in order to each input line.
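
As a rough illustration of the goto-style loops mentioned above, here is a minimal sketch that joins lines ending in a backslash with the line that follows. The function name join_continued_lines is made up for this example; ":a" defines a label and "ta" jumps back to it whenever the preceding substitution succeeded.

```shell
# join_continued_lines is a hypothetical helper name, not a standard command.
join_continued_lines() {
  # :a       define label "a"
  # /\\$/N   if the line ends in a backslash, append the next input line
  # s/\\\n// delete the backslash and the newline that joins the two lines
  # ta       if the substitution succeeded, jump back to label "a"
  sed -e ':a' -e '/\\$/N' -e 's/\\\n//' -e 'ta'
}

printf 'a \\\nb\nc\n' | join_continued_lines
```

The conditional here is the "ta" branch: it only fires when the substitution actually changed something, which is about as far as control flow in sed goes.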

One important concept to understand with sed is the pair of buffers called the pattern space and the hold space.

When sed reads a file line by line, the line that is currently read is placed into the pattern space. This works like a temporary buffer holding the current line, whereas the hold space is longer-term storage: we can store something there and use it later, when sed is processing another line. One important point is that we cannot process the hold space directly; we need to copy or append its contents to the current pattern space if we want to work on them.
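
To make the two buffers more concrete, here is a small sketch that prints the line just before every line containing ERROR. The function name and the ERROR marker are assumptions chosen for this example. The hold space always keeps the previous line; "x" swaps the two buffers so the saved line can be printed.

```shell
# line_before_error is a hypothetical helper name; ERROR is an arbitrary marker.
line_before_error() {
  # On a matching line: x swaps in the previous line, p prints it, x swaps back.
  # On every line: h saves the current line in the hold space for the next cycle.
  sed -n -e '/ERROR/{x;p;x;}' -e 'h'
}

printf 'start\nERROR one\nok\nERROR two\n' | line_before_error
```

If the very first line matches, the hold space is still empty, so an empty line is printed.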

Consider an example that behaves much like the tac command in Linux:
sed -n '1!G;h;$p'

This script has three commands: “1!G”, “h” and “$p”. In “1!G” the address 1 selects the first line and “!” negates it, so G is applied to every line except the first, that is, from the second line onwards. Let's see how this works:
  1. The first line is read and placed automatically into the pattern space.
  2. On the first line, G is skipped; h copies the first line into the hold space.
  3. The second line is read and replaces whatever was in the pattern space.
  4. On the second line, G runs first, appending the contents of the hold space to the pattern space, separated by a newline. The pattern space now contains the second line, a newline, and the first line.
  5. Then h copies the concatenated contents of the pattern space into the hold space, which now holds lines two and one in reversed order.
  6. We proceed to line three and repeat from step 3.
  7. On the last line, $p prints the pattern space, which by then holds every line in reverse order. Because of -n, nothing else is printed.
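
The steps above can be tried out with a three-line input. This is only a sketch: the wrapper name reverse_lines is made up, and the sed script is the same one shown earlier.

```shell
# reverse_lines is a hypothetical wrapper around the script discussed above.
reverse_lines() {
  sed -n '1!G;h;$p'
}

# Feed three lines; they come back in reverse order, like tac.
printf 'one\ntwo\nthree\n' | reverse_lines
```

Note that the whole input ends up in the hold space, so this approach is better suited to small inputs.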

[Figure: Sed Architecture]

Advantages
1) Sed is generally considered fast
2) Supports regular expressions for matching and substitution

Drawbacks
1) Can remember text from one line to another only through the single hold space
2) Not possible to go backward in a file
3) Math can be extremely hard
4) Syntax can be cumbersome sometimes

AWK
Awk is oriented towards fields on a per-line basis. It has much better programming constructs, including if/else, while, do/while and for loops.
Awk supports variables and arrays. Mathematical operations resemble those in C. It also has printf and functions.
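
As a small sketch of these features working together, the following sums the second field per value of the first field. The function name and the two-column "user amount" input layout are assumptions made for this example.

```shell
# total_by_user is a hypothetical helper; input is assumed to be "user amount" per line.
total_by_user() {
  # total[] is an associative array keyed by the first field.
  # The END block runs once after the last line; sort makes the order predictable.
  awk '{ total[$1] += $2 }
       END { for (user in total) printf "%s %d\n", user, total[user] }' | sort
}

printf 'alice 10\nbob 5\nalice 20\n' | total_by_user
```

The output is piped through sort because awk does not guarantee the order in which "for (user in total)" visits the keys.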

GNU awk (gawk) has numerous extensions, including true multidimensional arrays (arrays of arrays) in recent versions. There are other implementations of awk as well, including mawk and nawk.

Advantages
1) Awk can maintain state and can operate using multiple passes over the same data
2) Awk has a better programming structure, with support for variables and arrays

When to use?
1) Both programs use regular expressions for selecting and processing text.

2) I would tend to use sed where there are patterns in the text, and awk when the text looks more like rows and columns or, as awk calls them, "records" and "fields".
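
A quick sketch of that split, with made-up helper names: sed rewrites a pattern wherever it appears in free text, while awk picks columns out of a delimited record such as a line in /etc/passwd format.

```shell
# Both helper names are hypothetical.

# Pattern-oriented: replace anything shaped like YYYY-MM-DD with the word DATE.
mask_dates() {
  sed 's/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}/DATE/g'
}

# Field-oriented: split on ":" and print the first and seventh fields.
user_and_shell() {
  awk -F: '{ print $1, $7 }'
}

echo 'backup ran on 2015-06-02' | mask_dates
echo 'root:x:0:0:root:/root:/bin/bash' | user_and_shell
```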

3) One main difference is that an awk program can maintain state and can operate using multiple passes over the same data. A sed invocation is essentially single-pass, with only the hold space to carry text between lines, because sed (Stream EDitor) is inherently stream-oriented.
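
One common way to get two passes in awk is to name the same file twice on the command line. The sketch below uses that idiom to print each number as a percentage of the column total; the function name is made up, and it assumes the file holds one number per line and is not empty.

```shell
# percent_of_total FILE is a hypothetical helper; FILE is read twice.
percent_of_total() {
  # NR == FNR is true only while the first copy of the file is being read.
  awk 'NR == FNR { sum += $1; next }
       { printf "%s %.1f%%\n", $1, 100 * $1 / sum }' "$1" "$1"
}

tmp=$(mktemp)
printf '1\n1\n2\n' > "$tmp"
percent_of_total "$tmp"
rm -f "$tmp"
```

The first pass only accumulates the total in the variable sum; the second pass uses it. Doing the same with sed alone would be impractical.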

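4) Another difference between the utilities is how shell scripts pass arguments to them: for sed the value is simply expanded into the script text, while for awk it is a little more cumbersome and is usually done with -v variable assignments.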
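
Here is a minimal sketch of both styles, using made-up helper names and a NAME placeholder chosen for this example. With sed the shell expands the argument inside a double-quoted script; with awk the value is handed over through -v.

```shell
# Both helper names and the NAME placeholder are hypothetical.

# sed: the shell substitutes $1 into the script before sed ever sees it.
replace_with_sed() {
  sed "s/NAME/$1/"
}

# awk: -v assigns the shell value to the awk variable "name".
replace_with_awk() {
  awk -v name="$1" '{ gsub(/NAME/, name); print }'
}

echo 'hello NAME' | replace_with_sed world
echo 'hello NAME' | replace_with_awk world
```

The sed form is shorter, but it misbehaves if the argument contains characters such as "/" or "&", since they become part of the script itself.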

More to come. Happy learning!
