Text processing is one of the most important operations when working in a Linux environment. It can seem more complex on Linux because you need to be familiar with the tools and commands available for it. Even though word-processing applications exist for Linux, you should know the command-line tools as well, since you may have to work on a Linux system with no GUI.
The following article describes the tools and commands Linux provides for text-processing operations.
Concat Text
cat is a Linux command for concatenating text files. It is also used for other operations, such as displaying a file or appending one file to another:
[root@vx111a ~]# cat text1 # Displays the contents of the text1 file
[root@vx111a ~]# cat text1 text2 # Concatenates the contents of text1 and text2
[root@vx111a ~]# cat text1 >> text2 # Appends the contents of text1 to text2
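The redirection operator matters in the last example: > truncates the target before writing, while >> appends to it. A minimal sketch of the difference, run in a throwaway directory with made-up file contents (these text1/text2 files are hypothetical, not the author's):

```shell
# Work in a temporary directory so no real files are touched
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'apple\n' > text1
printf 'pear\n'  > text2

cat text1 text2       # prints apple, then pear
cat text1 >> text2    # append: text2 now holds pear, then apple
cat text2
cat text1 > text2     # overwrite: text2 now holds only apple
cat text2
```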
Head and Tail
The head and tail commands let you preview the first or last lines of a text file, so you can check whether it is the file you need without opening it in a text editor.
The head command shows the first part of a text file (the first 10 lines by default). The syntax is:
head text1
The tail command shows the last part of a text file (the last 10 lines by default). The syntax is:
tail text1
Controlling the Lines
You can control the number of lines displayed by head or tail with the -n option:
head -n 30 text1 # displays the first 30 lines
tail -n 30 text1 # displays the last 30 lines
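Because both commands read standard input when no file is given, they can be chained to pull a range of lines out of the middle of a file. A sketch that uses seq to generate sample input (assuming GNU coreutils, where seq is available):

```shell
# seq 1 100 prints the numbers 1 to 100, one per line
seq 1 100 | head -n 3     # first 3 lines: 1 2 3
seq 1 100 | tail -n 3     # last 3 lines: 98 99 100

# Lines 16-20: take the first 20 lines, then keep the last 5 of those
range=$(seq 1 100 | head -n 20 | tail -n 5)
echo "$range"
```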
tr
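The tr command translates specified characters into other characters or deletes them. With -s, runs of a repeated character from the replacement set are squeezed into one, which is why ll becomes a single f in the example below: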
[root@vx111a ~]# echo hello world | tr -s "hl" "kf"
kefo worfd
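The same command covers the other two operations mentioned above: mapping whole character ranges and deleting characters with -d. A short sketch with made-up input:

```shell
# Translate: map lowercase letters to uppercase using ranges
echo "hello world" | tr 'a-z' 'A-Z'        # HELLO WORLD

# Delete: remove every occurrence of the listed characters
echo "hello world" | tr -d 'lo'            # he wrd

# Squeeze: collapse runs of repeated spaces into one
echo "too    many   spaces" | tr -s ' '    # too many spaces
```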
pr
The pr command formats files for printing. By default, each page gets a five-line header (two blank lines, a line with the file's last-modification date and time, the file name and the page number, then two more blank lines) and a footer of five blank lines.
[root@vx111a ~]# pr text1 | head
2011-12-19 21:49 text1 Page 1
1 apple
2 pear
3 banana
nl
The nl command numbers the lines of a file:
[root@vx111a ~]# nl text1
1 1 apple
2 2 pear
3 3 banana
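By default nl numbers only non-empty lines (equivalent to -b t); -b a numbers every line, blank ones included. A sketch with made-up input:

```shell
# Three input lines, the middle one empty
printf 'apple\n\nbanana\n' | nl        # blank line unnumbered: banana is line 2
printf 'apple\n\nbanana\n' | nl -b a   # every line numbered: banana is line 3
```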
look
The look command displays lines that begin with a given string. In other words, you can use look to check the spelling of a word by giving its prefix.
look works like grep, but it does a lookup in a "dictionary", a sorted word list. By default, look searches /usr/share/dict/words on most current Linux systems (older systems used /usr/dict/words), but a different sorted file may be specified.
sort
The sort command sorts the input using the collating sequence for the locale (LC_COLLATE) of the system. The sort command can also merge already sorted files and check whether a file is sorted or not.
[root@vx111a ~]# sort text2
10 apple
3 banana
9 plum
[root@vx111a ~]# tsort text1 # perform topological sort
1
2
3
4
pear
banana
apple
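In the sort text2 output above, 10 apple comes before 3 banana because sort compares lines as text, so the character 1 sorts before 3. To compare the leading field as a number, use -n. A sketch reproducing that data:

```shell
# Same three lines as text2 in the example above
data='10 apple
3 banana
9 plum'

printf '%s\n' "$data" | sort       # text order: 10 apple, 3 banana, 9 plum
printf '%s\n' "$data" | sort -n    # numeric order: 3 banana, 9 plum, 10 apple
printf '%s\n' "$data" | sort -rn   # reversed numeric order: 10 apple first
```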
uniq
uniq can be used to display, count, or delete adjacent duplicate lines from a file or standard input (stdin). If duplicate lines in a file are not adjacent to one another, uniq will not treat them as duplicates:
[root@vx111a ~]# cat samp
unix commands
shell script
command prompt
unix commands
unix system administration
shell script
unix commands
[root@vx111a ~]# sort samp | uniq
command prompt
shell script
unix commands
unix system administration
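Because uniq only compares adjacent lines, it is almost always preceded by sort. The -c option prefixes each line with its number of occurrences, and -d prints only lines that occur more than once. A sketch that rebuilds the samp file shown above in a temporary directory:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Recreate the sample file from the example above
printf '%s\n' 'unix commands' 'shell script' 'command prompt' 'unix commands' \
  'unix system administration' 'shell script' 'unix commands' > "$tmp/samp"

uniq "$tmp/samp" | wc -l     # 7: no identical lines are adjacent, so nothing is removed
sort "$tmp/samp" | uniq -c   # each distinct line with its count
sort "$tmp/samp" | uniq -d   # only the repeated lines
```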
diff
diff compares two files line by line and reports the differences:
[root@vx111a ~]# echo "this is jagadesh" >> s1
[root@vx111a ~]# echo "jagadesh is this" >> s2
[root@vx111a ~]# diff s1 s2
1c1
< this is jagadesh
---
> jagadesh is this
With -r, diff also compares directories recursively, reporting files that exist in only one of them as well as differences between files present in both.
[root@vx111a ~]# diff -r ~/notes1 ~/notes2
Only in /home/bozo/notes1: file02
Only in /home/bozo/notes1: file03
Only in /home/bozo/notes2: file04
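diff also reports its result through the exit status (0 when the inputs are identical, 1 when they differ), which makes it usable in scripts, and -u switches to the unified format that most patch tools expect. A sketch reusing the two sentences from the example above, in a temporary directory:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

echo "this is jagadesh" > "$tmp/s1"
echo "jagadesh is this" > "$tmp/s2"

# Unified format: '-' marks lines only in s1, '+' lines only in s2.
# diff exits with 1 here because the files differ; capture it explicitly.
status=0
diff -u "$tmp/s1" "$tmp/s2" || status=$?
echo "exit status: $status"    # 1

# -q prints only a one-line summary; the exit status carries the answer
if diff -q "$tmp/s1" "$tmp/s2" > /dev/null; then
    echo "files are identical"
else
    echo "files differ"
fi
```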
cut
You can use cut to extract portions of text from each line of a file by selecting character positions or delimited fields.
$ cat test.txt
cat command for file oriented operations.
cp command for copy files or directories.
ls command to list out files and directories with its attributes.
[root@vx111a ~]# cut -c2 test.txt # second character of each line
a
p
s
[root@vx111a ~]# cut -c1-3 test.txt # characters 1 to 3 of each line
cat
cp
ls
[root@vx111a ~]# cut -d':' -f1 /etc/passwd # first field, with : as the delimiter
root
daemon
bin
sys
sync
games
bala
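-f also accepts lists and open-ended ranges of fields. For example, taking the user name and login shell (fields 1 and 7) from a passwd-style line; the entry below is hypothetical, so the output is predictable:

```shell
line='bala:x:1000:1000:Bala:/home/bala:/bin/bash'   # hypothetical passwd entry

echo "$line" | cut -d: -f1,7    # bala:/bin/bash
echo "$line" | cut -d: -f6-     # /home/bala:/bin/bash (field 6 to the end)
echo "$line" | cut -c1-4        # bala
```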
join
join merges the lines of two files that share a common field (by default, the first field):
[root@vx111a ~]# cat s1
100 Shoes
200 Laces
300 Socks
[root@vx111a ~]# cat s2
100 $40.00
200 $1.00
300 $2.00
[root@vx111a ~]# join s1 s2
100 Shoes $40.00
200 Laces $1.00
300 Socks $2.00
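join expects both inputs to be sorted on the join field; with unsorted input it can miss matches, and GNU join prints a warning. A sketch that sorts both files first, using the same records shuffled into a temporary directory:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Same records as above, deliberately out of order
printf '300 Socks\n100 Shoes\n200 Laces\n' > s1
printf '200 $1.00\n100 $40.00\n300 $2.00\n' > s2

# Sort each file on the join field (field 1) before joining
sort -k1,1 s1 > s1.sorted
sort -k1,1 s2 > s2.sorted
join s1.sorted s2.sorted
```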
fold
fold is a filter that wraps input lines to a specified width (80 columns by default). It is especially useful with the -s option, which breaks lines at spaces so words are not split. In effect, it is a command-line utility for word-wrapping a text file.
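A sketch wrapping a made-up sentence at 10 columns, first as a hard wrap and then with -s so that each break falls after a space:

```shell
text='the quick brown fox jumped over the lazy dog'

echo "$text" | fold -w 10      # hard wrap at exactly 10 columns: "over" is split
echo "$text" | fold -s -w 10   # wrap at spaces: no word is split
```

fold only inserts newlines, so deleting them from its output gives back the original line.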
File
The file command reports the type of a file:
[root@vx111a ~]# file perl
perl: directory
[root@vx111a ~]# file td.log
td.log: ASCII text
Rev
rev reverses the characters of each line of a file:
[root@vx111a ~]# echo "hai hello" > none
[root@vx111a ~]# rev none
olleh iah
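A common idiom pairs rev with cut to take the last field of a line when the number of fields varies, since cut counts fields only from the left. A sketch, using a made-up path:

```shell
path='/usr/share/dict/words'

echo "hai hello" | rev                    # olleh iah

# Reverse the line, take the first /-separated field, then reverse it back
echo "$path" | rev | cut -d/ -f1 | rev    # words
```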
Source
The source command runs the commands in a file in the current shell. This is useful for loading functions or variables stored in another file.
For example, if you add new environment variables to ~/.bashrc or ~/.bash_profile, you can run
source ~/.bashrc
instead of logging out and back in.
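The difference from running a file as a script is that a script runs in a child shell, so any variables it sets disappear when it exits, while a sourced file changes the current shell. A sketch with a hypothetical settings file; it uses ., the POSIX spelling of source (bash accepts both):

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# A hypothetical settings file defining a variable and a function
cat > "$tmp/settings.sh" <<'EOF'
GREETING="hello from settings.sh"
greet() { echo "$GREETING"; }
EOF

# Running it with sh starts a child shell: nothing comes back
sh "$tmp/settings.sh"
echo "after sh: ${GREETING:-<unset>}"

# Sourcing it runs the commands in the current shell
. "$tmp/settings.sh"
echo "after source: $GREETING"
greet
```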
Strings
strings prints the sequences of printable characters found in files. For example,
strings filename | more
displays the strings contained in the binary file called filename. strings can be a useful first step in examining an unknown executable.
cmp
The cmp command is a simpler relative of diff, described above. Whereas diff reports the differences between two files, cmp only shows the first point at which they differ.
[root@vx111a ~]# echo "hello" > s1
[root@vx111a ~]# echo "mello" > sw
[root@vx111a ~]# cmp s1 sw
s1 sw differ: byte 1, line 1
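Like diff, cmp reports its result through the exit status (0 for identical files, 1 when they differ), and -s suppresses all output, which suits scripts that only need a yes/no answer. A sketch recreating s1 and sw in a temporary directory:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

echo "hello" > "$tmp/s1"
echo "mello" > "$tmp/sw"

# -s: print nothing, just set the exit status
if cmp -s "$tmp/s1" "$tmp/sw"; then
    result="same"
else
    result="different"
fi
echo "$result"    # different
```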
paste
paste merges corresponding lines of files, separating them with a tab:
[root@vx111a ~]# echo "hello" > s1
[root@vx111a ~]# echo "mello" > sw
[root@vx111a ~]# paste s1 sw
hello	mello
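-d chooses a different delimiter, and -s joins all the lines of a single file into one line instead of merging across files. A sketch in a temporary directory, with a made-up fruits file:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

echo "hello" > "$tmp/s1"
echo "mello" > "$tmp/sw"
printf 'apple\npear\nbanana\n' > "$tmp/fruits"

paste "$tmp/s1" "$tmp/sw"        # hello<TAB>mello
paste -d, "$tmp/s1" "$tmp/sw"    # hello,mello
paste -s -d, "$tmp/fruits"       # apple,pear,banana
```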
expand
The expand command converts all tabs to spaces.
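By default, tab stops are 8 columns apart, so each tab becomes however many spaces it takes to reach the next multiple of 8; -t sets a different width. A sketch:

```shell
# One tab between the fields; 'a' occupies column 0
printf 'a\tb\n' | expand        # a, 7 spaces, b (next tab stop at column 8)
printf 'a\tb\n' | expand -t 4   # a, 3 spaces, b (next tab stop at column 4)
```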
unexpand
unexpand converts spaces to tabs. Unfortunately, you cannot use it to replace the single spaces in text1 with tabs, because unexpand only converts runs of at least two spaces.
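By default unexpand converts only leading blanks; -a converts runs of blanks anywhere in the line, as long as they end at a tab stop. A sketch that uses printf padding to build the space runs (default tab stops every 8 columns):

```shell
# 8 leading spaces, then x: becomes a single tab, then x
printf '%8sx\n' '' | unexpand

# 'a' plus 7 spaces reaches the tab stop at column 8; -a turns them into a tab
printf 'a%7sb\n' '' | unexpand -a
```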
More to come. Happy learning :-)