Wednesday, July 4, 2012

Archiving & Compression

File compression is very useful in our daily activities. We use compression for storing or moving large files. When we compress a file, we shrink it to save disk space. There are many compression algorithms available. Once we compress the files, we need to decompress them to view them again. Linux provides various tools for performing these operations.

Compressing a File ( Gzip )
To compress a file we can use the gzip ("GNU Zip") tool. It takes as arguments the names of the file or files to be compressed. The output is a file with a '.gz' extension for each input file. gzip uses Lempel-Ziv coding (LZ77).

 gzip one

Here we compress the file "one". Once the command has run, one.gz is available in the same location and the original file is deleted.
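As a small sketch of the behaviour described above, the following script compresses a sample file and checks what is left on disk. The scratch directory and the file contents are assumptions made only for this illustration.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched (assumed setup).
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Create a sample file named "one" with repetitive content.
i=0
while [ "$i" -lt 200 ]; do
    echo "This is line $i of the sample file"
    i=$((i + 1))
done > one

orig_size=$(wc -c < one | tr -d ' ')

# Plain gzip replaces "one" with "one.gz".
gzip one
gz_size=$(wc -c < one.gz | tr -d ' ')

echo "original: $orig_size bytes, compressed: $gz_size bytes"
ls
```

After the run, only one.gz should be listed, and it should be noticeably smaller than the original because the sample text is highly repetitive.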

Decompressing a File
To access the contents of a compressed file, use gunzip to decompress it.

Like gzip, gunzip takes as an argument the name of the file or files to work on. It expands the specified files, writing the output to new files without the `.gz' extensions, and then deletes the compressed files.

gunzip one.gz (or) gzip -d one.gz

This command expands the file `one.gz' and puts it in a new file called `one'; gunzip then deletes the compressed file, `one.gz'.

Multiple compressed files can be concatenated, in which case gunzip will extract all members at once. For example:

       gzip -c one1  > sam.gz
       gzip -c one2 >> sam.gz

then `gunzip -c sam.gz' produces the same output as `cat one1 one2'.
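The equivalence can be checked with a short script. This is only a sketch: the scratch directory and the file contents are assumptions chosen for the demonstration.

```shell
#!/bin/sh
# Throwaway working directory (assumed setup for this illustration).
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

printf 'first file\n'  > one1
printf 'second file\n' > one2

# Write two separate gzip members into the same file.
gzip -c one1  > sam.gz
gzip -c one2 >> sam.gz

# Decompress all members to standard output; the originals are untouched.
combined=$(gunzip -c sam.gz)
expected=$(cat one1 one2)

if [ "$combined" = "$expected" ]; then
    echo "gunzip -c sam.gz matches cat one1 one2"
fi
```

Because `-c' writes to standard output, one1, one2 and sam.gz all remain in place afterwards.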

Note: gunzip -c myfile.gz > myfile.txt

This decompresses myfile.gz into myfile.txt without deleting the .gz file. It is useful when you want to keep the compressed file alongside the uncompressed one.

Bzip Compression

bzip2 compresses files using the Burrows-Wheeler block-sorting text compression algorithm and Huffman coding. Compression is generally considerably better than that achieved by the gzip command.

The syntax is the same as for gzip, but the extension is '.bz2'.

bzip2 one

Decompressing a File: Decompress the file using either of

bzip2 -d one.bz2
bunzip2 one.bz2

NOTE: gzip vs. bzip2: bzip2 usually takes more time to compress and decompress than gzip, but the resulting file is usually smaller.
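A rough way to see the difference on your own data is to compress the same file with both tools and compare the sizes. The sketch below uses a made-up sample file in a scratch directory (both are assumptions), and the numbers will vary with the input; very small files can even come out larger with bzip2.

```shell
#!/bin/sh
# Throwaway working directory (assumed setup for this illustration).
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Build a sample text file (contents are made up for this illustration).
i=0
while [ "$i" -lt 2000 ]; do
    echo "log entry $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > sample.txt

orig_size=$(wc -c < sample.txt | tr -d ' ')

# -c writes to standard output, so sample.txt is kept for the second run.
gzip -c sample.txt > sample.txt.gz
gz_size=$(wc -c < sample.txt.gz | tr -d ' ')
echo "original: $orig_size bytes"
echo "gzip:     $gz_size bytes"

# bzip2 is a separate package; skip it if it is not installed.
if command -v bzip2 > /dev/null 2>&1; then
    bzip2 -c sample.txt > sample.txt.bz2
    bz_size=$(wc -c < sample.txt.bz2 | tr -d ' ')
    echo "bzip2:    $bz_size bytes"
fi
```

Prefixing each compression command with `time' gives a similar rough comparison of speed.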

Zip compression

Zip is one of the most widely supported compression formats across operating systems. Files can be compressed with the zip command as follows:

zip sam.zip one two

The syntax is a little different: zip {.zip-filename} {filenames-to-compress}. With zip, the original files are not deleted once the compression is done.

Decompressing a File: Decompress the file using
unzip sam.zip

Viewing The Compressed contents

zcat: We can use the zcat command to view the contents of a compressed file without decompressing it on disk. This is useful when we only want to read the file, not change it.

zcat can also be used to view the contents of gzip-compressed files that do not have a '.gz' extension. Here is a scenario:

[root@vx111a test]# cat one
This is Jagadesh
This Is Kiran
This Is Tarun
This Is Jagan
This Is Madan
This Is Naren
This is Pavan
This Is Bhuvan

[root@vx111a test]# gzip one

[root@vx111a test]# mv one.gz one

[root@vx111a test]# gunzip one
gunzip: one: unknown suffix -- ignored

[root@vx111a test]# zcat one
This is Jagadesh
This Is Kiran
This Is Tarun
This Is Jagan
This Is Madan
This Is Naren
This is Pavan
This Is Bhuvan
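If the goal is to get the uncompressed data back into a file rather than just view it, one option is to feed the file to gunzip on standard input, since no file name (and therefore no suffix) is involved. A minimal sketch, assuming a scratch directory and a shortened version of the sample file from the transcript above:

```shell
#!/bin/sh
# Throwaway working directory (assumed setup for this illustration).
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

printf 'This is Jagadesh\nThis Is Kiran\n' > one
gzip one          # produces one.gz
mv one.gz one     # drop the suffix, as in the transcript above

# Reading from standard input involves no file name, so no suffix check.
gunzip -c < one > one.txt

cat one.txt
```

The compressed file `one' is left as it is, and one.txt holds the original text.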

zless & zmore: We can use the zless and zmore commands to page through the contents of a compressed file without decompressing it on disk. They behave like piping the output of zcat into less or more.

[root@vx111a test]# zcat filename.gz | more
[root@vx111a test]# zcat filename.gz | less

(or)

[root@vx111a test]# zless filename.gz
[root@vx111a test]# zmore filename.gz

Searching inside the compressed file with zgrep / zegrep

Linux also provides utilities for searching inside compressed files: zgrep and zegrep. They accept the same options as grep and egrep, so `zgrep -i pattern file.gz' behaves like `grep -i pattern file' run on the uncompressed file.

[root@vx111a test]# zgrep -i pavan one.gz
This is Pavan
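Since zgrep hands its options on to grep, the usual grep switches are available. The sketch below assumes the zgrep script shipped with GNU gzip; the scratch directory and the shortened sample file are also assumptions.

```shell
#!/bin/sh
# Throwaway working directory (assumed setup for this illustration).
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

printf 'This is Jagadesh\nThis Is Kiran\nThis is Pavan\nThis Is Bhuvan\n' > one
gzip one

# -i ignores case, -c counts matching lines; both are passed on to grep.
count=$(zgrep -i -c 'van' one.gz)
echo "lines containing 'van': $count"

# -n prefixes each match with its line number in the uncompressed data.
zgrep -n 'Kiran' one.gz
```

Here the count covers the Pavan and Bhuvan lines, and the second command reports the line on which Kiran appears.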

Comparison & Difference

We can compare and find differences in the compressed files using zdiff / zcmp.

[root@vx111a test]# cat > file1
this is jagadesh
this is sam
[root@vx111a test]# cat file1
this is jagadesh
this is sam
[root@vx111a test]# cat > file2
this is jagadesh
this is ram

[root@vx111a test]# diff file1 file2
2c2
< this is sam
---
> this is ram
[root@vx111a test]# gzip file1
[root@vx111a test]# gzip file2
[root@vx111a test]# zdiff file1.gz file2.gz
2c2
< this is sam
---
> this is ram


[root@vx111a test]# zcmp file1.gz file2.gz
- /tmp/file2.xXFFTg7034 differ: byte 26, line 2
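In scripts it is often enough to know whether two compressed files differ, without reading the diff itself. The sketch below relies on the exit status documented for GNU zdiff (0 for no differences, 1 for differences, 2 for trouble); the scratch directory and the extra copy file3.gz are assumptions for the demonstration.

```shell
#!/bin/sh
# Throwaway working directory (assumed setup for this illustration).
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

printf 'this is jagadesh\nthis is sam\n' > file1
printf 'this is jagadesh\nthis is ram\n' > file2
gzip file1 file2
cp file1.gz file3.gz      # an identical copy, for comparison

# Different contents: zdiff prints the diff and returns 1.
diff_out=$(zdiff file1.gz file2.gz)
differ_status=$?

# Identical contents: no output and exit status 0.
zdiff file1.gz file3.gz > /dev/null
same_status=$?

echo "file1 vs file2: exit status $differ_status"
echo "file1 vs file3: exit status $same_status"
```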

File Archives

An archive is a single file that contains a collection of other files, and often directories. Archives are usually used to transfer or make a backup copy of a collection of files and directories; this way, you can work with only one file instead of many. This single file can easily be compressed as explained in the previous section, and the files in the archive retain the structure and permissions of the original files.

Tar Ball

Linux provides a utility called 'tar' which can be used to create, list and extract files from archives. The extension is '.tar'.

    * Creating Archives: Creating an archive of files.
    * Listing Archives: Listing the contents of an archive.
    * Extracting Archives: Extracting the files from an archive.

Creating a File Archive: A file archive is created using,

[root@vx111a test]# tar -vcf sam.tar file1.gz file2.gz
file1.gz
file2.gz

This command creates an archive file called `sam.tar' containing the files `file1.gz' and `file2.gz'. The original files remain unchanged.

The syntax for creating a compressed archive is similar:

tar -zcvf {.tgz-file} {files} : To compress the archive using gzip
tar -jcvf {.tbz2-file} {files} : To compress the archive using bzip2

Use the `-z' option to compress the archive as it is being written. This yields the same output as creating an uncompressed archive and then using gzip to compress it, but it eliminates the extra step.
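To check that the one-step and two-step approaches really give archives with the same contents, the member listings can be compared. This is a sketch only; the scratch directory, the `project' directory and its files are names invented for the illustration.

```shell
#!/bin/sh
# Throwaway working directory (assumed setup for this illustration).
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

mkdir project
printf 'alpha\n' > project/a.txt
printf 'beta\n'  > project/b.txt

# One step: archive and compress together.
tar -zcf onestep.tgz project

# Two steps: archive first, then compress the .tar file.
tar -cf twostep.tar project
gzip twostep.tar            # produces twostep.tar.gz

# Both archives should list the same members.
tar -ztf onestep.tgz    | sort > onestep.list
tar -ztf twostep.tar.gz | sort > twostep.list
cmp onestep.list twostep.list && echo "same contents"
```

The compressed files themselves need not be byte-for-byte identical, since the gzip header can record details such as a file name and timestamp; the comparison here is on the archived members.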

To list the contents of a tar archive without extracting them, use tar with the `-t' option.

[root@vx111a test]# tar -tvf sam.tar
-rw-r--r-- root/root     10240 2012-07-04 15:09:33 file1.gz
-rw-r--r-- root/root        48 2012-07-04 15:03:07 file2.gz

Extracting Files from an Archive

To extract (or unpack) the contents of a tar archive, use tar with the `-x' ("extract") option.

tar -zxvf {.tgz-file} {files} : To extract files from a gzip-compressed archive
tar -jxvf {.tbz2-file} {files} : To extract files from a bzip2-compressed archive

[root@vx111a test]# tar -xvf sam.tar
file1.gz
file2.gz
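A simple round trip shows that an extracted tree matches the original. The sketch below assumes a scratch directory and made-up directory names (`src', `restored'); `-C' tells tar to change into the given directory before extracting.

```shell
#!/bin/sh
# Throwaway working directory (assumed setup for this illustration).
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

mkdir -p src/docs restored
printf 'hello\n' > src/docs/readme.txt
printf 'world\n' > src/notes.txt

# Create a gzip-compressed archive of the src directory.
tar -zcf backup.tgz src

# Extract it into a different directory.
tar -zxf backup.tgz -C restored

# Compare the original tree with the restored one.
diff -r src restored/src && echo "restored copy matches"
```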

Some additional examples

When naming specific members to extract, give the names exactly as `tar -tf' lists them. GNU tar strips the leading `/' from member names when it creates an archive, so the stored name is usually path/file rather than /path/file.

tar -tf archive.tar : Show the contents of the archive
tar -xvf archive.tar -C /tmp : Extract a tar ball into /tmp
tar cvfj archive_name.tar.bz2 dirname : Create a bzip2-compressed tar archive
tar tvf archive_name.tar : List an archive in verbose form
tar xvf archive_file.tar path/file : Extract a single file from a tar archive
tar xvfz archive_file.tar.gz path/file : Extract a single file from a gzip-compressed archive (careful with the order of the arguments)
tar xvfj archive_file.tar.bz2 path/file : Extract a single file from a bzip2-compressed archive (careful with the order of the arguments)
tar xvf archive_file.tar path/to/dir/ : Extract a single directory
tar xvf archive_file.tar path/dir1/ path/dir2/ : Extract multiple directories
tar xvf archive_file.tar --wildcards '*.pl' : Extract all the files with the pl extension
tar rvf archive_name.tar newfile : Add a file to an existing uncompressed tar file
tar -cf - /directory/to/archive/ | wc -c : Estimate the size of the archive without writing it to disk
tar --delete -f sam.tar ./sample : Delete a file from the tar archive
tar --wildcards --delete -f sam.tar './sam*' : Delete matching files from the tar archive using wildcards
tar --list --verbose --file=music.tar practice : List the files in the directory `practice' inside the archive file `music.tar'
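A few of these operations can be tried together on a small archive. This is a sketch under assumptions: the scratch directory and all file names (`data', `newfile', `out') are invented for the illustration, and the archive is left uncompressed because appending with `r' only works on uncompressed archives.

```shell
#!/bin/sh
# Throwaway working directory (assumed setup for this illustration).
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

mkdir -p data out
printf 'one\n' > data/first.txt
printf 'two\n' > data/second.txt
printf 'new\n' > newfile

tar -cf archive.tar data
tar -rf archive.tar newfile        # append to the uncompressed archive
tar -tf archive.tar                # list member names exactly as stored

# Extract one member, named exactly as it appears in the listing.
tar -xf archive.tar -C out data/second.txt
ls -R out
```

Only data/second.txt should appear under `out', while the listing shows both the original members and the appended newfile.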

Happy Learning :)