
Sunday, December 27, 2015

What is Git?

Git is a distributed version control system. What does "distributed" mean here? Git offers something that centralized version control systems such as SVN do not: it allows developers to work on a project without requiring them to share a common network.

Unlike centralized systems, Git maintains a complete repository locally, and the developer makes all changes against this local repository, committing to it as often as needed. Once the developer decides the changes are ready to share, he pushes those commits from the local repository to the remote (main) repository.

Centralized version control tools follow a client-server approach, whereas Git follows a peer-to-peer approach. Rather than a single, central repository with which clients synchronize, each peer's working copy of the codebase is a complete repository.
So every Git working directory on a machine is a full-fledged repository with complete history and full version-tracking capabilities, independent of network access or a central server.
A Git repository contains two main data structures. The first is the staging area (also called the index or cache), which caches information about the working directory and the next revision to be committed. The second is the object database.
When files are added and committed to a Git repository, they are stored in the object database as three kinds of objects:
1) A blob (binary large object) stores the contents of a file.
2) A tree object holds the structure of a directory and describes a snapshot of the source tree. It contains a list of file names, each paired with the hash of the blob (or subtree) that holds its contents.
3) A commit object links trees into a history. It points to the tree for a particular revision of the data being tracked by Git and records the author, the commit message, and the parent commit(s).

The index serves as the connection point between the object database and the working tree.
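
To make the role of the index concrete, here is a toy in-memory Python model. It is not how Git stores the index on disk (that is a binary file, .git/index); the class and method names are hypothetical. It keeps the working tree (path to contents), the object database (hash to contents), the index (path to blob hash), and the last commit's snapshot (path to blob hash). Comparing the index with the last commit gives staged changes, and comparing the working tree with the index gives unstaged changes, which is roughly what git status reports.

```python
import hashlib


def blob_sha(content: bytes) -> str:
    # Git's naming rule for blobs: SHA-1 of "blob <size>\0" + content.
    return hashlib.sha1(f"blob {len(content)}".encode() + b"\0" + content).hexdigest()


class ToyRepo:
    """Hypothetical model: working tree <-> index <-> object database."""

    def __init__(self) -> None:
        self.working_tree: dict[str, bytes] = {}  # path -> file contents on disk
        self.objects: dict[str, bytes] = {}       # blob SHA -> contents
        self.index: dict[str, str] = {}           # path -> blob SHA staged for next commit
        self.head: dict[str, str] = {}            # path -> blob SHA in the last commit

    def add(self, path: str) -> None:
        # Like "git add": store the blob, then record its hash in the index.
        content = self.working_tree[path]
        sha = blob_sha(content)
        self.objects[sha] = content
        self.index[path] = sha

    def commit(self) -> None:
        # Like "git commit": the staged index becomes the new snapshot.
        self.head = dict(self.index)

    def staged(self) -> set[str]:
        # Paths whose index entry differs from the last commit.
        return {p for p in self.index.keys() | self.head.keys()
                if self.index.get(p) != self.head.get(p)}

    def unstaged(self) -> set[str]:
        # Tracked paths that were deleted or changed in the working tree.
        return {p for p, sha in self.index.items()
                if p not in self.working_tree or blob_sha(self.working_tree[p]) != sha}


repo = ToyRepo()
repo.working_tree["a.txt"] = b"v1\n"
repo.add("a.txt")
print(repo.staged())    # {'a.txt'}
repo.commit()
repo.working_tree["a.txt"] = b"v2\n"
print(repo.unstaged())  # {'a.txt'}
```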
Blob, tree, and commit objects are identified by a SHA-1 hash of their contents (prefixed by a short header giving the object type and size). Git computes this hash and uses the value as the object's name. The object is stored in a directory, under .git/objects, named after the first two characters of its hash, and the remaining 38 characters are used as the file name.
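
The naming rule above can be sketched in a few lines of Python. Note that Git hashes a short header ("blob", a space, the content length in bytes, and a NUL byte) together with the contents, not the raw contents alone. The function names blob_name and loose_object_path are hypothetical.

```python
import hashlib


def blob_name(content: bytes) -> str:
    """SHA-1 object name Git assigns to a blob with these contents."""
    header = f"blob {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(sha: str, git_dir: str = ".git") -> str:
    """First two hex characters name the directory; the other 38 name the file."""
    return f"{git_dir}/objects/{sha[:2]}/{sha[2:]}"


name = blob_name(b"test content\n")
print(name)                     # d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(loose_object_path(name))  # .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
```

Running git hash-object on a file containing the same bytes should print the same name.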

Objects are compressed with zlib. To save further space, Git also packs objects into packfiles, where many objects are stored as deltas against similar objects. Git servers using the native git:// protocol (git daemon) typically listen on TCP port 9418.
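
A loose object on disk is essentially the header plus contents, compressed with zlib. The sketch below writes one into a throwaway temporary directory and reads it back using Python's standard zlib module. It deliberately ignores packfiles and delta compression, and the function names are hypothetical.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_blob(git_dir: str, content: bytes) -> str:
    """Store a blob as a zlib-compressed loose object and return its SHA-1 name."""
    data = f"blob {len(content)}".encode() + b"\0" + content
    sha = hashlib.sha1(data).hexdigest()  # the name is the hash of the uncompressed bytes
    obj_dir = os.path.join(git_dir, "objects", sha[:2])
    os.makedirs(obj_dir, exist_ok=True)
    with open(os.path.join(obj_dir, sha[2:]), "wb") as f:
        f.write(zlib.compress(data))
    return sha


def read_loose_object(git_dir: str, sha: str) -> tuple[str, bytes]:
    """Decompress a loose object and split it into (type, contents)."""
    with open(os.path.join(git_dir, "objects", sha[:2], sha[2:]), "rb") as f:
        data = zlib.decompress(f.read())
    header, _, content = data.partition(b"\0")  # header ends at the first NUL
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("corrupt object: size in header does not match contents")
    return obj_type, content


with tempfile.TemporaryDirectory() as git_dir:
    sha = write_loose_blob(git_dir, b"hello world\n")
    print(sha)                              # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
    print(read_loose_object(git_dir, sha))  # ('blob', b'hello world\n')
```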

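Git also provides a way to clean up objects. Any object in the Git database that is no longer referenced, directly or indirectly, by a branch, tag, or other reference can be removed with the garbage collection command (git gc), which Git also runs automatically from time to time. This is possible because blobs, trees, and commits are linked to one another by their hashes, so Git can work out which objects are still reachable.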
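
The idea behind garbage collection is reachability: start from references such as branches and tags, follow every commit-to-parent, commit-to-tree, and tree-to-entry link, and treat anything never reached as garbage. The sketch below shows that marking step on a made-up object graph; the short names like "c2" and "t9" are placeholders, not real hashes. Real git gc also keeps objects referenced from the reflog and applies a grace period before pruning, which this sketch leaves out.

```python
def reachable(refs: dict[str, str], links: dict[str, list[str]]) -> set[str]:
    """Mark every object reachable from the given refs (iterative depth-first search)."""
    seen: set[str] = set()
    stack = list(refs.values())
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(links.get(obj, []))  # parents, trees, or blobs this object points to
    return seen


# Hypothetical graph: object name -> names of the objects it references.
links = {
    "c2": ["c1", "t2"],  # commit c2: parent c1, tree t2
    "c1": ["t1"],        # commit c1: tree t1
    "t2": ["b1", "b2"],  # tree t2 lists blobs b1 and b2
    "t1": ["b1"],
    "c9": ["t9"],        # commit c9 was on a branch that has since been deleted
    "t9": ["b9"],
}
refs = {"refs/heads/master": "c2"}

all_objects = set(links) | {o for targets in links.values() for o in targets}
garbage = all_objects - reachable(refs, links)
print(sorted(garbage))  # ['b9', 'c9', 't9']
```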

Why do we need Git? SVN vs Git

With so many version control tools available on the market, why do we need Git?
As mentioned above, Git is distributed, and this is the main difference.
Consider a case where you want to go back three years to look at some code. In a centralized tool such as SVN, this requires access to the central repository; if it is in a location you cannot reach, you can neither browse the history nor commit. If you want to save a copy of your work in the meantime, you literally have to copy and paste it.

With Git, you do not have this problem. Your local copy is a repository, so you can commit to it and get all the benefits of source control. When you regain connectivity to the main repository, you can push your commits to it.
Some other differences include:

1) Git has a clean command (git clean) that removes untracked files, such as build output, from the working directory. Older versions of SVN have no direct equivalent.

2) SVN (before version 1.7) creates a .svn directory in every single folder, whereas Git creates only one .git directory. Every script you write, and every grep you run, must be written to ignore these .svn directories.

3) You have to tell SVN explicitly whenever you move or delete something. Git figures it out on its own: it does not record moves, but detects renames by comparing file contents.
4) Ignore semantics: if you add a pattern such as *.pyc to .gitignore, it is ignored in all subdirectories. In SVN, the svn:ignore property applies only to the directory on which it is set.

5) Git tracks the content of files rather than just files: identical content is stored only once, as a single blob.

6) Branches in Git are lightweight and easy to maintain, because a branch is just a named reference to a commit.

7) Git is distributed, so in effect every repository is a branch. In my opinion, this makes it much easier to develop concurrently and collaboratively than with Subversion. It also makes offline development possible.

8) The staging area is very useful: it lets you review the changes you are about to commit, commit only part of your changes, and more.

9) Git repositories are often much smaller than Subversion repositories. There is only one .git directory, as opposed to dozens of .svn directories in older SVN versions.

10) When working with Subversion, we create working copies on the machine by checking out a revision. A working copy represents a snapshot in time of what the repository looks like. You update your working copy via updates, and you update the repository via commits.
But with Git, we don't have just a snapshot; we have the full codebase along with its history.
11) If you want to check out code from three months ago, you don't need to connect to the remote repository as in SVN, since in Git the full history is available locally.
12) The central SVN server is a single point of failure: if the repository on the remote machine fails, everything fails, including access to the codebase and its history. With Git, every developer has a full copy of the repository, so there is no single point of failure.
13) SSH with Git: other developers can connect over SSH to a developer's machine and clone or fetch from the repository there. This does not work with SVN, because an SVN working copy is not a repository; SSH access (svn+ssh://) only reaches the central repository.
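
To see why branches are cheap, note that in a plain Git repository a branch can be stored as a tiny file under .git/refs/heads/ holding the 40-character hash of the commit it points to (Git may also keep refs together in a single packed-refs file). The sketch below imitates creating such a file in a temporary directory. It is only an illustration: in practice you would run git branch rather than writing ref files by hand, and the function name and commit hash are hypothetical.

```python
import os
import tempfile


def create_branch(git_dir: str, name: str, commit_sha: str) -> str:
    """Write a loose branch ref: a single line holding the commit hash."""
    path = os.path.join(git_dir, "refs", "heads", *name.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:  # binary mode: exactly 40 hex chars plus "\n"
        f.write(commit_sha.encode("ascii") + b"\n")
    return path


commit = "0123456789abcdef0123456789abcdef01234567"  # placeholder, not a real commit
with tempfile.TemporaryDirectory() as git_dir:
    ref = create_branch(git_dir, "feature/login", commit)
    print(os.path.getsize(ref))  # 41
```

Under this model, creating a branch costs one small file regardless of how large the project is.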

More to come :)
