My Technical Works: February 2019

Friday, February 22, 2019

Introducing GIT

Git is a distributed version control system. What does "distributed" mean here? Unlike centralized version control systems, Git lets developers work on a project without requiring them to share a common network.

As in other systems, Git maintains a repository, but here the repository is local and the developer commits all changes to it first. Once the developer decides the changes are ready to share, he pushes the commits from the local repository to the remote (main) repository.

Most older version control tools follow a client-server approach. Git takes a peer-to-peer approach: rather than a single, central repository on which clients synchronize, each peer's working copy of the codebase is a complete repository.

So every Git working directory on a machine is a full-fledged repository with complete history and full version-tracking capabilities, independent of network access or a central server. A Git repository contains two main data structures. The first is the index (also called the staging area or cache), which caches information about the working directory and the next revision to be committed. The other is the object database.

Files added to a Git repository are stored in the object database. Git stores them as the following kinds of objects:

1) A blob (binary large object) holds the contents of a file.

2) A tree object holds the structure of a directory. It describes a snapshot of the source tree and contains a list of file names, each paired with a reference to the blob that holds that file's contents.

3) A commit object ties tree objects together into a history. It records the tree that corresponds to a particular revision of the data being tracked by Git, along with information about that commit.
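The relationship between the three object types can be sketched with a toy in-memory store. This is only an illustration: the class name ToyObjectStore and its methods are made up for this post, and the byte layout is deliberately simplified and is not Git's real on-disk format.

```python
import hashlib
from typing import Dict, Optional


class ToyObjectStore:
    """Simplified, in-memory stand-in for an object database (not Git's real format)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def _put(self, kind: str, body: bytes) -> str:
        # Name every object by a hash of its type and contents.
        data = kind.encode("ascii") + b"\0" + body
        object_id = hashlib.sha1(data).hexdigest()
        self.objects[object_id] = data
        return object_id

    def put_blob(self, content: bytes) -> str:
        # A blob holds file contents only, with no file name.
        return self._put("blob", content)

    def put_tree(self, entries: Dict[str, str]) -> str:
        # A tree maps file names to blob ids; sorting keeps the id stable.
        lines = [f"{oid} {name}" for name, oid in sorted(entries.items())]
        return self._put("tree", "\n".join(lines).encode("utf-8"))

    def put_commit(self, tree_id: str, message: str,
                   parent: Optional[str] = None) -> str:
        # A commit points at one tree and, optionally, at a parent commit.
        lines = [f"tree {tree_id}"]
        if parent is not None:
            lines.append(f"parent {parent}")
        lines += ["", message]
        return self._put("commit", "\n".join(lines).encode("utf-8"))


store = ToyObjectStore()
blob_id = store.put_blob(b"print('hello')\n")
tree_id = store.put_tree({"hello.py": blob_id})
first_commit = store.put_commit(tree_id, "first commit")
second_commit = store.put_commit(tree_id, "second commit", parent=first_commit)
```

Because names are derived from contents, storing the same file twice yields the same blob id, so unchanged files cost nothing extra between commits.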

The index serves as the connection point between the object database and the working tree.

Each of these objects is identified by the SHA-1 hash of its contents. Git computes the hash and uses the value as the object's name. The object is stored in a directory named after the first two characters of its hash, and the rest of the hash is used as the file name for that object.
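As a rough sketch of the naming scheme, the snippet below computes the name of a blob the way a SHA-1 based Git repository does (the hash covers a short "blob <size>" header, a NUL byte and then the file contents) and derives the directory and file name from it. The function names are my own, and the .git/objects prefix assumes a default repository layout.

```python
import hashlib
from pathlib import PurePosixPath


def blob_object_id(content: bytes) -> str:
    """Return the SHA-1 name Git gives to a blob with these contents."""
    # The hash covers a small header as well as the file contents.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> PurePosixPath:
    """Split the hash: first two characters are the directory, the rest the file."""
    return PurePosixPath(".git/objects") / object_id[:2] / object_id[2:]


object_id = blob_object_id(b"hello world\n")
print(object_id)
print(loose_object_path(object_id))
```

For the same contents, the value should match what git hash-object prints for that file.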

Objects are compressed using zlib. Git can also pack many objects together into packfiles, which use delta compression to save further space. Git servers using the native git:// protocol typically listen on TCP port 9418.
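The zlib step can be sketched as follows. The helper names are hypothetical, and only the simple case of a single blob is covered; packfiles and delta compression are left out.

```python
import zlib


def encode_loose_blob(content: bytes) -> bytes:
    """Prefix the contents with a blob header and compress the result with zlib."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return zlib.compress(header + content)


def decode_loose_blob(compressed: bytes) -> bytes:
    """Decompress the object, check the header and return the original contents."""
    raw = zlib.decompress(compressed)
    # The header ends at the first NUL byte; everything after it is the content.
    header, _, content = raw.partition(b"\0")
    kind, size = header.split(b" ")
    if kind != b"blob" or int(size) != len(content):
        raise ValueError("not a well-formed blob object")
    return content


original = b"line of text\n" * 100
stored = encode_loose_blob(original)
restored = decode_loose_blob(stored)
```

Repetitive text such as source code usually compresses well, which is one reason the object database stays compact.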

Git also provides ways to clean up objects. Any object in the Git database that is no longer referenced can be removed by running the garbage collection command (git gc), which Git also runs automatically from time to time. This works because of the way blobs, trees and commits are linked and referenced.
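The idea behind garbage collection can be illustrated with a small reachability walk: start from the references (branches, tags), follow the links between objects, and treat everything that was never visited as a candidate for removal. The function and the object names below are invented for the example, and real Git takes more into account (for instance the reflog and a grace period) than this sketch does.

```python
from typing import Dict, Iterable, List, Set


def unreachable_objects(objects: Dict[str, List[str]],
                        roots: Iterable[str]) -> Set[str]:
    """Return ids of objects that cannot be reached from any root reference.

    `objects` maps an object id to the ids it links to (commit -> tree and
    parent commit, tree -> blobs, blob -> nothing).
    """
    reachable: Set[str] = set()
    stack = list(roots)
    while stack:
        object_id = stack.pop()
        if object_id in reachable or object_id not in objects:
            continue
        reachable.add(object_id)
        stack.extend(objects[object_id])
    return set(objects) - reachable


# Hypothetical object graph: blob_old is no longer linked from anything.
graph = {
    "commit2": ["tree2", "commit1"],
    "commit1": ["tree1"],
    "tree2": ["blob_a", "blob_b"],
    "tree1": ["blob_a"],
    "blob_a": [],
    "blob_b": [],
    "blob_old": [],
}
garbage = unreachable_objects(graph, roots=["commit2"])
```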

Why do we need GIT – Svn vs Git

Many version control tools are already available, so why do we need Git?

As said, Git is distributed. This is the main difference.

Consider a case where you want to go back three years in the history of some code. In other tools this can be complex: the repository may be in a different location that we cannot reach, so we can neither look up the history nor commit. If you want to keep a copy of your code in the meantime, you literally have to copy and paste it.

With Git, you do not have this problem. Your local copy is a repository, and you can commit to it and get all the benefits of source control. When you regain connectivity to the main repository, you can push your commits to it.

Some other differences include:

1) Git has a clean command. Every source control tool ends up with extra untracked files in the working directory. Git provides git clean to remove them, a facility that SVN lacked for a long time.

2) Older versions of SVN (before 1.7) create .svn directories in every single folder, whereas Git creates only one .git directory. With those versions, every script you write and every grep you do needs to be written to ignore these .svn directories.

3) You have to tell SVN whenever you move or delete something. Git will just figure it out.

4) Ignore semantics – If you want to ignore a pattern (such as *.pyc), a single entry in .gitignore ignores it in all subdirectories. In SVN, ignores are traditionally set per directory through the svn:ignore property, so this is not as simple.

5) Git tracks the content of files rather than just the files themselves.

6) Branches in Git are lightweight and easy to maintain.

7) It's distributed; basically every repository is a branch. It's much easier to develop concurrently and collaboratively than with Subversion, in my opinion. It also makes offline development possible.

8) The staging area is awesome: it allows you to see the changes you will commit, commit partial changes and more.

9) Git repositories are often smaller in file size than Subversion repositories. There's only one ".git" directory, as opposed to dozens of ".svn" directories.

10) When working with Subversion, we create working copies on the machine by checking out a version. This represents a snapshot in time of what the repository looks like. You update your working copy via updates, and you update the repository via commits.

But with Git, we have not just a snapshot but the full repository with its history.

11) If we want to check out the code as it was three months ago, we don't need to connect to the remote repository as in SVN, since in Git the history is available locally.

12) An SVN server is a single point of failure. If the repository on the remote machine fails, everything fails, including the history of the codebase. With Git, every developer has his own repository, so there is no single point of failure.

13) SSH with Git – Other developers can use SSH to reach a Git repository on a developer's machine and clone or pull from it. This does not work the same way with SVN, where a developer's working copy is not a repository.
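To make the ignore semantics from item 4 concrete, here is a small approximation of how a pattern without a slash, such as *.pyc, applies at every directory level. The function name is hypothetical, and the sketch covers only this one rule; real .gitignore files support more (negation, directory-only patterns, anchored paths).

```python
import fnmatch
from pathlib import PurePosixPath
from typing import List


def is_ignored(path: str, patterns: List[str]) -> bool:
    """Check the file name alone, so a pattern matches in any subdirectory."""
    name = PurePosixPath(path).name
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)


patterns = ["*.pyc"]
for candidate in ["module.pyc", "src/pkg/deep/module.pyc", "src/pkg/module.py"]:
    print(candidate, is_ignored(candidate, patterns))
```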

Installation - Installing Git is very easy. Git was created by Linus Torvalds, who also created the Linux kernel, and it is available through the package manager on most Linux distributions.

[root@vx111a Downloads]# yum install git*

Loaded plugins: langpacks, product-id, subscription-manager

This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.

Resolving Dependencies

--> Running transaction check

---> Package git.x86_64 0:1.8.3.1-4.el7 will be installed

--> Processing Dependency: perl-Git = 1.8.3.1-4.el7 for package: git-1.8.3.1-4.el7.x86_64

--> Processing Dependency: perl(Git) for package: git-1.8.3.1-4.el7.x86_64

--> Running transaction check

---> Package perl-Git.noarch 0:1.8.3.1-4.el7 will be installed

--> Finished Dependency Resolution

Dependencies Resolved

Once the installation is done, we can test Git using:

[root@vx11a] git --version

git version 1.8.3.1

If we see the Git version printed, we can confirm the installation is good. We can also use:

[root@vx11a] whereis git

/usr/bin/git
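The same two checks can be scripted. The sketch below looks up git on the PATH (similar in spirit to whereis) and then asks it for its version; the function name git_version is my own, and the function simply returns None when Git is not installed.

```python
import shutil
import subprocess
from typing import Optional


def git_version() -> Optional[str]:
    """Return the output of `git --version`, or None if git is not usable."""
    git_path = shutil.which("git")  # locate the executable on the PATH
    if git_path is None:
        return None
    try:
        result = subprocess.run(
            [git_path, "--version"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


version = git_version()
print(version if version is not None else "git is not installed")
```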

Understanding Version Control System

Managing code is always hard. In the early days, when we wrote code we usually saved it to a disk location for future reference. A single developer working on a project knows where the code is saved, what changed and how it works.

But what if the project is written by multiple people? Saving the code to the same location on disk can cause conflicts. One user makes changes and the others get confused by them. Moving the code to production can be very confusing.

What if:

We made a change to the code, realised it was a mistake and wanted to revert it?
We lost the code?
We lost the code and the only backup we had was much older than the latest code?
We need to maintain multiple versions of the same code for different projects?
We need to prove that a particular change has broken or fixed the code?
We need to submit a change to someone else's code?
We want to see how much work is being done, and where and by whom?
We need to experiment with a new feature without interfering with the working copy of the code?

A Source Code Management System is the answer to the above problems.

A Source Code Management System (SCM), Version Control System (VCS) or Revision Control System (RCS) is a system that records changes to a file or a set of files over time.

A VCS allows you to track the history of a collection of files by creating different versions of that collection. Each version captures a snapshot of the files at a certain point in time, and the VCS allows you to switch between these versions. The versions are stored in a specific place called a repository.

A VCS also allows us to revert files to a previous state, revert a whole project to its last good state, compare changes over time, and see who last modified a file or whose change introduced a bug.
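A toy example may help here. The class below keeps every saved version of a single file, can show what changed between two versions, and reverts by saving an old version again as the newest one. It is a deliberately minimal sketch with made-up names (FileHistory, commit, diff, revert_to), not how any real VCS stores its data.

```python
import difflib
from typing import List


class FileHistory:
    """Keeps every saved version of one text file, oldest first."""

    def __init__(self) -> None:
        self.versions: List[str] = []

    def commit(self, text: str) -> int:
        """Save a new version and return its version number."""
        self.versions.append(text)
        return len(self.versions) - 1

    def diff(self, old: int, new: int) -> List[str]:
        """Compare two versions and return the changes in unified diff form."""
        return list(difflib.unified_diff(
            self.versions[old].splitlines(),
            self.versions[new].splitlines(),
            fromfile=f"v{old}",
            tofile=f"v{new}",
            lineterm="",
        ))

    def revert_to(self, version: int) -> int:
        """Revert by saving the old contents again; history is never erased."""
        return self.commit(self.versions[version])


history = FileHistory()
good = history.commit("timeout = 30\n")
bad = history.commit("timeout = 0\n")
print("\n".join(history.diff(good, bad)))
fixed = history.revert_to(good)
```

Reverting adds a new version rather than deleting the bad one, so the mistake itself stays in the history and can still be inspected later.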

Lastly, if we screw things up or lose files, we can easily recover them using a VCS.

Now that we understand what a VCS is, we will look at the types of VCS and their advantages and disadvantages.

Local Version Control System - Before such systems existed, developers stored multiple versions of their files in separate directories and used them when needed. This is simple and works for smaller projects, but it is very error prone.

It is easy to forget which directory you are in and accidentally write to the wrong file or copy over files you didn't mean to. To deal with this situation, Local Version Control Systems (Lvcs) were developed.

An Lvcs has a simple database that keeps all the changes to files under revision control. A revision control system can revert a modification to a file back to its earlier state. It allows users to identify and correct errors and helps protect the data.

Though this worked well for a single developer, an Lvcs is not suitable when developers want to collaborate. Since all changes are saved to a local version control system, developers working on other machines can't access this database. To solve this problem, people came up with the Centralized Version Control System (Cvcs).

Centralized Version Control System - The Cvcs was created to let multiple users access the same Vcs. A single, independent server holds all the versioned files, and multiple users or clients connect to this machine to download the code or make changes to it. The problem with this type of system is that the server is a single point of failure. If the server holding everyone's code crashes and no backup exists, the history of the code is lost. Tools of this type include CVS, Subversion and Perforce.

To remove this single point of failure, people came up with the Distributed Version Control System (Dvcs).

Distributed Version Control System - In this type of system there can still be a single server where all the source code is available. New or existing users who want the code download it from there, but they don't just download the latest code: they mirror the full repository. If the machine that holds the source code crashes or dies, users don't need to worry, since everyone has a full mirror of the repository. Any of the repositories mirrored on the client machines can be used to restore the repository on the server.

Finally, what does version control provide?

Backup/Restore: Saved files can be backed up and restored to a specific moment in time. Need a file as it was a year ago? We have that.

Synchronization: Lets people share files and stay up to date with the latest version.

Track Changes: Details about files that were updated, merged or deleted are available in the history the system maintains.

Branching and merging: A larger sandbox. You can branch a copy of your code into a separate area and modify it in isolation (tracking changes separately). Later, you can merge your work back into the common area.
