Sunday, October 7, 2018

Distributed File System - GlusterFS

Many applications need to read shared data or need a place to upload files. A shared file system supports this: services such as NFS or Samba give applications running anywhere access to shared drives. But what if we need high availability for those shared drives?

It is crucial to keep data highly available so that every application and user can reach it. High availability is achieved by distributing the data across multiple nodes, or across multiple volumes on multiple nodes.
Once a volume is mounted, client machines and users can access it like local storage. The advantage is that the volume behind the mount is distributed.

Imagine users doing heavy read/write operations on the same NFS volume. The machine hosting the volume can slow down as its memory and CPU come under load. What if we could combine the memory and processing power of two machines, and their individual disks, into a single volume accessed by the clients? This is where distributed file systems come into the picture.

In computing, a distributed file system (DFS), or network file system, is any file system that allows multiple hosts to access shared files over a computer network. This makes it possible for multiple users on multiple machines to share files and storage resources. So we can set up two machines, create a directory on each, and share them with the outside world as a single volume.

What is GlusterFS?
GlusterFS does exactly this: it combines multiple storage servers into one large, distributed drive. GlusterFS is an open source, scalable network file system suited to data-intensive workloads such as media streaming, storage, and content delivery.
In this article we will see how to configure GlusterFS on CentOS 7 machines.

For this we will use 3 machines: 2 are servers providing the volume and the third acts as a client.
1. Configure 3 machines with CentOS 7.

2. Add the details of the 3 machines to the /etc/hosts file on all 3 machines. My configuration has entries for server1, server2 and client.
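A minimal sketch of step 2, assuming placeholder addresses: the helper below appends a name-to-address entry to a hosts file only when the name is not already listed. The addresses come from the 192.0.2.0/24 documentation range and are not my real configuration, and the function name add_host_entry is made up for this example.

```shell
#!/bin/sh
# Append "ip name" to a hosts file unless the name is already listed.
# Usage: add_host_entry FILE IP NAME
add_host_entry() {
    file=$1; ip=$2; name=$3
    # -w matches the name as a whole word, so "server1" does not match "server10"
    if ! grep -qw -- "$name" "$file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Demo against a scratch file; point it at /etc/hosts (as root) on real machines.
hosts_file=$(mktemp)
add_host_entry "$hosts_file" 192.0.2.11 server1
add_host_entry "$hosts_file" 192.0.2.12 server2
add_host_entry "$hosts_file" 192.0.2.13 client
add_host_entry "$hosts_file" 192.0.2.11 server1   # second call is a no-op
cat "$hosts_file"
```

Running the same calls on all 3 machines keeps the name resolution identical everywhere.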

3. Add the EPEL repository to CentOS 7 using the command below,
rpm -ivh epel-release-latest-7.noarch.rpm

4. Create a GlusterFS repo file, /etc/yum.repos.d/gluster.repo, with the content below

[root@manja17-I18062 ~]# cat /etc/yum.repos.d/gluster.repo
name=Gluster 4.1

5. Install GlusterFS using,
    yum install glusterfs-server samba -y
    Then enable and start the service using,
    systemctl enable glusterd.service
    systemctl start glusterd.service

6. Check the glusterfs version
[root@manja17-I18063 ~]# glusterfsd --version
glusterfs 4.1.5
Repository revision: git://
Copyright (c) 2006-2016 Red Hat, Inc.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.

7. Disable the firewall using, systemctl stop firewalld

8. If you keep the firewall running instead of disabling it as in the previous step, make sure TCP ports 111, 24007, 24008 and 24009 are open on server1 and server2.
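If you go with step 8, the ports can be opened with firewall-cmd. The sketch below only prints the commands so they can be reviewed first; the function name print_firewall_cmds is made up for this example. As far as I know, recent GlusterFS releases also serve each brick on a port from 49152 upward, so you may need to open those as well; gluster volume status shows the ports actually in use.

```shell
#!/bin/sh
# Print the firewalld commands that would open the given TCP ports.
# Nothing is changed until the output is reviewed and piped to a root shell.
print_firewall_cmds() {
    for port in "$@"; do
        printf 'firewall-cmd --permanent --add-port=%s/tcp\n' "$port"
    done
    printf 'firewall-cmd --reload\n'
}

# Ports listed in step 8.
print_firewall_cmds 111 24007 24008 24009
```

To apply the rules, run the same call as root and pipe the output to sh on both server1 and server2.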

9. Next we need to configure the trusted storage pool. We will add server2 to the trusted pool of server1 by running the gluster command on server1.

[root@manja17-I18062 ~]# gluster peer probe
peer probe: success.

10. Next check the status using,
[root@manja17-I18062 ~]# gluster peer status
Number of Peers: 1

Uuid: 9964b028-585d-4cf1-b67b-4bbf9be2a976
State: Peer in Cluster (Connected)
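The status check in step 10 can be scripted. This is a small sketch that counts how many peers report the connected state shown above; the function name count_connected_peers is made up, and the sample input is copied from my output.

```shell
#!/bin/sh
# Count peers reported as connected in `gluster peer status` output read from stdin.
count_connected_peers() {
    # -F treats the pattern as a fixed string, -c prints the number of matching lines
    grep -cF 'State: Peer in Cluster (Connected)'
}

# Sample input copied from step 10; on a real server use:
#   gluster peer status | count_connected_peers
count_connected_peers <<'EOF'
Number of Peers: 1

Uuid: 9964b028-585d-4cf1-b67b-4bbf9be2a976
State: Peer in Cluster (Connected)
EOF
```

With two servers the count should be 1 on each server, since a server does not list itself as a peer.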

11. Now let's create a volume named testervol with two replicas. Here the number of replicas equals the number of servers we configured, since we are setting up mirroring. Each server will hold its copy in the /data directory, and to the outside world the volume will be shown as testervol.

[root@manja17-I18062 ~]# gluster volume create testervol replica 2 transport tcp force
volume create: testervol: success: please start the volume to access data

Once this is done, we will see the /data location created on both machines.
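The create command in step 11 takes the bricks as host:/path arguments placed before force. As a sketch, assuming the bricks are server1:/data and server2:/data (based on the host names and the /data directory used in this article), the helper below prints the full command and rejects a brick count that is not a multiple of the replica count. The function name build_volume_create is made up for this example.

```shell
#!/bin/sh
# Print a `gluster volume create` command for a replicated volume.
# Usage: build_volume_create NAME REPLICA_COUNT BRICK...
build_volume_create() {
    name=$1; replicas=$2; shift 2
    # Gluster expects the brick count to be a multiple of the replica count.
    if [ "$#" -eq 0 ] || [ $(( $# % replicas )) -ne 0 ]; then
        echo "error: need a multiple of $replicas bricks, got $#" >&2
        return 1
    fi
    echo "gluster volume create $name replica $replicas transport tcp $* force"
}

# Brick paths are assumptions based on the host names and /data directory above.
build_volume_create testervol 2 server1:/data server2:/data
```

The trailing force is needed when the brick directory sits on the root partition, which gluster otherwise refuses to use.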

12. Start the Volume
[root@manja17-I18063 data]# gluster volume start testervol
volume start: testervol: success

13. Check the volume info using
[root@manja17-I18063 data]# gluster volume info
Volume Name: testervol
Type: Replicate
Volume ID: 7196569d-2a27-4cc9-9918-acb952b42549
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
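To check a single field of this output from a script, for example that Status is Started, something like the following sketch can be used. The function name volume_field is made up, and the sample input is a shortened copy of the output above.

```shell
#!/bin/sh
# Print the value of one "Key: value" field from `gluster volume info` output on stdin.
volume_field() {
    awk -F': ' -v key="$1" '$1 == key { print $2; exit }'
}

# Sample input taken from step 13; on a server use:
#   gluster volume info testervol | volume_field Status
volume_field Status <<'EOF'
Volume Name: testervol
Type: Replicate
Status: Started
Number of Bricks: 1 x 2 = 2
EOF
```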

By default, all clients can access the volume we defined. We need to define access controls that restrict who can access it.
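One way to do this is the auth.allow volume option, which takes a comma-separated list of client addresses. The sketch below prints the command rather than running it; 192.0.2.13 is a placeholder for the client machine's address, and the function name build_auth_allow is made up for this example.

```shell
#!/bin/sh
# Print a command that limits volume access to the given client addresses.
# Usage: build_auth_allow VOLUME CLIENT...
build_auth_allow() {
    volume=$1; shift
    # Join the remaining arguments with commas, as auth.allow expects.
    clients=$(IFS=,; echo "$*")
    echo "gluster volume set $volume auth.allow $clients"
}

# 192.0.2.13 is a placeholder for the client machine's address.
build_auth_allow testervol 192.0.2.13
```

Run the printed command on server1 or server2, then check that the option appears under Options Reconfigured in gluster volume info.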

Setting up the GlusterFS Client (on the client machine)
1. Create a glusterfs directory under /mnt
[root@manja17-I18064 ~]# mkdir /mnt/glusterfs

2. Mount the volume
[root@manja17-I18064 ~]# mount.glusterfs /mnt/glusterfs

3. Check for the volume
[root@manja17-I18064 ~]# mount | grep testervol on /mnt/glusterfs type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

[root@manja17-I18064 ~]# df -h
Filesystem               Size   Used  Avail  Use%  Mounted on
/dev/mapper/centos-root  50G    4.1G  46G    9%    /
devtmpfs                 3.9G   0     3.9G   0%    /dev
/dev/sdb1                60G    33M   60G    1%    /loddisk2
/dev/sda1                1014M  161M  854M   16%   /boot
                         50G    4.7G  46G    10%   /mnt/glusterfs
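A Gluster mount normally names its source as server:/volume. As a sketch, assuming the volume is mounted from server1, the helper below prints the equivalent mount -t glusterfs command together with an /etc/fstab line that makes the mount persistent. The function name print_mount_config is made up for this example.

```shell
#!/bin/sh
# Print the mount command and the matching /etc/fstab line for a Gluster volume.
# Usage: print_mount_config SERVER VOLUME MOUNTPOINT
print_mount_config() {
    server=$1; volume=$2; mountpoint=$3
    echo "mount -t glusterfs $server:/$volume $mountpoint"
    # _netdev delays the mount until the network is up at boot.
    echo "$server:/$volume $mountpoint glusterfs defaults,_netdev 0 0"
}

# server1 as the mount source is an assumption; server2 serves the same volume.
print_mount_config server1 testervol /mnt/glusterfs
```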

Test the volume
On the client, run
[root@manja17-I18064 ~]# touch /mnt/glusterfs/test1

Check on server1 or server2
[root@manja17-I18062 data]# pwd

[root@manja17-I18062 data]# ls
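For a slightly stronger check than touch, the sketch below writes a small file into a directory, reads it back, and removes it. It only checks read/write through the mount on the client, not replication between the servers. The function name smoke_test_dir is made up, and the demo runs on a scratch directory so it works anywhere.

```shell
#!/bin/sh
# Write a small file into a directory, read it back, and clean up.
# Returns 0 when the content read back matches what was written.
smoke_test_dir() {
    dir=$1
    file="$dir/.smoke_test_$$"
    expected="gluster smoke test $$"
    printf '%s\n' "$expected" > "$file" || return 1
    actual=$(cat "$file")
    rm -f "$file"
    [ "$actual" = "$expected" ]
}

# Demo on a scratch directory; pass /mnt/glusterfs on the client instead.
scratch=$(mktemp -d)
if smoke_test_dir "$scratch"; then
    echo "read/write OK"
else
    echo "read/write FAILED"
fi
```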

More to come. Happy learning :-)
