My Technical Works: Distributed File System

Applications often need to read shared data, or need a place to upload files. A shared file system serves both purposes: services such as NFS or Samba provide shared drives that applications can access from anywhere on the network. But what if those shared drives need to be highly available?

Keeping data highly available is crucial, so that every application and user can still reach it when a machine fails. High availability is achieved by distributing the data across multiple nodes, or across multiple volumes on multiple nodes.

Once mounted, the storage looks like local storage to client machines and users, even though the volume behind it is distributed across several servers.

Imagine many users performing heavy read/write operations on the same NFS volume. The memory or CPU of the single machine hosting that volume becomes a bottleneck under the load. What if we could combine the memory, processing power and disks of two machines into a single volume accessed by the clients? This is where distributed file systems come into the picture.

In computing, a distributed file system (DFS) or network file system is any file system that allows files to be accessed from multiple hosts over a computer network. This lets multiple users on multiple machines share files and storage resources. In our case, we will set up two machines, each with a directory, and share those directories to the outside world as a single volume.

What is GlusterFS?

GlusterFS does exactly this: it combines multiple storage servers into one large, distributed drive. GlusterFS is an open-source, scalable network file system suited to data-intensive workloads such as media streaming, storage and content delivery.

In this article we will see how to configure GlusterFS on CentOS 7 machines.

We will use three machines: two servers that provide the volume, and one client.

1. Set up three machines with CentOS 7.

2. Add the details of all three machines to the /etc/hosts file on each machine. Below is my configuration:

10.131.224.54 server1.example.com server1
10.131.224.149 server2.example.com server2
10.131.225.130 client.example.com client
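Since the same three lines must go into /etc/hosts on every machine, a small idempotent helper avoids duplicate entries when the step is repeated. This is a minimal sketch in POSIX shell; the function name add_host_entry is hypothetical, and it treats an entry as already present if any non-comment line lists the same host name.

```shell
# add_host_entry FILE IP FQDN SHORTNAME  (hypothetical helper, not part of CentOS)
# Appends "IP FQDN SHORTNAME" to FILE unless a non-comment line already
# lists FQDN as a host name, so re-running it never adds duplicates.
add_host_entry() (
    file=$1 ip=$2 fqdn=$3 short=$4
    if awk -v h="$fqdn" '
        /^[ \t]*#/ { next }                                  # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == h) found = 1 } # fields 2..NF are names
        END { exit !found }
    ' "$file" 2>/dev/null; then
        echo "skip: $fqdn already in $file"
    else
        printf '%s %s %s\n' "$ip" "$fqdn" "$short" >> "$file"
        echo "added: $fqdn"
    fi
)
```

Usage, as root on each of the three machines: `add_host_entry /etc/hosts 10.131.224.54 server1.example.com server1`, and likewise for server2 and client.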

3. Add the EPEL (Extra Packages for Enterprise Linux) repository to CentOS 7 using the commands below:

wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

rpm -ivh epel-release-latest-7.noarch.rpm

4. Create a GlusterFS repo: create the file /etc/yum.repos.d/gluster.repo with the content below:

[root@manja17-I18062 ~]# cat /etc/yum.repos.d/gluster.repo

[gluster41]
name=Gluster 4.1
baseurl=http://mirror.centos.org/centos/7/storage/x86_64/gluster-4.1/
gpgcheck=0
enabled=1
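The same repo file can be written non-interactively with a here-document, which is handy when preparing several servers. A sketch, assuming it runs as root; the function name write_gluster_repo and its optional directory argument (defaulting to /etc/yum.repos.d) are illustrative and not part of GlusterFS or yum.

```shell
# write_gluster_repo [DIR]  (hypothetical helper)
# Writes DIR/gluster.repo (DIR defaults to /etc/yum.repos.d) with exactly
# the settings shown above. An existing file of that name is overwritten.
write_gluster_repo() (
    dir=${1:-/etc/yum.repos.d}
    cat > "$dir/gluster.repo" <<'EOF'
[gluster41]
name=Gluster 4.1
baseurl=http://mirror.centos.org/centos/7/storage/x86_64/gluster-4.1/
gpgcheck=0
enabled=1
EOF
)
```

After running `write_gluster_repo`, `yum repolist` should list the gluster41 repository.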

5. Install GlusterFS (and Samba):

yum install glusterfs-server samba -y

Then enable the glusterd service so it starts at boot, and start it now:

systemctl enable glusterd.service

systemctl start glusterd.service

6. Check the GlusterFS version:

[root@manja17-I18063 ~]# glusterfsd --version
glusterfs 4.1.5
Repository revision: git://git.gluster.org/glusterfs.git
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.

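7. Stop the firewall with systemctl stop firewalld. This only lasts until the next reboot; also run systemctl disable firewalld to keep it off permanently.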

8. If you keep the firewall running instead of stopping it in step 7, make sure TCP ports 111, 24007, 24008 and 24009 are open on server1 and server2.
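If you keep firewalld running, the ports can be opened with firewall-cmd. The sketch below only prints the commands, so you can review them before piping them to sh. It assumes GlusterFS 3.4 or later, where each brick listens on its own port counting up from 49152 (24009 was the brick port in older releases); the helper name gluster_firewall_cmds is hypothetical.

```shell
# gluster_firewall_cmds [BRICKS]  (hypothetical helper)
# Prints, without running them, firewall-cmd commands that open the
# GlusterFS ports on a server hosting BRICKS bricks (default 1).
gluster_firewall_cmds() (
    bricks=${1:-1}
    # 111 (rpcbind/portmapper) and 24007-24009: the ports listed in step 8
    echo "firewall-cmd --permanent --add-port=111/tcp"
    echo "firewall-cmd --permanent --add-port=24007-24009/tcp"
    # Assumption: GlusterFS 3.4+ gives each brick a port counting up from 49152
    if [ "$bricks" -eq 1 ]; then
        echo "firewall-cmd --permanent --add-port=49152/tcp"
    else
        echo "firewall-cmd --permanent --add-port=49152-$((49152 + bricks - 1))/tcp"
    fi
    # --permanent rules only take effect after a reload
    echo "firewall-cmd --reload"
)
```

Usage on each server, as root, after reviewing the output: `gluster_firewall_cmds 1 | sh`.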

9. Next, configure the trusted storage pool by adding server2 to server1's pool. Run the gluster command from server1:

[root@manja17-I18062 ~]# gluster peer probe server2.example.com

peer probe: success.

10. Check the peer status:

[root@manja17-I18062 ~]# gluster peer status
Number of Peers: 1

Hostname: server2.example.com
Uuid: 9964b028-585d-4cf1-b67b-4bbf9be2a976
State: Peer in Cluster (Connected)
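When the setup is scripted, it helps to wait until every peer reports Connected before creating a volume. A minimal sketch: peers_connected is a hypothetical helper that reads gluster peer status output on stdin and succeeds only if the number of "Peer in Cluster (Connected)" lines equals the "Number of Peers" value, assuming the output layout shown above.

```shell
# peers_connected  (hypothetical helper)
# Reads `gluster peer status` output on stdin; exits 0 only when at least
# one peer exists and every peer is "Peer in Cluster (Connected)".
peers_connected() (
    awk '
        /^Number of Peers:/ { expected = $NF + 0 }
        /^State: Peer in Cluster \(Connected\)/ { ok++ }
        END { exit !(expected > 0 && ok == expected) }
    '
)
```

Usage on server1: `gluster peer status | peers_connected && echo "pool ready"`.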

11. Now let's create a volume named testervol with two replicas. Because we want every file mirrored on both servers, the replica count here equals the number of servers (more generally, the number of bricks must be a multiple of the replica count). The brick directory is /data on both machines, and to the outside world the volume is shown as testervol. The trailing force keyword is needed because /data sits on the root partition, which GlusterFS otherwise refuses to use for a brick.

[root@manja17-I18062 ~]# gluster volume create testervol replica 2 transport tcp server1.example.com:/data server2.example.com:/data force

volume create: testervol: success: please start the volume to access data

Once this is done, the /data directory exists on both machines.
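The "Number of Bricks: 1 x 2 = 2" line in the volume info of step 13 is this arithmetic: bricks are grouped into replica sets of REPLICA bricks each, so the brick count must be a multiple of the replica count. A sketch of a hypothetical helper, gluster_create_cmd, that checks this and prints (but does not run) the create command in the same form as above, including force.

```shell
# gluster_create_cmd VOLUME REPLICA BRICK...  (hypothetical helper)
# Checks that the brick count is a positive multiple of REPLICA, then prints
# the layout ("sets x replica = bricks") and the matching create command.
gluster_create_cmd() (
    vol=$1 replica=$2
    shift 2
    n=$#
    if [ "$replica" -lt 1 ] || [ "$n" -eq 0 ] || [ $((n % replica)) -ne 0 ]; then
        echo "error: $n bricks is not a multiple of replica $replica" >&2
        exit 1
    fi
    echo "layout: $((n / replica)) x $replica = $n"
    # force mirrors step 11: it allows bricks on the root partition
    echo "gluster volume create $vol replica $replica transport tcp $* force"
)
```

For the two bricks in this article it prints `layout: 1 x 2 = 2` followed by the exact command from step 11.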

12. Start the volume:

[root@manja17-I18063 data]# gluster volume start testervol

volume start: testervol: success

13. Check the volume information:

[root@manja17-I18063 data]# gluster volume info
Volume Name: testervol
Type: Replicate
Volume ID: 7196569d-2a27-4cc9-9918-acb952b42549
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: server1.example.com:/data
Brick2: server2.example.com:/data
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
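To check the volume state from a script rather than by eye, one field can be pulled out of the gluster volume info output with awk. A minimal sketch, assuming the "Field: value" layout shown above; volume_field is a hypothetical helper name.

```shell
# volume_field VOLUME FIELD  (hypothetical helper)
# Reads `gluster volume info` output on stdin and prints the value of FIELD
# (e.g. Status, Type, "Number of Bricks") for the volume named VOLUME.
volume_field() (
    awk -v vol="$1" -v field="$2" '
        /^Volume Name:/ { cur = $3 }             # track which volume we are in
        cur == vol && index($0, field ": ") == 1 {
            print substr($0, length(field) + 3)  # text after "FIELD: "
            exit
        }
    '
)
```

Usage: `gluster volume info testervol | volume_field testervol Status` prints Started once step 12 has succeeded.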

By default, every client can access the volume, so you should restrict which clients are allowed to access it.
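One way to restrict access is the auth.allow volume option, which takes a comma-separated list of client addresses (wildcards such as 10.131.225.* are also accepted). A sketch of a hypothetical helper that only prints the gluster volume set command for review; in this setup the only client is 10.131.225.130.

```shell
# auth_allow_cmd VOLUME ADDRESS...  (hypothetical helper)
# Prints, without running it, the command that limits VOLUME to the given
# client addresses. The list is single-quoted so a wildcard such as
# 10.131.225.* is not glob-expanded if the output is piped to sh.
auth_allow_cmd() (
    vol=$1
    shift
    if [ $# -eq 0 ]; then
        echo "usage: auth_allow_cmd VOLUME ADDRESS..." >&2
        exit 1
    fi
    list=$(printf '%s,' "$@")   # "a,b," : join with commas
    echo "gluster volume set $vol auth.allow '${list%,}'"
)
```

Usage on a server: `auth_allow_cmd testervol 10.131.225.130 | sh`.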

Setting up the GlusterFS Client (on the client machine)

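1. Make sure the GlusterFS FUSE client is installed (if not, install it with yum install glusterfs-fuse -y), then create a mount point for the volume under /mnt: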

[root@manja17-I18064 ~]# mkdir /mnt/glusterfs

2. Mount the volume

[root@manja17-I18064 ~]# mount.glusterfs server1.example.com:/testervol /mnt/glusterfs
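The mount above does not survive a reboot. A common approach is an /etc/fstab entry with the _netdev option, so the mount waits for the network, and optionally backup-volfile-servers, so the client can fetch the volume layout from server2 if server1 is down at mount time (once mounted, the client talks to both bricks directly). A sketch; add_fstab_entry is a hypothetical helper that skips the write if the mount point is already configured.

```shell
# add_fstab_entry FILE SERVER VOLUME MOUNTPOINT [BACKUP_SERVERS]  (hypothetical)
# Appends a glusterfs line to FILE unless MOUNTPOINT is already configured.
# BACKUP_SERVERS is a colon-separated list passed as backup-volfile-servers.
add_fstab_entry() (
    file=$1 server=$2 vol=$3 mnt=$4 backup=$5
    opts=defaults,_netdev
    if [ -n "$backup" ]; then
        opts="$opts,backup-volfile-servers=$backup"
    fi
    # field 2 of an fstab line is the mount point; ignore comment lines
    if awk -v m="$mnt" '!/^[ \t]*#/ && $2 == m { found = 1 } END { exit !found }' \
        "$file" 2>/dev/null; then
        echo "skip: $mnt already in $file"
        exit 0
    fi
    printf '%s:/%s %s glusterfs %s 0 0\n' "$server" "$vol" "$mnt" "$opts" >> "$file"
)
```

Usage on the client, as root: `add_fstab_entry /etc/fstab server1.example.com testervol /mnt/glusterfs server2.example.com`, then `mount -a` to test the entry.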

3. Check for the volume

[root@manja17-I18064 ~]# mount | grep testervol

server1.example.com:/testervol on /mnt/glusterfs type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

[root@manja17-I18064 ~]# df -h
Filesystem                      Size  Used Avail Use% Mounted on
/dev/mapper/centos-root          50G  4.1G   46G   9% /
devtmpfs                        3.9G     0  3.9G   0% /dev
/dev/sdb1                        60G   33M   60G   1% /loddisk2
/dev/sda1                      1014M  161M  854M  16% /boot
server1.example.com:/testervol   50G  4.7G   46G  10% /mnt/glusterfs
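The same check can be done from a script. A minimal sketch: is_gluster_mount is a hypothetical helper that reads either mount output (as shown above) or /proc/mounts on stdin and succeeds if the given directory is mounted with type fuse.glusterfs.

```shell
# is_gluster_mount MOUNTPOINT  (hypothetical helper)
# Reads `mount` output or /proc/mounts on stdin; exits 0 if MOUNTPOINT is
# mounted with filesystem type fuse.glusterfs.
is_gluster_mount() (
    awk -v m="$1" '
        # `mount` format:      SRC on DIR type FSTYPE (OPTIONS)
        $2 == "on" && $3 == m && $5 == "fuse.glusterfs" { found = 1 }
        # /proc/mounts format: SRC DIR FSTYPE OPTIONS 0 0
        $2 == m && $3 == "fuse.glusterfs" { found = 1 }
        END { exit !found }
    '
)
```

Usage on the client: `is_gluster_mount /mnt/glusterfs < /proc/mounts && echo mounted`.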

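Note that df reports 50G for /mnt/glusterfs, the size of a single brick rather than the sum of both servers' disks: in a replicated volume every file is stored on both servers. Now test the volume.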

On the client, run:

[root@manja17-I18064 ~]# touch /mnt/glusterfs/test1

Then check on server1 or server2:

[root@manja17-I18062 data]# pwd
/data
[root@manja17-I18062 data]# ls
test1
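To confirm the file reached every brick, check each server's /data directory. A minimal sketch: check_replicated is a hypothetical helper that tests local directories, so run it on each server (for example `check_replicated test1 /data`). Only read the bricks this way; writes should always go through the mounted volume, because files written directly into a brick bypass GlusterFS.

```shell
# check_replicated NAME DIR...  (hypothetical helper)
# Reports whether NAME exists in each brick directory DIR and exits
# non-zero if any of them is missing it.
check_replicated() (
    name=$1
    shift
    status=0
    for dir in "$@"; do
        if [ -e "$dir/$name" ]; then
            echo "ok: $dir/$name"
        else
            echo "missing: $dir/$name"
            status=1
        fi
    done
    exit "$status"
)
```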

More to come. Happy learning :-)


Sunday, October 7, 2018

Distributed File System - GlusterFS
