Monday, August 8, 2016

Monit - Monitor System

In a production environment, the Ops team manages the application and ensures that it performs as desired and stays up and running at all times. Continuous monitoring is the process and technology used to detect compliance and risk issues in an organization's operational environment.

This requires the Ops team to use tools that can monitor application performance and issues.
It is often the job of the Ops team to sit with the Dev team and build self-monitoring or analytics-gathering capabilities right into the application. This allows end-to-end monitoring.

In this article we will look at a monitoring tool called "monit", which not only monitors and manages server programs automatically but also ensures that they stay online. It can also check that the size, checksum and permissions of files are always correct.

Monit is often described as a process supervision tool. Process supervision is a form of operating system service management in which a master process remains the parent of the service processes.

Some of the advantages of Monit include:
1. Monit can react to an error condition. If a process such as apache or nginx goes down, monit can detect that and start it again.
2. It can monitor file systems, directories and files for changes such as timestamp, checksum or size changes.
3. It can monitor system resources such as CPU and memory.
4. It can monitor processes and ports, and can run programs that need to be executed at a specific time.
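
As a rough illustration of points 2 and 4, the sketch below shows what file and port checks can look like in Monit's control file. The service names (httpd_conf, local_web), the file path, the address and the thresholds are hypothetical and chosen only for this example.

```
# Hypothetical file check: alert on checksum, permission or size problems
check file httpd_conf with path /etc/httpd/conf/httpd.conf
  if changed checksum then alert
  if failed permission 644 then alert
  if size > 1 MB then alert

# Hypothetical port check: alert if nothing answers HTTP on port 80
check host local_web with address 127.0.0.1
  if failed port 80 protocol http then alert
```

Each check needs a unique service name, and "alert" can be swapped for another action such as "restart" when the check has start and stop programs defined.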

Installing Monit
To install monit, run "yum install monit" on a CentOS system. Once it is installed, we can see a file called "/etc/monitrc", the single configuration file for monit, in which we configure the resources to be monitored.

Monit pre-requisites
Before starting monit, a few modifications need to be made.

1. Uncomment the httpd port in the /etc/monitrc file.

Enable Httpd port

2. Uncomment "set httpd port 2812" and the lines that follow it, so that the monit web page can be checked using the httpd server:

set httpd port 2812 and
    use address  # only accept connection from localhost
    allow        # allow localhost to connect to the server and
    allow admin:monit      # require user 'admin' with password 'monit'

3. Make sure you use your IP address in place of localhost in the lines above.

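4. The Monit web page is served by Monit's own embedded HTTP server, which the "set httpd" lines above enable, so no separate web server needs to be started.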

5. To check whether the monit configuration is correct, we can use the "monit -t" command:

[root@puppet etc]# monit -t
Control file syntax OK

6. Start the monit process.
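
Putting steps 2 and 3 together, a filled-in web interface block might look like the sketch below. Both IP addresses are hypothetical placeholders from a documentation range; substitute the address monit should listen on and the host you will browse from.

```
set httpd port 2812 and
    use address 192.0.2.10   # hypothetical: address monit listens on
    allow 192.0.2.20         # hypothetical: client host allowed to connect
    allow admin:monit        # require user 'admin' with password 'monit'
```

It is worth running "monit -t" again after editing the block, since a syntax error here prevents monit from starting.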

Once the monit process has started successfully, we can see the Monit web page by accessing "IP address:2812". Use "admin" as the user name and "monit" as the password to access the web page.
Now that the process is started, let's start monitoring a process and a system resource.

1. Monitor a process - To monitor a process we can use the pid file for that process. Let's see how we can monitor the Jenkins process.

1. Check for the pid file for jenkins (check the jenkins script to see where it is created).

2. Once the jenkins pid file location is known, configure monit to monitor jenkins. Add the content below to the /etc/monitrc file:

#Check Jenkins
check process jenkins with pidfile /var/run/
  start program = "/etc/init.d/jenkins start" with timeout 60 seconds
  stop program  = "/etc/init.d/jenkins stop"

Monit uses its own Domain Specific Language (DSL). Service checks are declared with the keyword "check" followed by the service name. The service name can be anything, but it must be unique so that monit can identify the service. Monit uses this name to refer to the service internally and in all interactions with the user.

The snippet above checks the jenkins process by monitoring its pid file. It also gives monit the commands to start and stop the process, so that monit can start it again if it stops for some reason. The "with timeout 60 seconds" clause gives the start program up to 60 seconds to bring the process up. How often monit checks the process is set separately, by the "set daemon" statement in /etc/monitrc.
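
How often monit polls its services is controlled by the global "set daemon" statement in the control file. A minimal sketch, assuming a 60-second cycle to match the interval used in this article (the value on your system may differ):

```
# Assumed value: wake up and run all service checks every 60 seconds
set daemon 60
```

A shorter interval detects failures sooner at the cost of running the checks more often.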

For testing, stop the jenkins process and confirm that the jenkins pid file is removed. Once the process is stopped, wait for 60 seconds to see the log below in /var/log/monit.log:

[root@puppet run]# tail -f /var/log/monit.log
[EDT Aug  5 05:11:52] info     : Monit daemon with pid [18347] stopped
[EDT Aug  5 05:11:52] info     : 'puppet' Monit 5.14 stopped
[EDT Aug  5 05:30:10] info     : Starting Monit 5.14 daemon with http interface at []:2812
[EDT Aug  5 05:30:10] info     : Starting Monit HTTP server at []:2812
[EDT Aug  5 05:30:10] info     : Monit HTTP server started
[EDT Aug  5 05:30:10] info     : 'puppet' Monit 5.14 started
[EDT Aug  5 05:30:10] error    : 'jenkins' process is not running
[EDT Aug  5 05:30:10] info     : 'jenkins' trying to restart
[EDT Aug  5 05:30:10] info     : 'jenkins' start: /etc/init.d/jenkins
[EDT Aug  5 05:30:41] info     : 'jenkins' process is running with pid 21640

We can see that monit found the jenkins process was not running and started it again.

The web console also shows the jenkins process among the monitored services.

2. Monitor System Stats -

Let's write the snippet below in the monitrc file to monitor the load average of the local system.

check system $HOST
  if loadavg (1min) > 0.90 then alert
  if loadavg (5min) > 0.95 then alert

The snippet above raises an alert if the 1-minute load average of the system is more than 0.90, or if the 5-minute load average is more than 0.95. To test the condition, run the dd command "dd if=/dev/zero of=/dev/null" in another console, which will drive the load average up.

Once the load average exceeds the defined threshold, we can see the following in the log file.

[EDT Aug  5 05:55:13] info     : 'puppet' Monit 5.14 started
[EDT Aug  5 05:56:13] error    : 'puppet' loadavg(1min) of 0.9 matches resource limit [loadavg(1min)<0.8]

We can also see that the web console shows the status as "Resource Limit Reached".
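
The same "check system" block can also watch memory, swap and CPU, the other system resources mentioned at the start of this article. The sketch below extends the load average check; the extra thresholds are arbitrary example values, not recommendations.

```
check system $HOST
  if loadavg (1min) > 0.90 then alert
  if loadavg (5min) > 0.95 then alert
  # Assumed example thresholds for the remaining resources
  if memory usage > 75% then alert
  if swap usage > 25% then alert
  if cpu usage (user) > 70% then alert
  if cpu usage (system) > 30% then alert
```

Because service names must be unique, these lines belong in the one existing "check system" block rather than in a second block for the same host.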

More to come on Monit. Happy learning :-)
