Sunday, May 24, 2020

Cloud Custodian - Security and Governance for Cloud


Onboarding hundreds of developers and applications onto a public cloud can easily turn into a case study in chaos engineering. Runaway bills, security gaps and compliance violations are all real risks.

Developers need the freedom to adopt modern cloud native practices, but within guardrails that keep them from shooting themselves in the foot. Organizations can greatly increase developer satisfaction and productivity by making the cloud easy to use, compliant and secure.

Bringing in Cloud Custodian
Cloud Custodian is an open source framework and rules engine for managing your cloud. It lets users define policies that keep a cloud environment well managed, secure and cost optimized.

Cloud Custodian can manage AWS, Azure and GCP environments. It supports real-time compliance with security policies, cost management through garbage collection of unused resources, and off-hours resource management.

Cloud Custodian manages the cloud through policies written in a YAML file. Each policy targets a resource type such as EC2 or Redshift and acts as a rule that finds resources that are not compliant. Custodian integrates with the cloud native capabilities of each provider to offer real-time monitoring and enforcement of policies with built-in provisioning.

A policy file can be run directly from the command line, or scheduled as a simple cron job on a server to run against large existing fleets.

Installing and running Cloud Custodian
Installing and running Cloud Custodian is easy. All we need is a machine with Python installed. For this demo I used a CentOS EC2 instance with Python preinstalled. Once you have confirmed that Python is available, run the commands below.

[root@ip-192-168-1-60 centos]# pip install virtualenv
[root@ip-192-168-1-60 centos]# yum install -y git
[root@ip-192-168-1-60 centos]# virtualenv --python=python3 custodian
[root@ip-192-168-1-60 centos]# source custodian/bin/activate
(custodian) [root@ip-192-168-1-60 centos]# pip install c7n
(custodian) [root@ip-192-168-1-60 centos]# pip install awscli
(custodian) [root@ip-192-168-1-60 centos]# aws configure

Note: virtualenv creates a virtual environment, a self-contained directory tree that contains a Python installation for a particular version of Python, plus a number of additional packages. Once the virtualenv is created, activate it.

With the virtualenv active, install c7n (Cloud Custodian) and awscli, then configure the AWS CLI using the “aws configure” command.

When prompted by aws configure, provide the access key ID and secret access key of your AWS account. Cloud Custodian uses these credentials to access your account.

Writing your First Policy
Once the installation is done, let's write our first policy. Policies are written in YAML.

Policy 1: Find all EC2 instances that are currently running.
(custodian) [root@ip-192-168-1-60 centos]# cat first.yml
policies:
  - name: my-first-policy
    resource: ec2
    filters:
      - "State.Name": running

The policy is easy to read. A policy is built from the following elements:

  • name: A machine-readable name for the policy, here my-first-policy.
  • resource: A short identifier for the AWS resource type to act on (ec2, rds, s3, etc.).
  • filters: A list of filters that determine which resources the policy will act on.
  • actions: A list of actions to perform on the matching resources. This first policy defines none, so it only reports matches.
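
To make the filters element more concrete, here is a small Python sketch of how a key such as "State.Name" can be matched against the nested description of an instance. This is only a simplified model for illustration: the helper names lookup and matches and the sample instances are made up, and Custodian's real filter engine is considerably richer.

```python
def lookup(resource, dotted_key):
    """Walk a nested dict using a dotted path such as 'State.Name'."""
    value = resource
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(resource, filters):
    """Return True only when every {key: expected} filter matches."""
    return all(
        lookup(resource, key) == expected
        for flt in filters
        for key, expected in flt.items()
    )


# Hypothetical, trimmed-down EC2 descriptions used only for illustration.
instances = [
    {"InstanceId": "i-aaa", "State": {"Name": "running"}},
    {"InstanceId": "i-bbb", "State": {"Name": "stopped"}},
]

# Same shape as the filters list in first.yml.
filters = [{"State.Name": "running"}]

running = [i["InstanceId"] for i in instances if matches(i, filters)]
print(running)
```

Only the instance whose State.Name is running survives the filter, which is the set the policy would report.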

Validate, Dry run and Execute the policy
Validation: Once the policy is written, validate it using:

(custodian) [root@ip-192-168-1-60 centos]# custodian validate first.yml
2020-02-27 08:16:21,439: custodian.commands:INFO Configuration valid: first.yml

This reports any validation errors.
Dry run the policy: Once the policy is validated, do a dry run. This evaluates the filters but does not execute the actions defined in the policy.

(custodian) [root@ip-192-168-1-60 centos]# custodian run --dryrun -s . first.yml
2020-02-27 08:17:21,088: custodian.policy:INFO policy:my-first-policy resource:ec2 region:eu-west-1 count:1 time:0.49

The count value is important here. It shows that one instance is currently in the running state.

Execute the policy: To execute the policy with all of its defined actions, run it without the --dryrun flag.
(custodian) [root@ip-192-168-1-60 centos]# custodian run --output-dir=. first.yml
2020-02-27 08:17:48,291: custodian.policy:INFO policy:my-first-policy resource:ec2 region:eu-west-1 count:1 time:0.00
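
The log line only shows a count. Custodian also writes the matched resources to a resources.json file in a folder named after the policy inside the output directory, so with the command above that is ./my-first-policy/resources.json. Below is a minimal Python sketch for listing the matched instance IDs from that file. The function names are my own, and the sketch assumes the file holds a JSON list of EC2 instance descriptions.

```python
import json
from pathlib import Path


def load_matched_resources(output_dir, policy_name):
    """Read the resources a policy run matched from its resources.json file."""
    path = Path(output_dir) / policy_name / "resources.json"
    with path.open() as handle:
        return json.load(handle)


def instance_ids(resources):
    """Pull the InstanceId out of each EC2 instance description."""
    return [resource["InstanceId"] for resource in resources]


if __name__ == "__main__":
    # Only prints something when run in the directory used as --output-dir.
    if (Path(".") / "my-first-policy" / "resources.json").exists():
        print(instance_ids(load_matched_resources(".", "my-first-policy")))
```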

Policy 2: Stop all instances that are running, have a Custodian tag defined and are of instance type “t2.small”.

policies:
  - name: my-second-policy
    resource: aws.ec2
    filters:
      - "State.Name": running
      - "tag:Custodian": present
      - "InstanceType": "t2.small"
    actions:
      - stop

This policy adds an actions element. Custodian first selects resources using the filters and then runs each action on the results. All filters must match, so stop is executed only on instances that are running, have the Custodian tag defined and are of type “t2.small”.
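
The selection logic of this second policy can be sketched in plain Python as follows. This is an illustration under assumptions, not how Custodian is implemented: the sample instances are invented, and stop here is a stand-in that only records which IDs would be stopped instead of calling AWS.

```python
def tag_value(instance, name):
    """Return the value of a tag, or None when the tag is absent.

    EC2 reports tags as a list of {"Key": ..., "Value": ...} pairs,
    and tag keys are case sensitive.
    """
    for tag in instance.get("Tags", []):
        if tag["Key"] == name:
            return tag["Value"]
    return None


def select(instances):
    """Mimic the three filters of my-second-policy; all of them must hold."""
    return [
        i for i in instances
        if i["State"]["Name"] == "running"
        and tag_value(i, "Custodian") is not None
        and i["InstanceType"] == "t2.small"
    ]


def stop(instances):
    """Stand-in for the stop action: report the IDs that would be stopped."""
    return [i["InstanceId"] for i in instances]


# Hypothetical fleet used only for illustration.
fleet = [
    {"InstanceId": "i-1", "State": {"Name": "running"}, "InstanceType": "t2.small",
     "Tags": [{"Key": "Custodian", "Value": "yes"}]},
    {"InstanceId": "i-2", "State": {"Name": "running"}, "InstanceType": "t2.small"},
    {"InstanceId": "i-3", "State": {"Name": "stopped"}, "InstanceType": "t2.small",
     "Tags": [{"Key": "Custodian", "Value": "yes"}]},
    {"InstanceId": "i-4", "State": {"Name": "running"}, "InstanceType": "t2.micro",
     "Tags": [{"Key": "Custodian", "Value": "yes"}]},
]

stopped = stop(select(fleet))
print(stopped)
```

Each of the other three instances fails exactly one filter (missing tag, wrong state, wrong type), so only i-1 is selected.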

Advanced Policy: Run every minute on a schedule to find all instances of type “t2.micro” that have a custodian tag defined, and stop them.

policies:
  - name: custodian_tag_state_check_cron
    resource: ec2
    mode:
      role: arn:aws:iam::941303440747:role/custodian
      type: periodic
      schedule: 'cron(0/1 * * * ? *)'
    filters:
      - "tag:custodian": present
      - "InstanceType": "t2.micro"
    actions:
      - stop

Most of the elements are self-explanatory. The new element is mode, which specifies how the policy is run, for example periodic, pull or cloudtrail. In the above policy, periodic creates a CloudWatch Events rule that triggers a Lambda function on the given schedule. The schedule is written as a CloudWatch Events schedule expression, here a cron expression.
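
The cron expression has six fields rather than the five of a classic Unix crontab: minutes, hours, day of month, month, day of week and year. A small Python sketch that splits the expression from the policy into named fields makes this easier to read. The parse_cron helper is my own and only splits the string; it does not validate the values.

```python
# Field order used by CloudWatch Events cron expressions.
CRON_FIELDS = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def parse_cron(expression):
    """Split an expression like 'cron(0/1 * * * ? *)' into named fields."""
    if not (expression.startswith("cron(") and expression.endswith(")")):
        raise ValueError("expected an expression of the form cron(...)")
    parts = expression[len("cron("):-1].split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(CRON_FIELDS), len(parts))
        )
    return dict(zip(CRON_FIELDS, parts))


fields = parse_cron("cron(0/1 * * * ? *)")
for name in CRON_FIELDS:
    print(name, "=", fields[name])
```

Here minutes is 0/1, meaning every minute starting at minute 0, and day_of_week is ?, because day of month and day of week cannot both be specified in the same expression.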

What happens exactly: When the above policy is run for the first time, two things are created.

A CloudWatch Events rule is created with the one-minute schedule. The rule does not inspect any instances itself; every minute it simply invokes the Lambda function.

A Lambda function is created by Cloud Custodian itself. Each time it is invoked, it runs the policy: it finds all instances of type “t2.micro” that have the custodian tag defined and stops them. In the list of Lambda functions you will see one for this policy, named after the policy in the YAML file (by default Custodian adds a custodian- prefix, giving custodian-custodian_tag_state_check_cron).

Another important element in the above YAML file is role, under mode. It specifies the IAM role that the Lambda function runs as. I created a role named custodian and defined its trust relationship with the following assume-role policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
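
If the Lambda service is missing from the trust relationship, the function cannot assume the role. As a quick sanity check, the following Python sketch lists which services a trust policy document allows to call sts:AssumeRole. The trusted_services helper is hypothetical and only handles the simple statement shapes used above.

```python
import json

# The trust relationship shown above, as a JSON string.
TRUST_POLICY = """
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow",
     "Principal": {"Service": "ec2.amazonaws.com"},
     "Action": "sts:AssumeRole"},
    {"Effect": "Allow",
     "Principal": {"Service": "lambda.amazonaws.com"},
     "Action": "sts:AssumeRole"}
  ]
}
"""


def trusted_services(policy_document):
    """List the service principals allowed to assume the role."""
    services = []
    for statement in policy_document.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. a "*" principal, which names no service
        service = principal.get("Service", [])
        if isinstance(service, str):
            service = [service]
        services.extend(service)
    return services


services = trusted_services(json.loads(TRUST_POLICY))
print("lambda.amazonaws.com" in services)
```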

The ARN of this role is what goes into the role element of the policy file.

Hope this helps in understanding the basics of Cloud Custodian and how it can be used to apply security and governance to your cloud environment.
