
Sunday, May 24, 2020

Cloud Custodian - Security and Governance for Cloud


Onboarding hundreds of developers and applications onto the public cloud can certainly be a case study in chaos engineering. The risks of runaway bills, security incidents and compliance violations can lead to many challenges.

There is a huge need to enable developers to adopt modern cloud native practices, while enforcing guardrails that keep them from shooting themselves in the foot. Organizations can greatly increase developer satisfaction and productivity by making the cloud easy, compliant and secure.

Bringing in Cloud Custodian
Cloud Custodian is an open source framework and rules engine used to manage your cloud. It allows users to define policies that keep a well managed cloud both secure and cost optimized.

Cloud Custodian can manage AWS, Azure and GCP environments by ensuring real-time compliance with security policies, managing costs via garbage collection of unused resources, and handling off-hours resource management.

Cloud Custodian manages the cloud using policies written in a YAML file. Users specify policies on resource types like EC2, Redshift, etc. in this file. A policy is essentially a rule, placed in the YAML file, that identifies resources that are not compliant. Cloud Custodian integrates with the cloud native capabilities of each provider to deliver real-time monitoring and enforcement of policies, with built-in provisioning.

A policy can also be passed to Cloud Custodian on the command line, or it can run as a simple cron job on a server against large existing fleets.
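For example, a cron entry along these lines (the schedule and the paths here are illustrative assumptions, not from the original setup) would evaluate a policy file against the account every hour:

# m h dom mon dow   command
0 * * * * /opt/custodian/bin/custodian run -s /var/log/custodian /opt/custodian/policies.yml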

Installing and running Cloud Custodian
Installing and running Cloud Custodian is very easy. All we need is a machine with Python installed. For the demo, I have used a CentOS EC2 instance with Python pre-installed. Once the Python installation is confirmed, run the commands below.

[root@ip-192-168-1-60 centos]# pip install virtualenv
[root@ip-192-168-1-60 centos]# yum install -y git
[root@ip-192-168-1-60 centos]# virtualenv --python=python2 custodian
[root@ip-192-168-1-60 centos]# source custodian/bin/activate
(custodian) [root@ip-192-168-1-60 centos]# pip install c7n
(custodian) [root@ip-192-168-1-60 centos]# pip install awscli
(custodian) [root@ip-192-168-1-60 centos]# aws configure

Note - virtualenv creates a virtual environment: a self-contained directory tree that contains a Python installation for a particular version of Python, plus a number of additional packages. Once a virtualenv is created, activate it.

Once the virtualenv is activated, install c7n (Cloud Custodian) and awscli. After installation, configure AWS access using the "aws configure" command.

aws configure prompts for your AWS account access key and secret. This is what allows Cloud Custodian to access your AWS account.
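The prompts look roughly like the following (the values shown are placeholders, not real credentials; the region matches the one used later in this post):

(custodian) [root@ip-192-168-1-60 centos]# aws configure
AWS Access Key ID [None]: AKIAXXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: <your-secret-access-key>
Default region name [None]: eu-west-1
Default output format [None]: json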

Writing your First Policy
Once the installation is done, let's write our first policy. Policies are written in YAML format.

Policy 1: Find all EC2 instances that are currently running.
(custodian) [root@ip-192-168-1-60 centos]# cat first.yml
policies:
  - name: my-first-policy
    resource: ec2
    filters:
      - "State.Name": running

The policy is quite easy to understand.

  • name: A machine-readable name for the policy (here, my-first-policy).
  • resource: A short identifier for the AWS resource type to act on (ec2, rds, s3, etc.).
  • filters: A list of filters that determine which resources the policy will act on.
  • actions: A list of actions to perform on the matching resources.

Validate, Dry run and Execute the policy
Validation: Once the policy is written, validate it using,

(custodian) [root@ip-192-168-1-60 centos]# custodian validate first.yml
2020-02-27 08:16:21,439: custodian.commands:INFO Configuration valid: first.yml

This will report any validation errors, if there are any.
Dry run the policy: Once validated, do a dry run. This will evaluate the policy but will not execute the actions defined in it.

(custodian) [root@ip-192-168-1-60 centos]# custodian run --dryrun -s . first.yml
2020-02-27 08:17:21,088: custodian.policy:INFO policy:my-first-policy resource:ec2 region:eu-west-1 count:1 time:0.49

The count value is important here. We can see that we have 1 instance in the running state now.
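The dry run also writes its findings to disk. With "-s .", the matched resources can be inspected under a directory named after the policy (a quick sketch, assuming Custodian's default output layout):

(custodian) [root@ip-192-168-1-60 centos]# ls my-first-policy/
custodian-run.log  metadata.json  resources.json
(custodian) [root@ip-192-168-1-60 centos]# python -m json.tool my-first-policy/resources.json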

Execute the policy: To execute the policy with all the actions defined, run,
(custodian) [root@ip-192-168-1-60 centos]# custodian run --output-dir=. first.yml
2020-02-27 08:17:48,291: custodian.policy:INFO policy:my-first-policy resource:ec2 region:eu-west-1 count:1 time:0.00

Policy 2: Stop all instances that are in the running state, have the Custodian tag defined, and are of instance type "t2.small".

policies:
  - name: my-second-policy
    resource: aws.ec2
    filters:
      - "State.Name": running
      - "tag:Custodian": present
      - "InstanceType": "t2.small"
    actions:
      - stop

The above YAML file is quite easy to understand. We have added an actions element to the policy. Once the matching resources are found using the filters element, the action is executed on those results. In this case, any instances that are in the running state, have the Custodian tag defined and are of instance type "t2.small" will be stopped.
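As with the first policy, it is worth validating and dry-running before letting the stop action fire. Assuming the policy above is saved as second.yml (the file name is just an example), the cycle looks like this:

(custodian) [root@ip-192-168-1-60 centos]# custodian validate second.yml
(custodian) [root@ip-192-168-1-60 centos]# custodian run --dryrun -s . second.yml
(custodian) [root@ip-192-168-1-60 centos]# custodian run -s . second.yml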

Advanced Policy: Run a cron job every minute to find all instances that are of type "t2.micro" and have the Custodian tag defined, and stop them.

policies:
  - name: custodian_tag_state_check_cron
    resource: ec2
    mode:
      role: arn:aws:iam::941303440747:role/custodian
      type: periodic
      schedule: 'cron(0/1 * * * ? *)'
    filters:
      - "tag:custodian": present
      - "InstanceType": "t2.micro"
    actions:
      - stop

Most of the elements are self-explanatory. The new element is mode, which specifies how the policy will be run: periodic, pull, cloudtrail, etc. In the above policy, periodic creates a CloudWatch Events rule that triggers a Lambda function on a given schedule. The schedule is specified using the CloudWatch Events schedule expression syntax.
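For comparison, a cloudtrail-mode policy reacts to API events instead of a schedule. Below is a minimal sketch (the policy name and filter are made up for illustration; it reuses the role from the policy above and assumes Custodian's RunInstances event shortcut) that evaluates instances as soon as they are launched:

policies:
  - name: check-tag-on-launch
    resource: ec2
    mode:
      type: cloudtrail
      role: arn:aws:iam::941303440747:role/custodian
      events:
        - RunInstances
    filters:
      - "tag:custodian": absent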

What happens exactly: when the periodic policy above is run for the first time, two things are created.

A CloudWatch Events rule is created that fires every minute. Each time it fires, it triggers the Lambda function, which evaluates the policy, i.e. finds all instances of type "t2.micro" that have the Custodian tag defined and passes them to the stop action.

A Lambda function is created to hold the policy logic, and it is this Lambda that actually stops the matching instances. The Lambda is created by Cloud Custodian itself. If you look at the Lambda functions available, you can see one created for custodian_tag_state_check_cron, matching the policy name defined in the YAML file.


Another important element in the above YAML file is the role element under mode. It defines the role that grants the Lambda execution permission. I have created a role named custodian and defined its trust relationship with the following assume role policy,

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
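One way to create such a role is with the AWS CLI (a sketch; the trust.json file name is assumed to contain the document above, and the managed policy shown only covers basic Lambda execution - the role will also need permissions to describe and stop EC2 instances):

(custodian) [root@ip-192-168-1-60 centos]# aws iam create-role --role-name custodian --assume-role-policy-document file://trust.json
(custodian) [root@ip-192-168-1-60 centos]# aws iam attach-role-policy --role-name custodian --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole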

This role's ARN is what is referenced in the policy file.

Hope this helps in understanding the basics of Cloud Custodian and how it can be used in your cloud environment to apply security and governance.

Container Registries - A War

One of the main advantages of using containers is continuous availability. Rather than taking the whole system down, the container running a microservice can be replaced on the fly. Developers prepare a new container image with the updated microservice, swap it in for the existing image, and a new version is up and running.

It is always advisable to archive or store different versions of an image for safety or rollback purposes. But there can be a lot of different versions or tags of the same image available. This is where image registries and repositories come into the picture.
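In Docker terms, keeping versions around is just a matter of tagging and pushing each build. A quick sketch (the registry host and image names here are hypothetical):

docker build -t myapp:1.3 .
docker tag myapp:1.3 registry.example.com/team/myapp:1.3
docker push registry.example.com/team/myapp:1.3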
 
Repository vs. Registry
Before we dig into container images and registries, it is important to understand the difference between a container registry and a container repository. At the core, both of them do the same job, but with a few differences.

A container repository is used to store a collection of related images, i.e. container manifests of the same name, for setup and deployment. We can access the images in container repositories via secure HTTPS endpoints and perform operations like push, pull, and managing images.

A container registry, on the other hand, stores a collection of repositories as well as indexes, access control rules, API paths, etc. A Docker registry can be hosted by a third party as a public or private registry, for example:
  • Docker Hub
  • Quay
  • Google Container Registry
  • AWS Elastic Container Registry
or you can host the Docker registry yourself.

A Docker repository is a collection of different Docker images with the same name that have different tags. For example, if we check the link "https://hub.docker.com/r/library/python/tags/", there are many different tags of the official Python image; these tags are members of the official Python repository, which is hosted by the Docker Registry.
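The naming scheme makes this concrete: a pull command breaks down into registry, repository and tag. Using the official Python image above as an example,

docker pull docker.io/library/python:3.8-slim
# registry: docker.io, repository: library/python, tag: 3.8-slim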

Why Use a Container Image Registry?
Developers used to run their code on physical or virtual machines, creating packages along with co-dependencies in unique versions for operating systems and machine variants. With the arrival of containers things changed, allowing developers to compose small, portable units called images that are bundled with all necessary packages and dependencies, run anywhere, and can be deployed using automation.

In the old model, when a problem arose, developers were asked to analyze and patch running systems one at a time. In the new model of containers, developers continuously produce new container image versions to fix issues and add features. These newer versions flow into a pipeline and reside in specialized cataloged storage, waiting for further processing steps like vulnerability validation and image scanning, followed by deployment.

That specialized cataloged storage is what we call an image registry. During the entire process, the registry remains the source of truth for the images we want to run. The main advantage is that containers created from the same image run identically everywhere.

Public vs Private Registries
Once we start using containers and images, the next question is where we store our images. There are two options in this case: public and private registries.

Public registries - Public container registries are generally the faster and easier route when starting out with a container registry. They are ideal for smaller teams and for applications that are less critical to the organization. Public registries also provide additional facilities: image scanning for vulnerabilities, webhooks (trigger actions after a successful push to a repository, to integrate Docker Hub with other services), and builds (automatically build container images from source code repositories like GitHub and push them to Docker Hub). They also give us access to official images as well as publisher images.

Private registries - When it comes to securing the code, we can't keep the images in a public registry. A private registry is a container registry set up by the organization's IT teams. Private registries are either hosted or on-premises and are typically used by larger organizations or enterprises that are more committed to using a container registry. Having complete control over the registry in development allows an organization more freedom in how it chooses to manage it. This is why private registries are seen as the more secure route when implementing a container registry, as an organization can apply as many security measures as it feels are needed.
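The simplest possible private registry is the open source Docker registry image itself. A minimal local sketch (no TLS or authentication configured, so suitable only for experimentation):

docker run -d -p 5000:5000 --name registry registry:2
docker tag myapp:1.3 localhost:5000/myapp:1.3
docker push localhost:5000/myapp:1.3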

Security - Public registries are seen as less secure because container images may contain malicious or outdated code which, if unpatched, could lead to attacks and data breaches. It may also be unclear who has read or write access to an image. This drives the need for private registries in organizations. If security is the priority in an organization, then the first move is to implement a private registry.

Available Players
There are many players in the container registry space currently.

Docker Hub - Docker Hub is the most popular among the players and a standard for open source container images. It provides free storage for your images; you need a premium plan if you want to host internally, which also adds image scanning facilities to identify vulnerabilities in images. Other advantages of Docker Hub are its ease of use and integration options, along with the ability to automate things with webhooks and builds. Its features are limited compared with registries like Quay that offer comprehensive access management, and it only supports Docker images. Setting up Docker Hub in house can require additional work, like installing necessary dependencies beforehand. Another drawback when using Docker Hub internally is that when the hard disk fills up, it is very hard to manage and delete the unnecessary images.

Quay.io - A container registry from Red Hat designed to offer enterprise-level features. Besides basic container management, it also offers detailed access control, logging, auditing, comprehensive access management and additional security features. Quay also integrates with the open source container security scanning tool called Clair, which helps scan images for vulnerabilities as they are pushed or pulled, with notifications to alert you about vulnerabilities. Quay.io has a beautiful and easy-to-understand web based UI.

Harbor - VMware's entry into the container world provides some good tools, and the Harbor container registry is one such tool. It is an enterprise-class registry server that stores and distributes container images. Harbor extends the open source Docker Distribution by adding functionality that enterprises need, like security, identity and management, with enhanced performance.


Harbor is an open source trusted cloud native registry project that stores, signs, and scans content. One of the best features of the Harbor registry is garbage collection. As we deal with multiple repositories and container images, there will be many images for different environments like development, staging and production. Storing them without any deletion strategy eventually leaves no space on the host machine. Harbor lets you delete an image first (a soft deletion) and then run the garbage collector, which takes care of cleaning up the disk space, links, etc.


The mission of Harbor is to provide users in cloud native environments with the ability to confidently manage and securely serve container images. Here are some of the features of the Harbor registry:

  • Ability to scan and sign container images
  • Multi-tenant content signing and validation
  • Security and vulnerability analysis
  • Identity integration and role-based access control
  • Image replication between instances
  • Extensible API and graphical UI
  • Internationalization (currently English and Chinese)
  • Audit logging
  • Label management


A Harbor registry can be deployed as a standalone registry using Docker Compose, or to a Kubernetes cluster using a Helm chart.
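For the Kubernetes route, the official Harbor Helm chart can be installed in a couple of commands (shown with default values as a sketch; the release name my-harbor is arbitrary):

helm repo add harbor https://helm.goharbor.io
helm install my-harbor harbor/harbor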


Cloud based registries - Another option is to use the cloud based registries from major players like AWS, Azure and Google Cloud.
AWS ECR (Elastic Container Registry), Google Container Registry and Azure Container Registry are some of the options for hosting container images provided by cloud platforms. All of these registries integrate very well with the other services of the respective cloud. Since all cloud platforms provide security services by default, access management facilities come built in. These cloud registries also provide image scanning facilities, hooks to automate container deployments, build management, auditing and logging. Some provide additional facilities, like Azure managing network latency by leveraging its vast cloud computing network to make sure the closest clusters are used, and GCP being one of the most affordable options on the market, charging only for the storage and bandwidth you use, and also offering image encryption.

Other types - Besides public and private repositories for storing containers, there are other platforms available which can store not just container images but other types of artifacts, archives and packages. If the organization's purpose is not just to store container images but also to store code, Java archives, Python packages, operating system packages or other types of software, we can go with JFrog Artifactory, Sonatype Nexus Repository or Cloudsmith Package for storing any type of artifact or package.

Choosing a registry
Container registry comparison is a matter of understanding your platform and application requirements and finding the one that suits your needs best. Taking the cloud environment into account will help in choosing well.
With so many choices, here are a few factors to consider while choosing a registry:
  • Do we need an on-premises or hosted registry? Some registries from cloud platforms only work with cloud based services; others can only run on local servers or on-premises.
  • Do you want to host things in addition to container images? Most container registries are designed for the sole purpose of hosting container images. However, some, such as Artifactory, can host other types of files, too. The latter are a better fit if you're looking to build a repository for more than just Docker images.
  • If security is a priority, then we need to focus on security-focused registries like the FlawCheck container registry or the Quay registry.
  • Do you want tight integration with a particular container stack? Container stacks like OpenShift provide an internal, integrated container registry that can be used to store images, though they support external registries too.
  • Can the registry be exposed on the web, where developers can use basic (REST) API calls to perform or trigger actions on the images?
  • Does the registry provide integration facilities like webhooks to trigger deployments when things change?
  • Auditing facilities, where all operations on the repositories are tracked.
  • A graphical user portal where users can easily browse and search repositories, and manage projects.
  • Image authenticity, digital signing and content trust facilities - content trust provides the ability to use digital signatures for data sent to and received from remote Docker registries. These signatures allow client-side verification of the integrity and publisher of specific image tags.
  • Image deletion and garbage collection - facilities to identify unused images and unreferenced layers in the images, and to clean them up.
  • Repository replication - image replication to replicate repositories from one registry instance to another, and to other regions or locations.
  • Label management - labels are used to isolate image resources globally or at the project level.
  • Build file management - besides storing container images, it is often required to store the build file, like a Dockerfile, Helm chart, or DeploymentConfig, along with the image.
  • Role-based access control: users and repositories are organized into projects, and users can have different permissions for the images in different projects.
  • Integration with internal authentication tools like Active Directory or LDAP, to allow people working in the organization to perform actions on the registries or images.
  • Does it have organization and team support, so that each team has control over its own images?
  • Does it support multiple storage models for use with the registries?
  • Does it have a facility to support faster downloads? For example, Red Hat Quay has BitTorrent downloads to decrease wait times.
More to Come, Happy Learning :-)