Container technology grew out of the need to run certain workloads in isolation, not to protect those workloads but to protect everything else from them.
When working with dangerous, unverified software, we often use sandboxes. These are special environments that isolate programs and code and restrict them from accessing data outside the environment. A sandbox limits the software's network access, its interactions with the OS, and the other information it can reach.
Containers, on the other hand, are not the same as a sandbox. Applications running inside a container can still reach the kernel and compromise it. That is why security has become so important for containers. As we discussed, there are a couple of security mechanisms available for containers. One of them is seccomp.
Seccomp (short for secure computing) is a Linux kernel mechanism that lets you restrict the system calls a process can use. If an attacker compromises the process, seccomp will not let them use any system calls that the filter does not permit.
It is a Linux kernel feature that allows a user-space program to set up syscall filters. User space is the part of system memory in which applications run, as opposed to the kernel. We can use these filters to restrict the actions available inside a container: they define which system calls are permitted and which are not.
System Call - The programs we write are basically sets of instructions that get some work done. Sometimes a program hands certain operations off to an external entity to carry out on its behalf. These handed-off operations are called system calls, and the external entity is the kernel. Writing to a file, for example, is an operation that our program delegates to the kernel through a system call.
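To make this concrete, here is a small Python sketch (Python and the file name are my own choices for illustration, not something the rest of this post depends on). On Linux, each of the os functions below is a thin wrapper over a system call, so the program is asking the kernel to do the actual work.

```python
import os
import tempfile


def write_via_syscalls(path: str, data: bytes) -> int:
    """Write data to path using thin wrappers over kernel system calls."""
    # os.open asks the kernel for a file descriptor (an open-family syscall).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # os.write hands the bytes to the kernel (the write syscall).
        written = os.write(fd, data)
    finally:
        # os.close releases the descriptor (the close syscall).
        os.close(fd)
    return written


if __name__ == "__main__":
    # Hypothetical file name inside a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "hello.txt")
        print(write_via_syscalls(target, b"hello kernel\n"))
```

If a seccomp filter denied one of these calls, the matching os function would fail instead of doing its work.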
What is the problem - The kernel, as we discussed, has the highest level of access. Any exploit in the kernel therefore also runs at the highest level and can cause severe damage to the host. A program that uses such an exploit will now have the highest level of access as well.
The other problem is that there are many system calls, and new ones arrive with kernel updates. This means the kernel attack surface keeps growing. One way of making sure our programs do not cause damage is to set filters on the system calls they make. By whitelisting syscalls we implement least privilege, meaning that programs can use only the syscalls they really need. This is where seccomp filters come in, by enforcing least privilege. Seccomp uses Berkeley Packet Filter (BPF) rules to filter syscalls and control how they are handled. These filters can significantly limit a container's access to the Docker host's Linux kernel.
Check whether seccomp is enabled in the kernel on the machine using,
[root@ip-172-31-36-247 ~]# grep SECCOMP /boot/config-$(uname -r)
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y
CONFIG_SECCOMP=y
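The same check can be scripted. Below is a minimal Python sketch that parses kernel config text and reports whether the two seccomp options are set to y. It assumes the config lives under /boot as in the grep above, which is not true on every distribution, and the function name is my own.

```python
import platform
from pathlib import Path

# Options that should be "y" for seccomp filters to work.
SECCOMP_OPTIONS = ("CONFIG_SECCOMP", "CONFIG_SECCOMP_FILTER")


def seccomp_options(config_text: str) -> dict:
    """Map each seccomp option to True if the config text sets it to 'y'."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_SECCOMP is not set".
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if value == "y":
            enabled.add(key)
    return {opt: opt in enabled for opt in SECCOMP_OPTIONS}


if __name__ == "__main__":
    # Assumption: same location the grep command above reads from.
    config_path = Path("/boot") / f"config-{platform.release()}"
    if config_path.exists():
        print(seccomp_options(config_path.read_text()))
    else:
        print(f"{config_path} not found; the config may be kept elsewhere")
```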
Check whether seccomp is enabled for the Docker runtime using,
[root@ip-172-31-36-247 ~]# docker info | grep seccomp
WARNING: You're not using the default seccomp profile
seccomp
Profile: /etc/docker/seccomp.json
If the above output does not contain a line with seccomp, then seccomp is not available to Docker, most likely because it is not enabled in the kernel.
Docker and Seccomp - Docker has supported seccomp since version 1.10. Docker has its own JSON-based domain-specific language for defining seccomp profiles. These profiles are then compiled into seccomp filters. When we run a container, it gets the default seccomp profile unless we override it with the --security-opt flag of the run command.
For example, the following command starts an ubuntu container in interactive mode, overriding the default seccomp profile with profile.json.
docker run -it --rm --security-opt seccomp=profile.json ubuntu /bin/bash
The seccomp JSON file that we pass is sent to the Docker daemon, where it is compiled into a filter using a Go wrapper around the libseccomp library. Docker's default seccomp profile takes a whitelist approach: it lists the allowed syscalls, and only syscalls on this whitelist are permitted.
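As a rough illustration of the whitelist idea, the Python sketch below builds a profile that denies everything by default and allows only the syscalls we name. It follows the JSON layout used by the profiles in this post; the function name and the tiny syscall list are hypothetical, and a real container needs far more syscalls than this just to start.

```python
import json

# Architectures as listed in the profiles shown in this post.
ARCHITECTURES = ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"]


def build_allowlist_profile(allowed_syscalls):
    """Return a profile dict that denies everything except allowed_syscalls."""
    return {
        # Anything not listed below fails with an error.
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": list(ARCHITECTURES),
        "syscalls": [
            {"name": name, "action": "SCMP_ACT_ALLOW", "args": []}
            for name in sorted(set(allowed_syscalls))
        ],
    }


if __name__ == "__main__":
    # Hypothetical, deliberately tiny list for illustration only.
    profile = build_allowlist_profile(["read", "write", "exit_group"])
    print(json.dumps(profile, indent=2))
```

The printed JSON could be saved to a file and passed with --security-opt seccomp=<file>, though with a list this short I would expect the container to fail at startup.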
As we already discussed, Docker has other security mechanisms, such as Linux capabilities and AppArmor, and we need to take them out of the picture while testing seccomp profiles so that they do not interfere. To grant all capabilities and disable AppArmor, pass these arguments to docker run: --cap-add ALL --security-opt apparmor=unconfined
Let us start with a profile that blocks every system call:
[root@ip-172-31-36-247 centos]# cat block-all.json
{
"defaultAction": "SCMP_ACT_ERRNO",
"architectures": [
"SCMP_ARCH_X86_64",
"SCMP_ARCH_X86",
"SCMP_ARCH_X32"
],
"syscalls": [
]
}
Now run the container as,
[root@ip-172-31-36-247 centos]# docker run --rm -it --cap-add ALL --security-opt apparmor=unconfined --security-opt seccomp=block-all.json ubuntu sh
Unable to find image 'ubuntu:latest' locally
Trying to pull repository docker.io/library/ubuntu ...
latest: Pulling from docker.io/library/ubuntu
898c46f3b1a1: Pull complete
63366dfa0a50: Pull complete
041d4cd74a92: Pull complete
6e1bee0f8701: Pull complete
Digest: sha256:d019bdb3ad5af96fa1541f9465f070394c0daf0ffd692646983f491ce077b70f
Status: Downloaded newer image for docker.io/ubuntu:latest
/usr/bin/docker-current: Error response from daemon: exit status 1: "cannot start a container that has run and stopped\none or more of container start failed\n".
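The container cannot be started, because block-all.json does not allow any syscalls and its default action is to return an error. Let's see another example, this time blocking one specific action. In the next profile, I am blocking all mkdir operations inside the container (the file is named allow-all-block-chmod.json, but the syscall it blocks is mkdir). The profile looks like this,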
[root@ip-172-31-36-247 centos]# cat allow-all-block-chmod.json
{
"defaultAction": "SCMP_ACT_ALLOW",
"architectures": [
"SCMP_ARCH_X86_64",
"SCMP_ARCH_X86",
"SCMP_ARCH_X32"
],
"syscalls": [
{
"name": "mkdir",
"action": "SCMP_ACT_ERRNO",
"args": []
}
]
}
In the syscalls section, the action I defined for the mkdir syscall tells seccomp to return an error, while the default action allows everything else. If I run the container with this profile,
[root@ip-172-31-36-247 centos]# docker run --rm -it --cap-add ALL --security-opt apparmor=unconfined --security-opt seccomp=allow-all-block-chmod.json ubuntu mkdir /tmp/hello
mkdir: cannot create directory '/tmp/hello': Operation not permitted
We can see that the operation is not permitted. This was a short introduction to the seccomp security mechanism.
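As a small aid for reasoning about profiles like the two above without starting a container, here is a Python sketch that looks up which action a profile assigns to a syscall name. It is only a model of the name matching: it ignores the args and architectures fields, it is not how Docker or libseccomp evaluate filters internally, and the function and constant names are my own.

```python
import json


def action_for(profile: dict, syscall: str) -> str:
    """Return the action a profile assigns to a syscall name (args are ignored)."""
    for rule in profile.get("syscalls", []):
        if rule.get("name") == syscall:
            return rule["action"]
    # No rule matched, so the default action applies.
    return profile["defaultAction"]


# Trimmed copies of the two profiles from this post (architectures omitted).
BLOCK_ALL = json.loads('{"defaultAction": "SCMP_ACT_ERRNO", "syscalls": []}')
BLOCK_MKDIR = json.loads(
    '{"defaultAction": "SCMP_ACT_ALLOW",'
    ' "syscalls": [{"name": "mkdir", "action": "SCMP_ACT_ERRNO", "args": []}]}'
)

if __name__ == "__main__":
    for name in ("mkdir", "write"):
        print(name, action_for(BLOCK_ALL, name), action_for(BLOCK_MKDIR, name))
```

With the block-all profile every name falls through to the default error action, which matches the container failing to start; with the second profile only mkdir gets the error action.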