Secure your Kubernetes Workloads with gVisor

Moshe Beladev
4 min read · Nov 11, 2022

We are all familiar with and love containers. Containers share the host kernel, which makes them a pretty portable and lightweight solution. On the other hand, that shared kernel means a compromised container can harm the host machine or operating system, a risk we can mitigate with sandbox technologies, and gVisor is one of them.

Let’s start with some motivation

Check out CVE-2020-14386, a Linux kernel vulnerability that enables container escape. Basically, it takes advantage of the CAP_NET_RAW capability to cause memory corruption, ultimately giving the attacker root access.

In Docker, this capability is enabled by default (probably because it is commonly used by networking tools such as ping and tcpdump, and might be needed for troubleshooting purposes).
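If you don’t need raw sockets, one mitigation that works even without gVisor is to drop the capability yourself. A minimal sketch with Docker:

docker run --rm --cap-drop=NET_RAW alpine ping -c 1 8.8.8.8
# Expected to fail with something like "ping: permission denied (are you root?)",
# since ping can no longer open a raw socket without CAP_NET_RAW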

Workloads running with gVisor were not affected by this CVE, which leads us to our next part: understanding what gVisor is and how we can enjoy it too.

What is gVisor?

gVisor is a container runtime sandbox: an isolated environment of processes in which containers run.
It’s basically an application kernel (written in Go 🎉) that implements a substantial portion of the Linux system call interface and provides an additional layer of isolation between running applications and the host operating system.
This architecture provides significant security while lowering the cost associated with virtualization.

How does it work?

Each sandbox has its own isolated instance of Sentry and Gofer.

Gofer — provides file system access to the containers
A host process that starts with each container and communicates with the Sentry. The Sentry process is started in a restricted seccomp container without access to file system resources, so the Gofer mediates all access to these resources, providing an additional level of isolation.

Sentry — application kernel for the containers
It supplies all the kernel functionality needed by the application, including system calls, signal delivery, memory management, and more.
When the application makes a system call, the call is redirected to the Sentry, which does the necessary work to service it.

For example, the Sentry is not able to open files directly: file system operations that extend beyond the sandbox are sent to the Gofer.
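You can see the Sentry in action with a quick check (a sketch, assuming runsc is installed and registered with Docker as the runsc runtime): ask a sandboxed container for its kernel logs.

docker run --rm --runtime=runsc alpine dmesg
# Prints gVisor's own boot messages (e.g. "Starting gVisor...") rather than the
# host kernel's log: the container is talking to the Sentry, not the host kernel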

Platform & runsc
The gVisor Platform implements interception of syscalls, basic context switching, and memory mapping functionality.

The entrypoint to running a sandboxed container is the runsc executable. runsc implements the OCI runtime specification, which is used by Docker and Kubernetes, resulting in a seamless integration.
By configuring the runsc runtime we can choose between a number of implementations for the platform (see the sketch after this list):

ptrace — utilizes PTRACE_SYSEMU to execute user code without allowing it to execute host system calls. It is widely available and is usually the better choice when running inside a VM or when virtualization is not supported.

KVM — uses Linux KVM to allow the Sentry to act as both guest OS and VMM (the component responsible for creating and managing VMs on the physical host machine). It provides better performance and is mostly used on bare metal rather than inside a virtual machine.

GKE Sandbox — uses a custom platform implementation that provides better performance than ptrace and KVM.
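Outside of GKE, here is a minimal sketch of picking a platform for Docker by registering runsc with a --platform runtime argument in /etc/docker/daemon.json (the binary path is an assumption, adjust it to your installation):

cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc",
      "runtimeArgs": ["--platform=kvm"]
    }
  }
}
EOF
# Restart Docker so the new runtime configuration takes effect
sudo systemctl restart docker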

What’s wrong with the existing container security tools?

That’s a good question. AppArmor, SELinux, and seccomp are some great examples of such tools; they can help us specify fine-grained policies for applications with native performance, and they mostly rely on hooks implemented in the kernel for enforcement. (To be fair, gVisor has trade-offs of its own: besides the additional overhead, we might face some compatibility issues with it.)

Having said that, in practice it’s extremely difficult to define such a policy universally, which makes these tools usable for very few applications, those whose surface is small enough to define a tailor-made security policy.
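To make that concrete, here is a hypothetical sketch of pinning a pod to a custom seccomp profile; the hard part is that profiles/audit.json (an illustrative path) has to allowlist every syscall the application makes:

# Hypothetical pod with a tailor-made seccomp profile
apiVersion: v1
kind: Pod
metadata:
  name: locked-down
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      # Illustrative: this profile must enumerate every syscall the app needs
      localhostProfile: profiles/audit.json
  containers:
  - name: app
    image: httpd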

I’m convinced, how can I get it?

Thanks to the OCI runtime specification, integrating gVisor is a piece of cake. Let’s demonstrate how you can run (and validate) your secure workloads on GKE in a few steps:

1. Make sure you have a node pool with the gVisor sandbox enabled. For example, if you create the pool with the gcloud CLI, make sure to include --sandbox type=gvisor (see: https://cloud.google.com/kubernetes-engine/docs/concepts/sandbox-pods):

gcloud container node-pools create smt-enabled \
    --cluster=CLUSTER_NAME \
    --machine-type=MACHINE_TYPE \
    --node-labels=cloud.google.com/gke-smt-disabled=false \
    --image-type=cos_containerd \
    --sandbox type=gvisor

2. Verify gVisor is enabled. This can be done using the following command:

› kubectl get runtimeclasses
NAME     HANDLER   AGE
gvisor   gvisor    1d

If you find a gvisor runtime class like the one above, it’s a good sign.

3. Make sure the spec.template.spec.runtimeClassName field in your pod spec YAML is set to gvisor. For example:

# httpd.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd
  labels:
    app: httpd
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      runtimeClassName: gvisor
      containers:
      - name: httpd
        image: httpd

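After applying the manifest, a quick sanity check (assuming the Deployment above) is to confirm the pod was admitted with the gvisor runtime class:

kubectl apply -f httpd.yaml
kubectl get pod -l app=httpd -o jsonpath='{.items[0].spec.runtimeClassName}'
# Should print: gvisor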
A tip specifically for GKE, to make sure all your configuration is wrapped up together: you can kubectl exec into your workload pods and make sure this command returns an empty response:

curl -s "http://metadata.google.internal/computeMetadata/v1/instance/attributes/kube-env" -H "Metadata-Flavor: Google"
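For example, against the httpd Deployment above (assuming curl is available in the container image):

kubectl exec -it deploy/httpd -- \
    curl -s "http://metadata.google.internal/computeMetadata/v1/instance/attributes/kube-env" \
    -H "Metadata-Flavor: Google"
# An empty response means the sandboxed pod cannot read the node's kube-env metadata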

To Sum Up

gVisor can be a great solution to secure your workloads, with a super-easy setup in Docker and Kubernetes environments thanks to the OCI runtime spec. Enjoy your newly (more) secured workloads!
