Enforce allow/deny policies in Kubernetes at the admission controller level
I was at KubeCon in mid-November with an old roommate (who works in the same industry as me) and some coworkers, and got to see some cool Kubernetes stuff.
Since my job crosses the security boundary, and I’m fairly new to k8s, most of the talks I attended were either at the basic level or about OPA (Open Policy Agent). OPA gives you a lightweight language for writing policies your Kubernetes cluster should adhere to.
Traditionally (I think) this is run by building your policies into a sidecar and registering it as an admission controller. Gatekeeper simplifies this by replacing the sidecar with native CRDs, and adds a few extras like audit logging.
I’m going to build up an example of this using EKS.
I assume we have a working EKS cluster:
$ aws-vault exec home -- kubectl get all
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 172.20.0.1 <none> 443/TCP 6h15m
Before we deploy anything, let’s set up Gatekeeper v3:
$ aws-vault exec home -- kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/master/deploy/gatekeeper.yaml
Gatekeeper should now be running (albeit with no policies):
$ aws-vault exec home -- kubectl get all -n gatekeeper-system
NAME READY STATUS RESTARTS AGE
pod/gatekeeper-controller-manager-0 1/1 Running 1 144m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/gatekeeper-controller-manager-service ClusterIP 172.20.41.90 <none> 443/TCP 144m
NAME READY AGE
statefulset.apps/gatekeeper-controller-manager 1/1 144m
Next, before we move on, I’m going to set up policykit so that we can write real Rego (OPA’s policy language) instead of Rego embedded inside YAML — among other things, this allows policy testing later on:
$ pip install policykit
$ cat <<EOF >k8sallowedrepos.rego
package k8sallowedrepos

violation[{"msg": msg}] {
    container := input.review.object.spec.containers[_]
    satisfied := [good | repo = input.parameters.repos[_]; good = startswith(container.image, repo)]
    not any(satisfied)
    msg := sprintf("container <%v> has an invalid image repo <%v>, allowed repos are %v", [container.name, container.image, input.parameters.repos])
}

violation[{"msg": msg}] {
    container := input.review.object.spec.initContainers[_]
    satisfied := [good | repo = input.parameters.repos[_]; good = startswith(container.image, repo)]
    not any(satisfied)
    msg := sprintf("container <%v> has an invalid image repo <%v>, allowed repos are %v", [container.name, container.image, input.parameters.repos])
}
EOF
We now have a Rego file that validates container image repositories against a parameterized allow-list. Let’s write some tests.
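As a rough sketch, OPA unit tests for this policy might look like the following (hypothetical file `k8sallowedrepos_test.rego`, runnable alongside the policy with `opa test . -v`; the inputs mimic the AdmissionReview shape the policy expects):

```rego
package k8sallowedrepos

# A container whose image starts with an allowed repo should produce no violations.
test_image_from_allowed_repo_ok {
    count(violation) == 0 with input as {
        "review": {"object": {"spec": {"containers": [
            {"name": "app", "image": "alpine:3.10"}
        ]}}},
        "parameters": {"repos": ["alpine"]}
    }
}

# A container from any other repo should produce exactly one violation.
test_image_from_other_repo_denied {
    count(violation) == 1 with input as {
        "review": {"object": {"spec": {"containers": [
            {"name": "app", "image": "example.com/cloud-custodian:latest"}
        ]}}},
        "parameters": {"repos": ["alpine"]}
    }
}
```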
...snip noise...
{"level":"info","ts":1575315769.4250739,"logger":"controller","msg":"constraint","metaKind":"audit","count of constraints":1}
...snip noise...
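The audit log above shows one constraint registered; the snipped steps applied a ConstraintTemplate (which policykit can generate from the Rego file) and a matching Constraint. Going by the denial message later in the post — constraint name `default-repo-is-wrong`, allowed repos `["alpine"]` — the Constraint presumably looked something like this (a sketch following the standard Gatekeeper v3 `K8sAllowedRepos` convention, not the exact manifest used):

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: default-repo-is-wrong
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    repos:
      - "alpine"
```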
Let’s push up a Helm chart that violates it:
$ cat charts/cloud-custodian/values.yaml | grep repos
repository: 11111111111.dkr.ecr.us-east-1.amazonaws.com/cloud-custodian
$ helm package charts/cloud-custodian/
Successfully packaged chart and saved it to: charts/cloud-custodian-0.1.2.tgz
$ aws-vault exec home -- helm install --name cloud-custodian charts/cloud-custodian-0.1.2.tgz
NAME: cloud-custodian
LAST DEPLOYED: Mon Dec 2 13:44:47 2019
NAMESPACE: default
STATUS: DEPLOYED
RESOURCES:
==> v1/Deployment
NAME READY UP-TO-DATE AVAILABLE AGE
cloud-custodian 0/1 0 0 0s
==> v1/ServiceAccount
NAME SECRETS AGE
cloud-custodian 1 0s
Did it work?
$ aws-vault exec home -- kubectl get all
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 172.20.0.1 <none> 443/TCP 6h25m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cloud-custodian 0/1 0 0 18s
NAME DESIRED CURRENT READY AGE
replicaset.apps/cloud-custodian-744bd4768d 1 0 0 18s
It’s not running. Let’s look:
$ aws-vault exec home -- kubectl describe replicaset.apps/cloud-custodian-744bd4768d | tail -n 4
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 3s (x14 over 44s) replicaset-controller Error creating: admission webhook "validation.gatekeeper.sh" denied the request: [denied by default-repo-is-wrong] container <cloud-custodian> has an invalid image repo <11111111111.dkr.ecr.us-east-1.amazonaws.com/cloud-custodian:latest>, allowed repos are ["alpine"]
And just like that, we enforced a policy that prevented this deployment from running!
More to come on OPA as I get more comfortable with it and learn how to test policies before they go into Kubernetes!