More often than not organizations need to apply various kinds of policies on the environments where they run their applications. These policies might be required to meet compliance requirements, achieve a higher degree of security, achieve standardization across multiple environments, etc. This calls for an automated/declarative way to define and enforce these policies. Policy engines like OPA help us achieve the same.
Motivation behind Open Policy Agent (OPA)
When we run our application, it generally comprises multiple subsystems. Even in the simplest of cases, we will be having an API gateway/load balancer, 1–2 applications, and a database. Generally, all these subsystems will have different mechanisms for authorizing the requests, for example, the application might be using JWT tokens to authorize the request, but your database is using grants to authorize the request, it is also possible that your application is accessing some third-party APIs or cloud services which will again have a different way of authorizing the request. Add to this your CI/CD servers, your log server, etc and you can see how many different ways of authorization can exist even in a small system.
The existence of so many authorization models in our system makes life difficult when we need to meet compliance or information security requirements or even some self-imposed organizational policies. For example, if we need to adhere to some new compliance requirements then we need to understand and implement the same for all the components which do authorization in our system.
“The main motivation behind OPA is to achieve unified policy enforcements across the stack”
What are Open Policy Agent (OPA) and OPA Gatekeeper
The OPA is an open-source, general-purpose policy engine that can be used to enforce policies on various types of software systems like microservices, CI/CD pipelines, gateways, Kubernetes, etc. OPA was developed by Styra and is currently a part of CNCF.
OPA provides us with REST APIs which our system can call to check if the policies are being met for a request payload or not. It also provides us with a high-level declarative language, Rego which allows us to specify the policies we want to enforce as code. This provides us with lots of flexibility while defining our policies.
The above image shows the architecture of OPA. It exposes APIs which any service that needs to make an authorization or policy decision, can call (policy query), and then OPA can make a decision based on the Rego code for the policy and return a decision to the service that further processes the request accordingly. The enforcement is done by the actual service itself, OPA is responsible only for making the decision. This is how OPA becomes a general-purpose policy engine and supports a large number of services.
The Gatekeeper project is a Kubernetes specific implementation of the OPA. Gatekeeper allows us to use OPA in a Kubernetes native way to enforce the desired policies.
How Gatekeeper enforces policies
On the Kubernetes cluster, the Gatekeeper is installed as a ValidatingAdmissionWebhook. The Admission Controllers can intercept requests after they have been authenticated and authorized by the K8s API server, but before they are persisted in the database. If any of the admission controllers rejects the request then the overall request is rejected. The limitation of admission controllers is that they need to be compiled into the kube-apiserver and can be enabled only when the apiserver starts up.
To overcome this rigidity of the admission controller, admission webhooks were introduced. Once we enable admission webhooks controllers in our cluster, they can send admission requests to external HTTP callbacks and receive admission responses. Admission webhook can be of two types MutatingAdmissionWebhook and ValidatingAdmissionWebhook. The difference between the two is that mutating webhooks can modify the objects that they receive while validating webhooks cannot. The below image roughly shows the flow of an API request once both mutating and validating admission controllers are enabled.
The role of Gatekeeper is to simply check if the request meets the defined policy or not, that is why it is installed as a validating webhook.
Now we have Gatekeeper up and running in our cluster. The above installation also created a CRD named `constrainttemplates.templates.gatekeeper.sh’. This CRD allows us to create constraint templates for the policy we want to enforce. In the constraint template, we define the constraints logic using the Rego code and also its schema. Once the constraint template is created, we can create the constraints which are instances of the constraint templates, created for specific resources. Think of it as function and actual function calls, the constraint templates are like functions that are invoked with different values of the parameter (resource kind and other values) by constraints.
To get a better understanding of the same, let’s go ahead and create constraints templates and constraints.
The policy that we want to enforce is to prevent developers from creating a service of type LoadBalancer in the `dev` namespace of the cluster, where they verify the working of other code. Creating services of type LoadBalancer in the dev environment is adding unnecessary costs.
Below is the constraint template for the same.
In the constraint template spec, we define a new object kind/type which we will use while creating the constraints, then in the target, we specify the Rego code which will verify if the request meets the policy or not. In the Rego code, we specify a violation that if the request is to create a service of type LoadBalancer then the request should be denied.
Using the above template, we can now define constraints:
Here we have specified the kind of the Kubernetes object (Service) on which we want to apply the constraint and we have specified the namespace as dev because we want the constraint to be enforced only on the dev namespace.
Let’s go ahead and create the constraint template and constraint:
Note: After creating the constraint template, please check if its status is true or not, otherwise you will get an error while creating the constraints. Also it is advisable to verify the Rego code snippet before using them in the constraints template.
Now let’s try to create a service of type LoadBalancer in the dev namespace:
When we tried to create a service of type LoadBalancer in the dev namespace, we got the error that it was denied by the admission webhook due to `deny-lb-type-svc-dev-ns` constraint, but when we try to create the service in the default namespace, we were able to do so.
Here we are not passing any parameters to the Rego policy from our constraints, but we can certainly do so to make our policy more generic, for example, we can add a field named servicetype to constraint template and in the policy code, deny all the request where the servicetype value defined in the constraint matches the value of the request. With this, we will be able to deny service of types other than LoadBalancer as well in any namespace of our cluster.
Gatekeeper also provides auditing for resources that were created before the constraint was applied. The information is available in the status of the constraint objects. This helps us in identifying which objects in our cluster are not compliant with our constraints.
OPA allows us to apply fine-grained policies in our Kubernetes clusters and can be instrumental in improving the overall security of Kubernetes clusters which has always been a concern for many organizations while adopting or migrating to Kubernetes. It also makes meeting the compliance and audit requirements much simpler. There is some learning curve as we need to get familiar with Rego to code our policies, but the language is very simple and there are quite a few good examples to help in getting started.