At the start of the pandemic I was migrating our platform to Kubernetes from virtual machines managed by Puppet. My primary goal was to build an observability system similar to what we had when we managed Puppet at Twitter prior to the acquisition. I started building the observability system with the official Prometheus community charts [1], but quickly ran into issues: the individual charts didn't work well with each other, and configuring them was frustratingly difficult. They weren't well integrated, so I switched to the kube-prometheus-stack [2] umbrella chart, which attempts to solve this integration problem.
The umbrella chart got us further but we quickly ran into operational challenges. Upgrading the chart introduced breaking changes we couldn’t see until they were applied, causing incidents. We needed to manage secrets securely so we mixed in ExternalSecrets with many of the charts. We decided to handle these customizations by implementing the rendered manifests pattern [3] using scripts in our CI pipeline.
These CI scripts got us further, but we found them costly to maintain. We had to be careful to execute them locally with the same context they ran with in CI. We realized we were reinventing tools to manage a hierarchy of Helm values.yaml files to inject into multiple charts.
We saw the value in the rendered manifests pattern but couldn't find an agreed-upon implementation. I'd been thinking about the comments from the Why are we templating YAML? [4][5] posts and wondering what an answer to that question would look like, so I built a Go command line tool to implement the pattern as a data pipeline. We still didn't have a good way to handle the data values, though. We were still templating YAML, which didn't catch errors early enough; it was too easy to render invalid resources that Kubernetes rejected.
I searched for a solution to manage and merge Helm values. A few HN comments mentioned CUE [6], and an engineer we worked with at Twitter used CUE to configure Envoy at scale, so I gave it a try. I quickly came to appreciate how CUE provides strong type checking and constraint validation, unifies all configuration data, and makes it clear where each value originates.
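As a rough sketch of what that buys you (a made-up example, nothing Holos-specific): constraints live next to the schema, values declared in different places unify into one structure, and anything inconsistent fails at evaluation time rather than after a render:

    #Project: {
        name:     string & =~"^[a-z][a-z0-9-]*$"  // lowercase DNS-ish names only
        replicas: int & >=1 & <=10
        domain:   string
    }

    // two declarations unify into one value; a conflict or an
    // out-of-range replicas count is an error before any YAML exists
    project: #Project & {name: "observability", domain: "example.com"}
    project: replicas: 3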
Take a look at Holos if you're looking to implement the rendered manifests pattern, or if, like us, you can't shake the feeling that integrating third party software into Kubernetes should be easier. We recently overhauled our docs to make it easier to get started and work locally on your device.
In the future we’re planning to use Holos much like Debian uses APT, to integrate open source software into a holistic k8s distribution.
[1]: <https://github.com/prometheus-community/helm-charts>
[2]: <https://github.com/prometheus-community/helm-charts/tree/mai...>
[3]: <https://akuity.io/blog/the-rendered-manifests-pattern>
[4]: Why are we templating YAML? (2019) - <https://news.ycombinator.com/item?id=19108787>
[5]: Why are we templating YAML? (2024) - <https://news.ycombinator.com/item?id=39101828>
[6]: <https://cuelang.org/>
My unpopular opinion is that config languages with logic are bad because they're usually very hard to debug. The best config language I've used is Bazel's Starlark, which is just Python with some additional restrictions.
Is GCL great? No. CUE and Dhall have better semantics. Do some people abuse it? Sure, but it's still much better than Helm templating.
I agree, it's better than Helm templating. But it's not better than Python!
I wrote up why we selected CUE here, with links to explanations from Marcel, who explains it better than I do: https://holos.run/blog/why-cue-for-configuration/
These comments in particular reminded me of Marcel's video linked at the bottom of that article, where he talks about the history of CUE in the context of configuration languages at Google: https://www.youtube.com/watch?v=jSRXobu1jHk
Python with Z3 and some AST magic can support constraint validation.
In fact, type checking can be seen as a form of that. I have a 3-year-old fork of Typpete, a Z3-based type checker, that should still work.
My gut feeling so far is that they don't know the benefits of using strictly typed languages. They only see the upfront cost in brain cycles and extra typing. They'd rather forgo that and run into problems once it's deployed.
CUE is a close cousin of Go; the authors are deeply involved in Go, and Marcel worked with Rob Pike on the design of CUE. I can see how it'd feel foreign; without first appreciating Go, maybe CUE wouldn't have clicked for me.
In both cases it's tedious to manage the Helm values, and they're usually managed without strong type checking or integration into your platform as a whole. For example, you might pass a domain name to one chart and a value derived from that domain to another chart, but the two charts likely use different field names for these inputs, so the values often end up inconsistent.
Holos uses CUE to unify the configuration into one holistic data structure, so we're able to look up data from well-defined structures and pass it into Helm. We have an example of how this helps integrate the Prometheus charts together at: https://holos.run/docs/v1alpha5/tutorial/helm-values/
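As a hedged sketch of the idea (the field names here are made up, not taken from any real chart): a single CUE value can feed both charts' values, so the derived hostname can't drift from the base domain:

    domain: "example.com"

    // hypothetical values for two charts that expect different field names
    grafanaValues: adminRootUrl:        "https://grafana.\(domain)/"
    prometheusValues: externalHostname: "prometheus.\(domain)"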
This unification of configuration into one data structure isn't limited to Helm. You can produce resources for Kustomize from CUE in the same way, something that's otherwise quite difficult because Kustomize doesn't support templating. You can also mix in resources to your existing Helm charts from CUE without needing to template YAML.
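A mixed-in resource is just more CUE data, roughly like this (an illustrative, partial sketch; the "mixins" field name and the secret details are made up, not the Holos schema):

    // an ExternalSecret expressed as plain structured data instead of templated YAML
    mixins: grafanaAdmin: {
        apiVersion: "external-secrets.io/v1beta1"
        kind:       "ExternalSecret"
        metadata: name: "grafana-admin"
        spec: secretStoreRef: {name: "default", kind: "ClusterSecretStore"}
    }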
This approach works equally well for in-house Helm charts you may have created to deploy your own software and for third party charts you're using to deploy off-the-shelf software.
How does it compare to timoni[0]?
We take a slightly different approach in a few important ways.
1. Holos stops short of applying manifests to a cluster, leaving that responsibility to existing tools like ArgoCD, Flux, or plain kubectl apply. We're intentional about leaving space for other tools to operate after manifests are rendered but before they're applied. For example, we pair nicely with Kargo for progressive rollouts. Kargo sits between Holos and the Kubernetes API and fits well with Holos because both tools are focused on the rendered manifests pattern.
2. Holos focuses on the integration of multiple Components into an overall Platform. I capitalized them because they mean specific things in Holos: a Component is our way of wrapping existing Helm charts, Kustomize bases, and plain YAML files in CUE to implement the rendering pipeline.
3. We're explicit about the rendering pipeline. Our intent is that any tool that generates k8s config manifests could be wrapped up in a Generator in our pipeline, and the output of that tool can then be fed into existing Transformers like Kustomize.
I built the rendering pipeline this way because I'd often take an upstream Helm chart, mix in some ExternalSecret resources or whatnot, then pipe the output through Kustomize to tweak some things here and there. Holos is a generalization of that. It's been useful because I no longer need to write any YAML to work with Kustomize; it's all pure data in CUE doing the same thing.
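In rough pseudo-CUE, a component in that pipeline looks something like this (the field names are illustrative, not the actual Holos schema):

    // hypothetical component: a Helm generator whose output flows through
    // a Kustomize transformer, with extra resources mixed in as plain data
    component: {
        generators: [{helm: chart: "kube-prometheus-stack"}]
        transformers: [{kustomize: patches: [...]}]
        mixins: [{kind: "ExternalSecret", metadata: name: "grafana-admin"}]
    }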
Those are the major differences. I'd summarize them as: Holos focuses on the integration layer, integrating multiple things together, kind of like an umbrella chart but using well-defined CUE as the method of integration. The output of each tool is another main difference: Holos outputs fully rendered manifests to the local file system, intended to be committed to a GitOps repo, while Timoni acts as a controller, applying manifests to the cluster.
You've greatly piqued my interest (we're currently using Timoni). However, how does this integration work exactly? We've been wanting to try Kargo for a while but it appears there's no support for passing off the YAML rendering to an external tool.
The specific integration point I explored last week was configuring Holos to write a Kustomize kustomization.yaml file alongside each of the components that have promotable artifacts. This is essentially just a stub for Kargo to come along and edit with a promotion step.
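Roughly, the stub holds nothing more than this (written here as CUE data; the resource file name is illustrative, see the repo linked below for the real thing):

    kustomization: {
        apiVersion: "kustomize.config.k8s.io/v1beta1"
        kind:       "Kustomization"
        resources: ["resources.gen.yaml"]
    }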
For example, the output of the Holos rendering process for our demo "Bank of Holos" front end is this directory:
https://github.com/holos-run/bank-of-holos/tree/v0.6.2/deplo...
Normally, ArgoCD would deploy this straight from Git, but we changed the Application resource we configure to instead look where Kargo points, which is a different branch. As an aside, I'm not sure I like the multiple-branches approach, but I'm following Kargo's docs initially and will then look at doing it again in the same branch using different folders.
https://github.com/holos-run/bank-of-holos/blob/v0.6.2/deplo...
Note the targetRevision: stage/dev field; Kargo is responsible for this branch.
The Kargo promotion stages are configured by this manifest Holos renders:
https://github.com/holos-run/bank-of-holos/blob/v0.6.2/deplo...
Holos writes this file out from the CUE configuration when it renders the platform. The corresponding CUE configuration responsible for producing that rendered manifest is at:
https://github.com/holos-run/bank-of-holos/blob/v0.6.2/proje...
This is all a very detailed way of saying: Holos renders the manifests, you commit them to Git, Kargo uses kustomize edit to patch in the promoted container image and renders the manifests _again_ somewhere else (a branch or folder), and ArgoCD applies the manifests Kargo writes.
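Concretely, that promotion step amounts to adding an images entry to the stub I mentioned above, something like this (the image name and tag are made up for illustration):

    // what `kustomize edit set image` adds to the stub during promotion
    kustomization: images: [{
        name:   "bank-of-holos-frontend"
        newTag: "v1.2.3"
    }]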
I'll write it up properly soon, but the initial notes from my spike last week are at https://github.com/holos-run/bank-of-holos/blob/main/docs/ka...
I recorded a quick video of it at https://www.youtube.com/watch?v=m0bpQugSbzA which is more a demo of Kargo than Holos really.
What we were concerned about was that we'd need to wait for Kargo to support arbitrary commands as promotion steps, something they're not targeting until v1.3 next year. Luckily, we work very well with their existing promotion steps, because we take care to stop once we write the rendered manifests, leaving it up to you to use additional tools like kubectl apply, Kargo, Flux, ArgoCD, etc.
You also want to worry about all of the above as little as possible, which is why an abstraction like Holos could be useful. I've been hoping to see an ergonomic application of CUE to k8s manifests for a while now.
Also k8s is not, in and of itself, a complication. You're going to have to worry about the same set of problems whether or not you use k8s. Deployments, load-balancing, self-healing, etc. It's not like ops teams didn't do anything before k8s came along.
Holos is designed to produce configuration for Kubernetes. That configuration can be as small as a 10-line YAML file, but in larger operations it often ends up being millions of lines of YAML spread over multiple clusters across the globe.
Once Holos produces the configuration, we stop. We leave it up to you to decide what to do next with it. For example, we pass it to ArgoCD to deploy; others pass it to Flux. In development I pass it directly to kubectl apply.
I also use Helm for some software, but I never edit it, just configure it using values.
Honestly it all seems like craziness, and the day Kustomize is not enough for me, I'll just write a small JavaScript program that generates JSON. No need to make it harder than that...
This allows one to easily stitch together both k8s and all your other infra.
* k8s yaml in the repo, kubectl
* kustomize
* tanka and jsonnet
* pulumi
So far jsonnet is the only method that I've used successfully for a long time. I'm working on some new stuff with pulumi, though.