At the start of the pandemic I was migrating our platform to Kubernetes from virtual machines managed by Puppet. My primary goal was to build an observability system similar to what we had when we managed Puppet at Twitter prior to the acquisition. I started building the observability system with the official Prometheus community charts [1], but quickly ran into issues: the individual charts didn't work well with each other, and configuring them was frustratingly difficult. They weren't well integrated, so I switched to the kube-prometheus-stack [2] umbrella chart, which attempts to solve this integration problem.
The umbrella chart got us further but we quickly ran into operational challenges. Upgrading the chart introduced breaking changes we couldn’t see until they were applied, causing incidents. We needed to manage secrets securely so we mixed in ExternalSecrets with many of the charts. We decided to handle these customizations by implementing the rendered manifests pattern [3] using scripts in our CI pipeline.
These CI scripts got us further, but we found them costly to maintain. We had to be careful to execute them locally with the same context they ran with in CI. We realized we were reinventing tools to manage a hierarchy of Helm values.yaml files to inject into multiple charts.
We saw the value in the rendered manifests pattern but couldn't find an agreed-upon implementation. I'd been thinking about the comments from the Why are we templating YAML? [4][5] posts and wondering what an answer to that question would look like, so I built a Go command line tool to implement the pattern as a data pipeline. We still didn't have a good way to handle the data values, though. We were still templating YAML, which didn't catch errors early enough; it was too easy to render invalid resources that Kubernetes rejected.
I searched for a solution to manage and merge Helm values. A few HN comments mentioned CUE [6], and an engineer we worked with at Twitter used CUE to configure Envoy at scale, so I gave it a try. I quickly came to appreciate how CUE provides strong type checking and constraint validation, unifies all configuration data, and makes it clear where each value originates.
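As a rough sketch of what that buys you (a made-up example, nothing Holos-specific): constraints live next to the schema, values declared in different places unify into one structure, and anything inconsistent fails at evaluation time rather than after a render:

    #Project: {
        name:     string & =~"^[a-z][a-z0-9-]*$"  // lowercase DNS-ish names only
        replicas: int & >=1 & <=10
        domain:   string
    }

    // two declarations unify into one value; a conflict or an
    // out-of-range replicas count is an error before any YAML exists
    project: #Project & {name: "observability", domain: "example.com"}
    project: replicas: 3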
Take a look at Holos if you're looking to implement the rendered manifests pattern, or if, like us, you can't shake the feeling that integrating third party software into Kubernetes should be easier. We recently overhauled our docs to make it easier to get started and work locally on your device.
In the future we’re planning to use Holos much like Debian uses APT, to integrate open source software into a holistic k8s distribution.
[1]: <https://github.com/prometheus-community/helm-charts>
[2]: <https://github.com/prometheus-community/helm-charts/tree/mai...>
[3]: <https://akuity.io/blog/the-rendered-manifests-pattern>
[4]: Why are we templating YAML? (2019) - <https://news.ycombinator.com/item?id=19108787>
[5]: Why are we templating YAML? (2024) - <https://news.ycombinator.com/item?id=39101828>
[6]: <https://cuelang.org/>
My unpopular opinion is that config languages with logic are bad because they're usually very hard to debug. The best config language I've used is Bazel's Starlark, which is just Python with some additional restrictions.
Is GCL great? No. CUE and Dhall have better semantics. Do some people abuse it? Sure, but it's still much better than Helm templating.
I agree, it's better than Helm templating. But it's not better than Python!
I wrote up why we selected CUE here, with links to explanations from Marcel, who explains it better than I do: https://holos.run/blog/why-cue-for-configuration/
These comments in particular reminded me of Marcel's video linked at the bottom of that article, where he talks about the history of CUE in the context of configuration languages at Google: https://www.youtube.com/watch?v=jSRXobu1jHk
Python with Z3 and some AST magic can support constraint validation.
In fact, type checking can be seen as a form of that. I have a 3-year-old fork of Typpete, a Z3-based type checker, that should still work.
My gut feeling so far is that they don't know the benefits of using strictly typed languages. They only see the upfront cost in brain cycles and extra typing. They'd rather forgo that and run into problems once it's deployed.
CUE is a close cousin of Go; the authors are deeply involved in Go, and Marcel worked with Rob Pike on the design of CUE. I can see how it'd feel foreign; without first appreciating Go, maybe CUE wouldn't have clicked for me.
In both cases it's tedious to manage the Helm values, and they're usually managed without strong type checking or integration into your platform as a whole. For example, you might pass a domain name to one chart and a value derived from that domain to another chart, but the two charts likely use different field names for these inputs, so the values often end up inconsistent.
Holos uses CUE to unify the configuration into one holistic data structure, so we're able to look up data from well-defined structures and pass it into Helm. We have an example of how this helps integrate the Prometheus charts together at: https://holos.run/docs/v1alpha5/tutorial/helm-values/
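As a hedged sketch of the idea (the field names here are made up, not taken from any real chart): a single CUE value can feed both charts' values, so the derived hostname can't drift from the base domain:

    domain: "example.com"

    // hypothetical values for two charts that expect different field names
    grafanaValues: adminRootUrl:        "https://grafana.\(domain)/"
    prometheusValues: externalHostname: "prometheus.\(domain)"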
This unification of configuration into one data structure isn't limited to Helm. You can produce resources for Kustomize from CUE in the same way, something that's otherwise quite difficult because Kustomize doesn't support templating. You can also mix in resources to your existing Helm charts from CUE without needing to template YAML.
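A mixed-in resource is just more CUE data, roughly like this (an illustrative, partial sketch; the "mixins" field name and the secret details are made up, not the Holos schema):

    // an ExternalSecret expressed as plain structured data instead of templated YAML
    mixins: grafanaAdmin: {
        apiVersion: "external-secrets.io/v1beta1"
        kind:       "ExternalSecret"
        metadata: name: "grafana-admin"
        spec: secretStoreRef: {name: "default", kind: "ClusterSecretStore"}
    }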
This approach works equally well for in-house Helm charts you may have created to deploy your own software and for third party charts you're using to deploy off-the-shelf software.
How does it compare to timoni[0]?
We take a slightly different approach in a few important ways.
1. Holos stops short of applying manifests to a cluster, leaving that responsibility to existing tools like ArgoCD, Flux, or plain kubectl apply. We're intentional about leaving space for other tools to operate after manifests are rendered but before they're applied. For example, we pair nicely with Kargo for progressive rollouts. Kargo sits between Holos and the Kubernetes API and fits well with Holos because both tools are focused on the rendered manifests pattern.
2. Holos focuses on the integration of multiple Components into an overall Platform. I capitalized them because they mean specific things in Holos: a Component is our way of wrapping existing Helm charts, Kustomize bases, and plain YAML files in CUE to implement the rendering pipeline.
3. We're explicit about the rendering pipeline. Our intent is that any tool that generates k8s config manifests could be wrapped up in a Generator in our pipeline, and the output of that tool can then be fed into existing Transformers like Kustomize.
I built the rendering pipeline this way because I'd often take an upstream Helm chart, mix in some ExternalSecret resources or whatnot, then pipe the output through Kustomize to tweak some things here and there. Holos is a generalization of that. It's been useful because I no longer need to write any YAML to work with Kustomize; it's all pure data in CUE doing the same thing.
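In rough pseudo-CUE, a component in that pipeline looks something like this (the field names are illustrative, not the actual Holos schema):

    // hypothetical component: a Helm generator whose output flows through
    // a Kustomize transformer, with extra resources mixed in as plain data
    component: {
        generators: [{helm: chart: "kube-prometheus-stack"}]
        transformers: [{kustomize: patches: [...]}]
        mixins: [{kind: "ExternalSecret", metadata: name: "grafana-admin"}]
    }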
Those are the major differences. I'd summarize them as: Holos focuses on the integration layer, integrating multiple things together, kind of like an umbrella chart but using well-defined CUE as the method of integration. The output of each tool is another main difference: Holos outputs fully rendered manifests to the local file system, intended to be committed to a GitOps repo, while Timoni acts as a controller, applying manifests to the cluster.
You've greatly piqued my interest (we're currently using Timoni). However, how does this integration work exactly? We've been wanting to try Kargo for a while but it appears there's no support for passing off the YAML rendering to an external tool.
The specific integration point I explored last week was configuring Holos to write a Kustomize kustomization.yaml file alongside each of the components that have promotable artifacts. This is essentially just a stub for Kargo to come along and edit with a promotion step.
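Roughly, the stub holds nothing more than this (written here as CUE data; the resource file name is illustrative, see the repo linked below for the real thing):

    kustomization: {
        apiVersion: "kustomize.config.k8s.io/v1beta1"
        kind:       "Kustomization"
        resources: ["resources.gen.yaml"]
    }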
For example, the output of the Holos rendering process for our demo "Bank of Holos" front end is this directory:
https://github.com/holos-run/bank-of-holos/tree/v0.6.2/deplo...
Normally, ArgoCD would deploy this straight from Git, but we changed the Application resource we configure to instead look where Kargo points, which is a different branch. As an aside, I'm not sure I like the multiple-branches approach, but I'm following Kargo's docs initially and will then look at doing it again in the same branch using different folders.
https://github.com/holos-run/bank-of-holos/blob/v0.6.2/deplo...
Note the targetRevision: stage/dev field; Kargo is responsible for this branch.
The Kargo promotion stages are configured by this manifest Holos renders:
https://github.com/holos-run/bank-of-holos/blob/v0.6.2/deplo...
Holos writes this file out from the CUE configuration when it renders the platform. The corresponding CUE configuration responsible for producing that rendered manifest is at:
https://github.com/holos-run/bank-of-holos/blob/v0.6.2/proje...
This is all a very detailed way of saying: Holos renders the manifests, you commit them to Git, Kargo uses kustomize edit to patch in the promoted container image and renders the manifests _again_ somewhere else (a branch or folder), and ArgoCD applies the manifests Kargo writes.
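Concretely, that promotion step amounts to adding an images entry to the stub I mentioned above, something like this (the image name and tag are made up for illustration):

    // what `kustomize edit set image` adds to the stub during promotion
    kustomization: images: [{
        name:   "bank-of-holos-frontend"
        newTag: "v1.2.3"
    }]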
I'll write it up properly soon, but the initial notes from my spike last week are at https://github.com/holos-run/bank-of-holos/blob/main/docs/ka...
I recorded a quick video of it at https://www.youtube.com/watch?v=m0bpQugSbzA which is more a demo of Kargo than Holos really.
What we were concerned about was that we'd need to wait for Kargo to support arbitrary commands as promotion steps, something they're not targeting until v1.3 next year. Luckily, we work very well with their existing promotion steps, because we take care to stop once we write the rendered manifests, leaving it up to you to use additional tools like kubectl apply, Kargo, Flux, ArgoCD, etc.
You also want to worry about all of the above as little as possible, which is why an abstraction like Holos could be useful. I've been hoping to see an ergonomic application of CUE to k8s manifests for a while now.
Also k8s is not, in and of itself, a complication. You're going to have to worry about the same set of problems whether or not you use k8s. Deployments, load-balancing, self-healing, etc. It's not like ops teams didn't do anything before k8s came along.
Holos is designed to produce configuration for Kubernetes. That configuration can be as small as a 10-line YAML file, but in larger operations it often ends up being millions of lines of YAML spread over multiple clusters across the globe.
Once Holos produces the configuration, we stop. We leave it up to you to decide what to do next with it. For example, we pass it to ArgoCD to deploy; others pass it to Flux. In development I pass it directly to kubectl apply.
I also use Helm for some software, but I never edit it, just configure it using values.
Honestly it all seems like craziness, and the day Kustomize is not enough for me, I'll just write a small JavaScript program that generates JSON. No need to make it harder than that...
This allows one to easily stitch together both k8s and all your other infra.
* k8s yaml in the repo, kubectl
* kustomize
* tanka and jsonnet
* pulumi
So far jsonnet is the only method that I've used successfully for a long time. I'm working on some new stuff with pulumi, though.