After years of dealing with this (first Jenkins, then GitLab, then GitHub), my takeaway is:
* Write as much CI logic as possible in your own code (see the sketch after this list). It does not really matter what you use (shell scripts, make, just, doit, mage, whatever) as long as it is proper, maintainable code.
* Invest time so that your pipelines can also run locally on a developer machine (as much as possible, at least), otherwise testing/debugging pipelines becomes a nightmare.
* Avoid YAML as much as possible, period.
* Don't bind yourself to some fancy new VC-financed thing that will solve CI once and for all but needs to get monetized eventually (see: earthly, dagger, etc.)
* Always use your own runners, on-premise if possible
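As a minimal sketch of the first two points (the workflow and script names are assumptions, not something from this thread): the CI config stays a thin wrapper, and all the real logic lives in scripts a developer can also run locally.

    # .github/workflows/ci.yml -- thin wrapper; the scripts are the actual CI
    name: ci
    on: [push]
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          # the same scripts run unchanged on a developer machine
          - run: ./ci/build.sh
          - run: ./ci/test.sh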
It was a large enterprise CMS project. The client had previously told everyone they couldn't automate deployments due to the hosted platform security, so deployments of code and configs were all done manually by a specific support engineer following a complex multistep run sheet. That was going about as well as you'd expect.
I first solved my own headaches by creating a bunch of bash scripts to package and deploy to my local server. Then I shared that with the squads to solve their headaches. Once the bugs were ironed out, the scripts were updated to deploy from local to the dev instance. Jenkins was then brought in and quickly set up to use the same bash scripts, so now we had full CI/CD working to dev and test. Then the platform support guy got bored manually following the run sheet approach and started using our (now mature) scripts to automate deployments to stage and prod.
By the time the client found out I'd completely ignored their direction they were over the moon because we had repeatable and error free automated deployments from local all the way up to prod. I was quite proud of that piece of gorilla consulting :-)
There's probably a lesson in there.
I swear by TeamCity. It doesn't seem to have any of these problems other people are facing with GitHub Actions. You can configure it with a GUI, or in XML, or using a type safe Kotlin DSL. These all actually interact so you can 'patch' a config via the GUI even if the system is configured via code, and TeamCity knows how to store config in a git repository and make commits when changes are made, which is great for quick things where it's not worth looking up the DSL docs or for experimentation.
The UI is clean and intuitive. It has all the features you'd need. It scales. It isn't riddled with insecure patterns like GH Actions is.
CI is just the thing no one wants to deal with, yet everyone wants to just work. And like any code or process, you need engineering to make it good. And like any project, you can't just blame bad tools for crappy results.
Whereas, I could articulate why I didn't like Jenkins just fine :)
The things I want to change are things that I do in the build system, so that they are checked in and available for previous versions when we need to build them (we are embedded, where field failure is expensive, so there are typically branches for the current release, the next release, and head). This also means anything that can fail on CI can fail on my local system (unless it depends on something like the number of cores on the machine running the build).
While the details can be slightly different, how we have CI is how it should be. Most developers should have better things to do than worry about how to configure CI.
(I don't recall _loving_ it, though I don't have as many bad memories of it as I do for VSTS/TFS, GitLab, GH Actions, Jenkins Groovyfiles, ...)
We needed two more or less completely different configurations for old and new versions of the same software (think hotfix for past releases), but TeamCity can't handle this scenario at all. So now we have duplicated the configuration and some hacky version checks that cancel incompatible builds.
Maybe their new Pipeline stuff fixes some of these shortcomings.
This is making me realize I want a CI with as few features as possible. If I'm going to spend months of my life debugging this thing I want as few corners to check as I can manage.
I tend to stick with the GUI because if you're doing JVM-style work, the complexity and the tasks are all in the build you can run locally; the CI system is more about task scheduling, so it's not that hard to configure. But being able to migrate from GUI to code when the setup becomes complex enough to justify it is a very nice thing.
Any CI product play has to differentiate in a way that makes you dependent on them. Sure it can be superficially nicer when staying inside the guard rails, but in the age of docker why has the number of ways I configure running boring shell scripts gone UP? Because they need me unable to use a lunch break to say "fuck you I don't need the integrations you reserve exclusively for your CI" and port all the jobs back to cron.
And that's why jenkins is king.
If you make anything more than that, your CI will fail. And you can do that with Jenkins, so the people that did it saw it work. (But Jenkins can do so much more, which is the entire reason so many people have nightmares just by hearing that name.)
We build Docker images mostly so ymmv.
I have a "port to github actions" ticket in the backlog but I think we're not going to go down that road now.
You'll have to explain the weird CPS transformations, you'll probably end up reading the Jenkins plugins' code, and there's nothing fun down this path.
Probably 'guerilla', but I like your version more.
Wikipedia: Gorilla Suit: National Gorilla Suit Day:
https://en.wikipedia.org/wiki/Gorilla_suit#National_Gorilla_...
Put the Gorilla back in National Gorilla Suit Day:
https://www.instagram.com/mad.magazine/p/C2xgmVqOjL_/
Gorilla Suit Day – January 31, 2026:
https://nationaltoday.com/gorilla-suit-day/
National Gorilla Suit Day:
You can use Nix with GitHub Actions since there is a Nix GitHub action: https://github.com/marketplace/actions/install-nix. Every time the action is triggered, Nix rebuilds everything, but thanks to its caching (which needs to be configured), it only rebuilds targets that have changed.
> How do you automate running tests and deploying to dev on every push
Nix is a build tool and its main purpose is not to deploy artifacts. There are however a lot of tools to deploy artifacts built by Nix: https://github.com/nix-community/awesome-nix?tab=readme-ov-f...
Note there are also several Nix CI systems that can do a better job than raw GitHub Actions, because they are designed for Nix (Hydra, Garnix, Hercules, ...).
devShells.default = pkgs.mkShell {
packages = with pkgs; [ opentofu terragrunt ];
};
I can then use these tools inside the devShell from my jobs like so:

    jobs:
      terragrunt-plan:
        runs-on: [self-hosted, Linux, X64]
        defaults:
          run:
            shell: nix develop --command bash -e {0}
        steps:
          - name: Checkout
            uses: actions/checkout@v4
          - name: Plan
            run: terragrunt --terragrunt-non-interactive run-all plan

Since I'm doing this within a Nix flake, all of the dependencies for this environment are recorded in a lock file. Provided my clone of the repo is up to date, I should have the same versions.

> How do you automate running tests
You just build the Nix derivation that runs your tests, e.g. `nix build .#tests` or `nix flake check` in your workflow file.
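For instance, a check exposed through the flake might look roughly like this (a sketch only; the attribute and package names are assumptions):

    # flake.nix outputs fragment
    checks.x86_64-linux.tests = pkgs.runCommand "run-tests"
      { nativeBuildInputs = [ self.packages.x86_64-linux.myapp ]; }
      ''
        # run the test suite; an empty $out marks success
        myapp --run-tests
        touch $out
      '';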
> deploying to dev on every push
You can set up a Nix `devShell` as a staging area for any operations you'd need to perform for a deployment. You can use the same devShell both locally and in CI. You'd have to inject any required secrets into the Action environment in your repository settings, still. It doesn't matter what your staging environment is comprised of, Nix can handle it.
Every CI "platform" is trying to seduce you into breaking things out into steps so that you can see their little visualizations of what's running in parallel or write special logic in groovy or JS to talk to an API and generate notifications or badges or whatever on the build page. All of that is cute, but it's ultimately the tail wagging the dog— the underlying build tool should be what is managing and ordering the build, not the GUI.
What I'd really like for next gen CI is a system that can get deep hooks into local-first tools. Don't make me define a bunch of "steps" for you to run, instead talk to my build tool and just display for me what the build tool is doing. Show me the order of things it built, show me the individual logs of everything it did.
Same thing with test runners. How are we still stuck in a world where the test runner has its own totally opaque parallelism regime and our only insight is whatever it chooses to dump into XML at the end, which will probably be nothing if the test executable crashes? Why can't the test runner tell the CI system what all the processes are that it forked off and where each one's respective log file and exit status is expected to be?
Nix really helps with this. It's not just that you do everything via a single script invocation, local or CI; you do it in an identical environment, local or CI. You are not trying to debug the difference between Ubuntu as set up in GHA and Arch as it is on your laptop.
Setting up a nix build cache also means that any artefact built by your CI is instantly available locally which can speed up some workflows a lot.
- Everything sandboxed in containers (works the same locally and in CI)
- Integrate your build tools by executing them in containers
- Send traces, metrics and logs for everything at full resolution, in the OTEL format. Visualize in our proprietary web UI, or in your favorite observability tool
I work on garnix.io, which is exactly a Nix-based CI alternative for GitHub, and we had to build a lot of these small things to make the experience better.
All of that is a lot more than what a local dev would want, deploying to their own private test instance, probably with a bunch of API keys that are read-only or able to write only to other areas meant for validation.
Maybe add some semi-structured log/trace statements for the CI to scrape.
No hooks necessary.
How much better would it be if the CI web client could just say, here's everything the build tool built, with their individual logs, and here's a direct link to the one that failed, which canceled everything else?
But how do you get that sweet, sweet vendor-lock that way? /s
When I joined my first web SaaS startup I had a bit of a culture shock. Everything was running on 3rd party services with their own proprietary config/language/etc. The base knowledge of POSIX/Linux/whatever was almost completely useless.
I'm kinda used to it now, but I'm not convinced it's any better. There are so many layers of abstraction now that I'm not sure anybody truly understands it all.
It blows my mind what is involved in creating a simple web app nowadays compared to when I was a kid in the mid-2000s. Do kids even do that nowadays? I’m not sure I’d even want to get started with all the complexity involved.
If you want to use a framework, the React tutorials from Traversy Media are pretty good. You can even do cross-platform into mobile with frameworks like React Native or Flutter if you want iOS/Android native apps.
Vite has been a godsend for React/Vue. It's no longer the circus it was in the mid-2010s. Google's monopoly has made things easier for web devs. No more Babel or polyfills or create-react-app.
People do still avoid frameworks and use raw HTML/CSS/JavaScript. HTMX has made server fetches a lot easier.
You probably want a decent CSS framework for responsive design. Everyone used to use the heavyweight ones, but minimalist ones like Tailwind have become more popular.
If you need a backend and want to do something simple you can use BaaS (Backend as a Service) platforms like Firebase. Otherwise setting up a NodeJS server with some SQL or KV store like SQLite or MongoDB isn't too difficult.
CI/CD systems exist to streamline testing and deployment for large complex apps. But for individual hobbyist projects it’s not worth it.
It’s demonstrably worse.
> The base knowledge of POSIX/Linux/whatever was almost completely useless.
Guarantee you, 99% of the engineering team there doesn’t have that base knowledge to start with, because of:
> There are so many layers of abstraction now that I'm not sure anybody truly understands it all.
Everything is constantly on fire, because everything is a house of cards made up of a collection of XaaS, all of which are themselves houses of cards written by people similarly clueless about how computers actually operate.
I hate all of it.
Your Jenkins experience is more valuable and worth replicating when you get the opportunity.
Once you get on Dagger, you can turn your CI into minimal Dagger invocations and write the logic in the language of your choice. Runs the same locally and in automation
I personally hold Dagger a bit differently from most, by writing a custom CLI and using the Dagger Go SDK directly. This allows you to do more host-level commands, as everything in a Dagger session runs in a container (builds and arbitrary commands).
I've adopted the mono/megarepo organization and have a pattern that also includes CUE in the solution. Starting to write that up here: https://verdverm.com/topics/dev/dx
Between that and upgrading for security patches, developing user-impacting code is becoming a smaller and smaller part of software development.
I heavily invested in a local runner based CI/CD workflow. First I was using gogs and drone, now the forgejo and woodpecker CI forks.
It runs with multiple redundancies because it's a pretty easy setup to replicate on decentralized hardware. The only thing that's a little painful is authentication and cross-system pull requests, so we still need our single point of failure to merge feature branches and do code reviews.
Due to us building everything in Go, we also decided to always have a /toolchain/build.go so that we have everything in a single language, and don't need even bash in our CI/CD podman/docker images. We just use FROM scratch, with Go, and that's it. The only exception being when we need to compile/rebuild our eBPF kernel modules.
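A rough sketch of what such a /toolchain/build.go entry point can look like (the exact steps here are assumptions; the point is that one Go program drives the pipeline, so the images need no shell):

    // toolchain/build.go -- single entry point for CI steps, no bash required
    package main

    import (
        "log"
        "os"
        "os/exec"
    )

    // run executes one external command and fails the build on error
    func run(name string, args ...string) {
        cmd := exec.Command(name, args...)
        cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
        if err := cmd.Run(); err != nil {
            log.Fatalf("%s %v failed: %v", name, args, err)
        }
    }

    func main() {
        run("go", "vet", "./...")
        run("go", "test", "./...")
        run("go", "build", "-o", "dist/app", "./cmd/app")
    }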
To me, personally, the Github Actions CVE from August 2024 was the final nail in the coffin. I blogged about it in more technical detail [1] and guess what was the reason that the TJ actions have been compromised last week? Yep, you guessed right, the same attack surface that Github refuses to fix, a year later.
The only tool, as far as I know, that somehow validates against these kinds of vulnerabilities is zizmor [2]. All other tools validate schemas, not vulnerabilities and weaknesses.
[1] https://cookie.engineer/weblog/articles/malware-insights-git...
If you take a look at the pull requests in e.g. the changed-files repo, it's pretty obvious what happened. You can still see some of the malformed git branch names and other things that the bots tried out. There were lots of "fixes" that just changed environment variable names from PAT_TOKEN to GITHUB_TOKEN and similar things afterwards, which kind of just delays the problem until malware is executed with a different code again.
As a snarky sidenote: The Wiz article about it is pretty useless as a forensics report, I expected much more from them. [1]
The conceptual issue is that this is not fixable unless github decides to rewrite their whole CI/CD pipeline, because of the arbitrary data sources that are exposed as variables in the yaml files.
The proper way to fix this (as Github) would be to implement a mandatory linter step or similar, and let a tool like zizmor check the file for the workflow. If it fails, refuse to do the workflow run.
[1] https://www.wiz.io/blog/github-action-tj-actions-changed-fil...
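Until GitHub enforces something like that server-side, the closest approximation is wiring the linter in as a regular job yourself (a sketch only; installing zizmor via pip and the pinned action version are assumptions):

    lint-workflows:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v4
        - run: pip install zizmor
        # fail the run if zizmor flags a vulnerable workflow pattern
        - run: zizmor .github/workflows/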
Mise can install all your deps, and run tasks
From dagger.io...
"The open platform for agentic software.
Build powerful, controllable agents on an open ecosystem. Deploy agentic applications with complete visibility and cross-language capabilities in a modular, extensible platform.
Use Dagger to modernize your CI, customize AI workflows, build MCP servers, or create incredible agents."
So now we are trying to capitalize on it, hence the ongoing changes to our website. We are trying to avoid the "something something agents" effect, but clearly, we still have work to do there :) It's hard to explain in marketing terms why an ephemeral execution engine, cross-language component system, deep observability and interactive CLI can be great at running both types of workloads... But we're going to keep trying!
Internally we never thought of ourselves as a CI company, but as an operating system company operating in the CI market. Now we are expanding opportunistically to a new market: AI agents. We will continue to support both, because our platform can run both.
If you are interested, I shared more details here: https://x.com/solomonstre/status/1895671390176747682
The risk of muddling is limited to the marketing, though. It's the exact same product powering both use cases. We would not even consider this expansion if it wasn't the case.
For example, Dagger Cloud implements a complete tracing suite (based on OTEL). Customers use it for observability of their builds and tests. Well, it turns out you can use the exact same tracing product for observability of AI agents too. And it turns out that observability is a huge unresolved problem for AI agents! The reason is that, fundamentally, AI agents work exactly like complicated builds: the LLM is building its state, one transformation at a time, and sometimes it has side effects along the way via tool calling. That is exactly what Dagger was built for.
So, although we are still struggling to explain this reality to the market: it is actually true that the Dagger platform can run both CI and AI workflows, because they are built on the same fundamentals.
Or I'll ask v0.dev to reimplement it, but I think it'd be more complete if you did it
I can understand what you're trying to say, but because I don't have clear "examples" at hand which show me why in practice handling such cases is problematic and why your platform makes that smooth, I don't "immediately" see the value added
For me right now, the biggest "value-added" that I perceive from your platform is just the "CI/CD as code", a bit the same as say Pulumi vs Terraform
But I don't see clearly the other differences that you mention (eg observability is nice, but it's more "sugar" on top, not a big thing)
I have the feeling that indeed the clean handling of "state" vs "side-effects" (and what it implies for caching / retries / etc) is probably the real value here, but I fail to perceive it clearly (mostly because I probably don't (or not yet) have those issues in my build pipelines)
If you were to give a few examples / ELI5 of this, it would probably help convert more people (eg: I would definitely adopt a "clean by default" way of doing things if I knew it would help me down the road when some new complex-to-handle use-cases will inevitably pop up)
They do seem to have a nice "quickstart for CI" they haven't abandoned, yet: https://docs.dagger.io/ci/quickstart
(As much as I personally like working with CI and build systems, it's true there's not a ton of money in it!)
Literally from comment at the root of this thread.
https://www.boringbusinessnerd.com/startups/dagger
Mise indeed isn't, but its scope is quite a bit smaller than Dagger.
A lot of us could learn... do one thing and do it well
target: $(DOCKER_PREFIX) build
When run in gitlab, the DOCKER_PREFIX is a no-op (it's literally empty due to the CI=true var), and the 'build' command (whatever it is) runs in the CI/CD docker image. When run locally, it effectively is a `docker run -v $(pwd):$(pwd) build`.
It's really convenient for ensuring that if it builds locally, it can build in CI/CD.
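A sketch of that Makefile pattern (the image and target names are assumptions; only the DOCKER_PREFIX idea comes from the comment above):

    # empty prefix in CI (the job already runs inside the build image),
    # docker run wrapper for local builds
    ifeq ($(CI),true)
    DOCKER_PREFIX :=
    else
    DOCKER_PREFIX := docker run --rm -v $(CURDIR):$(CURDIR) -w $(CURDIR) build-image
    endif

    # (the recipe line must start with a tab)
    build:
    	$(DOCKER_PREFIX) ./scripts/build.sh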
I let GitHub actions do things like the initial environment configuration and the post-run formatting/annotation, but all of the actual work is done by my scripts:
https://github.com/Hammerspoon/hammerspoon/blob/master/.gith...
https://github.com/williamcotton/webdsl/blob/main/.github/wo...
The other thing I would add is consider passing in all environment variables as args. This makes it easy to see what dependencies the script actually needs, and has the bonus of being even more portable.
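A small sketch of that idea (the script and variable names are illustrative): the script takes everything as explicit arguments, and the workflow is just the place where CI-specific values get passed in.

    #!/usr/bin/env bash
    # deploy.sh -- all inputs are explicit arguments, so it runs the same
    # locally and in CI
    set -euo pipefail
    environment="$1"   # e.g. dev, staging, prod
    version="$2"

    echo "deploying ${version} to ${environment}"

    # in the workflow:  run: ./deploy.sh dev "$GITHUB_SHA"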
Some people here still can’t believe YAML is used for not only configuration, but complex code like optimized CI pipelines. This is insane. You’re actually introducing much needed sanity into the process by admitting that a real programming language is the tool to use here.
I can’t imagine the cognitive dissonance Lisp folks have when dealing with this madness, not being one myself.
After a decade trying to fight it, this one Lisper here just gave up. It was the only way to stay sane.
I remain hopeful that some day, maybe within our lifetimes, the rapid inflation phase of software industry will end, and we'll have time to rethink and redo the fundamentals properly. Until then, one can at least enjoy some shiny stuff, and stay away from the bleeding edge, aka. where sewage flows out of pipe and meets the sea.
(It's gotten a little easier now, as you can have LLMs deal with YAML-programming and other modern worse-is-better "wisdom" for you.)
It would really benefit from a language that intrinsically understands it's being used to control a state machine. As it is, what nearly all folks want in practice is a way to run different things based on different states of CI.
A Lisp DSL would be perfect for this. Macros would make things a lot easier in many respects.
Unfortunately, there's no industry consensus and none of the big CI platforms have adopted support for anything like that, they all use variants of YAML (I always wondered who started it with YAML and why everyone copied that, if anyone knows I'd love to read about it).
Honestly, I can say the same complaints hold up against the cloud providers too. Those 'infrastructure as code' SDKs really don't lean into the 'as code' part very well
"Application and configuration should be separate, ideally in separate repos. It is the admin's job to configure, not the developer's"
"I do not need to learn your garbage language to understand how to deploy or test your application"
"...as a matter of fact, I don't need to learn the code and flows of the application itself either - give me a binary that runs. But it should work with stale configs in my repo."
"...I know language X works for the application but we need something more ubiquitous for infra"
Then there was a crossover of three streams, as I would call it:
- YAML was emerging "hard" on the shoulders of Rails
- Everyone started hating on XML (and for a good reason)
- Folks working on CI services (CruiseControl and other early solutions) and ops tooling (Chef, Ansible) saw JSON's shortcomings (now an entire ecosystem has configuration files with no capability to put in a comment)
Since everybody hated each other's languages, the lowest common denominator for "configuration code" came out to be YAML, and people begrudgingly agreed to use it
The situation then escalated severely with k8s, which adopted YAML as "the" configuration language, and a whole ecosystem of tooling sprung up on top using textual templating (!) of YAML as a layer of abstraction. For k8s, having a configuration language was an acute need, because with a compiled language you need something for configuration that you don't have to compile with the same toolchain just to use - and I perfectly "get it" why they settled for YAML. I do also get why tools like Helm were built on top of YAML trickery - because, had Helm been written in some other language, with its charts using that language, it would alienate all the developers that either hate that language personally, or do not have it on the list of "golden permitted" languages at their org.
Net result is that YAML was chosen not because it is good, but because it is universally terrible in the same way for everyone, and people begrudgingly settled on it.
With CI there is an extra twist that a good CI setup functions as a DAG - some tasks can - and should - run in parallel for optimization. These tasks produce artifacts which can be cached and reused, and a well-set CI pipeline should be able to make use of that.
Consequently, I think a possible escape path - albeit an expensive one - would be for a "next gen" CI system to expose those _task primitives_ via an API that is easy to write SDKs for. Read: not a grpc API. From there, YAML could be ditched as "actual code" would manipulate the CI primitives during build.
I know this isn't a definite answer to your question, but it was still super interesting to me and hopefully it will inspire someone else to dig into finding the actual answer
The best guess I have as far as CI/CD specifically appears to be <https://en.wikipedia.org/wiki/Travis_CI#:~:text=travis%20ci%...> which launched in 2011 offering free CI and I found a reference to their .travis.yml in GitLab's repo in 2011, too
- CruiseControl (2004) was "ant as a service," so it was XML https://web.archive.org/web/20040812214609/http://confluence...
- Hudson (2007) https://web.archive.org/web/20140701020639/https://www.java.... was also XML, and was by that point driving Maven 2 builds (also XML)
- I was shocked that GitHub existed in 2008 https://web.archive.org/web/20081230235955/http://github.com... with an especial nod to "no longer a pain in the ass" and "Not only is Git the new hotness, it's a fast, efficient, distributed version control system ideal for the collaborative development of software" but this was just a "for funsies" link since they were very, very late to the CI/CD game
- I was surprised that k8s 1.0.0 still had references to .json PodSpec files in 2015 https://github.com/kubernetes/kubernetes/blob/v1.0.0/example...
- cloud-init had yaml in 2010 https://github.com/openstack-archive/cloud-init/blob/0.7.0/d... so that's a plausible "it started here" since they were yaml declarations of steps to perform upon machine boot (and still, unquestionably, my favorite user-init thing)
- just for giggles, GitLab 1.0.2 (2011) didn't even have CI/CD https://gitlab.com/gitlab-org/gitlab/-/tree/v1.0.2 -- however, while digging into that I found .travis.yml in v2.0.0 (also 2011) so that's a very plausible citation <https://gitlab.com/gitlab-org/gitlab/-/blob/v2.0.0/.travis.y...>
- Ansible 1.0 in 2012 was also "execution in yaml" https://github.com/ansible/ansible/blob/v1.0/examples/playbo...
I've been using YAML for ages and I never had any issue with it. What do you think is wrong with YAML?
Many of us would rather use a less terrible programming language instead.
Some people talk about YAML being a Turing-complete language; if people try to do that in your CI/CD system, just fire them.
I'll allow helm style templating but that's about it.
It's miles better than Jenkins and the horrors people created there. GitLab CI can at least be easily migrated to any other GitLab instance and stuff should Just Work because it is in the end not much more than self contained bash scripts, but Jenkins... is a clown show, especially for Ops people of larger instances. On one side, you got 50 plugins with CVEs but you can't update them because you need to find a slot that works for all development teams to have a week or two to fix their pipelines again, and on the other side you got a Jenkins instance for each project which lessens the coordination effort but you gotta worry about dozens of Jenkins instances. Oh and that doesn't include the fact many old pipelines aren't written in Groovy or, in fact, in any code at all but only in Jenkins's UI...
Github Actions however, I'd say for someone coming from GitLab, is even worse to work with than Jenkins.
Instead of wrapping the aws cli command I wrote small Go applications using the boto3 library.
Removed the headaches when passing in complex params and parsing output, and also made the logic portable as we need to do the builds on different platforms (Windows, Linux and macOS).
I have seen people doing absolutely insane setups because they thought they had to do it in YAML and the pipeline, and that there is absolutely no other option, or that it is somehow wrong to drop some stuff to code.
I'm not sure I understood what you're saying because it sounds too absurd to be real. The whole point of a CICD pipeline is that it automates all aspects of your CICD needs. All mainstream CICD systems support this as their happy path. You specify build stages and build jobs, you manage your build artifacts, you setup how things are tested, deployed and/or delivered.
That's their happy path.
And you're calling the most basic use cases of a standard class of tools "insanity"?
Please help me understand what point you are trying to make.
"All aspects of your CICD pipeline" - rebasing PRs is not a 'basic CICD' need.
CICD pipeline should take a commit state and produce artifacts from that state, not lint and not autofix trivial issues.
Everything that is not "take code state - run tests - build - deploy (eventually fail)" is insanity.
Autofixing/linting, for example, should be a separate process, way before CICD starts. And people do stuff like that because they think it is part of integration and testing. Trying to shove it inside is insanity.
This is the dumbest thing I see installers do a lot lately.
It tends to be that folks want to shoehorn some technology into the pipeline that doesn't really fit, or they make these giant one shot configurations instead of running multiple small parallel jobs by setting up different configurations for different concerns etc.
It's really easy to extend and compose jobs, so it's simple to unit test your pipeline: https://gitlab.com/nunet/test-suite/-/tree/main/cicd/tests?r...
This way I can code my pipeline and use the same infrastructure to isolate groups of jobs that compose a relevant functionality and test it in isolation to the rest of the pipeline.
I just wish components didn't have such a rigid opinion on folder structure, because they are really powerful, but you have to adopt GitLab's prescription
I maintained a Javascript project that used Make and it just turned into a mess. We simply changed all of our `make some-job` jobs into `./scripts/some-job.sh` and not only was the code much nicer, less experienced developers were suddenly more comfortable making changes to scripts. We didn't really need Make to figure out when to rebuild anything, all of our tools already had caching.
The main argument I wanted to make is that it works very well to just use GitHub actions to execute your tool of choice.
It allows you to define a central interface into your project (largely what I find people justify using Make for), but smoothes out so many of the weird little bumps you run into from "using Make wrong."
Plus, you can at any point just drop into running a script in a different language as your command, so it basically "supports bash scripts" too.
So if even remotely possible we write all CI as a single 'one-click' script which can do it all by itself. Makes developing/testing the whole CI easy. Makes changing between CI implementations easy. Can solve really nasty issues (think: CI is down, need to send update to customer) easily because if you want a release you just build it locally.
The only thing it won't automatically do out of the box is be fast, because obviously this script also needs to set up most of the build environment. So depending on the exact implementation there's variation in the split between what constitutes setting up a build environment and running the CI script. As in: for some tools our CI scripts will do 'everything', starting from a minimal OS install, whereas others expect an OS with build tools and possibly some dependencies already available.
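In shell terms, the shape is roughly this (the step names are made up; the point is a single entry point that also prepares its own environment):

    #!/usr/bin/env bash
    # ci.sh -- 'one-click' build: works on a bare checkout, locally or on a runner
    set -euo pipefail
    ./ci/setup-environment.sh   # install toolchain and dependencies as needed
    ./ci/build.sh
    ./ci/test.sh
    ./ci/package.sh             # produces the release artifacts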
https://www.joelonsoftware.com/2000/08/09/the-joel-test-12-s...
Yes, a thousand times.
Deploy scripts are tougher to deal with, as they'll naturally rely on a flurry of environment variables, protected credentials etc.
But for everything else, writing the script for local execution first, and generalizing it for CI once it runs well enough, is the absolute best approach. It doesn't even need to run in the local shell; having all the CI stuff in a dedicated docker image is fine if it requires specific libraries or env.
- Treat pipelines as code.
- Make pipeline parts composable, as code.
- Be mindful of vendor lock-in and/or lack of portability (it is a trade-off).
For on-premise: if you're already deeply invested in running your own infrastructure, that seems like a good fit.
When thinking about how we build Namespace -- there are parts that are so important that we just build and run internally; and there are others where we find that the products in the market just bring a tremendous amount of value beyond self-hosting (Honeycomb is a prime example).
Use the tools that work best for you.
I fully agree with the recommendation to use maintainable code. But that effectively rules out shell scripts in my opinion. CI shell scripts tend to become a big ball of mud rather quickly as you run into the limitations of bash. I think most devs only have superficial knowledge of shell scripts, so do yourself a favor and skip them and go straight to whatever language your team is comfortable with.
One can learn to use it to the point where it's usable to do advanced automation... but why, when there are so many better options available?
The second one has been, from someone else: if you can use anything else than bash, do that.
Jokes aside... it's so trendy to bash bash that it's not funny anymore. Bash is still quite reliable for work that usually gets done in CI, and nearly maintenance free if used well.
THIS 10000% percent.
My personal favourite solution is Bazel specifically because it can be so isolated from those layers.
No need for Docker (or Docker in Docker as many of these solutions end up requiring) or other exotic stuff, can produce OCI image artifacts with `rules_oci` directly.
By requiring so little of the runner you really don't care for runner features, you can then restrict your CI/CD runner selection to just reliability, cost, performance and ease of integration.
That's also a very valid takeaway for life in general
Doesn’t matter Jenkins or actions - it is just complicated. Making it simpler is on devs/ops not the tool.
Each systemd service could represent a step built by running a script, and each service can say what it depends on, thus helping parallelize any step that can be.
I have not found anyone trying that so far. Is anybody aware of something similar and more POSIX/cross platform that allows writing a DAG of scripts to execute?
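For what it's worth, a single step in that scheme might look something like this (the unit and script names are invented): Requires=/After= express the edges of the DAG, and steps without a dependency between them run in parallel.

    # /etc/systemd/system/ci-build-frontend.service
    [Unit]
    Description=CI step: build frontend
    Requires=ci-fetch-sources.service
    After=ci-fetch-sources.service

    [Service]
    Type=oneshot
    ExecStart=/opt/ci/steps/build-frontend.sh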
Facts.
However I’ll go a step further and say “only implement your logic in a tool that has a debugger”.
YAML is the worst. But shell scripts are the second worst. Use a real language.
That said, if you absolutely need to use shell script for reasons, keep it all in single script, define logging functions including debug logs, rigorously check every constraint and variable, use shellcheck, factor the code well into functions - I should sometimes write a blog post about it.
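A condensed sketch of those conventions (the function and variable names are just examples):

    #!/usr/bin/env bash
    set -euo pipefail

    log()   { printf '[%s] %s\n' "$(date +%T)" "$*" >&2; }
    debug() { [[ "${DEBUG:-0}" = 1 ]] && log "DEBUG: $*" || true; }
    die()   { log "ERROR: $*"; exit 1; }

    build() {
      # rigorously check every constraint before doing work
      [[ -n "${VERSION:-}" ]] || die "VERSION must be set"
      debug "building version ${VERSION}"
      # ... actual build steps ...
    }

    main() { build "$@"; }
    main "$@"
    # checked with: shellcheck ci.sh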
Unfortunately, this isn't a good plan going forward... :( Going forward I'd wish for a tool that's as ubiquitous as Git, has good integration with editors like language servers do, and can be sold as a service or run completely in-house. It would allow defining the actions of the automated builds and tests, have a way of dealing with releases, expose an interface for collecting statistics, integrate with bug tracking software for the purpose of excluding / including tests in test runs, and allow organizing tests in groups (e.g. sanity / nightly / rc).
The problem is that tools today don't come anywhere close to being what I want for CI; neither free nor commercial tools are even going in the desired direction. So, the best option is simply to minimize their use.
Why does YAML have any traction when JSON is right there? I'm an idiot amateur and even I learned this lesson; my 1 MB YAML file full of data took 15 seconds to parse each time. I quickly learned to use JSON instead, takes half a second.
Because it has comments, which are utterly essential for anything used as a human readable/writable configuration file format (your use case, with 1 MB of data, needs a data interchange format, for which yes JSON is at least much better than YAML).
YAML has comments. YAML is easily & trivially written by humans. JSON is easily & trivially written by code.
My lesson learned here? When generating YAML, instead generate JSON. If it's meant to be read and updated by humans, use something that can communicate to the humans (comments). And don't use YAML as a data interchange format.
For short configs, YAML is acceptable-ish. For anything longer I'd take TOML or something else.
- mise for lang config
- direnv for environment loading
- op for secret injection
- justfile for lint, build, etc
Here's a template repo that I've been working on that has all of this implemented:
https://github.com/iloveitaly/python-starter-template
It's more complex than I would like it to be, but it's consistent and avoids having to deal with GHA too much.
I've also found having a GHA playground is helpful:
However, CI is not "configured", it is coded. It is simply the wrong tool. YAML was continuously extended to deal with that, so it developed into much more than just "markup", but it grew into this terrible chimera. Once you start using advanced features in GitLab's YAML like anchors and references to avoid writing the same stuff again and again, you'll notice that the whole tooling around YAML is simply not there. What does the resulting YAML look like? How do you run this stuff locally? How do you debug this? Just don't go there.
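For reference, this is the kind of anchor/reference use meant here (job names invented); the merged result never exists as a file you can read or run, which is exactly the tooling gap:

    .defaults: &defaults
      image: alpine:3.19
      before_script:
        - ./ci/setup.sh

    build:
      <<: *defaults
      script:
        - ./ci/build.sh

    test:
      <<: *defaults
      script:
        - ./ci/test.sh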
You will not be able to avoid YAML completely, obviously, but use it the way it was originally intended to.
Finally! I was always struggling to explain to others why YAML is OK-ish as a language, but then never seems to work well for the things people tried doing with it. Especially stuff that needs to run commands, such as CI.
> What does the resulting YAML look like? How do you run this stuff locally? How do you debug this? Just don't go there.
Agreed. GitHub actions, or any remote CI runner for that matter, makes the problem even worse. The whole cycle of having to push CI code, wait 10 minutes while praying for it to work, still getting an error, trying to figure out the mistake, fixing one subtle syntax error, then pushing the code again in the hope that that works is just a terrible workflow. Massive waste of time.
> You will not be able to avoid YAML completely, obviously, but use it the way it was originally intended to.
Even for configurations YAML remains a pain, unfortunately. It could have been great for configs, but in my experience the whole strict whitespace (tabs-vs-spaces) part ruined it. It isn't a problem when you work from an IDE that protects you from accidentally using tabs (also, auto-formatting for the win!) but when you have to write YAML configuration (for example: Netplan) on a remote server using just an editor it quickly becomes a game of whack-a-mole.
I don't understand what problem you could possibly be experiencing. What exactly do you find hard about running commands in, say, GitLab CICD?
iterating a GitHub Actions workflow is a gigantic pain in the ass. Capturing all of the important logic in a script/makefile/whatever means I can iterate it locally way faster and then all I need github to do is provision an environment and call my scripts in the order I require.
What's wrong with this?
https://docs.github.com/en/actions/writing-workflows/choosin...
What exactly do you find hard in writing your own scripts with a scripting language? Surely you are not a software developer who feels conditionals and variable substitutions are hard.
> it ends up being 20 steps in a language that isn't shell but is calling shell over and over again, and can't be run outside of CI.
Why are you writing your CICD scripts in a way that you cannot run them outside of a CICD pipeline? I mean, you're writing them yourself, aren't you? Why are you failing to meet your own requirements?
If you have a requirement to run your own scripts outside of a pipeline, how come you're not writing them like that? It's CICD 101 that those scripts should be runnable outside of the pipeline. From your description, you're failing to even follow the most basic recommendations and best practices. Why?
That doesn't sound like a YAML problem, does it?
In order to use this domain-specific language properly, you first must learn it, and learning YAML is but a small part of that. Moreover, it is not immediately obvious that, once you know it, you actually want to avoid it. But you can't avoid it entirely, because it is the core language of the CI/CD platform. And you can't know how to avoid it effectively until you have spent some time just using it directly. Simplicity comes from tearing away what is unnecessary, but to discern necessary from unnecessary requires judgment gained by experience. There is no world in which this knowledge transfers immediately, frictionlessly, and losslessly.
Furthermore, there is a lot that GitHub (replace with platform of choice) could have done to make this better. They largely have no incentive to do so, because platform lock-in isn't a bad thing to the platform owner, and it's a nontrivial amount of work on their part, just as it is a nontrivial amount of work on your part to learn and use their platform in a way that doesn't lock you into it.
A: Easy! You just spin up a Kubernetes pod with Alpine image, map a couple of files inside, run a bash script of "date" with some parameters, redirect output to a mapped file, and then read the resulting file. That's all. Here's a YAML for you. Configuration, baby!
(based on actual events)
(Iterating even on this stuff by waiting for the runner is still annoying though. You need to commit to the repo, push, and wait. Hence the suggestion of having scripts that you can also run locally, so you can test changes locally when you're iterating on them. This isn't any kind of guarantee, but it's far less annoying to do (say) 15 iterations locally followed by the inevitable extra 3 remotely than it is having to do all 18 remotely, waiting for the runner each time then debugging it by staring at the output logs. Even assuming you'd be able to get away with as few as 15 given that you don't have proper access to the machine.)
Where?
> don't have anything particularly interesting in the .yml file, just the bare minimum plus some small number of uncomplicated script invocations to install dependencies and actually do the build
It is a very basic "how to" with no recommendations.
Moreover, they directly illustrate a bad practice:
- name: Run the scripts
run: |
./my-script.sh
./my-other-script.sh
This is not running two scripts, this is running a shell command that invokes two scripts, and it has no error handling if the first one fails. If that's the behavior you want, fine, but then put it in one shell script, not two. What am I supposed to do with this locally? If the first shell script fails, do I need to fix it, or do I just proceed on to the second one?

This is invoking a shell, and that's how shells typically work, one command at a time. Would it make you feel better if they added && or used a step, like they also recommend, to split these out? You can put the error handling in your script if need be; that's on you or the reader. Most CI agents only understand true/false, or in this case $?.
Nobody said they want that behavior, they're showing you the behavior. They actually show you the best practice behavior first, not sure if you didn't read that or are purposely omitting it. In fact, the portion you highlight, is talking about permissions, not making suggestions.
- name: Run a script
run: ./my-script.sh
- name: Run another script
run: ./my-other-script.sh
That page is just one small part of a much larger reference document, and it doesn't seem opinionated at all to me. Plus there are dozens of other examples elsewhere in the same reference that are not simple invocations of one shell script and nowhere are you admonished not to do things that way.
No, it really isn't. I'll clarify why.
Pretty much all pipeline services share the same architecture pattern:
* A pipeline run is comprised of one or more build jobs,
* Pipeline runs are triggered by external events
* Build jobs have contexts and can output artifacts,
* Build jobs are grouped into stages,
* Stages are organized as a directed graph,
* Transitions between stages in the directed graph are governed by a set of rules, some supported by default (i.e., if a job fails then the stage fails), complemented by custom rules (manual or automatic approvals, API tests, baking periods, etc).
This is the textbook scenario that's ideal for DSLs. You are already bound to an architecture pattern, thus there is no point in reinventing the wheel each time. Just specify your stages and which jobs run as part of each stage, manage artifacts and promotion logic, and you're done.
You do not need to take my word for it. Take a look at GitLab CICD for a pipeline with build, test, and delivery stages. See what a mess you will put together if you support the same feature set with whatever scripting language you choose. There is no discussion or debate.
The problem starts when that graph cannot be determined in advance and needs to be computed in runtime. It's a bit better when it's possible to compute that graph as a first step, and it's a lot worse when one needs to do a couple of stages before being able to compute the next elements of the graph. The graph computation is terrible enough in e.g. Groovy, but having to do it in YAML is absolutely horrendous.
> Take a look at GitLab CICD for a pipeline with build, test, and delivery stage
Yeah, if your workflow fits in a kindergarten example of "build, test, and delivery", then yeah, it's YAML all the way baby. Not everyone is so fortunate.
Wrapping it in a DSL encoded as YAML has zero benefit other than it being easier for a team with weak design skills to implement and harder for users to migrate off of.
Pardon my pedantry, but the meaning of YAML's name was changed from the original “Yet Another Markup Language” to “YAML Ain't Markup Language” in a 2002 draft spec because YAML is, in fact, not a markup language :)
Compare:
Brings to mind the classic "Kingdom of Nouns" [0] parable, which I read to my kid just last week. The multi-line "run" nodes in GitHub actions give me the heebie-jeebies, like how MUMPS data validation was maintained in metadata of VA-Fileman [1].
0. https://steve-yegge.blogspot.com/2006/03/execution-in-kingdo...
I will take the Actions path 100% of the time. Building your own action is so insanely simple it makes me wonder if the people complaining about YAML understand the tooling because it's entirely avoidable. It also coincides with top comments about coding your own CI, if you're just "using" YAML you're barely touching the surface.
* Very easy to write the code you didn't mean to, especially in the context of CI where potentially a lot of languages are going to be mixed, a lot of quoting and escaping. YAML's string literals are a nightmare.
* YAML has no way to express inheritance. Nor does it have a good way to express variables. Both are usually desperately needed in CI scripts, and are usually bolted on top with some extra-language syntax (all those dollars in GitHub actions, Helm charts, Ansible playbooks etc.)
* Complexity skyrockets compared to the size of the file. I.e. in a language like C you can write a manageable program with millions of lines of code. In YAML you will give up after a few tens of thousands of lines (similar to SQL or any other language that doesn't have modules).
* Whitespace errors are very hard to spot and fix. Often whitespace errors in YAML result in valid YAML which, however, doesn't do what you want...
2. Trying to encode logic and control flow in a YAML document is much more difficult than writing that flow in a "real" programming language. Debugging is especially much easier in "real" languages.
YAML is great for the happy-flow where everything works. It's absolutely terrible for any other flow.
MSBuild, for example, is all XML, but it has debugging support in Visual Studio complete with breakpoints and expression evaluation.
It's a DSL. There is no execution, only configuration. The only thing that's executed are the custom scripts you create yourself, and any intro tutorial on the subject will eventually teach you that if you want to run anything beyond a single straight-forward command then you should move those instructions to a shell script to make them testable and reproducible.
Things are so simple and straight forward that you need to go way out of your way to create your own problems.
I wonder how many people in this discussion are blaming the tools when they even bothered to learn the very basics.
> It's a DSL. There is no execution, only configuration.
Jenkins pipelines are also DSL. I still can print out debugging information from them. "It's a DSL" is not an excuse for being a special case of shitty DSL.
> any intro tutorial on the subject will eventually teach you
Do these tutorials have a chapter on what to do when you join a company with 500 engineers and a ton of YAMLs that are not written in that way?
> you should move those instructions to a shell script to make them testable
Yeah, no. How am I supposed to test my script that is supposed to run on Github-supplied runner with a ton of injected secrets and Github-supplied JSON of 10,000 lines, when I don’t have the runner, the secrets, or the JSON?
The YAML is fed into an agent which reads it to decide what to execute. Any time you change the control flow of a system by changing data, you are doing a form of programming.
For example: Stripe uses constants for types of tax registration numbers (VAT/GST/TIN, etc.). So there is EU_VAT for European VAT numbers, US_TIN for US tax identification numbers, etc. But what value to use for tax-exempt organisations that don't have a tax number? Well... guess how I found out about NO_VAT...
On the bright side, I did learn that way that although Norway is in the Schengen zone, apparently they are not part of the EU (hence the separation of EU_VAT and NO_VAT). I guess the 'no' name collision has taught many developers something about Norway :-)
It would be better to delete your comment so nobody else ever has to have this crisis.
Why? I understand it in cases where security is critical or intellectual property is at stake. Are you talking about "snowflake runners" or just dumb executors of container images?
With self hosted Gitlab runners it was almost as fast as doing incremental builds. When your build process can take like 15-20 minutes (medium sized C++ code base), this brought down the total time to 30 seconds or so.
Imagine building Android - even "cloning the sources" is 200GB of data transfer, build times are in hours. Not having to delete the previous sources and doing an incremental build saves a lot of everything.
tldr; "A cache is one or more files a job downloads and saves. Subsequent jobs that use the same cache don’t have to download the files again, so they execute more quickly."
It will probably still be slower than a dedicated runner, but possibly require less maintenance ("pet" runner vs "cattle" runner).
Security is another thing where this can come in handy, but properly firewalling CI runners and having mirrors of all your dependencies is a lot of work and might very well be overkill for most people.
Buy a cheap Ryzen, and put it on your desk, that's a cheap runner.
So many times I was biting my fingers not being able to figure out the problems GitHub runners were having with my actions and was unable to investigate.
- I would go even further: Do not use bash/python or any duck-typed lang. (only for simple projects, but better just don't get started).
- Leverage Nix (!! no its not a joke ecosystem) : devshells or/and build devcontainers out of it.
- Treat tooling code, ci code, the exact same as your other code.
- Maybe generate the pipeline for your YAML based CI system in code.
- If you use a CI system, gitlab, circle etc, use one which does not do stupid things with your containers (like Github: 4 years! old f** up: https://github.com/actions/runner/issues/863#issuecomment-25...). Also one which lets you run dynamically generated pipelines.
That's why we built our own build tool which does that, or at least helps us do the above things:
This so much. This ties into the previous point about using as much shell as possible. Additionally I'd say environment control via Docker/Nix, as well as modularizing the pipeline so you can restart it just before the point of failure instead of rerunning the whole business just to replay one little failure.
To put the first 3 points into different words: you should treat the CI only as a tool that manages the interface and provides interaction with the outside world (including injecting secrets/configuration, setting triggers, storing caches etc.) and helps to visualize things.
Unfortunately, to do that, it puts constraints on how you can use it. Apart from that, no logic should live in the CI.
To an extent, yes. There should be one command to build, one to run tests, etc.
But in many cases, you do actually want the pipeline functionality that something like Gitlab CI offers - having multiple jobs instead of a single one has many benefits (better/shorter retry behaviour, parallelisation, manual triggers, caching, reacting to specific repository hooks, running subsets of tests depending on the changed files, secrets in env vars, artifact publishing, etc.). It's at this point that it becomes almost unavoidable to use many of the configuration features including branching statements, job dependencies etc. and that's where it gets messy.
The problem is really that you're forced to do all of that in YAML instead of an actual programming language.
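For illustration, the kind of structure meant here (job names and scripts are placeholders): each job stays a thin call into a script, but stages, needs, artifacts and manual triggers come from the CI config.

    stages: [build, test, deploy]

    build:
      stage: build
      script: ./ci/build.sh
      artifacts:
        paths: [dist/]

    test:
      stage: test
      needs: [build]
      script: ./ci/test.sh

    deploy-dev:
      stage: deploy
      needs: [build, test]
      when: manual
      script: ./ci/deploy.sh dev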
Although we’re using temporal to schedule the workflows, we have a full-code typescript CI/CD setup.
We’ve been through them all starting with Jenkins ending with drone, until we realized that full-code makes it so much easier to maintain and share the work over the whole dev org.
No more yaml, code generating yaml, product quirk, groovy or DSLs!
This has been my entire strategy since I've been able to do this:
https://learn.microsoft.com/en-us/dotnet/core/deploying/#pub...
Pulling the latest from git, running "dotnet build" and sending the artifacts to zip/S3 is now much easier than setting up and managing Jenkins, et. al. You also get the benefit of having 100% of your CI/CD pipeline under source control alongside the product.
In my last professional application of this (B2B/SaaS; customer hosts on-prem), we didn't even have to write the deployment piece. All we needed to do was email the S3 zip link to the customer and they learned a quick procedure to extract it on the server each time.
My concern with this kind of deployment solution, where the customer is instructed to install software from links received in e-mails, is that someone else could very easily send them a link to a malicious installer and they would be hosed. E-mail is not authenticated (usually) and the sender can be forged.
I suppose you could use a shared OneDrive folder or something, which would be safer, as long as the customer doesn't rely on receiving the link to OneDrive by e-mail.
Docker and to some extent, however unwieldy, Kubernetes at least allow you to run anywhere, anytime without locking you into a third party.
A "pipeline language" that can run anywhere, even locally, sets up dependency services and initial data, runs tests and avoids YAML overload would be a much needed addition to software engineering.
Do not recommend this approach (of using docker for building).
It's very satisfying to just compile an application with a super esoteric toolchain in Docker vs. the nightmares of setting it up locally (and keeping it working over time).
We used a single huge Docker image with all the dependencies we needed to cross compile to all architectures. The image was around 1GB; it did its job, but it was super slow to pull on CI.
Let me at least recommend depot.dev for having absurdly fast runners.
I shared more context in this thread: https://x.com/solomonstre/status/1895671390176747682
* Consider whether it's not easier to do away with CI in the cloud and just build locally on the dev's laptop
With fast laptops and Docker you can get perfectly reproducible builds and tests locally that are infinitely easier to debug. It works for us.
I think builds must be possible locally, but I'd never rely on devs for the source-of-truth artifacts running in production, past a super early startup.
as always, enough decoupling is useful
> as long as it is proper, maintainable code
...in an imperative language you know well and which has a sufficient amount of type/null checking you can tolerate.
Also lol @deng
Are the Actions a little cumbersome to set up and test? Sure. Is it a little annoying to have to make somewhat-useless commits just to re-trigger an Action to see if it works? Absolutely. But once it works, I just set it and forget it. I've barely touched my workflows in ~4 years, outside of the Node version updates.
Otherwise, I'm very pleased with both. My needs must just be simple enough to not run into these more complicated issues, I guess?
GitHub CI is designed in a way which tends to work well for
- languages with no or very very cheap "compilation" steps (i.e. basically only scripting languages)
- relatively well contained project (e.g. one JS library, no mono repo stuff)
- no complex needs for integration tests
- no need for compliance enforcement stuff, especially not if it has to actually be securely enforced instead of just making it easier to comply than not to comply
- all developers having roughly the same permissions (ignore that some admin has more)
- fast CI
but the moment you step away from this it just falls more and more apart, and every company I have seen so far which doesn't fit the constraints above has non-stop issues with GitHub Actions.
But the worst part, which maybe is where a lot of the hatred comes from, is that it's there cheap, maybe even free (if you already pay for GitHub), and it doesn't need an additional contract, billing, etc. No additional vetting of 3rd-party companies. No managing your own CI service. So while it does cause issues non-stop, it initially still looks like the "cheaper" solution for the company. By the time your company realizes it's not and has to set up its own GitHub runners etc., it probably isn't cheaper anymore, but only if you properly account for the dev time spent on "fixing CI issues", and even then there is the sunk cost fallacy, because you already spent so much time making GitHub Actions work and you would have to port everything over. Also, realistically speaking, a lot of other CI solutions are only marginally better.
I find github actions works very well for compliance. The ability to create attestations makes it easy to enforce policies about artifact provenance and integrity and was much easier to get working properly compared to my experience attempting to get jenkins to produce attestations.
https://docs.github.com/en/actions/security-for-github-actio...
https://docs.github.com/en/actions/security-for-github-actio...
What was your issue with it?
This is not true at all. It's fine with Haskell, just cache the dependencies to speed up the build...
- GitHub Action cache and build artifact handling is a complete shit show (slow upload, slow download and a lot of practical subtle annoyances, finished off with sub-par integration in existing build systems)
- GitHub runners are comparatively small, so e.g. larger linker steps can already lead to pretty bad performance penalties
and sure, like I said, if your project is small it doesn't matter
Or even if you pay $$$ for big runners you can roll it onto your Azure bill rather than having to justify another SAAS service.
This is the key point. Every CI system falls apart when you get too far from the happy path that you lay out above. I don't know if there's an answer besides giving up on CI all together.
The problem isn't GitHub Actions but people overloading their build and CI system with all sorts of custom crap. You'd have had a hard time doing the same thing twenty years ago with Ant and Hudson (Jenkins before the fork, after Oracle inherited it from Sun), and for the same reason: these systems simply aren't very good as a bash replacement.
If you don't know what Ant is: it was a popular build system for Java before people moved the problem to Maven and then to Gradle (without solving it). I've dealt with Maven files that were trying to do all sorts of complicated things via plugins that would have amounted to two or three lines of bash. Gradle isn't any better. Ant at least used to have simple primitives for "doing" things, but you had to spell them out in XML form.
The point of all this is that build and CI systems should mainly do simple things like building software. They shouldn't have a lot of conditional logic, custom side effects, and wonky things that may or may not happen depending on the alignment of the moon and stars. Debugging that stuff when it fails to work really sucks.
What helps with YAML is using YAML generators. I've used a Kotlin one for a while. Basically, you get auto-complete, syntactical sanity, type checking, and if it compiles it runs. It also makes it a lot easier to discover new parameters, plugin version updates, etc.
That's supposedly CICD 101. I don't understand why people in this thread seem to be missing this basic fact and instead they vent about irrelevant things like YAML.
You set your pipeline. You provide your own scripts. If a GitHub Action saves you time, you adopt it instead of reinventing the wheel. That's it.
This whole discussion reads like the bike fall meme.
For you to make that comment, I'm not sure you ever went through any basic intro to GitHub Actions tutorial.
GitHub Actions has 'run'.
https://docs.github.com/en/actions/writing-workflows/workflo...
Now that we established that, GitHub Actions also supports custom actions, which is a way to create, share, and reuse high-level actions. Instead of copy-pasting stuff around, you do the equivalent of importing a third party module.
https://docs.github.com/en/actions/sharing-automations/creat...
Onboarding a custom GitHub Action does not prevent you from using steps.run.
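For reference, a custom (composite) action is just a small action.yml; here is a minimal sketch with made-up names and a hypothetical script:

# action.yml in its own repo (or a subdirectory of yours)
name: setup-toolchain
description: Example composite action
inputs:
  node-version:
    description: Node version to install
    default: "20"
runs:
  using: composite
  steps:
    - uses: actions/setup-node@v4
      with:
        node-version: ${{ inputs.node-version }}
    - run: ./scripts/install-extras.sh   # hypothetical helper script
      shell: bash                        # 'shell' is required for run steps in composite actions

A workflow then calls it with uses: my-org/setup-toolchain@v1 and can still mix in plain run: steps.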
I don't even know where to start regarding your comment on expression evaluation and conditions. Have you used a CICD system before?
The problem with half the comments in this thread railing against CICD in general, YAML, etc. is that they clearly do not have the faintest idea what they are doing, and are instead complaining about their own inability.
I'm an experienced SaltStack user. If I found something I need is too complex to be described in YAML, I'll just write a custom module and/or state. Use YAML just to inform Salt what should happen, and shove the logic in the Python files.
People really should become generalists if they handle the plumbing.
I am with the author - we can do better than the status quo!
I commit code, push it, wait 45 seconds, it syncs to AWS, then all my sites periodically ping the S3 bucket for any changes, and download any new items. It's one of the most reliable pieces of my entire stack. It's comically consistent, compared to anything I try building for a mobile app or pushing to a mobile app store.
I look forward to opening my IDE to push code to the Actions for my web app, and I dread the build pipeline for a mobile app.
Well yeah because nobody is saying it isn't reliable. It's the setup stage that is painful. Once you've done it you can just leave it mostly.
I guess if your CI is very simple and always the same you are exposed to these issues less.
I would recommend looking at Fastlane[0] if you haven't already.
You notice a deprecation warning in the logs, or an email from GitHub and you make a 1 line commit to bump the node version. Easy.
Sure you can make typos that you don’t spot until you’ve pushed and the action doesn’t run, but I quickly learned to stop being lazy and actually think about what I’m writing, and get someone else to do an actual review (not just scroll down and up and give it a LGTM).
My experience is same as the commenter above, it’s relatively set and forget. A few minutes setup work for hours and hours of benefit over years of builds.
It's how it works now. It doesn't have to forever. We can imagine a future in which it works in a better way. One that isn't so annoying.
> I don't think there's some universally perfect solution that magically just works all the time and never needs intervention or updating.
Again you seem to be confused as to what the issue is. Maintenance is not painful. Initial development is.
When it takes all of a day to self host your own task runner on a laptop in your office and have better uptime, lower cost, better performance, and more correct implementations, you have to ask why anyone chooses GHA. I guess the hello-world is convincing enough for some people.
The world is full of kafkaesque nightmares of Dev-ops pipeline "designed" and maintained by committees of people.
It's horrible.
That said, for some personal stuff I have Google Cloud Build with a very, VERY simple flow. Fire, forget, and it's been good.
But honestly, doesn't github now have a button you can press to retrigger actions without a commit?
GitHub Actions are least hassle, when you don't care about how much compute time you are burning through. Either because you are using the free-for-open-source repositories version, or because your company doesn't care about the cost.
If you care about the compute time you are burning, then you can configure them enough to help with that, but it quickly becomes a major hassle.
I wouldn't want to maintain GitHub actions for a large project involving 50 people and 5 languages...
I've noticed this phenomenon a few times already, and I think there's nothing worse than having a 30-60s feedback loop. The one that keeps you glued to the screen but is otherwise completely nonproductive.
I tried for many moons to replicate GHA environment on local and it’s impossible in my context. So every change is like „push, wait for GH to pickup, act on some stupid typo or inconsistency, rinse, repeat”.
It’s like a slot machine „just one more time and it will run”, eating away focus and time.
It took me 25 minutes to get 5s build process. Naive build with GHA? 3 minutes, because dependencies et al. Ok, let’s add caching. 10 hours fly by.
The cost of failure and focus drop is enormous.
There has to be a better way. How has nobody figured this out?
I wish GitLab/GitHub would provide a way to do this by default, though.
If the process is longer than a few minutes, I can switch tasks while I wait for it. It's waiting for those things in the 3-10 minute range that is intolerable for me: long enough I will lose focus, not long enough for me to context switch.
Now I can bullshit with the LLM about something related to the task while I wait, which helps me to stay focused on it.
Recently switched to a company using Github, and assumed I'd be blown away by their offering because of their size.
Well, I was, but not in the way I'd hoped. They're absolutely awful in comparison, and I'm beyond confused how it got to that state.
If I were running a company and had to choose between the two, I'd pick Gitlab every time just because of Github actions.
I have some GitHub actions for some side projects and it just seems so much more confusing to setup for some reason.
If you want to make a CI performant, you'll need to use some of its features like caches, parallel workers, etc. And GHA usability really falls short there.
The only reason I put up with it is that it's free for open source projects and integrated in GitHub, so it took over Travis-ci a few years ago.
> For a Linux user, you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software.
When the CLI didn't have support for what I needed
one of them being echoing text to a file. To me, your comparison makes no sense.
Every time someone introduced a new way to use someone else's shared magic I feel nervous about using it. Like GitHub Actions. Perhaps it's time for me to dig into them a bit more and try to understand if/how they're safe to use. But I seem to remember just a few days ago someone mentioning a GitHub action getting hijacked?
I will be stunned if this doesn't become a more popular attack vector over the next few years. Lots of valuable stuff sits in github, and they're a nearly-wide-open hole to access it.
For instance, AWS has a lot of actions they maintain to assist with common CI/CD needs with AWS services.
// Luckily still a gitlab user, but recently forced to Microsoft Teams and office.
my condolences to you and your team for that switch; it's my 2nd used-and-disliked thing (right next to atlassian) - oh well
but one cool feature i found with ms teams that zoom did not have (some years ago - no clue now) is turning off incoming video so you dont have to be constantly distracted in meetings
edit: oh yeah, re github actions and the user that said: > Glad I’m not the only one
me too, me too; gh actions seem frustrating (from a user hardly using gh actions, and more gitlab things - even though gitlab seems pretty wonky at times, too)
Because the docs are crap perhaps? I prefer it, having used both professionally (and Jenkins, Circle, Travis), but I do think the docs are really bad. Even just the nesting of pages once you have them open, where is the bit with the actual bloody syntax reference, functions, context, etc.
A few years back I wanted to throw in the towel and write a more minimal GHA-compatible agent. I couldn't even find where in the code they were calling out to GitHub APIs (one goal was to have that first party progress UI experience). I don't know where I heard this, so big hearsay warning, but apparently nobody at GitHub can figure it out either.
It's really upsetting how little attention Actions is getting these days (<https://github.com/orgs/community/discussions/categories/act...> tells the story -- the most popular issues have gone completely unanswered).
Sad to see Earthly halting development and Dagger jumping on the AI train :(. Hopefully we'll get a proper alternative.
On a related note, if you're considering https://www.blacksmith.sh/, you really should consider https://depot.dev/. We evaluated both but went with Depot because the team is insanely smart and they've solved some pretty neat challenges. One of the cooler features is that their caching works with the default actions/cache action. There's absolutely no need to switch out popular third party actions in favor of patched ones.
Hi, Dagger CEO here. We're advertising a new use case for Dagger (running AI agents) while continuing to support the original use case (running complex builds and tests). Dagger has always been a general purpose engine, and our community has always used it for more than just CI. It's still the exact same engine, CLI, SDKs and observability stack. It's not like we're discontinuing a product, to the contrary: we're getting more workloads on the platform, which benefits all our users.
I think what we're doing is different: we built a product that was always meant to be general purpose; encouraged our community to experiment with alternative use cases; and are now doubling down on a new use case, for the same product. We are still worried about the perception of a FOMO-driven AI pivot (and the reactions on this thread confirm that we still have work to do there); but we're confident that the product really is capable of supporting both.
Thank you for the thoughtful comments, I appreciate it.
Example:
https://github.com/actions/runner/pull/2477#issuecomment-244...
What happened there?
What makes Depot so fast is that they use NVMe drives for local caching and they guarantee that the cache will always be available for the same builders. So you don't suffer from the cold-start problem or having to load your cache from slow object storage.
We also do native multi-platform builds behind one build command. So you can call depot build --platform linux/amd64,linux/arm64 and we will build on native Intel and ARM CPUs and skip all the emulation stuff. All of that adds up to really fast image builds.
Hopefully that’s helpful!
You wouldn't really have to change anything on your dockerfile to leverage this and see significant speed up.
The docs are here: https://docs.warpbuild.com/docker-builders#usage
Do people really consider this best practice? I disagree. I absolutely don't want CI touching my code. I don't want to have to remember to rebase on top of whatever CI may or may not have done to my code. Not all linters are auto-fixable so anyway some of the time I would need to fix it from my laptop. If it's a trivial check it should run as a pre-commit hook anyway. What's next, CI should run an LLM to auto-fix failing test cases?
Do people actually prefer CI auto-fixing anything?
Doing it in CI sounds like making things more complicated by resetting to remote branches after pushing commits. And, in the worst case, something that actually breaks code that works locally.
Why do they have a say in this? This is up to tech leadership to set standards that need to be followed.
I'm also with the other commenter about setting these things at the editor level, but also at the pre-push level.
We benchmark how long it takes to format/lint only changed files, usually no more than a second, maybe two, but I admit for some languages this may take more. An editor with a language server properly setup would have helped you find issues earlier.
We also have reports for our CI pipeline linters, so if we see more than 1 report there, we sent a message to the team: It means someone didn’t setup their editors nor their git hooks.
If the checks take more than a second, yeah, probably pre-commit is not the place/moment. Reliability is important, but so is user experience. I had companies where they ran the unit test suite at the pre-commit level, alright? And that is NOT fine. While it sounds like it’ll find issues earlier, it’ll screw your developer time if they have to wait seconds/minutes each time they fix a comma.
Because at the institutional level, there isn’t the appropriate will to mandate that devs fix their local environments, and I don’t feel like burning my own political capital on that particular fight.
Agreed on the performance comments.
I'd rather have linting pushed into the editing process, within my IDE/VS Code/vim plugins, whathaveyou, where it can feedback-loop with my actual writing process and not just be some ancillary command I run with lots of output I never read.
We have a lot of IDE checks, but they’re just warnings when debugging (because devs complained, IMO reasonably, that having them as errors during dev was too inconvenient during development/debugging). CI fails with any warnings, and we have devs who don’t bother to check their IDE warnings before committing and pushing to a PR.
Trivial mistakes in PRs are almost always signs of larger errors.
(Most of the time, the auto-fix is just running "cargo fmt".)
Things like 10 GB cache limits in GitHub, concurrency limits based on runner type, the expensive price tag for larger GitHub runners, and that's before you even get to the security ones.
Having been building Depot[0] for the past 2.5 years, I can say there are so many foot guns in GitHub Actions that you don't realize until you start seeing how folks are bending YAML workflows to their will.
We've been quite surprised by the `container` job. Namely, folks want to try to use it to create a reproducible CI sandbox for their build to happen in. But it's surprisingly difficult to work with. Permissions are wonky, Docker layer caching is slow and limited, and paths don't quite work as you thought they did.
With Depot, we've been focusing on making GitHub Actions exponentially faster and removing as many of these rough edges as possible.
We started by making Docker image builds exponentially faster, but we have now brought that architecture and performance to our own GHA runners [1]. Building up and optimizing the compute and processes around the runner to make jobs extremely fast, like making caching 2-10x faster without having to replace or use any special cache actions of ours. Our Docker image builders are right next door on dedicated compute with fast caching, making the `container` job a lot better because we can build the image quickly, and then you can use that image right from our registry in your build job.
All in all, GHA is wildly popular. But the sentiment, even among its biggest fans, is that it could be a lot better.
I guess that would be reasonable if we really needed the speedup, but if you're also offering a better QoL GHA experience then perhaps another tier for people like us who don't necessarily need the blazing speed?
We are fully usage based, no minimums etc., and our container builders are faster than others on the market.
We also have a BYOC option that gives 10x cost reduction and used by many customers at scale.
10,000,000,000 bytes should be enough for anyone! It really is a lot of bytes...
I used GitHub actions when building a fin services app, so I absolutely used the hash to specify Action dependencies.
I agree that this should be the default, or even the required, way to pull in Action dependencies, but saying "almost no one does" is a pretty lame excuse when talking about your own risk. What other people do has no bearing on your options here.
Pin to hashes when pulling in Actions - it's much, much safer
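For anyone who hasn't seen it, pinning just means replacing the mutable tag with the full commit SHA of the release you audited (placeholder SHA below, look up the real one):

steps:
  - uses: actions/checkout@0000000000000000000000000000000000000000 # v4 (placeholder SHA)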
"Defaults matter" is a common phrase, but equally true is: "the pattern everyone recommends including example documentation matters".
It is fair to criticise the usage of GH Actions, just like it's fair to criticise common usage patterns of MySQL that eat your data - even if smarter individuals (who learn from deep understanding, or from being burned) can effectively make correct decisions, since the population of users are so affected and have to learn the hard way or be educated.
Yes, your builds will work as expected for a stretch of time, but that period will come to an end, eventually.
Then one day you will be forced to update those pinned dependencies and you might find yourself having to upgrade through several major versions, with breaking changes and knock-on effects to the rest of your pipelines.
Allowing rolling updates to dependencies helps keep these maintenance tasks small and manageable across the lifetime of the software.
Just make sure you don’t leak secrets to your PRs. Also I usually review changes in updated actions before merging them. It doesn’t take that much time, so far I’ve been perfectly fine with doing that.
[1]: https://docs.renovatebot.com/modules/manager/github-actions/...
Though, yes, I prefer pinning dependencies for my personal projects. I don't see why things should break when I explicitly keep them the same.
The real problem is security vulnerabilities in these pinned dependencies. You end up making a choice between:
1. Pin and risk a malicious update.
2. Don't pin and have your dependencies get out of date and grow known security vulnerabilities.
This is for composite actions. For JS actions, what if they don't lock dependencies but pull whatever the newest package is at action setup time? Same issue.
Would have to transitively fork everything and pin it myself, and then keep it updated.
As for reducing boilerplate in the CI configs, GitHub Actions is a programming language with support for functions! It's just that function calls can only appear in very limited places in the program (only inside `steps`), and to define a function, you have to create a Git repository. The function call syntax is also a bit unusual, it's written with the `uses` keyword. So there is a lot of boilerplate that you can't remove this way, though there are several other yaml eDSLs hidden in GitHub Actions that address some points of it. E.g. you can create loops with `matrix`, but again, not general-purpose loops, they can only appear in a very specific syntactic location.
To really deduplicate stuff, rather than copy-pasting blocks of yaml, without using a mix of these special yaml eDSLs, in the past I've used Nix and Python to generate JSON. Now I'm using RCL for this (https://rcl-lang.org). All of them are general-purpose yaml deduplicators, where you can put loops or function calls anywhere you want.
FYI there is also `on: workflow_call` which you can use to define reusable jobs. You don't have to create a new repository for these
https://docs.github.com/en/actions/writing-workflows/workflo...
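A minimal sketch (the file name, input and script are made up):

# .github/workflows/tests.yml -- the reusable workflow
on:
  workflow_call:
    inputs:
      go-version:
        type: string
        default: "1.22"
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./ci/test.sh            # hypothetical script

# caller workflow
jobs:
  tests:
    uses: ./.github/workflows/tests.yml
    with:
      go-version: "1.23"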
In my experience, the best strategy is to minimize your use of it — call out to binaries or shell scripts and minimize your dependence on any of the GHA world. Makes it easier to test locally too.
I have done something similar with Jenkins and a Groovy CI library used by the Jenkins pipeline. But it wasn't super simple since a lot of it assumed Jenkins. I wonder if there is a cleaner open source option that doesn't assume any underlying platform.
Like teams.
GitHub Actions is the worst possible CI platform - except for all the others. Every single CI platform has weird limitations, missing features, gotchas, footguns, pain points. Every single one requires workarounds, leaves you tearing your hair out, banging the table trying to figure out how to do something that should be simple.
Of all of them I've tried, Drone is the platonic ideal of the best, simplest, most generally useful system. It is limited. But that limitation is usually easy to work around and doesn't impose artificial constrictions. However, you won't find nearly as many canned solutions or plugins as GitHub Marketplace, and the enterprise features are few.
GHA is great because of things like Dependabot, and the million canned Marketplace actions, and it's all tightly integrated with GH's features, so you don't have to work hard to get anything advanced or specific to work. Tight integration can save you weeks to months of development time on a CI solution. I've literally seen teams throw out versioning of dependencies entirely because they weren't updating their dependencies, because there's no Dependabot orb for CircleCI. If they had just been on GHA using Dependabot it would have saved them literal years of headaches.
Jenkins is, ironically, both the most full-featured, and the absolute worst to configure/maintain. Worst design, worst security, worst everything... except it does have a plugin for everything, and a UI for everything. I hate it with the fire of a million suns. But people won't stop using it, partially because it's so goddamn configurable, and they learned it years ago and won't stop using it. If anyone wants to write a replacement, I'm happy to help (I even wrote a design doc!).
Anyone who claims that GHA is garbage and any of the others are amazing is either doing something very basic or is crazy, or lying.
At the end of the day, you run shell scripts and commands using a YAML based config language (except for Jenkins). Amazingly, it's hard to build something that does that with the right abstractions and compromises between flexibility and good hygiene.
That may have been true before GitHub decided that PRs can't access repository secrets anymore. Apparently now you can at least add these secrets to Dependabot too (which is still duplicate effort for setup and any time you rotate secrets), but at the time when the change was introduced there were only weird workarounds.
I'm surprised nobody has mentioned dependabot yet. It automates this, keeping action dependencies pinned by hash automatically whilst also bringing in stable upgrades.
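Enabling that is a small config file; with hash-pinned uses: lines Dependabot keeps bumping the SHAs (and the version comments next to them) for you:

# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"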
The only automation that I know of is cargo vet. Although it doesn’t work for GitHub Actions, the idea sounds useful. Basically, vet allows people who trust each other to vet updates. So one person verifies the diff and then approves the changes. Next, everyone who trusts this person can update the dependency automatically since it has been “vetted”.
We also, to your point, need more labels than @latest. Most of the time I want to wait a few days before taking latest, and if there have been more updates since that version, I probably don't want to touch anything for a little bit.
Common reason for 2 releases in 2 days: version 1 has a terrible bug in it that version 2 tries to fix. But we won't be certain about that one either until it's been a few more days with no patch for the patch for the patch.
For autoscaling we use terraform-aws-github-runner which will bring up ephemeral AWS machines if there are CI jobs queued on GitHub. Machines are then destroyed after 15 minutes of inactivity so they are always fresh and clean.
For defining build pipelines we use Nix. It is used both for building various components (C++, Go, JS, etc) as well as for running tests. This helps to make sure that any developer on the team can do exactly the same thing that the CI is doing. It also utilizes caching on an S3 bucket so components that don't change between PRs don't get rebuilt and re-tested.
It was a bit of a pain to set up (and occasionally a pain to maintain), but overall it's worth it.
Next to those, I also used Travis CI and AppVeyor for projects. And they all had the same (commit and pray) setup that I hate. I wish „act“ was a tool directly maintained by the GitHub folks though.
My experience is this works for simple scripts but immediately falls apart when you start to do things like “don’t run the entire battery of integration tests against a readme change”, or “run two builds in parallel”, or “separate the test step from the build and parallelise it even if the build is serial”.
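For concreteness, those first needs map onto workflow features like path filters and job matrices (script names and the shard flag below are made up), which is exactly the kind of YAML you start accumulating:

on:
  pull_request:
    paths-ignore:
      - "**.md"                 # skip the heavy jobs for readme-only changes

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./ci/build.sh

  test:
    needs: build
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2]           # run the test step in parallel shards
    steps:
      - uses: actions/checkout@v4
      - run: ./ci/test.sh --shard ${{ matrix.shard }}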
It's easy to wrap make build and go about your life, but that's no easier than just using the GitHub action to call go build or mvn build. The complexity comes in "pull that dependency from this place that is in a private repository on GitHub/AWS because it's 100x faster than doing it from its source", and managing the credentials etc. for all of that stuff. This is also where the "it differs from running locally" comes into it too, funnily enough.
I'm so much happier on projects where I can use the non-declarative Jenkins pipelines instead of GH Actions or BB pipelines.
These YAML pipelines are bad enough on their own, but throw in a department that is gatekeeping them and use runners as powerful as my Raspberry Pi and you have a situation where a lot of developers just give up and run things locally instead of the CI.
I think there's a place for making a builder that looks imperative, but can work out a tree of actions and run them. Gulp is a little bit this way, but again I haven't tried to breakpoint through it either.
If the next evolution in DevEx is not caring about what your code looks like in a stepping debugger, then the one after it will be. Making libraries that present a tight demo app for the Readme.md file and then are impossible to do anything tricky with or god forbid debug just needs to fucking stop. Yesterday. And declarative systems are almost always the worst.
It looks like they have a very specific and unique build process which they really should handle with something more customizable like Jenkins. Instead they're using something that's really intended for quick and light deployments for intense dev ops setup.
I really like GitHub actions, but I'm only doing very simple things. Don't call a fork bad because it's not great when you're eating soup
They either just write a long blog post about how they can't screw in nails with a hammer.
Or they leave their security rules wide open and about half the comments are like, we need tools which stop us from doing stupid things.
No other industry works like this.
What it really does is fire off a webhook. Repository custom properties and the name of the action are properties included in the workflow_job webhook. With this you can do anything you want and you're not at all constrained by YAML or runners.
If they had, we'd be reading a different article about how terribly complex and unintuitive Jenkins is.
CI is just a very very hard problem and no provider makes it easy.
[1] https://github.com/actions/runner/blob/6654f6b3ded8463331fb0...
I had a coworker who called it Visual Sorta-Safe which is just about the best parody name I've ever heard in my entire career.
If one action pushes a tag to the repo, `on:tag` does not trigger. The workaround apparently is to make the first action push the tag using a custom SSH key, which magically has the ability to trigger `on:tag`.
The workaround is to use a token tied to you instead of GitHub Actions, so you get charged (or run out of quota).
You get charged no matter what, a personal access token doesn’t change anything.
If they are concerned about infinite loops then put a limit on how many workflows can be triggered by another workflow. Each time a workflow chains off another, pass along some metadata like "runsDeep" and stop when that hits X, which can be configured.
No, requiring a PAT to kick off a workflow from a workflow is gross and makes zero sense. I don’t want every tag associated with my user, I want it to be generic, the repo itself should be attributed. The only way to solve this is to create (and pay for) another GH user that you create PAT tokens under. A bunch of overhead, cost, and complexity for no good reason.
> When you use the repository's GITHUB_TOKEN to perform tasks, events triggered by the GITHUB_TOKEN, with the exception of `workflow_dispatch` and `repository_dispatch`, will not create a new workflow run.
It has bitten me in the rear before too. I use this pattern a lot when I publish a new version, which tags a piece of code and then marks assets as part of that version (for provenance reasons I cannot rebuild code).
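For reference, the workaround in workflow form looks roughly like this (the secret name is made up); the point is that the push uses a PAT instead of the default GITHUB_TOKEN, so the pushed tag does trigger downstream workflows:

jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          token: ${{ secrets.RELEASE_PAT }}   # hypothetical PAT; pushes made with it trigger on:tag
      - run: |
          git tag "v1.2.3"                    # placeholder version
          git push origin "v1.2.3"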
I'm using GitHub Actions to easily reuse some predefined job setup (like installing a certain Python version on Linux, macOS, Windows runners). For these types of tasks, I find GitHub Actions very useful and convenient. If you want to reuse predefined jobs, written by someone else, with GitLab CI/CD, what can I use?
Since the integration is done statically, it means gitlab can provide you a view of the pipeline script _after_ all components were included, but without actually running it.
We are using this and it is so nice to set up. I have a lot of gripes with other gitlab features (e.g. environments, esp. protected ones and their package registry) but this is one they nailed so far.
1: although TIL that the "script:" field itself is actually required in GLCI https://docs.gitlab.com/ci/yaml/#script
There is a gitlab CI feature `include`, but you pretty much have to write shell scripts inside YAML, losing the whole developer experience (shellcheck etc.). I would recommend this way only if you can't factor your code into a CLI in a proper language.
* Barely reproducible because things like the settings of the server (environment variables are just one example) are not version controlled.
* Security is a joke.
* Programming in YAML or any other config format is almost always a mistake.
* Separate jobs often run in their own container, losing state like build caches and downloaded dependencies. Need to be brought back by adding remote caches again.
* Massive waste of resources because too many jobs install dependencies again and again or run even if not necessary. Getting the running conditions for each step right is a pain.
* The above points make everything slow as hell. Spawning jobs takes forever sometimes.
* Bonus points if everything is locked down and requires creating tickets.
* Costs for infra often keep expanding towards infinity.
We already have perfectly fine runners: the machines of the devs. Make your project testable and buildable by everyone locally. Keep it simple and avoid (brittle) dependencies. A build.sh/test.sh/release.sh (or in another programming language once it gets more complicated, see Bun.build, build.zig) and a simple docker-compose.yml that runs your DB, Pub-Sub or whatever. Works pretty well in languages like Go, Rust or TS (Bun). Having results in seconds even if you are offline or the company network/servers have issues is a blessing for development.
There are still things like the mentioned heavy integration tests, merges to main and the release cycle where it makes sense to run it in such environments. I'm just not happy how this CI/CD environments work and are used currently.
- env vars can be scripted, either in YAML or through dotenv files. Dotenv files would also be portable to dev machines
- how is security a joke? Do you mean secrets management? Otherwise, I don't see a big issue when using private runners with containers
- jobs can pass artifacts to each other. When multiple jobs are closely intertwined, one could merge them?
- what dependency installation do you mean? You can use prebuilt images with dependencies for one. And ideally, you build once in a pipeline and use the binary as an artifact in other jobs?
- in my experience, starting containers is not that slow with a moderately sized runner (4-8 cpus). If anything, network latency plays a role
- not being able to modify pipelines and check runners must be annoying, I agree
- everything from on-prem license to SaaS license keeps costing more. Somewhere, expenses are made, but that can be optimized if you are in a position to have a say?
By comparing dev machines to runners, you miss one important aspect: portability, automation and testing in different environments. Unless you have a full container engine on your dev machine with flexible network configs, there can be missed issues. Also, you need to prime every dev to run the CI manually or work with hooks, and then you can have funny, machine-specific problems. So this already points to a central CI system by making builds repeatable in the same from-scratch environment. As for deployments, those shouldn't be made from dev machines, so automated pipelines are the go-to here. Also, automated test reporting goes out the window for dev machines.
Env vars can be scripted, many companies use a tree of instance/group/project scoped vars though, leading to easily breaking some projects when things higher up change. Solvable for sure, guidelines in companies make it a pain. There are other settings like allowed branch names etc. that can break things.
With security, yes I mean mostly secrets management. Essentially everyone who can push to any branch has access to every token. Or just having a typo or mixing up some variables lead to stuff being pushed to production. Running things in the public cloud is another issue.
Passing artifacts between jobs is a possibility. Still leads to data pushed between machines. Merging jobs is also possible, just defeats the purpose of having multiple jobs and stages. The examples often show a separation between things like linting, testing, building, uploading, etc. so people split it up.
With dependencies I mean everything you need to execute jobs: OS, libraries, tools like curl, npm, poetry, jfrog-cli, whatever. Prebuilt images work, but they are another thing you have to do yourself: building more containers, storing them, downloading them. Also, containers are not composable, so each project or job has its own. The curse of being stateless and the way Docker works.
Starting containers is not slow on a good runner. But I noticed significant delays on many Kubernetes clusters, even if the nodes are <1% CPU. Startup times of >30s are common. Still, even if it would be faster it is still a delay that quickly adds up if you have many jobs in a pipeline.
I agree that dev machines and runners have different behavior and properties. What I mean is local-first development. For most tasks it is totally fine to run a different version of Postgres, Redis and Go, for example. Docker containers bring it even closer to a realistic setup. What I want is quick feedback and being able to see the state of something when there are bugs. Not needing to do print debugging via git push and waiting for pipelines. Pipelines that set up a fresh environment and tear it down after are nice for reproducibility, but prevent me from inspecting the system aside from logs and other artifacts. Certainly this doesn't mean you shouldn't have a CI/CD environment at all, especially for releases/production deployments.
Simple scripts like these are enough for most projects and it is a blessing if you can execute them locally. Having a CI platform doing it automatically on push/merge/schedule is still possible and makes migrations to other platforms easier.
For real tho, not every project can be built by everyone locally, but at least parts of it should be locally runnable for devs to be able to work (at all, IMO). What I am noticing is that more and more coding is being done on some server somewhere (GitHub Codespaces anyone? Google Colab? etc.).
What I am also noticing is that with tools like GH-A there is not really a way to test the CI code other than... commit, push, wait, commit, push, wait... That's just absurd to me. Obviously all CIs have some quirks where sometimes you have to _just run it_ and see if it works, but here... it's like that for everything! Absurd, I say!
Laptops are a lot cheaper than the cloud bills I have seen so far. Penny pinching every tiny thing for <100$/€, but cloud seems to run on an infinite magic budget...
Your opinions are so regressive you really should consider going into management.
Nowhere do I say you shouldn't use CI/CD at all. I just don't like the current CI/CD implementations and the environments/workflows companies I worked for so far provide on top of them.
The regressive thing is putting everything ONLY on a remote machine with limited access and control, taped together by a quirky YAML-based DSL as a programming language and still requiring me to program most stuff myself.
$ git l
* cbe9658 8 weeks ago rejschaap (HEAD -> add-ci-cd) Update deploy.yml
* 0d78a6e 8 weeks ago rejschaap Update deploy.yml
* e223056 8 weeks ago rejschaap Update deploy.yml
* 8e1e5ea 8 weeks ago rejschaap Update deploy.yml
* 459b8ea 8 weeks ago rejschaap Update deploy.yml
* a104e80 8 weeks ago rejschaap Update deploy.yml
* 0e11d40 8 weeks ago rejschaap Update deploy.yml
* 727c1d3 8 weeks ago rejschaap Create deploy.yml
This also reduces the lock in by orders of magnitude.
I wonder why they chose to move back to Github Actions rather than evaluate something like Buildkite? At least they didn't choose Cloud Build.
I think incremental progress in the CI front is chugging along nicely, and I really haven't seen any breathtaking improvements from other solutions I've tried, like CircleCI.
Totally agree with other comments saying to keep as much logic out of CI config as possible. Many CI features are too convoluted for their own good. Keep it really simple. You can use any CI platform and have a shit time if you don't take the right approach.
Instead, have tooling to do that before committing (vscode format-on-save, or manually run a task), then have a pre-commit hook just do a sanity-check on that. It only needs to check modified files, so usually very fast.
Then, have an additional check on CI to verify formatting on all files. That should rarely be triggered, but helps to catch cases where the hooks were not run, for example from external contributes. That also makes it completely fine for this CI step to take a couple of minutes - you don't need that feedback immediately.
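The CI side of that can stay a single dumb job (the wrapper script here is hypothetical; it just runs the formatter in check mode over the whole tree):

jobs:
  format-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/format.sh --check   # fails if any file is unformatted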
Though it would be sort of interesting or maybe just amusing if you made something like ssh-agent but for 'git commit' and your test runner. Only allow commits when all files are older than your last green test run.
You have no idea how much I'd love that feature. Inasmuch as "save" is still a thing anyway. I don't miss explicit saves in IDEA, I see commit as the "real" save operation now, and I don't mind being able to hook that in an IDE-independent way.
I think the UX of git hooks has been sub-par for sure, but tools like the confusingly named pre-commit are helping there.
I don’t want to think about formatting, I just want everything to be consistent. A pre commit hook can run those tools for me, and if any changes occurred, it can add them to the commit.
If people want to die on a hill that is demonstrably causing problems for all of their coworkers then let em.
Example config: https://github.com/anttiharju/vmatch/blob/9e64b0636601c236a5...
The article is what you end up finding after that stage has been gone through.
The conclusion of course is that whoever invented this stuff really wasn't thinking clearly and certainly didn't have the time to write decent documentation to explain what they were thinking. And now the whole world has to try to deal with their mess.
My theory as to how this ends up happening is that the people creating the thing began with some precursor thing as their model. They made the new thing as "old thing, with a few issues fixed". Except they didn't fully understand the concepts in that thing, and we never got to see that thing. You'll see many projects that have this form: bun is "yarn fixed". Yarn is "npm fixed". And so on. None of these projects ever has to fully articulate their concepts.
I don't use sourcehut, but interpreting what you wrote I'd argue this is an antifeature and would be a dealbreaker for me. CI typically evolves with the underlying code and decoupling that from the code makes it difficult to go backwards. It loses cohesion.
If you put the build files in a .builds/ folder at the root of your repository, they will be run upon each commit. Just like in github or gitlab. You are just not forced into this way of life.
If you prefer, you can store the build files separately, and run them independently of your commits. Moreover, the build files don't need to be associated to any repository, inside or outside sourcehut.
https://news.ycombinator.com/item?id=18983586
https://rewiring.bearblog.dev/blog/?q=azure
PS: I am not the author of any of these posts.
The worst kind of downtime is when you go down but nobody else has.
on:
issues:
types:
- opened
pull_request:
types:
- opened
permissions:
contents: read
issues: write
pull-requests: write
jobs:
default:
runs-on: ubuntu-latest
steps:
- run: gh issue edit ${{ github.event.issue.number }} --add-assignee ${{ github.repository_owner }}
env:
GH_TOKEN: ${{ github.token }}
GH_REPO: ${{ github.repository }}
I set the permissions to only allow writing to issues and pull-requests (so that if gh is modified to do malicious things (or has a security flaw that allows it to do malicious things even if not intended), it cannot affect anything other than issues and pull-requests). As far as I can tell from the documentation, this is correct (although it can do things other than add assignees, and it does not seem that it can be set more finely), but if I am wrong then you can tell me that I am wrong.

The documentation for GitHub Actions says, "If you specify the access for any of these permissions, all of those that are not specified are set to none." The article says "I do think a better "default" would be to start with no privileges and require the user to add whatever is needed", and it would seem that this is already the case if you explicitly add a "permissions" command into your GitHub Actions file. So, it would seem that the "default permissions" are only used if you do not add the "permissions" command, although maybe that is not what it means and the documentation is confusing; if so, then it should be corrected. Anyways, you can also change the default permission setting to restrictive or permissive (and it probably ought to be restrictive by default).
Allowing to set finer permissions probably would also help.
Stop rebasing.
This should only happen if absolutely necessary to fix major merge mistakes.
Rebasing changes history and I've seen more problems prevented from removing it as a CI strategy.
Every CI strategy I've seen relying on rebasing had a better alternative in SDLC. You just need to level up your project management, period.
It also has the problem of not having a local dev runner for actions. The "inner loop" is atrociously slow and involves spamming your colleagues with "build failed" about a thousand times, whether you like it or not.
IMHO, a future DevOps runner system must be open-source and local-first. Anything else is madness.
Right now we're in the "mainframe era" of DevOps, where we edit text files in baroque formats with virtually no tooling assistance, "submit" that to a proprietary batch system on a remote server that puts it into a queue... then come back after our coffee to read through the log printout.
I should buy a dot matrix printer to really immerse myself into the paradigm.
The entire code is a wonderful mess. We found, when we early-adopted ephemeral runners, that the control flow is full of races and the status code you get at the end is indicative of exactly nothing. So even if the backend is just having a hiccup picking up a job with an obscure Azure error code, you'd better just throw that entire VM away, because you can't know if that runner will ever recover or has already done things to break the next run.
Although, I never saw a public announcement of this discontinuation, ADO is kind of abandoned AFAICT and even their landing page hints to use GitHub Enterprise instead [1].
It turns out the last person to change the cron __schedule__ (not the workflow file in general) is the 'actor' associated with this workflow. Very, very confusing implementation. Error messages are even more confusing - workflow runs are renamed to "{Unknown event}" and the message is "Email is unverified".
Link to docs: https://docs.github.com/en/actions/writing-workflows/choosin...
IIRC, GitHub recommends this practice in their docs, with a username of "YOUR_USERNAME-machine".
The machine user is just an ordinary GitHub user, added as a member of the organization, with all the necessary repo permissions, and a generated access token added to the GH repo Secrets. The organization owner then manages this GH machine account as well as the org, and their own personal (or work) login account.
We have not hit any rate limiting so far, but we're a relatively small team -- a dozen devs, a few hundred commits per day that trigger CI (we don't do CD), across half a dozen active repos.
Maybe it was just the pain of switching but that was my initial impression.
I pair it with bash scripts for the parts I find important to run outside GitHub, which facilitates testing and maintenance.
Although I still need to run things multiple times and iterate on the actual GitHub Actions run, which is a bit slow, I find it the best way to catch any issues. If the dev feedback loop were fixed it would save me a lot of precious time. I know there's a third-party tool to run it locally, but it's not the same…
Thus, bash scripting is great due to portability.
I take this a step further and approach CI with the mentality that I should be able to run all of my CI jobs locally with a decent interface (i.e. not by running 10 steps in a row), and then I use CI to automate my workflow (or scale it, as the case may be). But it always starts with being able to run a given task locally and then building CI on top of it, not building it in CI in the first place.
Nothing beats having a single script to bootstrap and run the whole pipeline e.g. `make ci`.
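The workflow then shrinks to almost nothing (a sketch, assuming the Makefile handles bootstrap, build and tests):

on: [push, pull_request]

jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make ci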
This means that for anything that needs to gracefully cancel, like for example terraform, it's screwed.
Want to cancel a run? Maybe you've got a plan being generated for every commit on a branch, but you push an update. Should be ok for GitHub to stop the previous run and run the action for the updated code, right? WRONG! That's a quick way to a broken state.
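There's no graceful-cancel hook, but one mitigation is a per-branch concurrency group that queues new runs instead of letting GitHub kill an in-flight plan/apply (a sketch):

concurrency:
  group: terraform-${{ github.ref }}
  cancel-in-progress: false   # queue the newest run instead of cancelling the running one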
This is so frustrating. Having to inject a PAT into the workflow just so it will kick off another workflow is not only annoying but it just feels wrong. Also, now lots of operations are tied to my user, which I don't like.
> It doesn't help that you can't really try any of this locally (I know of [act](https://github.com/nektos/act) but it only supports a small subset of the things you're trying to do in CI).
This is the biggest issue with GH Actions (and most CIs), testing your flows locally is hard if not impossible
All that said I think I prefer GH Actions over everything else I've used (Jenkins and GitLab), it just still has major shortcomings.
I highly recommend you use custom runners. The speed increase and cost savings are significant. I use WarpBuild [0] and have been very happy with them. I always look at alternatives when they are mentioned but I don't think I've found another service that provides macOS runners.
We have a very complicated build process in my current project, but our CI pipelines are actually just a couple of hundred of lines of GHA yaml. Most of which are boilerplate or doing stuff like posting PR comments. The actual logic is in NX configuration.
Outside of locking down edit access to the .github workflow yml files I'm not sure how vulnerabilities like this can be prevented.
Presumably anything configured via a .github workflow wouldn't assure safety, as those files can be edited to trigger unexpected actions like deploys on working branches. Our Github Action workflow yml file had a check to only deploy for changes to the main branch. The deploy got triggered because that check got removed from the workflow file in a commit on a working branch.
You create an environment, restrict it to the main branch, add your secret to it and then tie your deploy workflow to it.
If someone runs that workflow against another branch it will run but it won’t be able to access those secrets.
[0] https://docs.github.com/en/actions/managing-workflow-runs-an...
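The workflow side is just a sketch like this (secret and script names are made up; the branch restriction itself lives in the repo's environment settings):

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production      # environment restricted to main in repo settings
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh
        env:
          DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}   # only resolvable through the environment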
But for actually good security CI and CD should be different tools.
The problem is it's still possible to work around those controls unless you create some YAML monstrosity that stops people from making the mistake in the first place.
- Self-hosting on your aws/gcp/azure account can get a little tricky. `actions-runner-controller` is nice but runs your workflows within a docker container in k8s, which leads to complex handling for isolation, cost controls because of NAT etc.
- Multi-arch container builds require emulation and can be extremely slow by default.
- The cache limits are absurd.
- The macos runners are slow and overpriced (arguably, most of their runners are).
Over the last year, we spent a good amount of time solving many of these issues with WarpBuild[1]. Having unlimited cache sizes, remote multi-arch docker builders with automatic caching, and ability to self-host runners in your aws/gcp/azure account are valuable to minimize cost and optimize performance.
[1] https://github.com/earthly/earthly/commit/6d7f6786ad9fa4392f... [2] https://github.com/earthly/earthly/commit/89d31fc014a8980a50...
I am really really hoping that someone (not me, I've already tried and failed) could slim it down into a single-purpose, self-contained, community maintainable tool ...
- name: Install Earthly
if: steps.check_changes.outputs.relevant_changes == 'true'
uses: earthly/actions-setup@v1
with:
version: v${{ env.EARTHLY }}
- name: Run tests and generate coverage summary
if: steps.check_changes.outputs.relevant_changes == 'true'
run: cd src/webapp && earthly --build-arg GO_VERSION=${{ env.GOLANG }} +coverage-summary
What a cool looking website. What kind of tool do you need to create those animations?
Go look at your workflows and see how much of the runtime is spent running installers upon installers for various languages, package managers and so on. Containers were not supposed to be like this.
Why not align these tools? Then there might be less pain. What a good idea.
Eventually we tried dropping that requirement and instead relied on testing main before deploying to production. It sped us up again, and main never broke because of bad merges while I was there.
Like others have suggested, keep the actions simple by having lots of scripts which you can iterate on locally, and make the actions dumb: they just run the scripts.
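A minimal sketch of that shape (script names are placeholders); the same scripts run unchanged on a laptop, so you debug them locally instead of via pushed commits:

    steps:
      - uses: actions/checkout@v4
      - run: ./ci/build.sh
      - run: ./ci/test.sh
      - run: ./ci/package.sh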
None of it usefully explains how GHA works from the ground up, in a way that would help me solve problems I encounter.
https://github.com/StefMa/pkl-gha
It could already save you some time.
We moved to dagger to get replicable local pipeline runs, escape the gitlab DSL, and get the enormous benefits of caching.
We have explicitly chosen to avoid using the "daggerverse", and with that the cross-language stuff. Reason being that it makes modifying our pipeline slower and harder -- the opposite of the reason we moved to dagger.
So we use the Dagger Python API to define and run our CI builds. It's great!
Like the other comments on this page about dagger, the move to "integrate AI" is highly concerning. I am hopeful that they won't continue down this path, but clearly the AI hype bubble is strong and at least some of the dagger team are inside it.
I'm speculating that if the dagger team doesn't drop the AI stuff, then the dagger project will end. A fork will pop up and we'll move to using that. Not an expert (yet!) in the buildkit API, but it seems like the stuff we're benefiting from with dagger is really just a thin wrapper around buildkit. So it's potentially not too challenging to create a drop-in replacement if necessary later.
But, there are so many red flags in this post. Clearly this corp does not know how to build, test and release professional software.
I hope Actions stays bad tho. We need more folks to get off proprietary code forges for their open source projects, and a better CI plus a better review model (PRs are awful) are two very low-hanging fruit that would entice folks off the platform not for philosophical reasons, such as not supporting US corporations or endangering contributor privacy by making them agree to Microsoft's ToS, but for the technical superiority of the alternative platform itself.
you just activate some probes and write SQL queries to sift through the information.
But two major pains I did not see: the atrocious UI and the pricing.
Their pricing model goes against any good practice, as it bills a full minute for any job even if it runs for 2 seconds. Let's say you run 100 jobs in parallel and they all take 30 seconds: you will pay for 100 minutes instead of 50. Now translate this to an enterprise operating at big scale and I assure you you'll see crazy differences between actual time and billable time.
Most of the time, I'm just running dotnet build, zipping the output, and pushing it to an S3 bucket. Then some code or script picks it up on the other side. Things get much trickier when you're using multiple services, languages, runtimes, database technologies, etc.
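That flow, roughly, as workflow steps (project path, bucket, region and secret names are placeholders):

    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '8.0.x'
      - run: dotnet publish src/MyApp -c Release -o out   # placeholder project path
      - run: cd out && zip -r ../myapp.zip .
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - run: aws s3 cp myapp.zip s3://example-artifacts/myapp.zip   # placeholder bucket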
There are better solutions out there.
I was doing things more than 20 years ago in Hudson that GHA can't do now.
A 1000% yes, because it means the default experience most devs have of CI is with ephemeral runners, which is a massive win for security and against build rot.
Every company I've worked at with stateful runners was a security incident waiting to happen, not to mention builds that would do different things depending on which runner host you got placed on (devs manually installing different versions of things on hosts, etc.).
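If you're self-hosting with actions-runner-controller (mentioned upthread), ephemeral runners are mostly a matter of the runner spec; a rough sketch using the legacy summerwind CRDs (names and repo are placeholders, and the layout differs in the newer runner-scale-set mode):

    apiVersion: actions.summerwind.dev/v1alpha1
    kind: RunnerDeployment
    metadata:
      name: ci-runners                           # placeholder
    spec:
      replicas: 3
      template:
        spec:
          repository: example-org/example-repo   # placeholder
          ephemeral: true                        # each runner takes one job, then is torn down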
And what are those?
There's also the Woodpecker CI fork, which has a very similar user experience: https://woodpecker-ci.org/
When combined with Docker images, it's quite pleasant to use: you define what environment you want for the CI/CD steps, what configuration/secrets you need, and the steps themselves (which can also just be a collection of scripts that you can run locally if need be). That's it.
It's standalone, so you can integrate it with Gogs, Gitea or similar solutions for source control, and it's perhaps a bit simpler than GitLab CI (which I also think is lovely, though maintaining on-prem GitLab isn't always a nice experience, not that you have to do that).
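For a sense of the format, a minimal .woodpecker.yml looks roughly like this (images and commands are placeholders; older Woodpecker versions use `pipeline:` instead of `steps:` as the top-level key, and secrets are attached via the UI/CLI rather than defined here):

    steps:
      test:
        image: golang:1.22
        commands:
          - go test ./...
      build:
        image: golang:1.22
        commands:
          - go build ./...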
Could we not usually say most software could be documented better? I do not think GitHub Actions ranks near the bottom in terms of documentation and user experience overall. I do understand your point, though.