Insane effort. This sounded like a pipe dream just a couple of years ago. Congrats to everyone involved, especially to those who drove the effort.
The Debian project is admirable, and has positively changed the standards for OS design several times. Reminds me I should donate to their coffee fund around tax time =3
Exactly!

I’ve said it many times and I’ll repeat it here - Debian will be one of the few Linux distros we have right now that will still exist 100 years from now.

Yeah, it’s not as modern in terms of versioning and risk appetite as the likes of Arch, but that’s also a feature!

> Debian will be one of the few Linux distros we have right now, that will still exist 100 years from now.

It'd certainly be nice, but if you've ever seen an organisation unravel it can happen with startling speed. I think the naive estimate is if you pick something at random it is half-way through its lifespan; so there isn't much call yet to say Debian will make it to 100.

Y_Y · 6 days ago:
> I think the naive estimate is if you pick something at random it is half-way through its lifespan; so there isn't much call yet to say Debian will make it to 100.

This doesn't strike me as a strong argument. That naive estimate (in whatever form[0]) is typically based on not knowing anything else about the process you're looking at. We have lots of information about Debian and similar projects, and you can update your estimate (in a Bayesian fashion) when you know this. Given that Ian Murdock started Debian 31 years ago I think more than 100 years is a very reasonable guess.

[0] see e.g. https://en.wikipedia.org/wiki/Lindy_effect

Arguably, there is already the continuous package deprecation process that often leads to unpopular projects getting culled in the next upgrade.

In a way, Flatpak/Snap/Docker were mitigations: they let old programs run on new systems, and updated software run on old systems where it would otherwise no longer be compatible with the OS. Not an ideal solution, but a necessary one if folks also wanted an answer to the long-term supported program versions that dominate the win/exe world.

If you work with unpopular oddball stuff, you notice packages cycle out of the repositories rather regularly. =3

I appreciate the Lindy effect, but I'd be very cautious about applying it in just any domain. In particular IT, where new projects continually spring up to dethrone others. Another 30 years for Debian seems reasonable, but I'd probably bet against another 100. A Metaculus question on the longevity of projects like Debian would be fascinating.
> In particular IT, where new project continually spring up to dethrone others.

The Lindy effect says nothing about popularity, which is how I translate your use of 'dethrone' here. It observes that something's duration of existence correlates with its chances for existence in the future.

> In particular IT, where new project continually spring up to dethrone others

Yet it's lasted 31 years, which is a pretty insane amount of time in tech. That's on top of being kept up to date, with good structure and really strong contributions and advancements.

On the other hand, look at CentOS, Red Hat, and Oracle and their debacle. How much did they fragment that space?

And then we have Debian just chugging along.

Indeed, it was sad when they ended the FreeBSD based Debian project due to a lack of interest.

I don't think traditional von Neumann architecture will even be around in 100 years, as energy demands drive more efficient designs for different classes of problems. =3

I feel safer using Arch, compared to Debian. Debian adds so many of its own patches on top of the original software that the result hardly resembles the original. Arch almost always ships the original code. And I trust the original developers much more than the Debian maintainers.
> And I trust much more to the original developers, than Debian maintainers.

Then that's a good reason not to use Debian indeed. Whatever distro you choose, you give your trust to its maintainers.

But that's also a feature: instead of trusting random code from the Internet, you can trust random code from the Internet that has been vetted by a group of maintainers you chose to trust. Which is a bit less random, I think?

Debian standardized the vetting process for maintainers, validation environments, and shenanigans could be attributed to individual signatures rather quickly.

If you ever want a laugh, one should read what Canonical puts the kids through for the role. One could get a job flying a plane with less paperwork...

Authenticated signed packaging is often a slow process, and some people do prefer rapid out-of-band pip/npm/cargo/go until something goes sideways... and no one knows who was responsible (or which machine/user is compromised.)

Not really random, but understandably slow given the task of reaching "stable" OS release involves hundreds of projects... =3

Yeah I think that's what I was trying to say. With a distro, you get some kind of validation by maintainers. With unvetted package managers, you just get something from somewhere.
I don't trust any validation by the maintainers. There's too much code even in small projects. Big projects are oceans of code. Maintainers maintain too many packages to be able to understand even a little bit of the changes. So, no, I don't trust it. It would require a specialized team of engineers for every single project to analyze the changes in new versions. It just does not happen.

The best they can do is follow the developer's instructions to build a binary artefact and upload it somewhere. Maybe codify those instructions into a (hopefully) repeatable script like a PKGBUILD.
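For readers who haven't seen one, a minimal recipe of that kind looks roughly like the following. This is a hypothetical sketch: the package, version, and URLs are illustrative; only the field and function names are the real makepkg ones.

```shell
# PKGBUILD -- hypothetical minimal recipe, for illustration only
pkgname=hello
pkgver=2.12
pkgrel=1
pkgdesc="Sketch of a trivial package recipe"
arch=('x86_64')
url="https://www.gnu.org/software/hello/"
license=('GPL-3.0-or-later')
source=("https://ftp.gnu.org/gnu/hello/hello-$pkgver.tar.gz")
sha256sums=('SKIP')   # a real recipe pins the upstream checksum here

build() {
  cd "hello-$pkgver"
  ./configure --prefix=/usr
  make
}

package() {
  cd "hello-$pkgver"
  make DESTDIR="$pkgdir" install
}
```

makepkg sources this file and runs the functions; in the best case the maintainer's whole job is keeping the recipe (and any patches) current.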

> The best they can do is follow the developer's instructions to build a binary artefact and upload it somewhere. Maybe codify those instructions into a (hopefully) repeatable script like a PKGBUILD.

I don't understand; isn't this exactly what maintainers do? They write a recipe (be it a PKGBUILD or something else) that builds (maybe after applying a few patches) a package that they then distribute.

Whether you use Arch or Debian, you trust that the maintainers don't inject malware into the binaries they ship. And you trust that the maintainers trust the packages they distribute. Most likely you don't personally check the PKGBUILD and the upstream project.

No, they alter and modify the software as they see fit.

Here's one of the recent examples: https://www.reddit.com/r/debian/comments/1cv30gu/debian_keep...

And that's applied to a lot of packages. Sometimes it leads to frustrated users who directly come to frustrated developers who have no idea what they're talking about, because developers did not intend software to be patched and built this way. Sometimes this leads straight to vulnerabilities. Sometimes this leads to unstable software, for example when maintainer "knows better" which libraries the software should link to.

> Here's one of the recent examples: https://www.reddit.com/r/debian/comments/1cv30gu/debian_keep...

They used an official build option to not ship a feature by default, and have another package that does enable all features. If that's your best example of

> Debian adds so much of their patches on top of original software, that the result is hardly resembles the original.

then I'm inclined to conclude that Debian is way more vanilla than I thought.

> No, they alter and modify the software as they see fit.

Well yeah, but you choose the maintainers that do it the way you prefer. In your case you say you like Arch better, because they "patch less" (if I understand your feeling).

Still they do exactly what you describe they should do: write a recipe, build it and ship a binary. You can even go with Gentoo if you want to build (and possibly patch) yourself, which I personally like.

> Here's one of the recent examples: [...]

Doesn't seem like it supports your point: the very first comment on that Reddit thread explains what they did: they split one package into two packages. Again, if you're not happy with the way the Debian maintainers do it, you can go with another distro. It doesn't change the fact that if you use a distro (as opposed to building your own from scratch), then you rely on maintainers.

In general, the apparent use-case and actual unintended impact on OS security must be clear. There is also always extreme suspicion regarding "security" widgets that touch the web browser, shell, or email programs. Normally, after something like CVE-2023-35866 is noted, a package maintainer may assume the project is a liability given the history.

If an application requires a 3 page BS explanation about how to use a footgun without self-inflicted pwning... it seems like bad design for a posix environment.

People that attempt an escalation of coercion with admins usually get a ban at minimum. Deception, threats, and abuse will not help in most cases if the maintainer is properly trained.

https://www.youtube.com/watch?v=lITBGjNEp08

Have a nice day, =3

I love Debian, but this is a genuine issue that many people don't know about. It also compounds if you're on Ubuntu, as sometimes Canonical adds their own patches too. If you're just using Debian as a base OS to serve your own software, it doesn't matter as much, but still does somewhat.

It's not unusual for Debian-specific patches to be applied by the package maintainers in order to fix build errors, mismatched dependencies, etc. Most of the time those patches are harmless, but sometimes they are not. There have been security vulnerabilities for example that only existed in the Debian-based package of software.

No distro is perfect and I don't intend this as a criticism of Debian (they have legitimate reasons for doing what they do), and no distro (not even Arch) ships everything without any patches, but in my years of experience I've bumped my head on this in Debian several times.
> There have been security vulnerabilities for example that only existed in the Debian-based package of software.

Any examples more recent than CVE-2008-0166?

Currently on mobile and going from memory, but I remember having to push out quick patches for something around 2020-ish or late 2010s? The tip of my tongue says it was a use-after-free vuln in a patch to openssl, but I can't remember with confidence. I'll see if I can find it once I get home.

Worth noting lest I give the wrong impression, I don't think security is a reason to avoid Debian. For me the hacked up kernels and old packages have been much more the pain points, though I mostly stopped doing that work a few years ago. As a regular user (unless you're compiling lots of software yourself) it's a non-issue

In general, most responsibly reported CVEs allow several weeks for patch fixes to propagate into the ecosystems before public disclosure.

Once an OS is no longer actively supported, it will begin to accumulate known problems if the attack surface is large.

Thus, a legacy complex-monolith or Desktop host often rots quicker than a bag of avocados. =3

It's quite easy to run Debian unstable (sid) if you want a more risky approach to having the newest of every package.
Commonly these days you can also add specific repos for the things you want to be more on the edge. Then there are some tools one might install manually, at the moment I remember doing it with fzf.
Flatpak is also a great option for apps you might want to be more up-to-date than Debian provides in their package manager.
Yep, I do this for a few tools. Though apt-key deprecation still hasn’t been universally accepted, so that’s always a minor annoyance to deal with.
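For what it's worth, the usual way to keep such a mixed setup safe is apt pinning. A sketch (the path and priority value are illustrative, and it assumes you've already added an unstable entry to your sources):

```text
# /etc/apt/preferences.d/99-unstable  (illustrative path)
Package: *
Pin: release a=unstable
Pin-Priority: 100
```

With a priority below 500, nothing is pulled from unstable unless you ask for it explicitly, e.g. `apt install -t unstable fzf`.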
I always want to donate more to open source projects but as far as I know there aren't any I can get tax credits for in Canada. My budget is strapped just enough that I can't quite afford to donate for nothing.

Any Canadian residents here know of any tax credit eligible software projects to donate to?

Depends on where you live, work, and invest. Still, I would recommend chatting with a local accountant to be sure if a significant contribution to a donee qualifies as deductible. Note most large universities will be registered in both the US/Canada.

https://www.canada.ca/content/dam/cra-arc/formspubs/pub/p113...

Best regards, =3

I don't get how someone achieves reproducibility of builds: what about file metadata like creation/modification timestamps? Do they forge them? Or are these data treated as not important enough (i.e. two files with different metadata but identical contents should have the same checksum when hashed)?
jzb · 1 week ago:
Debian uses a tool called `strip-nondeterminism` to help with this in part: https://salsa.debian.org/reproducible-builds/strip-nondeterm...

There's lots of info on the Debian site about their reproducibility efforts, and there's a story from 2024's DebConf that may be of interest: https://lwn.net/Articles/985739/

I see this is written in Perl, is that the case with most Debian tooling?
lamby · 6 days ago:
One of the authors of strip-nondeterminism is here. The primary reason it's written in Perl is that, given that strip-nondeterminism is used when building 99.9% of all Debian packages, using any other language would have essentially made that language's runtime a dependency for building all Debian packages. (Perl is already required by the build process, whilst Python is not.)
Question: is Perl the only runtime the Debian build process relies on?
yrro · 5 days ago:
Any packages with "Essential: yes" (run 'apt list ~E' to see them) are required on any Debian system. Additionally, the 'build-essential' package pulls in, via its dependencies, other packages that must be present to build Debian packages: https://packages.debian.org/sid/build-essential
It’s helpful to think of Perl as a superior bash, rather than a worse python, when it comes to scripting.
gjvc · 6 days ago:
stealing this, thank you
Notably, they forgot to improve on readability and maintainability, both of which are markedly worse with perl.

Look, I get people use the tools they use, and perl is fine, I guess, it does its job, but if you use it you can safely expect to be mocked for prioritizing string operations or whatever perl offers over writing code anyone born after 1980 can read, let alone is willing to modify.

For such a social enterprise, open source orgs can be surprisingly daft when it comes to the social side of tool selection.

Would this tool be harder to write in python? Probably. Is it a smart idea to use it regardless? Absolutely. The aesthetics of perl are an absolute dumpster fire. Larry Wall deserves persecution for his crimes.

Did you miss the post a few above yours, where an author of this tool explained why it’s written in Perl? Introducing a new language dependency for a build, especially of an OS, is not something you undertake lightly.
Right. Good luck finding people who want to maintain that. It just seems incredibly short-sighted unless the current batch of maintainers intend to live forever.
Counterpoint: if someone knows Perl, they are much more likely to have the requisite skills to be a maintainer for a distro. It’s self-selection.

Imagine the filtering required for potential maintainers if they rewrote the packaging to JS.

eviks · 6 days ago:
How is it helpful to ignore a better alternative just because a worse one exists?
They precisely say they use it as a better alternative to bash. Obviously they don't think that Python is a better alternative here... or did I misunderstand the question?
eviks · 6 days ago:
It's not obvious to me that they think Python is worse than Perl, and that reading would make the phrase even less sensible.
dizhn · 6 days ago:
Weird wording yes. I read it as "yes perl is better than bash" (I assume for tasks that need actual programming languages), "no it's not worse than python".
I'm not reading it as "it's not worse than python", I am reading it as "the choice was between bash and perl, python was not an option for reasons unrelated to its merits"
So you genuinely believe that they think Python is a better choice in this case, but still chose to go for Perl because they believe it's worse? How does that work?
eviks · 6 days ago:
It works by not mixing two different people: the commenter and the implementer.

Also, it works trivially even in the case of the implementer - he might believe Python is better, but chose Perl because he likes it more

The same reason people write C++ instead of better^TM alternatives.

Pick the tool you already know and focus on solving the problem.

Packaging and making build scripts is perhaps one of the most unrewarding tasks out there. As an open source project where most work is done for free, debian can't afford to be prescriptive about what languages are used for this sort of task.
Actually it can and it is. Build system dependencies, especially ones that apply to all packages, are something that concerns the distribution as a whole and not something where each developer can just add their favorite one.
I checked the code. Perl is suitable for this kind of task.
some, but not all. There's a bunch of historical code which means that Perl is in the base install, but modern tooling has a lot of Python too, as well as POSIX shell (not bash).
Though a lot of the apt tooling is definitely written in Perl the last time I had to deep dive
And a lot of OpenBSD-related stuff is written in Perl, too. I do not think it is a bad thing at all.
I absolutely love Perl. I'm just so sad Python won because Google blessed it as a language and at the time everyone wanted to work for Google.

Perl always gets hate on HN, but I actually wonder how many of those commenters have spent more than a single hour using Perl after reading the Camel book.

Honest opinion: if you're going to be spending time in Linux in your career, then you should read the Camel book at least once. Then and only then should you get to have an opinion on Perl!

I mostly agree with you, though I do think Perl is genuinely harder to read than many other languages.

Perl was often my go-to for scripts before I learned Ruby (which has many glorious perl-isms in it, even if most rubyists nowadays don't know or want to acknowledge that :-D ), and even looking back at some of my own code, knowing what it does, I have to read it a lot slower and more carefully than most other langs.

Perl to me feels wonderfully optimized for writing, sometimes at the expense of reading. I love Perl's power and expressiveness, especially the string processing libs, and while I appreciate the flexibility in how many different ways there are to do things, it does mean that Perl code written by someone else with different approaches can sometimes be difficult to grok. For my own scripts I don't care about any of those issues and I often optimize for writing anyway, but there are plenty of applications where I would recommend against Perl, despite my affection for it.

And yes agree, people should read the camel book!

> there are plenty of applications where I would recommend against Perl

Yes of course, I would not write any type of server in Perl; I would pick Go or Elixir or Erlang for such a use-case.

jeltz · 6 days ago:
Last time I checked a lot was also written in Python.
o11c · 1 week ago:
Timestamps are the easiest part - you just set everything according to the chosen epoch.

The hard things involve things like unstable hash orderings, non-sorted filesystem listing, parallel execution, address-space randomization, ...
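Several of the archive-level issues can be neutralized at once. A sketch with GNU tar (the flags are real; the epoch value and file names are arbitrary):

```shell
# Normalize directory read order, timestamps, and ownership when archiving.
mkdir -p src && echo 'hello' > src/file.txt   # stand-in for build output
SOURCE_DATE_EPOCH=1704067200                  # arbitrary fixed date (2024-01-01)
tar --sort=name \
    --mtime="@${SOURCE_DATE_EPOCH}" \
    --owner=0 --group=0 --numeric-owner \
    -cf out.tar src/
```

Rebuilding later from the same contents (even with different file mtimes) yields a byte-identical out.tar, which is exactly the property being measured.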

ASLR shouldn’t be an issue unless you intend to capture the entire memory state of the application. It’s an intermediate representation in memory, not an output of any given step of a build.

Annoying edge cases come up for things like internal object serialization to sort things like JSON keys in config files.

ASLR means that the pointers from malloc (which may come from mmap) are not predictable.

Sometimes programs have hash tables which use object identity as key (i.e. pointer).

ASLR can cause corresponding objects in different runs of the program to have different pointers, and be ordered differently in an identity hash table.

A program producing some output which depends on this is not necessarily a bug, but becomes a reproducibility issue.

E.g. a compiler might output some object in which a symbol table is ordered by a pointer hash. The difference in order doesn't change the meaning/validity of the object file, but it is seen as the build not having reproduced exactly.

That's just one example of nondeterminism in compilers though - in the end it's the responsibility of the compiler to provide options not to do that.
Not for external causes like ASLR and memory allocators; those things should have their respective options for that.
There is no guarantee that memory allocation is deterministic even without ASLR. If your program is supposed to be deterministic but its output depends on the memory addresses returned by the allocator then your program is buggy.
FreeBSD tripped over an issue recently where a C++ program (I think clang?) used a collection of pointers and output values in an order based on the pointers rather than the values they pointed to.

ASLR by itself shouldn't cause reproducibility issues, but it can certainly expose bugs.

It is sometimes just fine to have a hash table with pointers as keys. It is by design an unordered collection, so you do not care about the order, only about finding entries.

Then at some point you happen to need all the entries, you iterate, and you get a random order. Which is not necessarily a problem unless you want reproducible builds, which is just a new requirement, not exposing a latent bug.

Let’s say a compiler is doing something in a multi-threaded manner - isn’t it possible that ASLR would affect the ordering of certain events which could change the compiled output? Sure you could just set threads to 1 but there’s probably some more edge cases in there I haven’t thought of.
I think you'd need the compiler to guarantee the serialization order of such operations regardless of whether you used ASLR. Otherwise you're just hoping thread scheduling, core clocking, thread memory access, and many other things are the same between every system trying to do a reproducible build. Even setting threads to 1 may not solve that problem class if asynchronous functions/syscalls come into play.
Generally, yes: https://reproducible-builds.org/docs/timestamps/

Since the build is reproducible, it should not matter when it was built. If you want to trace a build back to its source, there are much better ways than a timestamp.

C compilers offer __DATE__ and __TIME__ macros, which expand to string constants that describe the date and time that the preprocessor was invoked. Any code using these would have different strings each time it was built, and would need to be modified. I can't think of a good reason for them to be used in an actual production program, but for whatever reason, they exist.
And that’s why GCC (among others) accepts SOURCE_DATE_EPOCH from the environment, and also has -Wdate-time. As for using __DATE__ or __TIME__ in code, I suspect that was more helpful in the age before ubiquitous source control and build IDs.
Source control only helps you if everything is committed. If you're, say, working on changes to the FreeBSD boot loader, you're probably not committing those changes every time you test something but it's very useful to know "this is the version I built ten minutes ago" vs "I just booted yesterday's version because I forgot to install the new code after I built it".
Versions built into the code are nice. I think the correct answer is to commit before the build proper starts (automatically, without changing your HEAD ref) and put that in there. Then you can check version control for the date information, but if someone else happens to add the same bytes to the same base commit, they also have the same version that you do. (Similarly, you can always make the date "XXXXXXXXXXXXXXXXXXXXXX" or something, and just replace the bytes with the actual date after the build as you deploy it.)

What I actually did at $LAST_JOB for dev tooling was to build in <commit sha> + <git diff | sha256> which is probably not amazingly reproducible, but at least you can ask "is the code I have right now what's running" which is all I needed.

Finally, there is probably enough flexibility in most build systems to pick between "reuse a cache artifact even if it has the wrong stamping metadata", "don't add any real information", and "spend an extra 45 cpu minutes on each build because I want $time baked into a module included by every other source file". I have successfully done all 3 with Bazel, for example.
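A sketch of that `<commit sha> + <sha256 of the diff>` stamp (the function name is made up here; the comment above doesn't show its actual tooling):

```shell
# Identify "the code I have right now": the commit plus a short hash of any
# uncommitted changes. Not a reproducibility mechanism, but it answers
# "is what's running the code I have checked out?"
build_stamp() {
  sha=$(git rev-parse --short HEAD)
  dirty=$(git diff | sha256sum | cut -c1-12)
  printf '%s+%s\n' "$sha" "$dirty"
}
```

A clean tree always yields the same suffix (the hash of the empty diff), so identical states produce identical stamps.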

> you're probably not committing those changes every time you test something

I’m not, but I really think I should be. As in, there should be a thing that saves the state of the tree every time I type `make`, without any thought on my part.

This is (assuming Git—or Mercurial, or another feature-equivalent VCS) not hard in theory: just take your tree’s current state and put it somewhere, like in a merge commit to refs/compiles/master if you’re on refs/heads/master, or in the reflog for a special “stash”-like “compiles” ref, or whatever you like.

The reason I’m not doing it already is that, as far as I can tell, Git makes it stupendously hard to take a dirty working tree and index, do some Git to them (as opposed to a second worktree using the same gitdir), then put things back exactly as they were. I mean, that’s what `git stash` is supposed to do, right?.. Except if you don’t have anything staged then (sometimes?..) after `git stash pop` everything goes staged; and if you’ve added new files with `git add -N` then `git stash` will either refuse to work, or succeed but in such a way that a later `git stash pop` will not mark these files staged (or that might be the behaviour for plain `git add` on new files?). Gods help you if you have dirty submodules, or a merge conflict you’ve fixed but forgot to actually commit.

My point is, this sounds like a problem somebody’s bound to have solved by now. Does anyone have any pointers? As things are now, I take a look at it every so often, then remember or rediscover the abovementioned awfulness and give up. (Similarly for making precommit hooks run against the correct tree state when not all changes are being committed.)
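One pointer: `git stash create` builds a stash commit without storing it anywhere and without touching the working tree, index, or HEAD, which sidesteps the `stash pop` problems described above. A sketch (the ref name is illustrative; untracked and `add -N` files are still not captured):

```shell
# Run before each build (e.g. from a make wrapper): records the dirty state
# under refs/compiles/<branch> without disturbing anything.
pre_build_snapshot() {
  branch=$(git symbolic-ref --short -q HEAD) || branch=detached
  snap=$(git stash create "pre-build snapshot") || return 0
  [ -n "$snap" ] || return 0      # clean tree: nothing to record
  git update-ref "refs/compiles/$branch" "$snap"
}
```

Inspect later with `git log refs/compiles/<branch>` or diff against it. Each run overwrites the ref; if you want history, enabling reflogs for all refs (`git config core.logAllRefUpdates always`) should keep the older snapshots reachable.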

An easy (ish) option here is to use autosquashing [1], which lets you create individual commits (saving your work - yay!) and then eventually clean em up into a single commit!

Eg

    git commit -am "Starting work on this important feature"

    # make some changes
    git add . && git commit --squash=HEAD -m "I made a change"

Then once you’re all done, you can do an autosquash interactive rebase (`git rebase -i --autosquash`) and combine them all into your original change commit.

You can also use `git reset --soft $BRANCH_OR_COMMITTISH` to go back to an earlier commit but leave all changes (except maybe new files? Sigh) staged.

You also might check out `git reflog` to find commits you might’ve orphaned.

[1] https://thoughtbot.com/blog/autosquashing-git-commits

lmm · 1 week ago:
> If you're, say, working on changes to the FreeBSD boot loader, you're probably not committing those changes every time you test something

Whyever not? Does the FreeBSD boot loader not have a VCS or something?

A subtlety that may be lost: FreeBSD uses CVS, and so there isn't a way to commit locally while you're working, like with a DVCS.
FreeBSD hasn't used CVS since 2008.
Huh! So, before I posted this, I went to go double check, and found https://wiki.freebsd.org/VersionControl. What I missed was the (now obvious) banner saying

> The sections below are currently a historical reference covering FreeBSD's migration from CVS to Subversion.

My apologies! At the end of the day, the point still stands in that SVN isn't a DVCS and so you wouldn't want to be committing unfinished code though, correct?

(I suspect I got FreeBSD mixed up with OpenBSD in my head here, embarrassing.)

jraph · 6 days ago:
You could still use git-svn, but yeah, as another commenter wrote, I don't think reproducible builds are that useful when debugging; it should be fine to have an actual timestamp in the binaries.
Well yes, but we've actually migrated to Git now. ;-)
Welp! Egg on my face twice!
It's in the FreeBSD src tree. But we usually commit code once it's working...
lmm · 5 days ago:
Huh. If I was confident enough in a change to consider it worth doing an actual boot to test I'd certainly want to have it committed, to be able to track and go back to it. Even the broken parts of history are valuable IME.
Which is fine, you don't need to use a reproducible build for local dev and can just use the real timestamp.
Nobody cares about reproducibility of local development builds so just limit your use of date/time to those and use a more appropriate build reference for release builds.
> I can't think of a good reason for them

I work on a product whose user interface in one place says something like “Copyright 2004-2025”. The second year there is generated from __DATE__, that way nobody has to do anything to keep it up to date.

I mean, you could do that, but it's sort of a lie. Something better would be using the date of the most recent commit, which would be both more accurate (as far as authorship goes) and actually deterministic.

Pipe something like this into your build system:

    git log -1 --pretty=format:%ad --date=format:%Y
fmbb · 1 week ago:
Toolchains for reproducible software likely let you set these values, or ensure they are 1970-01-01 00:00:00
Nix sets everything to the epoch, although I believe Debian's approach is to just use the date of the newest file in the dsc tarballs.
lamby · 6 days ago:
Debian's approach is actually to use the date specified in the top entry in the debian/changelog file. That's more transparent and resilient than any mtime.
Nix can also set it to things other than 0; I think my favorite is to set it by the time of the commit from which you're building.
Which is also used when the contents of a derivation will be included in a zip file. The Unix epoch is about a decade older than the zip epoch.
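Concretely (the value below is 1980-01-01T00:00:00Z, the earliest timestamp zip's DOS-style format can represent):

```shell
# When zip output is involved, clamp to the zip epoch rather than 0.
export SOURCE_DATE_EPOCH=315532800
date -u -d "@$SOURCE_DATE_EPOCH" '+%Y-%m-%dT%H:%M:%SZ'   # 1980-01-01T00:00:00Z
```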
lamby · 6 days ago:
Strangely enough, sometimes using the epoch can expose bugs in libraries (etc.) when running or building in a timezone west of Greenwich due to the negative time offset taking time "below" zero.
rtpg · 1 week ago:
It's super nice to have timestamps as a quick way to know what program you're looking at.

Sticking it into --version output is helpful to know if, for example, the Python binary you're looking at is actually the one you just built rather than something shadowing that

The whole point of reproducible builds is that you don't need to rely on timestamps and similar information to know which binary you're looking at.
> Do they forge them?

Yes. All archive entries, date/time source-code macros, and any other timestamps are set to a standardized date (in the past).

lamby · 6 days ago:
This is not quite right. At least in Debian, only files that are newer than some standardised date are set to that standardised date. This "clamping" preserves the metadata of older files.
Maybe a dumb question, but why would this change the reproducibility? If you clone a git repo, do you not get the metadata as it is stored in git? Or would the files have the modification date of the cloning?

I never actually checked that.

You clone the source from git, but then you use it to build some artifacts. The artifacts' build times may differ, yet with reproducible builds the artifacts should match.
Right, but if you only clone and build, why would the files modification date be different compared to the version that was committed to git? Does just cloning a repo already lead to different file modification dates in my local copy?
  • hoten
  • ·
  • 1 week ago
  • ·
  • [ - ]
Git does not store or restore file modification times.
And the reason for that, in turn, is that if you are on one commit and check out an older commit, restoring file modification times to what they were at the time of the older commit would cause build tools that look at file modification times to sometimes not pick up on all the changes.
Ah ok, that explains it.
Those aren't needed to generate a hash of a file. And that metadata isn't part of the file itself (or at least doesn't need to be); it's part of the filesystem or OS.
That's an acceptable answer for the simple case when you distribute just a file, but what if your distribution is something more complex, like an archive with some sub-archives? Metadata in the internal files will affect the checksum of the resulting archive.
Finding and fixing cases like this is part of what the project has done...
  • exe34
  • ·
  • 1 week ago
  • ·
  • [ - ]
unless you fix them to a known epoch.
  • ·
  • 1 week ago
  • ·
  • [ - ]
  • c0l0
  • ·
  • 1 week ago
  • ·
  • [ - ]
Yes.
> ... what about files metadata like creation/modification timestamps? Do they forge them?

The least difficult part of reproducible builds to solve, but yes.

The real question is: why, in the past, was an entire ecosystem created where non-determinism was the norm and everybody thought it was somehow ok?

Instead of asking "how does one achieve reproducibility?" we may wonder "why did people go out of their way to make sure something as simple as a timestamp would screw up determinism?".

For that's the anti-security mindset we have to fight. And Debian did.

TBH security is sometimes the source of such issues, as it often involves adding randomness. For example, replacing deterministic hashes with keyed hashes to protect against hash-flooding DoS led to deterministic output becoming nondeterministic (e.g. when displaying a hash table in its natural order).

Sorting had to be added to that kind of output.

You’re forgetting that source control used to not be a mainstream practice…

Software was more artisanal in nature…

It's my understanding that this is about generating the .iso file from the .deb files, not about generating the .deb files from source. Generating .deb from source in a reproducible way is still a work in progress.
Is the build infrastructure for Debian also reproducible? It seems like we if someone wants to inject malware in Debian package binaries (without injecting them into the source), they have to target the build infrastructure (compilers, linkers and whatever wrapper code is written around them).

Also, is someone else also compiling these images, so we have evidence that the Debian compiling servers were not compromised?

  • jzb
  • ·
  • 1 week ago
  • ·
  • [ - ]
There's a page that includes reproducibility results for Debian here: https://tests.reproducible-builds.org/debian/bookworm/index_...

I think there's also a similar thing for the images, but I might be wrong and I definitely don't have the link handy at the moment.

There's lots of documentation about all of the things on Debian's site at the links in the brief. And LWN also had a story last year about Holger Levsen's talk on the topic from DebConf: https://lwn.net/Articles/985739/

The whole point of reproducible builds is to ensure security even if buildbots are compromised.
And what about the hardware on which the build runs? Is it reproducible? ;)
Working on it! But in general the answer is that for most purposes it's good enough to show that many independently produced pieces of hardware can reproduce the same results.
You are joking. But solving this problem is probably amongst the most important we can have in the information age we live in.

Every country in the world should have the capability of producing "good enough" hardware.

And who trusting trusted the original RepRap?
The 50th generation builds a robot that murders you
> And what about the hardware on which the build runs? Is it reproducible? ;)

"Fully Countering Trusting Trust through Diverse Double-Compiling (DDC) - Countering Trojan Horse attacks on Compilers"

https://dwheeler.com/trusting-trust/

If the build is reproducible inside VMs, then the build can be done on different architectures: say x86 and ARM. If we end up with the same live image, then we're talking something entirely different altogether: either both x86 and ARM are backdoored the same way or the attack is software. Or there's no backdoor (which is a possibility we have to fancy too).

well little johnny, when one hardware loves another hardware very much...
  • ·
  • 1 week ago
  • ·
  • [ - ]
A la xz.

You must ultimately root trust in some set of binaries and any hardware that you use.

For user space? No, you can definitely do a stage 0 build which depends only on about 364 bytes of x86_64 binary (though ironically I haven't managed to get this to work for me yet).

The liability is EFI underneath that, and the Intel ring -1 stuff (which we should be mandating is open source).

> which depends only on about 364 bytes of x86_64 binary
that's the point at which you say (reasonably accurately) that the 364 byte thing is written in machine code. it is small enough to manually translate between the binary and asm
What is the significance of a reproducible build, and how is it different than a normal distribution?
Reproducible: If Alice and Bob both download and compile the same source code, Alice's binary is byte-for-byte identical to Bob's binary.

Normal: Before Debian's initiative to handle this problem, most people didn't think hard about all the ways system-specific differences might wind up in binaries. For example: __DATE__ and __TIME__ macros in C, parallel builds finishing in different order, anything that produces a tar file (or zip etc.) usually by default asks the OS for the input files' modification time and puts that into the bytes of the tar file, filesystems may list files in a directory in different order and this may also get preserved in tar/zip files or other places...

Why it's important: With reproducible builds, anyone can check the official binaries of Debian match the source code. This means going forward, any bad actors who want to sneak backdoors or other malware into Debian will have to find a way to put it in the source code, where it will be easier for people to spot.
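The tar point above can be made concrete. This is a hypothetical helper (not any distro's actual code) using Python's tarfile: sorting the inputs and pinning the metadata is what makes the output byte-for-byte stable regardless of filesystem ordering or file mtimes:

```python
import io
import tarfile

def deterministic_tar(files: dict[str, bytes]) -> bytes:
    """Pack files into a tar with sorted ordering and normalized
    metadata, so identical inputs always yield identical bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):           # fixed order, not filesystem order
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = 0                   # clamp the timestamp
            info.uid = info.gid = 0          # drop builder-specific ownership
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()

# Same inputs in a different order produce identical archives.
a = deterministic_tar({"b.txt": b"bb", "a.txt": b"aa"})
b = deterministic_tar({"a.txt": b"aa", "b.txt": b"bb"})
assert a == b
```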

The important property that anyone can verify the untainted relationship between the binary and the source (provided we do the same for both toolchains, not relying on a blessed binary at any point) is useful if people do actually verify outside the Debian sphere.

I hope they promote tools to enable easy verification on systems external to debian build machines.

As the 'xz' backdoor was in the source code, and remained there for a while before anyone spotted it, this doesn't necessarily guarantee that backdoors/malware won't make their way into the source of a very-widely-redistributed project.
Source code availability doesn't mean that backdoors wont be put in place, it just makes it relatively easier to spot and remove them. Reproducible builds mean that the people who look for backdoors, malware, etc can focus on the source code instead of the binaries.
Certainly true. But removing some attack vectors still helps security and trustworthiness. These are not all or nothing questions.
  • jeltz
  • ·
  • 6 days ago
  • ·
  • [ - ]
Only part of the backdoor was in the source code. It was split like that between the tarball and the code to hide it better. But, yes, with reproducible builds they could have put all of it in the source.
  • floxy
  • ·
  • 6 days ago
  • ·
  • [ - ]
> __DATE__ and __TIME__ macros in C

So how do those work in these Debian reproducible builds? Do they outlaw those directives? Or do they set those based on something other than the current date and time? Or something else?

The toolchain (eg. compiler) reads the time from an environment variable if present, instead of the actual time. https://reproducible-builds.org/docs/source-date-epoch/
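The convention from that spec is simple enough to sketch (illustrative only; the function name is made up): use SOURCE_DATE_EPOCH when set, clamp any newer real timestamp to it, and fall back to the real time otherwise:

```python
import os
import time

def build_timestamp(real_mtime=None):
    """Timestamp a build tool should embed per the SOURCE_DATE_EPOCH
    convention: the env var wins when set (clamping newer real
    timestamps down to it); otherwise use the real time."""
    sde = os.environ.get("SOURCE_DATE_EPOCH")
    if sde is None:
        return int(real_mtime if real_mtime is not None else time.time())
    epoch = int(sde)
    return epoch if real_mtime is None else min(int(real_mtime), epoch)

os.environ["SOURCE_DATE_EPOCH"] = "1000000000"
print(build_timestamp())            # 1000000000
print(build_timestamp(2000000000))  # newer timestamp clamped: 1000000000
print(build_timestamp(5))           # older timestamps survive: 5
```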
Thank you for that fantastic explanation.
Open source means "you can see the code for what you run". Except... how do you know that your executables were actually built from that code? You either trust your distro, or you build it yourself, which can be a hassle.

Now that the build is reproducible, you don't need to trust your distro alone. It's always exactly the same binary, which means it'll have one correct sha256sum. You can have 10 other trusted entities build the same binary with the same code and publish a signature of that sha256sum, confirming they got the same thing. You can check all ten of those. The likelihood that 10 different entities are colluding to lie to you is a lot lower than just your distro lying to you.
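The verification step itself is trivial once builds are bit-identical; a sketch of comparing a local build against several independently published checksums (all values here are made up for illustration):

```python
import hashlib

def count_agreeing(artifact: bytes, published_hashes: list[str]) -> int:
    """Hash the local artifact and count how many independent
    publishers report the same sha256."""
    local = hashlib.sha256(artifact).hexdigest()
    return sum(1 for h in published_hashes if h == local)

image = b"pretend this is a debian-live .iso"
good = hashlib.sha256(image).hexdigest()
# Two honest verifiers and one tampered-with mirror:
print(count_agreeing(image, [good, good, "0" * 64]))  # 2
```

The more independent parties publish matching hashes, the less plausible collusion becomes.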

Reproducible builds actually solve a lot of problems. (Whether these are real problems, who really knows, but people spend a lot of money to solve them.)

At my last job, some team spent forever making our software build in a special federal government build cluster for federal government customers. (Apparently a requirement for everything now? I didn't go to those meetings.) They couldn't just pull our Docker images from Docker Hub; the container had to be assembled on their infrastructure. Meanwhile, our builds were reproducible and required no external dependencies other than Bazel, so you could git checkout our release branch, "bazel build //oci" and verify that the sha256 of the containers is identical to what's on Docker Hub. No special infrastructure necessary. It even works across architectures and platforms, so while our CI machines were linux / x86_64, you can build on your darwin / aarch64 laptop and get the exact same bytes, every time.

In a world where everything is reproducible, you don't need special computers to do secure builds. You can just build on a bunch of normal computers and verify that they all generate the same bytes. That's neat!

(I'll also note that the government's requirements made no sense. The way the build ended up working was that our CI system built the binaries, and then the binaries were sent to the special cluster, and there a special Dockerfile assembled the binaries into the image that the customers would use. As far as I can tell, this offers no guarantee that the code we said was in the image was in the image, but it checked their checkbox. I don't see that stuff getting any better over the next 4 years, so...)

It means you can build it yourself, and know the source code you have, is all there is.

It validates that publicly available downloads aren't different from what is claimed.

It's a link in a chain that allows you to trust programs you run.

- At the start of the chain, developers write software they claim is secure. But very few people trust the word of just one developer.

- Over time other developers look at the code and also pronounce it secure. Once enough independent developers from different countries and backgrounds do this, people start to believe it really is secure. As a measure of security this isn't perfect, but it is verifiable and measurable in the sense that more is always better, so if you set the bar very high you can be very confident.

- Somebody takes that code, goes through a complex process to produce a binary, releases it, and pronounces it is secure because it is only based on code that you trust, because of the process above. You should not believe this. That somebody could have introduced malicious code and you would never know.

- Therefore, before reproducible builds, your only way to get a binary you knew was built from code you had some level of trust in was to build it yourself. But most people can't do that, so they have to trust Debian, Google, Apple, Microsoft or whoever that no backdoors have been added. Maybe people do place their faith in those companies, but it is misplaced. It's misplaced because countries like Australia have laws that allow them to compel such companies to silently introduce malicious code and distribute it to you. Australia's law is called the "Assistance and Access Bill (2018)". Countries don't introduce such laws for no reason. It's almost certain it is being used now.

- But now the build can be reproducible. That means many developers can obtain the same trusted source code from the source the original builder claimed he used, build the binary themselves, verify it is identical to the original, and so publicly validate the claim. Once enough independent developers from different countries and backgrounds do this, people start to believe it really was built from the trusted sources.

- Ergo reproducible builds allow everyone, as opposed to just software developers, to run binaries they can be very confident was built just from code that has some measurable and verifiable level of trustworthiness.

It's a remarkable achievement for other reasons too. Although the ideas behind reproducible builds are very simple, it turned out executing them was about as simple as other straightforward ideas like "let's put a man on the moon". It seems building something as complex as an entire OS reproducibly was beyond any company, or capitalism/socialism/communism, or a country. It's the product of something we've only seen arise in the last 40 years, open source, and it's been built by a bunch of idealistic volunteers who weren't paid to do it. To wit: it wasn't done by commercial organisations like RedHat or Ubuntu. It was done by Debian. That said, other similar efforts have since arisen, like F-Droid, but they aren't on this scale.

Nice, these live images could become the foundation for a Debian-based "immutable OS" workflow.
That is the goal of Vanilla OS! https://vanillaos.org/
Do these live images come ready with cloud-init? A cloud-init in-memory live iso seems perfect for immutable infrastructure "anywhere"
Should be trivial to put in, if not. Install the package and maybe prepare some datasource hints while reproducing the image. Depends on where you'll be using it.

The trick will be in the details, as usual. User data that both does useful work... and plays nicely with immutability.

I suspect it would be more sensible to skip the gymnastics of trying to manicure something inherently resistant, and instead, lean in on reproducibility. Make it as you want it, skip the extra work.

Want another? Great - they're freely reproducible :)

I’m a noob to this subject. How can a build be non-reproducible? By that, I mean, what part of the build process could return non-deterministic output? Are people putting timestamps into the build and stuff like that?
File paths, timestamps, unstable ordering of inputs/outputs, locales, version info, variations in the build environment, etc.

This page has a good write-up:

https://reproducible-builds.org/docs/

Timestamps, timestamps, absolute paths (i.e., differences between building /src versus /home/Cort3z/source), timestamps, file inode numbering ("for file in directory" defaults to inode order rather than alphabetical order in many languages, and that means it's effectively pseudorandom), more timestamps, using random data in your build process (e.g., embedding a generated private key, or signing something), timestamps, and accidental nondeterminism within the compiler.

By far the most prevalent source of nondeterminism is timestamps, especially since timestamps crop up in file formats you don't expect (e.g., running gzip stuffs a timestamp in its output for who knows what reason). After that, it's the two big filesystem issues (absolute paths and directory iteration nondeterminism), and then it's basically a long tail of individual issues that affect but one or two packages.
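The gzip case is easy to demonstrate: the format has an MTIME field in its header, and Python's gzip module lets you pin it (mtime=0 is the equivalent of `gzip -n`), which is what makes the output deterministic:

```python
import gzip
import io

def gz(data: bytes, mtime: int) -> bytes:
    """Compress data with an explicit gzip header timestamp."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as f:
        f.write(data)
    return buf.getvalue()

# Pinned timestamp: identical output every time.
assert gz(b"hello", 0) == gz(b"hello", 0)
# Different header timestamps: same payload, different bytes.
assert gz(b"hello", 1) != gz(b"hello", 2)
```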

This is amazing news. Well done!
Does anyone have any information as to how they modified their C code such that the compiler output was deterministic? I thought one of the hardest problems with an effort like this was writing your C such that the compiler would output everything in the same order (same bytes)? And I am not just talking about timestamps etc.
This is something compilers themselves need to guarantee, e.g. GCC has -frandom-seed [0].

[0] https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html#in...

The update is gold. Original message: "They are reproduceable"; updated message: "lol actually not".
Not really, it's just someone with a higher goal of "freedom" (no binary firmware blobs) using it to push their agenda.

I'll happily agree higher degrees of "freedom" are an admirable goal, but this is just rudely shitting on a hard-earned achievement.

How does that work with timestamps?
[dead]
Pretty wild that we’re finally nailing reproducibility in Linux images after so many years—clearly a win for stability and consistency across the board.
[flagged]
Can someone please ELI5? When I hear live images, I think of iOS videos that go along with pictures you take
A live image is an operating system image which you can boot from and use, vs. an install disk which can only install (there's no usable environment available).

A reproducible build means you can get the same source code and compile it, and it will be identical to the published image. This is important because otherwise you don't know if the published image actually used some other source code. If it used some other source code, the published image might have a backdoor, or something else that you can't find by reading the source code.

Is that the idea behind Tails OS? It runs from removable media and disappears when ejected?
Yes. Though the disappearing doesn't happen when you eject the removable media. When you first boot from the removable media, the OS loads itself into the RAM. If you want to open additional programs, then those are loaded from the media into RAM and then executed. However, you can remove the media at any point after boot, and after that you only run the programs that are already loaded into RAM.

Also we have had live images of various OSes for many decades. I seem to recall that we used to load DOS from floppy disks.

  • Vaslo
  • ·
  • 6 days ago
  • ·
  • [ - ]
This question should be at the top. I know HN tries to stay agnostic in its reporting of news, but they definitely fall on the wrong side of feature vs. benefit (as do most open-source authors), and plenty of folks will just pass up this article completely ignorant of the benefit.
  • mjg59
  • ·
  • 1 week ago
  • ·
  • [ - ]
Live images are Linux distributions that can be run directly from removable media instead of having to be installed to local storage.
And iOS live images are half second movies, not images.
  • c0l0
  • ·
  • 1 week ago
  • ·
  • [ - ]
I never really understood the hype around reproducible builds. It seems to mostly be a vehicle to enable tivoization[0] while keeping users sufficiently calm. With reproducible builds, a vendor can prove to users that they did build $binary from $someopensourceproject, and then digitally sign the result so that it - and only it - would load and execute on the vendor-provided and/or vendor-controlled platform. But that still kills effective software freedom as long as I, the user, cannot do the same thing with my own build (whether it is unmodified or not) of $someopensourceproject.

Therefore, I side with Tavis Ormandy on this debate: https://web.archive.org/web/20210616083816/https://blog.cmpx...

[0]: https://en.wikipedia.org/wiki/Tivoization

Let's turn this around. Why would you ever want non-reproducible builds?

Every bit of nondeterminism in your binaries, even if it's just memory layout alone, might alter the behavior, i.e. break things on some builds, which is just really not desirable.

Why would you ever want builds from the same source to have potentially different performance, different output size or otherwise different behavior?

IMO tivoization is completely unrelated, because the vendor most certainly does not need reproducible builds in order to lock down a platform.

> Let's turn this around. Why would you ever want non-reproducible builds?

It's not about wanting non-reproducible builds, but about what I am sacrificing to achieve reproducible builds. Debian's reproducible-build efforts have been going for ten years, and they're still not complete. Arguably Debian could have diverted ten years of engineering resources elsewhere. There's no end to the list of worthwhile projects to tackle, and clearly Debian believes that reproducible builds are high priority, but reasonable people can disagree on that.

This not to say reproducible builds are not worth doing, just that depending on your project / org lifecycle and available resources (plus a lot of subjective judgement), you may want to do something else first.

Debian didn't "divert engineering resources" to this project. People, some of whom happen to be Debian developers, decided to work on it for their own reasons. If the Reproducible Builds effort didn't exist, it doesn't mean they would have spent more time working on other areas of Debian. Maybe even less, because the RB effort was an opportunity to find and fix other bugs.
Yes, the system is not closed and certainly people may simply not contribute to Debian at all. However, my main point is that reasonable people disagree on the relative importance of reproducible builds among other things, so it's not about "want[ing] non-reproducible builds" even if one has unlimited resources, but rather wanting reproducible builds, just not at the expense of X, where X differs from person to person.
"It's possible to disagree on whether a feature is worth doing" is technically true, but why is it worth discussing time spent by volunteers on something already done? People do all sorts of things in their free time; what's the opportunity cost there?
For me as a developer, reproducible builds are a boon during debugging because I can be sure that I have reproduced the build environment corresponding to an artifact (which is not trivial, particularly for more complex things like whole OS image builds which are common in the embedded world, for example) in the real world precisely when I need to troubleshoot something.

Then I can be sure that I only make the changes I intend to do when building upon this state (instead of, for example, "fixing" something by accident because the link order of something changed which changed the memory layout which hides a bug).

So what you are looking for is a reproducible build environment? Things like Docker have been around doing just that for a while now.
> things like docker have been around doing just that for a while now.

That's just not enough. If you are hunting down tricky bugs, then even extremely minor things like the memory layout of your application might alter the behavior completely -- some uninitialized read might give you "0" every time in one build, while crashing everything with unexpected non-zero values in another; performance characteristics might change wildly and even trigger (or avoid) race conditions in builds from the exact same source thanks to cache interactions, etc.

There is a lot of developer preference in what an "ideal" process/toolchain/build environment looks like, but reproducible builds (unlike a lot of things that come down to preference) are an objective, qualitative improvement -- in the exact same way that it is an improvement if every release of your software corresponds to one exact set of source code.

And he said embedded.

That means it crashes on some device that is on a pole in the middle of nowhere, or in a factory where you have to wear armor to go debug it on site.

Docker is cushy ... for servers and developer machines.

Docker can be used to create reproducible environments (container images), but can not be used to reproduce environments from source (running a Dockerfile will always produce a different output) - that is, the build definition and build artifact are not equivalent, which is not the case for tools like Nix.
I see reproducible builds more as a contract between the originator of an artifact and yourself today (the two might be the same person at different points in time!) saying "if you follow this process, you'll get a bit-identical artifact to what I have gotten when I followed this process originally".

If that process involves Docker or Nix or whatever - that's fine. The point is that there is some robust way of transforming the source code to the artifact reproducibly. (The less moving parts are involved in this process though the better, just as a matter of practicality. Locking up the original build machine in a bank vault and having to use it to reproduce the binary is a bit inconvenient.)

The point here is that there is a way for me to get to a "known good" starting point and that I can be 100% confident that it is good. Having a bit-reproducible process is the no-further-doubts-possible way of achieving that.

Sure it is possible that I still get an artifact that is equivalent in all the ways that I care about if I run the build in the exact same Docker container even if the binaries don't match (because for example some build step embeds a timestamp somewhere). But at that point I'll have to start investigating if the cause of the difference is innocuous or if there are problems.

Equivalence can only happen in one way, but there's an infinite number of ways to get inequivalence.

  • pavon
  • ·
  • 1 week ago
  • ·
  • [ - ]
Tavis makes some good arguments, but since that post I've seen a couple real-world situations where reproducible builds are valuable.

One is where the upstream software developer wants to build and sign their software so that users know it came from them, but distributors also want to be the ones to build and sign the software so they know what exactly it is they are distributing. The most public example is FDroid[1]. Reproducible builds allow both the software developer and the distributor to sign off on a single binary, giving users additional assurance that neither is sneaking something in. This is similar to the last example that Tavis gave, but shows that it is a workable process that provides a real security benefit to the user, not just a hypothetical stretch.

The second is license enforcement. Companies that distribute (A/L)GPL software are required to distribute the exact source code that the binary was created from, and (for GPLv3) the ability to compile and replace the software with a modified version. However, a lot of companies are lazy about this and publish source code that doesn't include all their changes. A reproducible build demonstrates that the source they provided is what was used to create the binary. Of course, the lazy ones aren't going to go out of their way to create reproducible builds, but the more reproducible the upstream build system is, the fewer extraneous differences downstream builds should have. And it allows greater confidence in the good guys who are following the license.

And like others have said, I don't see the Tivoization argument at all. TiVo didn't have reproducible builds, and they Tivo'd their software just fine. At worst a reproducible build might pacify some security minded folks that would otherwise object to Tivoization, but there will still be people who object to it out of the desire to modify the system.

[1] https://f-droid.org/docs/Reproducible_Builds/

You can still slip malware into a reproducible build, but you have to do it in the open. If you do it by injecting a tampered-with artifact via some side channel specific to your target, they will end up with a hash that doesn't agree with the one trusted by the rest of the community, and will have reason for suspicion.

That benefit goes away if the rest of the community all have hashes that don't agree with each other. Then the tampered-with one doesn't stand out.

It basically means that not everybody needs to build from source code if they want to verify that the binaries they're using haven't had malware injected during the build process. I.e. so long as enough people check that they can reproduce the build, and call out any case where it doesn't match, everyone else can just use the binaries without building from source. This means auditing efforts can focus just on the source code, which is a lot more tractable (but still hard, and imperfect. But it means a potential attacker needs to work a lot harder, as opposed to a compromise of the build servers basically giving them free rein without much risk of detection).

It doesn't really do anything at all for tivoisation; Tivo managed it just fine without reproducible builds.

There is merit to some of the security arguments. However, one thing reproducible builds enable is to reliably identify the source code version from which a particular build was produced. If a build artifact is found to have undesirable behavior (whether malicious or just a genuine bug or misdesign), reproducible builds allow to reliably trace that behavior back to the source code, and then to only modify the undesired behavior. If, on the other hand, you can’t identify the corresponding source code version with certainty, and therefore have to fix the behavior based on a possibly different version of the source code (or of the build environment), then you don’t know that it doesn’t additionally contain any new undesired behaviors.
  • klysm
  • ·
  • 1 week ago
  • ·
  • [ - ]
One of the big advantages from my perspective is you can cache a lot more effectively throughout the build process when things are deterministic.
  • c0l0
  • ·
  • 1 week ago
  • ·
  • [ - ]
To achieve that it is enough to hash inputs and cache the resulting outputs. Repeating a build from scratch with an empty cache would not necessarily have to yield the same hashes all the way down to the last artifact, but that's actually a simplification of the whole process, and not a bad thing per se.
  • klysm
  • ·
  • 1 week ago
  • ·
  • [ - ]
Outputs are used as inputs later. If everything is deterministic, you can actually cache everything by hash
> To achieve that it is enough to hash inputs, and cache resulting outputs.

Thing is, inputs can be nondeterministic too - some programs (used to) embed the current git commit hash into the final binary so that a `./foo --version` gives a quick and easy way for bug triage to check if the user isn't using a version from years ago.

Adding the Git hash is reproducible, assuming you build from a clean tree (which the build script can check). Embedding the current date and time is the canonical cause of non-reproducibility, but that can be worked around in most cases by embedding the commit and/or author date of the commit instead.
This is only a problem if those nondeterministic inputs are actually included in the hash. This is often not the case, because the values are included implicitly in the build rather than explicitly.

(Just playing devil’s advocate here.)

Tivoisation doesn't depend on reproducible builds at all. Vendors don't need to mathematically prove the exact origin of their binaries.
Reproducible builds are also important for:

- caching artefacts
- ensuring there's no malware somewhere that's been added in the build process
> ensuring there's no malware somewhere that's been added in the build process

i.e. supply-chain safety

It doesn't entirely resolve Thompson's "Trusting Trust" problem, but it goes a long way.

Is it possible for mortals to rebuild gcc from scratch? Can I start with some minimal, auditable compiler (tcc?) and build up to a modern gcc? Or would it be some byzantine path where I need to compile gcc v1998, then perl, then Python 1.8, enabling you to compile gcc v2005, which lets you build Python2.3, etc.
It is a byzantine path, also because gcc switched to C++ at some point (for no good reason IMHO). But there is a project that maintains such a bootstrap path: https://www.gnu.org/software/mes/
  • tetha
  • ·
  • 1 week ago
  • ·
  • [ - ]
Mh. Though, if you have deterministic builds for GCC, imagine how much of a problem some nerd in Northern Washington or Scandinavia with their own strange C build chain would be for anyone trying to inject something strange into these compilers during the build process.

Like, you spend millions to get that one backdoor into the compiler. And then this guy is like "Uhm. Guys. I have this largely perl-based build process reproducing a modern GCC on a Pentium with 166 MHz swapping RAM to disk because the motherboard can't hold that much memory. But the desk fan helps with cooling. It takes about 2 months or 3 to build, but that's fine. I start it and then I work in the woods. It was identical to your releases about 5 times in the last 2 years (can't build more often), and now it isn't; the difference is somewhere deep in the code sections. My arduino based floppy emulator is currently moving the binaries through the network"

Sure, it's a cyberpunk hero-fantasy, but deterministic builds would make this kind of shenanigans possible.

And at the end of the day, independent validation is one of the strongest ways to fight corruption.

It is sort of like that. It's been documented: https://github.com/fosslinux/live-bootstrap/

(This is an alternative to the Guix/Scheme thing).

  • ·
  • 1 week ago
  • ·
  • [ - ]
Auditors can take a copy of the source, reproducibly build it themselves, and thus prove that the binaries someone would like to run match the provided source code.
> With reproducible builds, a vendor can prove to users that they did build $binary from $someopensourceproject, and then digitally sign the result so that it - and only it - would load and execute on the vendor-provided and/or vendor-controlled platform.

That holds as long as Debian provides source packages and access to their repos. A digital signature has nothing to do with reproducible builds; you don't actually need one to get the same bytes.

> This diagram demonstrates how to get a trusted binary without reproducible builds.

Ages ago our device firmware release processes caught the early stage of a malware infection because the hash of one of our intermediate code generators (win32 exe) changed between two adjacent releases without any commits that should've impacted that tool.

Turns out they had hooked something into windows to monitor for exe accesses and were accidentally patching out codegen.

Eventually you just stop trusting anything and live in the woods, I guess.
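The check that caught this can be as simple as a hash manifest of the build tools, diffed between releases. A minimal sketch (file layout and names hypothetical):

```python
import hashlib
import pathlib

def manifest(tool_dir) -> dict[str, str]:
    # Hash every build tool; store the manifest alongside each release.
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(pathlib.Path(tool_dir).iterdir())
        if p.is_file()
    }

def changed_tools(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    # A tool whose hash moved with no commit touching it is exactly the
    # kind of anomaly that deserves investigation.
    return [name for name, digest in current.items()
            if previous.get(name) != digest]
```

Comparing manifests of adjacent releases flags any tool whose bytes changed, regardless of why.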

It is not that different from tamper-proofing medications. It proves that no one added poison to whatever you are consuming, after that thing left its "factory".