Thanks for sharing a fun read.

Bitkeeper was neat, and my overall take on it mirrors Larry McVoy's: I wish he had open sourced it, made his nut running something just like github but for Bitkeeper, and that it had survived.

I only had one interaction with him. In the early '00s, I had contributed a minor amount of code to TortoiseCVS. (Stuff like improving the installer and adding a way to call a tool that could provide a reasonable display for diffs of `.doc` and `.rtf` files.) I had a new, very niche, piece of hardware that I was excited about and wanted to add support for in the Linux kernel. Having read the terms of his license agreement for Bitkeeper, and intending to maintain my patches for TortoiseCVS, I sent him an email asking if it was OK for me to use Bitkeeper anyway. He told me that it did not look like I was in the business of version control software (I wasn't!) and said to go ahead, but let him know if that changed.

I use git all the time now, because thankfully, it's good enough that I shouldn't spend any of my "innovation tokens" in this domain. But I'd still rather have bitkeeper or mercurial or fossil. I just can't justify the hit that being different would impose on collaboration.

Like I tell lots of people, check out Jujutsu. It's a very Mercurial-inspired-but-better-than-it UI (the lead dev and I worked on Mercurial together for many years) with Git as one of the main supported backends. I've been using it full time for almost a year now.
I would love to use jujutsu, and it seems like a great model. I think it'd be a bad outcome if the world starts building on top of a piece of software with a single company owner and a CLA, though.

I hope that the CLA goes away one day.

Note that the CLA does not transfer copyright, so "single company owner" is not accurate from a copyright perspective.
It's accurate from the perspective of "there's a single company with the right to change the licensing arbitrarily".
No, it is not accurate. That is not what Google's CLA says. (Though there are other CLAs out there that are closer to what you describe)

(*Update:* Though IANAL, you should read the child comment and the CLA itself and make up your own mind. https://cla.developers.google.com/about/google-individual. The rest of my comment is mostly independent of the previous paragraph).

OTOH, IANAL, but AFAIK anyone can fork `jj` and sell a proprietary product based on jj (and distribute it under pretty much whatever license they like, with very few restrictions) because it is currently Apache licensed, but that is unrelated to the Google CLA.

Let me conjecture even more wildly about things I don't know. The following is a guess on my part.

One way to interpret this is that Google tends to publish their projects under Apache, and there is no need to demand that people transfer copyright to Google. By releasing your work under Apache, you are already giving Google (or anyone else) all the rights it needs.

AFAIK, the main purpose of the Google Individual CLA is to have you sign a statement claiming that you own the rights to your own work and didn't give up those rights to your employer.

> Grant of Copyright License. Subject to the terms and conditions of this Agreement, You hereby grant to Google and to recipients of software distributed by Google a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense, and distribute Your Contributions and such derivative works.

That is a substantially more permissive license than Apache-2.0 (let alone other licenses, which apply to works to which Google also applies that CLA). That term means that Google can ignore the terms of Apache-2.0 and instead use the work under this much more permissive license, while everyone else is bound by Apache-2.0. In other words, they can do whatever they want with the code. Others could ship it in a proprietary product, sure, but they can't ignore the terms of the license while doing so.

"Permissive license" doesn't mean "do whatever you want". Apache-2.0, among other things, requires maintaining license notices,

(Note that the "and to recipients of" clause doesn't imply others can ignore the license terms, because they'd still be subject to the terms of whatever license Google puts on the software they distribute, whether that's Apache-2.0 or some proprietary license.)

So I maintain that "there's a single company with the right to change the licensing arbitrarily" is a largely accurate summary/gloss. Or, if you prefer, "there's a single company with the right to ignore the license terms".

This is a good point, you can indeed argue that "there's a single company with the right to ignore the license terms" is correct. Thank you for elaborating, I added a note to my comment.

I'm still not sure whether it really matters in light of the Apache license, but I don't feel qualified to argue about that.

I guess the straw-man I was arguing against was that some people think you transfer your copyright to Google (you don't), but that's different from what you claimed.

Thank you, I appreciate your followup and edit. Copyright assignment agreements are worse than CLAs, but I'm not claiming that the Google CLA includes a copyright assignment.

It matters less for something like the Apache license than it does for a copyleft license, but there are still reasons people use Apache rather than MIT or public domain, and it does include several protections people care about.

Re your edit:

> AFAIK, the main purpose of the Google Individual CLA is to have you sign a statement claiming that you own the rights to your own work and didn't give up those rights to your employer.

The Developer Certificate of Origin (DCO, what you're signing if you use a "Signed-off-by: Your Name <email@example.org>" line) serves this same purpose, isn't a CLA, and doesn't cause any of the same problems. Legal departments generally don't have concerns with developers signing a DCO, while many will rightfully prevent or restrict signing a CLA (even when they were otherwise fine with a developer contributing to Open Source in general).
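
For readers who haven't seen it in practice, the sign-off is just a trailer on the commit message, and git will add it for you. A minimal sketch, with the identity and commit message as placeholders:

    # The identity below is what ends up in the trailer (placeholder values).
    git config user.name "Your Name"
    git config user.email "email@example.org"

    # -s / --signoff appends "Signed-off-by: Your Name <email@example.org>"
    # to the commit message, which is how you assert the DCO for that commit.
    git commit -s -m "Fix the thing"

    # Confirm the trailer is present in the recorded message.
    git log -1 --format=%B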

I was a heavy user of BitKeeper.

To me, Git is almost exactly like a ground-up cleaner rewrite of BitKeeper. Gitk and git-gui are essentially clones of the BitKeeper GUI.

I don't understand why you'd want to keep using BitKeeper.

I think my memory is probably colored by BitKeeper being my first DVCS. I was never a heavy user of it.

I was exposed to BitKeeper when I was managing my team's CVS server. On my next team, we moved to svn, which always felt like cvs with better porcelain from a developer perspective, but when administering that server fell onto my plate, I liked it a lot better than CVS. And I thought BitKeeper would be nicer from a developer perspective.

Then on my next team, we used mercurial. I really, really, really liked mercurial, both as a developer and as a dev infrastructure administrator. It also sucked a lot less on Windows than git or BitKeeper.

The last time I had to decide for a new team, mercurial and git were the obvious options. I went with git because that was clearly what the world liked best, and because bringing new team members up to speed would require less from me that way.

All that goes to say... my direct comparison of git and bitkeeper came from when bitkeeper was mature and git decidedly was not. Then I lumped it in with mercurial (which I really would still prefer, right now) and fossil (ditto). You're probably exactly right about BK.

Conceptually git is more powerful. But I recall the bitkeeper CLI being far more sensible in its interface.
It had its own weird quirks, and sometimes revealed that it was a front for a single file with a lot of funnily-formatted lines. We're just separated from it in time, and you can only truly hate what is familiar.
pests (1 day ago):
He is quoted at the end of the article saying that was what they should have done, and they did have a public offering, but hindsight and yadda yadda.
nmz (5 days ago):
I wouldn't put fossil in that list of collaboration, since it's not really a collaborative tool; or rather, there are barriers to that collaboration, like creating a username for each fossil repository. That's a huge barrier in my view. It would be nice if there was something like a general auth identity that can be used everywhere, but that's still not implemented.

FWIW, mercurial seems to have an advantage over git, and that is support for BIG repositories, which seems to be provided by Facebook of all people; so until Facebook moves to git, mercurial lives on.

Facebook doesn’t really use vanilla mercurial but its own scale-oriented rust fork. It’s open sourced as “sapling”
You can have one repository and link all the others to it via "Login Groups"

https://www.fossil-scm.org/home/doc/trunk/www/caps/login-gro...

It is open source, Apache license 2. Check at https://www.bitkeeper.org/
And in light of the history, https://github.com/bitkeeper-scm/bitkeeper is funny on several levels.
Exceptional read! I love it.

It's the most complete history of git that I know now. Exceptional!

I'd love to read more historical articles like this one, of pieces of software that have helped shape our world.

deskr (3 days ago):
> It's the most complete history of git that I know now.

I wasn't going to read the story until I read your comment. I knew the summary of BitKeeper and the fallout, but wow this was so detailed. Thanks!

If you like computer/software history, I recommend the Abort Retry Fail[1] mailing list.

[1] https://www.abortretry.fail/

(I meant 'newsletter' , not 'mailing list')
+1 to that. Great read. The field is young and accelerating. History is quite compressed. It's valuable to have articles like this.
The Dream Machine was a good one, though a bit more historical. http://folklore.org has a bunch of good Apple stories.
Ditto. This was a really nice read!
Thanks Andrew Tridgell for not letting the kernel get stuck with proprietary source control. An example of how sticking to your principles can make the world better in the long run even if it annoys people at first.
samus (3 days ago):
The Kernel was not "stuck"; Linus is ultimately a practical man and was fine using it for integration work. The question of whether or not to switch to an Open Source solution would eventually have been raised again, but at the time it did what it was supposed to do.
cxr (5 days ago):
There's a screenshot purporting to be of GitHub from May 2008. There are tell-tale signs, though, that some or all of the CSS has failed to load, and that that's not really what the site would have looked like if you visited it at the time. Indeed, if you check github.com in the Wayback Machine, you can see that its earliest crawl was May 2008, and it failed to capture the external style sheet, which results in a 404 when you try to load that copy today. Probably best to just not include a screenshot when that happens.

(Although it's especially silly in this case, since accessing that copy[1] in the Wayback Machine reveals that the GitHub website included screenshots of itself that look nothing like the screenshot in this article.)

1. <https://web.archive.org/web/20080514210148/http://github.com...>

Author here. That's a good catch, thanks! I've replaced it with a newer screenshot from August 2008.
cxr (4 days ago):
Larry wants to call you and discuss two corrections to this piece ("one minor, one major"). I've already passed on your email address for good measure, but you should reach out to him.
I've emailed him to follow up. Thanks for letting me know!
ob (2 days ago):
Thanks for writing this. This story is rarely told correctly and you mostly got it as I remember it.
Thanks - I was struggling to believe GitHub would have launched with something that bad looking - 2008 was not the era of CERN-style webpages!
> My biggest regret is not money, it is that Git is such an awful excuse for an SCM. It drives me nuts that the model is a tarball server. Even Linus has admitted to me that it’s a crappy design. It does what he wants, but what he wants is not what the world should want.

Why is this crappy? What would be better?

Edit: @luckydude Thank you for generously responding to the nudge, especially nearly instantly, wow :)

My issues with Git

- No rename support, it guesses

- no weave. Without going into a lot of detail, suppose someone adds N bytes on a branch and then that branch is merged. The N bytes are copied into the merge node (yeah, I know, git looks for that and dedups it but that is a slow bandaid on the problem).

- annotations are wrong, if I added the N bytes on the branch and you merged it, it will (unless this is somehow fixed now) show you as the author of the N bytes in the merge node.

- only one graph for the whole repository. This causes multiple problems: A) the GCA is the repository GCA, it can be miles away from the file GCA if there was a graph per file like BitKeeper has. B) Debugging is upside down, you start at the changeset and drill down. In BitKeeper, because there is a graph per file, let's say I had an assert() pop. You run bk revtool on that file, find the assert and look around to see what has changed before that assert. Hover over a line, it will show you the commit comments to the file and then the changeset. You find the likely line, double click on it, now you are looking at the changeset. We were a tiny company, we never hit the claimed 25 people, and we supported tons of users. This form of debugging was a huge, HUGE, part of why we could support so many people. C) commit comments are per changeset, not per file. We had a graphic check-in tool that walked you through the list of files, showed you the diffs for that file and asked you to comment. When you got to the ChangeSet file, it asked you for what Git asks for in comments, but the diffs were all the file names followed by what you just wrote. It made people sort of uplevel their commit comments. We had big customers that insisted the engineers use that tool rather than a command line that checked in everything with the same comment.

- submodules turned Git into CVS. Maybe that's been redone but the last time I looked at it, you couldn't do sideways pulls if you had submodules. BK got this MUCH closer to correct, the repository produced identical results to a mono repository if all the modules were present (and identical less whatever isn't populated in the sparse case). All with exactly the same semantics, same functionality mono or many repos.

- Performance. Git gets really slow in large repositories, we put a ton of work into that in BitKeeper and we were orders of magnitude faster for things like annotate.

In summary, Git isn't really a version control system and Linus has admitted it to me years ago. A version control system needs to faithfully record everything that happened, no more or less. Git doesn't record renames, it passes content across branches by value, not by reference. To me, it feels like a giant step backwards.

Here's another thing. We made a bk fast-export and a bk fast-import that are compatible with Git. You can have a tree in BK, have it updated constantly, and no matter where in the history you run bk fast-export, you will get the same repository. Our fast-export is idempotent. Git can't do that, it doesn't send the rename info because it doesn't record that. That means we have to make it up when doing a bk fast-import which means Git -> BK is not idempotent.

I don't expect to convince anyone of anything at this point, someone nudged, I tried. I don't read hackernews any more so don't expect me to defend what I said, I really don't care at this point. I'm happier away from tech, I just go fish on the ocean and don't think about this stuff.
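
For comparison with the bk revtool workflow described above (and without claiming equivalence), the closest plain-git approximations to that per-file drill-down are blame and line-range log. A minimal sketch, with the file name and line range as placeholders:

    # Who last touched lines 100-140 of this file, and in which commits?
    git blame -L 100,140 src/foo.c

    # History (with patches) of just that line range, following the file.
    git log -L 100,140:src/foo.c

    # Then inspect the whole changeset a suspicious line came from.
    git show <commit-id>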

> No rename support, it guesses

Yes, Git doesn't track changes; it tracks states. It has tools to compare those states, but that doesn't mean it needs to track additional data to help those tools.

I'm unconvinced that tracking renames is really helpful, as that is only the simplest case of many possible state modifications. What if you split a file A into files B and C? You'd need to be able to track that too. Same for merging one file into another. And many many many more possible modifications. It makes sense to instead focus on the states and then improve the tools to compare them.

Tracking all kinds of changes also requires all development tools to be aware of your version control. You can no longer use standard tools to do mass renames and instead somehow build them on top of your vcs so it can track the operations. That's a huge tradeoff that tracking repository states doesn't have.
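
To make the state-based model concrete: git records two snapshots and reconstructs the rename afterwards with a similarity heuristic, so an explicit `git mv` and a plain filesystem rename produce identical commits. A minimal sketch, with file names as placeholders:

    # Either an explicit rename...
    git mv old_name.c new_name.c
    # ...or a plain rename plus re-add; both just stage a delete and an add,
    # and the resulting commit content is identical.
    mv old_name.c new_name.c && git add -A

    git commit -m "Rename old_name.c to new_name.c"

    # Rename "support" is recovered later, at diff/log time, by content similarity.
    git log --follow -- new_name.c
    git diff -M --stat HEAD~1 HEAD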

> submodules

I agree, neither submodules nor subtrees are ideal solutions.

samus (3 days ago):
> What if you split a file A into files B and C? You'd need to be able to track that too. Same for merging one file into another. And many many many more possible modifications.

I suppose Bitkeeper can meaningfully deal with that since their data model drills down into the file contents.

gwd (3 days ago):
> You run bk revtool on that file, find the assert and look around to see what has changed before that assert. Hover over a line, it will show you the commit comments to the file and then the changeset. You find the likely line, double click on it, now you are looking at the changeset.

I still have fond memories of bk revtool. I haven't found anything since that's been as intuitive and useful.

That's an exceptionally detailed answer. One thing I remember is how Microsoft Windows [0] had so much trouble while migrating to git.

0. https://arstechnica.com/information-technology/2017/05/90-of...

I hadn't heard of the per-file graph concept, and I can see how that would be really useful. But I have to agree that going for a fish sounds marvellous.
I fished today, 3 halibut. Fish tacos for the win! If you cook halibut, be warned that you must take it off at 125 degrees, let it get above that and it turns to shoe leather.
What's a GCA?
js2 (3 days ago):
Greatest common ancestor (merge base in git terminology):

https://www.bitkeeper.org/man/gca.html

https://git-scm.com/docs/git-merge-base
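
In day-to-day git terms, a quick sketch (branch names are placeholders):

    # The GCA of two branches, i.e. the common ancestor used as the merge base.
    git merge-base feature main

    # Show that commit (metadata only) to see where the branches diverged.
    git show --no-patch $(git merge-base feature main)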

As someone who has lived in Git for the past decade, I also fail to see why Git is a crappy design. It's easy to distribute, works well, and there's nothing wrong with a tarball server.
Exactly. While the article covers the history of events well, it doesn't go deep enough into the feature evolution (which is tightly connected to, and reflects, the evolution of software development). Which is:

TeamWare - somewhat easy branching (by copying whole workspace from the parent and the bringover/putback of the changes, good merge tool), the history is local, partial commits.

BitKeeper added distributed mode, changesets.

Git added very easy branching, stash, etc.

Any other currently available source control usually is missing at least one of those features. Very illustrative is the case of Mercurial, which emerged at about the same time responding to the same need for a modern source control, yet was missing partial commits for example and had much more cumbersome branching (like no local history or something like this - I last looked at it more than a decade ago) - that really allowed it to be used only in very strict/stuffy settings; for everybody else it was a non-starter.

nmz (3 days ago):
Git is terrible at branching; constantly squashing and rebasing is not a feature but an annoyance. See fossil for how to do proper branching/merging/logging by its very nature. Not to mention that by having the repository separate from the data, it forces you to organize it in a nice way (mine look like Project/(repo.fossil, branch1/, branch2/, branch3/)). You can achieve this with git now, but I never had to think about it in fossil; it's a natural consequence of the design.
>constantly squashing and rebasing is not a feature but an annoyance

it is a feature which allows you, for example, to work simultaneously on several releases, patches, hot fixes, etc. Once a better alternative emerges we'll jump the git ship as we did before when we jumped onto the git ship.

>the repository separate from the data

that was a feature of a bunch of source controls and a reason among others why they lost to git.

>it forces you to

that is another reason why source controls lose to git as git isn't forcing some narrow way of doing things upon you.

I don't deny of course that for some people/teams/projects other source controls work better, as your comment illustrates. I'm just saying why git won and keeps winning in the majority of situations.

> Once better alternative emerges we'll jump the git ship as we did before when we jumped onto the git ship.

It's not that easy at this point in time. git carries a lot of momentum, especially in combination with GitHub.

Anybody learning about software development learns about git and GitHub.

Software is expected to be in GitHub.

At the time git became successful there were arguably better systems like mercurial and now we got fossil, but git's shortcomings are too little of a pain point compared to universal knowledge about it and integration into every tool (any editor, any CI system, any package manager, ...) and process.

>It's not that easy at this point in time. git carries a lot of momentum, especially in combination with GitHub.

CVS back then was like this too, including public repos, etc.

>At the time git became successful there were arguably better systems like mercurial

I specifically mentioned Mercurial above because they both emerged pretty simultaneously responding to the same challenges, and Mercurial happened to be just inferior due to its design choices. Companies were jumping onto it too, for example our management back then chose it, and it was a predictable huge pain in the neck, and some years down the road it was replaced with git.

> CVS back then was like this too, including public repos, etc.

Not really.

CVS had too many flaws (no atomicity, no proper branching, no good offline work, etc.) Subversion as "natural successor" fixed some things and was eating some parts of CVS.

At the same time sourceforge, the GitHub of that time, started to alienate their users.

And then enterprises used different tools to a way larger degree (VSS, sccs, Bk, perforce, whatever), while that market basically doesn't exist anymore these days and git is ubiquitous.

And many people went way longer without any version control than today. Today kids learn git fundamentals very early, even on Windows, and make it a habit. Whereas in the early 2000s I saw many "professional" developers where the only versioning was the ".bak" or ".old" file or copies of the source directory.

People started paying me to develop software in 1986. First time I ever used version control software was 1996. It was TERRIBLE. Two years later I left to start my own software company, but my experience with it to that point was so bad I went without version control the first few years. Around 2002 I started using CVS (or RCS? long time ago!) and quickly switched to Subversion. After learning git to work on Raku circa 2009, I switched my main $WORK repo to git in maybe 2012. Every repo I've created since then has been in git, but I still haven't moved all my svn repos over to git.
> (VSS, sccs, Bk, perforce, whatever) while that market basically doesn't exist anymore these days and git is ubiquitous.

Perforce still has a solid following in the gamedev space - even with LFS, git's handling of binaries is only mildly less than atrocious.

Yeah but market share shrunk a lot (especially since the market grew massively) and even Perforce is a tool integrating with git these days.
nmz (3 days ago):
> it is a feature which allows, for example, to work simultaneously on several releases, patches, hot fixes, etc. Once better alternative emerges we'll jump the git ship as we did before when we jumped onto the git ship.

What are you talking about here? I'm not talking about eliminating branching, but the fact that merging a branch is usually just a fake single commit that hides away the complexity and decisions of the branch. See [0] for how you can leverage branches and the log for a sane commit history.

> that was a feature of a bunch of source controls and a reason among others why they lost to git.

Given the article, git won because it was FOSS, Torvalds, and speed. If you have proof of a good number of people saying "I hate the division of data and repository!" then it's a believable claim, or maybe you're confusing the data/repo division with cvs? git also didn't have to fight much; the only contender was hg.

[0]: https://fossil-scm.org/home/timeline

> Tridge did the following.

> “Here’s a BitKeeper address, bk://thunk.org:5000. Let’s try connecting with telnet.”

Famously, Tridge gave a talk about this, and got the audience of the talk to recreate the "reverse engineering". See https://lwn.net/Articles/133016/ for a source.

> I attended Tridge's talk today. The best part of the demonstration was that he asked the audience for each command he should type in. And the audience instantly called out each command in unison, ("telnet", "help", "echo clone | nc").

I was there for that talk, good times. Lots of great linux.conf.au talks from Tridge over the years.
Same. I definitely remember the "help" line from it too.
This is completely untrue. There is no way that you could make a BK clone by telneting to a BK and running commands. Those commands don't tell you the network protocol, they show you the results of that protocol but show zero insight into the protocol.

Tridge neglected to tell people that he was snooping the network while Linus was running BK commands when Linus was visiting in his house. THAT is how he did the clone.

The fact that you all believe Tridge is disappointing, you should be better than that.

The fact that Tridge lied is disappointing but I've learned that open source people are willing to ignore morals if it gets them what they want. I love open source, don't love the ethics. It's not just Tridge.

> There is no way that you could make a BK clone by telneting to a BK and running commands. Those commands don't tell you the network protocol

The network protocol, according to multiple sources and the presented talk at LCA, was "send text to the port that's visible in the URL, get text back". The data received was SCCS, which was an understood format with existing tools. And the tool Tridge wrote, sourcepuller, didn't clone all of BitKeeper, it cloned enough to fetch sources, which meant "connect, send command, get back SCCS".

Anything more than that is hearsay that's entirely inconsistent with the demonstrated evidence. Do you have any references supporting either that the protocol was more complicated than he demonstrated on stage at LCA, or that Tridge committed the network surveillance you're claiming?

And to be clear, beyond that, there's absolutely nothing immoral with more extensively reverse-engineering a proprietary tool to write a compatible Open Source equivalent. (If, as you claim, he also logged a friend's network traffic without their express knowledge and consent, that is problematic, but again, the necessity of doing that seems completely inconsistent with the evidence from many sources. If that did happen, I would be mildly disappointed in that alone, but would still appreciate the net resulting contribution to the world.)

I appreciate that you were incensed by Tridge's work at the time, and may well still be now, but that doesn't make it wrong. Those of us who don't use proprietary software appreciate the net increase in available capabilities, just like we appreciate the ability to interoperate with SMB using Samba no matter how inconvenient that was for Microsoft.

> Have you tried it?

the one you're replying to, @luckydude, is Larry McVoy, who created BitKeeper.

Fascinating, I was unaware of that link (and don't systematically check people's HN profiles before replying). Thank you for the reference; I've edited my comment to take that into account.
I worked on bk

> The data received was SCCS, which was an understood format with existing tools.

You'd be surprised. SCCS is not broadly understood. And BK is not exactly SCCS.

I read the SourcePuller code when it was published (sp-01). It's pretty easy reading. I give Tridge credit for that. I wrote a little test, got it to checkout the wrong data with no errors reported. Issue was still there in sp-02 .

Rick saying "I worked on BK" is the understatement of the century. He showed up and looked at my code, I had done things in a way that you could have walked the weave and extract any number of versions at the same time. He was really impressed with that. I split apart stuff that Rick had not seen before.

Then he proceeded to fix my code over and over again. I had a basic understanding of SCCS but Rick understood the details.

Rick knows more about SCM than any guy I know.

And he is right, SCCS is not well understood and BK even less so.

ob (2 days ago):
> Do you have any references supporting either that the protocol was more complicated than he demonstrated on stage

BitKeeper itself is open source now and (an old version) of the protocol is documented at https://github.com/bitkeeper-scm/bitkeeper/blob/master/doc/p....

Come on, man, you should be better than this. With so many years of hindsight surely you realize by now that reverse engineering is not some moral failing? How much intellectual and cultural wealth is attributable to it? And with Google v. Oracle we've finally settled even in the eyes of the law that the externally visible APIs and behavior of an implementation are not considered intellectual property.

Tridge reverse engineering bk and kicking off a series of events that led to git is probably one of the most positively impactful things anyone has done for the software industry, ever. He does not deserve the flack he got for it, either then or today. I'm grateful to him, as we all should be. I know that it stings for you, but I hope that with all of this hindsight you're someday able to integrate the experience and move on with a positive view of this history -- because even though it didn't play out the way you would have liked, your own impact on this story is ultimately very positive and meaningful and you should take pride in it without demeaning others.

I don't like cheaters. If Tridge had done what he said he did, go him, I'm all for people being smart and figuring stuff out. But that is not what he did and it disgusts me that he pretends it is.

There is absolutely zero chance he figured out the pull protocol via telnet. I will happily pay $10,000 to anyone who could do that with zero access to BK. Can't be done. If I'm wrong, I'll pay up. But I'll have a lot of questions that can't be answered.

So he cheated, he got Linus to run BK commands at his house and he snooped the network. He had no legal access to those bytes. Without those snoops, no chance he reverse engineered it.

As I have seen over and over, when the open source people want something, they will twist themselves in knots to justify getting it, legality be damned.

How about you be better than this and admit that open source is not without its skeletons?

>So he cheated, he got Linus to run BK commands at his house and he snooped the network. He had no legal access to those bytes. Without those snoops, no chance he reverse engineered it.

Snooping the network is a common and entirely legal means of reverse engineering.

>There is absolutely zero chance he figured out the pull protocol via telnet. I will happily pay $10,000 to anyone could do that with zero access to BK. Can't be done. If I'm wrong, I'll pay up. But I'll have a lot of questions that can't be answered.

I just tried this myself. Here's the telnet session:

https://paste.sr.ht/~sircmpwn/0b3f1f1d77896a96b0777471785cdc...

I confess that I had to look up the name of the BK_REMOTE_PROTOCOL environment variable after a few false starts to put the pieces together, but it would be relatively easy to guess.

I also looked over Tridge's original sourcepuller code and didn't really see anything that you couldn't infer from this telnet session about how bk works.

So, do I just send you my bank account number or?

If anything here was immoral it was locking other people's data in a proprietary tool and then denying them the ability to export it to open formats.
mpe (3 days ago):
This post is BS. You should delete it.
Interesting, this story quotes (without attribution!) this comment by Larry McVoy himself on HN

https://news.ycombinator.com/item?id=11671777

That part really should just be a straight quotation; the very light rewording of the original post feels in poor form.
He is now here as well :), though rather annoyed I would say. I guess business success doesn't bring happiness.

Edit:

There was another thread from him linked as well https://news.ycombinator.com/item?id=26205688

The entire comment section on that post is a goldmine, thanks!
> In a 2022 survey by Stack Overflow, Git had a market share of 94%, ...

> Never in history has a version control system dominated the market like Git. What will be the next to replace Git? Many say it might be related to AI, but no one can say for sure.

I doubt it's getting replaced. It's not just that it's got so much of the market, but also that the market is so much larger than back in the days of CVS.

It's hard to imagine everyone switching from Git. Switching from GitHub, feasible. From Git? That's much harder.

Git shortcomings are well known by this point, so "all" a successor project has to do is solve those problems. Git scales to Linux kernel sized projects, but it turns out there are bigger, even more complex projects out there, so it doesn't scale to Google-sized organizations. You would want to support centralized and decentralized operation, but be aware of both, so it would support multiple remotes, while making it easier to keep them straight. Is the copy on Github up to date with gitlab, the CI system, and my laptop and my desktop? It would have to handle binaries well, and natively, so I can check-in my 100 MiB jpeg and not stuff things up. You'd want to use it both as a monorepo and as multirepos, by allowing you to checkout just a subtree of the monorepo. Locally, the workflow would need to both support git's complexity, while also being easier to use than git.

Anyway, those are the four things you'd have to hit in order to replace git, as I see them.

If you had such a system, getting people off git wouldn't be the issue - offer git compatibility, and if they don't want to use the advanced features, they can just keep using their existing workflow with git. The problem with that, though, is: then why use your new system at all?

Which gets to the point of: how do you make this exist as a global worldwide product? FAANG-sized companies have their own internal tools teams to manage source code. Anywhere smaller doesn't have the budget to create such a thing from scratch.

You can't go off and make this product and then sell it to someone, because how many companies are gonna go with an unproven new workflow tool that their engineers want? What's the TAM of companies for whom "git's not good enough" and who have large enough pocketbooks?

Borg3 (5 days ago):
You are right. GIT is not a DVFS, it's a DVCS. It was made to track source code, not binary data. If you are putting binaries into a DVCS, you are doing something wrong.

But there are industries that need it, like the game industry. So they should use a tool that allows that. I heard that Plastic-SCM is pretty decent at it. Never used it so I can't tell personally.

Replacing GIT is such a stupid idea. There is no ONE tool to handle all cases. Just use the right one for your workflows. I, for example, have a need to version binary files. I know GIT handles them badly, but I really like the tool. Solution? I wrote my own simple DVFS tool for that use case: dot.exe (138KB)

It's a very simple DVFS for personal use, with peer-to-peer syncing (local, TCP, SSH). Data and metadata are SHA-1 checksummed. It's pretty speedy for my needs :) After weeks of use I liked it so much, I added pack storage to handle text files and moved all my notes from SVN to DOT :)

nmz (3 days ago):
DVCS stands for distributed version control system, it has nothing to do with source code?

Maybe you're confusing it with SCM, source control managers; those are the only ones that handle strictly source, but SCM can mean other things.

Borg3 (3 days ago):
Hard to say.. For me a DVCS is a more advanced version of a DVFS. A DVCS can do branching and merging, provides more metadata for revisions, etc. A DVFS does pretty much one thing: store binary blobs. And because binary blobs cannot be easily merged, I would not use it for storage here. But I guess it's just me :)
Isn't there git-annex [0] if you want to store large binary files?

[0] https://git-annex.branchable.com/
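
For context, git-annex keeps only small pointers in git and moves the large content around separately. A rough sketch under that assumption, with the file name and remote as placeholders (see the git-annex docs for specifics):

    git init && git annex init "my laptop"

    # Content goes into the annex; git itself tracks only a small pointer.
    git annex add big-video.mkv
    git commit -m "Add big video via annex"

    # Content transfers are explicit and separate from git push/pull.
    git annex copy big-video.mkv --to=origin
    git annex get big-video.mkv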

Borg3 (3 days ago):
Yeah, I know about git-annex. It might be a good solution for big data. In my case, I do NOT want to decouple storage from metadata. I want a single repo for a single project that is self-contained. Easier to manage, and it's truly distributed. No need to bother w/ backups because every replica has everything already. It's a good model for several GBs of data.
ozim (3 days ago):
Second that way of thinking, for me GIT is as good as it gets for versioning text files.

Not handling binary files is not a downside for me, because GIT should not be a tool to handle binary file versioning and we should use something else for that.

What do you use when you have a tiny .png or .jpg that needs to live alongside your source code now?
ozim (3 days ago):
I can put a binary file in a GIT repo, especially small ones and ones that don't change - what people want is "handling binary files well", whatever that means, but putting big binaries in GIT, or a lot of binary files, or versioning them, is not the use case for GIT.
I think what they mean by "handling binary files well" is being able to selectively not clone/pull them. Otherwise, binary files are handled like any other file, except you are not shown a diff of them by default.
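
Modern git can approximate that with partial clone filters, provided the server supports them. A minimal sketch, with the URL as a placeholder:

    # Clone the full history but skip every blob larger than 1 MiB; large
    # binaries are then fetched lazily only when a checkout or diff needs them.
    git clone --filter=blob:limit=1m https://example.com/some/repo.git

    # Or skip all blobs up front and rely entirely on on-demand fetching.
    git clone --filter=blob:none https://example.com/some/repo.git
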
Borg3 (2 days ago):
You are wrong.. Just start putting files of GBs in size and you will see what happens.
So I just added a 2.4GB iso and other than taking longer nothing happened. Push and clone also seem to work.
Borg3 (1 day ago):
Right :) I forgot that everyone is on 64-bit systems these days. That changes a lot. Anyway, GIT uses a single blob for a single file's content. This is all right if we speak about source code and text files, as those are not large. Things change for arbitrary binary files.
AFAIK, not much of anything changes (perhaps name some of these changes?).

And what is wrong with a single blob? This seems to work great for deduplication, if you were using git to manage a photo library and happened to add something twice.

Borg3 (1 day ago):
There are several issues depending on where you run git. For example mmap() can be problematic on large files on a 32-bit OS. On Cygwin it gets even worse.

As for deduplication, I do not think so. If you have a single blob, let's say 1GB, and you change just 1B, the whole blob changes and there is no more dedup. If you use the basic method of a static block size, let's say 512KB, this will work much better. Further, there are more advanced techniques to handle dedup, like rolling checksums to carve out even smaller sub-blocks.

But you're always arguing the odd case:

- 32-bit systems
- Cygwin git on Windows instead of native
- expecting deduplication on different files (???)
Borg3 (1 day ago):
Uh, hold on. Are we trying to discuss this objectively? If not, and you just insist on "GIT is best for everything" fanboyism and use it everywhere.. I am not interested.

I pointed out that GIT has some rough cases and it's good to be aware of them. I said it already and will say it again: choose the right tool for the task.

If your workflow with GIT works, use it. No need to discourage other solutions. Yes, I use ancient platforms and I want DVFS on them too.

Well, you said something will happen with git and big files, but it seems not much will happen for the majority of users.
Just put it in your git repo.
You say this, but Git has made great strides in scaling to huge repositories in recent years. You can currently do the "checkout just a subtree of the monorepo" just fine, and you can use shallow clones to approximate a centralized system (and most importantly to use less local storage).

> If you had such a system, getting people off git wouldn't be the issue - offer git compatibility and [...]

Git is already doing exactly that.

> "checkout just a subtree of the monorepo"

How do I check out, eg https://github.com/neovim/neovim/tree/master/scripts into a directory and work with it as if it was a repo unto itself?

You can't (since commits are snapshots of the repo root). You can have this approximation however:

    git clone --filter=blob:none --sparse https://github.com/neovim/neovim
    cd neovim
    git sparse-checkout add scripts
Unfortunately, GitHub does not support --filter=sparse:oid=master:scripts, so blobs will be fetched on demand as you use the repo.
Git itself isn’t though, not in any real way that matters. Having to know all the sub trees to clone in a mono repo is a usability nonstarter. You need a pseudo filesystem that knows how to pull files on access. And one ideally integrated with the build system to offset the cost of doing remote operations on demand and improve parallelism. Facebook is open sourcing a lot of their work but it’s based on mercurial. Microsoft is bought into git but afaik hasn’t open sourced their supporting git tooling that makes this feasible.

TLDR: the problem is more complex and pretending like “you can checkout a subtree” solves the problem is missing the proverbial forest for the (sub)tree

Microsoft's vfs for git is open source. So is scalar. These are the two main approaches used at Microsoft for large repos. Unfortunately the technically superior vfs approach was a nonstarter on macOS.
It does feel like asking "What will replace ASCII?" Extensions, sure, but 0x41 is going to mean 'A' in 5050 AD.
Author here. I don’t think ASCII is the right comparison. True, it would be really hard for anything to compete with Git because a lot of infrastructures we have are already deeply integrated with Git. But think about x86 vs. ARM and how AI might change our ways of producing code.
UTF-8
That validates gp's point though: UTF-8 doesn't replace ASCII, it extends it. All valid ASCII text remains valid UTF-8 while retaining the same meaning. With the momentum behind git it will be hard for something incompatible replace it, but an extended git could catch on.
I really doubt that would happen. Git fails when it reaches Google-scale repos. But most of the world isn't using such large repos anyway.

A replacement would be niche, only for the huge orgs, which is usually made by them anyway. For everyone else, git is good enough.

It's been a while since I actually finished reading an article this long. Very well written!

I tried to find out who the author is or how come he/she knows so much. No luck. Does anyone else know, or does OP care to chip in?

Great read!

I'm sure I'm not the first to point out that Junio (the appointed git "shepherd") works at Google, where mercurial is the "recommended local vcs" internally instead of git.

Large parts of Google rely on Git, most notably Chrome and Android.

Also, it is a good thing if Junio can do his job independently of Google's immediate needs.

FYI Mercurial's developer is now known as Olivia Mackall; sadly the Google infobox has failed to pick up the updated information.
Updated, thanks.
This really took me back. Back then before Git was a big thing (2010/2011-ish) I had the misfortune to work at a very large user of IBM Rational ClearCase and it was so awful. However it was so bad and so expensive that I managed to get tasked to "fix it". As part of figuring out how to do this I travelled to GitTogether 2011 from Sweden. Lots of Git folks from those days were there, at least I remember Junio, Peff and Shawn Pearce being there. I was so energised from it all I went back and formed a small team that migrated a colossal code base (oh the horror stories I have) over to Git over the next 2 years. The most rewarding thing I did early in my career.

So thanks to all of you that made this possible by creating Git, Gerrit and all the life saving tools this industry was missing! The passing of Shawn Pearce was really sad, but he won't be forgotten!

This story is missing the impact that Tom Lord's TLA had on the git design.
> Additionally, Petr set up the first project homepage for Git, git.or.cz, and a code hosting service, repo.or.cz. These websites were the “official” Git sites until GitHub took over.

Is this true? I thought GitHub had no official affiliation with the git project

I think some github employees have written code that went into git, but it's not an official affiliation.

The quotes on "official" imply non-official to me. i.e. official seeming to people who don't know any better.

The git repo is on kernel.org nowadays with mirrors on repo.or.cz and GitHub.

But I think what they mean here is the official git project ‘site’, with docs and so on. And that is now https://git-scm.com/ and indeed, as the article describes, that was initially set up by GitHub people to promote git

That's why "official" is in quotes. As in: "de-facto standard".
cxr (5 days ago):
Not really. git-scm.com is the de facto "official" site for the Git project in about the same way that French is the de facto "official" language of France.

They meant exactly what they wrote: GitHub took over hosting duties for the official Git site (because they did).

> A heavily sedated sloth with no legs is probably faster

I'm going to borrow this phrase from now on for everything slow.

Fun read.

The licensing of bitkeeper was a real thing. Although I don't follow the kernel mailing list at all nowadays, I remember Alan Cox calling it out as buttkeeper. Good Times.

dudus (5 days ago):
I never heard the term porcelain before, but I liked this tidbit.

"In software development terminology, comparing low-level infrastructure to plumbing is hard to trace, but the use of “porcelain” to describe high-level packaging originated in the Git mailing list. To this day, Git uses the terms “plumbing” and “porcelain” to refer to low-level and high-level commands, respectively. "

Also, unrelated, the "Ruby people, strange people" video gave me a good chuckle.

https://www.youtube.com/watch?v=0m4hlWx7oRk&t=1080s

I've heard the story before but this was still fun to read. I didn't realise quite how rudimentary the first versions of git were. It really makes you wonder: was git the last opportunity to establish a ubiquitous version control system? Will there ever be another opportunity? Regardless of git's technical merits, one thing I'm extremely happy about is that it's free software. It seemed to come just before an avalanche of free software and really changed the way things are done (hopefully for good).
Two of the key features that were part of early git that show how much git was about supporting Linux kernel development:

https://git-scm.com/docs/git-am

https://git-scm.com/docs/git-send-email

Git was built around supporting the linux kernel email lists. And while there are a number of other options out there that sprang up around the same time, many of them didn't fill the core need for git at that time - to reduce the stress / workload on Linus.
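
The email round trip those two commands support looks roughly like this. A minimal sketch, with addresses, branch names, and file names as placeholders:

    # Contributor: turn local commits into mail-ready patches and send them.
    git format-patch origin/master
    git send-email --to="maintainer@example.org" *.patch

    # Maintainer: apply the received patches from an mbox, preserving authorship.
    git am received-patches.mbox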

ozim (3 days ago):
It created the avalanche. I don't think the scale of free software we have now would be possible without git and GitHub.
So we’re stuck with git because of rails, how amazingly poetic.
Very well written & nice article. I already knew a bit of the story. Just a plus: even the Windows source code is on git! Which is pretty cool if you think about it.

Open source wins

Requesting permission from your source control tool vendor to be able to continue your work is nonsense.

It's alive today! Sr.ht has categories of work you can't host too. Still marinating.

I have no experience with C and I wonder: why did Linus decide that merging should be implemented in a scripting language and not in C?
re: licensing

> You couldn’t use BitKeeper for version control if you were working on version control software.

> You had to get BitMover’s permission if you wanted to run BitKeeper alongside other similar software.

That just strains credulity.

rasz (22 hours ago):
This actually sounds like KryoFlux (floppy imaging hardware/software) EULA https://www.reddit.com/r/vintagecomputing/comments/buyj9f/ti...

Even the outcome was the same :) a flurry of more reasonable competitors, with the fully open source GreaseWeazle offering the whole nine yards.

rob74 (3 days ago):
> The bk clone/pull/push commands functioned similarly to git clone/pull/push.

That sounds a bit backwards: actually Git works similarly to BitKeeper (can't say to what extent, as I'm not familiar with bk), not the other way around.

ajkjk (3 days ago):
Dang this is such a good read.
hgo (3 days ago):
This is why I come to HN. Thank you to the author.
> In January 2006, the X Window team switched from CVS to Git, which wowed Junio. He didn’t expect such a big project like X Window to go through the trouble of changing version control systems.

It's the "X Window System" or just "X".
