For one of these, RocksDB, one reference point is how CockroachDB was built on top of it for many years, through many successful Jepsen tests (until they transitioned to an in-house solution).
https://www.cockroachlabs.com/blog/cockroachdb-on-rocksd/
https://www.cockroachlabs.com/blog/pebble-rocksdb-kv-store/
Another possibility is Apple's FoundationDB, with an interesting discussion here: https://news.ycombinator.com/item?id=37552085
Writing a Bitcask-like (KV WAL) database in Rust. Really cool and simple ideas; the white paper is only about five pages.
So for a bit more effort you get a battle-tested, real-world thing!
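For anyone curious what the Bitcask idea boils down to, here's a toy sketch (in Go for brevity rather than Rust; all names are made up, and it skips the CRCs, timestamps, keydir rebuild on open, hint files, and compaction a real store needs):

    package main

    import (
        "encoding/binary"
        "fmt"
        "os"
    )

    // Minimal Bitcask-style store: every Put appends a record to a
    // single log file; an in-memory "keydir" maps each key to the
    // offset of its latest record, so a Get is one seek plus one read.
    type Bitcask struct {
        f      *os.File
        keydir map[string]int64 // key -> offset of the latest record
        off    int64            // next append position
    }

    func Open(path string) (*Bitcask, error) {
        // Sketch assumes a fresh file; a real store would scan the
        // existing records here to rebuild the keydir and the offset.
        f, err := os.OpenFile(path, os.O_CREATE|os.O_TRUNC|os.O_RDWR, 0o644)
        if err != nil {
            return nil, err
        }
        return &Bitcask{f: f, keydir: make(map[string]int64)}, nil
    }

    // Record layout: 4-byte key length, 4-byte value length, key, value.
    func (b *Bitcask) Put(key, val []byte) error {
        var hdr [8]byte
        binary.LittleEndian.PutUint32(hdr[0:4], uint32(len(key)))
        binary.LittleEndian.PutUint32(hdr[4:8], uint32(len(val)))
        rec := append(append(hdr[:], key...), val...)
        if _, err := b.f.WriteAt(rec, b.off); err != nil {
            return err
        }
        b.keydir[string(key)] = b.off
        b.off += int64(len(rec))
        return nil
    }

    func (b *Bitcask) Get(key []byte) ([]byte, error) {
        off, ok := b.keydir[string(key)]
        if !ok {
            return nil, fmt.Errorf("key not found")
        }
        var hdr [8]byte
        if _, err := b.f.ReadAt(hdr[:], off); err != nil {
            return nil, err
        }
        klen := int64(binary.LittleEndian.Uint32(hdr[0:4]))
        vlen := binary.LittleEndian.Uint32(hdr[4:8])
        val := make([]byte, vlen)
        _, err := b.f.ReadAt(val, off+8+klen)
        return val, err
    }

    func main() {
        db, err := Open("data.bitcask")
        if err != nil {
            panic(err)
        }
        db.Put([]byte("hello"), []byte("world"))
        db.Put([]byte("hello"), []byte("again")) // old record becomes dead space
        v, _ := db.Get([]byte("hello"))
        fmt.Println(string(v)) // again
    }

The dead space left by overwrites and deletes is what the paper's merge/compaction process reclaims.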
If I had to have either the code or the tests generated by an LLM, I'd manually design the test cases with a well-thought-out API for whatever I'm testing, then have the LLM write the tests that implement what I thought up, rather than the opposite, which sounds like a slow and painful death.
Every time I've tried it I make no progress at all compared to just banging out the shape that works and then writing tests to interrogate my own design.
But for more complicated topics, I never fully grasped all the details before writing code, so my tests missed aspects and I had to refactor both code and tests.
I kinda like the idea more than the reality of TDD.
Now, the issue with badly defined problems is not just that they're badly defined; it's also that we like to focus on the technical implementation specifics. Doing TDD from scratch requires a mindset shift to thinking about actual user value (what are you trying to achieve?), and then going for the minimum from that perspective. It's basically the inverse of the common architecture approach, which is to design the data models first and start implementing next. With TDD, you evolve your data models along with the code and architecture.
And it is freaking hard to stop yourself from thinking too far ahead and to instead let the tests drive your architecture (code structure and APIs). Which is why I also frequently prototype without TDD, and then massage those prototypes into fully testable code that could have been produced with TDD.
If instead every test is well-intentioned and focuses on testing the public API of whatever you're testing, without making assumptions about the internal design, you can get well-tested code that is also easy to change (assuming the public interface is still OK).
What really happens is that people write code, write non-unit "unit" tests for 100% coverage, and then suffer because those non-unit tests are now dependent on more than just what you are trying to test; all of them have some duplication because of it, and any tiny change is now blocked by tests.
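To make the "test the public API" point concrete, here's a toy Go example with a hypothetical Queue type. The test only exercises the exported Push/Pop behavior, so swapping the internal slice for a ring buffer or linked list couldn't break it:

    package queue

    import "testing"

    type Queue struct {
        items []int // internal detail the test never references
    }

    func (q *Queue) Push(v int) { q.items = append(q.items, v) }

    func (q *Queue) Pop() (int, bool) {
        if len(q.items) == 0 {
            return 0, false
        }
        v := q.items[0]
        q.items = q.items[1:]
        return v, true
    }

    // A unit test in the sense above: it asserts observable FIFO
    // behavior through the public API only.
    func TestFIFOOrder(t *testing.T) {
        var q Queue
        q.Push(1)
        q.Push(2)
        if v, ok := q.Pop(); !ok || v != 1 {
            t.Fatalf("first Pop = %d, %v; want 1, true", v, ok)
        }
        if v, ok := q.Pop(); !ok || v != 2 {
            t.Fatalf("second Pop = %d, %v; want 2, true", v, ok)
        }
        if _, ok := q.Pop(); ok {
            t.Fatal("Pop on an empty queue should report false")
        }
    }

A test that reached into q.items for its assertions would pass today and block every internal refactor tomorrow.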
I learned assembler by typing in listings from magazines and hand-disassembling and debugging on paper. Your approach seems similar in spirit, but who has the time these days?
I also heard the philosopher Ken Wilber spent a few years (in what kids today call Monk Mode) writing out great books by hand.
The main effect I noticed is that I rapidly gain muscle memory in a new programming language, library or codebase.
The other effect is that I'm forced to round-trip every token through my brain, which is very helpful as my eyes tend to glaze over — often I'll be looking right at an obvious bug without seeing it.
It's not for everyone but I love it.
I've also experienced autocomplete in the NetBeans IDE that was so slow it was just faster to type everything out.
I quite value syntax highlighting though. Back then I used Turbo Pascal 5.5 on a PC XT because it was way faster and less demanding than Turbo Pascal 6.0, but I remember that not having syntax highlighting was a noticeably worse experience. You could get used to doing without it.
But it also depends on the language. I've seen some Lua code without syntax highlighting and it was just a soup of words, very unreadable. Whereas something like C with symbols is OK.
I make an effort to keep the line numbers synced. Sometimes I skip long repetitive blocks or comments. But I do type out like 80% of the actual characters in the file.
It's about 500 lines per hour for me, so I can estimate reasonably well how long it'll take.
It's not necessarily an efficient thing to do; you'd get way more bang for your buck just poking around, asking questions, trying to make small changes. But for reasonably small projects you can type it out in a few hours, or a day or two. Then you've "round-tripped" every single token through your brain (though sadly not with a meaningful amount of conscious reflection, unless you pause and ask questions along the way).
See also my other comment above.
It would be funny to type it until it builds, and then type it until the tests pass.
I've read about an author who did this (I can't remember their name right now), writing down the works of another author they wanted to learn from.
I.e., just write one file (or several) as a B-tree plus a log: append to the log, and once in a while merge log entries into the B-tree in a CoW manner. Essentially that's what ZFS does, except there it's optional when it really shouldn't be. The whole point of the log is to amortize the cost of the copy-on-write B-tree updates, because CoW B-tree updates incur a great deal of write amplification from having to write all-new interior blocks for every leaf node write. If you wait to accumulate a bunch of transactions, then when you finally merge them into the tree you can share the new interior nodes across all those leaf writes. So just make the log a first-class part of the database.
Also, the log can include small indices of the log entries since the last B-tree merge, letting you accumulate even more transactions in the log before having to merge into the B-tree, further amortizing all that write amplification. This approaches an LSM, but with a B-tree as the oldest layer.
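Here's a toy Go sketch of that write path, with the "tree" reduced to an immutable sorted array so the example stays self-contained (all names are made up). A real CoW B-tree would only rewrite the interior nodes along the paths it touches, but the amortization argument is the same: one expensive copy-on-write update per batch of writes instead of one per write.

    package main

    import (
        "fmt"
        "sort"
    )

    const mergeThreshold = 64 // assumed batch size

    type kv struct{ key, val string }

    type db struct {
        log  []kv // recent writes, newest last: cheap appends (the WAL)
        tree []kv // immutable sorted snapshot, replaced wholesale on merge
    }

    func (d *db) put(key, val string) {
        d.log = append(d.log, kv{key, val})
        if len(d.log) >= mergeThreshold {
            d.merge()
        }
    }

    func (d *db) get(key string) (string, bool) {
        // Check the log newest-first, then binary-search the snapshot.
        for i := len(d.log) - 1; i >= 0; i-- {
            if d.log[i].key == key {
                return d.log[i].val, true
            }
        }
        i := sort.Search(len(d.tree), func(i int) bool { return d.tree[i].key >= key })
        if i < len(d.tree) && d.tree[i].key == key {
            return d.tree[i].val, true
        }
        return "", false
    }

    func (d *db) merge() {
        // The one CoW step per batch: build the next snapshot from the
        // old one plus the accumulated log, last write wins per key.
        next := append(append([]kv(nil), d.tree...), d.log...)
        sort.SliceStable(next, func(i, j int) bool { return next[i].key < next[j].key })
        out := next[:0]
        for i, e := range next {
            if i+1 < len(next) && next[i+1].key == e.key {
                continue // a newer write for this key follows
            }
            out = append(out, e)
        }
        d.tree = out
        d.log = d.log[:0]
    }

    func main() {
        d := &db{}
        for i := 0; i < 200; i++ {
            d.put(fmt.Sprintf("key%03d", i%100), fmt.Sprintf("v%d", i))
        }
        fmt.Println(d.get("key042")) // v142 true
    }

The small per-batch log indices mentioned above would replace the linear scan in get, which is what lets the log grow larger before a merge is forced.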
The code is available on GitHub: https://github.com/eatonphil/gosql (it's specifically a PostgreSQL implementation in Go).
It's cool to build a database in 3000 lines, but a real production-ready database needs testing. Would love to see some coverage of correctness and reliability tests. For example, SQLite has about 590 times more test code than the library itself. (https://www.sqlite.org/testing.html)
The idea in mind was to use it for something like an RSS feed reader.
I'm not sold on complexity being a necessity in software engineering, and I'm sure a lot of you aren't either. Yet we see a lot of behemoth projects.
https://leanpub.com/build_your_own_database_from_scratch/c/L...
To be honest, it seems a bit strange to pay for the code, since all the source code to build the database is literally inside the book.