The main way we navigate and organize code is by folder hierarchies. Everyone has a different approach: by feature, by module, by file type (template, component, etc.), by environment (backend/frontend).
Rather than folders and file names, everything could just be tagged in different ways.
Who has tried this and what is the best tool for working like this today?
https://www.google.com/books/edition/Mastering_ENVY_Develope...
pdf 1992 Product Review: Object Technology’s ENVY Developer
http://archive.esug.org/HistoricalDocuments/TheSmalltalkRepo...
It's just that the main way code editors present navigation follows the path hierarchy, partly because it's often intimately tied to how programming languages shape modules. Most editors have at least some alternative navigation, however, and most people use at least some of it: outlining by declaration symbols, search, changes, unit tests, open files, bookmarks, etc.
So in a way, this is already how it is done, except the 'database' part is really tied to the code editor, with its storage component nicely decoupled (in the end, databases are usually also just a bunch of files).
I think any real improvements on this model can only come from a new programming language design, and as others have pointed out, this hasn't caught on in the past. The reason for this is probably not that file oriented modularity is the best thing there is, but rather the escape velocity needed to get out of the vast ecosystem of tooling around files, like the OS, git and existing code editors and whatnot.
UserLand Frontier was a wonderful scripting environment born on the Mac (classic Mac OS) and later ported to Windows. It was a mix of an object database, storing code and data, an extensible scripting language called UserTalk, and very powerful inter-application capabilities, based on Apple's Open Scripting Architecture. Dave Winer, its author, worked on the XML-RPC standard afterwards.
memory snapshot "image"
AND "change log" text file
AND "sources" text file.
https://cuis-smalltalk.github.io/TheCuisBook/Code-Management...
If the "sources" file is missing the byte code will be decompiled to show class and method definitions, but the original names will be unknown.
A new approach to storing code. Other tools try to recover structure from text; Unison stores code in a database. This eliminates builds, provides for instant nonbreaking renames, type-based search, and lots more.
I can do this in Lightroom or my "Photo" app, but then you are always reliant on some third-party tool. It would be nice if there was some native way for files to not have to commit to a single hierarchy, but be able to switch views on the fly (without it being insanely slow for larger numbers of files).
We eventually stopped because we were relying much more on external tools (e.g. npm, webpack) which had all sorts of issues over WebDAV mounts. Maintaining all this code management infrastructure in parallel wasn't worth it in the end, and we moved the code back to disk, switched to git, etc.
And Photoshop silently ignoring WebDAV I/O errors when saving designs didn't help either.
You already have tagging by type on the filesystem - the file extension. That allows you to limit file searches. Add extra metadata to extensions if the same extensions have different roles (.backend.ts, .frontend.ts, .html.template, .text.template)
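To make the extension-as-tag idea concrete, here is a minimal sketch in Python. The helper names (`extension_tags`, `find_by_tags`) and the sample paths are made up for illustration; the only assumption is that every dot-separated suffix of a file name counts as a tag.

```python
from pathlib import PurePosixPath


def extension_tags(path: str) -> set[str]:
    """Treat every dot-separated suffix of the file name as a tag."""
    # PurePosixPath.suffixes gives ['.backend', '.ts'] for 'cart.backend.ts'.
    return {suffix.lstrip(".") for suffix in PurePosixPath(path).suffixes}


def find_by_tags(paths: list[str], *wanted: str) -> list[str]:
    """Return the paths whose extension tags include all of `wanted`."""
    required = set(wanted)
    return [p for p in paths if required <= extension_tags(p)]


# Hypothetical project layout, one folder per widget.
FILES = [
    "widgets/cart/cart.backend.ts",
    "widgets/cart/cart.frontend.ts",
    "widgets/cart/cart.html.template",
    "mail/welcome.text.template",
]

if __name__ == "__main__":
    print(find_by_tags(FILES, "template"))
    print(find_by_tags(FILES, "backend", "ts"))
```

The catch is that any extra dot in a file name becomes a tag too, so this only works if the naming convention is kept strict.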
These days I prefer to structure for easy removal of code - everything for e.g. a widget (frontend, backend, css) goes into a folder and I only need to remove that folder when the widget is retired, and linting/validation will show me the few remaining path references I need to clean up.
Yes, that's git, a filesystem, and an IDE -- and the physical layout of the code isn't the way I normally navigate it. It's useful structure for the tooling, though.
It's definitely true that "using git" or "putting our code on the filesystem" aren't ends in themselves, they are means to an end. If we found a way to meet our requirements that has fewer trade-offs than git then I'm sure we'd jump. Git and filesystems are possibly the worst options for organising code and history, except for all the other options out there :P.
[^1]: https://microsoft.github.io/language-server-protocol/specifi...
On the positive side, DevOps was a breeze - push a DB to a server and everything just worked. Pushing new code to all the DBs was a breeze. Any dev could immediately jump into an app and have a sense of where they would find elements of the app. All apps ran the same way, so it was realistic for small shops to deliver large products.
On the downside, source control was sub-optimal. That was a weakness in the platform even 25 years ago when it was modern, and never quite improved... although there are ways to import/export the code to make it work with modern source control like git. It also made each app heavier than it needed to be - instead of sharing centralized code, each app had its own copy. Your infrastructure footprint got big, fast.
For a modern take on it, I think other comments are hitting the key point - you might want to have fuzzier definitions of what a database and a file system are. At the end of the day, they are both ways of storing data to disk with different access methods. But it sounds like you are more concerned about DX. To get to your vision, I'd focus more on an IDE that lets you navigate code how you desire, while leaving the actual code storage as a DevOps exercise where they can focus on whatever solution optimizes delivery and reliability.
1. During the peak phase of CouchDB as an application server (2006 - 2009) it was common to store not just the data but all the app assets and code in the database and replicate everything together. Plenty of people in the community tried to take this to the extreme, with every function stored as a versioned document (I see it as a precursor to FaaS) and the whole application editable with an integrated IDE. Also, functions in my incarnation of this system were not loaded by filename but through a content-addressed manifest. You would reference functions by name, but the name would be resolved with a hash manifest.
2. There were several systems on Erlang/BEAM that took hot code replacement to the extreme in a similar way, storing code in, I believe, Mnesia.
3. I think Bloomberg (I cannot find the HN post to confirm it was them, if someone has the link that would be great) has/had a bespoke code database with custom version control and a fully integrated IDE. They leveraged this for some pretty interesting workflows.
4. Probably not exactly what you mean as it does not include the runtime integration, but Google and Sourcegraph are building code databases with indices on symbols and semantic understanding of references and more. I hear great things from people who worked with it especially
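For point 1, the content-addressed manifest can be sketched roughly like this. It is an in-memory Python illustration, not how the CouchDB-based system was actually built; `CodeStore`, `publish` and `load` are hypothetical names, and SHA-256 is just an assumed choice of hash.

```python
import hashlib


class CodeStore:
    """Hypothetical store: sources keyed by hash, names resolved via a manifest."""

    def __init__(self) -> None:
        self.blobs: dict[str, str] = {}     # content hash -> source text
        self.manifest: dict[str, str] = {}  # function name -> content hash

    def publish(self, name: str, source: str) -> str:
        """Store `source` under its hash and point `name` at that version."""
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        self.blobs[digest] = source    # older versions stay addressable by hash
        self.manifest[name] = digest   # the name now resolves to this version
        return digest

    def load(self, name: str):
        """Resolve name -> hash -> source, then evaluate the source."""
        source = self.blobs[self.manifest[name]]
        namespace: dict = {}
        exec(source, namespace)        # only acceptable for trusted code
        return namespace[name]


if __name__ == "__main__":
    store = CodeStore()
    store.publish("greet", "def greet(who):\n    return 'hello ' + who\n")
    store.publish("greet", "def greet(who):\n    return 'hi ' + who\n")
    print(store.load("greet")("world"))
```

Callers only ever use the name; rolling back means pointing the manifest entry at an older hash, which is why the manifest itself is the thing you would version and replicate.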
The folder structure reflects the subdivision of code into modules. Each module may have submodules, and each module decides the visibility of its children to other modules at the same level as itself, and to its own supermodule. This is a naturally hierarchical structure, which file systems lend themselves well to. A code database would have to replicate this structure within it somehow anyway.
A non-hierarchical tag system would help model situations where you have multiple orthogonal axes along which to organise the code (as you point out). But in these cases, which axis gets the top-level hierarchy just doesn't matter. Pick one, maybe loosely informed by organisational factors or by your problem conceptualisation.
On the flipside, in situations where a stricter hierarchy would improve modularity, the tag system might _discourage_ clean crystallisation, and cause responsibilities to bleed into each other. IMO, it's more important for there to be modules at all than for their boundaries to be perfect.
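For what the multi-axis tag model could look like in practice, here is a small sketch using SQLite from Python. The table layout, the `axis:value` tag spelling, and the unit names are all assumptions made up for the example.

```python
import sqlite3


def build_index(tagged: dict[str, set[str]]) -> sqlite3.Connection:
    """Load a mapping of code unit -> tags into an in-memory tag table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE tags (unit TEXT NOT NULL, tag TEXT NOT NULL,"
        " PRIMARY KEY (unit, tag))"
    )
    db.executemany(
        "INSERT INTO tags VALUES (?, ?)",
        [(unit, tag) for unit, tags in tagged.items() for tag in tags],
    )
    return db


def units_with(db: sqlite3.Connection, *tags: str) -> list[str]:
    """Return the units that carry every one of the given tags."""
    placeholders = ", ".join("?" for _ in tags)
    rows = db.execute(
        f"SELECT unit FROM tags WHERE tag IN ({placeholders})"
        " GROUP BY unit HAVING COUNT(DISTINCT tag) = ? ORDER BY unit",
        (*tags, len(set(tags))),
    )
    return [row[0] for row in rows]


# Hypothetical units tagged along three independent axes.
UNITS = {
    "cart_view": {"feature:cart", "layer:frontend", "kind:component"},
    "cart_api": {"feature:cart", "layer:backend"},
    "login_view": {"feature:login", "layer:frontend", "kind:component"},
}

if __name__ == "__main__":
    db = build_index(UNITS)
    print(units_with(db, "layer:frontend"))
    print(units_with(db, "feature:cart", "layer:frontend"))
```

No axis is "on top" here, which is the flexibility being asked for, and also why nothing in this schema enforces module boundaries.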
I'd ask: is fetching code really a bottleneck? Maybe in some systems or types of execution environments it could be worth it. I really don't know.
I'd assume data is stored in databases because it needs to be viewed from different angles (join statements etc.) and it has different performance and layout requirements.
Most code is 'read only' too, so there's no need to do stuff like synchronization / locking on writes and ordering.
Then again, there might be systems that don't have this aspect, and somehow have very high load on fetching code, and maybe code is writable too, and could have queries to extract certain parts of code, or combine code from various files/tables.
I think, though, that this difference between how code and data are fetched and used is the main reason why, in the general case, it works the way it works. It hasn't needed to work differently, so no one looked for a solution. No big problems in the space. (My guess.)
The project is OSS [2] and the storage connector is "mysql". It even handles foreign keys by creating links to another folder with a search query to find the table row it's associated with.
Current text-based approach: 187 files changed.
Structured data approach: function Fobar changed to FooBar; struct Baz field vargle changed to bargle.
Once that leap is made, a whole lot of the complexities of namespaces, modules, source control, and parsing become much simpler/better. This comes at the cost of more complexity in the editor/infrastructure, but that is a singular place, while in return it simplifies every program written.
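A rough way to see the difference between the two change summaries above: compare syntax trees instead of text, and report a rename as a single event. This sketch uses Python's standard `ast` module on Python source; `function_fingerprints` and `detect_renames` are hypothetical helper names, and it only looks at top-level, non-recursive functions.

```python
import ast


def function_fingerprints(source: str) -> dict[str, str]:
    """Map each top-level function name to a dump of its arguments and body."""
    result = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            # The fingerprint leaves out the function's own name,
            # so a pure rename keeps the same fingerprint.
            result[node.name] = ast.dump(node.args) + "".join(
                ast.dump(stmt) for stmt in node.body
            )
    return result


def detect_renames(old_source: str, new_source: str) -> list[tuple[str, str]]:
    """Return (old_name, new_name) pairs for functions whose body is unchanged."""
    old = function_fingerprints(old_source)
    new = function_fingerprints(new_source)
    removed = {fp: name for name, fp in old.items() if name not in new}
    renames = [
        (removed[fp], name)
        for name, fp in new.items()
        if name not in old and fp in removed
    ]
    return sorted(renames)


OLD = "def Fobar(x):\n    return x + 1\n\ndef other(y):\n    return y * 2\n"
NEW = "def FooBar(x):\n    return x + 1\n\ndef other(y):\n    return y * 2\n"

if __name__ == "__main__":
    for old_name, new_name in detect_renames(OLD, NEW):
        print(f"Function {old_name} changed to {new_name}")
```

A real structured store would go further and keep references by identity rather than by name, so call sites would not change at all; this only shows the reporting side.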
There are programming languages that store code in some kind of non-hierarchical format. For example, Unison (https://www.unison-lang.org/) stores code in a database just as you suggest, and projects it down to text for editing. A more established example is probably Smalltalk, which stores the code as part of an image that is edited live in the Smalltalk environment.
On the other side, you can have filesystems that are not hierarchical, for example semantic filesystems like Tagsistant for Linux — these can be used for more flexible relationships between any kind of file, not just code.
The idea was that you don't have files, just functions that you can bring in and out of scope while editing. You have branches per-function. This all worked more or less transparently to the user using the normal emacs Sly Common Lisp flow.
It was implemented overriding the +DEFUN+ macro, so function re-definitions automatically serialized and created a new entry in the DB.
The proof of concept used SQLite, but I also envisioned a Postgres-backed version for jamming on programs with your friends in real time.
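The original was Common Lisp, so the following is only a loose Python analogue of the idea, not the actual implementation: every (re)definition is appended to a SQLite table and then evaluated, with a branch per function. The schema and the names `open_store`, `define` and `history` are invented for this sketch.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the definitions table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS definitions ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " name TEXT NOT NULL,"
        " branch TEXT NOT NULL,"
        " source TEXT NOT NULL)"
    )
    return db


def define(db: sqlite3.Connection, name: str, source: str, branch: str = "main"):
    """Record a new version of `name` on `branch`, then evaluate it."""
    with db:  # commits the insert on success
        db.execute(
            "INSERT INTO definitions (name, branch, source) VALUES (?, ?, ?)",
            (name, branch, source),
        )
    namespace: dict = {}
    exec(source, namespace)  # only acceptable for trusted code
    return namespace[name]


def history(db: sqlite3.Connection, name: str, branch: str = "main") -> list[str]:
    """Return every stored version of `name` on `branch`, oldest first."""
    rows = db.execute(
        "SELECT source FROM definitions WHERE name = ? AND branch = ? ORDER BY id",
        (name, branch),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = open_store()
    define(db, "area", "def area(r):\n    return 3 * r * r\n")
    area = define(db, "area", "def area(r):\n    return 3.14159 * r * r\n")
    print(area(2), len(history(db, "area")))
```

Swapping the connection for a shared server database is where the "jamming with friends" part would come in, though conflict handling is left out entirely here.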
Where are you storing code if not in a database?
> rather than a hierarchical, text-based format.
Okay, so you mean not a hierarchical database, but rather a... Relational database, I guess?
> The main way we navigate and organize code is by folder hierarchies.
Organize I can buy, I suppose. But I navigate by AST representation (as provided by an LSP in this day and age). It turns out code is a database too!
> Rather than folders and file names, everything could just be tagged in different ways.
So you are looking for WinFS? While it suffered from many technical issues, its biggest problem is that users really didn't gain much from it.
And the keyword here, `database`, need not mean the typical one. In fact, most tools (git, IDEs, parsers, etc.) are databases over the code. POOR ONES, probably in the way of 'any program contains a poorly implemented half of Lisp', but intentionally creating a database interface with a relational (enhanced!) view and intentional CRUD + queries makes a lot of sense.
https://github.com/projectured/projectured
the depth work is almost done. IIRC there are only a couple of nontrivial issues left, but it's been abandoned.
This doesn't stop you from also accessing it in other ways. And with modern IDEs you can search across a fairly chunky codebase near-instantly, which would allow you to treat it as if it's in a database.
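As a rough idea of what "treating it as if it's in a database" means, an index of symbols can be derived from plain files on demand. This sketch uses Python's standard `ast` module on Python sources; the row shape and the function name `index_symbols` are assumptions, and real IDE indexes are far richer.

```python
import ast


def index_symbols(files: dict[str, str]) -> list[tuple[str, str, str, int]]:
    """Return sorted (kind, name, file, line) rows for every class and function."""
    rows = []
    for filename, source in files.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.ClassDef):
                rows.append(("class", node.name, filename, node.lineno))
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                rows.append(("function", node.name, filename, node.lineno))
    return sorted(rows)


# Hypothetical two-file project, passed in as filename -> source text.
PROJECT = {
    "shop/cart.py": "class Cart:\n    def total(self):\n        return 0\n",
    "shop/util.py": "def fmt(x):\n    return str(x)\n",
}

if __name__ == "__main__":
    for kind, name, filename, line in index_symbols(PROJECT):
        print(f"{kind:8} {name:6} {filename}:{line}")
```

The files stay where they are; the "database" is just a view rebuilt from them, so you can browse by kind or name without caring about folders.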
However, it's worth noting that all of the systems that rely on databases to store code (SharePoint, SAP, Power Platform) suck haaaaaaard, mainly due to issues with versioning and configuration management.
So, granted a filesystem does exhibit CRUD, and hierarchical relations, it's already a relational database.
I take this as you are arguing about the utility of a text based format?
Edit: Reformulation because I was voted down: Why change the storage format if the IDE already manages it? And I'd add: for storing in a database, you have to think about the granularity of your data, and it rapidly becomes the line, if not the character. Working daily with code stored in a database (Salesforce), where the granularity is the class, is really a nightmare from a content version point of view.
2. Everyone organizes projects into folders differently, but in most languages the only reason why you organize things into folders is to make it easier for humans to find things. The computer doesn't care where the files are stored. So, you're proposing: give up a feature that exists solely to make it easier for humans to find things. It's extremely difficult to envision this resulting in a more ergonomic world.
3. The hierarchy is only one way of thinking about how you can browse a filesystem. There is nothing stopping editors from indexing files in different ways, allowing you to browse by, for example, files tagged with some comment at the top, or files which contain classes versus interfaces. In fact, more comprehensive IDEs like JetBrains already do this for some languages. You don't have to change the storage substrate to get 100% of the benefits you propose, with the extremely small cost of some indexing process when a project is first opened.
4. There are twenty billion programming languages out there, like four of them do what you're suggesting, yet no one uses any of them for anything meaningful. There is nothing, period, stopping any new language from doing something like this. Golang could have been designed like this; but it wasn't.
Funny choice to use as an example. Go was designed by the same people who designed Plan 9. If no other language out there used files, Go would have still chosen to use files.
I'm sorry, I don't read binary files.