This is partly due to how we've distributed software over the last 40 years. In the 80s, the idea of a library of functionality was something you paid for, and painstakingly included parts of it into your size-constrained environment (fit it on a floppy). You probably picked apart that library and pulled out the bits you needed, integrating them into your builds to be as small as possible.
Today we pile libraries on top of libraries on top of libraries. It's super easy to say `import foolib`, then call `foolib.do_thing()` and just start running. Who knows or cares what all 'foolib' contains.
At each level a caller might need 5% of the functionality of any given dependency. The deeper the dependency tree gets the more waste piles on. Eventually you end up in a world where your simple binary is 500 MiB of code you never actually call, but all you did was take that one dependency to format a number.
In some cases the languages make this worse. Go and Rust, for example, encourage everything for a single package/mod to go in the same file. Adding optional functionality can get ugly when it would require creating new modules, but if you only want to use a tiny part of the module, what do you do?
The only real solution I can think of to deal with this long term is ultra-fine-grained symbols and dependencies. Every function, type, and other top-level language construct needs to declare the set of things it needs to run (other functions, symbols, types, etc). When you depend on that one symbol it can construct, on demand, the exact graph of symbols it needs and dump the rest for any given library. You end up with the minimal set of code for the functionality you need.
It's a terrible idea and I'd hate it, but how else do you address the current setup of effectively building the whole universe of code branching from your dependencies and then dragging it around like a boat anchor of dead code?
I'm not convinced that happens that often.
As someone working on a Rust library with a fairly heavy dependency tree (Xilem), I've tried a few times to see if we could trim it by tweaking feature flags, and most of the times it turned out that they were downstream of things we needed: Vulkan support, PNG decoding, unicode shaping, etc.
When I did manage to find a superfluous dependency, it was often something small and inconsequential like once_cell. The one exception was serde_json, which we could remove after a small refactor (though we expect most of our users to depend on serde anyway).
We're looking to remove or at least decouple larger dependencies like winit and wgpu, but that requires some major architectural changes; it's not just "remove this runtime option and win 500MB".
My feeling is that Python scores fairly well in this regard. At least it used to. I haven't been following closely in recent years.
Just so you can multiply matrices or something.
I’m just not convinced that it’s worth the pain to avoid installing these packages.
You want speedy matrix math. Why would you install some second rate package just because it has a lighter footprint on disk? I want my dependencies rock solid so I don’t have to screw with debugging them. They’re not my core business - if (when) they don’t “just work” it’s a massive time sink.
NumPy isn’t “left pad” so this argument doesn’t seem strong to me.
A better design is to make it easy for you to choose or hotswap your BLAS/LAPACK implementation. E.g. OpenBLAS for AMD.
Edit: To be clear, Netlib (the reference implementation) is almost always NOT what you want. It's designed to be readable, not optimized for modern CPUs.
Go and C# (.NET) are counterexamples. They both have great ecosystems and package management that is just as simple and effective as Rust's or JS's (Node). But neither Go nor C# has issues with dependency hell like Rust, or even more so JavaScript, because they have exceptional std libs and even large frameworks like ASP.NET or EF Core.
A great std lib is obviously the solution. Some Rust defenders talk it down by giving Python as a counterexample. But again, Go and C# prove them wrong. A great std lib is a solution, but one that comes with a huge effort that can only be made by large organisations like Google (Go) or Microsoft (C#).
A large stdlib solves the problems the language is focused on. For C# and Go that is web hosts.
Try using them outside that scope and the dependencies start to pile in (Games, Desktop) or they are essentially unused (embedded, phones, wasm)
That's part of it, but it also solves the problem of vetting. When I use a Go stdlib I don't have to personally spend time vetting it like I do when looking at a crate or npm package.
In general, Go & Rust packages on GitHub are high quality to begin with, but there is still a pronounced difference between open-source packages and what is approved to be part of the language's own stdlib.
It's nice to know thousands of different companies already found the issues for me or objected to them in reviews before the library was published.
But I agree that graphics is often overlooked in std libs. However that’s a bit of a different beast. Std libs typically deal with what the OS provides. Graphics is its own world so to speak.
As for Wasm: first, that’s a runtime issue and not a language issue. I think GC is on the roadmap for Wasm. Second, Go and C# obviously predate Wasm.
In the end, not every language should be concerned with every use case. The bigger question is whether it provides a std lib for the category of programs it targets.
To take a specific example: JS isn't great at efficiently and conveniently generating dynamic HTML. You can go far with no (or minimal) dependencies and some clever patterns. But a lot of pain and work hours would have been saved if it had something that people want to use out of the box.
You don't consider games, desktop and mobile applications big use cases, each being multi billion industries?
I don't know man, I feel like you're arguing in bad faith and are intentionally ignoring what athrowaway3z said: it works there because they're essentially languages specifically made to enable web development. That's why their standard lib is plenty for this domain.
I can understand that web development might be the only thing you care about though, it's definitely a large industry - but the thesis of a large standard lib solving the dependency issue really isn't true, as (almost) every other use case beyond web development shows.
I don't think the dependency issue can be solved by a good std lib, but it certainly can be mitigated as some languages show.
I think JS is a very pronounced case study here.
Edit: after rereading this I feel like I may have come across as sarcastic; I was legitimately impressed that a guess without looking it up would peg the ratio that closely. It was off topic as a response too. So I'll add that Rust never would have had an async as good as Tokio, or been able to have async in embedded as with Embassy, if it hadn't opted for batteries excluded. I think this was the right call given its initial focus as a desktop/systems language. And it is what allowed it to be more than that as people added things. Use cargo-deny, pin the oldest version that does what you need and doesn't fail cargo-deny. There are several hundred crates brought in by just the rust-lang repo; if you only vet things not in that list, you can save some time too.
Even ignoring that, those are just common formats. They don't tell you what a particular web server is doing.
Take a few examples of some Go projects that either are web servers or have them as major components like Caddy or Tailscale. Wildly different types of projects.
I guess one has to expand "web server" to include general networking as well, which is definitely a well supported use case or rather category for the Go std lib, which was my original point.
You seem to have a very different definition of "web server" to me.
Maybe it's a language community thing.
As for the generic things, I think C# is the only mainstream language which has small vectors, 3x2 and 4x4 matrices, and quaternions in the standard library.
They've got SIMD-accelerated methods for calculating 3d projection matrices. No other ecosystem is even close once you start digging into the details.
If you're willing to constrain yourself to 2D games, and exclude physics engines (assume you just use one of the Box2D bindings) and also UI (2D gamedevs tend to make their own UI systems anyway)... Then your best bet in the C# world is Monogame (https://monogame.net/), which has lots of successful titles shipped on desktop and console (Stardew Valley, Celeste)
Depends. There is Godot's GDScript, seeing how it comes with a game engine.
But the original claim was
> actually dotnet also does not need too many dependencies for games and desktop apps.
If you're including languages with big game engines, it's a tautology: languages with good game engines have good game engines. But a general-purpose programming language has very little to gain from including a niche library, even if it's the best in the business. Imagine if C++ shipped with Unreal.
Are you really trying to compare serde to rendering engines?
Python's standard library is big. I wouldn't call it great, because Python is over 30 years old and it's hard to add things to a standard library and even harder to remove them.
I’m still hoping we can get a decently typed argparse with a modern API though (so much better for tiny scripts without deps!)
Argument parsing, in particular, is a great place to start realizing that you can implement what you need without adding a dozen dependencies
Don't disagree with the principle, there are a lot of trivial Python deps, but rolling your own argument parsing is not the way
If you've never thought about it, it might seem like you need an off-the-shelf dependency. But as programmers sometimes we should think a bit more before we make that decision.
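For the argument-parsing case above, here's a minimal sketch of the hand-rolled approach for a tiny script, using only std (the flag names are made up for illustration):

    use std::env;

    fn main() {
        let mut verbose = false;
        let mut output: Option<String> = None;
        let mut inputs: Vec<String> = Vec::new();

        let mut args = env::args().skip(1); // skip the program name
        while let Some(arg) = args.next() {
            match arg.as_str() {
                "-v" | "--verbose" => verbose = true,
                "-o" | "--output" => output = Some(args.next().expect("--output needs a value")),
                _ => inputs.push(arg),
            }
        }

        if verbose {
            eprintln!("output: {:?}, inputs: {:?}", output, inputs);
        }
    }

Whether that beats a typed argparse-style API is exactly the trade-off being argued here; it covers flags and positional args and nothing more.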
It's a very different story with heavyweight dependencies like Numpy (which include reams of tests, documentation and headers even in the wheels that people are only installing to be a dependency of something else, and covers a truly massive range of functionality including exposing BLAS and LAPACK for people who might just want to multiply some small matrices or efficiently represent an image bitmap), or the more complex ones that end up bringing in multiple things completely unrelated to your project that will never be touched at runtime. (Rich supports a ton of wide-ranging things people might want to do with text in a terminal, and I would guess most clients probably want to do exactly one of those things.)
They also have a much narrower scope of use, which means it is easier to create a stdlib usable for most people. You can't do that with a more generic language.
I feel like this is an organizational problem much more than a technical one, though. Rust can be different things to different people, without necessarily forcing one group to compromise overmuch. But some tension is probably inevitable.
That depends on the language. In an interpreted language (including JIT), or a language that depends on a dynamically linked runtime (e.g. C and C++), it isn't directly included in your app because it is part of the runtime. But you need the runtime installed, and if your app is the only thing that uses that runtime, then the runtime size effectively adds to your installation size.
In languages that statically link the standard library, like Go and Rust, it absolutely does impact binary size, although the compiler might use some methods to try to avoid including parts of the standard library that aren't used.
What you say is true enough for external-runtime languages and Go, though TinyGo is available for resource-constrained environments.
no_std Rust only has core, but this is indeed a library of code; freestanding C does not provide such a thing: the freestanding C stdlib provides no functions, just type definitions and other stuff which evaporates when compiled.
Two concrete examples to go along with that: suppose we have a mutable foo; in Rust it's maybe foo: [i32; 40]; (forty 32-bit signed integers), or in C maybe it's int foo[40];.
In freestanding C that's fine, but we're not provided with any library code to do anything with foo; we can use the core language features to write it ourselves, but nothing is provided.
Rust will happily let you call foo.sort_unstable(); this is a fast custom in-place sort, roughly a modern form of introspective sort written for Rust by its creators, and because it's in core, that code just goes into your resulting embedded firmware or whatever.
Now, suppose we want to perform a filter-map operation over that array. In C, once again, you're left to figure out how to write that yourself; in Rust, foo impls IntoIterator, so you can use all the nice iterator features, and the algorithms just get baked into your firmware during compilation.
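A small sketch of both operations (my example, not the parent's code); everything used here lives in core, so on an embedded target you could add #![no_std] and the same calls would still be available:

    fn main() {
        let mut foo: [i32; 40] = [0; 40];
        for (i, v) in foo.iter_mut().enumerate() {
            *v = 40 - i as i32; // fill with 40, 39, ..., 1
        }

        // In-place unstable sort from core; no allocator or OS needed.
        foo.sort_unstable();

        // Arrays impl IntoIterator, so the usual adapters work:
        // keep the even numbers and square them.
        let mut squares_of_evens = foo.iter().filter_map(|&x| {
            if x % 2 == 0 { Some(x * x) } else { None }
        });
        assert_eq!(squares_of_evens.next(), Some(4));
    }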
Compared to Go and C#, the Rust std lib is mostly lacking:
- a powerful http lib
- serialization
But the Rust approach, no runtime, no GC, no reflection, makes it very hard to provide those libraries.
Within these constraints, some high quality solutions emerged: Tokio, Serde. But they pioneered some novel approaches which would have been hard to try in the std lib.
The whole async ecosystem still has a beta vibe, giving the feeling of programming in a different language. Procedural macros are often synonymous with slow compile times and code bloat.
But what we gained is fewer runtime errors, more efficiency, a more robust language.
TLDR: trade-offs everywhere, it is unfair to compare to Go/C# as they are languages with a different set of constraints.
All those AFAIR need 3rd party packages:
Regex, DateTime, base64, argument parsing, url parsing, hashing, random number generation, UUIDs, JSON
I'm not saying it's mandatory, but I would expect all those to be in the standard library before there is any http functionality.
As some of the previous commenters said, when you focus your language to make it easy to write a specific type of program, then you make tradeoffs that can trap you in those constraints like having a runtime, a garbage collector and a set of APIs that are ingrained in the stdlib.
Rust isn't like that. As a systems programmer I want none of them. Rust is a systems programming language. I wouldn't use Rust if it had a bloated stdlib. I am very happy about its stdlib. Being able to swap out the regex, datetime, arg parsing and encoding libraries is a feature. I can choose memory-heavy or cpu-heavy implementations. I can optimize for code size or performance or sometimes neither/both.
If the trade-offs were made to appease the easy (web/app) development, it wouldn't be a systems programming language for me where I can use the same async concepts on a Linux system and an embedded MCU. Rust's design enables that, no other language's design (even C++) does.
If a web developer wants to use a systems programming language, that's their trade-off for a harder-to-program language. Type safety similar to Rust's is provided by Kotlin or Swift.
Dependency bloat is indeed a problem. Easy inclusion of dependencies is also a contributing factor. This problem can be solved by making dependencies and features granular. If the libraries don't provide the granularity you want, you need to change libraries/audit source/contribute. No free meals.
A feature present on every language that has those in the stdlib.
The de facto standard regex library (which is excellent!) brings in nearly 2 MB of additional content for correct unicode operations and other purposes. The same author also makes regex-lite, though, which did everything we need, with the same interface, in a much smaller package. It made it trivial to toss the functionality we needed behind a trait and choose a regex library appropriately in different portions of our stack.
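A minimal sketch of that kind of trait boundary (the trait and type names here are made up, not the parent's actual code); regex-lite intentionally mirrors the regex API, which is what makes the swap cheap:

    trait Matcher {
        fn is_match(&self, haystack: &str) -> bool;
    }

    struct FullRegex(regex::Regex);
    impl Matcher for FullRegex {
        fn is_match(&self, haystack: &str) -> bool {
            self.0.is_match(haystack)
        }
    }

    struct LiteRegex(regex_lite::Regex);
    impl Matcher for LiteRegex {
        fn is_match(&self, haystack: &str) -> bool {
            self.0.is_match(haystack)
        }
    }

    // Callers only see `dyn Matcher`, so the engine can be picked per binary
    // (or behind a cargo feature) without touching the rest of the stack.
    fn validate(m: &dyn Matcher, input: &str) -> bool {
        m.is_match(input)
    }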
Regex is not 3rd party (note the 'rust-lang' in the URL):
Rust has other challenges it needs to overcome but this isn't one.
I'd put Go behind both C#/F# and Rust in this area. It has spartan tooling in odd areas it's expected to be strong at like gRPC and the serialization story in Go is quite a bit more painful and bare bones compared to what you get out of System.Text.Json and Serde.
The difference is especially stark with Regex where Go ships with a slow engine (because it does not allow writing sufficiently fast code in this area at this moment) where-as both Rust and C# have top of the line implementations in each which beat every other engine save for Intel Hyperscan[0].
[0]: https://github.com/BurntSushi/rebar?tab=readme-ov-file#summa... (note this is without .NET 9 or 10 preview updates)
I don't think that's why. Or at least, I don't think it's straight-forward to draw that conclusion yet. I don't see any reason why the lazy DFA in RE2 or the Rust regex crate couldn't be ported to Go[1] and dramatically speed things up. Indeed, it has been done[2], but it was never pushed over the finish line. My guess is it would make Go's regexp engine a fair bit more competitive in some cases. And aside from that, there's tons of literal optimizations that could still be done that don't really have much to do with Go the language.
Could a Go-written regexp engine be faster or nearly as fast because of the language? Probably not. But I think the "implementation quality" is a far bigger determinant in explaining the current gap.
In theory they should reduce it, because you wouldn't write proc macros to generate code you don't need… right? How much coding time do you save with macros compared to manually implementing them?
Part of the issue I have with the dependency bloat is how much effort we currently go through to download, distribute, compile, lint, typecheck, whatever 1000s of lines of code we don't want or need. I want software that allows me to build exactly as much as I need and never have to touch the things I don't want.
Why, in principle, wouldn't the same algorithms work before distribution?
For that matter, check out the `auditwheel` tool in the Python ecosystem.
For example, you have a function calling XML or PDF or JSON output functions depending on some output format parameter. That's three very different paths and includes, but if you don't know which values that parameter can take during runtime you will have to include all three paths, even if in reality only XML (for example) is ever used.
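A sketch of that situation (function names are placeholders): because the format only becomes known at run time, every branch is reachable as far as the compiler can prove, so none of the paths or their dependencies can be dropped.

    enum OutputFormat { Xml, Pdf, Json }

    fn write_xml(doc: &str) -> String { format!("<doc>{doc}</doc>") }
    fn write_pdf(doc: &str) -> String { format!("%PDF... {doc}") }
    fn write_json(doc: &str) -> String { format!("{{\"doc\": \"{doc}\"}}") }

    fn render(doc: &str, format: OutputFormat) -> String {
        // All three arms must be compiled in, even if only one is ever taken.
        match format {
            OutputFormat::Xml => write_xml(doc),
            OutputFormat::Pdf => write_pdf(doc),
            OutputFormat::Json => write_json(doc),
        }
    }

    fn main() {
        // The choice arrives from the environment, so it can't be folded away.
        let format = match std::env::var("FORMAT").as_deref() {
            Ok("pdf") => OutputFormat::Pdf,
            Ok("json") => OutputFormat::Json,
            _ => OutputFormat::Xml,
        };
        println!("{}", render("report", format));
    }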
Or there may be higher level causes outside of any analysis, even if you managed a dynamic one. In a GUI, for example, it could be functionality only ever seen by a few with certain roles, but if there is only one app everything will have to be bundled. Similar scenarios are possible with all kinds of software, for example an analysis application that supports various input and output scenarios. It's a variation of the first example where the parameter is internal, but now it is external data not available for an analysis because it will be known only when the software is actually used.
It worked great, but it took diligence; it also forced you to interact with your deps in ways that adding a line to a deps file does not.
It feels like torture until you see the benefits, and the opposite ... the tangled mess of multiple versions and giant transitive dependency chains... agony.
I would prefer to work in shops that manage their dependencies this way. It's hard to find.
Being able to change a dependency very deep and recompile the entire thing is just magic though. I don't know if I can ever go back from that.
Alternatively, for some project it might be enough to only depend on stuff provided by Debian stable or some other LTS distro.
Kids today don't know how to do that anymore...
Compare that to Rust, where my experience with protobuf libs some time ago was that there was a choice of not one but three different libraries, one of which didn't support services, another didn't support the syntax we had to support, and the third was unmaintained. So out of three choices not a single one worked.
Compare that to Maven, where you have only one officially supported choice that works well and is well maintained.
It's even more pronounced with the main Java competitor: .NET. They look at what approach won in the Java ecosystem and go all in. For example, there were multiple ORM tools competing, and Microsoft adopted the most popular one. So it's an even easier choice there, well supported and maintained.
That's still consolidation, and it also needs time.
Even in Rust, crates like hashbrown or parking_lot have been basically subsumed into the standard library.
It's effectively an end-run around the linker.
It used to be that you'd create a library by having each function in its own compilation unit; you'd create a ".o" file, then you'd bunch them together in a ".a" archive. When someone else is compiling their code and they need the do_thing() function, the linker sees it's unfulfilled and plucks it out of the foolib.a archive. For namespacing you'd probably call the functions foolib_do_thing(), etc.
However, object-orientism with a "god object" is a disease. We go in through a top-level object like "foolib" that holds pointers to all its member functions like do_thing(), do_this(), do_that(), then the only reference the other person's code has is to "foolib"... and then "foolib" brings in everything else in the library.
It's not possible for the linker to know if, for example, foolib needed the reference to do_that() just to initialise its members, and then nobody else ever needed it, so it could be eliminated, or if either foolib or the user's code will somehow need it.
> Go and Rust, for example, encourage everything for a single package/mod to go in the same file.
I can say that, at least for Go, it has excellent dead code elimination. If you don't call it, it's removed. If you even have a const feature_flag = false and have an if feature_flag { foobar() } in the code, it will eliminate foobar().
It also happens to be an object, but that's just because python is a dynamic language and libraries are objects. The C++ equivalent is foolib::do_thing(); where foolib is not an object.
So what is the compiler doing that it doesn't remove unused code?
For example you know you will never use one of the main functions in the parsing library with one of the arguments set to "XML", because you know for sure you don't use XML in your domain (for example you have a solid project constraint that says XML is out of scope).
Unfortunately the code dealing with XML in the library is 95% of the code, and you can't tell your compiler I won't need this, I promise never to call that function with argument set to XML.
doc_format = get_user_input()
parsed_doc = foolib.parse(doc_format)
You as the implementer might know the user will never input xml, so doc_format can't be 'xml' (you might even add some error handling if the user inputs this), but how can you communicate this to the compiler?
And even going beyond "debug", plenty of libraries ship features that are downright unwanted by consumers.
The two famous recent examples are Heartbleed and Log4shell.
Clarification: Go allows for a very simple multi-file setup. It's one feature I really like, because it allows splitting an otherwise coherent module into logical parts.
For example, you can't split up a module into foo.rs containing `Foo` and bar.rs containing `Bar`, both in module 'mymod', in such a way that you can `use mymod::Bar` and foo.rs is never built/linked.
My point is the granularity of the package/mod encourages coarse-grained deps, which I argue is a problem.
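To make the layout concrete, here's a sketch of the structure being described (file and type names are illustrative); both files belong to the same crate-level compilation unit, so building one means building both:

    // src/mymod.rs
    pub mod foo; // defines Foo (src/mymod/foo.rs)
    pub mod bar; // defines Bar (src/mymod/bar.rs)

    // Elsewhere in the crate:
    // use crate::mymod::bar::Bar;
    // foo.rs still gets parsed and compiled; you're relying on the
    // compiler/linker to discard Foo later rather than never building it.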
yesn't, you can use feature flags similar to `#if` in C
but it's also not really a needed feature, as dead code elimination will prune out all code (functions, types, etc.) you don't use. None of it will end up in the produced binary.
Historically Rust wanted that foo.rs to be renamed foo/mod.rs but that's no longer idiomatic although of course it still works if you do that.
In Rust, crates are semantically one compilation unit (where in C, oversimplified, it's a .h/.c pair; in practice rustc will try to split a crate into more units to speed up build time).
The reason I'm pointing this out is that many sources of "splitting a module across files" come from situations where one file is one compilation unit, so you needed a way to split it (for organization) without splitting it (for compilation).
https://learn.microsoft.com/en-us/dotnet/core/deploying/trim...
https://learn.microsoft.com/en-us/dotnet/core/deploying/nati...
which get reinvented all the time, like in dotnet with "trimming" or in JS with "tree-shaking".
C/C++ compilers have been doing that since before .NET was a thing; same for Rust, which has done that since its 1.0 release (because it's done by LLVM ;) )
The reason it gets reinvented all the time is that while it's often quite straightforward in statically compiled languages, it isn't for dynamic languages, as finding out what actually is unused is hard (for fine-grained code elimination) or at least unreliable (pruning submodules). Even worse for scripting languages.
Which also brings us to one area where it doesn't work out of the box: if you build a .dll/.so in one build process and then use it in another. Here additional tooling is needed to prune the dynamically linked libraries. But luckily it's not a common problem to run into in Rust.
In general most code size problems in Rust aren't caused by a huge LOC count in dependencies but by an overuse of monopolization. The problem of tons of LOC in dependencies is one of supply chain trust and reviewability more than anything else.
It seems to me in a strict sense the problem of eliminating dead code may be impossible for code that uses some form of eval(). For example, you could put something like eval(decrypt(<encrypted code>,key)), for a user-supplied key (or otherwise obfuscated); or simply eval(<externally supplied code>); both of which could call previously dead code. Although it seems plausible to rule out such cases. Without eval() some of the problem seems very easy otoh, like unused functions can simply be removed!
And of course there are more classical impediments, halting-problem like, which in general show that telling if a piece of code is executed is undecidable.
( Of course, we can still write conservative decisions that only cull a subset of easy to prove dead code -- halting problem is indeed decidable if you are conservative and accept "I Don't Know" as well as "Halts" / "Doesn't Halt" :) )
*monomorphization, in case anyone got confused
The comment you're replying to is talking about not pulling in dependencies at all, before compiling, if they would not be needed.
It should be easy to build and deploy profiling-aware builds (PGO/BOLT) and to get good feedback around time/instructions spent per package, as well as a measure of the ratio of each library that's cold or thrown away at build time.
I'll note that it isn't just PGO/BOLT style optimizations. Largely, it is not that at all, oddly.
Instead, the problem is one of stability. In a "foundation that doesn't move and cause you to fall over" sense of the word. Consider if people made a house where every room had a different substructure under it. That, largely, seems to be the general approach we use to building software. The idea being that you can namespace a room away from other rooms and not have any care on what happens there.
This gets equally frustrating when our metrics for determining the safety of something largely discourages inaction on any dependencies. They have to add to it, or people think it is abandoned and not usable.
Note that this isn't unique to software, mind. Hardware can and does go through massive changes over the years. They have obvious limitations that slow down how rapidly they can change, of course.
I'm not sure what the problem is here.
Are you after pinning dependencies to be sure they didn't change? Generally I want updating dependencies to fix bugs in them.
Are you after trusting them through code review or tests? I don't think there's shortcuts for this. You shouldn't trust a library, changing or not, because old bugs and new vulnerabilities make erring on both sides risky. On reviewing other's code, I think Rust helps a bit by being explicit and fencing unsafe code, but memory safety is not enough when a logic bug can ruin your business. You can't avoid testing if mistakes or crashes matter.
Examples: Google's Guava for the migration department. Apache Commons would be a good example of how not to make life painful for users there.
For sweeping features, Log4j introduced some pretty terrible security concerns.
Well, it's not required to trim code that you can prove unreachable, true. But I was thinking about trying to measure if a given library really pulls its non-zero weight, and how much CPU is spent in it.
A library taking "too much time" for something you think can be done faster might need replacement, or swapping for a simple implementation (say the library cares about edge cases you don't face or can avoid).
My point on PGO/BOLT not being relevant was more that I see people reaching for libraries to do things such as add retries to a system. I don't think it is a terrible idea, necessarily, but it can be bad when combined with larger "retries plus some other stuff" libraries.
Now, fully granted that it can also be bad when you have developers reimplementing complicated data structures left and right. There has to be some sort of tradeoff calculation. I don't know that we have fully nailed it down, yet.
It's a terrible idea because you're trying to reinvent section splitting + `--gc-sections` at link time, which rust (which the article is about) already does by default.
Things like --gc-sections feel like a band-aid, a very practical and useful band-aid, but a band-aid nonetheless. You're building a bunch of things you don't need, then selectively throwing away parts (or selectively keeping parts).
IMO it all boils down to the granularity. The granularity of text source files, the granularity of units of distribution for libraries. It all contributes to a problem of large unwieldy dependency growth.
I don't have any great solutions here; it's just observations of the general problem from the horrifying things that happen when dependencies grow uncontrolled.
It’s getting hard to take these conversations seriously with all of the hyperbole about things that don’t happen. Nobody is producing Rust binaries that hit 500MB or even 50MB from adding a couple simple dependencies.
You’re also not ending up with mountains of code that never gets called in Rust.
Even if my Rust binaries end up being 10MB instead of 1MB, it doesn’t really matter these days. It’s either going on a server platform where that amount of data is trivial or it’s going into an embedded device where the few extra megabytes aren’t really a big deal relative to all the other content that ends up on devices these days.
For truly space constrained systems there's no-std and an entire, albeit small, separate universe of packages that operate in that space.
For all the doom-saying, in Rust I haven’t encountered this excessive bloat problem some people fret about, even in projects with liberal use of dependencies.
Every time I read these threads I feel like the conversations get hijacked by the people at the intersection of “not invented here” and nostalgia for the good old days. Comments like this that yearn for the days of buying paid libraries and then picking them apart anyway really reinforce that idea. There’s also a lot of the usual disdain for async and even Rust itself throughout this comment section. Meanwhile it feels like there’s an entire other world of Rust developers who have just moved on and get work done, not caring for endless discussions about function coloring or rewriting libraries themselves to shave a few hundred kB off of their binaries.
If each layer of “package abstraction” is only 50% utilised, then each layer multiplies the total size by 2x over what is actually required by the end application.
Three layers — packages pulling in packages that pull their own dependencies — already gets you to roughly 88% bloat, since only 0.5³ = 12.5% of the code is actually used.
An example of this is the new Windows 11 calculator that can take several seconds to start because it loads junk like the Windows 10 Hello for Business account recovery helper library!
Why? Because it has currency conversion, which uses a HTTP library, which has corporate web proxy support, which needs authentication, which needs WH4B account support, which can get locked out, which needs a recovery helper UI…
…in a calculator. That you can’t launch unless you have already logged in successfully and is definitely not the “right place” for account recovery workflows to be kicked off.
But… you see… it’s just easier to package up these things and include them with a single line in the code somewhere.
This is an extreme example, but the same thing happens very often at a smaller scale. Optional functionality can't always be removed statically.
let uri = get_uri_from_stdin();
networking_library::make_request(uri);
How is the compiler supposed to prune that?
let uri: Uri<HTTP> = get_uri_from_stdin().parse()?;
If the library is made in a modular way, this is how it would typically be done. The `HTTP` may be inferred by calls further along in the function. If you want to mix schemes you would need to be able to handle all schemes; you can either go through all the variations (through the same generics) you want to test, or just accept that you need a full URI parser and lose the generic.
let uri: Uri<FTP or HTTP or HTTPS> = parse_uri(get_uri_from_stdin()) or fail;
If your library cannot parse FTP, either you enable that feature, add that feature, or use a different library.
Rust does have a feature flagging system for this kind of optional functionality though. It's not perfect, but it would work very well for something like curl protocol backends.
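A sketch of what that can look like (crate layout and feature names invented for illustration): the backend module only exists in the build when its feature is enabled, and any optional dependencies tied to that feature are never compiled otherwise.

    # Cargo.toml
    [features]
    default = ["http"]
    http = []
    ftp = []

    // src/lib.rs
    #[cfg(feature = "http")]
    pub mod http_backend {
        pub fn fetch(url: &str) -> String {
            format!("GET {url} over HTTP")
        }
    }

    #[cfg(feature = "ftp")]
    pub mod ftp_backend {
        pub fn fetch(url: &str) -> String {
            format!("RETR {url} over FTP")
        }
    }

Consumers then opt in with something like `foolib = { version = "1", default-features = false, features = ["ftp"] }`.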
It's also not always a fair comparison: if you include Tokio in LOC counting, then you surely would also include V8 LOC when counting for Node, or the JRE for Java projects (but not the JDK), etc.
It is also done on the cousin Android, and available as free beer on GraalVM and OpenJ9.
The others no longer matter, out of business.
And since we are always wrong unless proven otherwise,
https://www.graalvm.org/jdk21/reference-manual/native-image/...
https://www.graalvm.org/latest/reference-manual/native-image...
The other nice thing is that bytecode is easy to modify, so if you have a library that has some features you know you don't want, you can just knock it out and bank the savings.
[0] https://clang.llvm.org/docs/ClangCommandLineReference.html#c...
I wouldn't say completely. People still sometimes struggle to get this to work well.
Recent example: (Go Qt bindings)
The analogy I use is cooking a huge dinner, then throwing out everything but the one side dish you wanted. If you want just the side-dish you should be able to cook just the side-dish.
I just don't listen. Things should be easy. Rust is easy. Don't overthink it
Sure, don't overthink it. But underthinking it is seriously problematic too.
What? I don't know about Go, but this certainly isn't true in Rust. Rust has great support for fine-grained imports via Cargo's ability to split up an API via crate features.
Functions are defined by AST structure and are effectively content addressed. Each function is then keyed by hash in a global registry where you can pull it from for reuse.
I think that is much more of a problem in ecosystems where it is harder to add dependencies.
When it is difficult to add dependencies, you end up with large libraries that do a lot of stuff you don't need, so you only need to add a couple of dependencies. On the other hand, if dependency management is easy, you end up with a lot of smaller packages that just do one thing.
Or you have ultra-fine-grained modules, and rely on existing tree-shaking systems.... ?
That’s literally the JS module system? It’s how we do tree shaking to get those bundle sizes down.
A lot of the bloat comes from functionality that can be activated via flags, methods that set a variable to true, environment variables, or even via configuration files.
If you want to disable certain runtime features, you'd do so with feature flags.
But I think the best defense against this problem at the moment is to be extremely defensive/protective of system dependencies. You need to not import that random library that has a 10-line function. You need to just copy that function into your codebase. Don't just slap random tools together. Developing libraries in a maintainable and forward-looking manner is the exception, not the rule. Some ecosystems excel here, but most fail. Ruby and JS are probably among the worst. Try upgrading a Rails 4 app to modern tooling.
So… be extremely protective of your dependencies. Very easy to accrue tech debt with a simple library installation. Libraries use libraries. It becomes a compounding problem fast.
Junior engineers seem to add packages to our core repo with reckless abandon and I have to immediately come in and ask why was this needed? Do you really want to break prod some day because you needed a way to print a list of objects as a table in your cli for dev?
If anything, the 1980s is when the idea of fully reusable, separately-developed software components first became practical, with Objective-C and the like. In fact it's a significant success story of Rust that this sort of pervasive software componentry has now been widely adopted as part of a systems programming language.
The more pressing issue with dependencies is supply chain risks including security. That's why larger organizations have approval processes for using anything open source. Unfortunately the new crop of open source projects in JS and even Go seem to suffer from "IDGAF about what shit code from internet I am pulling" syndrome.
Unfortunately granularity does not solve that as long as your 1000 functions come from 1000 authors on NPM.
Also there's arguably design. Should a 'glob' library actually read the file system and give you filenames, or should it just tell you if a string matches a glob and leave the rest to you? I think it's better design to do the latter, the simplest thing. This means fewer dependencies and more flexibility. I don't have to hack it or add an option to use my own file system (like for testing). I can use it with a change monitoring system, etc...
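A sketch of the "just match the string" design (my own toy implementation, supporting only `*` and `?`, no path semantics): the filesystem stays entirely in the caller's hands, so testing or wiring it into a change-monitoring system needs no hooks.

    // Returns true if `name` matches `pattern`, where `*` matches any run of
    // characters and `?` matches exactly one.
    fn glob_match(pattern: &str, name: &str) -> bool {
        let p: Vec<char> = pattern.chars().collect();
        let n: Vec<char> = name.chars().collect();
        let (mut pi, mut ni) = (0, 0);
        let mut backtrack: Option<(usize, usize)> = None; // (star pos, name pos)

        while ni < n.len() {
            if pi < p.len() && (p[pi] == '?' || p[pi] == n[ni]) {
                pi += 1;
                ni += 1;
            } else if pi < p.len() && p[pi] == '*' {
                backtrack = Some((pi, ni));
                pi += 1;
            } else if let Some((star_pi, star_ni)) = backtrack {
                // Let the last `*` swallow one more character and retry.
                backtrack = Some((star_pi, star_ni + 1));
                pi = star_pi + 1;
                ni = star_ni + 1;
            } else {
                return false;
            }
        }
        // Any trailing `*`s can match the empty string.
        p[pi..].iter().all(|&c| c == '*')
    }

    fn main() {
        assert!(glob_match("*.rs", "main.rs"));
        assert!(!glob_match("src/?.rs", "src/ab.rs"));
        // Composing with the filesystem stays with the caller, e.g. filter
        // the entries from std::fs::read_dir(".") with glob_match.
    }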
And I'm sure there are tons of devs that like that glob is a "do everything for me" library instead of a "do one specific thing" library, which makes it worse, because you get more "internet points" the more your library doesn't require the person using it to be a good dev.
I can't imagine it's any different in rust land, except maybe for the executable thing. There's just too many devs and all of them, including myself, don't always make the best choices.
The POSIX glob function after which these things are named traverses the filesystem and matches directory entries.
The pure matching function which matches a glob pattern against a filename-like string is fnmatch.
But yes, the equivalent of fnmatch should be a separate module and that could be a dependency of glob.
Nobody should be trying to implement glob from scratch using a fnmatch-like function and directory traversal. It is not so trivial.
glob performs a traversal that is guided by the pattern. It has to break the pattern into path components. It knows that "*/*/*" has three components and so the traversal will only go three levels deep. Also "dir/*" has a component which is a fixed match, and so it just has to open "dir" without scanning the current directory; if that fails, glob has failed.
If the double star ** is supported which matches multiple components, that's also best if it likewise integrated into glob.
If brace expansion is supported, that adds another difficulty because different branches of a brace can have different numbers of components, like {*/x,*/*/x,*/*/*/x}. To implement glob, it would greatly help us to have brace expansion as a separate function which expands the braces, producing multiple glob patterns, which we can then break into path components and traverse.
There’s a lot of stupid ways to implement glob and only a couple of smart ones.
Interesting, lets look at fnmatch: https://pubs.opengroup.org/onlinepubs/9699919799/functions/f...
Well, fnmatch really does two things: it parses the pattern and then applies it to a string. So really, there should be a "ptnparse" library that handles the pattern parsing, which fnmatch has as a dependency.
Though, thinking it through, the "ptnparse" library is responsible for patterns matching single characters and multiple characters. We should split that up into "singleptn" and "multiptn" libraries that ptnparse can take as dependencies.
Oh, and those flags that fnmatch takes makes fnmatch work in several different ways, let's decompose those into three libraries so that we only have to pull in the matcher we care about: pthmatch, nscmatch, and prdmatch. Then we can compose those libraries based on what we want in fnmatch.
This is perfect, now if we don't care about part of the fnmatch functionality, we don't have to include it!
/s
This decomposition is how we wind up with the notorious leftpad situation. Knowing when to stop decomposing is important. fnmatch is a single function that does less than most syscalls. We can probably bundle that with a few more string functions without actually costing us a ton. Glob matching at a string level probably belongs with all the other string manipulation functions in the average "strings" library.
Importantly, my suggestion that fnmatch belongs in a "strings" library does align with your suggestion that fnmatch shouldn't be locked into a "glob" library that also includes the filesystem traversal components.
213M downloads, depends on zero external crates, one source file (a third of which is devoted to unit tests), and developed by the rust-lang organization itself (along with a lot of crates, which is something that people tend to miss in this discussion).
I went to page 8 and there were still glob libraries.
Also this crate is from official rust lang repo, so much less prone to individualistic misbehaving. A bad example all around.
To reiterate, lots of things that people in this thread are asking the language to provide are in fact provided by the rust-lang organization: regex, serde, etc. The goalposts are retreating over the horizon.
Rust's primary sin here is that it makes dependency usage transparent to the end-user. Nobody wants to think about how many libraries they depend upon and how many faceless people it takes to maintain those libraries, so they're uncomfortable when Rust shows you. This isn't a Rust problem, it's a software complexity problem.
The npm glob package has 6 dependencies (those dependencies have 3+ dependencies, those sub dependencies have 6+ dependencies, ...)
As you point out the rust crate is from the official repo, so while it's not part of the standard library, it is maintained by the language maintenance organization.
Maybe that could make it a bad example, but the npm one is maintained by the inventor of npm, who describes himself as "I wrote npm and a pretty considerable portion of other node related JavaScript that you might use.", so I would say that makes it a great example, because the people who I would expect to care the most about the language are the package maintainers of these packages, and they are (hopefully) implementing what they think are the best practices for the languages and the ecosystems.
Taste is important; programmers with good architectural taste tend to use languages that support them in their endeavour (like Rust or Zig) or at least get out of the way (C).
So I would argue the problems you list are statistically less often the case than in certain other languages (from COBOL to JavaScript).
> There's just too many devs and all of them, including myself, don't always make the best choices.
This point you raise is important: I think an uncoordinated crowd of developers will create a "pile of crates" ("bazaar" approach, in Eric Raymond's terminology), and a single language designer with experience will create a more uniform class library ("cathedral" approach).
Personally, I wish Rust had more of a "batteries included" standard library with systematically named and namespaced official crates (e.g. including all major data structures) - why not "stdlib::data_structures::automata::weighted_finite_state_transducer" instead of a confusing set of choices named "rustfst-ffi", "wfst", ... ?
Ideally, such a standard library should come with the language at release. But the good news is it could still be devised later, because the Rust language designers were smart enough to build versioning with full backwards compatibility (but not technical debt) into the language itself. My wish for Rust 2030 would be such a stdlib (it could even be implemented using the bazaar of present-day crates, as long as that is hidden from us).
Not sure how long that’ll last.
> Also there's arguably design. Should a 'glob' library actually read the file system and give you filenames or should it just tell you if a string matches a glob and leave the reset to you?
There's a function in Node's stdlib that does this as well (albeit it's marked as experimental): https://nodejs.org/docs/latest-v24.x/api/path.html#pathmatch...
Taste is what Steve Jobs was referring to when he said Microsoft had none. In software it’s defined by a humane, pleasant design that almost(?) anybody can appreciate.
Programming languages cannot be tasteful, because they require time and effort to learn and understand. Python has some degree of elegance and Golang's simplicity has a certain je ne sais quoi… but they're not really fitting the definition.
Still, some technologies such as git, Linux or Rust stand out as particularly obscure even for the average developer, not just average human.
How can one take an API as simple as OpenAI's and turn it into this steaming pile of manure? In the end, I used reqwest and created my queries manually. I guess that's what everyone does...
I was a bit shocked to be honest.
Edit: I originally misread your comment. OpenAI is an important tech, no matter what you think of the company itself. Being able to easily interface with their API is important.
You'll notice these packages are not actually used by anything.
Programming is not the same as hanging out some hoity-toity art gallery. If someone critiqued my software dev by saying I had "no taste", I'd cringe so hard I'd turn into a black hole.
I know this is hackernews, but this reeks of self-importance.
Unfortunately this particular art form requires fluency in mathematics and the sciences/computers, so it’s very inaccessible.
No. Get over yourself.
In all seriousness: why do you insist on an intellectual totem pole with art being more “hoity toity” as you say, and more worthy? And if so, why is engineering below painting?
In mathematics and some hard sciences, and in absolutely elementary aspects of other fields like programming, things can be proven to be right or wrong. In every other endeavour, we need to repeatedly apply fuzzy judgements.
Some do this poorly, some do it well.
Just in case you actually are interested in nuance, here’s some more on the topic:
https://www.paulgraham.com/hp.html
https://nealstephenson.substack.com/p/idea-having-is-not-art
Imagine if a carpenter or house builder was shitting out slop that had no taste. And then laughed at people who pointed it out. Would you hire them to build something for you?
This is a problem with SE culture.
What are you even talking about? How does this even remotely relate to software development? Are you telling me a function that adds 2 numbers has "taste"?
If I were designing a new language I think I'd be very interested in putting some sort of capability system in so I can confine entire library trees safely, and libraries can volunteer somehow what capabilities they need/offer. I think it would need to be a new language if for no other reason than ecosystems will need to be written with the concept in them from the beginning.
For instance, consider an "image loading library". In most modern languages such libraries almost invariably support loading images from a file, directly, for convenience if nothing else. In a language that supported this concept of capabilities it would be necessary to support loading them from a stream, so either the image library would need you to supply it a stream unconditionally, or if the capability support is more rich, you could say "I don't want you to be able to load files" in your manifest or something and the compiler would block the "LoadFromFile(filename)" function at compile time. Multiply that out over an entire ecosystem and I think this would be hard to retrofit. It's hugely backwards incompatible if it is done correctly, it would be a de facto fork of the entire ecosystem.
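A sketch of what the stream-only flavour of such an image library could look like in today's Rust (types and function names are made up): the decoder takes an already-opened Read, and the path-based convenience wrapper lives with the caller, which is the part a capability system could forbid.

    use std::io::Read;

    pub struct Image {
        pub width: u32,
        pub height: u32,
        pub pixels: Vec<u8>,
    }

    // The library only accepts an already-opened stream; it has no way to
    // decide on its own to touch the filesystem or the network.
    pub fn load_from_stream<R: Read>(mut src: R) -> std::io::Result<Image> {
        let mut bytes = Vec::new();
        src.read_to_end(&mut bytes)?;
        // ... actual decoding elided for the sketch ...
        Ok(Image { width: 0, height: 0, pixels: bytes })
    }

    // The "load from a path" convenience belongs to the caller (or to a thin
    // crate that explicitly holds the filesystem capability), not the decoder.
    pub fn load_from_file(path: &std::path::Path) -> std::io::Result<Image> {
        load_from_stream(std::fs::File::open(path)?)
    }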
I honestly don't see any other solution to this in the long term, except to create a world where the vast majority of libraries become untargetable in supply chain attacks because they can't open sockets or read files and are thus useless to attackers, and we can reduce our attack surface to just the libraries that truly need the deep access. And I think if a language came out with this design, you'd be surprised at how few things need the dangerous permissions.
Even a culture of minimizing dependencies is just delaying the inevitable. We've been seeing Go packages getting supply-chain-attacked and it getting into people's real code bases, and that community is about as hostile to large dependency trees as any can be and still function. It's not good enough.
In your particular example of image loading, you want WUFFS. https://github.com/google/wuffs
In WUFFS most programs are impossible. Their "Hello, world" doesn't print hello world because it literally can't do that. It doesn't even have a string type, and it has no idea how to do I/O so that's both elements of the task ruled out. It can however, Wrangle Untrusted File Formats Safely which is its sole purpose.
I believe there should be more special purpose languages like this, as opposed to the General Purpose languages most of us learn. If your work needs six, sixteen or sixty WUFFS libraries to load different image formats, that's all fine because categorically they don't do anything outside their box. Yet, they're extremely fast because since they can't do anything bad by definition they don't need those routine "Better not do anything bad" checks you'd write in a language like C or the compiler would add in a language like Rust, and because they vectorize very nicely.
Which goes both ways, it can be a Gtk+ application written in C, or Electron junk, as long as it works, they will use it.
Edit - note it's just tongue in cheek. Obviously libraries being developed against the public approval wouldn't be much of a good metric. Although I do agree that a bit more common culture of the Sans-IO principles would be a good thing.
As long as all library code is compiled/run from source, a compiler/runtime can replace system calls with wrappers that check caller-specific permissions, and it can refuse to compile or insert runtime panics if the language's escape hatches would be used. It can be as safe as the language is safe, so long as you're ok with panics when the rules are broken.
It'd take some work to document and distribute capability profiles for libraries that don't care to support it, but a similar effort was proven possible with TypeScript.
I think it could be possible in Rust with a linter, something like https://github.com/geiger-rs/cargo-geiger . The Rust compiler has some unsoundness issues such as https://github.com/rust-lang/rust/issues/84366 . Those would need fixing or linter coverage.
Perhaps a capability system could work like the current "feature" flags, but for the standard library, which would mean they could be computed transitively.
One can take just about any existing language and add this constraint, the problem however is it would break the existing ecosystem of libraries.
jq -r '.capabilityInfo[] | [.capability, .depPath | split(" ") | reverse | join(" ")] | @tsv'
Languages (plural) ... no single language will work for everyone.
In the absence of a technical solution, all others basically involve someone else having to audit and constantly maintain all that code and social/legal systems of trust. If it was pulled into Rust stdlib, that team would be stuck handling it, and making changes to any of that code becomes more difficult.
I don't see the advantage. Just a different axis of disadvantage. Take python for example. It has a crazy big standard library full of stuff I will never use. Some people want C++ to go in that direction too -- even though developers are fully capable of rolling their own. Similar problem with kitchen-sink libraries like Qt. "batteries included" languages lead to higher maintenance burden for the core team, and hence various costs that all users pay: dollars, slow evolution, design overhead, use of lowest common denominator non-specialised implementations, loss of core mission focus, etc.
I think Rust really needs to do more of this. I work with both Go and Rust daily at work, Go has its library game down -- the standard library is fantastic. With Rust it's really painful to find the right library and keep up for a lot of simple things (web, tls, x509, base64 encoding, heck even generating random numbers.)
For example there are currently 3, QUIC (HTTP/3) implementations for rust: Quiche (Cloudflare), Quinn, S2N-QUIC (AWS). They are all spec compliant, but may use different SSL & I/O backends and support different options. 2 of them support C/C++ bindings. 2 are async, 1 is sync.
Having QUIC integrated into the stdlib would mean that all these choices would be made beforehand and be stuck in place permanently, and likely no bindings for other languages would be possible.
Even cooler, if you want to only expose read operations, you can wrap the IO library in another library that only exposes certain commands (or custom filtering, etc).
EDIT: I should say this doesn't work with systems programming, since there's always unsafe or UB code.
Yet, if someone were to write a book which explained things properly (probably a 3000 word article would suffice to turn anyone into a 10x dev), nobody would buy it. This industry is cooked.
I'm sure I'll miss some, but IIRC C++ 26 is getting the entire BLAS, two distinct delayed reclamation systems and all of the accompanying infrastructure, new container types, and a very complicated universal system of units.
All of these things are cool, but it's doubtful whether any of them could make sense in a standard library; however, for C++ programmers that's the easiest way to use them...
It's bedlam in there and of course the same C++ programmers who claim to be "worried" that maybe somebody hid something awful in Rust's crates.io are magically unconcerned that copy-pasting tens of millions of lines of untested code from a third party into absolutely every C++ program to be written in the future could be a bad idea.
Is it really that bad? (By my count, as a point of reference, the Python 3.13 standard library is just under 900k lines for the .py files.)
With Rust, it’s literally a random third party.
On crates.io, a good heuristic is to look at two numbers: the number of dependents and the number of downloads. If both are high, it's _probably_ fine. Otherwise, I'll manually audit the code.
That's not a complete solution, especially not if you're worried about this from a security perspective, but it's a good approximation if you're worried about the general quality of your dependencies.
Tokio on the other hand is the library whose maintainer decided to download a binary blob during build: https://github.com/tokio-rs/prost/issues/562 https://github.com/tokio-rs/prost/issues/575
Good luck catching such issues across dozens of crates.
All three modern C++ standard libraries are of course Free Software. They are respectively the GNU libstdc++, Clang's libc++ and the Microsoft STL. Because it's a huge sprawling library, you quickly leave the expertise of the paid maintainers and you're into code that some volunteer wrote for them and says it's good. Sounds like random third parties to me.
Now, I'm sure that Stephan T. Lavavej (the Microsoft employee who looks after the STL, yes, nominative determinism) is a smart and attentive maintainer, and so if you provide a contribution with a function named "_Upload_admin_creds_to_drop_box" he's not going to apply that but equally Stephen isn't inhumanly good, so subtle tricks might well get past him. Similar thoughts apply to the GNU and Clang maintainers who don't have funny names.
Having paid maintainers, code review, test suites, strict contribution guidelines, etc. is state of the art for open source software, something a transitive crate dependency can only dream of achieving.
No, tons of the foundational Rust crates that show up in every dependency tree are first-party crates provided by the Rust project itself.
Applications still need the functionality. The need doesn't magically disappear when installing dependencies is a pain. If a crate has a bug, the entire ecosystem can trivially get the fixed version. If the Stackoverflow snippet a C app is vendoring has a bug, that fix is never getting in the app.
You don't install a Rust crate to use it. We have enough people in this thread trying to authoritatively talk about Rust without having any experience with it, please don't bother leaving a comment if you're just going to argue from ignorance.
I thought the whole UNIX mentality was worse is better.
No build tool is without issues. My pain points with cargo are: always compiling from source, build caching that requires additional work to set up, and, as soon as it is more than pure Rust, a build.rs file that can get quite creative.
It requires huge storage for each combination of targets, and even if that were solved, some members of the Rust community would see it as a step back.
Me included. They are hard to audit and are a step back from the OSS nature of Rust.
The issue here is getting storage and compute for build artifacts for cargo. Cargo isn't the language though.
In my experience we have more complex methodologies for doing the same things, but the goals are not more complex.
Cargo makes it so simple to add tons of dependencies that it is really hard not to do it. But that does not stop here: even if I try to be careful with adding dependencies, a couple dependencies are likely to pull tens of transitive dependencies each.
"Then don't depend on them", you say. Sure, but that means I won't write my project, because I won't write those things from scratch. I could probably audit the dependency (if it wasn't pulling 50 packages itself), but I can't reasonably write it myself.
It is different with C++: I can often find dependencies that don't pull tens of transitive dependencies in C++. Maybe because it's harder to add dependencies, maybe because the ecosystem is more mature, I don't know.
But it feels like the philosophy in Rust is to pull many small packages, so it doesn't seem like it will change. And that's a pity, because I like Rust-the-language better than C++-the-language. It just feels like I trade "it's not memory-safe" for "you have to pull tons of random code from the Internet".
I think it makes a good point that some of the difference here is just perception due to dependencies in C/C++ being less immediately visible since they're dynamically loaded. To some degree that is a plus though as you likely trust the maintainers of your OS distribution to provide stable, supported libraries.
As other commenters have said, perhaps this is an area where the Rust maintainers could provide some kind of extended standard library where they don't guarantee backwards compatibility forever, but do provide guarantees about ongoing fixes for security issues.
It was also posted here, shortly before this thread: https://news.ycombinator.com/item?id=43934343
(And several times in the past, too.)
> I think it makes a good point that some of the difference here is just perception due to dependencies in C/C++ being less immediately visible since they're dynamically loaded.
The point wasn't so much about the loading mechanism, but about the fact that the system (especially on Linux) provides them for you; a good amount come pre-installed, and the rest go through a system package manager so you don't have to worry about the language failing to have a good package system.
Not in my case. I manually compile all the dependencies (either because I need to cross-compile, or because I may need to patch them, etc). So I clearly see all the transitive dependencies I need in C++. And I need a lot less than in Rust, by a long shot.
edit: Also, `cargo-vet` is useful for distributed auditing of crates. There's also `cargo-crev`, but afaik it doesn't have buy in from the megacorps like cargo-vet and last I checked didn't have as many/as consistent reviews.
In practice, does this make it feasible to pick and choose the pieces you actually need?
Regex is split into subcrates, one of which is regex-syntax: the parser. But that crate is also a dependency of over 150 other crates, including lalrpop, proptest, treesitter, and polars. So other projects have benefited from Regex being split up.
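To make that concrete, here is a minimal sketch of depending on just the parser piece (the version is illustrative, and this reflects my understanding of the regex-syntax API rather than anything authoritative):

    // Cargo.toml would list only: regex-syntax = "0.8"
    // No regex engine, no automata construction, just the parser.
    fn main() {
        let mut parser = regex_syntax::Parser::new();
        match parser.parse(r"(foo|bar)+\d{2}") {
            // `hir` is the parsed high-level intermediate representation,
            // which crates like lalrpop or proptest can consume directly.
            Ok(hir) => println!("parsed: {:?}", hir),
            Err(err) => eprintln!("bad pattern: {}", err),
        }
    }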
It's part of the reason why software distribution on Linux has been pushed to using containers, removing the point of having shared libraries. I think Google with its C++ replacement (Carbon) plans on doing its own system.
It could be better, but the current solutions (npm, go, python,...) favor only the developers, not the maintainers and packagers.
e.g. Bottles, WebkitGTK (distros liked keeping this one held back even though doing so is a security risk)
IMHO it shouldn't be the responsibility of the OS vendor to package third party applications.
That said, the labor needed to keep the stuff together could be reduced a lot by more ergonomic and universal packaging and distribution methods like Cargo (and, dare I say, npm). I think some kind of better bridge between developers and distros could be found here.
Every Tom, Dick, and Harry is making their own distro these days (even if they're just respins of Arch with Calamares and some questionable theme settings), so why add more work onto developers?
We have things like Flatpak and Docker now that let application developers ignore the distros and stop them breaking things, unless you're Ubuntu, who is constantly begging to get purchased by Microsoft.
I don’t think there’s a need to do so. Only discipline is needed, by using stable and mature dependencies, and documenting the building process. And maybe some guides/scripts for the most popular distros.
My understanding of people distributing their software in containers is that they can't be arsed to learn how to do it properly. They would install their software and ship the entire computer if that was cost effective.
This is not to denigrate the huge and critical effort that makes current computing possible, and that is likely unavoidable in the real world. But software distribution needs to evolve.
I don't find it incoherent, nor huge. Unless the bar for "huge" is "anything that requires more attention than asking an LLM and copy-pasting its answer", maybe.
That is not at all a problem for open source stuff: build your project correctly, and let distros do their job. Still, open source projects are too often doing it wrong, because nobody can be arsed to learn.
> as well as fighting with distro policies that would rather ship the software with known bugs than allow two versions of a library to exist on the system.
Sounds like if you need this, you're doing it wrong. If it's a major update (e.g. 2.3.1 to 3.0.0), it's totally possible to have a new package (say `python2` and `python3`). If your users need two versions of a library that are in the same major version (e.g. 2.3.1 and 2.5.4), then you as a developer are doing it wrong. No need to fight, just learn to do it properly.
I'm not sure it's a philosophy, more a pragmatic consideration for compilation speeds. Anyone who's done a non-trivial amount of Rust knows that moment when the project gets too big and needs to split into separate crates. It's kinda sad that you can't organize code according to proper abstractions, many times I feel forced to refactor for compiler performance.
You need to think a bit harder about that, to help you decide whether your position is rational.
In Rust, I'm sometimes actually tempted to wrap a C/C++ library (and its few dependencies) instead of getting the Rust alternative (and its gazillion dependencies).
It lets you track what packages you "trust". Then you can choose to transitively trust the packages trusted by entities you trust.
This lets you have a policy like "importing a new 3rd party package requires a signoff from our dependency tzar. But, packages that Google claim to have carefully reviewed are fine".
You can also export varying definitions of "trust". E.g. Google exports statements like:
- "this package has unsafe code, one of our unsafe experts audited it and thinks it looks OK"
- "this package doesn't do any crypto"
- "this is a crypto library, one of our crypto experts audited it and thinks it looks ok"
https://github.com/google/rust-crate-audits/blob/main/auditi...
Basically it's a slightly more formal and detailed version of blessed.rs where you can easily identify all the "it's not stdlib, but, it's kinda stdlib" stuff and make it easily available to your team without going full YOLO mode.
It can also give you a "semi-YOLO" approach, it supports rationales like "this package is owned by a tokio maintainer, those folks know what they're doing, it's probably fine". I think this is a nice balance for personal projects.
tokio is a work-stealing, asynchronous runtime; a feature that elsewhere would be an entire language runtime. Does OP consider it reasonable to audit the entire Go language? Or the V8 engine for Node? V8 is ~10x more lines than tokio.
If Cloudflare uses Node, would you expect Cloudflare to audit v8 quarterly?
This is something I've only ever seen cargo do.
No, cargo will resolve using semver compatibility and pick the best version. NuGet, for C#, does something very similar.
npm does this (which causes [caused?] the node_modules directory to have a megazillion of files usually, but sometimes "hoisting" common dependencies helps, and there's Yarn's PnP [which hooks into Node's require() and keeps packages as ZIPs], and pnpm uses symlinks/hardlinks)
But that's not practical for all situations. For example, Web frontend developer culture might be the worst environment, to the point you often can't get many things done in feasible time, if you don't adopt the same reckless practices.
I'm also seeing it now with the cargo-culting of opaque self-hosted AI tools and models. For learning and experimenting, I'd spend more time sufficiently compartmentalizing an individual tool than with using it.
This weekend, I'm dusting off my Rust skills, for a small open source employability project (so I can't invest in expensive dependency management on this one). The main thing bothering me isn't allocation management, but the sinking feeling when I watch the cast-of-thousands explosion of transitive dependencies for the UI and async libraries that I want to use. It's only a matter of time before one of those is compromised, if not already, and one is all it takes.
Devs can add whatever they feel like on their workstations but it will be a sad build server if they get pushed without permission.
Anything else will get abused in the name of expediency and just-this-one-time.
Also, the process for adding a crate/gem/module/library needs to be the same as anything else: license review, code review, subscription to the appropriate mailing list or other announce channel, and assignment of responsibility. All of these except code review can be really, really fast once you have the process going.
All problems are, at least in part, dependency chain management problems.
The dependency trees for most interpreted or source-distributed languages are ridiculous, and review of even a few of those seems practically impossible in a lot of development environments.
It's an obvious one, but distasteful to many people.
Would you care to state the obvious very clearly, for the dense ones among us?
A compromised dev machine is also a problem.
it's a complex problem with tons of partial solutions, each of which has tons of ways to implement it, and often there's no clear winner
i.e. it's the kind of problem that's hard to solve by consensus
e.g. the idea of an extended standard library is old (around since the beginning of Rust), but for years it was believed it's probably best to make it a separate, independent project/library, for various reasons. One being that the saying "the standard library is the place where code goes to die" has been quite true for multiple ecosystems (most noticeably Python)
as a side note, an ESL wouldn't reduce the LOC count, it would increase it, as long as you fully measure LOCs and don't "skip" over some dependencies
There are literally 1000s of RFCs for Rust, with only a small handful that have been integrated. Having this forest, IMO, makes it hard for any given proposal to really stand out. Further, it makes duplicate effort almost inevitable.
Rust's RFC process is effectively a dead letter box for most.
1. Well defined scope
2. Infrequent changes
Nomad has many of these (msgpack, envparse, cli, etc). These dependencies go years without changing so the dependency management burden rapidly approaches zero. This is an especially useful property for “leaf” dependencies with no dependencies of their own.
I wish libraries could advertise their intent to be Mature. I’d choose a Mature protobuf library over one that constantly tweaked its ergonomics and performance. Continual iterative improvement is often a boon, but sometimes it’s not worth the cost.
It is an easy way to get a somewhat OK standard library as the things you add became popular on their own merits at some point.
Once added, the lowest friction path is to just use the standard library; and as it is the standard library you have a slightly better hope someone will care to maintain it. You can still build a better one if needed for your use-case, but the batteries are included for basic usage
If you want a mature protobuf implementation you should probably buy one. Expecting some guy/gal on the internet to maintain one for your for free seems ill advised.
Nobody is asking for professional quality standards from hobby projects. At best, they are asking for hobby projects to advertise themselves as such, and not as "this is a library for [x] that you can use in your stuff with the expectations of [maintenance/performance/compatibility/etc.]."
Resume-driven development seems to cause people to oversell their hobby projects as software that is ready to have external users.
> If you want a mature protobuf implementation you should probably buy one
No software is ever developed this way. For some reason, libraries are always free. Approximately nobody will buy paid libraries.
That's also work. You don't get to ask the hobby programmer to do your work of vetting serious/maintained projects for you. As the professional with a job, you have to do that. If some rando on GitHub writes in their readme that it's maintained, but lies, you're the idiot for believing him. He's probably 12 years old, and you're supposedly a professional.
> No software is ever developed this way.
That's just inaccurate. In my day job we pay for at least 3-4 3rd party libraries that we either have support contracts on or that were developed for us along with a support contract. Besides those there's also the myriad of software products, databases, editors, Prometheus, grafana, that we pay for.
Software people really underestimate how much business guys are willing to pay for having somebody to call. It's not "infinitely scalable" in the way VC's love, but it's definitely a huge business opportunity.
That means the only threat you have as a producer of code (once the code is handed over) is the threat of withdrawing service. That means the only ways to sell licenses are:
* Build your own licensing service (or offer SaaS)
* Sell the code for a high price upfront
* Sell service contracts
I suspect this is in no small part because figuring out a licensing (edit: pricing!) model that is both appealing to consumers and sustainable for authors is damn near impossible.
Also there are lots of lovely projects maintained at high levels by hobbyists, and plenty of abandonware that was at some point paid for
There certainly are. I would never say to disregard anything because it was a hobby project. You just don't get to expect it being that way.
My basic point is that a hobby project can never take responsibility. If you have a support contract you are allowed to have some expectation of support. If you do not, then no expectation is warranted and everything you get is a gift.
A "mature" label carries the same problem. You are expecting the author to label something for you. That's work. If you're pulling from the commons, you must respect that people can label stuff whatever they like, and unmotivated blanket lies are not illegal.
I will say I get great satisfaction from the little envparse library I wrote needing near-0 maintenance. It’s a rare treat to be able to consider any project truly done.
cargo tree helps a lot on viewing dependency tree. I forgot if it does LoC count or not..
> to see what lines ACTUALLY get compiled into the final binary,
This doesn't really make much sense as a lot of the functions that make it to the binary get inlined so much that it often becomes part of 'main' function
Meanwhile, the heaviest JavaScript parser implemented in JavaScript is more lightweight.
I decided that I should leave this project alone and spend my time elsewhere.
Zero-cost abstractions don't have zero-cost debug info. In fact, all of the optimized-away stuff is intentionally preserved with full fidelity in the debug info.
Also, repository size seems an extremely irrelevant metric.
Repository size is directly related to how long it takes to run a build, which is extremely important if I were to contribute to the project.
> Serde seems exactly a case where you absolutely should use an external dependency.
I can't see any reason a parser has a hard dependency on a serialization library.
Which ones are superfluous?
There are good reasons to use dependencies. If someone has solved a problem you need to solve as well it is pointless to duplicate the effort.
>Repository size is directly related to how long it takes to run a build, which is extremely important if I were to contribute to the project.
Totally false. There is zero inherent relation.
>I can't see any reason a parser has a hard dependency on a serialization library.
And because you can't see a reason there is none?
It is totally meaningless to talk about any of this if you can not point out why this is superfluous.
> And because you can't see a reason there is none?
Somehow every other JS based parser doesn't do fancy serialization, as far as I can tell. You can come up with reasons of why one might need it, but as a user of the parser, I want the footprint to be small, and that's a requirement. In fact, that's one of the reasons I never used swc parser in my serious projects.
That you in particular might have no use for the features they bring couldn't be more irrelevant. What other parsers are doing could also not be more irrelevant.
No, because I don't have to answer that question. I can simply choose not to use this project, like what I do with npm projects. There is a project that's 500kb in code with 120 dependencies, when another one is 100kb with 10 dependencies that's also well maintained? I'll choose the latter without question, as long as it satisfies my needs. I don't care why the other one has 120 dependencies or try to justify that.
>There is a project that's 500kb in code with 120 dependencies
And therefore some project using 13 dependencies is doing it wrong? What are you on about. Obviously there is an enormous abuse of dependencies in the JS ecosystem, who cares?
Also they did point out that the parser depends on a serialisation library, so you're also mistaken about parent thinking the dependencies are necessary.
On another note, this pervasive kind of passive aggressive, hand-wavy, tribalistic, blind defense of certain technologies speak volumes about their audiences.
> Meanwhile, the heaviest JavaScript parser implemented in JavaScript is more lightweight.
The lightest weight javascript program relies on V8 to run, which has multiple orders of magnitude more dependencies. Most of which you have never heard of.
At least cargo makes it easier to get a clearer picture of what the dependencies are for a program.
If you have one huge dep it's easier to keep track you're on the latest update, also it's much less likely you'll fat finger it and import something typosquatting.
Also if you're in enterprise you'll have fewer 100-page SBOM reports.
Keeping track of the latest version is trivial with cargo.
consider the probabilities
>What is more likely to be vulnerable,
At the end of the day you are at a much higher risk of one of those 10 packages getting owned by some external party, and suddenly the next version is pulling in a bitcoin miner, or something that steals everything it can from your CI/CD, or does a takeover of your customers.
And it's never 10 (well at least for JS), it's hundreds, or if your team is insane, thousands.
Actually, this isn't true. (Or at least wasn't a while back.) I used to work with a bunch of ex-V8 folks and they really despised third-party dependencies and didn't trust any code they didn't write. They used a few third-party libs but for them most part, they tried to own everything themselves.
.. as in they can afford to rewrite everything
.. can afford to suffer from not invented here syndrome
.. and are under _massive_ threat of people doing supply chain attacks compared to most other projects (as they end up running on nearly any desktop computer and half the phones out there)
this just isn't viable for most projects, not just resource/time-investment wise, but also reinventing/rewriting everything isn't exactly good for reducing bugs if you don't have reliable access to both resources _and_ expertise. Most companies have to live with having many very average developers, and very tight resource limits.
These have Zero dependencies. It's not rare in Go land.
- https://github.com/go-chi/chi 19k stars
- https://github.com/julienschmidt/httprouter 16k stars
- https://github.com/gorilla/mux 21k stars
- https://github.com/spf13/pflag 2.6k stars
- https://github.com/google/uuid 5.6k stars
Many others have just a few dependencies.
I feel Telegraf made a good compromise: out of the box, it comes with a _ton_ of stuff[1] to monitor everything, but they make it possible to build only with the pieces that you need via build tags, and even provide a tool to extract said tags from your telegraf config[2]. But lots of supply-chain security stuff assumes everything in go.mod is used, so that can result in a lot of noise.
[1] https://github.com/influxdata/telegraf/blob/master/go.mod [2] https://github.com/influxdata/telegraf/tree/master/tools/cus...
Curate a collection of libraries you use and trust. This will probably involve making a number of your own. Wheel-reinvention, if you will. If done properly, even the upfront time cost will save in the long run. I am in the minority here, but I roll my own libs whenever possible, and the 3rd party libs I use are often ones I know, have used before, and have vetted to ensure they have a shallow tree of their own.
Is this sustainable? I don't know. But It's the best I've come up with, in order to use what I see as the best programming language available for several domains.
There are a lot of light-weight, excellent libs I will use without hesitation, and have wide suitability. Examples:
- num_enum
- num_traits
- bytemuck
- chrono
- rand
- regex
- bincode
- rayon
- cudarc
Heavier, and periodically experiencing mutual-version hell, but very useful for GUI programs:
- EGUI
- WGPU
- Winit
On a darker note, the Rust web ecosystem may be permanently lost to async and messy dependencies. Embedded is going that way too, but I have more hope there, and am doing my best to have my own tooling. Real-world software ecosystems evolve slowly, requiring years of debate to shift.
- From HN's most outspoken Rust critic
For example, `tokio::spawn()` returns a task handle that lets the task keep running after the handle is dropped. `smol::spawn()` cancels the task when the task handle is dropped.
General async cancellation requires infrastructure mechanisms, and there are multiple reasonable designs.
Letting things settle in the ecosystem is a great way to find the best design to eventually incorporate in the standard library, but it takes time.
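A minimal sketch of the behavioral difference described above (assuming current tokio and smol APIs; runtime setup is elided):

    async fn tokio_style() {
        // tokio::spawn returns a JoinHandle; dropping the handle DETACHES
        // the task, so it keeps running in the background.
        let handle = tokio::spawn(async { /* long-running work */ });
        drop(handle); // task is still running
    }

    async fn smol_style() {
        // smol::spawn returns a Task; dropping the Task CANCELS it.
        let task = smol::spawn(async { /* long-running work */ });
        // To get tokio-like behavior you must opt in explicitly:
        task.detach(); // now it keeps running after the handle is gone
    }

Neither choice is wrong; they are just different answers to the cancellation question, which is part of why it's hard to bless one of them in std.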
Big things you use off-the-shelf libraries for. Small things you open-code, possibly by cribbing from suitably-licensed open source libraries. You bloat your code to some degree, but reduce your need to audit external code and reduce your exposure to supply chain attacks. Still, the big libraries are a problem, but you're not going to open code everything.
This isn't just Rust. It's everything.
I should have added: "and for obvious reasons".
Didn't computer science hype up code reuse for decades before it finally started happening on a massive scale? For that to actually happen we needed programming languages with nice namespaces and packaging and distribution channels. C was never going to have the library ecosystem that Java, C++, and Rust have. Now that we're there suddenly we have a very worrisome supply chain issue, with major Reflections on Trusting Trust vibes. What to do? We can't all afford to open-code everything, so we won't, but I recommend that we open-code all the _small_ things, especially in big projects and big libraries. Well, or maybe the AI revolution will save us.
How much maintenance could you possibly need to load secrets from .env into the environment?
What this means in practice is that the call to invoke dotenv should also be marked as unsafe so that the invoker can ensure safety by placing it at the right place.
If no one is maintaining the crate, that won’t happen and someone might try to load environment variables at a bad time.
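For context, a rough sketch of why the call site matters, assuming the Rust 2024 edition, where std::env::set_var is an unsafe fn (mutating the process environment can race with concurrent getenv calls on POSIX):

    fn load_dotenv_like(pairs: &[(&str, &str)]) {
        for &(key, value) in pairs {
            // SAFETY: only sound if no other thread is reading or writing
            // the environment concurrently, i.e. call this at the top of
            // main() before spawning threads. A dotenv-style crate that
            // wraps this in a safe fn is making that promise on behalf of
            // every caller.
            unsafe { std::env::set_var(key, value) };
        }
    }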
whatever the issue is, "setting an env var is unsafe" is so interesting to me that I'm now craving a blog post explaining this
> Achtung! This is a v0.* version! Expect bugs and issues all around. Submitting pull requests and issues is highly encouraged!
ZeroVer https://0ver.org/
_is fundamentally unsound thanks to unix/posix_
no way around that
hence why setting an env var wasn't marked as unsafe _even though it being not fully safe has been known since extremely early Rust days, maybe even 1.0_
it not being unsafe wasn't an oversight but a known not-fully-sound design decision, which has been revisited and changed recently
non
small "completed" well tested libraries being flagged as security issues due to being unmaintained seem to be starting to become an issue
The tools I have found useful are:
cargo outdated # check for newer versions of deps
cargo deny check # check dependency licenses
cargo about # generate list of used licenses
cargo audit # check dependencies for known security issues
cargo geiger # check deps for unsafe rust
I haven't found a cargo tool I like for generating SBOMs, so I installed syft and run that.
cargo install-update # keep these tools updated
cargo mutants # not related to deps, but worth a mention, used when testing.
Having configured all these tools once and simply unzipping a template works well for me.
Suggestions for different or additional tools welcome!
Disclaimer: I'm not a professional rust developer.
I did this and it only solved half of the bloat:
https://crates.io/crates/safina - Safe async runtime, 6k lines
https://crates.io/crates/servlin - Modular HTTP server library, threaded handlers and async performance, 8k lines.
I use safina+servlin and 1,000 lines of Rust to run https://www.applin.dev, on a cheap VM. It serves some static files, a simple form, receives Stripe webhooks, and talks to Postgres and Postmark. It depends on some heavy crate trees: async-fs, async-net, chrono, diesel, rand (libc), serde_json, ureq, and url.
2,088,283 lines of Rust are downloaded by `cargo vendor` run in the project dir.
986,513 lines using https://github.com/coreos/cargo-vendor-filterer to try to download only Linux deps with `cargo vendor-filterer --platform=x86_64-unknown-linux-gnu`. This still downloads the `winapi` crate and other Windows crates, but they contain only 22k lines.
976,338 lines omitting development dependencies with `cargo vendor-filterer --platform=x86_64-unknown-linux-gnu --keep-dep-kinds=normal`.
754,368 lines excluding tests with `cargo vendor-filterer --platform=aarch64-apple-darwin --exclude-crate-path='*#tests' deps.filtered`.
750k lines is a lot to support a 1k-line project. I guess I could remove the heavy deps with another 200 hours of work, and might end up with some lean crates. I've been waiting for someone to write a good threaded Rust Postgres client.
Why the need to go back to threaded development?
2. Async Rust has a lot of papercuts.
3. Very little code actually needs async. For example, in an API server, every request handler will need a database connection so the concurrency is limited by the database.
I wrote the Servlin HTTP server in async rust, to handle slow clients, but it calls threaded request handlers.
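This is not Servlin's actual API, just a generic sketch of that pattern (assuming tokio with the relevant features enabled): async code absorbs the slow network I/O, then each fully-read request is handed to an ordinary blocking handler.

    use tokio::{io::AsyncReadExt, net::TcpListener};

    // Plain threaded code: blocking DB calls, no async to reason about.
    fn handle_request(body: Vec<u8>) -> Vec<u8> {
        body
    }

    #[tokio::main]
    async fn main() -> std::io::Result<()> {
        let listener = TcpListener::bind("127.0.0.1:8080").await?;
        loop {
            let (mut socket, _) = listener.accept().await?;
            tokio::spawn(async move {
                let mut buf = Vec::new();
                // Async read tolerates slow clients without pinning a thread.
                if socket.read_to_end(&mut buf).await.is_ok() {
                    // Run the blocking handler on the blocking thread pool.
                    let _resp = tokio::task::spawn_blocking(|| handle_request(buf)).await;
                    // ... write _resp back to the socket (elided) ...
                }
            });
        }
    }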
Out of those 3.6 million lines, how many are lines of test code?
It's a problem because those people become overworked, and eventually have to abandon things. The deprecation of `serde_yaml` was and is a huge, huge problem, especially without any functional replacement. There was no call for new maintainers, or for someone to take over the project. I can understand the reasons why (now you're suddenly auditing people, not code), but it sucks.
You can ensure that third-party Rust dependencies have been audited by a trusted entity with cargo-vet.
And you should have taken a look at where those 3M locs come from: it's usually Microsoft's windows-rs crates that are transitively included in your dependencies through default features and build targets of crates built to run on Windows.
The author is right that there's no way an individual can audit all that code. Currently all that code can run arbitrary build code at compile time on the dev's machine; it can also run arbitrary unsafe code at runtime, make system calls, etc.
Software is not getting simpler, the abundance of high quality libraries is great for Rust, but there are bound to be supply chain attacks.
AI and cooperative auditing can help, but ultimately the compiler must provide more guarantees. A future edition of Rust should come with an inescapable effect system. Work on effects in Rust has already started; I am not sure if security is a goal, but it needs to be.
This is the way.
It's unfortunate that the response so far hasn't been very positive
Historically this process has been mostly informal; going forward they're trying to make sure that things get removed at a specific point after their deprecation. Python has also now adopted an annual release cadence; the combination of that with the deprecation policy effectively makes their versioning into a pseudo-calver.
Perhaps because it's a good idea.
The stdlib probably should remain simple, in my opinion. The complexity should be optional.
The problem with third party frameworks is that by definition (of being third party) there's no single standard one to use, so libraries use different ones, and then you end up with a mess of dependencies, or else multiple incompatible ecosystems, each duplicating the same functionality around a different framework.
The beauty of stdlib is that it's guaranteed to be there, meaning that any library can use it as needed. It also improves interop between libraries because the same concepts are represented by the same standard types. And even when parts of stdlib are optional - which by necessity they must be for any "batteries included" stdlib because e.g. embedded is a thing - it's still beneficial to library authors because they know that, for any given feature, all supported platforms that do have it have stdlib expose it in the same way.
The people working on Rust are a finite (probably overextended!) set of people and you can't just add more work to their plate. "Just" making the standard library bigger is probably a non-starter.
I think it'd be great if some group of people took up the very hard work to curate a set of crates that everyone would use and provide a nice façade to them, completely outside of the Rust team umbrella. Then people can start using this Katamari crate to prove out the usefulness of it.
However, many people wouldn't use it. I wouldn't because I simply don't care and am happy adding my dependencies one-by-one with minimal feature sets. Others wouldn't because it doesn't have the mystical blessing/seal-of-approval of the Rust team.
A lot.
Like, a lot a lot a lot. Browse through any programming language that has an open issue tracker for all the closed proposals sometime. Individually, perhaps a whole bunch of good ideas. The union of them? Not so much.
- All included crates can be tested for inter-compatibility
- Release all included crates under a single version, simplifying upgrades
- Sample projects as living documentation to demo integrations and upgrades
- Breaking changes can be held until all affected crates are fixed, then bump all at once
- An achievable, valuable, local goal for code review / crev coverage metrics
There could be general "everything and the kitchen sink" metalibraries, metalibraries targeted at particular domains or industries, metalibraries with different standards for stability or code review, etc. It might even be valuable enough to sell support and consulting...
We do not need to saddle Rust with garbage that will feel dated like Python's standard library. Cargo does the job just fine. We just need some high quality optional batteries.
Embedded projects are unlikely to need standard library bloat. No_std should be top of mind for everyone.
Something that might make additional libraries feel more first class: if cargo finally got namespaces and if the Rust project took on "@rust/" as the org name to launch officially sanctioned and maintained packages.
Python packaging is somehow a 30 year train crash that keeps going, but the standard library is good enough that I can do most things without dependencies or with very small number of them.
I think what you're suggesting is a great idea for a new standard library layer, you're just not using that label. A set of packages in a Rust namespace, maintained by the same community of folks but under policies that comply with best practices for security and some additional support to meet those best practices. The crates shouldn't be required, so no_std should work just as it would prior to such a collection.
Rust, as a systems language, is quite good at working on a variety of systems.
As for the systems language remark, I am still looking forward to the day when sorting out ABI issues for binary libraries is finally something that doesn't need to go through solutions designed for C and C++.
Python's standard library is a strength, not a weakness. Rust should be so lucky. It's wonderful to have basic functionality which is guaranteed to be there no matter what. Many people work in environments where they can't just YOLO download packages from the Internet, so they have to make do with whatever is in the stdlib or what they can write themselves.
Rust is luckier. It has the correct approach. You can find every battery you need in crates.io.
Python has had monstrosities like urllib, urllib2, http, etc. All pretty much ignored in favor of the external requests library and its kin. The standard library also has inconsistencies in calling conventions and naming conventions and it has to support those *FOREVER*.
The core language should be pristine. Rust is doing it right. Everything else you need is within grasp.
Not to mention abysmal designs inspired by cargo-cult "OOP" Java frameworks from the 90s and 00s. (Come on, folks. Object-oriented programming is supposed to be about objects, not about classes. If it were about classes, it would be called class-oriented programming.)
Standard response every time there is some criticism of Rust.
My counter argument is that the "batteries included" approach tends to atrophy and become dead weight.
Your counter seems to be "that's not an argument, that's just Rust hype."
Am I interpreting you correctly? Because I think my argument is salient and correct. I don't want to be stuck with dated APIs from 20 years of cruft in the standard library.
The Python standard library is where modules go to die. It has two test frameworks nobody uses anymore, and how many XML libraries? Seven? (The correct answer is "four", I think. And that's four too many.) The Python standard library has so much junk inside, and it can't be safely removed or cleaned up.
A standard library should be data structure/collections, filesystem/os libraries, and maybe network libraries. That's it. Everything else changes with too much regularity to be packed in.
There is a single datetime library. It covers 98% of use cases. If you want the final 2% with all the bells and whistles you can download it if you wish. There is a single JSON library. It's fast enough for almost anything you want. If you want faster libraries with different usability tradeoffs you can use one but I have never felt compelled to do so.
Same thing with CSV, filesystem access, DB api, etc. They're not the best libraries at the time of any script you're writing, but the reality is that you never really need the best, most ergonomic library ever to get you through a task.
Because of this, many big complex packages like Django have hardly any external dependencies.
If anything you're not the one getting stuck with dated APIs; it's the Python core devs. Maintainers of other packages are always free to choose other dependencies, but they almost invariably find that the Python stdlib is good enough for everything.
https://dev.arie.bovenberg.net/blog/python-datetime-pitfalls...
But now you're stuck with it forever.
Python is packed full with this shit. Because it wasn't carefully planned and respect wasn't given to decisions that would last forever.
Python has two testing frameworks baked in, neither of which is good.
Python has historically had shitty HTTP libraries and has had to roll out several versions to fix the old ones because it couldn't break or remove the old ones. Newbies to the language will find those built in and will write new software with the old baggage.
Batteries included is a software smell. It's bad. You can't change the batteries even after they expire.
Your arguments seem to come from someone who doesn't have substantial software engineering experience in large systems.
All large software systems and most effective software uses libraries that are not generally super modern and not necessarily the best of the best, but they are well-understood.
In your example for datetime libraries, notice that the writer immediately ignores libraries that at some point were better than the stdlib library, but are now unmaintained. That by itself is already a red flag; it doesn't matter that a library is better if there is a large risk that it is abandoned.
Notice that no single library in the examples mentioned solves all the problems. And notice that there is no such thing as a datetime library anywhere that has consistent, uniform and order-of-magnitude improvements such that they merit dropping the stdlib.
The stdlib is _good enough_. You can build perfectly good business systems that work reasonably well, and as long as you have a couple basic ideas down about how you lay down datetime usage you'll be mostly fine. I've been working with Python for over 15 years and any time I picked a different datetime library it was just an additional maintenance burden.
> But now you're stuck with it forever.
You're "stuck" with whatever datetime library you choose. One day your oh-so-great datetime library is going to be legacy and you'll be equally bamboozled in migrating to something better.
I've heard this argument about SQLAlchemy, the Django ORM, and various other packages. The people that chose to go somewhere less maintained are now stuck in legacy mode too.
> Python is packed full with this shit. Because it wasn't carefully planned and respect wasn't given to decisions that would last forever.
This is pure ignorance. There's not a single language standard library that is absolutely amazing. Yet the batteries-included approach ends up being a far better solution long term when you look at the tradeoffs from an engineering perspective.
> Python has two testing frameworks baked in, neither of which is good.
They are good enough. They have broad support with tons of plugins. They have assertions. They get the basics right and I've had success getting useful tests to pass. This is all that matters; your tests don't become magically better because you decided to use nose or whatever framework of the day you choose.
> Python has historically had shitty HTTP libraries and has had to roll out several versions to fix the old ones because it couldn't break or remove the old ones. Newbies to the language will find those built in and will write new software with the old baggage.
The current python docs recommend requests, and requests is a well-established package that everyone uses and is not at risk of being outdated, as it's been the go-to standard for over a decade. This is fine. If you're a library writer you're better off using urllib3 and avoiding an additional dependency.
> Batteries included is a software smell. It's bad. You can't change the batteries even after they expire.
Try to revive a line-of-business Node.JS app written 10 years ago with hundreds of outdated dependencies. An equivalent Python app will have a half dozen dependencies at most, and if you stuck to the most popular packages there's a really high chance an upgrade will be smooth and easy. I've done this multiple times; tons of colleagues have had to do this often. Python's decision makes this tremendously easy.
So sorry, if you're aiming for library perfection, you're not aiming for writing maintainable software. Software quality happens in the aggregate, not in choosing the fanciest, most modern thing.
There is a tradeoff here. Having a large, but badly maintained, standard library with varying platform support is worse than having a smaller, but well maintained, one.
Golang's core dev team is something like 30 people.
So Rust does have the resources.
1. Not every contributor contributes equally. Some contributors work full time on the project, some work a few hours a month.
2. The amount of contributors says nothing about what resources are actually required. Rust is, no doubt, a more complex language than go and is also evolving faster.
3. The amount of contributors says nothing about the amount of contributors maintaining very niche parts of the ecosystem.
However, I'd rather have cruft that works everywhere the toolchain is fully implemented, instead of playing whack-a-mole with third party libraries when only some platforms are supported.
The C++ standard library even has this problem for something as basic as formatting (iostreams), and now it has two solutions for the same problem.
If I was to design a Rust 2.0, I'd make it so dependencies need permissions to access IO, or unsafe code, etc.
I believe that it causes more problems than it solves, but it can be a solution to the problem of adding thousands of lines of code of dependency when you could write a 10-line function yourself.
Of course, the proper thing to do is not to be the wrong kind of lazy and to understand what you are doing. I say the wrong kind of lazy because there is a right kind of lazy, and it is about not doing things you don't need to, as opposed to doing them poorly.
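One way to approximate the "dependencies need permissions" idea a couple of comments up in today's Rust is capability-style injection: the application hands the library an object representing exactly the access it is allowed. A minimal sketch (all names are invented, not a real crate's API):

    use std::io;

    // The only power the library gets is whatever this trait exposes.
    pub trait ConfigSource {
        fn read(&self, key: &str) -> io::Result<String>;
    }

    // Library code: pure logic, no ambient filesystem or network access.
    pub fn load_port(src: &dyn ConfigSource) -> io::Result<u16> {
        src.read("port")?
            .trim()
            .parse()
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, format!("{e}")))
    }

    // Application code decides what the capability is actually backed by.
    struct FileConfig;
    impl ConfigSource for FileConfig {
        fn read(&self, key: &str) -> io::Result<String> {
            std::fs::read_to_string(format!("/etc/myapp/{key}"))
        }
    }

    fn main() -> io::Result<()> {
        let port = load_port(&FileConfig)?;
        println!("listening on {port}");
        Ok(())
    }

It doesn't stop a malicious build.rs or unsafe block, but it at least makes a library's reach explicit in its signatures.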
The solution space is basically infinite, and that's a good thing for a systems programming language. It's kind of amazing how far rust reaches into higher level stuff, and I think the way too easy to use package manager and lively crate ecosystem is a big part of that.
Sometimes I wish for a higher-level rust-like language though, opinionated as hell with garbage collector, generic functions without having to specify traits, and D's introspection.
https://docs.osgi.org/specification/osgi.core/7.0.0/framewor...
"How OSGi Changed My Life" (2008) https://queue.acm.org/detail.cfm?id=1348594
All Lego components have the same simple standard mechanism: friction coupling using concave and convex surface elements of the component. Unix pipes are the closest thing we have to a Lego like approach and there the model of "hooking pipes of bytes from sources to sinks" actually represents what happens with the software.
With components and APIs, unless we resort to some universal baseline (such as a small finite semantic API like REST's "verbs") that basically can marshall and unmarshall any arbitrary function call ('do(func, context, in-args, out-args, out-err)'), the Lego metaphor breaks down very quickly.
The second issue are the modalities of 'interactions' between components. So this is my first encounter with "Sans-IO" (/g) but this is just addressing the interactions issue with a fiat 'no inter-actions by components'. So Lego for software: great overall expression of desired simplicity, but not remotely effective as a generative concept and imo even possibly detrimental (as it over simplifies the problem).
Now we have 2 different pieces of software tech that somewhat have managed to arrive at component orientation: using a finite set of predefined components to build general software. One is GUI components, where a small set of visual components and operational constructs ("user-events", etc.) with structural and behavioral semantics are used to create arbitrary visual interfaces for any ~domain. The other is WWW where (REST verbs of) HTTP also provide a small finite set of 'components' (here architectural) to create arbitrary services. With both, there is the tedious and painful process of mapping domain semantics to structural components.
So we can get reusable component oriented software (ecosystems) but we need to understand (per lessons of GUIs and WebApps) that a great deal of (semantic) glue code and infrastructure is necessary, just as a lot of wiring (for GUIs) and code frameworks (for WebApps) are necessary. That is what something like OSGi brings to the table.
This then leads to the question of component boundary and granularity. With things like DCOM and JEE you have fine-grained components aggregated within process boundaries. The current approach is identifying the process boundary as the component boundary (docker, k8s, microservices) (and doing away with 'application servers' in the process).
I agree that this is generally what happens, and I would like to suggest that there is a better, harder road we should be taking. The work of programming may be said to be translation, and we see that everywhere: as you say, mapping domain semantics (what-to-do) to structural components (how-to-do-it), and compilers, like RPC stub generation. So while a few verbs along the lines of RPC/REST/one-sided async IPC are the domain of the machine, we programmers don't work well with that. It's hard to agree, though, and that's not something I can sidestep. I want us to tackle the problem of standardization head-on. APIs should be easy to define and easy to standardize, so that we can use richly typed APIs with all the benefits that come from them. There's the old dream of making programs compose like procedures do. It can be done, if we address our social problems.
> The second issue are the modalities of 'interactions' between components. So this is my first encounter with "Sans-IO" (/g) but this is just addressing the interactions issue with a fiat 'no inter-actions by components'. So Lego for software: great overall expression of desired simplicity, but not remotely effective as a generative concept and imo even possibly detrimental (as it over simplifies the problem).
I'm not sure what you mean, so I may be going off on a tangent, but Sans IO, capabilities, dependency injection etc. are more about writing a single component than any inter-component code. The part that lacks IO and the part that does IO are still bundled (e.g. with component-as-process). There is a more extensive mode, where whoever controls a local subsystem of components decides where to put the IO manager.
> Now we have 2 different pieces of software tech that somewhat have managed to arrive at component orientation: using a finite set of predefined components to build general software.
> So we can get reusable component oriented software (ecosystems) but we need to understand (per lessons of GUIs and WebApps) that a great deal of (semantic) glue code and infrastructure is necessary, just as a lot of wiring (for GUIs) and code frameworks (for WebApps) are necessary.
I agree, which is why I want us to separate the baseline components from more powerful abstractions, leaving the former for the machine (the framework) and the latter for us. Does the limited scope of HTTP by itself mean we shouldn't be able to provide more semantically appropriate interfaces for services? The real issue is that those interfaces are hard to standardize, not that people don't make them.
We're likely in general agreement in terms of technical analysis. Let's focus on the concrete metric of 'economy' and hand-wavy metric of 'natural order'.
Re the latter, consider the thought that 'maybe the reason it is so difficult to standardize interfaces is because it is a false utopia?'
Re the former, the actual critical metric is 'is it more economical to create disposable and ad-hoc systems, or to amortize the cost of a very "hard" task across 1 or 2 generations of software systems and workers?'
Now the industry voted with its wallets and the blog propaganda of 'fresh engineers' with no skin in the component oriented approach in the early '00s. That entire backlash, which included the "noSQL" movement, was in fact, historically, a shift mainly motivated by economic considerations, aided by a few black swans like Linux and containerization. But now, the 'cost' of the complexity of assembly, deployment, and orchestration of a system based on that approach is causing information overload on the workers. And now we have generative AI, which seems to further tip the economic balance in favor of the late-stage ad-hoc approach to putting a running system together.
As to why I used 'natural order'. The best "Lego like" system out there is organic chemistry. The (Alan) Kay vision of building code like nature builds organisms is of course hugely appealing. I arrived at the same notions independently when younger (post architecture school) but what I missed then and later realized is that the 'natural order' works because of the stupendous scales involved and the number of layers! Sure, maybe we can get software to be "organic" but it will naturally (pi) present the same perplexity to us as do biological systems. Do we actually fully understand how our bodies work?
(Just picking old professional scabs here)
a) Ginger Bill (the Odin language creator, no affiliation) stated on a podcast that Odin will never have an official pkg manager, since what they're, in his opinion, mainly automating is dependency hell, and this being one of the main reasons for rising software complexity and lower software quality; see https://www.youtube.com/watch?v=fYUruq352yE&t=11m26s (timestamped to the correct position) (they mention Rust explicitly as an example)
b) another programmer rather seriously worried about software quality/complexity is Jonathan Blow, whose talk "Preventing the Collapse of Civilization" is worth watching in my opinion: https://www.youtube.com/watch?v=ZSRHeXYDLko (it's not talking about package managers specifically, but is on topic regarding software complexity/quality as a whole)
Addendum: And sorry, I feel like almost everyone knows this xkcd by now, but since no one so far seems to have posted it; "obligatory xkcd reference": https://imgs.xkcd.com/comics/dependency_2x.png
The cognitive dissonance required to believe that Rust preventing you from dereferencing freed memory at compile time is overzealous nannying by the language authors, while at the same time deliberately making code reuse harder for users because they could make engineering decisions he doesn't like, is staggering.
Perhaps this explains why Odin has found such widespread usage and popularity. /s
> b)... Jonathan Blow, who's talk "Preventing the Collapse of Civilization"
With such a grandiose title, before I first watched I thought it must be satire. Turns out, it is food for the credulous. I believe Jonathan Blow is less "seriously worried about software quality/complexity" than he is about marketing himself as the "last great hope". At least Blow's software has found success within its domain. However, I fear Blow's problem is the problem of all intellectuals: “An intellectual is a person knowledgeable in one field who speaks out only in others.” Blow has plenty of opinions about software outside his domain, but IMHO very little curiosity about why his domain may be different than your own.
My own opinion is there is little evidence to show this is a software quality problem, and any assertion that is the case needs to compare the Rust model against the putatively "better" alternatives. Complex software, which requires many people to create, sometimes across great distances of time and space, will necessarily have and require dependencies.
Can someone show me a material quality difference between ffmpeg, VLC, and Samba dependencies and any sufficiently complex Rust program (even which perhaps has many more dependencies)?
~ ldd `which ffmpeg` | wc -l
231
Now, large software dependency graphs may very well be a security problem, but it is a problem widely shared with all other software.
> Perhaps this explains why Odin has found such widespread usage and popularity. /s
What an unnecessarily snarky and dismissive comment to make about someone's work.
- I'd say within a certain niche Odin is becoming well known and gets its use
- you do realize using an `Odin package` is putting a program into a sub-folder and that's it
- It comes with a rich stdlib + vendor libraries out of the box
- and isn't it kind of up to the creators how to design and promote their language
I'd even argue it's laudable a language doesn't promote itself as a "fixes everything use me at all costs" kind of technology. The creator himself tells people it might not be the right tool for them/their use case, encourages them to try other languages too, and sometimes outright tells them Odin doesn't fit their needs and xyz would probably do better.
Odin is pragmatic & opinionated in its language design and goal. Maybe the lack of a package manager is the basis for you to disregard a programming language; for plenty of others (and likely more of Odin's target group) it's the least of their concerns when choosing a language.
The snark was intended, however any dismissiveness concerning Ginger Bill's effort was not. However, when you make a decision like "Odin will never have a package manager", you may be choosing to condemn your project to niche status, in this day and age. Now, niche status is fine, but it definitionally comes with a limited audience. Like "this game will only ever be a text based roguelike."
The total number of lines of code is relevant, sure, but for most practical purposes, compile times and binary sizes are more important.
I don't know the situation in Rust, but in JS land, there's a pretty clear divide between libraries that are tree-shakable (or if you prefer, amenable to dead code elimination) and those that aren't. If you stick to tree-shakable dependencies your final bundled output will only include what you actually need and can be pretty small.
Perhaps for most practical purposes, but not for security, which the article's author seems more concerned with:
> Out of curiosity I ran tokei, a tool for counting lines of code, and found a staggering 3.6 million lines of rust... How could I ever audit all of that code?
Tree-shaking can't help with that.
If you use mostly free functions things will shake out naturally, if you use lots of dynamic dispatch you'll pull in stuff that doesn't get called.
If that does become a problem, there are also techniques like https://github.com/rust-lang/rust/issues/68262 too.
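A rough illustration of that claim (this is a tendency of monomorphization and linker dead-code elimination, not a guarantee; exact results depend on optimization settings):

    trait Codec {
        fn encode(&self, s: &str) -> Vec<u8>;
        fn decode(&self, b: &[u8]) -> String; // never called below
    }

    struct Plain;
    impl Codec for Plain {
        fn encode(&self, s: &str) -> Vec<u8> { s.as_bytes().to_vec() }
        fn decode(&self, b: &[u8]) -> String { String::from_utf8_lossy(b).into_owned() }
    }

    // Static dispatch: only the instances you actually call get
    // monomorphized and kept in the binary.
    fn encode_static<C: Codec>(c: &C, s: &str) -> Vec<u8> { c.encode(s) }

    // Dynamic dispatch: the vtable references every trait method, so
    // `decode` tends to survive in the binary even though nothing calls it.
    fn encode_dyn(c: &dyn Codec, s: &str) -> Vec<u8> { c.encode(s) }

    fn main() {
        println!("{:?}", encode_static(&Plain, "hi"));
        println!("{:?}", encode_dyn(&Plain, "hi"));
    }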
One thing I've observed while managing a mid-sized Rust codebase: cargo does a decent job with versioning, but the long tail of small, redundant crates (often differing only slightly) can still bloat the tree. The lack of a strong ecosystem-level curation layer makes it hard to know which crates are battle-tested vs. weekend hacks.
Maybe it’s time the community seriously considers optional “trust scores” or soft standards (similar to crates.io keywords, but more structured) to guide adoption. Not a gatekeeping mechanism—just more context at decision time.
But you’re still open to typosquatting and similar issues, like crates falling unmaintained - the article mentions the now-famous dotenv vs. dotenvy issue (is this solvable with a more mature governance model for the crates ecosystem? At this point dotenv should probably be reclaimed). So after vendoring a baseline set of dependencies, you still need to perform comprehensive auditing.
Maybe you can leverage LLMs to make that blob of vendored deps smaller / cheaper to own. Maybe you can distill out only the functionality you need (but at what cost, now you might struggle to backport fixes published upstream). Maybe LLMs can help with the auditing process itself.
You need a stream of notifications of upstream fixes to those vendored deps. Unfortunately in the real world the decision making will be harder than “ooh, there’s a sec fix, I should apply that”.
I always wonder why someone like JFrog doesn’t expand their offering to provide “trusted dependencies” or something similar, i.e. you pay to outsource that dependency governance and auditing. Xray scanning in the current product is a baby step toward the comprehensiveness I’m suggesting.
Taking a step back though, I’d be really careful not to throw the baby out with the bath water here. Rust has a fairly unique capability to compose work product from across unrelated developers thanks to its type system (think about what happens with a C library: who’s responsible for freeing the memory, you or me?). Composition at scale is Rust’s superpower, at least in terms of the productivity equation for large enterprises - in this context memory safety is not the sales pitch, since they already have Java or whatever.
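As a rough sketch of that composition point (toy functions, not any particular library's API): in Rust the question "who frees this buffer?" is answered by the signature itself, so two crates that have never heard of each other can still agree on it.

```rust
// Takes ownership: the callee is responsible for freeing the buffer.
fn consume(buf: Vec<u8>) -> usize {
    buf.len()
} // `buf` is dropped (freed) here

// Borrows: the caller keeps ownership and frees the buffer later.
fn inspect(buf: &[u8]) -> usize {
    buf.len()
}

fn main() {
    let data = vec![1, 2, 3];
    let n = inspect(&data); // `data` is still usable afterwards
    let m = consume(data);  // `data` is moved; using it again would not compile
    println!("{n} {m}");
}
```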
This is the cause of so many issues.
And it's not like we're at war or trying to cure the next pandemic; we're writing CRUD apps and trying to convince people to click on ads for crap they don't need.
When has this happened? The only one I remember is the event-stream thing, and that was what, over five years ago? Doesn't seem all that common from what I can see?
A proper 'batteries included' standard library in the language and discouraging using too many libraries in a project.
The same mistakes from the JavaScript community are being repeated in front of us with Cargo (and any other ecosystem that leans on too many libraries).
Tokei hasn't had a stable release in over 4 years and misreports lines of code in some instances. The author in the past has basically said they would need to be paid to backport one line fixes with no merge conflicts that fix real accuracy issues in their software... Bad look in my book.
I am wondering if there is a good modern reference that provides a conceptual overview or comparative study of the various techniques that have been attempted.
It is a hard subject to define as it cuts through several layers of the stack (all the way down to the compiler-system interface layer), and most books focus on one language or build technology rather than providing a more conceptual treatment of the techniques used.
I'm not very familiar with Rust, but all of Go is 1.6M lines of Go code. This includes the compiler, stdlib, tests for it all: the lot.
Not that I doubt the sincerity of the author of course, but maybe some irrelevant things are counted? Or things are counted more than once? Or the download tool does the wrong thing? Or there's tons of generated code (syscalls?)? Or ... something? I just find it hard to believe that some dependencies for web stuff in Rust is twice all of Go.
"A little copying is better than a little dependency." - grab the parts that you need and then include the library only in a test to ensure alignment down the line, an idea I liked a lot.
like, e.g., if we look at sandbox boundaries we have:
- some in-language permission enforcement (e.g. the Java Security Manager) -- this approach turned out to be a very bad idea
- process boundaries, i.e. take the boundary the OS enforces and lock it down more (e.g. with pledge, cgroups, etc.) -- this approach turned out okayish
- VM boundaries (e.g. Firecracker VMs) -- turned out well
- emulation boundaries (e.g. WASM) -- mixed history, can turn out well especially if combined with worker processes which lock themselves down
but what that means in practice is that wanting to reliably sandbox library dependencies will most likely lead to more or less IPC boundaries between the caller and the library
and in practice that makes it unsuited for a lot of things:
e.g. for most utility libs it's very unsuited
e.g. for a lot (but not all) data structure libs it's unsuited and might be a huge issue
e.g. you can apply it to a web server, but then you are basically reinventing CGI/AGI, which is okay but can't quite compete on perf
e.g. you can't apply it to a fundamental runtime engine (e.g. tokio); worse, you now might have one copy of the engine running per sandbox... (but you can apply it to some sub-parts of tokio's internals)
People have tried this a lot in various ways.
But so far these efforts have always died off in the long run.
It would be nice if the latest push based around WASM has some long-term success.
Nobody said it would be easy. As an analogy, the borrow checker makes working with memory much more limited, yet some people like it because it makes things safer.
If you want to use dependencies, I wouldn't be surprised when you realise they also want to use dependencies. But you can put your money/time in the right places. Invest in the dependencies that do things well.
I'm not saying you copy-pasted those 35 lines from dotenvy, but for the sake of argument let's say you did: now you can't automatically benefit from dotenvy patching some security issue in those lines.
- them breaking something
- a supply chain attack
- them making a change which breaks your program
- you having accidentally relied on a bug or an unintended behavior of their code
(which they may fix at any moment)
- many unneeded LOC in your codebase
- absolution of ownership
- relying on a dependency versus having written it yourself
- in the latter case you'll automatically take responsibility
- think much more about code's security/quality
- have the knowledge to fix it and know exactly where to
(in your 35 lines of code that you yourself wrote)
- more burdensome upgrades of your software
- longer compilation speeds
- having to monitor their program
- is it abandoned, ownership transferred to dubious party
- did the maintainer have a late night drunken stupor accepting bad pull requests
- did they react to a CVE or not
- did they change the license
- do they have a license but added their own problematic paragraph
- does the program "develop badly"
(change its target scope in any problematic way)
(take on more and more bloat, more unneeded functionality)
- having worse of an overview of your total dependencies
(since they may themselves rely on further crates you don't expect)
- ...
what's the trade-off now? There's no good solution...
vec4 rgb2hsv(vec4 rbg)
and a few tab-completes later it had filled in the body of the code with a correct color conversion routine. So that saved me searching for and pulling in some big-ass color library. Most of lodash.js can be avoided with LLMs too. Lodash's loops are easier to remember than JavaScript's syntax, but if your LLM just writes the foo.forEach((value, key) => {...}) for you, you can skip the syntactic-sugar library.
It lets security professionals cryptographically vouch for the trustworthiness of rust packages.
It lets security professionals audit rust packages, and cryptographically attest to their trustworthiness.
Isn't the point of a memory-safe language to allow programmers to be sloppy without repercussions, i.e., to not think about managing memory and even to not understand how memory works?
Would managing dependencies be any different? Does Rust allow programmers to avoid thinking carefully about selecting dependencies?
No. The point is that even the best programmers in unsafe languages regularly introduce both simple and subtle bugs into codebases while being careful about handling memory correctly, and therefore we should use languages that don't even allow those bugs for most every use case. Using these languages still allows crap programmers to waste GBs of correctly allocated and handled memory, and good programmers to write tight, resource-sipping code.
Dependencies are orthogonal to this.
You can be _relatively_ sure that you're not introducing memory unsafety by adding a dependency, but you can't be sure that it isn't malware unless you audit it.
Are there systems languages that provide memory management but do not default to using third party libraries. If yes, then do these languages make it easier for programmers to avoid dependencies.
It's the difference between a wet mess and a dry one. Rust creates dry messes. It's still a mess.
This is a problem with all languages and actually an area where Rust shines (due to editions). Your pulled in packages will compile as they previously did. This is not true for garbage collected languages (pun intended).
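For anyone unfamiliar with editions, a small illustration (the legacy crate mentioned is hypothetical): the edition is declared per crate, so a breaking language change like `async` becoming a keyword never invalidates an older dependency's code.

```rust
// Editions are a per-crate setting in Cargo.toml, not a global one.
// A dependency that declares `edition = "2015"` may still contain
//
//     fn async(x: u32) -> u32 { x + 1 }   // fine on edition 2015
//
// because `async` only became a keyword in the 2018 edition. A newer-edition
// binary can depend on that crate unchanged; each crate is compiled under the
// edition it declares.
fn main() {
    println!("editions isolate breaking language changes per crate");
}
```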
> Out of curiosity I ran tokei, a tool for counting lines of code, and found a staggering 3.6 million lines of rust... How could I ever audit all of that code?
Again, another area where Rust shines. You can audit and most importantly modify the code. This is not that easy if you were using Nodejs where the runtimes are behind node/v8 or whatever. You compile these things (including TLS) yourself and have full control over them. That's why Tokio is huge.
JavaScript is backwards compatible going back effectively forever, as is Java. Rust's unique system is having a way to make breaking changes to the language without breaking old code, not that they prioritize supporting old code indefinitely.
The libraries are a different story—you're likely to have things break under you that rely on older versions of libraries when you update—but I don't see Rust actually having solved that.
> You can audit and most importantly modify the code. This is not that easy if you were using Nodejs where the runtimes are behind node/v8 or whatever.
Node and V8 are open source, which makes the code just as auditable and modifiable as the 3.6 million lines of Rust. Which is to say, both are equally unapproachable.
No language can fix that. However, I've lost count of the times my Python/JavaScript program has failed at runtime because of something in one of the dependencies. Usually, it's not a JS/Python problem but rather has to do with a Node/Python version update. It always boils down to the "core" issue, which is the runtime. That's why I like that Rust gives me a "fixed" runtime that I download/compile/package with my program.
> Node and V8 are open source, which makes the code just as auditable and modifiable as the 3.6 million lines of Rust. Which is to say, both are equally unapproachable.
I've recently patched a weird bug under Tokio/Otel and can't imagine doing that with Node/V8 without it being a major hassle. It is relatively straightforward in Rust, though it requires maintaining your own fork of only the dependency/branch in question.
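For readers who haven't done this: Cargo's `[patch]` section is the usual mechanism for swapping in your own fork of just one dependency across the whole build (the fork URL and branch below are made up for illustration).

```toml
# In the top-level project's Cargo.toml (hypothetical fork URL/branch):
# every crate in the graph that depends on tokio now builds against the fork.
[patch.crates-io]
tokio = { git = "https://github.com/yourorg/tokio", branch = "fix-weird-bug" }
```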
What do you mean?