Anyone proclaiming simplicity just hasn't worked at scale. Even rewrites that have a decade-old code base to draw inspiration from often fail due to the sheer number of things to consider.
A classic, Chesterton's Fence:
"There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, “I don’t see the use of this; let us clear it away.” To which the more intelligent type of reformer will do well to answer: “If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.”"
We can even just look at the title here: Do the simplest thing POSSIBLE.
You can't escape complexity when a problem is complex. You could certainly still complicate it even more than necessary, though. Nowhere in this article is it saying you can avoid complexity altogether, but that many of us tend to over-complicate problems for no good reason.
I think the nuance here is that “the simplest thing possible” is not always the “best solution”. As an example, it is possible to solve very many business or operational problems with a simple service sitting in front of a database. At scale, you can continue to operate, but the amount of man-hours going into keeping the lights on can grow exponentially. Is the simplest thing possible still the DB?
Complexity is more than just the code or the infrastructure; it needs to run the entire gamut of the solution. That includes looking at the incidental complexity that goes into scaling, operating, maintaining, and migrating (if a temporary ‘too simple but fast to get going’ stack was chosen).
Measure twice, cut once. Understand what you are trying to build, and work out a way to get there in stages that provide business value at each step. Easier said than done.
Edit: Replies seem to be getting hung up over the “DB” reference. This is meant to be a hypothetical where the reader infers a scenario of a technology that “can solve all problems, but is not necessarily the best solution”. Substitute for “writing files to the file system” if you prefer.
The problem is knowing when to do it and when not to do it.
If you're even the slightest bit unsure, err on the side of not thinking a few steps ahead because it is highly unlikely that you can see what complexities and hurdles lie in the future.
In short, it's easier to unfuck an under engineered system than an over engineered one.
And they followed the alternative with Itanium, and look how that turned out.
Also maybe simplicity is sometimes achieved AFTER complexity, anyway. I think the article means a solution that works now... target good enough rather than perfect. And the C2 wiki (1) has a subtitle '(if you're not sure what to do yet)'. In a related C2 wiki entry (2) Ward Cunningham says: Do the easiest thing that could possibly work, and then pound it into the simplest thing that could possibly work.
IME a lot of complexity is due to integration (in addition to things like scalability, availability, ease of operations, etc.) If I can keep interfaces and data exchange formats simple (independent, minimal, etc.) then I can refactor individual systems separately.
1. https://wiki.c2.com/?DoTheSimplestThingThatCouldPossiblyWork
The trouble is by the time you get there you will discover the problem isn't what you expected and it will all have been wasted effort.
The most fundamental issue I have witnessed with these things is that people have a very hard time taking a balanced view.
For this specific problem, should we invest in a more robust solution which takes longer to build or should we just build a scrappy version and then scale later?
There is no right or wrong. It depends heavily on the context.
But, some people, especially developers I am afraid, only have one answer for every situation.
The problem quickly becomes "how do you route it", and that's where we end up with something like today's IPv6. Route aggregation and PI addresses are impractical with IPv4 + extra bits.
The main change from v4 to v6, besides the extra bits, is that some unnecessary complexity was dropped, which in the end is a net positive for adoption.
Apart from that, IPv6 _is_ IPv4 with a bigger address space. It's so similar it's remarkable.
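Just to put numbers on "bigger address space" (a quick back-of-the-envelope comparison, nothing more):

```python
# Rough size comparison of the two address spaces.
ipv4_addresses = 2 ** 32     # about 4.3 billion
ipv6_addresses = 2 ** 128    # about 3.4 * 10^38

print(f"IPv4:  {ipv4_addresses:,}")
print(f"IPv6:  {ipv6_addresses:.3e}")
print(f"ratio: {ipv6_addresses // ipv4_addresses:.3e}")
```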
As is, IPv4's simplicity got us incredibly far, and it turns out NAT and CIDR have been quite effective at alleviating address exhaustion. With some address reallocation and future protocol extensions, it's looking entirely possible that a successor was never needed.
"could possibly work" is clearly hyperbole as it would only exclude solutions that are guaranteed to fail.
But even under a more plausible interpretation, this slogan ignores the cost of failure as an independent justification for adding complexity.
It's bad advice.
Hell, I just spent a week doing something which should've taken 5 minutes because, rather than a settings database, someone has just been maintaining a giant ball of copy+pasted terraform code instead.
Adding the runtime complexity and maintenance work for a new database server is not a small decision.
Do you handle one "everything is perfect" happy path, and use a manual exception process for odd things?
Do you handle "most" cases, which is more tech work but shrinks the number of people you need handling one-off things?
Or do you try to computerize everything no matter how rare?
Bring the problem back to our primary contact and they've got no clue what to do. They're on like year 2 of a 7 year contract and they've just discovered that their payroll department has been interpreting the ambiguous rules somewhat randomly. No one wants to commit to an interpretation without a memorandum of understanding from the union, and no one wants to start the process of negotiating that MoU because it's going to mean backdating 2 years of payroll for an unknown number of employees, who may have been affected by it one month but not the next, depending on who processed their paystub that month.
That was fun :D
Not strictly for technical reasons, but definitely for political reasons. The client was potentially the largest organization in my province (state-run healthcare). Outsourcing payroll and scheduling with the potential of breaking the rules in the contracts with the multiple union stakeholders was a completely non-starter. Plus the idea of needing to do lay offs within the payroll department was pretty unpalatable.
If you overpay someone… getting that money back is a challenge.
To make it more complicated still, there was an element of “we’re not sure if we overpaid or underpaid” but there was also an element of “we gave person X an overtime shift but person Y was entitled to accept or deny that shift before person X would have even had an opportunity to take it”. That’s even harder to compensate for.
The programmer's mind is the faithful ally of the perfect in its war waged against the good enough.
The "best" solution for most people that have a problem is the one they can use right now.
The one you can use right now in order to get feedback from real world use, which will be much better at guiding you in improving the solution than what you thought was "best" before you had that feedback.
Real world feedback is the key. Get there as quickly as feasible, then iterate with that.
> As an example, it is possible to solve very many business or operational problems with a simple service sitting in front of a database.
If this is the simplest approach within the problem space or business's constraints, and meets the understood needs, it may indeed be the right choice.
> At scale, you can continue to operate, but the amount of man-hours going into keeping the lights on can grow exponentially. Is the simplest thing possible still the DB?
No problem in a dynamic human system can be solved statically and left alone. If the demands on a solution grow, and the problem space or business's needs change, then the solution should be reassessed and the new conditions solved for.
Think of it alternatively as resource-constrained work allocation, or agile problem solving. If we don't have enough labor available (and we rarely do) to solve everything "best," then we need to draw a line. Decades of practice now have shown that it's a crap shoot to guess at the shape of levels of complexity down the road.
Best case, you spend time that could have gone into something else valuable today on solving a problem for a year from now; worst case, you get the assumptions wrong, fail to solve that second "today" problem, and still need to spend future time on refactoring.
Don't worry, the second half of the title has this covered:
> ... that could possibly work
In the scenario you've described, the technology is not working, in the complete sense including business requirements of reasonable operating costs.
Perhaps it really did work at first, in the complete sense, when the number of users was quite small. That's where the actual content of the article kicks in: it suggests you really do use that simple solution, because maybe you'll never need to scale after all, or you'll need to rewrite everything by then anyway, or you'll have access to more engineering talent by then, etc. I'd tend to agree, but with the caveat that you should feel free to break the rule so long as you're doing it consciously. But none of that implies that you should end up in the situation you described.
This is where I am arguing nuance. These decisions are contextual; and the superficially more complicated solution may be solving inherent complexity in the problem space that only provides benefit over a time period.
As an example, some team might decide to forgo a database and read/write directly to the file system. This may enable a release in less time and that might be the right decision in certain contexts. Or it could be a terrible decision as the externalised costs begin to manifest and the business fails because of loss of customer trust.
My point is that you cannot only look at what is right in front of you, you also need to tactically plan ahead. In the big org context, you also need to strategically plan ahead.
>In the scenario you've described, the technology is not working, in the complete sense including business requirements of reasonable operating costs.
In the parent comment's reasonable premise, they wouldn't be sure of what they would need.
When this is accounted for, “the simplest thing” approaches “the best solution”.
It's making an ambitious risky claim (make things simpler than you think they need to be) then retreating on pushback to a much safer claim (the all-encompassing "simplest thing possible")
The statement ultimately becomes meaningless because any interrogation can get waved away with "well I didn't mean as simple as that."
But nobody ever thinks their solution is more complex than necessary. The hard part is deciding what is necessary, not whether we should be complex.
You can keep on doing the simplest thing possible and arrive at something very complex, but the key is that each step should be simple. Then you are solving a real problem that you are currently experiencing, not introducing unnecessary complexity to solve a hypothetical problem you imagine you might experience.
In other words, each time you optimize only locally and in a single dimension, potentially walking very far away from a global optimum. I have worked on such systems before. Every single step, in and of itself, was simpler (and also faster, less work) than doing a refactoring (to keep the overall resulting system simple), so we never dared do the latter. Unfortunately, over time this meant that every new step would incur additional costs due to all the accidental complexity we had accumulated. Time to finally refactor and do things the right way, right? No. Because the costs of refactoring had also kept increasing with every additional step we took, and every feature we patched on. At some point no one really understood the whole system anymore. So we just kept on piling things on top of each other and prayed they would never come crashing down on us.
Then one day, business decided the database layer needed to be replaced for licensing reasons. Guess which component had permeated our entire code base because we never got around doing that refactoring and never implemented proper boundaries and interfaces between database, business and view layer. So what could have been a couple months of migration work, ended up being more than four years of work (of rewriting the entire application from scratch).
Your definition rubs up against what a UX designer taught me years ago, which is that simple and complex are one spectrum, similar to but different from easy and hard.
Often, simple is confused for easy, and complex for hard. However, simple interfaces can hide a lot of information in unintuitive ways, while complex interfaces can present more information and options up front.
The main argument I've seen against this strategy of design is concern over potentially needing to make breaking changes. But in my experience, it's a lot easier to come up with a simple design that solves most of the common cases while leaving design space for future work on the more niche cases (without breaking existing functionality) than it is to anticipate every possible case up front. After a certain point, our confidence in our predictions dips low enough that I think it's smarter to bet on your ability to avoid locking yourself into a choice that would be breaking to change later, than to bet on making the correct choice based on those predictions.
And to address something the GP said:
> I am still shocked by the required complexity
Some of this complexity becomes required through earlier bad decisions, where the simplest thing that could possibly work wasn't chosen. Simplicity up front can reduce complexity down the line.
I think you're focusing on weasel words to avoid addressing the actual problem raised by OP, which is the elephant in the room.
Your limited understanding of the problem domain doesn't mean the problem has a simple or even simpler solution. It just means you failed to understand the needs and tradeoffs that led to complexity. Unwittingly, this misunderstanding originates even more complexity.
Listen, there are many types of complexity. Among which there is complexity intrinsic to the problem domain, but there is also accidental complexity that's needlessly created by tradeoffs and failures in analysis and even execution.
If you replace an existing solution with a solution which you believe is simpler, odds are you will have to scramble to address the impacts of all tradeoffs and oversights in your analysis. Addressing those represents complexity as well, complexity created by your solution.
Imagine a web service that has autoscaling rules based on request rates and computational limits. You might look at request patterns and say that this is far too complex: you can just manually scale the system with enough room to handle your average load, and when required you can just click a button and rescale it to meet demand. Awesome work, you simplified your system. Except your system, like all web services, experiences seasonal request patterns. Now you have schedules and meetings and even incidents that wake up your team in the middle of the night. Your pager fires because a feature was released and you didn't quite scale the service to accommodate the new peak load. So now your simple system requires a fair degree of hand-holding to work with any semblance of reliability. Is this not a form of complexity as well? Yes, yes it is. You didn't eliminate complexity; you only shifted it to another place. You saw complexity in autoscaling rules and believed you eliminated that complexity by replacing it with manual scaling, but you only ended up shifting that complexity somewhere else. Why? Because it's intrinsic to the problem domain, and requiring more manual work to tackle that complexity introduces more accidental complexity than what is required to address the issue.
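To make the trade-off concrete, here is a minimal sketch of the kind of autoscaling rule being dismissed as "too complex" (a hypothetical function; names and thresholds are invented for illustration and are not any particular platform's API):

```python
import math
from dataclasses import dataclass

@dataclass
class ScalingDecision:
    desired_replicas: int
    reason: str

def decide_replicas(current_replicas: int,
                    requests_per_second: float,
                    cpu_utilization: float,
                    target_rps_per_replica: float = 200.0,
                    target_cpu: float = 0.6,
                    min_replicas: int = 2,
                    max_replicas: int = 50) -> ScalingDecision:
    """Pick a replica count from request rate and CPU pressure.

    Deleting logic like this doesn't delete the underlying problem
    (seasonal and spiky load); it just hands the same work to a human
    with a pager.
    """
    by_rps = requests_per_second / target_rps_per_replica
    by_cpu = current_replicas * (cpu_utilization / target_cpu)
    desired = math.ceil(min(max(by_rps, by_cpu, min_replicas), max_replicas))
    return ScalingDecision(desired, "rps" if by_rps >= by_cpu else "cpu")

# A traffic spike that a fixed, manually chosen size would miss:
print(decide_replicas(current_replicas=4,
                      requests_per_second=3200,
                      cpu_utilization=0.85))
```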
An example I encountered was someone taking the "KISS" approach to enterprise reporting and ETL requirements. No need to make a layer between their data model and what data is given to the customers, and no need to make a separate replica of the server or db to serve these requests, as those would be complex.
This failed in so many ways I can't count. The system instantly became deeply ingrained in all customer workflows, but hundreds of non-technical users connected to it via PowerBI with bespoke reports. If an internal column name or the structure of the data model changed so that devs could evolve the platform, users just got a generic "Query Failed" error and lit up the support team. Technical explanations about needing to modify their query were totally not understood by the end users; they just wanted the dev team to fix it. There was also no consideration whatsoever of pagination, request complexity limiting, indexes, request rate limiting, etc., because those were not considered simple. But those cannot be added without breaking changes, because a non-tech user will not understand what to do when their report in Excel gets rate-limited on 29 of the 70 queries they launch per second. No concern about taking prod OLTP databases down with OLAP workloads overloading them.
All in all, that system was simple and took about 2 weeks to build, was rapidly adopted into critical processes, and the team responsible left. It took the remaining team members a bit over 2 years to fix it by redesigning it and hand-holding non-technical users all the way down to fixing their own Excel sheets. It was a total nightmare caused by wanting to keep things simple, when really this needed: heavy abstraction models, database replicas, infrastructure scaling, caching, rewriting lots of application logic to make data presentable where needed, index tuning, automated generation of large datasets for testing, building automated tests for load testing, release process management, versioning strategies, documentation and communication processes, and deprecation policies. They thought that we could avoid months of work and keep it simple and instead caused years of mess because making breaking changes is extremely difficult once you have wide adoption.
>They thought that we could avoid months of work and keep it simple and instead caused years of mess because making breaking changes is extremely difficult once you have wide adoption.
Right. Do you think a middle ground was possible? Say, a system that took 1 month to build instead of two weeks, but with a few more abstractions to help with breaking changes in the future.
Thanks for sharing your experience btw, always good to read about real world cases like this from other people.
I don't think this is an adequate interpretation. Quick time to market doesn't mean the half-baked MVP is the end result.
An adequate approach would be to include work on introducing the missing abstraction layer as technical debt to be paid right after launch. You deliver something that works in 2 weeks and then execute the remaining design as follow-up work. This is what technical debt represents, and why the "debt" analogy fits so well. Quick time to market doesn't force anyone to put together half-assed designs.
There’s a HUGE difference between the simplest thing possible, and the simplest thing that could possibly work.
The simplest thing that could possibly work conveniently lets you forget about the scale. The simplest thing possible does not.
This is different from adding pointless complexity that doesn't help solve the problem but exists only because it is established 'best practice' or 'because Google does it that way'. I've seen this many more times than complex software where the complexity is actually required. And such needlessly complex software is also usually a source of countless day-to-day problems (if it makes its way out the door in the first place), while the 'simplistic' counterpart usually just hums along in the background without anybody noticing - and if there's a problem, it's easy to fix because the codebase is simple and easy to understand by anybody looking into the problem. Of course, after 20 years of such changes, the originally simple code base may also grow into a messy hairball, but at least it's still doing its thing.
If it doesn't, go from there whether you need to find an alternative or add another layer of complexity.
I think when complexity does build, it can snowball when a crew comes along and finds more than could be addressed in a year or two. People have to be realistic that there's more than one way to address it. For one it could be a project to identify and curtail existing excess complexity, another approach is to reduce the rate of additional complexity, maybe take it to the next level and completely inhibit any further excess, or any additional complexity of any kind at all. Ideally all of the above.
Things are so seldom ideal, and these are professionals ;)
No matter what, the most pressing requirement is to earnestly begin mastering the existing complexity to a pretty good extent before it could possibly be addressed by a neophyte. That's addressing it right there. Bull's-eye in fact.
Once a good amount of familiarity is established this could take months to a year(s) in such a situation. By this point a fairly accurate awareness of the actual degree of complexity/debt can be determined, but you can't be doing nothing this whole time. So you have naturally added at least some debt most likely, hopefully not a lot but now you've got a better handle on how it compares to what is already there. If you're really sharp the only complexity you may add is not excess at all but completely essential and you make sure of it.
Now if you keep carrying on you may find you have added some complexity that may be excess yourself, and by this time you may be in a situation where that excess is "insignificant" compared to what is already there, or even compared to what one misguided colleague or team might be routinely erecting on their own. You may even conclude that the only possible eventual outcome is to topple completely.
What you do about it is your own decision as much as it can be, and that's most often the way it is bound to always increase in most organizations, and never come down. So that's the most common way it's been addressed so far, as can be seen.
If the software base is full of gotchas and unintended side-effects then the source of the problem is in unclean separation of concerns and tight coupling. Of course, at some point refactoring just becomes an almost insurmountable task, and if the culture of the company does not change more crap will be added before even one of your refactorings land.
Believe me, it's possible to solve complex problems by clean separation of concerns and composability of simple components. It's very hard to do well, though, so lots of programmers don't even try. That's where you need strict ownership of seniors (who must also subscribe to this point of view).
Sometimes the problem is in the edges—the way the separate concerns interact—not in the nodes. This may arise, for example, where an operation/interaction between components isn't idempotent because the need for it to be never came up.
Again, wrong design. Like I said, it's very difficult to do well. Consider alternate architecture: one component adds the bulk data to request, the second component modifies it and adds other data, then the data is sent to transaction manager that commits or fails the operation, notifying both components of the result.
Now, if the first component is one k8s container already writing to the database and second is then trying to modify the database, rearchitecting that could be a major pain. So, I understand that it's difficult to do after the fact. Yet, if it's not done that way, the problem will just become bigger and bigger. In the long run, it would make more sense to rearchitect as soon as you see such a situation.
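A toy sketch of that alternate shape (all names invented; this is only meant to show both components handing their data to a single transaction manager instead of each writing to the database directly):

```python
from typing import Callable

class TransactionManager:
    """Single place that commits or fails, then notifies every participant."""

    def __init__(self, commit: Callable[[dict], None]):
        self._commit = commit
        self._observers: list[Callable[[bool, dict], None]] = []

    def register(self, on_result: Callable[[bool, dict], None]) -> None:
        self._observers.append(on_result)

    def submit(self, request: dict) -> bool:
        try:
            self._commit(request)        # the only code that touches storage
            ok = True
        except Exception:
            ok = False
        for notify in self._observers:   # both components learn the outcome
            notify(ok, request)
        return ok

def bulk_component(request: dict) -> dict:
    request["rows"] = [{"id": i, "value": i * i} for i in range(3)]
    return request

def enrichment_component(request: dict) -> dict:
    request["source"] = "importer"
    request["row_count"] = len(request["rows"])
    return request

def write_to_store(request: dict) -> None:
    # Stand-in for the real database write.
    print("committing", request["row_count"], "rows from", request["source"])

tm = TransactionManager(commit=write_to_store)
tm.register(lambda ok, req: print("bulk component notified:", ok))
tm.register(lambda ok, req: print("enrichment component notified:", ok))

tm.submit(enrichment_component(bulk_component({})))
```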
Do you know how you get such a system? When you start with a simple system and instead of redesigning it to reflect the complexity you just keep the simple system working while extending it to shoehorn the features it needs to meet the requirements.
We get this all the time, especially when junior developers join a team. Inexperienced developers are the first ones complaining about how things are too complex for what they do. More often than not that just reflects opinionated approaches to problem domains they have yet to understand. Because all problems are simple once you ignore all constraints and requirements.
Shoehorning things into working systems is something I have seen juniors do. I have also seen "seniors" do this, but in my view, they are still juniors with more years working on the same code base.
I have once heard it described as "n years of 1 year of experience". In other words, such a person never learns that the program design space must continuously be explored, and that recurrence of bugs in the same part of the code usually means that a different design is required. They never learn that the cause of the bug was not the particular change that triggered the unintended side effect, but that the fact that there is a side effect at all is a design bug of its own.
I do agree, though, that TFA may be proposing sticking with simpler design for longer than advisable.
> Anyone proclaiming simplicity just hasn't worked at scale
I've worked in startups and large tech organizations over decades and indeed, there are definitely some problems in those places that are hard.
That said, in my opinion, the majority of technical solutions were over engineered and mostly waste.
Much simpler, more reliable, more efficient solutions were available, but inappropriately dismissed.
My team was able to demonstrate this by producing a much simpler system, deploying it and delivering it to many millions of people, every day.
Chesterton's fence is great in some contexts, especially politics, but the vast majority of software is so poorly made, it rarely applies IMO.
I also worked at some quite large organizations, with quite large services that easily took 10x to 50x longer to ship than they would at a smaller org.
Most of the time people were mistaking complexity caused by bad decisions (tech or otherwise) for "domain complexity" and "edge cases", and refusing to acknowledge that things are now harder because of those decisions. Just changing the point of view makes it simple again, but then you run into internal politics.
With microservices especially, the irony was that it was mostly the decisions justified as being done to "save time in the future" that ended up generating the most amount of future work, and in a few cases even problems around compliance and data sovereignty.
Mostly it is not like a movie where you hand pick the team for the job.
Usually you have to play the cards you're dealt, so you take whatever your team is comfortable building.
Which in the end is dealing with emotions, people ambition, wishes.
I have seen stuff gold plated just because one vocal person was making a fuss. I have seen good ideas blocked just because someone wanted to feel important. I have seen teams who wanted to „do proper engineering” but thought over-engineering was the proper way, and that anything less than gold plating makes them look like amateurs.
Complexity is a learned engineering approach - it takes practice to learn to do it another way. So if all you see is complex solutions how would you learn otherwise?
I have worked at scale. I have found examples where simple solutions prevail due to inertia and an inability or unwillingness to acknowledge that the simple solution failed to adequately address the requirements. The accidental complexity created by those simple solutions is downplayed because acknowledging it would require reevaluating the simple solution, and thus runbooks and operations and maintenance work become part of your daily routine because that's just how the system is. And changing it would be too costly.
Let's not fool ourselves.
Yep - this is why it's a silly comment to make. Now we are where we are, given we didn't qualify the conversation as being for "big scale engineers" only.
How did those replacements go? Or were you just hoping for the opportunity?
> Look at his CV. Tiny (but impactful) features ///building on existing infrastructure which has already provably scaled to millions and likely has never seen beneath what is a rest api and a react front end///
Off the top of my head he wrote the socket monitoring infrastructure for Zendesk’s unicorn workers, for example.
I certainly don’t agree with everything Sean says and admit that “picking the most important work” is a naive thing to say in most scenarios.
But writing Python in production is trivial. Why would anyone lie about that? C is different OTOH. But just because you do a single config change and get paid for that doesn’t mean it’s true for everyone.
Also, staff at GitHub requires a certain bar of excellence. So I wouldn’t blindly dismiss everything just out of spite.
And you're right about the amount of engineering that goes into solving problems. One service adjacent to my patch was more than a decade old. Was on a low TPS but critical path for a key business problem. Had not been touched in years. Hadn't caused a single page in that decade, just trudged along, really solidly well engineered service. Somebody suggested we re-write it in a modern architecture and language (it was a kind of mini-monolith in a now unfashionable language). Engineering managers and principals all vetoed that, thank goodness - would have been 5+ years of pain for zero upside.
Simple stuff had tons of long-term advantages and benefits - it's easy to ramp up new folks on it compared to some over-abstracted hypercomplex system built because some lead dev wanted to try new shiny stuff for their CV or out of boredom. It's easy to debug, migrate, evolve and just generally maintain, something pure devs often don't care much for unless they become more senior.
Complex optimizations are for sure required for extreme performance or massive public web but that's not the bulk of global IT work done out there.
The complexity comes from the fact that at scale, the state space of any problem domain is thoroughly (maybe totally) explored very rapidly.
That's a way bigger problem than system complexity, and pretty much any system complexity is usually the result of edge cases that need to be solved rather than bad architecture, infrastructure or organisational issues - those problems are only significant at smaller, inexperienced companies. By the time you are post-scale (if the company survives that long), state space exploration in implementation (features, security, non-stop operations) is where the complexity is.
At the scale you are mentioning, even "simple" solutions must be very sophisticated and nuanced. How does this transformation happen naturally from an engineer at a startup where any mainstream language + Postgres covers all your needs, to someone who can build something at Google scale?
Let's disregard the grokking of system design interview books and assume that system design interviews do look at real skills instead of learning common buzzwords.
I built a hobby system for anonymously monitoring BitTorrent by scraping the DHT. In doing this, I learned how to build a little cluster and how to handle 30,000 writes a second (which I used Cassandra for - this was new to me at the time), then built simple analytics on it to measure demand for different media.
Then my interview was just talking about this system, how the data flowed, where it can be improved, how is redundancy handled, the system consisted of about 10 different microservices so I pulled the code up for each one and I showed them.
Interested in astronomy? Build a system to track every star/comet. Interested in weather? Do SOTA predictions. Interested in geography? Process the open source global gravity maps. Interested in trading? Build a data aggregator for a niche.
It doesn't really matter whether whatever you build "is the best in the world or not" - the fact that you built something, practiced scaling it with whatever limited resources you have, were disciplined enough to take it to completion, and didn't get stuck down some rabbit hole endlessly re-architecting stuff that doesn't matter: this is what they're looking for - good judgement, discipline, experience.
Also, attitude is important - like really, really important. Some cynical ranter is not going to get hired over the "that's cool, I can do that!" person, even if the cynical ranter has greater engineering skills; genuine enthusiasm and genuine curiosity are infectious.
There are steps that most take. Start with caching. Then you learn about caching strategies because the cache gets slow. Then you shard the database and start managing multiple database connections and readers and writers. Then you run into memory, cpu, or i/o pressure. Maybe you start horizontally scaling. Connections and file descriptors have limits you learn about. Proxies might enter your lexicon. Monitoring, alerting, and testing all need improvement. And recently teams are getting harder to manage and projects are getting slower. Maybe deploying takes forever. So now we break up into different domains. Core backend, control panel, compliance, event processing, etc.
As the org grows and continues to change, more and more stakeholders appear. Security, API design, different cost modeling, product and design, and this web of stakeholders all have competing needs.
Go back to my opening stanza. Rinse and repeat.
Doing this exposes patterns and erroneous solutions. You work to find the least complex solution necessary to solve the known constraints. Simple is not easy (great talk, look it up). The learnings from these battle scars are what make a staff-level engineer, methinks. You gain stories and tools for delivering solutions that work for increasingly larger systems and organizations. I recently was the technical lead for a 40-team software project. I gained some more scars and learnings.
An expert is someone who has made and learned from many mistakes in a narrow field. Those learnings and lessons get passed down in good system design interview books, like Designing Data Intensive Applications.
Those are things that matter and can't be brushed away though.
What Conway's law describes is also optimization of the software to match the shape in which it can be developed and maintained with the least friction.
Same for infra: complexity induced by it shouldn't be simplified unless you also simplify/abstract the infra first.
As for Chesterton's Fence, you have the causality backwards. You should not build a fence or gate before you have a need for it. However, when you encounter an existing fence or gate, assume there must have been a very good reason for building it in the first place.
This isn't to say you should never try to refactor or improve things, but make sure that it's going to work for 100% of your use cases, that you're budgeted to finish what you start, and that it can be done iteratively with the result of each step being an improvement on the previous.
And that's usually because the person or small group that began the refactor wasn't given the time and resources to do it: uninterested or unknowledgeable people hijacked and over-complicated the process, others blocked it from happening, and the initial team was pulled in 10 different directions to fight other fires. What would have taken that team a few weeks to complete successfully, with a little help and cooperation from others, instead dragged on for months and months. After expending tons of time and money on people mucking it up instead of fixing it, the refactor got abandoned, a million dollars was wasted, and the system as a whole was worse than it was before.
No one can predict from the get-go how efficacious that attempt will be. Often, people eventually find out that their assumptions were too naive or that they don't have enough budget to push it to completion.
Successful refactoring attempts start small and don’t try to change the universe in a single pass.
In most of these cases, a few days up front exploring edge cases would have identified the problems and likely would have red lighted the project before it started. It can make you feel like a party pooper when everyone is excited about the new approach, but I think it's important that a few people on the team are tasked with identifying these edge cases before greenlighting the project. Also, maybe productionize your easiest case first, just to get things going, but then do your hardest case second, to really see if the benefits are there, and designate a go/rollback decision point in your schedule.
Of course, such problems can come up in any project, but from what I've seen they tend to be more catastrophic in refactoring/rearchitecting projects. If nothing else, because while unforeseen difficulties can be hacked around for new feature launches, hacking around problems completely defeats the purpose of a refactoring project.
Obviously a bit hyperbolic, but matches my experience.
I prefer the way Einstein said it (or at least I've heard it attributed to him, not sure if he actually said it): "Make things as simple as possible, but no simpler".
Sounds to me like we need to distinguish between simplicity of the individual diff, and simplicity of the end result (i.e. the overall code base after applying the diff). The former is a very one-dimensional and local way of optimization, which over time can lead you far away from a global optimum.
Like yes, everyone knows that if you want to index the whole internet and have tens of thousands of searches a second there are unique challenges and you need some crazy complexity. But if you have a system that has 10 transactions a second...you probably don't. The simple thing will probably work just fine. And the vast majority of systems will never get that busy.
Computers are fast now! One powerful server (with a second powerful server, just in case) can do a lot.
With today's computers, indexing the entire internet and serving 100k QPS also isn't really that demanding architecturally. The vast majority of current implementation complexity exists for reasons other than necessity.
So although a single server goes a long way, to hit that sweet 99.999 SLA, people horizontally scale way before hitting the maximum compute capacity of a single machine. HA makes everything way more difficult to operate and reason about.
What is far more likely is the proverbial "JS framework problem:" gah, this technology that I read about (or encounter) is too complex, I just want 1/10th that I understand from casually reading about it, so we should replace it with this simple thing. Oh, right, plus this one other thing that solves a problem. Oh, plus this other thing that solves this other problem. Gah, this thing is too complex!
It’s not the same as introducing complexity to keep yourself employed, but the result is the same and so is the cause - incentive structures aren’t aligned at most companies to solve problems simply and move on.
https://inthesetimes.com/article/capitalism-job-bullshit-dav...
I'm not saying that these jobs are bullshit in the same way that a VP of box-ticking is, just that it's not a conspiracy that a cathedral based on 'design-doc culture' might produce incentives that result in people who focus on maximising their performance on these fiscally rewarding dot points, rather than actualising their innate belief in performant and maintainable systems.
I work at a start-up so if my code doesn't run we don't get paid. This motivates me to write it well.
Most projects don't operate at scale. And before "at scale", simple, rewritable code will always evolve better, because it's less dense, and less spread out.
There is indeed a balance between the simplest code, and the gradual abstractions needed to maintain code.
I worked with startups, small and medium sized businesses, and with a larger US airline. Engineering complexity is through the roof when it doesn't have to be, and it didn't have to be on any of the projects I've seen and worked on.
Now if you're an engineer in some mega corp, things could be very different, but you're talking about the 1% there. If not less.
A former project comes to mind that had a codec system for serializing objects built on Scala implicits. It involved a significant amount of internal machinery, just to avoid writing 5 toString methods. And it made it so that changing imports could break significant parts of the project in crazy ways.
It's possible nobody at the beginning of the project knew they would only have 5 of these objects (if they had 5 at the beginning, how many would they have later?), but I think that comes back to the article's point. There are often significantly simpler solutions that have fewer layers of indirection, and will work better. You shouldn't reach for complexity until you need it.
* They might be serving twice as much (but definitely not ten times as much) as they were in 2005 but mostly that scales horizontally very easily.
I wish I could remember or find the proof, but in a multi-dimensional space, as the number of dimensions rises, the highest probability is for points to be located near the edges of the system -- with the limit being that they can be treated as if they all live at the edges. This is true for real systems too -- the users have found all of the limits but avoid working past them.
The system that optimally accommodates all of the edges at once is the old system.
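For the curious, here is a sketch of the usual volume argument (it may not be the exact proof being recalled above): for points uniform in a d-dimensional unit cube, the fraction lying within ε of the boundary is 1 - (1 - 2ε)^d, which approaches 1 as d grows.

```python
# Fraction of a d-dimensional unit cube lying within eps of its boundary:
# 1 - (1 - 2*eps)**d, which tends to 1 as d grows, i.e. at high dimension
# almost every point is "near an edge".
eps = 0.05
for d in (1, 2, 3, 10, 50, 100):
    near_edge = 1 - (1 - 2 * eps) ** d
    print(f"d={d:3d}  fraction near boundary: {near_edge:.3f}")
```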
It's not really meaningful though, at high dimensions you want to consider centrality metrics.
Business incentives are aligned around incremental delivery, not around efficient encoding of the target domain. The latter generally requires deep architectural iteration, meaning multiple complete system overhauls and/or rewrites, which by now are even vilified as a trope.
Mostly, though, I think there is just availability bias here. The simple, solid systems operating at scale and handled by a 3-person team are hard to notice over the noise that naturally arises from a 1,000-person suborganization churning on the same problem. Naturally, more devs will only experience the latter, and due to network effects, funding is also easier to come by.
See also: Google engineering practices: https://google.github.io/eng-practices/review/reviewer/looki...
And also: https://goomics.net/316
Is this really because the single problem is inherently difficult, or because you're trying to solve more than one problem (scope creep) due to a fear of losing revenue? I think a lot of complexity stems from trying to group disparate problems as if they can have a single solution. If you're willing to live with a smaller customer base, then simple solutions are everywhere.
If you want simple solutions and a large customer base, that probably requires R&D.
Of course marketing and sales working hard to convince customers that they need more of everything, all the time, doesn't help.
You're not wrong. So many engineers operating in simple domains, on MVPs that don't have scale yet, on internal tools even. They introduce so much complexity thinking they're making smart moves.
Product people can be similar, in their own way. Spending lots of time making onboarding perfect when the feature could do less, cater for 95% of use cases, and need no onboarding at all.
I don't know if you only have genius friends, but I can tell you many stories of things people thought warranted complexity that I thought didn't. So whatever you consider hard enough to warrant complexity, just know there's a guy smarter than you thinking you're spinning your wheels.
Also, it's an impossible conversation to have without specific examples. Anyone can make a handwavy case for always simplifying, and someone else can make a case for necessary complexity, but without specific examples neither can be proven wrong.
You're doing it wrong. More likely than not.
> Anyone proclaiming simplicity just hasn't worked at scale. Even rewrites that have a decade-old code base to draw inspiration from often fail due to the sheer number of things to consider.
Or, you're just used to excusing complexity because your environment rewards complexity and "big things".
Simple is not necessarily easy. Actually simple can be way harder to think of and push for, because people are so used to complexity.
Yes. Massive scale and operations may make things harder, but seeking simplicity is still the right choice, and "working in big tech" is not a particularly hard or rare credential on HN. Try an actual argument instead of an appeal to self-authority.
Maybe some of the edge cases only apply to 2% of the customers? Could these customers move to a standard process? And what’s the cost of implementing, testing, integrating and maintaining these customer-specific solutions?
This has actually been the best solution for me to reduce complexity in my software, by talking to customers and business analysts… and making the complexity very transparent by assigning figures to it.
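As a tiny illustration of "assigning figures to it" (all numbers invented, purely to show the shape of the conversation):

```python
# Hypothetical figures for one edge case that only ~2% of customers need.
customers = 500
affected_customers = int(customers * 0.02)      # 10 customers
annual_revenue_per_customer = 4_000

build_hours = 120
yearly_maintenance_hours = 40
hourly_cost = 90

first_year_cost = (build_hours + yearly_maintenance_hours) * hourly_cost
revenue_at_risk = affected_customers * annual_revenue_per_customer

print(f"first-year cost of the edge case: {first_year_cost:,}")   # 14,400
print(f"annual revenue at risk:           {revenue_at_risk:,}")   # 40,000
```

Putting the two numbers side by side is usually enough to start a much more productive conversation than "this is too complex".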
I think the unspoken part here is “let’s start with…”
It doesn’t mean you won’t have to “do all the things” so much as let’s start with too little so we don’t waste time doing things we end up not needing.
Once you aggregate all the simple things you may end up with a complex behemoth but hopefully you didn’t spend too much time on fruitless paths getting there.
Case in point: when I joined the BBC I was tasked with "fixing" the sports statistics platform. The existing system consisted of several dozen distinct programs instantiated into well over a hundred processes and running on around a dozen machines.
I DTSSTCPW / YAGNIed the heck out of that thing and the result was a single JAR running on a single machine that ran around 100-1000 times faster and was more than 100 times more reliable. Also about an order of magnitude less code while having more features and being easier to maintain and expand.
https://link.springer.com/chapter/10.1007/978-1-4614-9299-3_...
And yeah, I was also extremely wary of tearing that thing down, because I couldn't actually understand the existing system. Nobody could. Took me over half a year to overcome that hesitancy.
Eschew Clever Rules -- Joe Condon, Bell Labs (via "Bumper Sticker Computer Science", in Programming Pearls)
https://tildesites.bowdoin.edu/~ltoma/teaching/cs340/spring0...
This is still one of my favorite software presentations.
The amount of knowledge required to first generate the codebase, that is now missing for the rewrite, is the elephant in the room for rewrites. That's a decade of decision making, business rules changing, knowledge leaving when people depart etc.
Much like your example, if you think all the information is in the codebase then you should go away and start talking to the business stakeholders until you understand the scope of what you don't currently know.
A rewrite of a decade old code base is not the simplest thing that could possibly work.
The author of the article is a staff engineer at GitHub.
edge case (n): Requirement discovered after the requirements gathering phase.
or they haven't worked in fields that are heavily regulated, or internationally.
This is why the DOGE guys were all like: hey, there are a bunch of people over 100 years old getting social security!! WTF!? Where someone with a wider range of experience would think "hmm, I bet there is some reason for this that we need to figure out", they just jumped right to "this must be fraud!!"
They were cognizant of the limitations that are touched on in this article. The example they gave was of coming to a closed door. The simplest thing might be to turn the handle. But if the door is locked, then the simplest thing might be to find the key. But if you know the key is lost, the simplest thing might be to break down the door, and so on. Finding the simplest thing is not always simple, as the article states
IIRC, they were aware that this approach would leave a patchwork of technical debt (a term coined by Cunningham), but the priority on getting code working overrode that concern at least in the short term. This article would have done well to at least touch on the technical debt aspect, IMHO.
> Everything should be made as simple as possible, but not simpler.
And I found a similar quote from Aquinas
> If a thing can be done adequately by means of one, it is superfluous to do it by means of several; for we observe that nature does not employ two instruments where one suffices
(Aquinas, [BW], p. 129).
[0] https://blogs.oracle.com/javamagazine/post/interview-with-ke...
It is possible the OP came to this conclusion without knowing about Ward Cunningham.
Sometimes better to not assume the worst of people by default. Very easy to not know where something comes from, misremember or come up with something in parallel.
At no time was anything like the "worst of people" anywhere on the radar.
It's interesting you gave that example. Before my first use of a wiki I was on a team that used Lotus Notes and did project organization in a team folder. I loved that Notes would highlight which documents had been updated since the last time I read them.
In the next project, that team used a wiki. It's simpler. But, the fact it didn't tell me which documents had been updated effectively made it useless. People typed new project designs into the wiki but no one saw them since they couldn't, at a glance, know which of the hundreds of pages had been updated since they last read them.
It was too simple
Here's the page for my local makerspace's wiki, which runs on mediawiki:
https://bloominglabs.org/Special:RecentChanges?hidebots=1&li...
> It was too simple
That happens when you ignore requirements, either because they were outright discarded or because they were never recognized. "The simplest thing possible" is understood to include: "to meet all requirements."
Yesterday I had a problem with my XLSX importer (which I wrote myself--don't ask why). It turned out that I had neglected to handle XML namespaces properly because Excel always exported files with a default namespace.
Then I got a file that added a namespace to all elements and my importer instantly broke.
For example, Excel always outputs <cell ...> whereas this file has <x:cell ...>.
The "simplest thing that could possibly work" was to remove the namespace prefix and just assume that we don't have conflicting names.
But I didn't feel right about doing that. Yes, it probably would have worked fine, but I worried that I was leaving a landmine for future me.
So instead I spent 4 hours re-writing all the parsing code to handle namespaces correctly.
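For anyone hitting the same thing, here is a minimal sketch of why namespace-aware parsing handles both encodings (using Python's standard ElementTree; the namespace URI and element names are invented for illustration and are not the real spreadsheet schema):

```python
import xml.etree.ElementTree as ET

# Two encodings of the same document: default namespace vs. an "x:" prefix.
NS = "http://example.com/spreadsheet"
default_ns = f'<sheet xmlns="{NS}"><cell r="A1">1</cell></sheet>'
prefixed   = f'<x:sheet xmlns:x="{NS}"><x:cell r="A1">1</x:cell></x:sheet>'

def cell_refs(xml_text: str) -> list:
    root = ET.fromstring(xml_text)
    # ElementTree expands every tag to "{namespace-uri}localname", so matching
    # on the URI works for both encodings; matching on the literal prefix
    # ("cell" vs "x:cell") would only ever work for one of them.
    return [el.attrib["r"] for el in root.iter(f"{{{NS}}}cell")]

print(cell_refs(default_ns))  # ['A1']
print(cell_refs(prefixed))    # ['A1']
```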
Whether or not you agree with my choice here, my point is that doing "the simplest thing that could possibly work" is not that easy. But it does get easier the more experience you have. Of course, by then, you probably don't need this advice.
I think the author kind of mentions this: "Figuring out the simplest solution requires considering many different approaches. In other words, it requires doing engineering."
But the irony, in my opinion, is that experienced engineers don't need this advice (they are already "doing engineering"), but junior engineers can't use this advice because they don't have the experience to know what the "simplest thing" is.
Still, the advice is useful as a mantra: to remind us of things we already know but, in the heat of the moment, sometimes forget.
This avoids the endless whack-a-mole that you get with a partial solution such as "assume namespaces are superfluous", when you will almost certainly eventually discover they weren't optional after all.
Or some other hapless person using your terrible code will discover it at 2am, sitting alone in the office building while desperately trying to do something mission-critical, such as using a "simple" XML export tool to cut over ten thousand users from one Novell system to another so that the citizens of the state have a functioning government in the morning.
Ask me how I know that kind of "probably won't happen" thing will, actually, happen.
But there's utility in talking about it. If you teach people that good engineers prepare for Google scale, they will lean towards that. If you teach that unnecessary complexity is painful and slows you down, they will lean towards that.
Maybe we need a Rosetta stone of different simple and complex ways to do common engineering stuff!
Of course plenty of times there'll be some abstractions that make the code easier to follow, even at the expense of logic locality. And other times where extra infrastructure is really necessary to improve reliability, or when your in-memory counter hack gets more requirements and replacing it with a dedicated rate limiter lets you delete all that complexity. And in those cases, by all means, add the abstractions or infrastructural pieces as needed.
But in all such cases, I try to ask myself, if I need to hand off this project afterward, which approach is going to make things easiest to explain?
Note that my perception of this has changed over time. Long ago, I was very much in the camp of "simple" meaning: make everything as terse as possible, put everything in its own service, never write code when a piece of infrastructure could do it, decouple everything to the maximum extent, make everything config-based. I ironically remember imagining how delighted the new owners would be to receive such a well-factored thing that was almost no code at all; just abstraction upon abstraction upon event upon abstraction that fit together perfectly via some config file. Of course, the transition was a complete fail, as they didn't care enough to grok how all the pieces were designed to fit together, and within a month they'd broken just about every abstraction I'd built into it, and it was a pain for anybody to work with.
Since then, I've kept things simpler, only using abstractions and extra infra where it'd be weird not to, and always thinking what's going to be the easiest thing to transition. And even though I'm not necessarily transitioning a ton of stuff, it's generally easier to ramp up teams or onboard new hires or debug problems when the code just does what it says. And it's nice because when a need for a new abstraction becomes apparent, you don't have to go back and undo the old one first.
Ignoring the namespace creates ongoing complexity that you have to be aware of. Your solution now just works and users can use namespaces if they want.
The author deals with this in the hacks section.
The simplest thing can be very difficult to do. It requires thought and understanding of the system, which is what he says at the very beginning. But I think most people read the headline and just started spewing personal grievances.
But an experienced engineer already knows this!
I just think it's ironic that this advice is useless to junior engineers but unneeded by senior engineers.
That's a good way of putting it. The advice essentially boils down to "do the right thing, don't do the wrong thing". Which is good (if common sense) advice, but doesn't practically really help with making decisions.
For example, at work, the simplest solution across the whole organization was to adopt the most complex PostgreSQL deployment structure and backup solutions.
This sounds counter-intuitive at first. But this way, the company can invest ~3 full-time employees in running an HA, PITR-capable PostgreSQL cluster with properly archived backups that ~25 other development teams can rely on. This stack solves so many B2B problems of business continuity, security, backups, and availability.
And on the other hand, for the dev teams, PostgreSQL is suddenly very simple. Inject ~8 variables into a container and you can claim all of these good things for your application without ever thinking about them.
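A sketch of what "inject ~8 variables" can look like from the application side (the variable names here are just the common libpq ones plus assumed defaults; the actual platform setup surely differs):

```python
import os
import psycopg2  # assumed driver; any PostgreSQL client takes equivalent parameters

# The platform team injects these; the application only reads them and never
# cares how HA, PITR or backup archiving is implemented behind the entrypoint.
conn = psycopg2.connect(
    host=os.environ["PGHOST"],
    port=int(os.environ.get("PGPORT", 5432)),
    dbname=os.environ["PGDATABASE"],
    user=os.environ["PGUSER"],
    password=os.environ["PGPASSWORD"],
    sslmode=os.environ.get("PGSSLMODE", "require"),
    application_name=os.environ.get("PGAPPNAME", "my-service"),
    connect_timeout=int(os.environ.get("PGCONNECT_TIMEOUT", 5)),
)
```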
The best solution is the simplest.
The quickest? No, the simplest; sometimes that takes longer.
So definitely not a complex solution? No, sometimes complexity is required; it's the simplest solution possible given your constraints.
Soo… basically, the advice is “pick the right solution”.
Sometimes that will be quick. Sometimes slow. Sometimes complex. Sometimes config, Sometimes distributed.
It depends.
But the correct solution will be the simplest one.
It's just: “solve your problems using good solutions not bad ones”
…and that is indeed both good, and totally useless, advice.
We both read the article; you know as well as I do that the advice in it is to build simple, reliable systems that focus on actual problems, not imagined ones.
…but it does not say how to do that, and it offers no meaningful help to someone trying to pick the “right” thing out of the entire solution space: something complex and scalable enough to meet the requirements, but not too scalable or too complex.
There’s just some vague hand waving about over engineering things at Big Corp, where, ironically, scale is an issue that mandates a certain degree of complexity in many cases.
Here’s something that works better than meaningless generic advice: specific, detailed examples.
You will note the total lack of them in this article, and others like it.
Real articles with real advice are a mix of practical examples that illustrate the generic advice they’re giving.
You know why?
…because you can argue with a specific example. Generic advice with no examples is not falsifiable.
You can agree with the examples, or disagree with them; you can argue that examples support or do not support the generic advice. People can take the specific examples and adapt them as appropriate.
…but, generic advice on its own is just an opinion.
I can arbitrarily assert “100% code coverage is meaningless; there are hot paths that need heavy testing and irrelevant paths that do not require code coverage. 100% code coverage is a fool’s game that masks a lack of a deeper understanding of what you should be testing”; it may sound reasonable, it may not. That’s your opinion vs mine.
…but with some specific examples of where it is true, and perhaps, not true, you could specifically respond to it, and challenge it with counter examples.
(And indeed, you’ll see that specific examples turn up here in this comment thread as arguments against it; notably not picked up to be addressed by the OP in their hacker news feedback section)
Wouldn't it have been less effort and simpler to replace the custom code with an existing XML parser? It appears that in your case the simplest thing would have been easy, though the aphorism doesn't promise "easy".
If using a library wasn't possible for you due to NIH-related business requirements and given the wide proliferation of XML libraries under a multitude of licenses, then your pain appears to have been organizationally self-inflicted. That's going to be hard to generalize to others in different organizations.
I totally agree with you that most people should not implement their own XML parser, much less an Excel importer. But I'm grateful to have the luxury of being allowed/able to do both.
The specific choice I made doesn't matter. What matters is the process of deciding trade-offs between one approach and another.
My point is that the OP advice of "do the simplest thing that could possibly work" doesn't help a junior engineer (who doesn't have the experience to evaluate the trade-off) but it's superfluous for a senior engineer (who already has well-developed instincts).
Still, your experience with those holding "senior" job titles involves greater median expertise than I have found in my experience.
If you had just used a compliant XML parser as intended, you might not even have noticed that different encodings of namespaces were even occurring in the files! It just "doesn't register" when you let the parser handle this for you, in the same sense that if you parse HTML (or XML) properly, then you won't notice all of the & and < encodings either. Or CDATA. Or Unicode escapes. Or anything else for that matter that you may not even be aware of.
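For what it's worth, here is a minimal sketch of what "letting the parser handle it" means, in Go for illustration (the namespace URI is made up): the two documents spell the namespace differently, but a conforming parser resolves both to the same element, so the difference never registers.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Both documents declare the same namespace URI, once via a prefix and once
// as the default namespace; a conforming parser treats them identically.
const docA = `<r:row xmlns:r="http://example.com/sheet"><r:c>42</r:c></r:row>`
const docB = `<row xmlns="http://example.com/sheet"><c>42</c></row>`

type Row struct {
	XMLName xml.Name `xml:"http://example.com/sheet row"`
	Cells   []string `xml:"c"`
}

func main() {
	for _, doc := range []string{docA, docB} {
		var row Row
		if err := xml.Unmarshal([]byte(doc), &row); err != nil {
			panic(err)
		}
		fmt.Println(row.Cells) // both print [42]; the prefix never matters
	}
}
```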
You may be a few more steps away from making an XLSX importer work robustly. Did you read the spec? The container format supports splitting single documents into multiple (internal) files to support incremental saves of huge files. That can trip developers in the worst way, because you test with tiny files, but XLSX-handling custom code tends to be used to bulk import large files, which will occasionally use this splitting. You'll lose huge blocks of data in production, silently! That's not fun (or simple) to troubleshoot.
The fast, happy path is to start with something like System.IO.Packaging [2], which is the built-in .NET library for the Open Packaging Conventions (OPC) container format, the underlying container format of all Office Open XML (OOXML) formats. Use the built-in XML parser, which handles namespaces very well. Then the only annoyance is that OOXML formats have two groups of namespaces that they can use, the Microsoft ones and the Open "standardised" ones.
[1] Famously! https://stackoverflow.com/questions/8577060/why-is-it-such-a...
[2] https://learn.microsoft.com/en-us/dotnet/api/system.io.packa...
Namespaces add a wrinkle, but it wasn't that hard to add. And I was able to add namespace aliasing in my API to handle the two separate "standard" namespaces that you're talking about.
But you're right about OPC/OOXML--those are massive specs and even the tiny slice that I'm handling has been error-prone. I haven't dealt with multiple internal files, so that's a future bug waiting for me. The good news is I'm building a nice library of test files for my regression tests!
It really isn't, and rolling your own parser is the diametric opposite of the "do the simplest thing" philosophy.
The XML v1.1 spec is 126 KB of text, and that doesn't even include XML Namespaces, which is a separate spec with 25 KB of text.
XML is only "simple" in the sense of being well-defined, which makes interoperability simple, in some sense. Contrast this with ill-defined or implementation-defined text formats, where it's decidedly not simple to write an interoperable parser.
As an end-user of XML, the simplest thing is to use an off-the-shelf XML parser, one that's had the bugs beaten out of it by millions of users.
There are very few programming languages out that don't have a convenient, full-featured XML parser library ready to use.
“Just because it works doesn’t mean it isn’t broken” is an aphorism that seems to click for people who are also handy in the physical world, but that many software developers think doesn’t sound right. Every handyman has at some time used a busted tool to make a repair. They know they should get a new one, and many will make an excuse to do so at the next opportunity (hardware store trip, or sale). Maybe 8 out of 10.
In software it’s probably more like 1 out of 10 who will make the equivalent effort.
Then the executives would be stunned that it was done so quickly. The prototype team would pass it off to another team and then move on to the next prototype.
The team that took over would open the project and discover that it was really a proof of concept, not a working site. It wouldn't include basic things like security, validation, error messages, or any of the hundred things that a real working product requires before you can put it online.
So the team that now owned it would often have to restart entirely, building it within the structures used by the rest of our products. The executives would be angry because they saw it "work" with their own eyes and thought the deployment team was just complicating things.
Those are the worst because you don't have "done" criteria you can reasonably write down. It's whenever QA stops finding fakes in the code, plus a couple of months for stragglers you might have missed.
Having the house fall on Buster was <chef's kiss>: https://youtube.com/watch?v=FN2SKWSOdGM
> It's not enough for a program to work – it has to work for the right reasons
I guess that’s basically the same statement, from a different angle.
Until recently I would say such programs are extremely rare, but now AI makes this pretty easy. Want to do some complicated project-wide edit? I sometimes get AI to write me a one-off script to do it. I don't even need to read the script, just check the output and throw it away.
But I'm nitpicking, I do agree with it 99% of the time.
By the time you’ve done something five times, it’s probably part of your actual process, and you should start treating it as normal instead of exceptional. Even if admitting so feels like a failure.
So I staple something together that works for the exact situation, then start removing the footguns I’m likely to hit, then I start shopping it to other people I see eye to eye with, fix the footguns they run into. Then we start trying to make it into an actual project, and end game is for it to be a mandatory part of our process once the late adopters start to get onboard.
On a recent project I fixed our deployment and our hotfix process and it fundamentally changed the scope of epics the team would tackle. Up to that point we were violating the first principle of Continuous: if it’s painful, do it until it isn’t. So we would barely deploy more often than we were contractually (both in the legal and internal cultural sense) obligated to do, and that meant people were very conservative about refactoring code that could lead to regressions, because the turnaround time on a failing feature toggle was a fixed tempo. You could turn a toggle on to analyze the impact but then you had to wait until the next deployment to test your fixes. Excruciating with a high deviation for estimates.
With a hotfix process that actually worked, people would make two or three times as many iterations, to the point we had to start coordinating to keep people from tripping over each other. And as a consequence, old nasty tech debt was being fixed in every epic instead of once a year. It was a profound change.
And as is often the case, as the author I saw more benefit than most. I scooped a two year two man effort to improve response time by myself in three months, making a raft of small changes instead of a giant architectural shift. About twenty percent of the things I tried got backed out because they didn’t improve speed and didn’t make the code cleaner either. I could do that because the tooling wasn’t broken.
If they want to use those resources to prioritize quality, I'll prioritize quality. If they don't, and they just want me to hit some metric and tick a box, I'm happy to do that too.
You get what you measure. I'm happy to give my opinion on what they should measure, but I am not the one making that call.
My second lead role, the CTO and the engineering manager thought I could walk on water and so I had considerable leeway to change things I thought needed changing.
So one of the first things I did was collectively save the team about 40 hours of code-build-test time per week. Which is really underselling it because what I actually did was both build a CI pipeline at a time nobody knew what “CI” meant, and increase the number of cycles you could reliably get through without staying late from 4 to 5 cycles per day. A >20% improvement in iterations per day and a net reduction in errors. That was the job where I learned the dangers of pushing code after 3:30pm. Everyone rationalizes that the error they saw was a glitch or someone else’s bug, and they push and then come in to find the early birds are mad at them. So better to finish what we now call deep work early and do lighter stuff once you’re tired.
Edit: those changes also facilitated us scaling the team to over twice the size of any project I’d worked on before or for some time after, though the EM deserves equal credit for that feat.
Then they fired the EM and Peter Principled by far the worst manager I’ve ever worked for (fuck you Mike, everyone hated your guts), and all he wanted to know was why I was getting fewer features implemented. Because I’m making everyone else faster. Speaking of broken, the biggest performance bottleneck in the entire app was his fault. He didn’t follow the advice I gave him back when he was working in our query system. Discovering it took hiring an Oracle DB contractor (those are always exorbitant). Fixing it after it shipped was a giant pain (as to why I didn’t catch his corner cutting, I was tagged in by another lead who was triple booked, and when I tagged back out he unfortunately didn’t follow up sufficiently on the things I prescribed).
Meanwhile all the people writing agentic LLM systems: “Hold my beer”
Unfortunately, simplicity is complicated. The median engineer in industry is not a reliable judge of which of two designs is less complex.
Further, "simplicity" as an argument has become something people can parrot. So now it's a knee-jerk fallback when a coworker challenges them about the approach they are taking. They quickly say "This is simpler" in response to a much longer, more sincere, and more correct argument. Ideally the team leader would help suss out what's going on, but increasingly the team lead is a less than competent manager, and simplicity is too complicated a topic for them to give a reliable signal. They prefer not to ruffle feathers and let whoever is doing the work make the call; the team bears the complexity.
― Dijkstra
And then there are people who do "resume-driven development" and push for more complexity in their workplace so that they can list real-life work experience for the next door to open. I know someone who made a tool that just installs the Java JDK + IDE + DBeaver using Rust, so that he can claim that he used Rust in the previous company he worked for.
I generally think we're more obsessed with being perceived as engineers than actually do engineering.
What you really learn over time, and what's more useful, is to think along these lines: don’t try to solve problems that don’t exist yet.
This is a mantra-like, cool headline but useless. The article doesn't develop it properly either, in my opinion.
It is best to prepare for problems which don't exist yet. You don't need to solve them, but design with the expectation they may arise. Failure to do so leads to tech debt.
"real mastery often involves learning when to do less, not more. The fight between an ambitious novice and an old master is a well-worn cliche in martial arts movies: the novice is a blur of motion, flipping and spinning. The master is mostly still. But somehow the novice’s attacks never seem to quite connect, and the master’s eventual attack is decisive".
Now the problem with the headline, and with repeating it, is that when "just do a simple thing" becomes mandated from management (technical or not), there comes a certain stress about trying to keep it simple, and if you run with it for a complex problem you easily end up with those hacks that become innate knowledge that's hard to transfer, instead of a good design (that seemed complex upfront).
Conversely, I think a lot of "needless complexity" comes from badly planned projects: people bitten by having to continuously add hacks to handle wild requirements end up overdesigning something to catch them, only to find no more complexity lands in that area, and then they're playing catch-up with the next area needing ugly hacks (and then trying to design for that area once it stabilizes, and the cycle repeats).
This is why, as developers, we do need to inject ourselves into the meetings (however boring they are) where the things that land on our desks are decided.
Some generalizations are necessary to formalize the experience we have accumulated in the industry and teach newcomers.
The obvious problem is that, for some strange reason, lots of concepts and patterns that may be useful when applied carefully become a cult (think clean architecture and clean code), which eventually only makes the industry worse.
For example, clean architecture/ports and adapters/hexagonal/whatever, as I see it, is a very sane and pragmatic idea in general. But somehow, all battles are around how to name folders.
But also keep in mind the audience: the kinds of people who are tempted to use J2EE (at the time) with event sourcing and Semantic Web, etc.
This is really a counterbalance to that: let's not add sophistication and complexity by default. We really are better off when we bias towards the simpler solutions vs one that's overly complex. It's like what Dan McKinley was talking about with "Choose Boring Technology". And of course that's true (by and large), but many in our industry act like the opposite is the case - that you get rewarded for flexing how novel you can make something.
I've spent much of my career unwinding the bad ideas of overly clever devs. Sometimes that clever dev was me!
So yes ... it's an overly general statement that shouldn't need to be said, and yet it's still useful given the tendency of many to over-engineer and use unnecessarily sophisticated approaches when simpler ones would suffice.
I see people adding unnecessary complexity to things all the time and advocate for keeping things simple on a daily basis probably. Otherwise designers and product managers and customers and architects will let their mind naturally add complexity to solutions which is unnecessary.
Education and training sometimes enforces prejudices, rules and stigmas that evade inspection of the subject matter in raw form.
A preference for idealism probably emerged from peace times, which have no struggle. Someone would be obsessed with the perfectness of a sculpture only when they don't need to hunt for the next meal. The real world runs on minimal, conservative, durable and robust approaches.
“Simple is robust”
It’s easy to over-design a system up front, and even easier to over-design improvements to said system.
Customer requirements are continually evolving, and you can never really predict what the future requirements will be (even if it feels like you can).
Breaking down the principle, it’s not just that a simple system is less error prone, it’s just as important that a simple architecture is easier to change in the future.
Should you plan for X, Y, and Z?
Yes, but counterintuitively, by keeping doors open for future and building “the simplest thing that could possibly work.”
Complexity adds constraints, and these limitations make the stack more brittle over time, even when planned with the best intentions.
It doesn't. It never is. It can't be.
My favorite example of this was the Moon shot. Each step was learning how to do just that one step. Mercury was just about getting into orbit, not easy even now with SpaceX though they are standing on the shoulders of those giants. Then Gemini for multiple people and orbital maneuvering (that experience gained them lots of learning) and then Apollo 8 was still a dress rehearsal even though they flew around the Moon.
Each step HAD to be simple because complexity weighed too much. But each of those simple steps were still wildly complex.
Every time I would dive in and code up something that I thought was easy, it would blow up in some weird way. I have found that doing each step individually and getting it right might sound like going really slow, but it was smoother, so it was faster in the end, because I wasn't chasing bugs in all the places, just one.
> “A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.”
I have this up on my wall and refer to it constantly (for both tech and non-tech challenges).
Was it Donald Knuth who said "premature optimization is the root of all evil"?
This article made this point very well, especially regarding the obsession with "scaling" in the SaaS world.
I've seen thousands and thousands of developer hours completely wasted, because developers were forced to massively overcomplicate greenfield code in anticipation of some entirely hypothetical future scaling requirement which either never materialized (95% of the time) or which did appear but in such a different form that the original solution only got in the way (remaining 5%).
John Ousterhout’s Philosophy of Software Design makes the case for simplicity in a book-length form. I really like how he emphasizes the importance of design simplicity for the maintainability of software; this is where I've seen it matter the most in practice.
I don't mind, I don't blame people for not predicting the future - it's a tough game. But god the hubris and attitude we put up with until the crows came home to roost.
I assume you mean astrology (prophecy), not astronomy (science)?
I happen to have a recent example with an add-on card. For reasons that I won't get into, we needed both insurance against major rearchitecting and synergy with other product lines when it comes to configuration. That led me to design a fairly intricate library that runs visitors against a data model, as well as a mechanism where the firmware dumps its entire state as a BSON document in a dedicated Flash partition before upgrading. That gave us the peace of mind that whatever happens, we can always just restore from that document and nuke everything else when booting a newer firmware for the first time.
The simplest thing that could possibly work would've been to not do any of that and let future firmware versions deal with it. Instead, I designed it so that I don't end up regretting it later, regardless of how the product evolved.
The only point I do regret was missing one piece of information to save in the initial release-to-manufacturing version. I had to put in one hack: if the saved document has version 1.0, go spelunking in the raw flash to retrieve that piece of information where it lived in version 1.0 and slipstream it into the document before processing it. Given the data storage mechanism of that platform, I'd be tearing my hair out dealing with multiple incompatible data layouts across firmware versions if I did the simplest thing.
My point is that the simplest thing that could possibly work disregards state and time as factors. Good engineers balance all requirements to derive the simplest solution that works; great engineers do so while avoiding visits from irate future colleagues.
As someone who has strived for this from early on, the problem the article overlooks is not knowing some of these various technologies everyone is talking about, because I never felt I needed them. Am I missing something I need, but am just ignorant of it, or is that just needless complexity that a lot of people fall for?
I don’t want to test these things out to learn them in actual projects, as I’d be adding needless complexity to systems for my own selfish ends of learning these things. I worked with someone who did this and it was a nightmare. However, without a real project, I find it’s hard to really learn something well and find the sharp edges.
Yeah, let me shoehorn that fishing trip into my schedule without a charge number, along with the one from last week...
Though there was a time when he wanted me to onboard my simple little internal website to a big complicated CICD system, just so we could see how it worked and if it would be useful for other stuff. It wouldn’t have been useful for anything else, and I already had a script that would deploy updates to my site that was simple, fast, and reliable. I simply ignored every request to look into that.
Other times I could tell him his idea wouldn’t work, and he would say “ok” and walk away. That was that. This accounted for about 30% of what he came to me with.
Ignorance plays a big role. If you don't perceive, e.g. a race condition happening, then it's much simpler to avoid complicated things like locking and synchronisation.
If you have the belief that your code will never be modified after you commit it, then it's much simpler to not write modifiable code.
If you believe there's no chance of failure, then it's simpler to not catch or think about exceptions.
The simplest thing is global variables, single-letter variable names, string-handling without consideration for escaping, etc.
It’s more about adding additional tools to the stack. I will fight hard not to add in additional layers of complexity that require more infrastructure and maintenance to manage. I want to eliminate as many points of failure as possible, and don’t want the stack to be so complex that other people can’t understand how it all fits together. If I win the lotto, or simply go on vacation, I want whoever has to take it over to be able to understand and support it.
The thing I’ll be working on next week has a lot of potential race conditions, and I need to find a simple solution that avoid them, without creating a support and maintainability burden on myself and the future team. Building a database would probably be the easy solution, but that’s one more dependency and thing to maintain, and also means I need to build a front end for people to access it. If I can do it without a database, that would be ideal.
Eventually you might start adding more things to it because of needs you haven't anticipated. Do it.
If you find yourself building the tool that does "the whole thing" but worse, then now you know that you could actually use the tool that does "the whole thing".
Did you waste time not using the tool right from the start? That's almost a philosophical question: now you know what you need, you had the chance to avoid it if it turned out you didn't, and maybe 9 times out of 10 you will be right.
The in-memory rate-limiting example is a perfect case study. An in-memory solution is only simple for a single server. The moment you scale to two, the logic breaks and your effective rate limit becomes N × limit. You've accidentally created a distributed state problem, which is a much harder issue to solve. That isn't simple.
Compare that to using a managed service like DynamoDB or ElastiCache. It provides a single source of truth that works correctly for one node or a thousand. By the author's own definition that "simple systems are stable" and require less ongoing work, the managed service is the fundamentally simpler choice. It eliminates problems like data loss on restart and the need to reason about distributed state.
Perhaps the definition of "the simplest thing" has just evolved. In 2025, it's often not about avoiding external dependencies. You will often save time by leveraging battle-tested managed services that handle complexity and scale on your behalf.
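To make the single-server trap concrete, here is a minimal sketch (in Go, with details of my own choosing) of the kind of in-memory, fixed-window limiter the "simplest thing" suggests. It behaves correctly on one instance, but every replica gets its own map, so N replicas behind a load balancer quietly allow N × limit requests per window:

```go
package ratelimit

import (
	"sync"
	"time"
)

// Limiter is a naive per-process fixed-window rate limiter.
// Correct on a single server; on N servers the effective limit is N × limit,
// and all counts vanish on restart.
type Limiter struct {
	mu     sync.Mutex
	counts map[string]int
	limit  int
}

func New(limit int, window time.Duration) *Limiter {
	l := &Limiter{counts: map[string]int{}, limit: limit}
	go func() {
		for range time.Tick(window) { // reset the window periodically
			l.mu.Lock()
			l.counts = map[string]int{}
			l.mu.Unlock()
		}
	}()
	return l
}

// Allow reports whether this user is still under the limit for the current window.
func (l *Limiter) Allow(user string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.counts[user]++
	return l.counts[user] <= l.limit
}
```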
But all of it comes with tradeoffs and you have to apply judgement. Just as it would be foolish to write almost anything these days in assembly, I think it would be almost as foolish to just default to a managed Amazon service because it scales, without considering whether A) you actually need that scale and B) there are other considerations as to why that service might not be the best technical fit (in particular, I've heard regrets due to overzealous adoption of DynamoDB on more than one occasion).
The engineers who most aggressively advocate for bespoke solutions in the name of "simplicity" often have the least experience with their managed equivalents, which can lead to the regrets you mentioned. Conversely, many engineers who only know how to use managed services would struggle to build the simple, self-contained solution the author describes. True judgment requires experience with both worlds.
This is also why I think asking "do we actually need this scale?" is often the wrong question; it requires predicting the future. Since most solutions work at a small scale, a better framework for making a trade-off is:
* Scalability: Will this work at a higher scale if we need it to?
* Operations: What is the on-call and maintenance load?
* Implementation: How much new code and configuration is needed?
For these questions, managed services frequently have a clear advantage. The main caveat is cost-at-scale, but that’s a moot point in the context of the article's argument.
How will this scale? How will this fail?
I like to be able to answer these questions from designs down to code reviews. If you hit a bottleneck or issue, how will you know?
IIUC, author is a Staff SWE, so this tracks.
See also "Worse is better" which has been debated a million times by now.
Sure, try to keep things simple. Unless it doesn't make sense. Then make them less simple. Will you get it wrong sometimes? Yes. Does it matter? Not really. You'll be wrong sometimes no matter what you do, unless you are, in fact, the Flying Spaghetti Monster. You're not, so just accept some failures from time to time and - most importantly - reflect on them, try to learn from them, and expect to be better next time.
As long as you understand that everything is a trade-off and, unfortunately, that the modern field is based on subjective opinions of popular and not necessarily competent people, you will be fine.
I fully agree with the author about the desirability of simplicity. I feel it in my bones, as someone with a background in the arts who has spent endless hours agonizing over tiny details and discarding perfectly good sections which nevertheless did not serve the whole.
What this article doesn't emphasize is that simplicity has cost. Shaving down a pile of yak hair isn't enough to reveal Brancusi's Bird in Space[1] — it needs to be visualized from multiple angles, tested, reshaped, re-imagined over and over.
In engineering, simplicity is one more axis to optimize against. For systems that endure, some measure of time spent simplifying will be worth it. But I find that when I make that argument, my case is strengthened by bearing in mind the size of the spend.
[1] https://upload.wikimedia.org/wikipedia/commons/6/69/Bird_in_...
> Instead, spend that time understanding the current system deeply, then do the simplest thing that could possibly work.
I'd argue that a fair amount of the former results in the ability to do the latter.
There's a substantial amount of wisdom that goes into designing "simple" systems (simple to understand when reading the code). Just as there's a substantial amount of wisdom that goes into making "simple" changes to those systems.
Same, or reliability-tiered separately. But in both aspects I more frequently see the resulting system to be more expensive and less reliable.
https://benoitessiambre.com/entropy.html https://benoitessiambre.com/integration.html
Once you see this, you see it everywhere. 90% of the places using "modern" web technology, are built as if they were anticipating FAANG scale, not because they are, but because the people building them hoped to be working at FAANG soon.
First of all, simplicity is the hardest thing there is. You have to first make something complex, and then strip away everything that isn't necessary. You won't even know how to do that properly until you've designed the thing multiple times and found all the flaws and things you actually need.
Second, you will often have wildly different contexts.
- Is this thing controlling nuclear reactors? Okay, so safety is paramount. That means it can be complex, even inefficient, as long as it's safe. It doesn't need to be simple. It would be great if it was, but it's not really necessary.
- Is the thing just a script to loop over some input and send an alert for a non-production thing? Then it doesn't really matter how you do it, just get it done and move on to the next thing.
- Is this a product for customers intended to solve a problem for them, and there's multiple competitors in the space, and they're all kind of bad? Okay, so simplicity might actually be a competitive advantage.
Third, "the simplest thing that could possibly work" leaves a lot of money on the table. Want to make a TV show that is "the simplest thing that could possibly work"? Get an iPhone and record 3 people in an empty room saying lines. Publish a new episode every week. That is technically a TV show - but it would probably not get many views. Critics saying that you have "the simplest show" is probably not gonna put money in your pocket.
You want a grand design principle that always applies? Here's one: "Design for what you need in the near future, get it done on time and under budget, and also if you have the time, try to make it work well."
I don't follow. I've made simple things many times without having to make a complex thing first.
You just described podcasts. They did work for many (and obviously failed for many as well). That's an excellent example of why one should start with the simplest thing that could possibly work. Probably better than the OP's examples.
The beauty of this approach is that you don't design anything you don't need. The requirements will change, and the design will change. If you didn't write much in the first place, it's easy.
An example is databases. People design their database schemas in incredibly simplistic ways, and then regret it later when the predictable stuff most people need doesn't work with the old schema, and you can't even just add columns, but you have to modify existing ones. Avoid the nightmare by making it reasonably extensible from the start. It may not be "the simplest thing that could possibly work", but it is often useful and doesn't cost you anything extra.
Just as much as people say "don't prematurely optimize", they should also say "don't prematurely make it total crap".
Really love and agree with this, and (shameless plug?) I think really aligns with a way of working I (and some colleagues) have been working on: https://delivervaluedaily.dev/
Every extra detail or workaround increases the number of things you need to keep in your head, not just when building the system, but every time you come back to maintain or extend it. "Simple systems have fewer 'moving pieces': fewer things you have to think about when you're working with them."
Simplicity isn't just about getting the job done quickly; it's about making sure future you (or someone else) can actually understand and safely change the system later. Reducing cognitive load with simplicity pays off long after the job is done.
I don't see it as a blind prescription.
It doesn't imply that choosing what is simple, will be simple. Or that simplest, will be simple. Or that this is a process uniquely immune from problems or tradeoffs.
Just a reminder to never forget to aim for simplest.
A tautological cookie fortune, of something important we often functionally forget or slide on.
There is a lot of wisdom in recognizing and repeating the most important "mantras of the obvious". And listening to them reformulated, in other ways, by other people.
The greatest craftbeings never stop revisiting the basics.
Complexity is sometimes necessary, but it always creates more ways things can break.
The bigger problem I see is lack of abstraction, or systems where the components have too much knowledge of each other. These are fast to build and initially work just as well as the more "heavily engineered" systems. However as the code evolves to meet changing requirements, what happens is that code becomes more complex, either through even more coupling or through ad hoc abstractions that get implemented haphazardly.
It can be really difficult to know where to draw the lines, though. When you are trying to decide between a simpler and more complex option, I think it's worth starting with the simplest thing that could possibly work, but build it in a way that you can change that decision without impacting the rest of the system.
On a tangent, I'm not a fan of Uncle Bob (to put it mildly) but this topic makes me think of his quote "a good architecture allows decisions to be deferred". I would restate it as "a good architecture allows decisions to be changed in isolation".
this is the key practical advice. when you start designing for hypothetical use cases that may never happen you are opening up an infinite scope of design complexity. setting hard specifications for what you actually need and building that simplifies the design process, at least, and if you start with that kind of mindset one can hope that it carries over to the implementation.
the simplest things always win because simple is repeatable. not every simple thing wins (many are not useful or have defects) but the winners are always simple.
But don't forget to ask your manager if they want to be prepared for future scenarios A, B, or C.
And write down their answer for later reference.
Alas, you do not have infinite money. But you can earn money by becoming this person for other people.
The catch 22 is most people aren't going to hire the guy who bills himself as the guy who does the simplest thing that could possibly work. It turns out the complexities actually are often there for good reason. It's much more valuable to pay someone who has the ability to trade simplicity off for other desirable things.
"It turns out the complexities actually are often there for good reason" - if they're necessary, then it gets folded into the "could possibly work" part.
The vast majority of complexities I've seen in my career did not have to be there. But then you run into Chesterton's Fence - if you're going to remove something you think is unnecessary complexity, you better be damn sure you're right.
The real question is how AI tooling is going to change this. Will the AI be smart enough to realize the unnecessary bits, or are you just going to layer increasingly more levels of crap on top? My bet is it's mostly the latter, for quite a long time.
Dev cycles will feel no different to anyone working on a legacy product, in that case.
All this talk of simplicity may mislead one into building something that just about works, and calling it a day. I believe that is a mistaken approach in science and engineering. Instead, there should be a deeper understanding of the limits and constraints on the problem. In my view, the focus should be on the problem and its constraints rather than on the nature of the solution.
An analogy in terms of algorithms: why take the pains to implement an O(n log n) solution to sorting when you can implement an O(n^2) solution which is far simpler? That is, be satisfied with a feasible solution instead of seeking something more optimal. The road to interesting insights may not be linear and incremental. Doing the simplest thing at all times is an incremental, greedy approach.
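As a toy illustration of that trade-off, here is a hedged Go sketch (names and details mine, not the commenter's): the quadratic insertion sort is genuinely the simpler thing to write and check by eye, while the O(n log n) merge sort is more code, an auxiliary buffer, and more edge cases to get wrong.

```go
package sorting

// insertionSort: the "simplest thing that could possibly work". O(n^2),
// but a handful of lines and easy to verify by inspection.
func insertionSort(xs []int) {
	for i := 1; i < len(xs); i++ {
		for j := i; j > 0 && xs[j-1] > xs[j]; j-- {
			xs[j-1], xs[j] = xs[j], xs[j-1]
		}
	}
}

// mergeSort: O(n log n), the "more optimal" route the comment is weighing
// against the simpler feasible one. Returns a new sorted slice.
func mergeSort(xs []int) []int {
	if len(xs) <= 1 {
		return xs
	}
	mid := len(xs) / 2
	left, right := mergeSort(xs[:mid]), mergeSort(xs[mid:])
	out := make([]int, 0, len(xs))
	for len(left) > 0 && len(right) > 0 {
		if left[0] <= right[0] {
			out, left = append(out, left[0]), left[1:]
		} else {
			out, right = append(out, right[0]), right[1:]
		}
	}
	return append(append(out, left...), right...)
}
```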
Even in complex domains, the aphorism, "everything should be made as simple as possible, but not simpler" applies.
As for working at scale, complex systems are made of small, simple, working systems, where dependencies between them are limited and manageable. Without an effort to keep interdependencies simple, the result is a Big Ball of Mud, and we all know how well that works.
Many "industry best-practices" seen in this light are make-work, a technique for expanding simple things to fill the time to keep oneself employed.
For example, the current practice of dependency injection with interfaces, services, factories, and related indirections[1] is a wonderful time waster because it can be so easily defended.
"WHAT IF we need to switch from MySQL to Oracle DB one day?" Sure, that... could happen! It won't, but it could.
[1] No! You haven't created an abstraction! You've just done the same thing, but indirectly. You've created a proxy, not a pattern. A waste of your own time and the CPU's time.
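For illustration, here is roughly the shape of the indirection being described, as a hedged Go sketch with made-up names: the interface and the "factory" only ever front a single concrete type, so every caller takes an extra hop for nothing.

```go
package store

import "database/sql"

type User struct {
	ID   int64
	Name string
}

// UserRepository exists "in case we swap databases one day".
type UserRepository interface {
	FindByID(id int64) (*User, error)
}

// mysqlUserRepository is the only implementation there will ever be.
type mysqlUserRepository struct{ db *sql.DB }

func (r *mysqlUserRepository) FindByID(id int64) (*User, error) {
	u := &User{}
	err := r.db.QueryRow("SELECT id, name FROM users WHERE id = ?", id).
		Scan(&u.ID, &u.Name)
	return u, err
}

// NewUserRepository is the "factory": it can only ever return the one
// concrete type, so the interface buys indirection, not abstraction.
func NewUserRepository(db *sql.DB) UserRepository {
	return &mysqlUserRepository{db: db}
}
```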
— Antoine de Saint-Exupéry, Terre des hommes, 1939
I always felt software is like physics: Given a problem domain, you should use the simplest model of your domain that meets your requirements.
As in physics, your model will be wrong, but it should be useful. The smaller it is (in terms of information), the easier it is to expand if and when you need it.
I liked the post, but these kinds of articles do make sense to people who've already been through the trenches & view the advice from their seasoned experience PoV and apply it accordingly. But if people without such experience follow it to the letter just because it's written, they can have surprises ahead.
Staff. You've got developers and they will continue working on a product oftentimes way past the "perfect" stage.
Case in point: log aggregation services like Sentry/etc. It always starts with "it's so complex, let's make a sane log ingestion service with a simple web viewer" and then it inevitably spirals into an unfathomable pile of abstractions and mind-boggling complexity to a point where it is literally no longer usable.
It's similar to the problem of regulation. Looking at each individual law, it often seems reasonable. It's only when there are 10,000, and everything grinds to a halt, that people realise there's a problem.
IMHO the log services example is excellent to illustrate this: the path that leads to their hairball of complexity is perfectly clear and every solution they do is extremely logical and obvious.
This is why these services are all so similar.
But the end result is always too complex and it seems to me that "perfect" point does not exist for this line of products.
Where people fail on this most frequently is in reasoning in first-person pronouns, whether explicitly or implicitly, and whether or not it is stated, thought, or considered. Simplicity is merely the result of a comparison of measures. It’s not about you and has nothing to do with your opinion.
Many people think "simple" is equivalent to easy/fast.
It often isn't. Often complexity is actually easy/fast. But simple is hard as it requires deeply understanding the problem, and fitting the right solution to where there is nothing to "simplify".
As I'm doing the simplest thing that could possibly work, I do not have an edge proxy.
Of course, the author doesn't mean _that_ kind of simplicity. There are always hidden assumptions about which pieces of complexity are assumed, and don't count against your complexity budget.
I appreciate “Philosophy of Software Design” by Ousterhout. I recently read that while rebuilding a text editor. Mind blowing experience. There is a lot of opportunity to more tightly encapsulate logic, to more clearly abstract a system, to keep a system simple yet powerful and extensible. I believe I became twice as good of a developer just by reading a chapter a day and sticking with the workflow.
But with that in mind, I do agree that a lot of systems are more complex than they need to be. I like to keep things simple.
Of course scalability adds complexity, and sometimes you need that. But you don't always need that, and making things scalable that don't need to be, makes them harder to understand and maintain.
Time and time again, amazingly complex machines just fail to perform better than a rubber band and bubble gum.
This stuff just cannot be reimplemented that simply and be expected to work.
The music was also quite good imo.
unicorn, i.e. CGI, i.e. process-per-request, became anachronistic, gosh, more than 20 years ago at this point!
at least, if you're serving any kind of meaningful load -- a bash script in a while loop can serve 100RPS on an ec2.micro, that's (hopefully) not what anyone is talking about
I've run into a lot of situations where doing the "simplest thing that could possibly work" ultimately led to mountains of pain and failure. I've also run into situations where things were over-engineered to account for situations that never came to be resulting in similar pain.
It boils down to "it depends" -- a careful analysis of the requirements, tradeoffs, future possibilities, and a mountain of judgement based on experience.
For sure, err on the side of "simple" when there's uncertainty but don't be dogmatic about it. Apply simple principles like loose coupling, pure functions, minimal state, avoid shared state, pick boring tools, and so on to ensure that "the simplest thing that could possibly work" doesn't become "the simplest thing that used to work but is now utter hell in the face of the unexpected". It all depends.
But, it's terrible for 2025's median software team. I know that isn't OP's intention. But inevitably, all good advice falls prey to misinterpretation.
In contemporary software orgs, build fast is the norm. PMs, managers, sales & leadership want Engg to build something that works, with haste. In this imagination, simple = fast. Let me say this again. No, you cannot convince them otherwise. To the median org, SIMPLE = FAST. The org always chooses the fastest option, well past the point of diminishing returns. Now once something exists, product managers and org leaders will push to deploy it and sales teams will sell dreams around it. Now you have an albatross around your neck. Welcome to life as a fire fighter.
For the health of a company and the oncall's sanity, Engg must tug at the rope in the opposite direction from (perceived) simplicity. The negotiated middle ground can get close to the 'simple' that OP proposes. But today, if Engg starts off with 'simple', they will be rewarded with a 'demo on life support'. At a time when vibe coding is trivial, I fear things will get worse before they get better.
All that being said, OP's intended advice is one that I personally live by. But often, simple is slow. So, I keep it to myself.
Said that in a telephone call one time, and the guy leading it was all "I'm mildly disturbed that you had a verbalization for that."
Can your software run for millions of years?
YAGNI.
A term coined by consultants who don’t understand an industry and basically say “do the least possible thing that will work”, because they don’t understand the domain and don’t understand which requirements are non-negotiable table stakes of complexity you need to compete.
It reminded me of a Martin Fowler post where he was showing implementation of discounts in some system and advocating to just hard code the first discount in the method (literally getDiscount() {return 0.5}).
Even the most shallow analysis would show this was incredibly stupid and short sighted.
But this was the state of the art, or so we were told.
See also Ward Cunningham trying and failing to solve Sudoku using TDD.
The reality is most business domains are actually complex, and the one who best tackles that complexity up front can take home all the marbles.
> Even the most shallow analysis would show this was incredibly stupid and short sighted.
Why? Yes, there's a high probability that this single line of code is ultimately wrong. But having it allows for testing the system (to ensure that getDiscount gets called, that the resulting price is between zero and the undiscounted price, etc.) and it can trivially be replaced when the actual discounting logic becomes known. Nothing can reasonably be called "short sighted" that doesn't actually limit you in the future.
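Here is a hedged sketch (in Go, with hypothetical names) of what that stub buys you: the constant is knowingly wrong, but the surrounding price math can be written and tested today, and swapping in the real rule later is a one-line change.

```go
package pricing

// getDiscount is the hard-coded placeholder from the Fowler-style example.
// Almost certainly not the final rule, but honest about being a stub.
func getDiscount() float64 {
	return 0.5
}

// FinalPrice is the code you can already write and test around the stub.
func FinalPrice(listPrice float64) float64 {
	price := listPrice * (1 - getDiscount())
	if price < 0 {
		price = 0 // a discounted price should never go negative
	}
	return price
}
```

A test can already assert the properties mentioned above: that the discount is applied and that the result stays between zero and the undiscounted price.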
> See also Ward Cunningham trying and failing to solve Sudoku using TDD.
It was Ron Jeffries who was struggling with Sudoku, not Ward Cunningham.
And he didn't struggle because of "doing simple things", but because he didn't actually know how to solve Sudoku, and rather than doing research (or trying a few by hand and reflecting on his own thought process) he expected to gain special insight into the solver by modeling the problem more accurately.
If he had actually known, then he would have just applied the same techniques to modeling the solver, and avoided all the embarrassment.
TDD is orthogonal to "doing the simplest thing", just preached by many of the same people. The problem isn't with writing tests for early iterations of the project that aren't anywhere near fully functional. The problem is that making the tests pass doesn't actually teach you anything about what the next step of functionality is. You still have to actually think, and Jeffries didn't have a proper basis to ground his thinking.
Norvig's approach to the problem involved some clever hacks that aren't necessarily the best thing from a code maintainability standpoint. But he resisted the temptation to create a prematurely generalized system for solving constraint-propagation problems, trying to create some generic abstract representation of a "constraint" etc. In essence, that code didn't diverge from what the XP gurus preached; it just didn't follow their specific methods ("OO" modeling that's heavily based on creating new classes, TDD etc.).
I independently came to the same conclusions about how to solve the problem, however many years ago. Recently, I was reminded of the story (I think via https://news.ycombinator.com/item?id=42953168) and tried writing a solver from scratch as an exercise. My code was certainly not as "clever" as Norvig's, but only a bit longer and IMO much better factored.
Incredible that we can tar both complexity and simplicity with the brush of "consultant BS."
That advice these days surely means having an LLM vibecode a mess of something?
Is such obvious and unquantifiable advice actually useful?
You aren't gonna need it
If you can do this regularly, you can keep the _effective_ cognitive size of the system small even as each closed box might be quite complex internally.
this *tactical* style of development is the same thing propounded by TDD folks. there is no design, just a weirdly glued together mishmash of things that just happen to work.
i am (fwiw once again) not against unit-testing, that is almost always needed.
1) Sometimes the simplest thing is still extremely complex
2) The simplest thing that works is often very hard to find
The problem is that when you let people without experience design things, they tend towards what I call “what if driven development”.
At every point in the design they ask what if I need to extend this later. Then they add a hook for that future expansion.
When we are training juniors the best thing is not to show them all the best practices in Clean Code because most of them gravitate towards overengineering anyway. If you give them permission and “best practices” to justify their desires, they write infuriatingly complicated layers upon layers of indirection. The thing to do is to teach them to do the simplest thing that works—just like this blog post says.
Will that result in some regrets when they build something that isn’t extensible? Absolutely! But it’s generally much easier to add a missing abstraction than it is to remove a bad one.
> System design requires competence with a lot of different tools: app servers, proxies, databases, caches, queues, and so on.
Yes! This is where I see so many systems go wrong. Complex software engineering paving over a lack of understanding of the underlying components.
> As they gain familiarity with these tools, junior engineers naturally want to use them.
Hell yea! Understanding how Kafka works so you don't build some crazy queue semantics on it. Understanding the difference between headless and ClusterIP services in Kubernetes so you don't have to build a software solution to the etcd problems you're having.
> However, as with many skills, real mastery often involves learning when to do less, not more. The fight between an ambitious novice and an old master is a well-worn cliche in martial arts movies
Wait what? Surely you mean doing more by writing less code. Are you now saying that learning and using these well tested, well maintained, and well understood components is amateurish?
I watch this talk about once per year to remind myself to eschew complexity.
it takes maybe 3 to 5 rewrites before you truly grasp a problem.
For example, chips just barely work.
If they work too well, you could shrink the chip until it barely works, making it cheaper, faster, or lower power.
That said, although this exercise is kind of interesting - like playing jenga - it might not be fun or satisfying.
better faster cheaper - sometimes you need to choose better.
This seems a bit overstated (except perhaps for certain recent Intel fabs) considering that they do billions of operations per second and a single error could completely invalidate a complex system in the worst case.
But of course, it runs afoul of reality a lot of the time.
I recently got annoyed that the Windows Task scheduler just sometimes... Doesn't fucking work. Tasks don't run, and you get a crazy mystery error code. Never seen anything like it. Drives me nuts. Goddamned Microsoft making broken shit!
I mostly write Powershell scripts for automating my system, so I figure I'll make a task scheduler which uses the C# PSHost to run the scripts, and keep the task configuration in a SQLite database. Use a simple tray app with a windows form and EFCore for SQLite to read and write the task configuration. Didn't take too long, works great. I am again happy, and even get better logging out of failure conditions, since I can trap the whole error stream instead of an exit code.
My wife is starting a business, and I start to think about using the business to also have a software business piece to it. Maybe use the task scheduler as a component in a larger suite of remote management stuff for my wife's business which we sell to similar businesses.
Well. For it to be ready, it's got to be reliable and secure. Have to implement some checks to wait if the database is locked, no biggie. Oh, but what happens if another user running the tray icon somehow can lock the database, I've got to work out how to notify the other user that the database is locked... Also now tasks need to support running as a different user than the service is running under. Now I have to store those credentials somewhere, because they can't go into the SQLite DB. DPAPI stores keys in the user profile, so all this means now I have to implement support for alternative users in the installer (and do so securely, again with the DPAPI, storing the service's credentials).
I've just added a lot of complexity to what should be a pretty simple application, and with it some failure modes. Paying customers want new features, which add more complexity and more concern cases.
This is normal software entropy, and to some extent it's unavoidable.
Your wife is making a business, and you want to write some code to help.
Then suddenly your requirements balloon to multiple concurrent users, needing to have a system tray icon and then also the ability to take this code and sell it to other people. Wow this project is suddenly complex!
This is just "I need to be able to scale infinitely" written in different words. The complexity comes from wanting a ton of things before they're actually needed (with the wrinkle of wanting to use some previously written scheduler for this project).
The risk is misunderstanding the problems they are solving, and ignoring all the constraints that drove the need for some key design traits that were in place to actually solve the problem (i.e., complexity)
Take the following example from the article:
> You should do that too! Suppose you’ve got a Golang application that you want to add some kind of rate limiting to. What’s the simplest thing that could possibly work? Your first idea might be to add some kind of persistent storage (say, Redis) to track per-user request counts with a leaky-bucket algorithm. That would work! But do you need a whole new piece of infrastructure?
Let's ignore the mistake of describing Redis as persistent storage. The whole reason why rate limiting data is offloaded to a dedicated service is that you want to enforce rate limiting across all instances of an API. Thus all instances update request counts on a shared data store to account for all traffic hitting across all instances regardless of how many they might be. This data store needs to be very fast to minimize micro services tax and is ephemeral. Hence why a memory cache is often used.
And why do "per-user request counts in memory" not work? Because you enforce rate-limiting to prevent brownouts and ultimately denials of service triggered in your backing services. Each request that hits your API typically triggers additional requests to internal services such as memory stores, querying engines, etc. Your external facing instances are scaled to meet external load, but they also create load to internal services. You enforce rate-limiting to prevent unexpected high request rates to generate enough load to hit bottlenecks in internal services which can't or won't scale. If you enforce rate limits per instance, scaling horizontally will inadvertently lift your rate limits as well and thus allow for brownouts, thus defeating the whole purpose of introducing rate limiting.
Also, leaky bucket algorithms are employed to allow traffic bursts but still prevent abuse. This is a very mundane scenario that happens on pretty much all services consumed by client apps. When an app is launched, it typically runs authentication flows and fetches the data required at startup; after init, it's back to baseline request rates. If you have a system that runs more than a single API instance, requests are spread over instances by a load balancer. This means a user's request can be routed to any instance in unspecified proportion. So how do you prevent abuse while still allowing these bursts to take place? Do you scale your services to handle peak loads 24/7 to accommodate request bursts from all your active users at any given moment? Or do you allow for momentary bursts spread across all instances, regardless of which instances they hit?
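For contrast with the per-instance approach, here is a hedged Go sketch of the shared-store version, using a plain fixed-window counter rather than a leaky bucket to keep it short, and assuming the go-redis client (github.com/redis/go-redis/v9). Because every API instance increments the same key, scaling horizontally no longer multiplies the effective limit:

```go
package ratelimit

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// Allow increments a per-user counter in a store shared by all instances and
// reports whether the user is still under the limit for the current window.
// (Sketch only: it ignores the small race between Incr and Expire.)
func Allow(ctx context.Context, rdb *redis.Client, user string, limit int64, window time.Duration) (bool, error) {
	key := "rl:" + user
	count, err := rdb.Incr(ctx, key).Result()
	if err != nil {
		return false, err
	}
	if count == 1 {
		rdb.Expire(ctx, key, window) // first hit in this window starts the clock
	}
	return count <= limit, nil
}
```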
Sometimes a problem can be simple. Sometimes it can be made too simple, but you accept the occasional outage. But sometimes you can't afford frequent outages and you understand a small change, like putting up a memory cache instance, is all it takes to eliminate failure modes.
And what changed in the analysis to understand that your simple solution is no solution at all? Only your understanding of the problem domain.
There is more than one way to skin a cat and using a weed whacker probably isn’t the best. It might even make sense to acquire a new tool.
The simplest thing to do is almost always the easiest, but knowing what the easiest thing to do is can be a lot trickier - see JavaScript frameworks.
But I think I disagree with the author’s second axiom:
“2. Simple systems are less internally-connected.”
Creating interfaces is more complex than not, even if it leads to a cleaner design because of interface boundaries. At the least, creating those boundaries adds complexity, and I don’t mean “more effort”. I mean it in the sense that creating functions is more complex than calling “goto”, and it took decades to invent the mechanism needed to call functions, which is probably the next simplest thing.
However, using call stacks and named pointers and memory separation (functions) leads to vastly improved simplicity as the system as a whole grows in complexity.
So in fact, using your own in-memory rate limiter may be a simpler implementation than using Redis, but it also violates the second principle (using clear interfaces leads to simpler systems).
And the same tension runs through the author’s first premise, that Gunicorn is simpler than Puma. Puma does the equivalent of building your own rate limiter: managing its own memory and using threads instead of processes.
And Gunicorn does the equivalent of using Redis — externalizing the complexity.
What Gunicorn did was simpler to implement (because it relies on an existing isolated architecture: Unix processes and files), but it means greater overall complexity if you take into account that it needs that whole system to work.
However, that system is itself a brilliant set of reductions in complexity, though it runs up against performance limitations at some point.
Puma takes on itself more complexity to make administering the server less complex and more performant under load. Also, because it is, in a sense, reinventing the wheel, it lacks the distillation of simplicity that is Unix.
So, less internally connected systems are easier to expand and maintain, and interface boundaries lead to less complex systems as a whole, but they are not, in themselves, less complex.
Limitations in the system that cause performance problems (like Unix processes and function calls) are not necessarily “more simple than can possibly work”; rather, the implementations of those abstractions are not perfect and could be improved.
Sometimes it’s not clear where to push the complexity, and sometimes it’s not clear what the right abstraction level is; but mostly it’s about making do with the existing architecture you have, and not having the time or resources to fix it. Until, that is, the complexity at your level reaches the point where it’s worth adding complexity at a higher level, because you’re unable to add the right amount of complexity at a lower level.
...and by the time I finished, the esp32c hardware was released and I didn't need it anymore.
- Simple
- Easy
- Fast
- Understandable by mere mortals
- Memory efficient
- CPU efficient
- Storage efficient
- Network efficient
- Safe
Pick up to 3
return segfault -1;
Applies to the narrative of being able to 'unfuck' anything as well - any industry, any 'behavioral lock-in', and so on.
Like alright in some situations that's the only thing that could possibly work, but shoving that complexity into every state estimator without even having a way to figure out the actual covariance of the input data is a bit much. Something that behaves in an expected way and works reliably with less accuracy beats a very complex system that occasionally breaks completely imo.
Don't add passwords, just "password" is fine. Password policies add complexity.
For services that require passwords just create a shared spreadsheet for everyone.
/s