Now with pgBouncer for connection pooling (or whatever other flavor of SQL-aware proxy you fancy for read/write routing) you can greatly reduce the complexity involved in managing conventionally complex read/write routing and sharding across replicas, and build resilient, scalable, production-grade database setups on your own infra. Throw in the fact that copy-on-write and snapshotting are baked into most storage today and it becomes - at least compared to 20 years ago - trivial to set up DR as well. Others have mentioned pgBackRest, and that further reinforces the ease with which you can set up these traditionally complex setups.
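(For anyone curious what the pooling piece looks like, here's a minimal pgBouncer sketch - the host, auth file and pool sizes are made-up illustrative values, not a production config:)

# write a minimal transaction-pooling config and restart the pooler
cat > /etc/pgbouncer/pgbouncer.ini <<'EOF'
[databases]
appdb = host=10.0.0.5 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling lets many app connections share a few server connections
pool_mode = transaction
default_pool_size = 20
max_client_conn = 500
EOF
systemctl restart pgbouncer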
Beyond those two significant features there aren't many other reasons you'd need to go with hosted/managed pgsql. I've yet to find a managed/hosted database solution that doesn't have some level of downtime to apply updates and patches, so even going fully hosted/managed is not a silver bullet. The cost of a managed DB is also several times that of the actual hardware it's running on, so there is a cost factor involved as well.
I guess all this to say it's never been a better time to self-host your database and the learning curve is as shallow as it's ever been. Add to all of this that any garden-variety LLM can hand-hold you through the setup and management, including any issues you might encounter on the way.
For client projects, however, I always try and sell them on paying the AWS fees, simply because it shifts the responsibility of the hardware being "up" to someone else. It does not inherently solve the downtime problem, but it allows me to say, "we'll have to wait until they've sorted this out, Ikea and Disney are down, too."
Doesn't always work like that and isn't always a tried-and-true excuse, but generally lets me sleep much better at night.
With limited budgets, however, it's hard to accept the cost of RDS (and we're talking at least one staging environment on top of production) when comparing it to a very tight 3-node Galera cluster running on Hetzner for barely a couple of bucks a month.
Or Cloudflare, the titan at the front, being down again today and intermittently over the past two days, after also being down a few weeks ago and earlier this year as well. I also had SQS queues time out several times this week; they picked up again shortly, but it's not like those things ...never happen on managed environments. They happen quite a bit.
Savvy Manager: “NoNameCMS often won’t take our support calls, but if Salesforce goes down it’s in the WSJ the next day.”
Stepping back a few steps further: most of the services we use are not so essential that we cannot bear them being down a couple of hours over the course of a year. We have seen that over and over again with Cloudflare and AWS outages; the world continues to revolve. If we were a bit more reasonable with our expectations and realistic about required uptime guarantees, there wouldn't be much worry about something being down every now and then. We wouldn't need to worry about our livelihood if we have to reboot a customer's database server once a year, or about their impression of the quality of the system we built if such a thing happens.
But even that is unlikely if we set things up properly. I have worked at a company where we self-hosted our platform and it didn't have the most complex fail-safe setup ever. Just have good backups and make sure you can restore, and 95% of the worries go away for such non-essential products; our outages were less frequent than the trouble with AWS or Cloudflare.
It seems that either way, you need people who know what they are doing, whether you self-host or buy some service.
- The cheap solution would be equally good, and it's just a blame shifting game.
- The cheap solution is worse, and paying more for the name brand gets you more reliability.
There are many situations that fall into the second category, and anyone running a business probably has personal memories of making the second mistake. The problem is, if you're not up to speed on the nitty gritty technical details of a tradeoff, you can't tell the difference between the first category and the second. So you accept that sometimes you will over-spend for "no reason" as a cost of doing business. (But the reason is that information and trust don't come for free.)
It's also better for the technical people. If you self-host and the DB goes down at 2am on a Sunday morning, all the technical people are gonna get woken up and they will be working on it until it's fixed.
If us-east goes down a technical person will be woken up, they'll check downdetector.com, and they'll say "us-east is down, nothin' we can do" and go back to sleep.
But perhaps I’m bitter from prior Salesforce experiences.
From my experience your client’s clients don’t care about this when they’re still otherwise up.
Don't underestimate the power of CYA
The other factor is eliminating the “one guy who knows X” problem in IT. What happens if that person leaves or you have to let them go? But with managed infrastructure there’s a pool of people who know how to write terraform or click buttons and manage it and those are more interchangeable than someone’s DIY deployment. Worst case the cloud provider might sell you premium support and help. Might be expensive but you’re not down.
Lastly, there’s been an exodus of talent from IT. The problem is that anyone really good can become a coder and make more. So finding IT people at a reasonable cost who know how to really troubleshoot and root cause stuff and engineer good systems is very hard. The good ones command more of a programmer salary which makes the gap with cloud costs much smaller. Might as well just go managed cloud.
The way you present it makes sense of course. But I have to wonder whether there really are such clear demarcation lines between responsibilities. At least over the course of my career this was very rarely the case.
Ironically, this becomes more of a concern the larger the supplier. AWS can live with firing any one of their customers - a smaller outfit probably couldn't.
It can certainly fit into a particular cloud platform’s offerings. But it’s by no means exclusive to the cloud.
My entire stack can be picked up and redeployed anywhere where I can run Ubuntu or Debian. My “most external” dependencies are domain name registries and an S3-API compatible object store, and even that one is technically optional, if given a few days of lead time.
I have never, ever, ever had a SQL box go down. I've had a web server go down once. I had someone who probably shouldn't have had access to a server accidentally turn one off once.
The only major outage I've had (2-3 hours) was when the box was also self-hosting an email server and I accidentally caused it to flood itself with failed delivery notices with a deploy.
I may have cried a little in frustration and panic but it got fixed in the end.
I actually find using cloud hosted SQL in some ways harder and more complicated because it's such a confusing mess of cost and what you're actually getting. The only big complication is setting up backups, and that's a one-off task.
Off site backups or replication would help, though not always trivial to fail over.
Replication and backups really aren't that difficult to set up properly with something like Postgres. You can also expose metrics around this and set up alerting if replication lag goes beyond a threshold you set or a backup didn't complete. You do need to periodically test your backups, but that is good practice anyway.
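As a concrete example (a sketch - the threshold and how you wire it into your alerting are up to you), replication lag is sitting right there in pg_stat_replication on the primary:

# bytes of WAL each standby still has to replay; alert if this grows past your threshold
psql -X -c "SELECT application_name,
                   pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
            FROM pg_stat_replication;"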
I am not saying something like RDS doesn’t have value but you are paying a huge premium for it. Once you get to more steady state owning your database totally makes sense. A cluster of $10-20 VPSes with NVMe drives can get really good performance and will take you a lot farther than you might expect.
For read-only applications, it's possible to host datasette-lite and the SQLite database as static files on a redundant CDN. Datasette-lite + URL redirect API + litestream would probably work well, maybe even with read-write; electric-sql also has a sync engine (with optional partial replication), and there's PGlite (Postgres in WebAssembly).
Hardware doesn't fail that often. A single server will easily run many years without any issues, if you are not unlucky. And many smaller setups can tolerate the downtime to rent a new server or VM and restore from backup.
The log disk was full or something. That's not the shameful part though. What followed was a mass email saying everyone needs to update their connection string from bla bla bla 1 dot foo dot bar to bla bla bla 2 dot foo dot bar.
This was inexcusable to me. I mean this is an Internet service provider. If we can't even figure out DNS, we should shut down the whole business and go home.
> Off site backups or replication would help, though not always trivial to fail over.
You want those regardless of where you host
Deploys these days take minutes so what's the problem if a disk does go bad? You lose at most a day of data if you go with the 'standard' overnight backups, and if it's mission critical, you will have already set up replicas, which again is pretty trivial and only slightly more complicated than doing it on cloud hosts.
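For what it's worth, on Postgres 12+ the replica setup being referred to boils down to roughly this (hostname, role name and data directory are illustrative assumptions):

# on the primary: create a replication role and allow it in pg_hba.conf
psql -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'change-me';"
# on the new standby: clone the primary; -R writes standby.signal and primary_conninfo
pg_basebackup -h primary.internal -U replicator -D /var/lib/postgresql/data -P -R
# start Postgres on the standby and it streams from the primary as a hot standby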
Even on PostgreSQL 18 I wouldn't describe self hosted replication as "pretty trivial". On RDS you can get an HA replica (or cluster) by clicking a radio box.
Hardware also monitors itself reasonably well these days, because the hosting providers rely on the same gear.
It's trivial to run mirrored containers on two separate Proxmox nodes, because hosting providers use the same kind of stuff.
Offsite backups and replication? Also point and click and trivial with tools like Proxmox.
RAID is actually trivial to set up, if you don't compare it to doing it manually yourself from the command line. Again, tools like Proxmox make it point-and-click and 5 minutes of watching YouTube.
If we want to find a solution, our brain will find it. If we don't, we can find reasons not to.
Even if you do it from the command line, ZFS makes this pretty trivial as well.
Skill issue?
It's not 2003; modern volume-managing filesystems (e.g. ZFS) make creating and managing RAID trivial.
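For example (a sketch - pool and device names are made up, and the recordsize tweak is just a common Postgres-on-ZFS suggestion):

# two-disk mirror plus a dataset for the database
zpool create tank mirror /dev/sda /dev/sdb
zfs create -o compression=lz4 -o recordsize=16k tank/pgdata
zpool status tank    # health, scrubs and resilvers in one command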
Obviously it depends on the operational overhead of specific technology.
It is. You need to answer the question: what are the consequences of your service being down for, let's say, 4 hours, or some security patch not being properly applied, or best security practices not being followed? Many people are technically unable, or lack the time or resources, to confidently address that question, hence paying for someone else to do it.
Your time is money though. You are saving money but giving up time.
Like everything, it is always cheaper to do it (it being cooking at home, cleaning your home, fixing your own car, etc) yourself (if you don't include the cost of your own time doing the service you normally pay someone else for).
> It is. You need to answer the question: what are the consequences of your service being down for, let's say, 4 hours, or some security patch not being properly applied, or best security practices not being followed?
There is one advantage a self-hosted setup has here: if you set up a VPN, only your employees have access, and the server doesn't need to be reachable from the internet at all. So even in the case of a zero-day that WILL make a SaaS company leak your data, you can be safe(r) with a self-hosted solution.
> Your time is money though. You are saving money but giving up time.
The investment compounds. Setting up infra to run a single container for some app takes time, and there is a good chance it won't pay for itself.
But the 2nd service? Cheaper. The 5th? At that point you've probably automated it enough that it's just pointing it at a Docker container and tweaking a few settings.
> Like everything, it is always cheaper to do it (it being cooking at home, cleaning your home, fixing your own car, etc) yourself (if you don't include the cost of your own time doing the service you normally pay someone else for).
It's cheaper even if you include your own time. You pay a technical person at your company to do it; a SaaS company does the same, then also pays sales and PR people to sell it, pays income tax on it, and then it also needs to "pay" its investors.
Yeah, making a service work for 4 people in a company can be more work than just paying $10/mo to a SaaS company. But 20? 50? 100? It quickly gets to the point where self-hosting (whether actually "self", or by using dedicated servers, or by using cloud) actually pays off.
In a business context the "time is money" thing actually makes sense, because there's a reasonable likelihood that the business can put the time to a more profitable use in some other way. But in a personal context it makes no sense at all. Realistically, the time I spend cooking or cleaning was not going to earn me a dime no matter what else I did, therefore the opportunity cost is zero. And this is true for almost everyone out there.
Could you imagine accountants arguing that you shouldn't use a service like Paychex or Gusto and just run payroll manually? After all it's cheaper! Just spend a week tracking taxes, benefits and signing checks.
Self-hosting, to me, doesn't make sense unless you are 1) doing something not offered by the cloud or have a pathological use case, 2) running a hobby project, or 3) in maintenance mode on the product. Otherwise your time is better spent on your core product - and if it isn't, you probably aren't busy enough. If the cost of your RDS cluster is so expensive relative to your traffic, you probably aren't charging enough or your business economics really don't make sense.
I've managed large database clusters (MySQL, Cassandra) on bare metal hardware in managed colo in the past. I'm well aware of the performance that's being left on the table and what the cost difference is. For the vast majority of businesses, optimizing for self-hosting doesn't make sense, especially if you don't have PMF. For a company like 37signals, sure, product velocity probably is very high, and you have engineering cycles to spare. But if you aren't profitable, self-hosting won't make you profitable, and your time is better spent elsewhere.
Postgres operations are part of the core of the business. It's not a payroll management service where you should comparison shop once the contract comes up for renewal and haggle on price. Once Postgres is the database for your core systems of record, you are not switching away from it. The closest analog is how difficult it is/was for anybody who built a business on top of an Oracle database to switch away from Oracle. But Postgres is free ^_^
The question at heart here is whether the host for Postgres is context or core. There are a lot of vendors for Postgres hosting: AWS RDS and CrunchyData and PlanetScale etc. And if you make a conscious choice to outsource this bit of context, you should be signing yearly-ish contracts with support agreements and re-evaluating every year and haggling on price. If your business works on top of a small database with not-intense access needs, and can handle downtime or maintenance windows sometimes, there's a really good argument for treating it that way.
But there's also an argument that your Postgres host is core to your business as well, because if your Postgres host screws up, your customers feel it, and it can affect your bottom line. If your Postgres host doesn't react in time to a sudden need for scaling, or if tuning Postgres settings (that a Postgres host refuses to expose) could make a material impact on either customer experience or the financial bottom line, that is indeed core to your business. That simply isn't a factor when picking a payroll processor.
Control and risk management cost money, be that by self hosting or contracts. At some point it is cheaper to buy the competence and make it part of the company rather than outsource it.
You see the issue?
Like, I’m all for not procuring things that it makes more sense to own/build (and I know most businesses have piss-poor instincts on which is which—hell, I work for the government! I can see firsthand the consequences of outsourcing decision making to contractors, rather than just outsourcing implementation).
But it’s very case-by-case. There’s no general rule like “always prefer self hosting” or “always rent real estate, never buy” that applies broadly enough to be useful.
> But it’s very case-by-case. There’s no general rule like “always prefer self hosting” or “always rent real estate, never buy” that applies broadly enough to be useful.
I don't know if anyone remembers that irritating "geek code" thing we were doing a while back, but coming up with some kind of shorthand for whatever context we're talking about would be useful.
> "geek code" thing we were doing a while back
Not sure what you’re referring to. “Shibboleet”, perhaps? https://xkcd.com/806/
There is no reason to self-manage pg for a dev environment.
"vastly superior to self hosting regarding observability" I'd suggest looking into the cnpg operator for Postgres on Kubernetes. The builtin metrics and official dashboard is vastly superior to what I get from Cloudwatch for my RDS clusters. And the backup mechanism using Barman for database snapshots and WAL backups is vastly superior to AWS DMS or AWS's disk snapshots which aren't portable to a system outside of AWS if you care about avoiding vendor lock-in.
https://aws.amazon.com/blogs/database/introducing-scaling-to...
It only scales down after a period of inactivity though - it’s not pay-per-request like other serverless offerings. DSQL looks to be more cost effective for small projects if you can deal with the deviations from Postgres.
Right now from Hetzner you can get a dedicated server with a 6c/12t Ryzen 5 3600, 64GB RAM and 2x512GB NVMe SSD for €37/mo.
Even if you just served files from disk, with no RAM caching, that could give 200k small files per second.
From RAM, and with 6 dedicated cores, network will saturate long before you hit compute limits on any reasonably efficient web framework.
> If you're just starting out in software & want to get something working quickly with vibe coding, it's easier to treat Postgres as just another remote API that you can call from your single deployed app
> If you're a really big company and are reaching the scale where you need trained database engineers to just work on your stack, you might get economies of scale by just outsourcing that work to a cloud company that has guaranteed talent in that area. The second full freight salaries come into play, outsourcing looks a bit cheaper.
This is funny. I'd argue the exact opposite. I would self host only:
* if I were on a tight budget and trading an hour or two of my time for a cost saving of a hundred dollars or so is a good deal; or
* at a company that has reached the scale where employing engineers to manage self-hosted databases is more cost effective than outsourcing.
I have nothing against self-hosting PostgreSQL. Do whatever you prefer. But to me outsourcing this to cloud providers seems entirely reasonable for small and medium-sized businesses. According to the author's article, self hosting costs you between 30 and 120 minutes per month (after setup, and if you already know what to do). It's easy to do the math...
Every company out there is using the cloud and yet still employs infrastructure engineers to deal with its complexity. The "cloud" reducing staff costs is and was always a lie.
PaaS platforms (Heroku, Render, Railway) can legitimately be operated by your average dev and not have to hire a dedicated person; those cost even more though.
Another limitation of both the cloud and PaaS is that they are only responsible for the infrastructure/services you use; they will not touch your application at all. Can your application automatically recover from a slow/intermittent network, a DB failover (that you can't even test because your cloud providers' failover and failure modes are a black box), and so on? Otherwise you're waking up at 3am no matter what.
Every company beyond a particular size surely? For many small and medium sized companies hiring an infrastructure team makes just as little sense as hiring kitchen staff to make lunch.
At my last two places it very quickly got to the point where the technical complexity of deployments, managing environments, dealing with large piles of data, etc. meant that we needed to hire someone to deal with it all.
They actually preferred managing VMs and self hosting in many cases (we kept the cloud web hosting for features like deploy previews, but that’s about it) to dealing with proprietary cloud tooling and APIs. Saved a ton of money, too.
On the other hand, the place before that was simple enough to build and deploy using cloud solutions without hiring someone dedicated (up to at least some pretty substantial scale that we didn’t hit).
For medium sized companies you need "devops engineers". And in all honesty, more than you'd need sysadmins for the same deployment.
For large companies, they split up AWS responsibilities into entire departments of teams (for example, all clouds have made auth so damn difficult that most large companies have not one but multiple departments just dealing with authorization, before you so much as start your first app).
Local reproducibility is easier, and performance is often much better
As I pointed out above, you may be better served mixing and matching so you spend your time on the critical aspects but offload those other tasks to someone else.
Of course, I’m not sitting at your computer so I can’t tell you what’s right for you.
Task runner/queue: at least for us, Postgres works for both cases.
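If it helps anyone, the queue half is mostly one query (a sketch - table and column names are made up):

# each worker claims one job; SKIP LOCKED keeps concurrent workers from colliding
psql -X -c "
  UPDATE jobs SET status = 'running', started_at = now()
  WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'queued'
    ORDER BY id
    FOR UPDATE SKIP LOCKED
    LIMIT 1
  )
  RETURNING id, payload;"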
We also self-host S3-compatible storage and allow user-uploaded content within strict limits.
Fact is a lot of these companies are on the cloud because their internal IT was a total fail.
An LLM could set this up for you, it's dead simple.
Of course, my comment wasn't aimed at those who successfully keep their cloud bill in the low 3-figures, but the majority of companies with a 5-figure bill and multiple "infrastructure" people on payroll futzing around with YAML files. Even half the achieved savings should be enough incentive for those guys to learn something new.
But initial setup is maybe 10% of the story. The day 2 operations of monitoring, backups, scaling, and failover still needs to happen, and it still requires expertise.
If you bring that expertise in house, it costs much more than 10x ($3/day -> $30/day = $10,950/year).
If you get the expertise from experts who are juggling you along with a lot of other clients, you get something like PlanetScale or CrunchyData, which are also significantly more expensive.
This doesn’t make sense as an argument. The reason the cloud is more complex is because that complexity is available. Under a certain size, a large number of cloud products simply can’t be managed in-house (and certainly not altogether).
Also your argument is incorrect in my experience.
At a smaller business I worked at, I was able to use these services to achieve uptime and performance that I couldn’t achieve self-hosted, because I had to spend time on the product itself. So yeah, we’d saved on infrastructure engineers.
At larger scales, what your false dichotomy suggests also doesn’t actually happen. Where I work now, our data stores are all self-managed on top of EC2/Azure, where performance and reliability are critical. But we don’t self-host everything. For example, we use SES to send our emails and we use RDS for our app DB, because their performance profiles and uptime guarantees are more than acceptable for the price we pay. That frees up our platform engineers to spend their energy on keeping our uptime on our critical services.
How sure are you about that one? All of my Hetzner VMs reach an uptime of 99.9-something percent.
I could see more than one small-business stack fitting onto a single one of those VMs.
We have a very limited set of services, but most have been very painless to maintain.
- certificates not being renewed in time
- Celery eating up all RAM and having to be recycled
- RabbitMQ getting blocked requiring a forced restart
- random issues with Postgres that usually required a hard restart of PG (running low on RAM maybe?)
- configs having issues
- running out of inodes
- DNS not updating when upgrading to a new server (no CDN at the time)
- data centre going down, taking the provider’s email support with it (yes, really)
Bear in mind I’m going back a decade now, my memory is rusty. Each issue was solvable but each would happen at random and even mitigating them was time that I (a single dev) was not spending on new features or fixing bugs.
Configs having issues is, like, the number one reason I like this setup so much.
I can configure everything on my local machine and test it here, and then just deploy it to a server the same way.
I do not have to build a local setup and then a remote one.
But yes, the cert is created differently in prod and there are a few other differences.
But it's much closer than in the cloud.
I mean just because you have backups does not mean you can restore them ;-)
We do test backup restoration automatically, and also manually on a quarterly basis - but you should do the same with AWS.
Otherwise how do you know you can restore system A without impacting other dependent systems D and C?
Just be careful not to accept more complexity just because it is available, which is what the AWS evangelists often try to sell. After all, we should always make an informed decision when adding a new dependency, whether in code or in infrastructure.
In my experience you typically need less people if using a Cloud Provider than in-house (or the same number of people can handle more instances) due to increased leverage. Whether you can maximize what you get via leverage depends on how good your team is.
US companies typically like to minimize headcount (either through accounting tricks or outsourcing) so usually using a Cloud Provider wins out for this reason alone. It's not how much money you spend, it's how it looks on the balance sheet ;)
> The "cloud" reducing staff costs
Both can be true at the same time.
Also:
> Otherwise you're waking up at 3am no matter what.
Do you account for frequency and variety of wakeups here?
Yes. In my career I've dealt with way more failures due to unnecessary distributed systems (that could have been one big bare-metal box) rather than hardware failures.
You can never eliminate wake-ups, but I find bare-metal systems have far fewer moving parts, which means you eliminate a whole bunch of failure scenarios, so you're only left with actual hardware failure (and HW is pretty reliable nowadays).
There was, I have to admit, a log message that explained the problem... once I could find the specific log message and understand the 45 steps in the chain that got to that spot.
If you have the luxury of spending half a million per year on infrastructure engineers then you can of course do better, but this is by no means universal or cost-effective.
Whether or not you need that equivalence is an orthogonal question.
There's probably a sweet spot where that is true, but because cloud providers offer more complexity (self-inflicted problems) and use PR to encourage you to use it ("best practices" and so on), in all the cloud-hosted shops I've seen over a decade of experience there have always been multiple full-time infra people busy with... something.
There was always something to do, whether keeping up with cloud provider changes/deprecations, implementing the latest "best practice", or debugging distributed-systems failures and other self-inflicted problems. I'm sure career/resume-polishing incentives are at play here too - the employee wants the system to require their input, otherwise their job is no longer needed.
Maybe in a perfect world you can indeed use cloud-hosted services to reduce/eliminate dedicated staff, but in practice I've never seen anything but solo founders actually achieve that.
I love self-hosting stuff and even have a bias towards it, but the cost/time tradeoff is more complex than most people think.
Most projects I have worked on in my career have never seen more than a hundred concurrent users. If something goes down on Saturday, I am going to fix it on Monday.
I have worked on internal tools where I just added a Postgres DB to the Docker setup and that was it. 5 minutes of work and no issues at all. Sure, if you have something customer-facing you need to do a bit more and set up a good backup strategy, but that really isn't magic.
This is the crux of one of the most common fallacies in software engineering decision-making today. I've participated in a bunch of architecture / vendor evaluations that concluded managed services are more cost effective almost purely because they underestimated (or even discarded entirely) the internal engineering cost of vendor management. Black-box debugging is one of the most time-consuming engineering pursuits, & even when it's something widely documented & well supported like RDS, it's only really tuned for the lowest common denominator - the complexities of tuning someone else's system at scale can really add up to only marginally less effort than self-hosting (if there's any difference at all).
But most importantly - even if it's significantly less effort than self-hosting, it's never effectively costed when evaluating trade-offs - that's what leads to this persistent myth about the engineering cost of self-hosting. "Managing" managed services is a non-zero cost.
Add to that the ultimate trade-off of accountability vs availability (internal engineers care less about availability when it's out of their hands - but it's still a loss to your product either way).
I'm really not sure what you're talking about here. I manage many RDS clusters at work. I think in total, we've spent maybe eight hours over the last three years "tuning" the system. It runs at about 100kqps during peak load. Could it be cheaper or faster? Probably, but it's a small fraction of our total infra spend and it's not keeping me up at night.
Virtually all the effort we've ever put in here has been making the application query the appropriate indexes. But you'd do no matter how you host your database.
Hell, even the metrics that RDS gives you for free make the thing pay for itself, IMO. The thought of setting up grafana to monitor a new database makes me sweat.
CloudNative PG actually gives you really nice dashboards out-of-the-box for free. see: https://github.com/cloudnative-pg/grafana-dashboards
Can we honestly say that cloud services taking a half hour to two hours a month of someone's time on average is completely unheard of?
It's definitely expensive, but it's not time-consuming.
Except now they are stuck trying to maintain and debug Postgres without the same visibility and agency they would have if they hosted it themselves. The situation isn't at all clear-cut.
This leads the developers to do all kinds of workarounds and reach for more cloud services (and then integrating them and - often poorly - ensuring consistency across them) because the cloud hosted DB is not able to handle the load.
On bare-metal, you can go a very long way with just throwing everything at Postgres and calling it a day.
You can take it even further in some context if you use sqlite.
I think one of the craziest ideas of the cloud decade was to move storage away from compute. It's even worse with things like AWS lambda or vercel.
Now vercel et al are charging you extra to have your data next to your compute. We're basically back to VMs at 100-1000x the cost.
Running on IaaS also gives you more scalability knobs to tweak: SSD IOPS and bandwidth, multiple drives for logs/partitions, memory-optimized VMs, and there are a lot of low-level settings that aren't accessible in managed SQL. Licensing costs are also horrible with managed SQL Server, where it seems like you pay the Enterprise level, whereas running it yourself offers lower-cost editions like Standard or Web.
I use Google Cloud SQL for PostgreSQL and it's been rock solid. No issues; troubleshooting works fine; all extensions we need already installed; can adjust settings where needed.
In the limit I don't think we should need DBAs, but as long as we need to manage indices by hand, think more than 10 seconds about the hot queries, manage replication, tune the vacuumer, track updates, and all the other rot - then actually installing PG on a node of your choice is really the smallest of problems you face.
pacman -S postgresql
initdb -D /pathto/pgroot/data
grok/claude/gpt: "Write a concise Bash script for setting up an automated daily PostgreSQL database backup using pg_dump and cron on a Linux server, with error handling via logging and 7-day retention by deleting older backups."
ctrl+c / ctrl+v
Yeah that definitely took me an hour or two.
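For reference, the kind of script that prompt describes is only a handful of lines (a sketch - paths and database name are made up, and it's local-disk retention only, no off-site copy):

#!/usr/bin/env bash
# daily pg_dump with logging and 7-day retention; cron it with e.g.
#   0 3 * * * /usr/local/bin/pg_backup.sh
set -euo pipefail
BACKUP_DIR=/var/backups/postgres
LOG=/var/log/pg_backup.log
DB=mydb
mkdir -p "$BACKUP_DIR"
{
  echo "$(date -Is) starting backup of $DB"
  pg_dump -Fc "$DB" > "$BACKUP_DIR/${DB}_$(date +%F).dump"
  find "$BACKUP_DIR" -name "${DB}_*.dump" -mtime +7 -delete
  echo "$(date -Is) backup finished"
} >> "$LOG" 2>&1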
> datacenter goes up in flames
> 3-2-1 backups: 3 copies on 2 different types of media with at least 1 copy off-site. No off-site copy.
Whoops!
1. Access to any extension you want and importantly ability to create your own extensions.
2. Being able to run any version you want, including being able to adopt patches ahead of releases.
3. Ability to tune for maximum performance based on the kind of workload you have. If it's massively parallel you can fill the box with huge amounts of memory and screaming fast SSDs, if it's very compute heavy you can spec the box with really tall cores etc.
Self-hosting is rarely about cost; it's usually about control for me. Being able to replace complex application logic/types with a nice custom pgrx extension can save massive amounts of time. Similarly, using a custom index access method can unlock a step change in performance unachievable without some non-PG solution that would compromise on simplicity by forcing a second data store.
- Backups: the provider will push a full generic disaster-recovery backup of my database to an off-provider location at least daily, without the need for a maintenance window
- Optimization: index maintenance and storage optimization are performed automatically and transparently
- Multi-datacenter failover: my database will remain available even if part(s) of my provider are down, with a minimal data loss window (like, 30 seconds, 5 minutes, 15 minutes, depending on SLA and thus plan expenditure)
- Point-in-time backups are performed at an SLA-defined granularity and with a similar retention window, allowing me to access snapshots via a custom DSN, not affecting production access or performance in any way
- Slow-query analysis: notifying me of relevant performance bottlenecks before they bring down production
- Storage analysis: my plan allows for #GB of fast storage, #TB of slow storage: let me know when I'm forecast to run out of either in the next 3 billing cycles or so
Because, well, if anyone provides all of that for a monthly fee, the whole "self-hosting" argument goes out of the window quickly, right? And I say that as someone who absolutely adores self-hosting...
Corollary: rental/SaaS models provide that property in large part because their providers have lots of slack.
And especially having worked in startups, I was expected to do many different things, from fixing infrastructure code one day to writing frontend code the next. If you're in a bigger company, maybe it's understandable to be specialized, but especially if you're at a company with only a few people, you must be willing to do the job, whatever it is.
Yes, I'd say backups and analysis are table stakes for hiring it out, and multi-datacenter failover is a relevant nice-to-have. But the reason to do it yourself is that on somebody else's computer it's literally impossible to get anything as good as what you can build yourself.
The defaults I've used on Amazon and GCP (RDS, Cloud SQL) both do.
In reality, most database issues are slow queries or connection pool exhaustion - things that happen during business hours when you're actively developing. The actual database process itself just runs. I've had more AWS outages wake me up than Postgres crashes.
The cost savings are real, but the bigger win for me is having complete visibility. When something does go wrong, I can SSH in and see exactly what's happening. With RDS you're often stuck waiting for support while your users are affected.
That said, you do need solid backups and monitoring from day one. pgBackRest and pgBouncer are your friends.
I would expect a little bit more as a cost of the convenience, but in my experience it's generally multiple times the expense. It's wild.
This has kept me away from managed databases in all but my largest projects.
If anything that’s a feature for ease of use and compatibility.
I know there are other issues with Kubernetes but at least its transferable knowledge.
>> "God Send". Everything just worked. Replication was as reliable as one could imagine. It outlives several hardware incidents without manual intervention. It allowed cluster maintenance (software and hardware upgrades) without application downtime. I really dream PostgreSQL will be as reliable as MongoDB without need of external services.
https://www.postgresql.org/message-id/0e01fb4d-f8ea-4ca9-8c9...
Sure, the PostgreSQL HA story isn't what we all want it to be, but the reliability is exceptional.
Database engineering is very hard. MongoDB has had both poor defaults as well as bugs in the past. It will certainly have durability bugs in the future, just like Postgres and all other serious databases. I'm not sure that Postgres' durability stacks up especially well with modern MongoDB.
[1] https://jepsen.io/analyses/postgresql-12.3
[2] https://archive.fosdem.org/2019/schedule/event/postgresql_fs...
Yet they still call it HA because there's nothing else. Even a planned shutdown of the primary to patch the OS results in downtime, as all connections are terminated. The situation is even worse for major database upgrades: stop the application, upgrade the database, deploy a new release of the app because some features are not compatible between versions, test, re-analyze the tables, reopen the database, and only then can users resume work.
Everything in SQL/RDBMS was designed for a single-node instance, not accounting for replicas. It's not HA because there can be only one read-write instance at a time. They even claim to be more ACID than MongoDB, but the ACID properties are guaranteed only on a single node.
One exception is Oracle RAC, but PostgreSQL has nothing like that. Some forks, like YugabyteDB, provide real HA with most PostgreSQL features.
About the hype: many applications that run on PostgreSQL accept hours of downtime, planned or unplanned. Those who run larger, more critical applications on PostgreSQL are big companies with many expert DBAs who can handle the complexity of database automation. And use logical replication for upgrades. But no solution offers both low operational complexity and high availability that can be comparable to MongoDB
CloudNativePG (https://cloudnative-pg.io) is a great option if you’re using Kubernetes.
There’s also pg_auto_failover which is a Postgres extension and a bit less complex than the alternatives, but it has its drawbacks.
OTOH, Oracle takes most of my time with endless issues, bugs, unexpected feature modifications, even on OCI!
FYI - it's already supported by cloudnativepg [1]
I was playing with this operator recently and I'm truly impressed - it's a piece of art when it comes to postgres automation; alongside with barman [2] it does everything I need and more
[1] https://cloudnative-pg.io/docs/1.28/connection_pooling [2] https://cloudnative-pg.io/plugin-barman-cloud/
My theory of why Postgres is still getting the hype is that either people don't know the problem, or it's acceptable on some level. I've worked on a team that maintained an in-house database cluster (even though we were using MySQL instead of PostgreSQL) and the HA story was pretty bad. But there were engineers manually recovering lost data and resolving data conflicts, either as part of incident recovery or from customer tickets. So I guess that's one way of doing business.
(I do use Maria at home for legacy reasons, and have used MySQL and Pg professionally for years.)
Can you give any details on that?
I switched to MariaDB back in the day for my personal projects because (so far as I could tell) it was being updated more regularly, and it was more fully open source. (I don't recall offhand at this point whether MySQL switched to a fully paid model, or just less-open.)
It’s not like Mongo gives you those properties for free either. Replication/clustering related data loss is still incredibly common precisely because mongo makes it seem like all that stuff is handled automatically at setup when in reality it requires plenty of manual tuning or extra software in order to provide the guarantees everyone thinks it does.
Until then it is nice to have options, even if they do require extra steps.
Patroni has been around for a while. The database-as-a-service team where I work uses it under the hood. I used it to build database-as-a-service functionality on the infra platform team I was at prior to that.
It's basically push-button production PG.
There's at least one decent operator framework leveraging it, if that's your jam. I've been living and dying by self-hosting everything with k8s operators for about 6-7 years now.
Currently scratching my head on what the appropriate upgrade procedure is for a non-k8s/operator spilo/patroni cluster for minimal downtime and risk. The script doesn't seem to work for this setup, erroring on mismatching PG_VERSION when attempting. If you don't mind sharing it would be very appreciated.
In case you want to self host but also have something that takes care of all that extra work for you
[1] https://docs.cloud.google.com/compute/docs/disks/hd-types/hy... [2] https://docs.cloud.google.com/compute/docs/disks/create-snap...
What went so wrong during the past 25 years?
There is a whole raft of reasons why you might be a candidate for self-hosting, and a whole raft of reasons why not. This article is deeply reductive, and so are many of the comments.
I’ve given a lot of engineers tasks only to find they are “setting up kubernetes cluster so I can setup automated deployments with a dashboard for …”
And similarly in QA I rarely see a cost/benefit consideration for a particular test or automation. Instead it’s we are going to fully automate this and analyze every possible variable.
Just found that assumption a bit dangerous. Setting that up is easy on RDS, but it's not on by default.
https://github.com/vitabaks/autobase
automates the deployment and management of highly available PostgreSQL clusters in production environments. This solution is tailored for use on dedicated physical servers, virtual machines, and within both on-premises and cloud-based infrastructures.
And I really recommend starting with *default* k3s; do not look at any alternatives for CNI, CSI, or networked storage. Treat your first cluster as something that can spontaneously fail, don't bother keeping it clean, and learn as much as you can.
Once you have that, you can use great open-source k8s native controllers which take care of vast majority of requirements when it comes to self-hosting and save more time in the long run than it took to set up and learn these things.
Honorable mentions: k9s, Lens (I do not suggest using it long-term, but the UI is really good as a starting point), Rancher web UI.
PostgreSQL specifically: https://github.com/cloudnative-pg/cloudnative-pg If you really want networked storage: https://github.com/longhorn/longhorn
I do not recommend Ceph unless you are okay with not using shared filesystems (they have a bunch of gotchas), or unless you want S3 without having to install a dedicated deployment for it.
As someone who has operated Postgres clusters for over a decade before k8s was even a thing, I fully recommend just using a Postgres operator like this one and moving on. The out of box config is sane, it’s easy to override things, and failover/etc has been working flawlessly for years. It’s just the right line between total DIY and the simplicity of having a hosted solution. Postgres is solved, next problem.
With Docker Compose, the abstraction level you're dealing with is containers, which means in this case you're saying "run the postgres image and mount the given config and the given data directory". When running the service, you need to know how to operate the software within the container.
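Concretely, that Compose-level abstraction is equivalent to something like this single container run (image tag, paths and password are made up):

# "run the postgres image and mount the given config and data directory"
docker run -d --name pg \
  -e POSTGRES_PASSWORD=change-me \
  -v /srv/pg/data:/var/lib/postgresql/data \
  -v /srv/pg/postgresql.conf:/etc/postgresql/postgresql.conf \
  postgres:16 -c config_file=/etc/postgresql/postgresql.conf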
Kubernetes at its heart is an extensible API server, which allows so-called "operators" to create custom resources and react to them. In the given case, this means that a Postgres operator defines, for example, a PostgresDatabaseCluster resource, and then contains control loops to turn these resources into actual running containers. That way, you don't necessarily need to know how Postgres is configured and that it requires a data directory mount. Instead, you create a resource that says "give me a postgres 15 database with two instances for HA fail-over", and the operator then goes to work and manages the underlying containers and volumes.
Essentially operators in kubernetes allow you to manage these services at a much higher level.
I will say that even though the StatefulSet pins the pod to a node, it still has advantages. The StatefulSet can be scaled to N nodes, and if one goes down, failover is automatic. Then you have a choice as an admin to either recover the node, or just delete the pod and let the operator recreate it on some other node. When it gets recreated, it resyncs from the new primary and becomes a replica and you’re back to full health, it’s all pretty easy IMO.
I do this for multiple reasons, one is that I find it easier to use Kubernetes as the backend for Patroni, rather than running/securing/maintaining just another etcd cluster. But I also do it for observability, it's much nicer to be able to pull all the metrics and logs from all the components. Sure, it's possible to set that up without Kubernetes, but why if I can have the logs delivered just one way. Plus, I prefer how self-documenting the whole thing is. No one likes YAML manifests, but they are essentially running documentation that can't get out of sync.
Docker compose has always been great for running some containers on a local machine, but I’ve never found it to be great for deployments with lots of physical nodes. k8s is certainly complex, but the complexity really pays off for larger deployments IMO.
For start, ignore operators, ignore custom CSI/CNI, ignore IAM/RBAC. Once you feel good in the basics, you can expand.
k3s provides a CNI and a CSI out of the box (Container Network Interface and Container Storage Interface): the CNI is flannel, and storage is local-path, which just maps volumes to disk (PVCs).
Traefik is what routes your traffic from the outside to the inside of your cluster (to an Ingress resource).
Later I ran a v2 of that service on k8s. The architecture also changed a lot, hosting many smaller servers sharing the same psql server (not really microservice-related; think more "collective of smaller services run by different people"). I have hit some issues related to maxing out max_connections, but that's about it.
This is something I do on my free time so SLA isn't an issue, meaning I've had the ability to learn the ropes of running PSQL without many bad consequences. I'm really happy I have had this opportunity.
My conclusion is that running PSQL is totally fine if you just set up proper monitoring. If you are an engineer that works with infrastructure, even just because nobody else can/wants to, hosting PSQL is probably fine for you. Just RTFM.
I generally read the parts I think I need, based on what I read elsewhere like Stackoverflow and blog posts. Usually the real docs are better than some random person's SO comment. I feel that's sufficient?
I did this for just under two years, and I've lost count of how many times one or more of the nodes went down and I had to manually deregister it from the cluster with repmgr, clone a new vm and promote a healthy node to primary. I ended up writing an internal wiki page with the steps. I never got it: if one of the purposes of clusters is having higher availability, why did repmgr not handle zombie primaries?
Again, I'm probably just an idiot out of my depth with this. And I probably didn't need a cluster anyway, although with the nodes failing like they did, I didn't feel comfortable moving to a single node setup as well.
I eventually switched to managed postgres, and it's amazing being able to file a sev1 for someone else to handle when things go down, instead of the responsibility being on me.
Hardware failures and automated fail overs. That's a thing AWS and other managed hosting solutions do. Hardware will eventually fail of course. In AWS this would be a non event. It will fail over, a replacement spins up, etc. Same with upgrades, and other stuff.
Configuration complexity. The author casually outlines a lot of fairly complex design involving all sorts of configuration tweaks, load balancing, etc. That implies skills most teams don't have. I know enough to know that I have quite a bit of reading up to do if I ever were to decide to self host postgresql. Many people would make bad assumptions about things being fine out of the box because they are not experienced postgresql DBAs.
Vacations/holidays/sick days. Databases may go down when it's not convenient to you. To mitigate that, you need to have several colleagues that are equally qualified to fix things when they go down while you are away from keyboard. If you haven't covered that risk, you are taking a bit of risk. In a normal company, at least 3-4 people would be a good minimum. If you are just measuring your own time, you are not being honest or not being as diligent as you should be. Either it's a risk you are covering at a cost or a risk you are ignoring.
With managed hosting, covering all of that is what you pay for. You are right that there are still failure modes beyond that that need covering. But an honest assessment of the time you, and your team, put in for this adds up really quickly.
Whatever the reasons you are self hosting, cost is probably a poor one.
Here are my gripes:
1. Backups are super-important. Losing production data just is not an option. Postgres offers pg_dump, which is not the appropriate tool here, so you should set up WAL archiving or something like that. This is complicated to do right (a rough sketch follows at the end of this comment).
2. Horizontal scalability with read replicas is hard to implement.
3. Tuning various postgres parameters is not a trivial task.
4. Upgrading major version is complicated.
5. You probably need to use something like pgbouncer.
6. Database usually is the most important piece of infrastructure. So it's especially painful when it fails.
I guess it's not that hard once you've done it before and have the scripts and the memory to look back on. But otherwise it's hard. Clicking a few buttons in the hoster's panel is much easier.
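Re: point 1, the WAL-archiving baseline itself is small, even if getting it production-right (retention, verification, restore drills) is where the real work is. A sketch with made-up paths - in practice pgBackRest or Barman wrap all of this for you:

# postgresql.conf: continuous archiving; restores need a base backup plus the archived WAL
cat >> /etc/postgresql/16/main/postgresql.conf <<'EOF'
wal_level = replica
archive_mode = on
archive_command = 'rsync -a %p backup-host:/srv/wal_archive/%f'
EOF
# periodic base backup to pair with the archive
pg_basebackup -D /srv/base_backups/$(date +%F) -Ft -z -P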
you don't need horizontal scalability when a single server can have 384 cpu real cores, 6TB of ram, some petabytes of pcie5 ssd, 100Gbps NIC.
for tuning postgres parameters, you can start by using pgtune.leopard.in.ua or pgconfig.org.
upgrading major version is piss easy since postgres 10 or so. just a single command.
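(Presumably that's pg_upgrade; a sketch with illustrative paths/versions - you still need the new cluster initdb'd and the old binaries installed:)

# e.g. 16 -> 17, using hard links instead of copying data files
pg_upgrade \
  --old-bindir=/usr/lib/postgresql/16/bin \
  --new-bindir=/usr/lib/postgresql/17/bin \
  --old-datadir=/var/lib/postgresql/16/main \
  --new-datadir=/var/lib/postgresql/17/main \
  --link
# then start the new cluster and re-analyze; pg_upgrade prints the suggested follow-up commands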
you do not need pgbouncer if your database adapter library already provides connection pooling (most of them do).
for me a managed database also needs that same amount of effort, due to shitty documentation and garbage user interfaces (AWS, GCP or Azure, they're all the same), not to mention they change all the time.
That said, a self-hosted DB on a dedicated Hetzner box flies. It delivers performance at a price that may save you the time of reworking your app to be more cost-efficient on AWS.
So swings and roundabouts.
So we need an open-source way to do that; Coolify/Dokploy come to mind, and they do exactly that.
I would say 80% of your points wouldn't hit at a certain scale, as most applications grow and therefore outgrow your tech stack; you would replace them anyway at some point.
Overall, a good experience. Very stable service and when performance issues did periodically arise, I like that we had full access to all details to understand the root cause and tune details.
Nobody was employed as a full-time DBA. We had plenty of other things going on in addition to running PostgreSQL.
- Standard Postgres compiled with some AWS-specific monitoring hooks
- A custom backup system using EBS snapshots
- Automated configuration management via Chef/Puppet/Ansible
- Load balancers and connection pooling (PgBouncer)
- Monitoring integration with CloudWatch
- Automated failover scripting
I didn't know RDS had PgBouncer under the hood - is this really accurate? The problem I find with RDS (and most other managed Postgres) is that it limits your options for how you want to design your database architecture. For instance, if write consistency is important and you want synchronous replication, there is no way to do this in RDS without either Aurora or having the readers in another AZ. The other issue is that you only have access to logical replication, because you don't have access to your WAL archive, so it makes moving off RDS much more difficult.
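(For comparison, the self-hosted version of that synchronous-replication knob is a couple of config lines - a sketch, where 'replica1' is whatever application_name the standby connects with:)

# on the primary: don't acknowledge commits until the named standby has confirmed them
cat >> /etc/postgresql/16/main/postgresql.conf <<'EOF'
synchronous_commit = on
synchronous_standby_names = 'FIRST 1 (replica1)'
EOF
psql -X -c "SELECT pg_reload_conf();"
# check which standbys are currently synchronous
psql -X -c "SELECT application_name, sync_state FROM pg_stat_replication;"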
I don't think it does. AWS has this feature under RDS Proxy, but it's an extra service and comes with extra cost (and a bit cumbersome to use in my opinion, it should have been designed as a checkbox, not an entire separate thing to maintain).
Although it technically has a "load balancer", in the form of a DNS entry that resolves to a random reader replica, if I recall correctly.
Is this actually the "common" view (in this context)?
I've got decades with databases so I cannot even begin to fathom where such an attitude would develop, but, is it?
Boggling.
You mean you need to SSH into the box? Horrifying!
I have a cron shell script to back up to S3 (it used to be FTP).
It's not "business grade" but it has also actually NEVER failed. Well once, but I think it was more the container or a swarm thing. I just destroyed and recreated it and it picked up the same volume fine.
The biggest pain point is upgrading, as PostgreSQL can't upgrade the data directory without the previous version's binaries installed, or something like that. It's VERY annoying.
Of all the places I've worked that had the attitude "if this goes down at 3AM, we need to fix it immediately", there was only one where that was actually justifiable from a business perspective. I've worked at plenty of places that had this attitude despite the fact that overnight traffic was minimal and nothing bad actually happened if a few clients had to wait until business hours for a fix.
I wonder if some of the preference for big-name cloud infrastructure comes from the fact that during an outage, employees can just say "AWS (or whatever) is having an outage, there's nothing we can do" vs. being expected to actually fix it
From this perspective, the ability to fix problems more quickly when self hosting could be considered an antifeature from the perspective of the employee getting woken up at 3am
You wake up. It's not your fault. You're helpless to solve it.
Eventually, AWS has a VP of something dial in to your call to apologize. They're unprepared and offer no new information. They get handed off to a side call for executive bullshit.
AWS comes back. Your support rep only vaguely knows what’s going on. Your system serves some errors but digs out.
Then you go to sleep.
That includes big and small businesses, SaaS and non-SaaS, high scale (5M+rps) to tiny scale (100s-10krps), and all sorts of different markets and user bases. Even at the companies that were not staffed or providing a user service over night, overnight outages were immediately noticed because on average, more than one external integration/backfill/migration job was running at any time. Sure, “overnight on call” at small places like that was more “reports are hardcoded to email Bob if they hit an exception, and integration customers either know Bob’s phone number or how to ask their operations contact to call Bob”, but those are still environments where off-hours uptime and fast resolution of incidents was expected.
Between me, my colleagues, and friends/peers whose stories I know, that’s an N of high dozens to low hundreds.
What am I missing?
IME the need for 24x7 for B2B apps is largely driven by global customer scope. If you have customers in North America and Asia, you now need 24x7 (and x365, because of little holiday overlap).
That being said, there are a number of B2B apps/industries where global scope is not a thing. For example, many providers who operate in the $4.9 trillion US healthcare market do not have any international users. Similarly the $1.5 trillion (revenue) US real estate market. There are states where one could operate where healthcare spending is over $100B annually. Banks. Securities markets. Lots of things do not have 24x7 business requirements.
All of those places needed their backend systems to be up 24/7. The banks ran reports and cleared funds with nightly batches—hundreds of jobs a night for even small banking networks. The healthcare companies needed to receive claims and process patient updates overnight (e.g. your provider’s EMR is updated if you die or have an emergency visit with another provider you authorized for records sharing—and no, this is not handled by SaaS EMRs in many cases) so that their systems were up to date when they next opened for business. The “regular” businesses that closed for the night still generated reports overnight and frequently had IT staff doing migrations, or senior staff working on something at midnight due the next day (when the head of marketing is burning the midnight oil on that presentation, you don’t want to be the person explaining that she can’t do it because the file server hosting the assets is down all the time after hours).
And again, that’s the norm I’ve heard described from nearly everyone in software/IT that I know: most businesses expect (and are willing to pay for or at least insist on) 24/7 uptime for their computer systems. That seems true across the board: for big/small/open/closed-off-hours/international/single-timezone businesses alike.
But there are also a not-insignificant number of important systems where nobody is on a pager, where there is no call rotation[1]. Computers are much more reliable than they were even 20 years ago. It is an Acceptable Business Choice to not have 24x7 monitoring for some subset of systems.
Until very recently[2], Citibank took their public website/user portal offline for hours a week.
[1] If a system does not have a fully staffed call rotation with escalations, it's not prepared for a real off-hours uptime challenge.
[2] They may still do this, but I don't have a way to verify right now.
Unless there's a misconfiguration, usually apps are always visible internally to staff, so there's an existing methodology to follow to make them visible to VIPs.
But sometimes none of that is necessary. I've seen at a 1B market cap company, a failure case where the solution was manual execution by customer success reps while the computers were down. It was slower, but not many people complained that their reports took 10 minutes to arrive after being parsed by Eye Ball Mk 1s, instead of the 1 minute of wait time they were used to.
Also, in addition to perception/reputation issues, B2B contracts typically include an SLA, and nobody wants to be in breach of contract.
I think the parent you're replying to is wrong, because I've worked at small companies selling into large enterprise, and the expectation is basically 24/7 service availability, regardless of industry.
I also self-host my webapp, 4+ years now, and have never had any trouble with the database.
pg_basebackup and WAL archiving work wonders. And since I always pull the database (the backup version) for local development, the backup is constantly verified, too.
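For anyone wondering what that setup amounts to, here is a minimal sketch, assuming a stock Postgres install and an S3 bucket for the WAL archive (the user and bucket names are made up):

    # postgresql.conf: ship every completed WAL segment off-box
    #   wal_level = replica
    #   archive_mode = on
    #   archive_command = 'aws s3 cp %p s3://my-backups/wal/%f'

    # periodic full base backup (cron this daily or weekly)
    pg_basebackup -h localhost -U replicator \
      -D /backups/base-$(date +%F) -Ft -z -X stream --checkpoint=fast

Restoring is unpacking the latest base backup and replaying the archived WAL, and pulling it down for local development, as described above, doubles as a continuous restore test.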
With MongoDB you simply create a replicaset and you are done.
When planning a Postgres cluster, you need to understand the replication options and potentially deal with Patroni. Zalando's Spilo Docker image is not really maintained anymore; the way to go seems to be CloudNativePG, but that requires k8s.
I still don’t understand why there is no easy built-in Postgres cluster solution.
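To be fair, a plain two-node primary/standby setup is built in; what Postgres doesn't give you out of the box is automatic failover, which is the part Patroni and CloudNativePG add. A rough sketch of the built-in path, with placeholder host and role names:

    # on the primary:
    #   postgresql.conf:  wal_level = replica
    #   pg_hba.conf:      host replication repl 10.0.0.2/32 scram-sha-256

    # on the standby: clone the primary and write the recovery settings
    pg_basebackup -h primary.internal -U repl \
      -D /var/lib/postgresql/17/main -R -X stream --checkpoint=fast
    # -R creates standby.signal and primary_conninfo, so starting Postgres
    # on this data directory brings it up as a streaming hot standby

Failover is still a manual pg_ctl promote on the standby, which is exactly the gap the MongoDB replica set comparison points at.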
Interesting. Whoever wrote
https://news.ycombinator.com/item?id=46334990
didn't seem to be aware of that.
A popular choice for smaller workloads has always been the Hetzner cloud which I finally poured into a ready-to-use Terraform module https://pellepelster.github.io/solidblocks/hetzner/rds/index....
Main focus here is a tested solution with automated backup and recovery, leaving out the complicated parts like clustering, prioritizing MTTR over MTBF.
The naming of RDS is a little bit presumptuous I know, but it works quite well :-)
And for small projects, SQLite, rqlite, or etcd.
My logic is either the project is important enough that data durability matters to you and sees enough scale that loss of data durability would be a major pain in the ass to fix, or the project is not very big and you can tolerate some lost committed transactions.
A consensus-replication-less non-embedded database has no place in 2025.
This is assuming you have relational needs. For non-relational just use the native NoSQL in your cloud, e.g. DynamoDB in AWS.
Everything you do that isn't "normal" is another conversation you need to have with an auditor plus each customer. Those eat up a bunch of time and deals take longer to close.
Right or wrong, these decisions make you less "serious" and therefore less credible in the eyes of many enterprise customers. You can get around that perception, but it takes work. Not hosting on one of the big 3 needs to be decided with that cost in mind
Scripts could kick off health reports and trigger operations. Upgrades and recovery runbooks would be clearly defined and integration tested.
It would empower personal sovereignty.
Someone should make this in the open. Maybe it already exists, there are a lot of interesting agentops projects.
If that worked 60% of the time and I had to figure out the rest, I’d self host that. I’d pay for 80%+.
https://supabase.com/docs/guides/self-hosting/docker
However, like always, "complexity has to live somewhere". I doubt even Opus 4.5 could handle this. As soon as you get into the database records themselves, context is going to blow up and you're going to have a bad time.
Is this really the state of our industry? Lol. Bunch of babies scared of the terminal.
However, a self-hosted PostgreSQL on a bare metal server with NVMe SSDs will be much faster than what RDS is capable of. Especially at the same price points.
> Standard Postgres compiled with some AWS-specific monitoring hooks
… and other operational tools deployed alongside it. That’s not always true: RDS classic may be those things, but RDS Aurora/Serverless is anything but.
As to whether
> self-hosted PostgreSQL on a bare metal server with NVMe SSDs will be much faster than what RDS is capable of
That’s often but not always true. Plenty of workloads will perform better on RDS (read auto scaling is huge in Serverless: you can have new read replica nodes auto-launch in response to e.g. a wave of concurrent, massive reporting queries; many queries can benefit from RDS’s additions to/modifications of the pg buffer cache system that work with the underlying storage)—and that’s even with the VM tax and the networked-storage tax! Of course, it’ll cost more in real money whether or not it performs better, further complicating the cost/benefit analysis here.
Also, pedantically, you can run RDS on bare metal with local NVMEs.
Only if you like your data to evaporate when the server stops.
I'm relatively sure that the processing power and memory you can buy on OVH / Hetzner / co. is larger and cheaper even if you take into account peaks in your usage patterns.
(Edited to remove glib and vague rejoinder, sorry) Then hibernate/reboot it instead of stopping it? Alternatively, that’s what backup-to S3, periodic snapshot-to-EBS, clustering, or running an EBS-persisted zero-query-volume tiny replica are for.
> the processing power and memory you can buy on OVH / Hetzner / co. is larger and cheaper
Cheaper? Yeah, generally. But larger/more performant? Not always—it’s not about peaks/autoscaling, it’s about the (large) minority of workloads that will work better on RDS/Aurora/Serverless: auto-scale-out makes the reports run on time regardless of cost; bulk data loads are available on replicas a lot sooner on Aurora because the storage is the replication system, not the WAL; and so on—if you add up all the situations where the hosted RDBMS systems trump self hosted, you get an amount that’s not “hosted is always better/worth it”, but it’s not “hosted is just ops time savings and is otherwise just slower/more expensive” either. And that’s before you add reliability into the conversation.
https://blog.notmyhostna.me/posts/what-i-wish-existed-for-se...
To the author - on Android Chrome I seem to inevitably load the page scrolled to the bottom, footnotes area. Scrolling up, back button, click link again has the same results - I start out seeing footnotes. Might be worth a look.
Things will go wrong. And it's all your fault. You can't just blame AWS.
Also, are we changing the definition of self-hosting? Self-hosting on Digital Ocean?!
Sometimes it is nice to simplify the conversation with non-tech management. Oh, you want HA / DR / etc? We click a button and you get it (multi-AZ). Clicking the button doubles your DB costs from x to y. Please choose.
Then you have one less repeating conversation and someone to blame.
... but you can do a lot with just "a single VM and robust backup". PostgreSQL restore is pretty fast, and if you automated deployment you can start with it in minutes, so if your service can survive 30 minutes of downtime once every 3 years while the DB reloads, "downgrading" to "a single cloud VM" or "a single VM on your own hardware" might not be a big deal.
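As a rough illustration of that path, restoring onto a freshly provisioned VM can be a handful of commands, assuming nightly dumps are already landing in object storage (the bucket and database names here are placeholders):

    # fetch the most recent dump and load it into a fresh instance
    aws s3 cp s3://my-backups/nightly/app-latest.dump /tmp/app.dump
    createdb app
    pg_restore --jobs=4 --no-owner -d app /tmp/app.dump
    # repoint the app's DATABASE_URL at the new box and you're back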
Hiring and replacing engineers who can and want to manage database servers can be hard or scary for employers.
I heard there's this magical thing called "money" that is claimed to help with this problem. You offer even half of the AWS markup to your employees and suddenly they like managing database servers. Magic I tell you!
I had a single API endpoint performing ~178 Postgres SQL queries.
    Setup               Latency/query   Total time
    ----------------------------------------------
    Same geo area       35ms            6.2s
    Same local network  4ms             712ms
    Same server         ~0ms            170ms
This is with zero code changes; the time savings come purely from network latency: ~178 queries at ~35 ms each is roughly 6.2 s of round trips alone. A lot of devs lately are not even aware of the latency costs coming from their service locations. It's crazy!

Disclaimer: there's no silver bullet, yadda yadda. But SQLite in WAL mode and backups using Litestream have worked perfectly for me.
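In case it's useful to anyone, the Litestream setup being described is genuinely small; a sketch, assuming the stock litestream CLI and an S3 bucket (names are illustrative):

    # put SQLite in WAL mode once (the setting persists in the file)
    sqlite3 app.db 'PRAGMA journal_mode=WAL;'

    # continuously replicate the database to S3 while the app runs
    litestream replicate app.db s3://my-backups/app.db

    # disaster recovery: pull the latest copy back down
    litestream restore -o app.db s3://my-backups/app.db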
I was disappointed alloy doesn't support timescaledb as a metrics endpoint. Considering switching to telegraf just because I can store the metrics on Postgres.
SQLite when prototyping, Postgres for production.
If you need to power a lawnmower and all you have is a 500bhp Scania V8, you may as well just do it.
Even better, stage to a production-like environment early, and then deploy day can be as simple as a DNS record change.
I have switched to using Postgres even for prototyping, once I prepared some shell scripts for the various setup steps. With Hibernate (Java) or Knex (JavaScript/Node.js), and with unit tests (a Test Driven Development approach) for the code, I feel I have reduced the friction of using Postgres from the beginning.
Because it's "low effort" to just fire it into SQLite, even if I have to do ridiculous things to the schema as I footer around working out exactly what I want the database to do.
I don't want to use nodejs if I can possibly avoid it and you literally could not pay me to even look at Java, there isn't enough money in the world.
Incidentally, you can rsync postgres dumps as well. That's what I do when testing and when sharing test data with team mates. At times, I decide to pgload the database dump into a different target system.
My reason for sharing: I accepted that I was being lethargic about using postgres, so I just automated certain things as I went along.
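A sketch of what that dump-and-share automation can look like, assuming custom-format dumps and ssh between the machines (host and database names are made up):

    # take a compressed, custom-format dump that pg_restore can load selectively
    pg_dump -Fc -f /tmp/testdata.dump appdb

    # ship it to a teammate or a shared box with rsync over ssh
    rsync -avz /tmp/testdata.dump teammate-host:/tmp/

    # load it into whatever target database the other side uses
    pg_restore --clean --no-owner -d appdb_local /tmp/testdata.dump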
For most purposes, it works perfectly fine, but with two main caveats:
1. It is single user, single connection (i.e. no MVCC)
2. It doesn't support all Postgres extensions (particularly PostGIS), though it does support pgvector
https://github.com/supabase-community/pg-gateway is something that might let you use pglite for prototyping, I guess, but I haven't used it.
This is when you use SQLite, not Postgres. Easy enough to turn into Postgres later, nothing to set up. It already works. And backups are literally just "it's a file, incremental backup by your daily backups already covers this".
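The one wrinkle worth flagging: copying a live SQLite file mid-write can catch it in an inconsistent state, so it's safer to let SQLite produce the copy that the daily backup then picks up:

    # produce a consistent snapshot even while the app is writing
    sqlite3 app.db ".backup /backups/app-$(date +%F).db"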
Managed database services mostly automate a subset of routine operational work, things like backups, some configuration management, and software upgrades. But they don't remove the need for real database operations. You still have to validate restores, build and rehearse a disaster recovery plan, design and review schemas, review and optimize queries, tune indexes, and fine-tune configuration, among other essentials.
In one incident, AWS support couldn't determine what was wrong with an RDS cluster and advised us to "try restarting it".
Bottom line: even with managed databases, you still need people on the team who are strong in DBOps. You need standard operating procedures and automation, built by your team. Without that expertise, you're taking on serious risk, including potentially catastrophic failure modes.
I would've very much preferred being able to SSH in myself and fix it on the spot. Ironically the only reason it ran out of space in the first place is that the AWS markup on that is so huge we were operating with little margin for error; none of that would happen with a bare-metal host where I can rent 1TB of NVME for a mere 20 bucks a month.
As far as I know we never got any kind of compensation for this, so RDS ended up being a net negative for this company, tens of thousands spent over a few years for laptop-grade performance and it couldn't even do its promised job the only time it was needed.
The real downside wasn't technical; it was the constant background anxiety you had to learn to live with, since the hosted news sites were hammered by users. The dreaded SMS alerts saying the server was inaccessible (often due to ISP issues), and the fact that going abroad meant persuading one of your mates to keep an eye on things just in case, created a lot of unnecessary stress.
AWS is quite good. It has everything you need and removes most of that operational burden, so the angst is much lower, but the pricing is problematic.
A lot of this comes down to devs not understanding infrastructure and infrastructure components and the insane interplay and complexity. And they don't care! Apps, apps apps, developers, developers, developers!
On the managerial side, it's often about deflection of responsibility for the Big Boss.
It's not part of the app itself; it can be HARD; and if you're not familiar with things, then it's also scary! What if you mess up?
(Most apps don't need the elasticity, or the bells and whistles, but you're paying for them even if you don't use them, indirectly.)
A blog post that went into the details would be awesome. I know Postgres has some docs for this (https://www.postgresql.org/docs/current/backup.html), but it's too theoretical. I want to see a one-stop-shop with everything you'd reasonably need to know to self host: like monitoring uptime, backups, stuff like that.
An expert will give you thousands of theoretical reasons why self-hosting the DB is a bad idea.
An "expert" will host it, enjoy the cost savings and deal with the once-a-year occurrence of the theoretical risk (if it ever occurs).
If you're a small business, you almost never need replicas or pooling. Postgres is insanely capable on modern hardware, and is probably the fastest part of your application if your application is written in a slower dynamic language like Python.
I once worked with a company that scaled up to 30M revenue annually, and never once needed more than a single dedicated server for postgres.
And Postgres upgrades are not transparent. So every 6 to 18 months you'll have a 1- or 2-hour task whose timing you have only a small amount of control over. This is ok for a lot of people, and completely unthinkable for some other people.