We encourage staff to play with both, and they can play with impunity since it's a copy that will get replaced soon-ish.
This makes it important that both work reliably, which also means we find out quickly when our backups stop working.
We haven't had a disaster recovery situation yet (hopefully never), but I feel fairly confident that getting the DB back shouldn't be a big deal.
pg_restore will handle roles, indexes, etc., assuming you didn't switch the flags around to disable them.
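For reference, a minimal sketch of the dump/restore pair (database name, paths and job count are placeholders); one nuance is that roles are cluster-wide objects, so a plain per-database dump doesn't include them and pg_dumpall --globals-only has to cover that part:

```sh
# Globals (roles, tablespaces) live outside any single database's dump
pg_dumpall --globals-only -f globals.sql

# Custom-format, compressed dump of the application database
pg_dump -Fc -f appdb.dump appdb

# Restore: globals first, then the database itself.
# --clean/--create recreate the database; --jobs parallelises the data load and index builds.
psql -d postgres -f globals.sql
pg_restore --clean --create --jobs=4 -d postgres appdb.dump
```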
If you're on EC2, hopefully you're using disk snapshots and WAL archiving.
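For anyone setting that up by hand, WAL archiving is only a couple of settings plus somewhere durable to put the segments. A rough sketch, where the archive path is a placeholder and the copy command is essentially the example from the Postgres docs (in practice you'd ship to S3 or similar):

```sh
# Append the archiving settings to postgresql.conf and restart
# (archive_mode needs a restart; changing archive_command alone only needs a reload)
cat >> "$PGDATA/postgresql.conf" <<'EOF'
archive_mode = on
archive_command = 'test ! -f /mnt/wal-archive/%f && cp %p /mnt/wal-archive/%f'
EOF
pg_ctl restart -D "$PGDATA"
```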
High availability is different from disaster recovery.
If I'm not mistaken, this was in 2016 (that's 10 years next year, time flies when you're having fun) -- which is practically an eternity in IT. I'm no DBA, but I'm fairly sure many changes have been made to Postgres since then, including logical replication (which can be selective), parallel apply of large transactions in v16, and so on.
I'm not saying this means their points are invalid (I don't know Postgres well enough for that), but any point made almost 10 years ago against one of the most popular and most actively developed options in its field should probably be taken with a pinch of salt.
Heh, I remember the countless articles after that debacle back then pointing out all the reasons why their migration was entirely pointless and could've been summed up as "devs not knowing the tools they're working with" before starting multi-million projects to fuel their CV-driven development.
So even if you aren't willing to do so, their rationale for the migration was fully debunked even back then.
And, of course, your disaster recovery plan is incomplete until you've tested it (at scale). You don't want to be looking up Postgres documentation when you need to restore from a cold backup; you want to be following the checklist in your recovery plan that you've already verified.
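A rough sketch of what such a drill can look like, assuming a custom-format dump and Docker on the box running the test (image tag, port, dump file and the sanity-check table are all placeholders): restore into a throwaway instance, run a query you know the answer to, throw it away.

```sh
#!/usr/bin/env sh
set -eu

# Throwaway Postgres instance to restore into
docker run -d --name restore-drill -e POSTGRES_PASSWORD=drill -p 5433:5432 postgres:16
sleep 10   # crude; a real script would poll pg_isready in a loop

export PGPASSWORD=drill

# Restore the latest dump, then run a sanity check against a known table
pg_restore --create --jobs=4 -h localhost -p 5433 -U postgres -d postgres latest.dump
psql -h localhost -p 5433 -U postgres -d appdb -c 'SELECT count(*) FROM orders;'

docker rm -f restore-drill
```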
Assuming you mean that range to start at 100GB, I've worked with databases that size multiple times but as a freelancer it's definitely not been "most" businesses in that range.
I don't remember whether they have a similar doc for setting up HA.
I also recommend pg_repack[2] to squash tables on a live system and reclaim disk space. It has saved me so much space.
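For anyone who hasn't tried it, usage is pretty minimal once the extension is installed (database and table names here are made up); it rebuilds the table online, taking only brief locks:

```sh
# The extension has to exist in the target database first
psql -d appdb -c 'CREATE EXTENSION IF NOT EXISTS pg_repack;'

# Rebuild one bloated table online and reclaim its dead space
pg_repack --dbname=appdb --table=events

# Or repack every table in the database
pg_repack --dbname=appdb
```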
Do you export the data with this and then import it into the other DB with it? Or do you work with existing Postgres backups?
I'm also curious how Distributed Replicated Block Device (DRBD) would perform. It would add obvious latency, but perhaps it would be an easier and more efficient solution for a "hot spare" setup than using Postgres native functionality. To my understanding, DRBD can be configured to protect you from hardware I/O errors by "detaching" from an erroring disk.
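If I remember the config format correctly, that detach behaviour is a one-line disk option; a hedged sketch of a two-node resource mirroring the Postgres volume (hostnames, devices and addresses are all placeholders):

```sh
# /etc/drbd.d/pgdata.res -- hypothetical resource definition
cat > /etc/drbd.d/pgdata.res <<'EOF'
resource pgdata {
  net  { protocol C; }          # synchronous writes to the peer, hence the added latency
  disk { on-io-error detach; }  # drop a failing local disk, keep serving from the peer
  on node1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.1:7789;
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7789;
    meta-disk internal;
  }
}
EOF
```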
I also don't know if it's a valid point, but I've heard people say that you don't want a fancy CoW filesystem for databases, since much of the functionality offered is stuff that databases already solve themselves, so you might be sacrificing performance for safety against things that "should not happen"(tm) anyway, depending on how it's set up, I guess.
ZFS on NixOS is usually quite easy as well, even on / : https://wiki.nixos.org/wiki/ZFS
However, we are developing pgstream in order to bring in data and sync it from other Postgres providers. pgstream can also do anonymisation and, in the future, subsetting. Basically this means that no matter which Postgres service you are using (RDS, CloudSQL, etc.), you can still use Xata for staging and dev branches.
If things go truly south, just hope you have a read replica you can use as your new master. Most SLAs are not written with 72h+ of downtime in mind. Have you tried the nuclear recovery plan, from scratch? Does it work?
"Restore from scratch" can mean a lot of different things, if you have a read replica that you can promote then in relative terms to 72h+ downtime, this should be fairly quick, no?
If you have block-level backups or snapshots, with ZFS for example as someone mentioned, it should also be relatively quick -- although I assume this would make any hypothetical read replica split-brain.
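For the ZFS variant, a snapshot of a single dataset is atomic, so as long as the data directory and WAL live on that dataset it's crash-consistent on its own; a rough sketch (pool/dataset names are made up):

```sh
# Atomic snapshot of the dataset holding $PGDATA
zfs snapshot tank/pgdata@nightly

# Ship it to another box for DR
zfs send tank/pgdata@nightly | ssh standby zfs recv backup/pgdata

# To restore in place: stop Postgres, roll back, start it again;
# it then runs normal crash recovery from the WAL in the snapshot.
# zfs rollback tank/pgdata@nightly
```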