1. Do you really need a queue? (Alternative: periodic polling of a DB)
2. What's your event volume and can it fit on one node for the foreseeable future, or even serverless compute (if not too expensive)? (Alternative: lightweight single-process web service, or several instances, on one node.)
3. If it can't fit on one node, do you really need a distributed queue? (Alternative: good ol' load balancing and REST APIs, maybe with async and retry semantics)
4. If you really do need a distributed queue, then you may as well use a distributed queue, such as Kafka. Even if you take on the complexity of managing a Kafka cluster, the programming and performance semantics are simpler to reason about than trying to shoehorn a distributed queue onto a SQL DB.
Unless employees expect that their best rewards come from making their current project as simple and effective as possible, it is highly unlikely that the project will be as simple as it could be.
Schema design and query design will make or break your app's ability to scale without skyrocketing the bill; it's as simple as that.
This one is a classic for MSSQL; most of it is applicable to Postgres.
I’m unsure what is unattractive about this but I guess anything can be a reason to spend a year playing with LLMs these days.
I’ve had the same problem with compliance work (lightly regulated market) and suddenly the scaling complaints go away when the renewals stop happening.
Which is wrong a lot of the time! You need to build what is needed. If only 10 people use your project, the design will be entirely different than if 10 million people use it
There is nothing wrong with building stuff, or career development. There is also nothing wrong with experimentation. You certainly would not want to incentivize the opposite behavior of never building anything unless it had 10 guarantors of revenue and technical soundness.
If you need people to focus, then you need them to be incentivized to focus. Do they see growth potential? Are they compensated such that other employers are undesirable? Do they see the risk of failure?
So sure, "technically" it's not a queue, but in reality it's used as a queue by thousands of companies around the world for huge production workloads which no MQ system can support.
So the competing solutions are: PostgreSQL or Kafka+PostgreSQL
Kafka does provide some extras there: handling load spikes, more clients than PG can handle natively, and resilience to some DB downtime. But is it worth the complexity? In most cases, no.
You really can't. Getting per-message acks, dynamically scaling competing consumers without having to repartition while retaining ordering, etc., requires a ton of hacks like client-side tracking / building your own storage on top of offset metadata, and you still won't have all of the features actual message queues provide.
To make it worse, there is very little public work/discussion on this, so you'll be on your own to figure out all of the quirks. The only notable example is https://github.com/confluentinc/parallel-consumer which is effectively abandoned.
Note: I haven’t had a chance to test it out in anger personally yet.
here's the equivalent in parallel-consumer: https://github.com/confluentinc/parallel-consumer?tab=readme...
This IMO is better behaviour than RabbitMQ since you can always re-read messages once they have been processed, whereas generally with MQ systems the message is then marked for deletion and asynchronously deleted.
Exactly-once delivery is one of the hardest distributed systems problems. If you’ve “trivially” solved it, please show us your solution.
I can imagine it: a $1 billion transaction accidentally gets processed by ten thousand client nodes due to a client-app synchronization bug, and the company rethinks its dumb data-dumper server strategy... news at 11.
You can make a bucket immutable, and entries have timestamps. I don't think any cloud provider makes claims about the accuracy or monotonicity of these timestamps, so you would merely get an ordering, not necessarily the ordering in which things truly occurred. But I have use cases where that is fine.
I believe with a clever naming scheme and cooperating clients it could be made to work.
But the application was fairly low volume in data and usage, so eventual consistency and capacity were not an issue. And yes, timestamp monotonicity is not guaranteed when multiple clients upload at the same time, so a unique id was given to each client at startup and used to guarantee unique entry names. Metadata and prefixes were used to indicate the state of an object during processing.
Not ideal, but it was cheaper than a DB or a dedicated MQ. The application did not last, but I would try the approach again if it suited the situation.
I was thinking that writes could be indexed/prefixed into timestamp buckets according to the clients local time. This can't be trusted, of course. But the application consumers could detect and reject any writes whose upload timestamp exceeds a fixed delta from the timestamp bucket it was uploaded to. That allows for arbitrary seeking to any point on the log.
That was all deliberately pushed onto consumers to manage to achieve scale.
Acking in an MQ is very different.
I believe RabbitMQ is much more balanced and is closer to what people expect from a high level queueing/pubsub system.
They are still "listeners"; it's just that events aren't pushed to the listener. Instead the API is designed for sheer throughput.
In my experience it's not the reads, but the writes that are hard to scale up. Reading is cheap and can sometimes be done off a replica. Writing to a PostgreSQL at a high sustained rate requires careful tuning and design. A stream of UPDATEs can be very painful, INSERTs aren't cheap, and even batched COPY blocks can be tricky.
The scars from that kind of outage will never truly heal.
0: https://www.enterprisedb.com/blog/impact-full-page-writes
Then, if you ever need to switch to something more performant, it will be relatively easy.
It's a queue... how bad can you screw this up? My guess is, in most corporate environments, very very badly. Somehow something as complicated as consuming a queue (which isn't very complicated at all) will be done in such a way that it will require many months to change which queue is used in the future.
I'm a java dev and maybe my projects are about big integrations, but I've always needed queue like constructs and polling from a db was almost always a headache, especially with multiple consumers and publishers.
Sure it can be done, and in many projects we do have cron-jobs on different pods -- not a global k8s cron-job, but legacy cron jobs and it works fine.
Kafka does not YET support a real queue (but I'm sure there's a high-profile KIP to add true queue-like behavior, per consumer group, with individual commits), and does not support server-side filtering.
But consumer groups and partitions have been such a blessing for me, it's very hard to overstate how useful they are with managing stateful apps.
But then a distributed queue is most likely not needed until you hit really humongous scale.
10 messages/s is only ~864k/day. But in my testing (with Postgres 16) this doesn't scale that well when you need tens to hundreds of millions per day. Redis is much better than Postgres for that (for a simple queue), and beyond that, Kafka is what I would choose if you're in the low few hundred millions.
If someone is talking about per day numbers or per month numbers they're likely doing it to have the numbers sound more impressive and to make it harder to see how few X per second they actually handled. 11 million events per day sounds a whole lot more impressive than 128 events per second, but they're the same thing and only the latter usually matters in these types of discussions.
Periodic polling is awkward on both sides: you add arbitrary latency _and_ increase database load proportional to the number of interested clients.
Events, and ideally coalesced events, serve the same purpose as interrupts in a uniprocess (versus distributed) system, even if you don't want a proper queue. This at least lets you know _when_ to poll and lets you set and adjust policy on when / how much your software should give a shit at any given time.
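For the "know when to poll" part, Postgres's LISTEN/NOTIFY can serve as that interrupt. A minimal sketch, assuming a hypothetical events table (all names made up; the NOTIFY is just a wake-up, the rows remain the source of truth, and note the caveat elsewhere in this thread that LISTEN/NOTIFY has its own scaling limits):

```sql
-- Hypothetical wake-up channel: consumers poll only when notified.
CREATE TABLE events (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL
);

CREATE FUNCTION notify_new_event() RETURNS trigger AS $$
BEGIN
    -- Listeners get a wake-up per inserted row; bursts coalesce into
    -- however many polls the consumer chooses to run.
    PERFORM pg_notify('events_ready', NEW.id::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_ready_trg
AFTER INSERT ON events
FOR EACH ROW EXECUTE FUNCTION notify_new_event();

-- Each consumer session: wait on the channel, then poll the table.
LISTEN events_ready;
```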
The limiting factor for most workloads will probably be the number of connections, and the read/write mix. When you get into hundreds or thousands of pollers and writing many things to the queue per second Postgres is going to lose its luster for sure.
But in my experience with small/medium companies, a lot of workloads fit very very comfortably into what Postgres can handle easily.
The publisher has a set of tables (topics and partitions) of events, ordered and with each event having an assigned event sequence number.
Publisher stores no state for consumers in any way.
Instead, each consumer keeps a cursor (a variable holding an event sequence number) indicating how far it has read for each event log table it is reading.
Consumer can then advance (or rewind) its own cursor in whatever way it wishes. The publisher is oblivious to any consumer side state.
This is the fundamental piece of how event log publishing works (as opposed to queues, which are something else entirely; the article talks about both use cases).
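A sketch of what that looks like in SQL, with made-up names; the key point is that the cursor lives with the consumer, not the publisher:

```sql
-- Publisher side: an append-only log per topic.
CREATE TABLE event_log (
    topic   text   NOT NULL,
    seq     bigint NOT NULL,   -- assigned event sequence number
    payload jsonb  NOT NULL,
    PRIMARY KEY (topic, seq)
);

-- Consumer side: each consumer tracks how far it has read.
CREATE TABLE consumer_cursor (
    consumer text   NOT NULL,
    topic    text   NOT NULL,
    last_seq bigint NOT NULL DEFAULT 0,
    PRIMARY KEY (consumer, topic)
);

-- Read forward from the cursor, then advance (or rewind) it yourself;
-- the publisher never sees this state.
SELECT seq, payload
FROM event_log
WHERE topic = 'orders' AND seq > 41   -- 41 = this consumer's last_seq
ORDER BY seq
LIMIT 100;
```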
Then you just query from the event_receiver_svcX side, for events with published > datetime and event_receiver_svcX = FALSE. Once read, set it to TRUE (a query sketch follows the table below).
To mitigate too many active connections, have a polling/backoff strategy and place a proxy in front of the actual database to proactively throttle where needed.
But the event table:
| event_id | event_msg_src | event_msg | event_msg_published | event_receiver_svc1 | event_receiver_svc2 | event_receiver_svc3 |
|----------|---------------|---------------------|---------------------|---------------------|---------------------|---------------------|
| evt01 | svc1 | json_message_format | datetime | TRUE | TRUE | FALSE |
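A rough sketch of the poll-and-ack against that table, collapsing the read and the flag update into one statement (table name and cutoff are illustrative):

```sql
-- svc3 claims its unread events and marks them read in one round trip.
UPDATE event_table
SET event_receiver_svc3 = TRUE
WHERE event_receiver_svc3 = FALSE
  AND event_msg_published > now() - interval '1 day'  -- illustrative window
RETURNING event_id, event_msg;
```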
MQTT -> Redpanda (for message logs and replay, etc) -> Postgres/Timescaledb (for data) + S3 (for archive)
(and possibly Flink/RisingWave/Arroyo somewhere in order to do some alerting/incrementally updated materialized views/ etc)
This seems "simple enough" (though I don't have any experience with Redpanda) but is indeed one more moving part compared to MQTT -> Postgres (as a queue) -> Postgres/Timescaledb + S3.
Questions:
1. my "fear" would be that if I use the same Postgres for the queue and for my business database, the "message ingestion" part could block the "business" part sometimes (locks, etc)? Also perhaps when I want to update the schema of my database and not "stop" the inflow of messages, not sure if this would be easy?
2. also that since it would write messages in the queue and then delete them, there would be a lot of GC/Vacuuming to do, compared to my business database which is mostly append-only?
3. and if I split the "Postgres queue" from "Postgres database" as two different processes, of course I have "one less tech to learn", but I still have to get used to pgmq, integrate it, etc, is that really much easier than adding Redpanda?
4. I guess most Postgres queues are also "simple" and don't provide "fanout" for multiple things (eg I want to take one of my IoT message, clean it up, store it in my timescaledb, and also archive it to S3, and also run an alert detector on it, etc)
What would be the recommendation?
There are globally shared resources, but for the most part, locks are held on specific rows or tables. Unrelated transactions generally won't block on each other.
Also running a Very High Availability cluster is non-trivial. It can take a minute to fail over to a replica, and a busy database can take a while to replay the WAL after a reboot before it's functional again. Most people are OK with a couple minutes of downtime for the occasional reboot though.
I think this really depends on your scale. Are you doing <100 messages/second? Definitely stick with postgres. Are you doing >100k messages/second? Think about Kafka/redpanda. If you were comfortable with postgres (or you will be since you are building the rest of your project with it), then you want to stick with postgres longer, but if you are barely using it and would struggle to diagnose an issue, then you won't benefit from consolidating.
Postgres will also be more flexible. Kafka can only do partitions and consumer groups, so if your workload doesn't look like that (e.g. out of order processing), you might be fighting Kafka.
MQTT -> Postgres (+ S3 for archive)
> 1. my "fear" would be that if I use the same Postgres for the queue and for my business database...
This is a feature, not a bug. In this way you can pair the handling of the message with the business data changes that result from it, in the same transaction. This isn't quite "exactly-once" handling, but it's really really close!
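Concretely, the usual Postgres idiom for this is FOR UPDATE SKIP LOCKED plus doing the ack and the business write in one transaction. A hedged sketch with made-up table names:

```sql
BEGIN;

-- Claim one unhandled message; SKIP LOCKED lets concurrent workers
-- grab different rows instead of blocking on each other.
SELECT id, payload
FROM messages
WHERE handled_at IS NULL
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- ...apply the business change the message implies...
UPDATE accounts SET balance = balance + 10 WHERE id = 42;

-- Ack in the same transaction: both commit or neither does.
UPDATE messages SET handled_at = now() WHERE id = 1;  -- id from SELECT above

COMMIT;
```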
> 2. also that since it would write messages in the queue and then delete them, there would be a lot of GC/Vacuuming
Generally it's best practice in this case to never delete messages from a SQL "queue", but toggle them in-place to consumed and periodically archive to a long-term storage table. This provides in-context historical data which can be super helpful when you need to write a script to undo or mitigate bad code which resulted in data corruption.
Alternatively when you need to roll back to a previous state, often this gives you a "poor woman's undo", by restoring a time-stamped backup, copying over messages which arrived since the restoration point, then letting the engine run forwards processing those messages. (This is a simplification of course, not always directly possible, but data recovery is often a matter of mitigations and least-bad choices.)
Basically, saving all your messages provides both efficiency and data recovery optionality.
> 3...
Legit concern, particularly if you're trying to design your service abstraction to match an eventual evolution of data platform.
> 4. don't provide "fanout" for multiple things
What they do provide is running multiple handling of a queue, wherein you might have n handlers (each with its own "handled_at" timestamp column in the DB), and different handles run at different priorities. This doesn't allow for workflows (ie a cleanup step) but does allow different processes to run on the same queue with different privileges or priorities. So the slow process (archive?) could run opportunistically or in batches, where time-sensitive issues (alerts, outlier detection, etc) can always run instantly. Or archiving can be done by a process which lacks access to any user data to algorithmically enforce PCI boundaries. Etc.
Ignoring the potential uses for this data, what you suggested has the exact same effect on Postgres at a tuple level. An UPDATE is essentially the same as a DELETE + INSERT, due to its MVCC implementation. The only way around this is with a HOT update, which requires (among other things) that no indexed columns were updated. Since presumably in this schema you’d have a column like is_complete or is_deleted, and a partial index on it, as soon as you toggle it, it can’t do a HOT update, so the concerns about vacuum still apply.
That’s a particularly nasty trap. Devs will start using this everywhere and it makes it very hard to move this beyond Postgres when you need to.
I’d keep a small transactional outbox for when you really need it and encourage devs to use it only when absolutely necessary.
I’m currently cleaning up an application that has reached the limit of vertical scaling with Postgres. A significant part of that is because it uses Postgres for every background work queue. Every insert into the queue is in a transaction—do you really want to rollback your change because a notification job couldn’t be enqueued? Probably not. But the ability is there and is so easy to do that it gets overused.
Now I get to go back through hundreds of cases and try to determine whether the transactional insert was intentional or just someone not thinking.
Until your PostgreSQL instance goes down (even for reasons unrelated to pgsql), and then you have no fallback or queue for elasticity.
Is this some scripting to automate your home, or are you trying to build some multi-tenant thing that you can sell?
If it's just scripting to automate your home, then you could probably get away with a single server and on-disk/in-memory queuing, maybe even sqlite, etc. Or you could use it as an opportunity to learn those technologies, but you don't really need them in your pipeline.
It's amazing how much performance you can get as long as the problem can fit onto a single node's RAM/SSD.
n) Do you really need S3? Is it cheaper than NFS storage on a compute node with a large disk?
There are many cases where S3 is absolutely cheaper though.
Your application thinks it's a normal disk but it isn't, so you get no timeouts, no specific errors for network issues, and extremely expensive calls camouflaged as quick FS ops (was any file modified in this folder? I'll just loop over them using my standard library's nice FS utilities). And you don't get atomic ops outside of mv; invalidation and caching are complicated; and your developers probably don't know the semantics of FS operations, which are much more complex and less well documented than e.g. Redis blob storage.
And then when you finally rip out NFS, you have thousands of lines of app and test code that assumes your blobs are on a disk in subtle ways.
I would think something like NFS is best suited for actual files, rather than blobs you're serializing through a file-system API?
You can run into issues with scheduled queues (e.g. run this job in 5 minutes) since the tables will be bigger, you need an index, and you will create the garbage in the index at the point you are querying (jobs to run now). This is a spectacularly bad pattern for postgres at high volume.
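For readers who haven't hit this: the pattern being warned about looks roughly like the following (hypothetical names). Every completed job updates the indexed column, so dead index entries accumulate exactly in the range the hot query scans:

```sql
CREATE TABLE jobs (
    id     bigserial PRIMARY KEY,
    run_at timestamptz NOT NULL,
    done   boolean NOT NULL DEFAULT false
);

-- Partial index so "what's due now?" is cheap to find...
CREATE INDEX jobs_due_idx ON jobs (run_at) WHERE NOT done;

-- ...but the hot query scans precisely where completed jobs just left
-- dead tuples behind, so it degrades until vacuum catches up.
SELECT id FROM jobs
WHERE NOT done AND run_at <= now()
ORDER BY run_at
LIMIT 100;
```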
Doesn't PostgreSQL have transactional schema updates as a key feature? AIUI, you shouldn't be having any data loss as a result of such changes. It's also common to use views in order to simplify the management of such updates.
That sounds distributed to me, even if it wires different tech together to make it happen. Is there something about load balancing REST requests to different DB nodes that is less complicated than Kafka?
To be clear I wasn't talking about DB nodes, I was talking about skipping an explicit queue altogether.
But let's say you were asking about load balancing REST requests to different backend servers:
Yes, in the sense that "load balanced REST microservice with retry logic" is such a common pattern that it is better understood by SWEs and SREs everywhere.
No, in the sense that if you really did just need a distributed queue then your life would be simpler reusing a battle-tested implementation instead of reinventing that wheel.
Kafka has its own foibles and isn't a trivial set-it-and-forget-it thing to run at scale.
The Pareto principle is not some guarantee applicable to everything and anything saying that any X will handle 80% of some other thing's use cases with 20% the effort.
One can see how irrelevant its invocation is if we reverse it: does Kafka also handle 80% of what Postgres does with 20% of the effort? If not, what makes Postgres especially the "Pareto 80%" one in this comparison? Did Vilfredo Pareto have Postgres specifically in mind when forming the principle?
The Pareto principle concerns situations where power-law distributions emerge, not arbitrary server software comparisons.
Just say Postgres covers a lot of use cases that people mindlessly reach for shiny new software for when they don't really need it, and is more battle-tested, mature, and widely supported.
The Pareto principle is a red herring.
>The Pareto principle is not some guarantee applicable to everything and anything
Yes, obviously. The author doesn't say otherwise. There are obviously many ways of distributing things.
>One can see how irrelevant its invocation is if we reverse: does Kafka also handle 80% of what Postgres does with 20% the effort?
No
>If not, what makes Postgres especially the "Pareto 80%" one in this comparison?
Because it's simpler.
What implies everything can handle 80% of use cases with 20% of effort? It's like saying:
If normal distributions are real, and human height is normally distributed, then why isn't personal wealth? They are just different distributions.
Let me explain then...
> Yes, obviously. The author doesn't say otherwise.
Someone doesn't need to spell something out explicitly to imply it, or to fall into the kind of mistake I described.
While the author might not say otherwise, they do invoke the Pareto principle out of context, as if it's some readily applicable theorem.
>> If not, what makes Postgres especially the "Pareto 80%" one in this comparison?

> Because it's simpler.
Something being simpler than another thing doesn't make it a Pareto "80%" thing.
Yeah, I know you don't say this is always the case explicitly. But, like with the OP, your answer uses this as if it's an argument in favor of something being the "Pareto 80%" thing.
It just makes it simpler. The initial Pareto formulation was about wealth accumulation even, which has nothing to do with simplicity or features, or even with comparing different classes of things (both sides referred to the same thing, people: specifically, 20% of the population owning 80% of the land).
For very simple software, most users use all the features. For very specialized software, there's very few users, and they use all the features.
> The claim is that it handles 80%+ of their use cases with 20% of the development effort. (Pareto Principle)
These are different units entirely! Development effort? How is this the Pareto Principle at all?
(To the GP's point, would "ls" cover 80% of the use cases of "cut" with 20% of the effort? Or would MS Word cover 80% of the use cases of postgresql with 20% of the effort? Because the scientific Pareto Principle tells us so?)
Hey, it's really not important, just an idea that with Postgres you can cover a lot of use cases with a lot less effort than configuring/maintaining a Kafka cluster on the side, and that's plausible. It's just that some "nerds" who care about being "technically correct" object to using the term "pareto principle" to sound scientific here, that bit is just nonsense.
First, it would be the inverse, not the reverse.
Second, no, it doesn't work that way; that's the point of the Pareto principle in the first place: what is 80% is always 80% and what is 20% is always 20%.
I know, since that's the whole point I was making. That the OP picked an arbitrary side to give the 80%, and that one could just as well pick the other one, and that you need actual arguments (and some kind of actual measurable distribution) to support one or the other being the 80% (that is, merely invoking the Pareto principle is not an argument).
I've been a happy Postgres user for several decades. Postgres can do a lot! But like anything, don't rely on maxims to do your engineering for you.
If you rent a cloud DB then it can scale elastically which can make this cheaper than Postgres, believe it or not. Cloud databases are sold at the price the market will bear not the cost of inputs+margin, so you can end up paying for Postgres as much as you would for an Oracle DB whilst getting far fewer features and less scalability.
Source: recently joined the DB team at Oracle, was surprised to learn how much it can do.
Also, LISTEN/NOTIFY do not scale, and they introduce locks in areas you aren't expecting - https://news.ycombinator.com/item?id=44490510
Of course the other 99% is the remaining 1%.
It often doesn't.
[1] https://rubyonrails.org/2024/11/7/rails-8-no-paas-required
Postgres isn’t meant to be a guaranteed permanent replacement.
It’s a common starting point for a simpler stack which can retain a greater deal of flexibility out of the box and increased velocity.
Starting with Postgres lets the bottlenecks reveal themselves, and then optimize from there.
Maybe a tweak to Postgres or resources, or consider a jump to Kafka.
But anytime you treat a database, or a queue, like a black box dumpster, problems will ensue.
And the thing is, a server from 10 years ago running postgres (with a backup) is enough for most applications to handle thousands of simultaneous users. Without even going into the kinds of optimization you are talking about. Adding ops complexity for the sake of scale on the exploratory phase of a product is a really bad idea when there's an alternative out there that can carry you until you have fit some market. (And for some markets, that's enough forever.)
When you’re doing hundreds or thousands of transactions to begin with it doesn’t really impact as much out of the gate.
Of course there will be someone who will pull out something that won’t work but such examples can likely be found for anything.
We don’t need to fear simplification, it is easy to complicate later when the actual complexities reveal themselves.
A downside is the lack of tooling client side. For many using Kafka is worth it simply for the tooling in libraries consumer side.
If you just want to write an event handler function there is a lot of boilerplate to manage around it. (Persisting read cursors etc)
We introduced a company standard for one service pulling events from another service that fit well together with events stored in SQL.
https://github.com/vippsas/feedapi-spec
Nowhere close to Kafka's maturity in client side tooling but it is an approach for how a library stack could be built on top making this convenient and have the same library toolset support many storage engines. (On the server/storage side, Postgres is of course as mature as Kafka...)
Not your parent poster, but Kafka is often treated like a message broker and it ain't that. Specifically, it has no concept of NACK-ing messages; a message is either processed or not processed. There's no way for the client to say "skip this message and hand it to another worker" or "I have this weird message but I don't know how to process it, can you take it back?".
What people very commonly do is to instead move the unprocessed message to a dead-letter-queue, which at least clears the upstream queue but means you have to sift through the dead-letter-queue and figure out how to rescue messages.
Also people often think "I can read 100 messages in a batch and handle them individually in the client" while not considering that if some of the messages fail to send (or crash the client, losing the entire batch), Kafka isn't monitoring to say "hey you haven't verified that message 12 and 94 got processed correctly, do you want to keep working on them or should I assign them to someone else?"
Basically, in Kafka, the offset pointer should only be incremented after the client is 100% sure it is done with the message and the output has been written to durable storage if you care about the outcome. Otherwise you risk "skipping" messages because the client crashed or otherwise burped when trying to process the message.
Also Kafka topic partitions are semi-parallel streams that are not necessarily time ordered relative to each other... It's just another pinch point.
Consider exploring NATS Jetstream and its MQTT 3.1.1 mode and see if it suits your MQTT needs? Also I love Bento for declarative robust streaming ETL.
Naive approach with sequence (or serial type which uses sequence automatically) does not work. Transaction "one" gets number "123", transaction "two" gets number "124". Transaction "two" commits, now table contains "122", "124" rows and readers can start to process it. Then transaction "one" commits with its "123" number, but readers already past "124". And transaction "one" might never commit for various reasons (e.g. client just got power cut), so just waiting for "123" forever does not cut it.
Notifications can help with this approach, but then you can't restart old readers (and you don't need monotonic numbers at all).
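One blunt way to make the naive approach safe is to serialize the writers so numbers are assigned and committed in the same order, e.g. with a transaction-scoped advisory lock. A sketch with made-up names; this deliberately trades away write concurrency:

```sql
-- Setup (illustrative): CREATE SEQUENCE event_seq; plus an event_log table.
BEGIN;

-- Only one writer at a time may take a number, so sequence numbers
-- become visible to readers in assignment order, with no gaps ahead.
SELECT pg_advisory_xact_lock(42);  -- 42 = arbitrary app-chosen lock key

INSERT INTO event_log (seq, payload)
VALUES (nextval('event_seq'), '{"kind": "example"}');

COMMIT;  -- the advisory lock is released automatically here
```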
https://www.oreilly.com/library/view/designing-data-intensiv...
You can generate distributed monotonic number sequences with a Lamport Clock.
https://en.wikipedia.org/wiki/Lamport_timestamp
The wikipedia entry doesn't describe it as well as that book does.
It's not the end of the puzzle for distributed systems, but it gets you a long way there.
See also Vector clocks. https://en.wikipedia.org/wiki/Vector_clock
Edit: I've found these slides, which are a good primer for solving the issue, page 70 onwards "logical time":
https://ia904606.us.archive.org/32/items/distributed-systems...
If you would rather have readers waiting and parallel writers there is a more complex scheme here: https://blog.sequinstream.com/postgres-sequences-can-commit-...
Another way to speed it up is to grab unique numbers in batches instead of just getting them one at a time. No idea why you want your numbers to be in absolute sequence. That's hard in a distributed system. Probably best to relax that constraint and find some other way to track individual pieces of data. Or even better, find a way so you don't have to track individual rows in a distributed system.
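Batching is cheap in Postgres, for what it's worth; reserving a block of ids is a single round trip (sequence name illustrative):

```sql
-- Reserve 1000 unique ids at once. They are unique but not gapless,
-- and batches from different sessions may interleave.
SELECT nextval('event_seq') FROM generate_series(1, 1000);
```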
In a sense this is what Kafka IS architecturally: The component that assigns event sequence numbers.
Another approach which I used in the past was to assign sequence numbers after committing. Basically a separate process periodically scans the set of un-sequenced rows, applies any application defined ordering constraints, and writes in SNs to them. This can be surprisingly fast, like tens of thousands of rows per second. In my case, the ordering constraints were simple, basically that for a given key, increasing versions get increasing SNs. But I think you could have more complex constraints, although it might get tricky with batch boundaries
There needs to be serialization happening somewhere, either by writers or readers waiting for their turn.
What Kafka "is" in my view is simply the component that assigns sequential event numbers. So if you publish to Kafka, Kafka takes the same locks...
The way to increase throughput is to add more shards (partitions) to a topic.
Isn't it a bit of a white-whale thing to think that one solution can solve all one's subscriber problems? AFAIK even with Kafka this isn't completely watertight.
So there is a much more serious issue at stake here than event ordering/consistency.
As it happens, if you use event log tables in SQL "the Kafka way", you actually get a guarantee on event ordering too as a side effect, but that is not the primary goal.
More detailed description of problem:
https://github.com/vippsas/mssql-changefeed/blob/main/MOTIVA...
That said, I don't consider running Kafka to be a headache. I work at a mid-sized company, processing billions of Kafka events per day and it's never been a problem, even locally when I'm processing hundreds of events per day.
You set it up, forget about it, and it scales endlessly. You don't have to rewrite anything and it provides a nice separation layer between your system components.
When starting out, you can easily run Kafka, DB, API on the same machine.
Vendors frequently push that narrative so they can sell their own managed (or proprietary) solution on it. With a decent AI model (e.g ChatGPT Pro), it's easier than ever to figure out best practices and conventions.
That being said, my point is more about the organizational overhead. Deploying Kafka still means you need to learn how it works, why it's good, its configs, API, how to debug it, set up observability, yada yada.
Except that the burden is on all clients to coordinate to avoid processing an event more than once, since Kafka is a brainless invention just dumping data forever into a serial log.
The author is suggesting to avoid this solution and roll your own instead.
Do you mean different consumers within the same consumer group? There's no technology out there that will guarantee exactly-once delivery; it's simply impossible in a world where networks aren't magically 100% reliable. SQS, RedPanda, RabbitMQ, NATS... you name it, your client will always need idempotency.
If you don't need what kafka offers, don't use it. But don't pretend you're on to something with your custom 5k msg/s PG setup.
https://www.youtube.com/watch?v=7CdM1WcuoLc
Getting even less than that throughput on 3x c7i.24xlarge — a total of 288 vCPUs – is bafflingly wasteful.
Just because you can do something with Postgres doesn't mean you should.
> 1. One camp chases buzzwords.
> 2. The other camp chases common sense
In this case, is "Postgres" just being used as a buzzword?
[Disclosure: I work for Redpanda; we provide a Kafka-compatible service.]
When you have C++ code, the number of external folks who want to — and who can effectively, actively contribute to the code — drops considerably. Our "cousins in code," ScyllaDB last year announced they were moving to source-available because of the lack of OSS contributors:
> Moreover, we have been the single significant contributor of the source code. Our ecosystem tools have received a healthy amount of contributions, but not the core database. That makes sense. The ScyllaDB internal implementation is a C++, shard-per-core, future-promise code base that is extremely hard to understand and requires full-time devotion. Thus source-wise, in terms of the code, we operated as a full open-source-first project. However, in reality, we benefitted from this no more than as a source-available project.
Source: https://www.scylladb.com/2024/12/18/why-were-moving-to-a-sou...
People still want to get free utility from the source-available code. Less commonly they want to be able to see the code to understand it and potentially troubleshoot it. Yet asking for active contribution is, for almost all, a bridge too far.
(Notably, they're not arguing that open source reusers have been "unfair" to them and freeloaded on their effort, which was the key justification many others gave for relicensing their code under non-FLOSS terms.)
In case anyone here is looking for a fully-FLOSS contender that they may want to perhaps contribute to, there's the interesting project YugabyteDB https://github.com/yugabyte/yugabyte-db
(Personally, I have no issues with the AGPL and Stallman originally suggested this model to Qt IIRC, so I don't really mind the original split, but that is the modern intent of the strategy.)
Again, I'm happy to use AGPL software, I just disagree that the intent here is that different to any of the other projects that switched to the proprietary BSL.
As a maintainer of several free software projects, there are lots of issues with how projects are structured and user expectations, but I struggle to see how proprietary licenses help with that issue (I can see -- though don't entirely buy -- the argument that they help with certain business models, but that's a completely different topic). To be honest, I have no interest in actively seeking out proprietary software, but I'm certainly in the minority on that one.
On the other hand, if they don't expect people outside their company to know C++ well enough to contribute usefully, they probably shouldn't expect people outside their company to be able to compete with them either.
Really, though, the reason to go open-source is because it benefits your customers, not because you get contributions, although you might. (This logic is unconvincing if you fear they'll stop being your customers, of course.)
A colleague and I (mostly him, but on my advice) worked up a set of patches to accept and emit JSON and YAML in the CLI tool. Our use case at the time was setting things up with a config management system using the already built tool RedPanda provides without dealing with unstructured text.
We got a lot of good use out of RedPanda at that org. We’ve both moved on to a new employer, though, and the “no offering RedPanda as a service” spooked the company away from trying it without paying for the commercial package. Y’all assured a couple of us that our use case didn’t count as that, but upper management and legal opted to go with Kafka just in case.
"The use of fsync is essential for ensuring data consistency and durability in a replicated system. The post highlights the common misconception that replication alone can eliminate the need for fsync and demonstrates that the loss of unsynchronized data on a single node still can cause global data loss in a replicated non-Byzantine system."
However, for all that said, Redpanda is still blazingly fast.
https://www.redpanda.com/blog/why-fsync-is-needed-for-data-s...
To be fair, since without fsync you don't have any ordering guarantees for your writes, a crash has a good chance of corrupting your data, not just losing recent writes.
That's why in PostgreSQL it's feasible to disable https://www.postgresql.org/docs/18/runtime-config-wal.html#G... but not to disable https://www.postgresql.org/docs/18/runtime-config-wal.html#G....
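Concretely, the knob that is safe to relax is per-session (or even per-transaction), while fsync is a server-wide setting in postgresql.conf:

```sql
-- Safe-ish: a crash may lose the last few commits, but what's on disk
-- stays consistent. Can be set per session or per transaction.
SET synchronous_commit = off;

-- fsync = off, by contrast, is server-wide and risks corrupting data
-- on a crash, not merely losing recent writes. Leave it on.
```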
Can you lose one Postgres instance?
AKA "Medium Data"?
It also scales to very large clusters.
Kafka is a full-on streaming solution.
Postgres isn’t a buzzword. It can be a capable placeholder until it’s outgrown. One can arrive at Kafka with a more informed run history from Postgres.
It's very common to start adding more and more infra for use cases that, while they could technically be covered better by new stuff, can be served by already existing infrastructure, at least until you have proof that you need to grow it.
This is literally the point the author is making.
"Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize."
It's not so hard. You interpret it how it is written. Yes, they say one camp chases buzzwords and another chases common sense. Critique that if you want to. That's fine.
But what's not written in the OP is some sort of claim that Postgres performs better than Kafka. The opposite is written. The OP acknowledges that Kafka is fast. Right there in the title! What's written is OP's experiments and data that shows Postgres is slow but can be practical for people who don't need Kafka. Honestly I don't see anything bewildering about it. But if you think they're wrong about Postgres being slow but practical that's something nice to talk about. What's not nice is to post snarky comments insinuating that the OP is asking you to design unscalable solutions.
In this case, you do just need a single fuel truck. That's what it was built for. Avoiding using a design-for-purpose tool to achieve the same result actually is wasteful. You don't need 288 cores to achieve 243,000 messages/second. You can do that kind of throughput with a Kafka-compatible service on a laptop.
[Disclosure: I work for Redpanda]
Kafka et al definitely have their place, but I think most people would be much better off reaching for a simpler queue system (or for some things, just using Postgres) unless you really need the advanced features.
Postgres is the solution in question of the article because I simply assume the majority of companies will start with Postgres as their first piece of infra. And it is often the case. If not - MySQL, SQLite, whatever. Just optimize for the thing you know, and see if it can handle your use case (often you'll be surprised)
You should be able to install it within minutes.
None of this applies to Redpanda.
Yet to also be fair to the Kafka folks, Zookeeper is no longer default and hasn't been since April 2025 with the release of Apache Kafka 4.0:
"Kafka 4.0's completed transition to KRaft eliminates ZooKeeper (KIP-500), making clusters easier to operate at any scale."
Source: https://developer.confluent.io/newsletter/introducing-apache...
> This is literally the point the author is making.
Exactly! I just don't understand why HN invariably always tends to bubble up the most dismissive comments to the top that don't even engage with the actual subject matter of the article!
Obviously it's possible to build, for example, a machine with 2 cores, a 10Gbps network link, and a single HDD that would falsify my statement.
CPU is more tricky but I’m sure it can be shown somehow
It's a fair point that if you already have a pgsql setup, and only need a few messages here and there, then pg is fine. But yeah, the 96 vcpu setup is absurd.
Is anyone actually reading the full article, or just reacting to the first unimpressive numbers you can find and then jumping on the first dismissive comment you can find here?
Benchmarking Kafka isn't the point here. The author isn't claiming that Postgres outperforms Kafka. The argument is that Postgres can handle modest messaging workloads well enough for teams that don't want the operational complexity of running Kafka.
Yes, the throughput is astoundingly low for such a powerful CPU but that's precisely the point. Now you know how well or how bad Postgres performs on a beefy machine. You don't always need Kafka-level scale. The takeaway is that Postgres can be a practical choice if you already have it in place.
So rather than dismissing it over the first unimpressive number you find, maybe respond to that actual matter of TFA. Where's the line where Postgres stops being "good enough"? That'll be something nice to talk about.
Or they could have not mentioned Kafka at all and just demonstrated their pub/sub implementation with PG. They could have not tried to make it about the buzzword-chasing, resume-driven-engineering people vs. common-sense folks such as himself.
I'm glad the OP benchmarked on the 96 vCPU server. So now I know how well Postgres performs on a large CPU. Not very well. But if the OP had done their benchmark on a low CPU, I wouldn't have learned this.
I might well be talking out of my arse but if you're going to implement pub/sub in Postgres, it'd be worth designing around its strengths and going back to basics on event sourcing.
Never used Kafka myself, but we extensively use Redis queues with some scripts to ensure persistency, and we hit throughputs much higher than those in equivalent prod machines.
Same for Redis pubsubs, but those are just standard non-persistent pubsubs, so maybe that gives it an upper edge.
zeek -> kafka -> logstash -> elastic
There are two poles:
1. Folks constantly adopting the new tech, whatever the motivation, and 2. folks who learned a thing and shall never learn anything else, ever.
Of course nobody actually sits at either pole, but the closer you are to one, the less pragmatic you are likely to be.
I think it's still just 2 poles. However, I probably shouldn't have prescribed a motivation to the latter pole, as I purposely did not with the former.
Pole 2 is simply never adopt anything new ever, for whatever the motivation.
In my company most of our topics need to be consumed by more than one application/team, so this feature is a must have. Also, the ability to move the offset backwards or forwards programmatically has been a life saver many times.
Does Postgres support this functionality for their queues?
So if you want an individual offset, then yes, the consumer could just maintain their own… however, if you want a group’s offset, you have to do something else.
Is a queuing system baked into Postgres? Or are there client libraries that make it look like one?
And do these abstractions allow for arbitrarily moving the offset for each consumer independently?
If you're writing your own queuing system using pg for persistence obviously you can architect it however you want.
I don't know what kind of native support PG has for queue management, the assumption here is that a basic "kill the task as you see it" is usually good enough and the simplicity of writing and running a script far outweighs the development, infrastructure and devops costs of Kafka.
But obviously, whether you need stuff to happen in 15 seconds instead of 5 minutes, or 5 minutes instead of an hour is a business decision, along with understanding the growth pattern of the workload you happen to have.
Here is one: https://pgmq.github.io/pgmq/
Some others: https://github.com/dhamaniasad/awesome-postgres
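From memory of pgmq's README (verify against the current docs), basic usage looks roughly like:

```sql
SELECT pgmq.create('tasks');                             -- create a queue
SELECT * FROM pgmq.send('tasks', '{"hello": "world"}');  -- enqueue a message
SELECT * FROM pgmq.read('tasks', 30, 1);  -- read 1 msg, 30s visibility timeout
SELECT pgmq.archive('tasks', 1);          -- ack: move msg 1 to the archive table
```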
Most of my professional life I have considered Postgres folks to be pretty smart… while I by chance happened to go with MySQL, and it became the RDBMS I thought in by default.
Learning Postgres in depth recently has been okay, not much different than learning the tweaks for MSSQL, Oracle, or others. You just have to be willing to slow down a little for a bit and enjoy it instead of expecting to rush through everything.
But it looks like a queue, which is a fundamentally different data structure from an event log, and Kafka is an event log.
They are very different use cases: work distribution vs. pub/sub.
The article talks about both use cases, assuming the reader is very familiar with the distinction.
How is it common sense to try to re-implement Kafka in Postgres? You probably need something similar but simpler. Then implement that! But if you really need something like Kafka, then... use Kafka!
IMO the author is now making the same mistake as some Kafka evangelists that try to implement a database in Kafka.
This made me wonder about a tangential statistic that would, in all likelihood, be impossible to derive:
If we looked at all database systems running at any given time, what proportion does each technology represent (e.g., Postgres vs. MySQL vs. [your favorite DB])? You could try to measure this in a few ways: bytes written/read, total rows, dollars of revenue served, etc.
It would be very challenging to land on a widely agreeable definition. We'd quickly get into the territory of what counts as a "database" and whether to include file systems, blockchains, or even paper. Still, it makes me wonder. I feel like such a question would be immensely interesting to answer.
Because then we might have a better definition of "most of the time."
Server side. Client side. iOS, iPad, Mac apps. Uses in every field. Uses in aerospace.
Just think for a moment that literally every photo and video taken on every iPhone (and I would assume android as well) ends up stored (either directly or sizable amounts of metadata) in a SQLite db.
1) People constantly chasing the latest technology with no regard for whether it's appropriate for the situation.
2) People constantly trying to shoehorn their favourite technology into everything with no regard for whether it's appropriate for the situation.
The third camp:
3) People who look at a task, then apply a tool appropriate for the task.
Postgres has also been around for a long time, and a lot of people don't know all it can do beyond what we normally think of a database doing.
Appropriateness is a nice way to look at it as long as it’s clear whether or not it’s about personal preferences and interpretations and being righteous towards others with them.
Customers rarely care about the backend or what it’s developed in, except maybe for developer products. It’s a great way to waste time though.
You need some sort of server-side logic to manage that, and the consumer heartbeats, and generation tracking, to make sure that only the "correct" instances can actually commit the new offsets. Distributed systems are hard, and Kafka goes through a lot of trouble to ensure that you don't fail to process a message.
Of course the implementation based off that is going to miss a bit.
Unless you're a five man shop where everybody just agrees to use that one table, make sure to manage transactions right, cron job retention, YOLO clustering, etc. etc.
Performance is probably last on the list of reasons to choose Kafka over Postgres.
There are several implementations of queues that increase the chance of finding what one is after. https://github.com/dhamaniasad/awesome-postgres
I truly miss a good standard client-side library following the Kafka-in-SQL philosophy. I started one in my previous job and we used it internally, but it never got good enough that it would be widely used elsewhere, and now I work somewhere else...
(PS: Talking about the pub/sub Kafka-like usecase, not the work queue FOR UPDATE usecase)
I'm not saying they're useless, but if I see something like that lying around, it's more likely that someone put it there based on vibes rather than an actual engineering need. Postgres is good enough for OpenAI, chances are it's good enough for you.
Ok so instead of running Kafka, we're going to spend development cycles building our own?
It seems to me that the hardest part of going for a MQ/distributed log like Kafka is re-working existing code to now handle the lack of ACID stuff. Things that are trivial with Postgres, like exactly once delivery, are huge undertakings without ACID.
Personally, I don't have much experience with this, so maybe I'm just missing something?
ACID is an aspirational ideal - not something that 'just works' if you have a database that calls itself ACID. What ACID promises is essentially "single-threaded thinking will work in a multi-threaded environment."
Here's a list of ways it falls short:
1) Settings: Postgres is 'Read Committed' by default (which is not quite full Isolation). You could change this, but you might not like the resulting performance drop, (and it's probably not up to you unless you're the company DBA or something.)
2) ACID=Single-node-only. Maybe some of the big players (Google?) have worked around this (Spanner?), but for your use case, the scope of a (correct) ACID transaction is essentially what you can stuff into a single SQL string and hand to a single database. It won't span to the frontend, or any partners you have, REST calls, etc. It's definitely useful to be able to make your single node transition all-or-nothing from valid state to valid state, but you still have all your distributed thinking to do (Two generals, CAP, exactly-once-delivery, idempotency, etc.) without help from ACID.
3) You can easily break ACID at the programming language level. Let's say you intend to add 10 to a row. If you do a SELECT, add 10 to the result, and then do an update, your transaction won't do what you intended. If the value was 3 when you read it, all the database will see is you setting the value to 13. I don't know whether the db will throw an exception, or retry writing 13, but neither of those is 'just increment by 10'.
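To make (3) concrete, a minimal sketch (hypothetical counters table): the read-modify-write version loses increments under "Read Committed", while pushing the arithmetic into the statement stays atomic:

```sql
-- Broken under concurrency: two clients both read 3, both write 13,
-- and one increment is silently lost.
--   SELECT value FROM counters WHERE id = 1;        -- app computes 3 + 10
--   UPDATE counters SET value = 13 WHERE id = 1;

-- Correct: let the database do the increment atomically.
UPDATE counters SET value = value + 10 WHERE id = 1;
```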
The reason I use Kafka is because it actually helps with distributed systems. We can't beat CAP, but if we want to have AP, we can at least have some 'eventual consistency', that is, your services won't be in exact lockstep from valid-state to valid-state (as a group), but if you give them the same facts, then they can at least end up in the same state. And that's what Kafka's for: you append facts onto it, then each service (which may in fact have its own ACID DB!) can move from valid-state to valid-state (even if external observers can see that one service is ahead of another one).
"2. The other camp chases common sense
This camp is far more pragmatic. They strip away unnecessary complexity and steer clear of overengineered solutions. They reason from first principles before making technology choices. They resist marketing hype and approach vendor claims with healthy skepticism."
We should definitely apply Occam's razor as an industry far more often; simple tech stacks are easier to manage and especially to master (which you must do, once it's no longer a toy app). Introduce a new component into your system only if it provides functionality you cannot get with reasonable effort using what you already have.
Ease of use. In Ruby, if I want to use Kafka I can use karafka, or Redis streams via the redis library. Likewise, if Kafka is too complex to run, there are countless alternatives which work as well: even 0mq with client libraries.
Now with the Postgres version I have to write my own stuff, and I don't know where that's going to lead me.
Postgres is scalable, no one doubts that. But what people forget to mention is the ecosystem around certain tools.
There seems to be two planes of ease of use - the app layer (library) and the infra layer (hosting).
The app layer for Postgres is still in development, so if you currently want to run pub-sub (Kafka) on it, it will be extra work to develop that abstraction.
I hope somebody creates such a library. It's a one-time cost but then will make it easier for everybody.
https://github.com/dhamaniasad/awesome-postgres
There is at least a Python example here.
It is significantly more work for the client-side implementation of event log consumers, which the article also talks about. For instance, persisting the client-side cursors. And I have not seen widely used standard implementations of those. (I started one myself once but didn't finish.)
But I think an important point to those in camp 2 (the good guys in TFA's narrative) is to use tools for problems they were designed to solve. Postgres was not designed to be a pub-sub tool. Kafka was. Don't try to build your own pub-sub solution on top of Postgres, just use one of the products that was built for that job.
Another distressing trend I see is for every product to try to be everything to everyone. I do not need that. I just need your product to do its one thing very well, and then I will use a different product for a different thing I need.
There are many reasons for this: most software developers have more than a vague idea about its underlying concepts, most error messages are clear, the documentation is superb, there are many ways to tap into the vast knowledge of a huge and growing community...
Postgres really is a startup's best friend most of the time. I'm building a new product that's going to deal with a good bit of reporting, so I began to look at OLAP DBs, but hesitated to leave PG. This kind of seals it for me (and of course the reference to the classic "Just Use Postgres for Everything" post helps) that I should Just Use Postgres (R).
On top of being easy to host and already being familiar with it, the resources out there for something like PG are near endless. Plus the team working on it is doing constant good work to make it even more impressive.
I've been heads-down building a scheduling tool, and the number of times I've had to talk myself out of over-engineering is embarrassing. "Should I use Kafka for event streaming?" No. "Do I need microservices?" Probably not. "Can Postgres handle this?" Almost certainly yes.
The real skill is knowing when you've actually outgrown something vs. when you're just pattern-matching what Big Tech does. Most products never get to the scale where these distinctions matter—but they DO die from complexity-induced paralysis.
What's been your experience with that inflection point where you actually needed to graduate to more complex tooling? How did you know it was time?
For a medium to large organization with independent programs that need to talk to each other, Kafka provides an essential capability that would be much slower and higher risk with Postgres.
Standardizing the flow of information across an organization is difficult. Kafka is crucial for that. To achieve it in Postgres would require either a shared database, which is inherently risky, or a customized API for access, which introduces another layer of performance bottleneck plus build/maintenance cost and decreases development productivity. So you have a double whammy of performance degradation with an API. And for multiple consumers operating against the same events (for example: write to storage, perform action, send to data lake), with a database you need a magnitude more access: N×X, with N being the number of consumers and X the query load to consume them. With three consumers you're tripling your database queries, which adds up fast across topics. Now you need to start fixing indexes and creating views and other workload to keep performance optimal. And at some point you're just poorly recreating Kafka in a database.
The common denominator in every "which is better" debate is always use case. This article seems like it would primarily apply to small organizations or limited consumer needs. And yeah, at that point, why are you using events in the first place? Use a single API or database and be done with it. This is where the buzzword thing is relevant. If you're using Kafka for your single team, single database, small organization, it's overkill.
Side note: Someone mentioned Postgres as an audit log. Oh god. Done it. It was a nightmare. Ended up migrating to pub/sub with long-term storage in Mongo, which solved significant performance issues. An audit log is inherently write-once, read-many. There is no advantage to storing it in a relational database.
Performance isn't a big deal for me. I had assumed that Kafka would give me things like decoupling, retry, dead-lettering, logging, schema validation, schema versioning, exactly once processing.
I like Postgres, and obviously I can write a queue on top of it, but it seems like quite a lot of effort?
If you plan on retaining your topics indefinitely, schema evolution can become painful since you can't update existing records. Changing the number of partitions in a topic is also painful, and choosing the number initially is a difficult choice. You might want to build your own infrastructure for rewriting a topic and directing new writes to the new topic without duplication.
Kafka isn't really a replacement for a database or anything high-level like a ledger. It's really a replicated log, which is a low-level primitive that will take significant work to build into something else.
I need a durable queue but not indefinitely. Max a couple of hours.
What I want is Google PubSub but open source so I can self host.
For larger workloads, RabbitMQ can handle quite a lot.
MQTT -> Redpanda (for message logs and replay, etc) -> Postgres/Timescaledb (for data) + S3 (for archive)
(and possibly Flink/RisingWave/Arroyo somewhere in order to do some alerting/incrementally updated materialized views/ etc)
this seems "simple enough" (but I don't have any experience with Redpanda) but is indeed one more moving part compared to MQTT -> Postgres (as a queue) -> Postgres/Timescaledb + S3
Questions:
1. my "fear" would be that if I use the same Postgres for the queue and for my business database, the "message ingestion" part could block the "business" part sometimes (locks, etc)? Also perhaps when I want to update the schema of my database and not "stop" the inflow of messages, not sure if this would be easy?
2. also that since it would write messages in the queue and then delete them, there would be a lot of GC/Vacuuming to do, compared to my business database which is mostly append-only?
3. and if I split the "Postgres queue" from "Postgres database" as two different processes, of course I have "one less tech to learn", but I still have to get used to pgmq, integrate it, etc, is that really much easier than adding Redpanda?
4. I guess most Postgres queues are also "simple" and don't provide "fanout" to multiple things (e.g. I want to take one of my IoT messages, clean it up, store it in my timescaledb, and also archive it to S3, and also run an alert detector on it, etc) - see the fanout sketch after these questions
What would be the recommendation?
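On question 4: fanout on Postgres can be approximated with one queue per downstream consumer, i.e. the producer sends the same message to each queue. A minimal sketch, assuming the pgmq extension (queue names invented; check pgmq's docs for exact signatures):

  import json
  import psycopg2

  # One pgmq queue per downstream consumer - this is the "fanout".
  FANOUT_QUEUES = ["iot_clean", "iot_s3_archive", "iot_alerts"]

  conn = psycopg2.connect("dbname=app")  # placeholder DSN

  def setup():
      with conn.cursor() as cur:
          for q in FANOUT_QUEUES:
              cur.execute("SELECT pgmq.create(%s)", (q,))
      conn.commit()

  def publish(message):
      payload = json.dumps(message)
      with conn.cursor() as cur:
          for q in FANOUT_QUEUES:  # one send per consumer
              cur.execute("SELECT pgmq.send(%s, %s::jsonb)", (q, payload))
      conn.commit()

  def consume(queue):
      with conn.cursor() as cur:
          # Hide read messages for 30s while this worker processes them.
          cur.execute("SELECT msg_id, message FROM pgmq.read(%s, 30, 10)", (queue,))
          for msg_id, message in cur.fetchall():
              # ... clean up / archive / alert, depending on the queue ...
              cur.execute("SELECT pgmq.delete(%s, %s)", (queue, msg_id))
      conn.commit()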
note that Kafka has recently started investing in "queues" with KIP-932, but they're still a long way off from implementing all of those features.
It can also be used in production. You do not have to build a distributed Pulsar cluster immediately. I have multiple projects running on a standalone Pulsar cluster, because it's easy to set up and requires almost no maintenance. Doing it that way makes compliance requirements for isolation simpler, with fewer fights. Everyone understands host/VM isolation; few understand Pulsar tenant isolation.
If you want a distributed Apache Pulsar cluster, be prepared to work for that. We run a cluster on bare metal. We considered Kubernetes, but performance was lacking. We are not Kubernetes experts.
Event-sourcing is when you buy something and get a receipt, you go stick it in a shoe-box for tax time.
A queue is you get given receipts, and you look at them in the correct order before throwing each one away.
I think my system is sort of both. I want to put some events in a queue for a finite amount of time, process them as a single consolidated set, and then drop them all from the queue.
If you don't need a lot of perf but you place a premium on ergonomics and correctness, this sounds more like you want a workflow engine? https://github.com/meirwah/awesome-workflow-engines
I'm checking out the list.
The bigger question to ask is: will this storage engine be used to persist and retain data forever (like a database) or will it be used more for temporary transit of data from one spot to another.
It's mostly registering the Postgres database functions, which is a one-time thing.
There are also pre-made Postgres extensions that already run the queue.
These days I would consider starting with self-hosted Supabase, which has Postgres ready to tweak.
First, I frequently use Celery and Celery doesn't support using Postgres as a broker. It seems like it should, but I guess no one has stepped up to write that. So, when I use Celery, I end up also using Redis or RabbitMQ.
Second, if I need mqtt clients coming in from the internet at large, I don't feel comfortable exposing Postgres to that. Also, I'd rather use the mqtt ecosystem of libraries rather than having all of those devices talk Postgres directly.
Third, sometimes I want a size-constrained, memory-only database, or a database that automatically expires untouched records, and for either of those I usually use Redis. I imagine it would be worth making a reusable set of stored procedures to handle auto-expiring unused records, but I haven't implemented it, and I have no idea how to make Postgres memory-only with a constrained memory size.
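For what it's worth, the TTL half of that third point is sketchable in plain Postgres - the hard memory cap is the part Postgres really lacks. A rough sketch, with an invented kv_cache table whose last_accessed_at the app bumps on every touch (pg_cron or a cron job would replace the sleep loop in practice):

  import time
  import psycopg2

  conn = psycopg2.connect("dbname=app")  # placeholder DSN

  def expire_untouched(ttl_minutes=30):
      with conn.cursor() as cur:
          # Delete rows nobody has touched within the TTL window.
          cur.execute(
              "DELETE FROM kv_cache "
              "WHERE last_accessed_at < now() - %s * interval '1 minute'",
              (ttl_minutes,),
          )
      conn.commit()

  while True:  # crude scheduler, for illustration only
      expire_untouched()
      time.sleep(60)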
PGQueuer is a small Python library that turns Postgres into a durable job queue using the same primitives discussed here — `FOR UPDATE SKIP LOCKED` for safe concurrent dequeue and `LISTEN/NOTIFY` to wake workers without tight polling. It’s for background jobs (not a Kafka replacement), and it shines when your app already depends on Postgres.
Nice-to-haves without extra infra: per-entrypoint concurrency limits, retries/backoff, scheduling (cron-like), graceful shutdown, simple CLI install/migrations. If/when you truly outgrow it, you can move to Kafka with a clearer picture of your needs.
Repo: https://github.com/janbjorge/pgqueuer
Disclosure: I maintain PGQueuer.
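For anyone who hasn't seen those primitives in the raw, here's a bare sketch of the FOR UPDATE SKIP LOCKED + LISTEN/NOTIFY pattern - not PGQueuer's actual code, just the underlying idea, with an invented jobs table and new_job channel:

  import select
  import psycopg2

  conn = psycopg2.connect("dbname=app")  # placeholder DSN
  conn.autocommit = True  # needed so NOTIFY payloads are delivered promptly

  def claim_one():
      # SKIP LOCKED lets concurrent workers dequeue without blocking
      # on, or double-claiming, the same job.
      with conn.cursor() as cur:
          cur.execute(
              """
              UPDATE jobs SET status = 'running'
              WHERE id = (
                  SELECT id FROM jobs
                  WHERE status = 'queued'
                  ORDER BY id
                  LIMIT 1
                  FOR UPDATE SKIP LOCKED
              )
              RETURNING id, payload
              """
          )
          return cur.fetchone()  # None when the queue is empty

  with conn.cursor() as cur:
      cur.execute("LISTEN new_job")  # producers run NOTIFY new_job on insert

  while True:
      while (job := claim_one()) is not None:
          ...  # process the job here
      # Block until a NOTIFY arrives (or 5s pass), instead of tight polling.
      if select.select([conn], [], [], 5) != ([], [], []):
          conn.poll()
          conn.notifies.clear()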
Or just use NATS for queues and pubsub - dead simple, can embed in your Go app and does much more than Kafka
Plug: If you need granular, durable streams in a serverless context, check out s2.dev
Could you see using the s2.dev protocol on top of services using SQL in the way of the article, assigning event sequence numbers, as a good fit? Or is s2 fundamentally the component that assigns event numbers?
I feel like we tried to do something similar to you, but for SQL DBs, but am not sure:
I think Kafka makes it easy to create an event-driven architecture. This is particularly useful when you have many teams; they are properly isolated from each other.
And with many teams comes another problem: there's no guarantee that queries are going to be properly written, and then Postgres' performance may be hindered.
Given this, I think using Kafka in companies with many teams can be useful, even if the data they move is not insanely big.
Lots of us who built systems when SQL was the only option know that doesn't hold over time.
SSTable-backed systems have their applications, and I have never seen dedicated Kafka teams like we used to have with DBAs.
We have the tools to make decisions based on real tradeoffs.
I highly recommend people dig into the appropriate tools to select vs making pre-selected products fit an unknown problem domain.
Tools are tactics, not strategies, tactics should be changeable with the strategic needs.
It's a complex piece of software that solves a complex problem, but there are many trade-offs, so only use it when you need to.
You see this a lot in the tech world. "Why would you use Python, Python is slow" (objectively true, but does it matter for your high-value SaaS that gets 20 logins per day?)
I think a bigger issue is the DBMS themselves getting feature after feature and becoming bloated and unfocused. Add the thing to Postgres because it is convenient! At least Postgres has a decent plugin approach. But I think more use cases might be served by standalone products than by add-ons.
Your DB on the other hand is usually a well-understood part of your system, and while scaling issues like that can cause problems, they're often fairly easy to predict- just unfortunate on timing. This means that while they'll disrupt, they're usually solved quickly, which you can't always say for additional systems.
I love these articles.
> The other camp chases common sense
It is never too late to inject some tribalism into any discussion.
> Trend 1 - the “Small Data” movement.
404
Just perfect.
> Most of the time - yes. You should always default to Postgres until the constraints prove you wrong.
Interesting.
I've also been told by my seniors that I should go with PostgreSQL by default unless I have a good justification not to.
> ...
> The other camp chases common sense
I don't really like these simplifications. Like one group obviously isn't just dumb, they're doing things for reasons you maybe don't understand. I don't know enough about data science to make a call, but I'm guessing there were reasons to use Kafka due to current hardware limits or scalability concerns, and while the issues may not be as present today that doesn't mean they used Kafka just because they heard a new word and wanted to repeat it.
This is a simplistic take. Kafka isn't just about scale; like other messaging systems, it provides queue/streaming semantics for applications. Sure, you can roll your own queue on a database for small use cases, but it adds complexity to the lives of developers. You can offload the burden of running Kafka by choosing a Kafka-as-a-service vendor, but you can't offload the additional work of the developer that comes from using a database as a queue.
To be honest, there isn't a large burden in running Kafka when it's 500 KB/s. The system is so underutilized there's nothing to cause issues with it. But regardless, the organizational burden persists. As the piece mentions - "Managed SaaS offerings trade off some of the organizational overhead for greater financial costs - but they still don’t remove it all.". Some of the burden continues to exist even if a vendor hosts the servers for you. The API needs to be adopted, the clients have many configs, concepts like consumer groups need to be understood, the vendor has its own UI, etc.
The Kafka API isn't exactly the simplest. I wouldn't recommend people write the pub-sub-on-Postgres SQL themselves - a library should abstract it away. What complexity is added by a library with a simple API? Regardless of whether that library is built on top of Postgres, Kafka or another system - precisely what complexity is added to the lives of developers?
I really don't see any complexity existing at this minuscule scale, neither at the app-developer layer nor the infra-operator layer. But of course, I haven't run this in production so I could be wrong.
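To make "a library with a simple API" concrete, here is a purely hypothetical surface (none of these names belong to a real package) - the point being that the app developer sees publish/subscribe, not the SQL or the Kafka configs underneath:

  from dataclasses import dataclass
  from typing import Callable

  @dataclass
  class Event:
      topic: str
      payload: bytes

  class PgBus:  # invented name, for illustration only
      def __init__(self, dsn: str) -> None: ...
      def publish(self, topic: str, payload: bytes) -> None: ...
      def subscribe(self, topic: str, group: str,
                    handler: Callable[[Event], None]) -> None: ...
      def run(self) -> None: ...  # poll/LISTEN loop, retries, backoff

  # Whether this speaks Postgres or Kafka underneath is invisible here.
  bus = PgBus("dbname=app")
  bus.subscribe("orders", group="email-service", handler=print)
  bus.run()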
Just use Kafka. Even if you don't need speed or scalability, it's reliable, resilient, simple and well-factored, and gives you far fewer opportunities to architect your system wrong and paint yourself into a corner than Postgres does.
task: msg queue
software: kafka
hardware: m7i.xlarge (vCPUs: 4 Memory: 16 GiB)
payload: 2kb / msg
possible performance: ### - #### msgs / second
etc…
So many times I've found myself wondering: is this thing behaving within an order of magnitude of a correctly set up version, so that I can decide whether I should leave it alone or spend more time on it?
It's built on pgmq and not married to supabase (nearly everything is in the database).
Postgres is enough.
Use the tool that is appropriate for the job. It is trivial to write code against these systems with LLMs these days, the software is mature enough to rarely cause problems, and tools built for a purpose will always be more performant.
I had fun working with the schema registry from TypeScript.
On the major projects I worked on, we were "instructed" to use Kafka for, I guess, internal political reasons. They already had Hadoop solutions that more or less worked, but the code was written by idiots in "Spark/Scala" (their favorite buzzword to act all high and mighty) and that code had zero tests (it was truly a "test in prod" situation there). The Hadoop system was managed by people who would parcel out compute resources politically, as in, their friends got all they wanted while everyone else got basically none. This was a major S&P company, Fortune 10, and the internal politics were abusive to say the least.
The fact that there is no common library that implements the author's strategy is a good sign that there is not much demand for this.
Kafka, GraphQL... These are the two technologies where my first question is always this: does the person who championed/led this project still work here?
The answer is almost always "no, they got a new job after we launched".
Resume Architecture is a real thing. Meanwhile the people left behind have to deal with a monster...
Instead of needing to learn just one new thing to launch.. let's pick as many new-to-us things as possible; that will surely increase the chances of success.
Personally I'd expect some kind of internal interface to abstract away such an external dependency and provide reusable components for it, which readily enables having relational data stores mirroring the broker's functionality. Handy for testing and some specific local scenarios, and those database-backed stores can easily pull from the main cluster(s) later to mirror data as needed.
is there some reason GraphQL gets so much hate? it always feels to me like it's mostly just a normal RPC system but with some incredibly useful features (pipelining, and super easy to not request data you don't need), with obvious perf issues in code and obvious room for perf abuse because it's easy to allow callers to do N+1 nonsense.
so I can see why it's not popular to get stuck with for public APIs unless you have infinite money, it's relatively wide open for abuse, but private seems pretty useful because you can just smack the people abusing it. or is it more due to specific frameworks being frustrating, or stuff like costly parsing and serialization and difficult validation?
which I fully understand is more work than "it's super easy just X" which it gets presented as, but that's always the cost of super flexible things. does graphql (or the ecosystem, as that's part of daily life of using it) make that substantially worse somehow? because I've dealt with people using protobuf to avoid graphql, then trying to reimplement parts of its features, and the resulting API is always an utter abomination.
And yes, you don't want to use it for public APIs. But if you have private APIs that are so complex that you need a query language, and still want use those over web services, you are very likely doing something really wrong.
"check that the user matches the data they're requesting by comparing the context and request field by hand" is ultra common - there are some real benefits to having authorization baked into the language, but it seems very rare in practice (which is part of why it's often flawed, but following the overwhelming standard is hardly graphql's mistake imo). I'd personally think capabilities are a better model for this, but that seems likely pretty easy to chain along via headers?
The problem is that GraphQL doesn't behave like all other general purpose RPC systems. As a rule, authorization does not work on the same abstraction level as GraphQL.
And that explanation you quoted is disingenuous, because GraphQL middleware and libraries don't usually export places where you can do anything by hand.
They'll be fine if you made something that works, even if it was a bit faddish. Make sure you take care of yourself along the way (they won't).
My point stands, company loyalty tallies up to very little when you’re looking for your next job; no interviewer will care much to hear of how you stood firm, and ignored the siren song of tech and practices that were more modern than the one you were handed down (the tech and practices they’re hiring for).
The moment that reverses, I will start advising people not to skill up, as it will look bad in their resumes.
https://github.com/mongomock/mongomock Extrapolating from my personal usage of this library to others, I'm starting to think that mongodb's 25 billion dollar valuation is partially based on this open source package :)
Otherwise SQLite :)