That being said, I regret that we switched from good_job (https://github.com/bensheldon/good_job). The thing is, Basecamp is a MySQL shop and their policy is not to accept RDBMS-engine-specific queries. You can see in their GitHub issues that they try to stick to "universal" SQL and are mostly concerned with how it performs in MySQL (https://github.com/rails/solid_queue/issues/567#issuecomment... , https://github.com/rails/solid_queue/issues/508#issuecomment...). They also still have no support for batch jobs: https://github.com/rails/solid_queue/pull/142 .
I've also run into issues where the db connection pool fills up and Solid Queue fails silently. No errors or logs; it just stops polling until a manual restart. Far from ideal.
But, I can live with it. I am going for minimal maintenance, and the ability to run solid queue under puma inside rails on cloud run is just so easy. Having ~3 solid queue related issues a year is acceptable for my use case, but that doesn't mean it will be ok for others.
I am (and have been for a while, not in a hurry) considering each of them as a move off Resque.
The main blocker for me with GoodJob is that it uses certain pg-specific features in a way that makes it incompatible with transaction mode in PgBouncer -- that is, it requires persistent sessions. Which is annoying, and is done to get some upper-end performance improvements that I don't think matter at my scale or most scales. Otherwise, I much prefer GoodJob's development model, trust the maintainer's judgement more, find the code more readable, etc. -- but that's a big But for me.
Why? Is it so they can switch in future?
That's largely the case.
Rails provides an abstracted API for jobs (Active Job). Of course some applications do depend on queue-implementation-specific features, but for the general case, you just need to update your config to switch over (and of course handle draining the old queue).
Losing that guarantee can make the eventual migration harder, even if that migration is to a different postgres instance than the primary db.
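To make the "just update your config" point concrete, a rough sketch (the job class name is made up; the adapter setting is the standard Active Job config):

```ruby
# config/application.rb (or an environment file)
# Before: config.active_job.queue_adapter = :sidekiq
config.active_job.queue_adapter = :solid_queue

# The job itself is written against Active Job, not the backend:
class WelcomeEmailJob < ApplicationJob
  queue_as :default

  def perform(user_id)
    # work goes here; nothing backend-specific
  end
end

# Enqueueing is unchanged regardless of the adapter:
WelcomeEmailJob.perform_later(42)
```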
Using the database as a queue, you no longer need to set up transaction triggers to fire your tasks; you get atomic guarantees that either the data and the task were both created successfully, or nothing was created.
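A minimal sketch of that atomicity in Rails terms, assuming the queue tables live in the same database and the adapter writes the job row inside the open transaction (this behavior varies by adapter and Rails version; the model and job names are made up):

```ruby
# Both rows commit or neither does, because the queue lives in the same DB.
ActiveRecord::Base.transaction do
  order = Order.create!(user_id: 1, total_cents: 4_999)
  FulfillOrderJob.perform_later(order.id)  # job row written in the same transaction
end
# If Order.create! raises, the enqueued job is rolled back with it and never runs.
```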
> it shouldn’t be the same as the production database
This is highly dependent on the application (scale, usage, phase of lifecycle, etc.)
To get the benefits of transactional enqueueing you generally need to commit the jobs transactionally with other database changes. https://riverqueue.com/docs/transactional-enqueueing
It does not scale forever, and as you grow in throughput and job table size you will probably need to do some tuning to keep things running smoothly. But after the amount of time I've spent in my career tracking down those numerous distributed systems issues arising from a non-transactional queue, I've come to believe this model is the right starting point for the vast majority of applications. That's especially true given how high the performance ceiling is on newer / more modern job queues and hardware relative to where things were 10+ years ago.
If you are lucky enough to grow into the range of many thousands of jobs per second then you can start thinking about putting in all that extra work to build a robust multi-datastore queueing system, or even just move specific high-volume jobs into a dedicated system. Most apps will never hit this point, but if you do you'll have deferred a ton of complexity and pain until it's truly justified.
Why is that?
https://status.circleci.com/incidents/hr0mm9xmm3x6
and a good analysis by a Flickr engineer who ran into similar issues
https://blog.mihasya.com/2015/07/19/thoughts-evoked-by-circl...
https://github.com/tobi/delayed_job
Shopify however grew (as many others did) and we saw a host of blog posts and talks about moving away from DB queues to Redis, RabbitMQ, Kafka, etc. We saw posts about moving from Resque to Sidekiq, etc. All this to say that storing a task queue in the db has always been the naive approach. Engineers absolutely shouldn't be shocked that the approach isn't viable at higher workloads.
If your task is to send an email, do you want to send it again? Probably not.
In fact, I'd rather it did happen at the same time as production, so I don't have to reconcile a bunch of data on top of the tasks.
Personally, I prefer the same db unless I were at a traffic scale where splitting them is necessary for load.
One advantage of the same db is that you can use db transaction control over enqueueing jobs and app logic too, when they are dependent. But that's not the main advantage to me; I don't actually need that. I just prefer the simplicity, and as someone else said above, prefer not having to reconcile app db state with queue state if they are separate and only ONE goes down. Fewer moving parts are better in the apps I work on, which are relatively small-scale, often "enterprise", etc.
- No reason to switch to SolidQueue or GoodJob if you have no issue with Sidekiq. Only do it if you want to remove the Redis infra; no other big benefits other than that imo.
- For new projects, I might be more biased towards GoodJob. It's more mature, has a great community, and has more features.
- One thing I don't like about SolidQueue is the lack of a solid UI. Compared to GoodJob or Sidekiq, it's pretty basic. When I tried it last time, the main page would hang due to unoptimized indexes. Only happens when your data reaches a certain threshold. Might have been fixed though.
Another consideration with using an RDBMS instead of Redis is that you might need to allocate a proper connection pool now. Depends on your database setup. It's nothing big, but it's one additional "cost" that you never really had to consider when using Redis.
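As a rough illustration of that extra "cost" (the threshold below is arbitrary), you can inspect Active Record's pool from a console or a health check:

```ruby
# Inspect Active Record's connection pool (keys come from ConnectionPool#stat).
stat = ActiveRecord::Base.connection_pool.stat
# => { size: 5, connections: 5, busy: 3, dead: 0, idle: 2, waiting: 0, checkout_timeout: 5.0 }

# A DB-backed queue's pollers/workers hold connections too, so the pool needs
# headroom beyond what web threads alone would use.
warn "DB pool nearly exhausted" if stat[:busy] >= stat[:size] - 1  # arbitrary threshold
```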
https://oban.pro/articles/one-million-jobs-a-minute-with-oba...
None of that means Oban or similar queues don't/can't scale—it just means a high volume of NOTIFY doesn't scale, hence the alternative notifiers and the fact that most of its job processing doesn't depend on notifications at all.
There are other reasons Oban recommends a different notifier per the doc link above:
> That keeps notifications out of the db, reduces total queries, and allows larger messages, with the tradeoff that notifications from within a database transaction may be sent even if the transaction is rolled back
You started your comment with that
- this is achieved by queuing batches of 5000 jobs, so on the queue side this is actually not 1 million TPS, but rather 200 TPS. I've never seen any significant batching of background job creation.
- the dispatch is also batched to a few hundred TPS (5ms ... 2ms).
- acknowledgements are also batched.
So instead of the ~50-100k TPS that you would expect to get to 17k jobs/sec, this is probably performing just a few hundred transactions per second on the SQL side. Correspondingly, if you don't batch everything (job submission, acking; dispatch is reasonable), throughput likely drops to that level, which is much more in line with expectations.
Semantically this benchmark is much closer to queuing and running 200 invocations of a "for i in range(5000)" loop in under a minute, which most would expect virtually any DB to handle (even SQLite).
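For reference, here is roughly what batched vs. per-job enqueueing looks like on the Rails side (ProcessItemJob is a made-up job class; ActiveJob.perform_all_later requires Rails 7.1+, and whether it becomes a single bulk INSERT depends on the adapter):

```ruby
# Batched: one enqueue call for the whole set instead of 5000 separate ones.
jobs = (1..5_000).map { |i| ProcessItemJob.new(i) }
ActiveJob.perform_all_later(*jobs)

# Unbatched, for comparison: 5000 separate enqueues, each its own insert/transaction.
# (1..5_000).each { |i| ProcessItemJob.perform_later(i) }
```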
Databases are pretty good at quickly adding and removing lots of rows. But even if you can keep up with churning through 1000 rows/second, with batching or whatever, you still need to replicate 1000 rows/second to your failover nodes.
That’s the big win for queues over a relational db here: queues have ways to efficiently replicate without copying the entire work queue across instances.
They spend some time explaining how to tune the job runners to double the 17k jobs/s. The article is kind of old (Elixir 1.14 was a while ago), and it is basically a write-up of how they managed a bit of a performance increase by using new features of that language version.
We still keep rate limiters in Redis though; it would be pretty easy for some scanner to overload the DB if every rogue request needed a round trip to the DB before being processed. Because we only store ephemeral data in Redis, it does not need backups.
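A minimal sketch of this kind of Redis-side limiter, as a fixed window built on INCR + EXPIRE (key naming and limits here are illustrative):

```ruby
require "redis"

REDIS = Redis.new(url: ENV.fetch("REDIS_URL", "redis://localhost:6379"))

# Fixed-window limiter: allow up to `limit` requests per `window` seconds per client.
def rate_limited?(client_ip, limit: 100, window: 60)
  key = "ratelimit:#{client_ip}:#{Time.now.to_i / window}"
  count = REDIS.incr(key)
  REDIS.expire(key, window) if count == 1  # set the TTL on the first hit in the window
  count > limit
end

# e.g. in a Rack middleware / before_action:
# head :too_many_requests if rate_limited?(request.remote_ip)
```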
How much latency could you really be saving versus introducing complexity?
But I am not a storage/backend engineer, so maybe I don't understand the target use of Redis.
Redis also scales horizontally much, much easier because of the lack of relational schemas. Keys can be owned by a node without any consensus within the cluster beyond which node owns the key. Distributed SQL needs consensus around things like "does the record this foreign key references exist?", which also has to take into account other updates occurring simultaneously.
It's why you see something like Redis caching DB queries pretty often. It's way, way easier to make your Redis cluster 100x as fast than it is to make your DB 100x as fast. I think it's also cheaper in terms of hardware, but I haven't done much beyond napkin math to validate that.
We use it to broadcast messages across horizontally scaled services.
Works fine, probably a better tool out there for the job with better delivery guarantees, but the decision was taken many years ago, and no point in changing something that just works.
It's also language agnostic, which really helps.
We use ElastiCache (Valkey, I suppose), so most of the article's points are moot for our use.
Were we to implement it from scratch today, we might look for better delivery guarantees, or we might just use what we already know works.
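For context, that broadcast pattern is plain Redis pub/sub; a minimal sketch with the redis-rb gem (channel name and message shape are illustrative, and pub/sub is fire-and-forget, which is the delivery-guarantee caveat above):

```ruby
require "json"
require "redis"

# Publisher (any service instance):
Redis.new.publish("service.events", { type: "cache.invalidate", key: "user:42" }.to_json)

# Subscriber (each horizontally scaled instance runs one of these, usually in its own thread):
Redis.new.subscribe("service.events") do |on|
  on.message do |_channel, raw|
    event = JSON.parse(raw)
    # handle the broadcast; messages sent while a subscriber is down are simply lost
  end
end
```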
I've benchmarked Redis (Sidekiq), Postgres (using GoodJob) and SQLite (SolidQueue); Redis beats everything else for the above use case.
SolidQueue backed by SQLite may be good when you are just passing around primary keys. I still wonder if you can have a lot of workers polling from the same database and updating the queue with the job status. I've done something similar in the past using SQLite for some personal work and it is easy to hit a wall even with 10 or so workers.
If I understand what you're saying, instead of doing:
- Create job with payload (maybe big) > Put in queue > Let worker take from queue > Done
You're suggesting:
- Create job with ID of payload (stored elsewhere) > Put in queue > Let worker take from queue, then resolve ID to the data needed for processing > Done
Is that more or less what you mean? I can definitely see use cases for both; it heavily depends on the situation, but more indirection isn't always better, nor are big payloads always OK.
- Persist payload in db > Queue with id > Process via worker.
Pushing the payload directly to the queue can be tricky. Any queue system will usually have limits on the payload size, for good reasons. Plus, if you already commit to the db, you can guarantee the data is not lost and can be processed again however you want later. But if your queue is having issues, or the enqueue failed, you might lose it forever.
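A sketch of that id-in-queue / payload-in-db flow in Active Job terms (Import, ImportJob, uploaded_file and process are made-up names; Active Job's GlobalID support does something similar for Active Record models automatically):

```ruby
# Persist the (possibly large) payload first, then enqueue only a reference.
import = Import.create!(raw_payload: uploaded_file.read, status: "pending")
ImportJob.perform_later(import.id)

class ImportJob < ApplicationJob
  def perform(import_id)
    import = Import.find(import_id)  # resolve the reference at processing time
    process(import.raw_payload)      # stand-in for the actual work
    import.update!(status: "done")
  end
end
```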
Is that how microservice messages work? They push the whole data so the other systems can consume it and take it from there?
Me too, I was just wondering if you have any real world examples of a project with a large payload.
Shouldn't one be using a storage system such as S3/Garage with ephemeral settings and/or clean-up triggers after job end? I get the appeal of using one-system-for-everything, but won't you need a storage system anyway for other parts of your system?
Have you written up your benchmarks somewhere, and where the cutoffs are (payload size / throughput / latency)?
Reminds me of Antirez's blog post that when Redis is configured for durability it becomes about as slow as, or slower than, PostgreSQL: http://oldblog.antirez.com/post/redis-persistence-demystifie...
Also, Antirez was for a decade very opinionated about not comparing or benchmarking Redis against other dbs.
From TFA. Are there really people using Rails for HFT?
https://nanovms.com/dev/tutorials/running-postgres-as-a-unik...
MySQL: less maintenance + more performant
The MySQL + Redis + AWS' elasti-cron (or whatever) was a ghetto compared to Postgres.
We ran into some serious issues in high throughput scenarios (~2k jobs/min currently, and ~5k job/min during peak hours) and switched to Redis+BullMQ and have never looked back ever since. Our bottleneck was Postgres performance.
I wonder if SolidQueue runs into similar issues during high load, high throughput scenarios...
Yes, PG can theoretically handle just about anything with the right configuration, schema, architecture, etc.
Finding that right configuration is not trivial. Even dedicated frameworks like Graphile struggle with it.
My startup had the exact same struggles with PG and did the same migration to BullMQ, because we were sick of fiddling with it instead of solving business problems. We are very glad we migrated off of PG for our work queues.
Curious about your experience with SolidQueue's reliability - have you run into any edge cases or issues with job retries/failures? Redis has been battle-tested for so long that switching always feels risky.
Would love to hear more about your production experience after a few months!
When all we are talking about is "good enough" the bar is set at a whole different level.
To be clear, I think the most important thing is understanding the performance characteristics of each technology enough that you can make good choices for your particular scenario.
I'm not a fanboy of DHH, but I really like his critical thinking about the status quo. I'm not able to leave the cloud, or better phrased: it's too comfortable right now. I really wanted to leave Redis behind me, as it's mostly a hidden part of Rails, nothing I use directly, but I often have to pay for it "in the cloud".
I quickly hit an issue with the family of Solid features: the documentation doesn't really cover the case "inside your existing application" (at least when I looked into it shortly after Rails 8 was released). Being in the cloud (render.com, fly.io and friends) I had to create multiple DBs, one for each Solid feature. That was not acceptable, as you usually pay per service/DB, not per usage, similar to how you have to pay for Redis.
This was a great motivation to research the cloud space once again, and then I found Railway. You pay per usage. So right now I have multiple DBs, one for each Solid feature, and on top multiple environments multiplying those DBs, and I pay cents for that part of the app while it's not really filled. Of course in this setup I would also pay cents for Redis, but it's still good to see a less complex landscape in my deployment environment.
Long story short: while trying to integrate SolidQueue myself, I found Railway. Deployments are fun again with that! Maybe that helps someone today as well.
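For anyone else hitting the one-database-per-Solid-feature issue: my understanding is that Solid Queue can share the primary database instead; treat the sketch below as an assumption to verify against the Solid Queue README rather than the documented path:

```ruby
# config/environments/production.rb -- assumption: adjust to your own setup.
# The default Solid Queue install points jobs at a separate "queue" database:
#   config.solid_queue.connects_to = { database: { writing: :queue } }
# To reuse the primary database, leave connects_to unset and load the tables
# from db/queue_schema.rb into the primary db (e.g. via a migration), so no
# extra paid database is needed.
config.active_job.queue_adapter = :solid_queue
```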
> Deploy, version, patch, and monitor the server software
And with PostgreSQL you don't need it?
> Configure a persistence strategy. Do you choose RDB snapshots, AOF logs, or both?
It's a one-time decision. You don't need to do it daily.
> Sustain network connectivity, including firewall rules, between Rails and Redis
And for a PostgreSQL DB you don't need it?
> Authenticate your Redis clients
And your PostgreSQL works without that?
> Build and care for a high availability (HA) Redis cluster
If you want a cluster of PostgreSQL databases, perhaps you will do that too.
For caching, though, I wouldn't drop Redis so fast. As an in-memory cache, the ops overhead of running Redis is a lot lower. You can even ignore HA for most use cases.
Source: I helped design and run a multi-tiered Redis caching architecture for a Rails-based SaaS serving millions of daily users, coordinating shared data across hundreds of database clusters and thousands of app servers across a dozen AWS regions, with separate per-host, per-cluster, per-region, and global cache layers.
We used Postgres for the job queues, though. Entirely separate from the primary app DBs.
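On the caching point, pointing Rails at Redis as a cache store is a small amount of configuration; a sketch (URL and error handling are illustrative):

```ruby
# config/environments/production.rb
config.cache_store = :redis_cache_store, {
  url: ENV.fetch("REDIS_URL", "redis://localhost:6379/1"),
  # Cache failures degrade to misses; this handler just logs them.
  error_handler: ->(method:, returning:, exception:) {
    Rails.logger.warn("Redis cache #{method} failed: #{exception.message}")
  }
}
```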
One could go one step further and say an RDBMS is fundamentally the wrong storage system for a job queue when you have a persistent, purpose-built message queue handy.
Honestly, for most people, I'd recommend they just use their cloud provider's native message queue offering. On AWS, SQS is cheap, reliable, easy to start with, and gives you plenty of room to grow. GCP PubSub and Azure Storage Queues are probably similar in these regards.
Unless managing queues is your business, I wouldn't make it your problem. Hand that undifferentiated heavy lifting off.
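For the SQS route, a minimal sketch with the aws-sdk-sqs gem (the queue URL is a placeholder; in a Rails app you would more likely put an Active Job adapter on top of this):

```ruby
require "aws-sdk-sqs"
require "json"

sqs = Aws::SQS::Client.new(region: "us-east-1")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-jobs"  # placeholder

# Producer: enqueue a job as a JSON message.
sqs.send_message(queue_url: queue_url, message_body: { job: "send_email", user_id: 42 }.to_json)

# Consumer: long-poll, process, then delete (SQS redelivers anything not deleted in time).
resp = sqs.receive_message(queue_url: queue_url, max_number_of_messages: 10, wait_time_seconds: 20)
resp.messages.each do |msg|
  payload = JSON.parse(msg.body)
  # ... do the work ...
  sqs.delete_message(queue_url: queue_url, receipt_handle: msg.receipt_handle)
end
```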
Especially when building new and unproven applications, I'm always looking for things that trade the time I need to set things up properly for the time I need to BUILD THE ACTUAL PRODUCT. Therefore I really like the recent changes to the Ruby on Rails ecosystem very much.
What we need is a larger user base setting everything up and discovering edge-cases and (!) writing about it (AND notifying the people around Rails). The more experience and knowledge there is, the better the tooling becomes. The happy path needs to become as broad as a road!
Like Kamal, at first only used by 37signals and now used by them and me. :D At least, of course.
Kudos!
Best, Steviee