I’m leaving Redis for SolidQueue

https://www.simplethread.com/redis-solidqueue/

315•amalinovic•3w ago

Comments

jjgreen•3w ago

Nice article, I'm just productionising a Rails 8 app and was wondering whether it was worth switching from SolidQueue (which has given me no stress in dev) to Redis ... maybe not.

michaelbuckbee•3w ago

Unless you hit a performance wall with Postgres or absolutely need Batch capability you've probably got a very large runway with SolidQueue.

antirez•3w ago

Every time some production environment can be simplified, it is good news in my opinion. The ideal situation with Rails would be if there is a simple way to switch back to Redis, so that you can start simple, and as soon as you hit some fundamental issue with using SolidQueue (mostly scalability, I guess, in environments where the queue is truly stressed -- and you don't want to have a Postgres scalability problem because of your queue), you have a simple upgrade path. But I bet a lot of Rails apps don't have high volumes, and having to maintain two systems can be just more complexity.

yawboakye•3w ago

the problem i see here is that we end up treating the background job/task processor as part of the production system (e.g. the server that responds to requests, in the case of a web application) instead of a separate standalone thing. rails doesn’t make this distinction clear enough. it’s okay to back your task processor with a pg database (e.g. river[0]) but, as you indirectly pointed out, it shouldn’t be the same as the production database. this is why redis was preferred anyway: it was a lightweight database for the task processor to store state, etc. there are still great arguments in favor of this setup. from what i’ve seen so far, solidqueue doesn’t make this separation.

[0]: https://riverqueue.com/

andrewstuart•3w ago

It’s not necessary to separate queue db from application db.

yawboakye•3w ago

got it. is it necessary, then, to couple queue db with app db? if the answer is no, then we can’t make a necessity argument here, unfortunately.

nick__m•3w ago

Frequently you have to couple the transactional state of the queue db and the app db, colocating them is the simplest way to achieve that without resorting to distributed transactions or patterns that involve orchestrated compensation actions.
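To make the colocation argument concrete, here is a minimal sketch assuming SQLite as a stand-in for a single shared app/queue database; the `orders` and `jobs` tables and the `place_order` helper are hypothetical. Because the job row is written in the same transaction as the business row, a failure rolls back both, so there is never a job without its data (or data without its job):

```python
import sqlite3

# Assumption: one SQLite database holds both the app table and the job table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER NOT NULL);
    CREATE TABLE jobs (id INTEGER PRIMARY KEY, kind TEXT NOT NULL,
                       order_id INTEGER NOT NULL);
    """
)


def place_order(total: int, fail_after_enqueue: bool = False) -> None:
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so the order and its job are written atomically.
    with conn:
        cur = conn.execute("INSERT INTO orders (total) VALUES (?)", (total,))
        conn.execute(
            "INSERT INTO jobs (kind, order_id) VALUES (?, ?)",
            ("send_receipt", cur.lastrowid),
        )
        if fail_after_enqueue:
            raise RuntimeError("simulated failure later in the request")


def counts() -> tuple:
    orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    jobs = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
    return orders, jobs


place_order(100)
try:
    place_order(200, fail_after_enqueue=True)
except RuntimeError:
    pass
print(counts())  # (1, 1): the failed order left no orphan job behind
```

With a separate queue store, the same guarantee would need an outbox table, a distributed transaction, or compensating actions, which is the trade-off described above.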

yawboakye•3w ago

that’s setting yourself up for trouble, imo. intermediate states solve this problem, and economically. for a mature production system, see temporal[0]. their magic sauce is good intermediate states.

[0]: https://temporal.io/

imtringued•3w ago

This feels like an ad because you explained absolutely nothing.

It's one thing to explain to people what a screwdriver is and you just happen to sell screwdrivers to people who might need them.

It's a wholly different thing to sell the screwdriver first and then let people figure out why they need it.

yawboakye•3w ago

i’m not associated with temporal, nor does the link above have any referrer nonsense in there. i don’t profit from referring to it here. in fact it may well be a household name in the hn community. that out of the way, it’s not wrong to point to a proper resource that can explain and demonstrate my argument better than a couple of words could. temporal is open source[0] so maybe a github link would have been more palatable?

[0]: https://github.com/temporalio/temporal

jrochkind1•3w ago

solid_queue by default prefers you use a different db than app db, and will generate that out of the box (also by default with sqlite3, which, separate discussion) but makes it possible, and fairly smooth, to configure to use the same db.

Personally, I prefer the same db unless I were at a traffic scale where splitting them is necessary for load.

One advantage of same db is you can use db transaction control over enqueueing jobs and app logic too, when they are dependent. But that's not the main advantage to me; I don't actually need that. I just prefer the simplicity, and as someone else said above, prefer not having to reconcile app db state with queue state if they are separate and only ONE goes down. Fewer moving parts are better in the apps I work on, which are relatively small-scale, often "enterprise", etc.

erispoe•3w ago

> it shouldn’t be the same as the production database

Why is that?

zarzavat•3w ago

If you need to restore the production database do you also want to restore the task database?

If your task is to send an email, do you want to send it again? Probably not.

stavros•3w ago

It's not like I'll get a choice between the task database going down and not going down. If my task database goes down, I'm either losing jobs or duplicating jobs, and I have to pick which one I want. Whether the downtime is at the same time as the production database or not is irrelevant.

In fact, I'd rather it did happen at the same time as production, so I don't have to reconcile a bunch of data on top of the tasks.

zarzavat•3w ago

Right, I was referring to logical databases rather than the database server itself.

stavros•3w ago

But even for the logical databases, if I want to revert to an earlier state of the database, why wouldn't I want the tasks as well? If I have a bunch of update tasks in flight at that point, wouldn't I want them to actually run? They are a part of the overall state of the system.

gregors•3w ago

Here's an example from the circleci incident

https://status.circleci.com/incidents/hr0mm9xmm3x6

and a good analysis by a Flickr engineer who ran into similar issues

https://blog.mihasya.com/2015/07/19/thoughts-evoked-by-circl...

davidw•3w ago

CircleCI and Flickr are both pretty big systems. There are tons of businesses that will never operate at that scale.

gregors•3w ago

I don't disagree with that call out. However, we've been through these discussions many times over the years. The solid queue of yesteryear was delayed_job which was originally created by Shopify's CEO.

https://github.com/tobi/delayed_job

Shopify, however, grew (as did many others), and we saw a host of blog posts and talks about moving away from DB queues to Redis, RabbitMQ, Kafka, etc. We saw posts about moving from Resque to Sidekiq, etc. All this to say, storing a task queue in the db has always been the naive approach. Engineers absolutely shouldn't be shocked that the approach isn't viable at higher workloads.

runako•3w ago

SolidQueue uses its own db configuration.

> it shouldn’t be the same as the production database

This is highly dependent on the application (scale, usage, phase of lifecycle, etc.)

bgentry•3w ago

Yeah, River generally recommends this pattern as well (River co-author here :)

To get the benefits of transactional enqueueing you generally need to commit the jobs transactionally with other database changes. https://riverqueue.com/docs/transactional-enqueueing

It does not scale forever, and as you grow in throughput and job table size you will probably need to do some tuning to keep things running smoothly. But after the amount of time I've spent in my career tracking down those numerous distributed systems issues arising from a non-transactional queue, I've come to believe this model is the right starting point for the vast majority of applications. That's especially true given how high the performance ceiling is on newer / more modern job queues and hardware relative to where things were 10+ years ago.

If you are lucky enough to grow into the range of many thousands of jobs per second then you can start thinking about putting in all that extra work to build a robust multi-datastore queueing system, or even just move specific high-volume jobs into a dedicated system. Most apps will never hit this point, but if you do you'll have deferred a ton of complexity and pain until it's truly justified.

yawboakye•3w ago

state machines to the rescue, i.e. i think the nature of asynchronous processing requires that we design for good/safe intermediate states.
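A minimal sketch of what "safe intermediate states" can mean in practice, assuming a hypothetical two-step job (charge, then send a receipt) whose progress is recorded as an explicit state. This is not Temporal's API, only the underlying idea: a retry resumes from the last recorded state instead of repeating side effects.

```python
from __future__ import annotations

from enum import Enum


class OrderState(Enum):
    # Hypothetical states for a two-step "charge, then email" job.
    PENDING = "pending"  # nothing done yet
    CHARGED = "charged"  # payment captured, receipt not yet sent
    DONE = "done"        # both side effects completed


def run_job(state: OrderState, effects: list[str]) -> OrderState:
    """Resume from the last recorded state so finished steps are not repeated.

    `effects` records side effects for illustration; a real worker would
    persist `state` after each step (e.g. in the same row as the job).
    """
    if state is OrderState.PENDING:
        effects.append("charge card")
        state = OrderState.CHARGED  # persist here before the next side effect
    if state is OrderState.CHARGED:
        effects.append("send receipt")
        state = OrderState.DONE
    return state


# A fresh run performs both steps.
fresh_effects: list[str] = []
fresh_state = run_job(OrderState.PENDING, fresh_effects)

# A retry after a crash between the two steps starts from CHARGED,
# so the card is not charged twice.
retry_effects: list[str] = []
resumed_state = run_job(OrderState.CHARGED, retry_effects)
```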

watercolorblind•3w ago

The primary pain point I see here is if devs lean into transactions such that their job is only created together with everything else that happened.

Losing that guarantee can make the eventual migration harder, even if that migration is to a different postgres instance than the primary db.

phn•3w ago

You can look at it both ways.

Using the database as a queue, you no longer need to set up transaction triggers to fire your tasks; you can have atomic guarantees that the data and the task were created successfully, or that nothing was created.

byroot•3w ago

That's also something Rails helps abstract away by automatically deferring enqueues until after the transaction commits.

Even SolidQueue behaves that way by default.

https://github.com/rails/rails/pull/51426
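The idea behind deferring enqueues, sketched under assumptions: `DeferredEnqueuer` and its methods are hypothetical names, not Active Job's API. Jobs enqueued inside an open transaction are buffered, pushed only on commit, and discarded on rollback, so a worker on a separate queue (e.g. Redis) never picks up a job whose row is not yet visible or was never committed.

```python
from __future__ import annotations

from typing import Callable


class DeferredEnqueuer:
    """Buffer enqueues made inside a transaction until it commits.

    Hypothetical sketch of the idea; not Active Job's actual API.
    """

    def __init__(self, push: Callable[[str], None]) -> None:
        self._push = push  # the real queue push, e.g. to Redis
        self._pending: list[str] = []
        self._in_transaction = False

    def begin(self) -> None:
        self._in_transaction = True

    def enqueue(self, job: str) -> None:
        if self._in_transaction:
            self._pending.append(job)  # hold until the data is committed
        else:
            self._push(job)  # no open transaction: push immediately

    def commit(self) -> None:
        self._in_transaction = False
        pending, self._pending = self._pending, []
        for job in pending:
            self._push(job)

    def rollback(self) -> None:
        self._in_transaction = False
        self._pending.clear()  # the data never existed, so neither do the jobs


pushed: list[str] = []
enqueuer = DeferredEnqueuer(pushed.append)

enqueuer.begin()
enqueuer.enqueue("welcome_email:1")
enqueuer.rollback()  # user insert failed: job dropped

enqueuer.begin()
enqueuer.enqueue("welcome_email:2")
enqueuer.commit()  # pushed only now

enqueuer.enqueue("nightly_report")
print(pushed)  # ['welcome_email:2', 'nightly_report']
```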

byroot•3w ago

> The ideal situation with Rails would be if there is a simple way to switch back to Redis

That's largely the case.

Rails provides an abstracted API for jobs (Active Job). Of course some applications do depend on queue-implementation-specific features, but for the general case, you just need to update your config to switch over (and of course handle draining the old queue).

another_twist•3w ago

Since you're here - https://redis.io/docs/latest/operate/oss_and_stack/managemen...

In AOF mode, does Redis write all changes to a WAL? Is this paired with periodic snapshotting to prevent the log from growing too large? Does this work in distributed mode, or is this a single-node thing?

dependency_2x•3w ago

Postgres will eat the world

loafoe•3w ago

Postgres will eat the world indeed. I'm just waiting for the pg_kernel extension so I can finally uninstall Linux :)

cientifico•3w ago

Searching for a Postgres unikernel, it seems some people are trying this seriously...

https://nanovms.com/dev/tutorials/running-postgres-as-a-unik...

pjmlp•3w ago

RDBMS will eat the world.

Turns out it is a matter of feature set.

yawboakye•3w ago

schema migrations will save our careers! \o/

cies•3w ago

I use PGMQ and pg_cron now... Not looking back.

The MySQL + Redis + AWS' elasti-cron (or whatever) was a ghetto compared to Postgres.

saberd•3w ago

We use pgmq with the pgmq-go client, and it has clients in many different languages; it's amazing. The queues persist on disk, and visualizations of queues can easily be made with Grafana or just pure SQL requests. The fact that the queues live in the same database as all the other data is also a huge benefit if the 5-15ms time penalty is not an issue.

dzonga•3w ago

however mysql is easier to deal with - I say this as a Postgres guy

mysql less maintenance + more performant

this_user•3w ago

At least until people - in a couple of years - figure out that the "Postgres for everything" fad was just as much of a bad idea as "MongoDB for everything" and "Put Redis into everything".

stavros•3w ago

It's not "Postgres for everything", it's "Postgres by default". Nobody is saying you should replace your billion-message-per-second Kafka cluster (or whatever) with Postgres, but plenty of people are saying "don't start with a Kafka cluster when you have two messages a day", which is a much better idea than "MongoDB by default".

ashniu123•3w ago

For Node.js, my startup used to use [Graphile Worker](https://github.com/graphile/worker) which utilised the same "SKIP LOCKED" mechanism under the hood.

We ran into some serious issues in high-throughput scenarios (~2k jobs/min currently, and ~5k jobs/min during peak hours), switched to Redis+BullMQ, and have never looked back since. Our bottleneck was Postgres performance.

I wonder if SolidQueue runs into similar issues during high load, high throughput scenarios...

dns_snek•3w ago

Facing issues with 83 jobs per second (5k/min) sounds like an extreme misconfiguration. That's not high throughput at all and it shouldn't create any appreciable load on any database.

cle•3w ago

This comes up every time this conversation occurs.

Yes, PG can theoretically handle just about anything with the right configuration, schema, architecture, etc.

Finding that right configuration is not trivial. Even dedicated frameworks like Graphile struggle with it.

My startup had the exact same struggles with PG and did the same migration to BullMQ bc we were sick of fiddling with it instead of solving business problems. We are very glad we migrated off of PG for our work queues.

dns_snek•3w ago

The issue is that "83 per second" is multiple orders of magnitude off the expected level of performance on any RDBMS running on anything resembling modern hardware.

I haven't worked with Graphile but this just doesn't pass the sniff test unless those 83 jobs per second are somehow translating into thousands of write transactions per second.

Their documentation has a performance section with a benchmark that claims to process 10k jobs per second on a pretty modest machine, as an indication.

cle•3w ago

> The issue is that "83 per second" is multiple orders of magnitude off the expected level of performance on any RDBMS running on anything resembling modern hardware.

This is just not true, there are so many scenarios where 83/sec would be the limit. That number by itself is almost meaningless, similar to benchmarks which also make a bunch of assumptions about workloads and runtime environments.

As a simple example if your queue has a large backlog, you have a large worker fleet aggressively pulling work to minimize latency, your payloads are large, you have not optimized indexing, and/or you have many jobs scheduled for the future, every acquire can be an expensive table scan.

(This is a specific example because this is one of many failure scenarios I’ve encountered with Graphile that can cause your DB to melt down. The same workload in Redis barely causes a blip in Redis CPU, without having to fiddle with indexes, autovacuum, and worker backoffs.)

vjerancrnjak•3w ago

They probably did not batch. It’s realistic they will have issues if code is written to handle 1 job at a time and needs to make several roundtrips to the same db inside the same locking transaction.

Leases exist for a reason.
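A sketch of batched, lease-based claiming, assuming SQLite for runnability. In Postgres one would typically select candidates with `FOR UPDATE SKIP LOCKED`; SQLite has no row locks, so here `BEGIN IMMEDIATE` serializes claimers instead. The table layout, `LEASE_SECONDS`, and the helper names are hypothetical. The point is that the lock is held only for the short claim transaction, not while the job runs, and a crashed worker's jobs become claimable again once their lease expires.

```python
from __future__ import annotations

import sqlite3

LEASE_SECONDS = 30.0  # hypothetical lease length


def setup() -> sqlite3.Connection:
    # isolation_level=None: autocommit, so we control BEGIN/COMMIT explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT,"
        " leased_by TEXT, lease_expires REAL)"
    )
    return conn


def claim_batch(conn: sqlite3.Connection, worker: str, limit: int, now: float) -> list[int]:
    """Lease up to `limit` unleased (or expired) jobs in one short transaction."""
    conn.execute("BEGIN IMMEDIATE")  # serializes claimers; Postgres would use SKIP LOCKED
    try:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM jobs WHERE lease_expires IS NULL OR lease_expires < ?"
            " ORDER BY id LIMIT ?",
            (now, limit),
        )]
        conn.executemany(
            "UPDATE jobs SET leased_by = ?, lease_expires = ? WHERE id = ?",
            [(worker, now + LEASE_SECONDS, job_id) for job_id in ids],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return ids  # the job work itself happens outside any transaction


def ack(conn: sqlite3.Connection, job_id: int) -> None:
    conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))


conn = setup()
conn.executemany("INSERT INTO jobs (payload) VALUES (?)", [(f"job{i}",) for i in range(5)])
t0 = 1000.0
first = claim_batch(conn, "w1", 3, t0)   # w1 leases jobs 1-3
second = claim_batch(conn, "w2", 3, t0)  # jobs 1-3 are leased, so w2 gets 4-5
for job_id in (1, 2, 4, 5):
    ack(conn, job_id)                    # w1 crashes before finishing job 3
later = claim_batch(conn, "w2", 3, t0 + LEASE_SECONDS + 1)  # job 3's lease expired
```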

dns_snek•3w ago

> if code is written to handle 1 job at a time and needs to make several roundtrips to the same db inside the same locking transaction.

Do you mean the application code? The worker itself causing the bottleneck is definitely one possibility however if that were the case the issue wouldn't have resolved itself when they switched to a different job queue.

vjerancrnjak•3w ago

Well, you would no longer have thousands of open transactions maintaining the locks.

Of course, if a queue on top of Postgres is reasonably implemented (with leases), what they say makes no sense.

victorbjorklund•3w ago

For people who don't think it scales: a similar implementation in Elixir is Oban. Their benchmark shows a million jobs per minute on a single node (and I am sure it could be increased further with more optimizations). I bet 99.99999% of apps have fewer than a million background jobs per minute.

https://oban.pro/articles/one-million-jobs-a-minute-with-oba...

formerly_proven•3w ago

This benchmark is probably as far removed from how applications use task queues as it could possibly be. The headline is "1 million jobs per minute", which is true. However...

- this is achieved by queuing batches of 5000 jobs, so on the queue side this is actually not 1 million insert transactions per minute, but rather 200 (about 3 per second). I've never seen any significant batching of background job creation.

- the dispatch is also batched to a few hundred TPS (5ms ... 2ms).

- acknowledgements are also batched.

So instead of the ~50-100k TPS that you would expect to get to 17k jobs/sec, this is probably performing just a few hundred transactions per second on the SQL side. Correspondingly, if you don't batch everything (job submission, acking; dispatch is reasonable), throughput likely drops to that level, which is much more in line with expectations.

Semantically this benchmark is much closer to queuing and running 200 invocations of a "for i in range(5000)" loop in under a minute, which most would expect virtually any DB to handle (even SQLite).
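The arithmetic behind the comment, as a quick check; the figure of three transactions per unbatched job (insert, fetch, ack) is an assumption chosen to match the ~50k lower bound quoted above.

```python
JOBS_PER_MINUTE = 1_000_000
INSERT_BATCH = 5_000  # jobs per insert transaction in the benchmark

jobs_per_second = JOBS_PER_MINUTE / 60                    # ~16,667, the "17k jobs/sec"
insert_txns_per_minute = JOBS_PER_MINUTE / INSERT_BATCH   # 200 batched inserts
insert_txns_per_second = insert_txns_per_minute / 60      # ~3.3

# Assumption: unbatched, each job costs at least an insert, a fetch and an ack.
unbatched_txns_per_second = 3 * jobs_per_second           # ~50,000

# Dispatch polling every 5 ms down to every 2 ms.
dispatch_txns_per_second = [1 / 0.005, 1 / 0.002]         # ~200 to ~500

print(round(jobs_per_second), insert_txns_per_minute,
      round(insert_txns_per_second, 1), round(unbatched_txns_per_second))
# 16667 200.0 3.3 50000
```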

uep•3w ago

This isn't my area, but wouldn't this still be quite effective if it automatically grouped and batched those jobs for you? At low throughput levels, it doesn't need giant batches, and could just timeout after a very short time, and submit smaller batches. At high throughput, they would be full batches. Either way, it seems like this would still serve the purpose, wouldn't it?

cess11•3w ago

The 5k batching is done when inserting the jobs into the database. It's not like they exert some special control over the performance of the database engine, and this isn't what they're trying to measure in the article.

They spend some time explaining how to tune the job runners to double the 17k jobs/s. The article is kind of old, Elixir 1.14 was a while ago, and it is basically a write-up on how they managed a bit of performance increase by using new features of this language version.

victorbjorklund•3w ago

Yes, all benchmarks lie. It's just like when you see a benchmark of how many inserts Postgres can do: it's usually not based on reality, because that's never what a real application looks like; it's rather pointing out the maximum performance under perfect conditions, which of course you would never really have in reality. But again, I don't think it's about whether you're reaching 20k or 50k or 100k jobs per second, because if you're at that scale, yeah, you should probably look at other solutions. Most applications probably have fewer than a thousand jobs per second.

erikpukinskis•3w ago

Also worth noting that it’s often not single-node performance that caps throughput… it’s replication.

Databases are pretty good at quickly adding and removing lots of rows. But even if you can keep up with churning through 1000 rows/second, with batching or whatever, you still need to replicate 1000 rows/second to your failover nodes.

That’s the big win for queues over a relational db here: queues have ways to efficiently replicate without copying the entire work queue across instances.

parthdesai•3w ago

Funny you mention Oban, we do use it at work as well, and first thing Oban tells you is to either use Redis as a notifier or resort to polling for jobs and just not notify.

https://hexdocs.pm/oban/scaling.html

victorbjorklund•3w ago

I don't think that Oban is telling you to always use Redis. I think what they're saying is if you reach a certain scale where you're feeling the pain of the default notifier you could use Oban.Notifiers.PG as long as your application is running as a cluster. If you don't run it as a cluster, then you might have to reach for Redis. But then it's more about not running a cluster.

parthdesai•3w ago

> For people that does not think it scales

You started your comment with that

victorbjorklund•3w ago

It does scale. It is your company's choice not to cluster your application. Redis is not needed. It is a choice if you don’t want to cluster your app.

bgentry•3w ago

This is largely because LISTEN/NOTIFY has an implementation which uses a global lock. At high volume this obviously breaks down: https://www.recall.ai/blog/postgres-listen-notify-does-not-s...

None of that means Oban or similar queues don't/can't scale—it just means a high volume of NOTIFY doesn't scale, hence the alternative notifiers and the fact that most of its job processing doesn't depend on notifications at all.

There are other reasons Oban recommends a different notifier per the doc link above:

> That keeps notifications out of the db, reduces total queries, and allows larger messages, with the tradeoff that notifications from within a database transaction may be sent even if the transaction is rolled back

parthdesai•3w ago

> None of that means Oban or similar queues don't/can't scale—it just means a high volume of NOTIFY doesn't scale

Given the context of this post, it really does mean the same thing though?

bgentry•3w ago

No, I don't think so. Oban does not rely on a large volume of NOTIFY in order to process a large volume of jobs. The insert notifications are simply a latency optimization for lower volume environments, and for inserts can be fully disabled such that they're mainly used for control flow (canceling jobs, pausing queues, etc) and gossip among workers.

River for example also uses LISTEN/NOTIFY for some stuff, but we definitely do not emit a NOTIFY for every single job that's inserted; instead there's a debouncing setup where each client notifies at most once per fetch period, and you don't need notifications at all in order to process with extremely high throughput.
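A minimal sketch of per-client notification debouncing, assuming an injected clock (`now`) for testability; `NotifyDebouncer` is hypothetical, not River's implementation, and `send` would issue something like a Postgres `NOTIFY`. Jobs inserted while notifications are suppressed are still found by the regular fetch loop, which is why throughput does not depend on NOTIFY volume.

```python
from __future__ import annotations

from typing import Callable


class NotifyDebouncer:
    """Send at most one notification per `period` seconds (hypothetical sketch)."""

    def __init__(self, period: float, send: Callable[[], None]) -> None:
        self.period = period
        self.send = send  # e.g. would run a Postgres NOTIFY in a real client
        self._last_sent: float | None = None

    def job_inserted(self, now: float) -> bool:
        """Call after each insert; returns True if a notification went out."""
        if self._last_sent is None or now - self._last_sent >= self.period:
            self._last_sent = now
            self.send()
            return True
        return False  # already notified this period; workers will still poll


sent: list[str] = []
debouncer = NotifyDebouncer(period=1.0, send=lambda: sent.append("notify"))
results = [debouncer.job_inserted(t) for t in (0.0, 0.1, 0.5, 0.99, 1.0, 1.2, 2.5)]
print(len(sent))  # 3: at t=0.0, 1.0 and 2.5
```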

In short, the fact that high volume NOTIFY is a bottleneck does not mean these systems cannot scale, because they do not rely on a high volume of NOTIFY or even require it at all.

parthdesai•3w ago

Does River without any extra configuration run into scaling issues at a certain point? If the answer is yes, then River doesn’t scale without optimization (Redis/Clustering in Oban’s case).

While the root cause might not be River/Oban, them not being scalable still holds true. It’s of extra importance given the context of this post is moving away from redis and to strictly a database for a queue system.

victorbjorklund•3w ago

Yup. I wasn’t talking about notify in particular but about using Postgres in general.

solid_fuel•3w ago

Not quite, I used it at work too - the first thing that page suggests is using `Oban.Notifiers.PG` which uses distributed erlang's Process Group implementation, not Redis. You only really need Redis if you're not running with erlang clustering, but doing that rules out several other great elixir features.

jacob-s-son•3w ago

Every author of free software obviously has the right to full control over the scope of their project.

That being said, I regret that we switched from good_job (https://github.com/bensheldon/good_job). The thing is, Basecamp is a MySQL shop and their policy is not to accept RDBMS engine-specific queries. You can see in their GitHub issues that they try to stick to "universal" SQL and are personally mostly concerned with how it performs in MySQL (https://github.com/rails/solid_queue/issues/567#issuecomment... , https://github.com/rails/solid_queue/issues/508#issuecomment...). They also still have no support for batch jobs: https://github.com/rails/solid_queue/pull/142 .

downsplat•3w ago

That sounds like the worst of possible worlds! At $WORK we're also on mysql, but I don't know what I would do without engine-specific queries. For one, on complex JOINs, mysql sometimes gets the query plan spectacularly wrong, and even if it doesn't now, you can't be sure it won't in the future. So for many important queries I put the tables in the intended order and add a STRAIGHT_JOIN to future-proof it and skip query planner complexity.

brightball•3w ago

Agreed. good_job is the ideal approach to a PG backed queue.

robertlagrant•3w ago

> their policy is not to accept RDMS engine specific queries

Why? Is it so they can switch in future?

cl0ckt0wer•3w ago

Then they don't have to troubleshoot advanced queries.

NARKOZ•3w ago

Earlier Rails avoided database specific features so apps could stay portable using only ActiveRecord. Since then Rails has added much better PostgreSQL support: JSON/JSONB, hstore, array columns, GIN/GiST indexes.

chasd00•3w ago

If you’re tied so tight to MySQL that you’re labeled a “MySQL shop” then it seems logical to use MySQL specific features. I must be missing something.

jrochkind1•3w ago

It's reasonable for basecamp, but the complaint of GP is that basecamp controls what is the Rails standard/default solution intended to be useful for multiple rdbms, without being willing to put rdbms-specific logic in rdbms-specific adapters.

kid64•3w ago

Ooh. That's a dealbreaker, ladies!

perfmode•3w ago

I thought I was the only one who remembers this one.

jrochkind1•3w ago

Can you be more specific about the issues you have run into that make you advise GoodJob over SolidQueue?

I am (and have been for a while, not in a hurry) considering them each as a move off resque.

The main blocker for me with GoodJob is that it uses certain pg-specific features in a way that makes it incompatible with transaction mode in pgbouncer -- that is, it requires persistent sessions. Which is annoying, and is done to get some upper-end performance improvements that I don't think matter for my or most scales. Otherwise, I much prefer GoodJob's development model, trust the maintainer's judgement more, find the code more readable, etc. -- but that's a big But for me.

bdcravens•3w ago

The first one that jumps out at me when I've evaluated it are batches (a Sidekiq Pro feature, though there are some Sidekiq plugins that support the same)

jrochkind1•3w ago

Ah neat, I didn't realize GoodJob had batches, great.

lta•3w ago

I have no opinion whatsoever yet on SolidQueue, but I'm having a blast with GoodJob. Stuff works pretty well.

film42•3w ago

I made the switch on a new project and I don't regret it, but it's still early-days software despite the marketing. Concurrency control is fantastic, but it doesn't always work. I've woken up to see all threads occupied with a job that should have a concurrency of 1.

I've also run into issues where a db connection pool is filled up and solid queue silently fails. No error or logs, just stops polling forever until manual restart. Far from ideal.

But, I can live with it. I am going for minimal maintenance, and the ability to run solid queue under puma inside rails on cloud run is just so easy. Having ~3 solid queue related issues a year is acceptable for my use case, but that doesn't mean it will be ok for others.

cortesoft•3w ago

Don't you think the officially supported Rails modules should work with all the RDBMS engines that Rails supports? What would a MySQL-based Rails app use if the officially supported module didn't support it?

riffraff•3w ago

I think the suggestion is that one can have rdbms-specific optimizations while still keeping a standards-compliant base implementation.

Both MySQL and Postgresql could get their own optimizations.

cortesoft•3w ago

Oh, weird, they won't even allow functionally equivalent optimizations? That seems silly.

I was responding to Rails not officially supporting good_job, though, which appears to be a Postgres-only tool.

imtringued•3w ago

In the SQL world, even simple things like booleans are RDBMS-engine-specific, so I have no idea how that is supposed to work.

rajaravivarma_r•3w ago

The one use case where a DB backed queue will fail for sure is when the payload is large. For example, you queue a large JSON payload to be picked up by a worker and process it, then the DB writing overhead itself makes a background worker useless.

I've benchmarked Redis (Sidekiq), Postgres (using GoodJob) and SQLite (SolidQueue); Redis beats everything else for the above use case.

SolidQueue backed by SQLite may be good when you are just passing around primary keys. I still wonder if you can have a lot of workers polling from the same database and update the queue with the job status. I've done something similar in the past using SQLite for some personal work and it is easy to hit the wall even with 10 or so workers.

Manfred•3w ago

In my experience you want job parameters to be one, maybe two ids. Do you have a real world example where that is not the case?

embedding-shape•3w ago

I'm guessing that means you're adding indirection for what you're actually processing? So I guess the counter-case would be when you don't want or need that indirection.

If I understand correctly, instead of doing:

- Create job with payload (maybe big) > Put in queue > Let worker take from queue > Done

You're suggesting:

- Create job with ID of payload (stored elsewhere) > Put in queue > Let worker take from queue, then resolve ID to the data needed for processing > Done

Is that more or less what you mean? I can definitively see use cases for both; it heavily depends on the situation, but more indirection isn't always better, nor are big payloads always OK.

azuanrb•3w ago

If we take webhook for example.

- Persist payload in db > Queue with id > Process via worker.

Pushing the payload directly to the queue can be tricky. Any queue system will usually have limits on the payload size, for good reasons. Plus, if you've already committed it to the db, you can guarantee the data is not lost and can be processed again however you want later. But if your queue is having issues, or the enqueue fails, you might lose it forever.
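The persist-then-enqueue-id flow above (often called the claim-check pattern), sketched with SQLite as the app database and a plain list standing in for the queue; all table and function names are hypothetical. If enqueueing fails after the insert, the payload is still on disk and can be found and re-enqueued later:

```python
from __future__ import annotations

import json
import sqlite3

# Assumptions: SQLite stands in for the app database, and a plain list stands
# in for the job queue. Both tables and helpers are hypothetical.
conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit
conn.execute(
    "CREATE TABLE webhook_payloads ("
    " id INTEGER PRIMARY KEY, body TEXT NOT NULL,"
    " processed INTEGER NOT NULL DEFAULT 0)"
)
queue: list[int] = []  # the queue only ever carries ids


def receive_webhook(body: dict) -> int:
    # 1. Persist the raw payload first; it survives even if enqueueing fails.
    cur = conn.execute(
        "INSERT INTO webhook_payloads (body) VALUES (?)", (json.dumps(body),)
    )
    # 2. Enqueue only the id (the "claim check").
    queue.append(cur.lastrowid)
    return cur.lastrowid


def work_one() -> dict:
    payload_id = queue.pop(0)
    (raw,) = conn.execute(
        "SELECT body FROM webhook_payloads WHERE id = ?", (payload_id,)
    ).fetchone()
    conn.execute("UPDATE webhook_payloads SET processed = 1 WHERE id = ?", (payload_id,))
    return json.loads(raw)


def unprocessed_ids() -> list[int]:
    # Anything persisted but never processed can be re-enqueued later.
    return [row[0] for row in conn.execute(
        "SELECT id FROM webhook_payloads WHERE processed = 0 ORDER BY id")]


first_id = receive_webhook({"event": "invoice.paid", "amount": 42})
result = work_one()
```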

andersonklando•3w ago

> Push the payload directly to queue can be tricky. Any queue system usually will have limits on the payload size, for good reasons.

Is that how microservice messages work? They push the whole data so the other systems can consume it and take it from there?

Manfred•3w ago

A microservice architecture would probably use a message bus because they would also need to broadcast the result.

pas•3w ago

yes and no, as the sibling comment mentions sometimes a message bus is used (Kafka, for example), but Netflix is (was?) all-in with HTTP (low-latency gRPC, HTTP/3, wrapped in nice type-safe SDK packages)

but ideally you don't break the glass and reach for a microservices architecture if you don't need the scalability afforded by very deep decoupling

which means ideally you have separate databases (and DB schema and even likely different kind of data store), and through the magic of having minimally overlapping "bounded contexts" you don't need a lot of data to be sent over (the client SDK will pick what it needs for example)

... of course serving a content recommendation request (which results in a cascade of requests that go to various microservices, eg. profile, rights management data, CDN availability, and metadata for the results, image URLs, etc) for a Netflix user doesn't need durability, so no Kafka (or other message bus), but when the user changes their profile it might be something that gets "broadcasted"

(and durable "replayable" queues help, because then services can be put to read-only mode to serve traffic, while new instances are starting up, and they will catch up. and of course it's useful for debugging too, at least compared to HTTP logs, which usually don't have the body/payload logged.)

Manfred•3w ago

> I can definitively see use cases for both

Me too, I was just wondering if you have any real world examples of a project with a large payload.

pas•3w ago

...well, that's good for scaling the queue, but this means the worker needs to load all relevant state/context from some DB (which might be sped up with a cache, but then things are getting really complex)

ideally you pass the context that's required for the job (let's say it's less than 100 KB), and I don't think that counts as large JSON, but request rate (load) can make even 512 bytes too much, therefore "it depends"

but in general passing around large JSONs on the network/memory is not really slow compared to writing them to a DB (WAL + fsync + MVCC management)

rajaravivarma_r•2w ago

I have been doing this for at least a decade now and it is a great pattern, but think of an ETL pipeline where you fetch a huge JSON payload, store it in the database, and then transform it and load it into another model. I had a use case where I wanted to process the JSON payload and pass it down the pipeline before storing it in the useful model. I didn't want to store the intermediate JSON anywhere. I benchmarked it for this specific use case.

zihotki•3w ago

Using Redis to store large queue payloads is usually a bad practice. Redis memory is finite.

dzonga•3w ago

this!! 100%.

pass around IDs

touisteur•3w ago

Interesting, as a self-contained minimalistic setup.

Shouldn't one be using a storage system such as S3/garage with ephemeral settings and/or clean-up triggers after the job ends? I get the appeal of using one system for everything, but won't you need a storage system anyway for other parts of your system?

Have you written up your benchmarks anywhere, including where the cutoffs are (payload size / throughput / latency)?

ddorian43•3w ago

> Redis beats everything else for the above usecase.

Reminds me of Antirez's blog post saying that when Redis is configured for durability it becomes about as slow as, or slower than, PostgreSQL: http://oldblog.antirez.com/post/redis-persistence-demystifie...

epolanski•3w ago

There have been 6 major releases and countless improvements to Redis since then, so I don't think we can say whether it's still relevant.

Also, for a decade Antirez has been very opinionated about not comparing or benchmarking Redis against other DBs.

rajaravivarma_r•2w ago

Maybe, but over 6 years of using Redis with a bare-minimum setup I have never lost any data, and my use case happens to be queuing intermediate results, so durability isn't an issue.

michaelbuckbee•3w ago

FWIW, Sidekiq docs strongly suggest only passing around primary keys or identifiers for jobs.

stock_toaster•3w ago

> The one use case where a DB backed queue will fail for sure is when the payload is large. For example, you queue a large JSON payload to be picked up by a worker and process it, then the DB writing overhead itself makes a background worker useless.

redis would suffer from the same issue. Possibly even more severely due to being memory constrained?

I'd probably just stuff the "large data" in s3 or something like that, and just include the reference/location of the data in the actual job itself, if it was big enough to cause problems.
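The "put the large data in S3 and enqueue only a reference" approach is commonly called the claim-check pattern. A minimal Python sketch, assuming an in-memory dict in place of S3 and a plain list in place of the job queue; `BlobStore`, `enqueue`, `process_next`, and the 64 KB cutoff are hypothetical names and values chosen only for illustration:

```python
import json
import uuid


class BlobStore:
    """Hypothetical in-memory stand-in for an object store such as S3."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = f"job-payloads/{uuid.uuid4()}"
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def delete(self, key: str) -> None:
        self._blobs.pop(key, None)


INLINE_LIMIT = 64 * 1024  # assumed cutoff; tune it for your own queue


def enqueue(queue: list, store: BlobStore, payload: dict) -> None:
    body = json.dumps(payload).encode()
    if len(body) <= INLINE_LIMIT:
        # Small payload: keep it inline in the job itself.
        queue.append({"inline": body.decode()})
    else:
        # Large payload: store it out of band, enqueue only the reference.
        queue.append({"ref": store.put(body)})


def process_next(queue: list, store: BlobStore) -> dict:
    job = queue.pop(0)
    if "inline" in job:
        return json.loads(job["inline"])
    payload = json.loads(store.get(job["ref"]))
    # Simplification: a real worker would delete only after the job succeeds.
    store.delete(job["ref"])
    return payload
```

The cutoff is the knob being debated here: small payloads stay inline, while large ones cost an extra object-store round trip but keep the queue's own storage (Postgres rows or Redis memory) small.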

ckbkr10•3w ago

Comparing Redis to SQL is kinda off topic. Sure you can replace the one with the other but then we are talking about completely different concepts aren't we?

When all we are talking about is "good enough" the bar is set at a whole different level.

croes•3w ago

Maybe Redis is just overkill

touisteur•3w ago

I wish you'd have expanded on that. I almost always learn about some interesting lower-level tech through people trying to avoid a full-featured heavy-for-their-use-case tool or system.

stavros•3w ago

You're in luck, the article speaks about that at length!

touisteur•3w ago

Sorry, I went full typical HN commenter stereotype :-)

stavros•3w ago

I do it all the time too.

zihotki•3w ago

We're talking about business challenges/features which can be solved by using either of the solutions and analyzing pros/cons. It's not like Redis is bad, but sometimes it's an over-engineered solution and too costly

michaelbuckbee•3w ago

I wrote this article about migrating from Redis to SQLite for a particular scenario and the tradeoffs involved.

To be clear, I think the most important thing is understanding the performance characteristics of each technology enough that you can make good choices for your particular scenario.

https://wafris.org/blog/rearchitecting-for-sqlite

hahahahhaah•3w ago

Well they move from one thing not designed for queues to another not designed for queues. Maybe use a queue!

KolmogorovComp•3w ago

> Job latency under 1ms is critical to your business. This is a real and pressing concern for real-time bidding, high frequency trading (HFT), and other applications in the same ilk.

From TFA. Are there really people using Rails for HFT?

speed_spread•3w ago

The trading engine certainly won't run on Rails, but the web UI used to monitor and control trades might.

adamors•3w ago

Of course not, and the company whose blog we're reading isn't doing anything similar either https://www.simplethread.com/case-studies/ Rather funny IMO

steviee•3w ago

Wearing my Ruby T-shirt (ok, Rubyconf.TH, but you get the gist) while reading this, I fully approve of and appreciate your post! It totally resonates with my current project setups and my attempts to keep them as simple as possible.

Especially when building new and unproven applications, I'm always looking for things that trade the time I need to set things up properly for the time I need to BUILD THE ACTUAL PRODUCT. That's why I really like the recent changes to the Ruby on Rails ecosystem.

What we need is a larger user base setting everything up and discovering edge-cases and (!) writing about it (AND notifying the people around Rails). The more experience and knowledge there is, the better the tooling becomes. The happy path needs to become as broad as a road!

Like Kamal, at first only used by 37signals and now used by them and me. :D At least, of course.

Kudos!

Best, Steviee

EugeneOZ•3w ago

The chapter "The True Cost of Redis" surprised me.

> Deploy, version, patch, and monitor the server software

And with PostgreSQL you don't need it?

> Configure a persistence strategy. Do you choose RDB snapshots, AOF logs, or both?

It's a one-time decision. You don't need to do it daily.

> Sustain network connectivity, including firewall rules, between Rails and Redis

And for a PostgreSQL DB you don't need it?

> Authenticate your Redis clients

And your PostgreSQL works without that?

> Build and care for a high availability (HA) Redis cluster

If you want a cluster of PostgreSQL databases, perhaps you will do that too.

downsplat•3w ago

I guess the point is that you're already doing it for Postgres. You already need persistent storage for your app, and the same engine can handle your queuing needs.

heartbreak•3w ago

Exactly, if you’re already doing it for Postgres and Postgres can do the job well enough to meet your requirements, you’re only adding more cost and complexity by deploying Redis too.

madethemcry•3w ago

DHH also famously described why and how they are leaving the cloud: https://world.hey.com/dhh/why-we-re-leaving-the-cloud-654b47...

I'm not a fanboy of DHH, but I really like his critical thinking about the status quo. I'm not able to leave the cloud, or, to phrase it better, it's too comfortable right now. I really wanted to leave Redis behind, as it's mostly a hidden part of Rails and nothing I use directly, but I often have to pay for it "in the cloud".

I quickly hit an issue with the family of Solid features: the documentation doesn't really cover the case "inside your existing application" (at least when I looked into it shortly after Rails 8 was released). Being in the cloud (render.com, fly.io and friends), I had to create multiple DBs, one for each Solid feature. That was not acceptable, as you usually pay per service/DB, not per usage - similar to how you have to pay for Redis.

This was a great motivation to research the cloud space once again, and then I found Railway. You pay per usage. So right now I have multiple DBs, one for each Solid feature, and on top of that multiple environments multiplying those DBs, and I pay cents for that part of the app while it's not really filled. Of course in this setup I would also pay cents for Redis, but it's still good to see a less complex landscape in my deployment environment.

Long story short, while trying to integrate SolidQueue myself I found Railway. Deployments are fun again with it! Maybe that helps someone today as well.

downsplat•3w ago

Not a ruby shop here so it's not directly comparable, but I'm very happy with beanstalkd as a minimalistic job queue. We're on mysql for historical reasons, and it didn't support SKIP LOCKED at the time, so we had to add another tool.

allknowingfrog•3w ago

I pulled beanstalkd into a legacy PHP/MySQL application several years back and was very pleased with it. It's probably not the right choice for a modern Rails application, but if you don't already have a framework, it's a straightforward solution to drop in.

speleding•3w ago

We've been storing jobs in the DB long before SolidQueue appeared. One major advantage is that we can snapshot the state of the system (or one customer account) to our dev environment and get to see it exactly as it is in production.

We still keep rate limiters in Redis, though; it would be pretty easy for some scanner to overload the DB if every rogue request needed a round trip to the DB before being processed. Because we only store ephemeral data in Redis, it doesn't need backups.

skywhopper•3w ago

Redis is fundamentally the wrong storage system for a job queue when you have an RDBMS handy. This is not new information. You still might want to split the job queue onto its own DB server when things start getting busy, though.

For caching, though, I wouldn’t drop Redis so fast. As an in-memory cache, the ops overhead of running Redis is a lot lower. You can even ignore HA for most use cases.

Source: I helped design and run a multi-tiered Redis caching architecture for a Rails-based SaaS serving millions of daily users, coordinating shared data across hundreds of database clusters and thousands of app servers across a dozen AWS regions, with separate per-host, per-cluster, per-region, and global cache layers.

We used Postgres for the job queues, though. Entirely separate from the primary app DBs.

otterley•3w ago

> Redis is fundamentally the wrong storage system for a job queue when you have an RDBMS handy

One could go one step further and say an RDBMS is fundamentally the wrong storage system for a job queue when you have a persistent, purpose-built message queue handy.

Honestly, for most people, I'd recommend they just use their cloud provider's native message queue offering. On AWS, SQS is cheap, reliable, easy to start with, and gives you plenty of room to grow. GCP PubSub and Azure Storage Queues are probably similar in these regards.

Unless managing queues is your business, I wouldn't make it your problem. Hand that undifferentiated heavy lifting off.

ecshafer•3w ago

Rails shops seem not to like using SQS/PubSub/Kafka/RabbitMQ for some reason. They seem to really like worker-task systems like Sidekiq or SolidQueue. Compare this with Java, C#, or Python shops, which all seem much more likely to use a separate message queue and have that handle the job queue.

otterley•3w ago

I've also noticed that they conflate the notion of workers, queues, and message busses. A worker handles asynchronous tasks, but the means by which they communicate might be best served by either a queue or a message bus, depending on the specific needs. Tight coupling might be good for knocking out PoCs quickly, but once you have production-grade needs, the model begins to show its weaknesses.

skunkworker•3w ago

Rails shops running on normal CRuby have difficulty effectively scaling out multithreading due to the GVL. It's much easier to "scale" Ruby by forking (with Sidekiq or multiple processes) and having it consume data from a Redis list. It is possible to get around the GVL using JRuby, but that poses a different set of constraints and issues.

There is some definite blending of async messaging in the Ruby world though. I've seen connectors which take protobufs on a kafka topic and use sidekiq to fan out the work. With Redis (looking at sidekiq specifically) it becomes trivial to maintain the "current" working set with items popped out of the queue, with atomic commands like BLMOVE (formerly BRPOPLPUSH).
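A minimal sketch of that "working set" pattern, simulated in Python without a Redis server: two deques stand in for a pending list and a processing list. `claim` mimics `LMOVE pending processing RIGHT LEFT` (the non-blocking form of `BLMOVE`): pop the oldest item and record it as in flight in one step. `ack` mimics `LREM processing 1 item`. In Redis the move is atomic on the server; here that atomicity is simply assumed, and the class and method names are hypothetical.

```python
from collections import deque


class ReliableQueue:
    """Single-process simulation of the Redis 'reliable queue' pattern."""

    def __init__(self):
        self.pending = deque()     # stands in for the Redis list "pending"
        self.processing = deque()  # stands in for the Redis list "processing"

    def push(self, item):
        # Like LPUSH pending item: new work is added on the left.
        self.pending.appendleft(item)

    def claim(self):
        # Like LMOVE pending processing RIGHT LEFT: take the oldest item
        # and record it as in flight in the same step.
        if not self.pending:
            return None
        item = self.pending.pop()
        self.processing.appendleft(item)
        return item

    def ack(self, item):
        # Like LREM processing 1 item: the job is done, drop it from the working set.
        self.processing.remove(item)

    def requeue_stalled(self):
        # Simplification: requeues everything in flight. A real supervisor would
        # only requeue items whose worker is known to be dead (e.g. timed out).
        # Extending on the right keeps the oldest stalled item first in line.
        self.pending.extend(self.processing)
        self.processing.clear()
```

Because a claimed item always sits in `processing` until it is acknowledged, a worker crash leaves it recoverable rather than lost.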

Kafka is taking an interesting turn, however, with the KIP-932 "Queues for Kafka" initiative. I personally believe it could eat RabbitMQ's lunch if done effectively. It allows for multiple consumers and a "working set" of unacked data, without having to worry as much about the topic partition count.

otterley•3w ago

> Rails shops running on normal CRuby, have difficult in effectively scaling out multithreading due to the GVL lock. It's much easier to "scale" ruby using forking with sidekiq or multi process, and to have it consume data from a Redis list.

This isn't cloud-native at all. In a cloud-native world, these workers would be running in hosted functions (e.g. Lambda) and be consuming from a work queue. I assume this is possible in Rails, but the startup overhead might be considerable.

patwolf•3w ago

I've been looking at DBOS for queuing and other scheduling tasks in a nodejs app. However, it only works with Postgres, and that means I can't use it in web or mobile with sqlite. I like that SolidQueue works with multiple databases. Too bad it needs rails.

ivolimmen•3w ago

Exactly what https://www.amazingcto.com/postgres-for-everything/ says: keep it simple and use PostgreSQL.

antisthenes•3w ago

Isn't Redis just a lot less relevant these days since enterprise NVME storage is so ridiculously fast?

How much latency could you really be saving versus introducing complexity?

But I am not a storage/backend engineer, so maybe I don't understand the target use of Redis.

aynyc•3w ago

You'd be amazed at what the new breed of engineers are using Redis for. I've personally seen an entire backend database built on Redis with RDB+AOF on. If you redis-cli into the server, you can't understand anything, because you need to know the schema to make sense of it all.

SomeUserName432•3w ago

> But I am not a storage/backend engineer, so maybe I don't understand the target use of Redis.

We use it to broadcast messages across horizontally scaled services.

Works fine, probably a better tool out there for the job with better delivery guarantees, but the decision was taken many years ago, and no point in changing something that just works.

It's also language agnostic, which really helps.

We use ElastiCache (Valkey, I suppose), so most of the article's points are moot for our use.

Were we to implement it from scratch today, we might look for better delivery guarantees, or we might just use what we already know works.

everforward•3w ago

Redis still has a niche. For something like a job queue, SQL is probably fine because adding a few ms of latency isn't a big deal. For something like rate limiting, where each layer of microservice/monolith component has its own rate limit, that can really add up. It's not unheard of for a call to hit 10 downstreams, and a 10ms difference for each is 100ms of latency at the top of the waterfall.

Redis also scales horizontally much, much easier because of the lack of relational schemas. Keys can be owned by a node without any consensus within the cluster beyond which node owns the key. Distributed SQL needs consensus around things like "does the record this foreign key references exist?", which also has to take into account other updates occurring simultaneously.
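For reference, Redis Cluster implements exactly this kind of key ownership: every key maps to one of 16384 hash slots via CRC16 (the XMODEM variant) of the key modulo 16384, and each node owns a set of slots. If a key contains a non-empty `{...}` hash tag, only the tagged part is hashed, so related keys can be forced onto the same node. A minimal Python sketch of that mapping; the function names and the three-node slot layout are illustrative assumptions:

```python
import bisect


def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """Map a key to one of 16384 slots, honoring {hash tags}."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag: hash only what is inside the braces
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


# Assumed 3-node layout: each node owns a contiguous slot range (inclusive ends).
SLOT_RANGE_ENDS = [5460, 10922, 16383]
NODES = ["node-a", "node-b", "node-c"]


def node_for_key(key: str) -> str:
    # Routing needs only the slot table, no cross-node coordination per key.
    return NODES[bisect.bisect_left(SLOT_RANGE_ENDS, hash_slot(key))]
```

Nothing here touches other keys or nodes, which is the contrast with distributed SQL, where enforcing a foreign key can require agreement between nodes.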

It's why you see something like Redis caching DB queries pretty often. It's way, way easier to make your Redis cluster 100x as fast than it is to make your DB 100x as fast. I think it's also cheaper in terms of hardware, but I haven't done much beyond napkin math to validate that.

yandrypozo•3w ago

I love the idea of PG for everything, but every time I suggest it I get the same answer "When you're a hammer, everything looks like a nail" which makes sense to me, but not sure how to give a good answer to that phrase :(

solid_fuel•3w ago

I mean, that's just a truism - it's not really engineering advice. Maybe Postgres is just a hammer, but when you're building a house there's a lot of nails.

If you've got to store 5 GB videos, maybe reach for object store instead of postgres. But for most uses postgres is a solid choice.

jrochkind1•3w ago

Would be more useful as a report back with the switch a couple months behind, than as a "This is what I'm going to do"!

efields•3w ago

SolidQueue is great. Rails 8 is great. Monoliths are great. Most of the time.

azuanrb•3w ago

Sharing my experience. I experimented with SolidQueue for my side project. My conclusion for production usage was:

- No reason to switch to SolidQueue or GoodJob if you have no issue with Sidekiq. Only do it if you want to remove the Redis infra; there are no other big benefits imo.
- For new projects, I might be more biased towards GoodJob. It's more mature, has a great community, and has more features.
- One thing I don't like about SolidQueue is the lack of a solid UI. Compared to GoodJob or Sidekiq, it's pretty basic. When I tried it last time, the main page would hang due to unoptimized indexes. That only happens once your data reaches a certain threshold. Might have been fixed, though.

Another consideration with using an RDBMS instead of Redis is that you might now need to allocate a proper connection pool, depending on your database setup. It's nothing big, but it's one additional "cost" that you never really had to consider when using Redis.

Glyptodon•3w ago

Ignoring how it works, there are a solid handful of great features you get out of the box with Solid + Active Job that don't exist with just Sidekiq, even through the Active Job adapter.

wolttam•3w ago

I'm not sure how similar they are internally (I suspect: quite), but I use Django-Q2's database broker to similar effect. More simple = better!

dynamicentropy•3w ago

Django is slowly catching up with Rails by adding support for a unified task interface in Django 6.0, though it is less feature-rich than Rails' ActiveJob.

There are already a few implementations, and the reference one (django-tasks) even has a database-backed task backend that also uses FOR UPDATE SKIP LOCKED to control concurrency. With django-tasks and a few extra packages you can already get quite far compared to what Solid Queue provides, except maybe for features like concurrency controls and using a separate database for the queues.

I really enjoyed learning about the internals of Solid Queue, to the point that I decided to port it to Django [1]. It provides all of Solid Queue's features, except for retries on errors which is something that IMHO should be provided by the Django task interface, like Active Job does.

[1]: https://github.com/knifecake/steady-queue

pm90•3w ago

One other concern: if you ever have to deploy in another cloud, there are all kinds of issues with authentication and version support. e.g. azure doesn’t support the latest redis version, GCP MemoryStore forbids password only login for Redis Clusters etc. The infrastructure complexity can be high (albeit manageable).

vjerancrnjak•3w ago

Not sure how that helps. They mention SKIP LOCKED but then show a job with 15 minute duration.

How will you hold an open transaction for 15 minutes without seriously compromising the performance of the database?

Allowing people to do this easily will just result in an antipattern with horrible performance and reliability once the network starts randomly ending transactions. I'm pretty sure that, just as Python can't tell that its connection to the DB was closed, neither can Rails.

Once people add transaction-pinning proxies and try to actually get the most performance out of the DB, these kinds of locking mechanisms that require a long-running open transaction start falling apart.

Edit: I must have misunderstood and it is a lease.

