frontpage.

Azure Outage

267•kierenj•1h ago•118 comments

Keep Android Open

http://keepandroidopen.org/
1923•LorenDB•13h ago•582 comments

Tailscale Peer Relays

https://tailscale.com/blog/peer-relays-beta
37•seemaze•53m ago•11 comments

Cursor Composer: Building a fast frontier model with RL

https://cursor.com/blog/composer
71•leerob•1h ago•39 comments

I made a 10¢ MCU Talk

https://www.atomic14.com/2025/10/29/CH32V003-talking
95•iamflimflam1•3h ago•31 comments

Does brand advertising work? Upwave (YC S12) is hiring engineers to answer that

https://www.upwave.com/job/8228849002/
1•ckelly•14m ago

Beyond RaspberryPi: What are all the other SoC vendors up to *summarised*

https://sbcwiki.com/news/articles/state-of-embedded-q4-25/
57•HeyMeco•4d ago•25 comments

Collins Aerospace: Sending text messages to the cockpit with test:test

https://www.ccc.de/en/disclosure/collins-aerospace-mit-test-test-textnachrichten-bis-ins-cockpit-...
44•hacka22•2h ago•13 comments

Floss Before Brushing

https://alearningaday.blog/2025/10/29/floss-before-brushing/
15•imasl42•53m ago•6 comments

From VS Code to Helix

https://ergaster.org/posts/2025/10/29-vscode-to-helix/
149•todsacerdoti•3h ago•81 comments

Eye prosthesis is the first to restore sight lost to macular degeneration

https://med.stanford.edu/news/all-news/2025/10/eye-prosthesis.html
104•gmays•1w ago•10 comments

Azure major outage: Portal, Front Door and global regions down

56•sech8420•1h ago•33 comments

Recreating a Homebrew Game System from 1987

https://alex-j-lowry.github.io/z80tvg.html
43•voxadam•3h ago•1 comment

Who needs Graphviz when you can build it yourself?

https://spidermonkey.dev/blog/2025/10/28/iongraph-web.html
373•pdubroy•11h ago•69 comments

AWS to bare metal two years later: Answering your questions about leaving AWS

https://oneuptime.com/blog/post/2025-10-29-aws-to-bare-metal-two-years-later/view
414•ndhandala•6h ago•312 comments

Show HN: HUD-like live annotation and sketching app for macOS

https://draw.wrobele.com/
25•tomaszsobota•2h ago•7 comments

Hosting SQLite Databases on GitHub Pages (2021)

https://phiresky.github.io/blog/2021/hosting-sqlite-databases-on-github-pages/
16•WA9ACE•1h ago•6 comments

ChatGPT's Atlas: The Browser That's Anti-Web

https://www.anildash.com//2025/10/22/atlas-anti-web-browser/
646•AndrewDucker•4d ago•268 comments

Tips for stroke-surviving software engineers

https://blog.j11y.io/2025-10-29_stroke_tips_for_engineers/
402•padolsey•13h ago•144 comments

Tell HN: Twilio support replies with hallucinated features

48•haute_cuisine•1h ago•8 comments

Kafka is Fast – I'll use Postgres

https://topicpartition.io/blog/postgres-pubsub-queue-benchmarks
167•enether•3h ago•164 comments

uBlock Origin Lite Apple App Store

https://apps.apple.com/in/app/ublock-origin-lite/id6745342698
329•mumber_typhoon•13h ago•157 comments

The end of the rip-off economy: consumers use LLMs against information asymmetry

https://www.economist.com/finance-and-economics/2025/10/27/the-end-of-the-rip-off-economy
118•scythe•1h ago•98 comments

Show HN: Learn German with Games

https://www.learngermanwithgames.com/
59•predictand•5h ago•29 comments

AirTips – Alternative to Bento.me/Linktree

https://a.coffee/
10•Airyisland•1h ago•7 comments

Oracle has adopted BOOLEAN in 23ai and PostgreSQL had it forever

https://hexacluster.ai/blog/postgresql/oracles-adoption-of-native-boolean-data-type-vs-postgresql/
13•avi_vallarapu•2h ago•9 comments

SpiderMonkey Garbage Collector

https://firefox-source-docs.mozilla.org/js/gc.html
67•sebg•8h ago•3 comments

Berkeley Out-of-Order RISC-V Processor (Boom) (2020)

https://docs.boom-core.org/en/latest/sections/intro-overview/boom.html
28•Bogdanp•4h ago•9 comments

Minecraft removing obfuscation in Java Edition

https://www.minecraft.net/en-us/article/removing-obfuscation-in-java-edition
9•SteveHawk27•1h ago•1 comment

AOL to be sold to Bending Spoons for roughly $1.5B

https://www.axios.com/2025/10/29/aol-bending-spoons-deal
19•jmsflknr•46m ago•4 comments

Kafka is Fast – I'll use Postgres

https://topicpartition.io/blog/postgres-pubsub-queue-benchmarks
167•enether•3h ago

Comments

qsort•2h ago
I feel so seen lol. I work in data engineering and the first paragraph is me all the time. There are a lot of cool technologies (timeseries databases, vector databases, stuff like Synapse on Azure, "lakehouses" etc.) but they are mostly for edge cases.

I'm not saying they're useless, but if I see something like that lying around, it's more likely that someone put it there based on vibes rather than an actual engineering need. Postgres is good enough for OpenAI, chances are it's good enough for you.

zer00eyz•2h ago
> Should You Use Postgres? Most of the time - yes. You should always default to Postgres until the constraints prove you wrong.

Kafka, GraphQL... These are the two technologies where my first question is always this: Does the person who championed/led this project still work here?

The answer is almost always "no, they got a new job after we launched".

Resume Architecture is a real thing. Meanwhile the people left behind have to deal with a monster...

kvdveer•2h ago
To be fair, this is true for all technologically interesting solutions, even when they use postgres. People championing novel solutions typically leave after the window for creativity has closed.
darkstar_16•2h ago
GraphQL sure, but I'm not sure I'd put kafka in the same bucket. It is a nice technology that has its uses in some cases where postgresql would not work. It is also something a small team should not start with. Start with postgres and then move on to something else when the need arises.
forgetfulness•2h ago
We’re all passing through our jobs, the value of the solutions remains in the hands of the shareholders, if you don’t try to squeeze some long-term value for your resume and long-term employability, you’re assuming a significant opportunity cost on their behalf

They’ll be fine if you made something that works, even if it was a bit faddish, make sure you take care of yourself along the way (they won’t)

candiddevmike•2h ago
Attitudes like this are why management treats developers like children who constantly need to be kept on task, IMO.
forgetfulness•1h ago
Software is a line of work that has astounding amounts of autonomy, if you compare it to working in almost anything else.

My point stands, company loyalty tallies up to very little when you’re looking for your next job; no interviewer will care much to hear of how you stood firm, and ignored the siren song of tech and practices that were more modern than the one you were handed down (the tech and practices they’re hiring for).

The moment that reverses, I will start advising people not to skill up, as it will look bad in their resumes.

janwijbrand•2h ago
"resume" as in "resumé" not as in "begin again or continue after a pause or interruption" - it took me longer than I care to admit to get that.
Groxx•2h ago
is there some reason GraphQL gets so much hate? it always feels to me like it's mostly just a normal RPC system but with some incredibly useful features (pipelining, and it's super easy to not request data you don't need). having never hosted a GraphQL service, I can still see plenty of obvious room for problems: obvious perf issues in code, and obvious room for perf abuse because it's easy to allow callers to do N+1 nonsense.

so I can see why it's not popular to get stuck with for public APIs unless you have infinite money, it's relatively wide open for abuse, but private seems pretty useful because you can just smack the people abusing it. or is it more due to specific frameworks being frustrating, or stuff like costly parsing and serialization and difficult validation?

twodave•1h ago
As someone who works with GraphQL daily, many of the criticisms out there are from before the times of persisted queries, query cost limits, and composite schemas. It’s a very mature and useful technology. I agree with it maybe being less suitable for a public API, but less because of possible abuse and more because simple HTTP is a lot more widely known. It depends on the context, as in all things, of course.
Groxx•1h ago
yeah, I took one look at it and said "great, so add some cost tracking and kill requests before they exceed it" because like. obviously. it's similar to exposing a SQL endpoint: you need to build for that up front or the obvious results will happen.

which I fully understand is more work than "it's super easy just X" which it gets presented as, but that's always the cost of super flexible things. does graphql (or the ecosystem, as that's part of daily life of using it) make that substantially worse somehow? because I've dealt with people using protobuf to avoid graphql, then trying to reimplement parts of its features, and the resulting API is always an utter abomination.

marcosdumay•1h ago
Take a look on how to implement access control over GraphQL requests. It's useless for anything that isn't public data (at least public for your entire network).

And yes, you don't want to use it for public APIs. But if you have private APIs that are so complex that you need a query language, and still want use those over web services, you are very likely doing something really wrong.

Groxx•54m ago
I'm honestly not seeing much here that isn't identical to almost all other general purpose RPC systems: https://graphql.org/learn/authorization/

"check that the user matches the data they're requesting by comparing the context and request field by hand" is ultra common - there are some real benefits to having authorization baked into the language, but it seems very rare in practice (which is part of why it's often flawed, but following the overwhelming standard is hardly graphql's mistake imo). I'd personally think capabilities are a better model for this, but that seems likely pretty easy to chain along via headers?

bencyoung•1h ago
Kafka is great tech, never sure why people have an issue with it. Would I use it all the time? No, but where it's useful, it's really useful, and opens up whole patterns that are hard to implement other ways
evantbyrne•1h ago
Managed hosting is expensive to operate and self-managing kafka is a job in and of itself. At my last employer they were spending six figures to run three low-volume clusters before I did some work to get them off some enterprise features, which halved the cost, but it was still at least 5x the cost of running a mainstream queue. Don't use kafka if you just need queuing.
CuriouslyC•1h ago
I always push people to start with NATS jetstream unless I 100% know they won't be able to live without Kafka features. It's performant and low ops.
bencyoung•1h ago
Cheapest MSK cluster is $100 a month and can easily run a dev/uat cluster with thousands of messages a second. They go up from there but we've made a lot of use of these and they are pretty useful
singron•36m ago
I've basically never had a problem with MSK brokers. The issue has usually been "why are we rebalancing?" and "why aren't we consuming?", i.e. client problems.
evantbyrne•36m ago
It's not the dev box with zero integrations/storage that's expensive. AWS was quoting us similar numbers for MSK. Part of the issue is that modern kafka has become synonymous with Confluent, and once you buy into those features, it is very difficult to go back. If you're already on AWS and just need queuing, start with SQS.
j45•1h ago
Engaging difficulty is a form of procrastination, and in some cases of avoiding shipping a product.

Instead of having just one unknown thing between us and launch, let's pick as many new-to-us things as possible; that will surely increase the chances of success.

bonesss•1h ago
Kafka also provides early architectural scaffolding for multiple teams to build in parallel with predictable outcomes (in addition to the categorical answers to hard/error-prone patterns). It's been adopted in principle by services on all the major cloud providers, and is offered turn-key by them.

Personally I'd expect some kind of internal interface to abstract away and develop reusable components for such an external dependency, which readily enables having relational data stores mirroring the broker's functionality. Handy for testing and some specific local scenarios, and those database-backed stores can easily pull from the main cluster(s) later to mirror data as needed.

sitestable•1h ago
The best architecture decision is the one that's still maintainable when the person who championed it leaves. Always pretend the person who maintains a project after you knows where you live and all that.
jjice•2h ago
This is a well written addition to the list of articles I need to reference on occasion to keep myself from using something new.

Postgres really is a startup's best friend most of the time. I'm building a new product that's going to deal with a good bit of reporting, and I began to look at OLAP DBs for it, but hesitated to leave PG. This kind of seals it for me (and of course the reference to the classic "Just Use Postgres for Everything" post helps) that I should Just Use Postgres (R).

On top of being easy to host and already being familiar with it, the resources out there for something like PG are near endless. Plus the team working on it is doing constant good work to make it even more impressive.

j45•1h ago
It’s totally reasonable to start with fewer technologies to do more and then outgrow them.
cpursley•2h ago
Related: https://www.pgflow.dev

It's built on pgmq and not married to supabase (nearly everything is in the database).

Postgres is enough.

agentultra•2h ago
You have to be careful with the approach of using Postgres for everything. The way it locks tables and rows and the serialization levels it guarantees are not immediately obvious to a lot of folks and can become a serious bottleneck for performance-sensitive workloads.

I've been a happy Postgres user for several decades. Postgres can do a lot! But like anything, don't rely on maxims to do your engineering for you.

sneilan1•2h ago
Yes, performance can be a big issue with postgres. And vertical scaling can really put a damper on things when you have a major traffic hit. Using it in place of kafka misunderstands one of the great uses of kafka, which is to help deal with traffic bursts. All of a sudden your postgres server is overwhelmed where the kafka server would be fine.
zenmac•36m ago
>And vertical scaling can really put a damper on things when you have a major traffic hit.

Wouldn't OrioleDB solve that issue though?

sneilan1•15m ago
Not familiar with OrioleDB. I’ll look it up. May I ask how this helps? Just curious.
fukka42•1h ago
My strategy is to use postgres first. Get the idea off the ground and switch when postgres becomes the bottleneck.

It often doesn't.

jorge-d•1h ago
Definitely, this is also one of the directions Rails is heading[1]: provide a basic setup most people can use out of the box. And if needed you can always plug in more "mature" solutions afterwards.

[1] https://rubyonrails.org/2024/11/7/rails-8-no-paas-required

fud101•1h ago
When someone says just use Postgres, are they using the same instance for their data as well for the queue?
j45•1h ago
It can be a different database in the same server or a separate server.

When you're only doing hundreds or thousands of transactions to begin with, it doesn't really have much impact out of the gate.

Of course there will be someone who will pull out something that won’t work but such examples can likely be found for anything.

We don’t need to fear simplification, it is easy to complicate later when the actual complexities reveal themselves.
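
One simple way to keep that flexibility (a sketch with hypothetical names, not from the article): give the queue its own schema inside the same instance, so it can later be moved to a dedicated server without touching application tables.

    -- Queue lives in its own schema; application tables stay in "public".
    CREATE SCHEMA mq;

    CREATE TABLE mq.jobs (
        id         bigserial   PRIMARY KEY,
        payload    jsonb       NOT NULL,
        created_at timestamptz NOT NULL DEFAULT now()
    );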

marcosdumay•1h ago
When people say "just use postgres" it's because their immediate need is so low that this doesn't matter.

And the thing is, a server from 10 years ago running postgres (with a backup) is enough for most applications to handle thousands of simultaneous users. Without even going into the kinds of optimization you are talking about. Adding ops complexity for the sake of scale on the exploratory phase of a product is a really bad idea when there's an alternative out there that can carry you until you have fit some market. (And for some markets, that's enough forever.)

victorbjorklund•25m ago
Yes, I often use PG for queues on the same instance. Most of the time you don't see any negative effects. For a new project with barely any users it doesn't matter.
j45•1h ago
100%

Postgres isn’t meant to be a guaranteed permanent replacement.

It's a common starting point for a simpler stack which can retain a greater degree of flexibility out of the box, and increased velocity.

Starting with Postgres lets the bottlenecks reveal themselves, and then optimize from there.

Maybe a tweak to Postgres or resources, or consider a jump to Kafka.

SoftTalker•1h ago
This is true of any data storage. You have to understand the concurrency model and assumptions, and know where bottlenecks can happen. Even among relational databases there are significant differences.
guywithahat•2h ago
> One camp chases buzzwords

> ...

> The other camp chases common sense

I don't really like these simplifications. Like one group obviously isn't just dumb, they're doing things for reasons you maybe don't understand. I don't know enough about data science to make a call, but I'm guessing there were reasons to use Kafka due to current hardware limits or scalability concerns, and while the issues may not be as present today that doesn't mean they used Kafka just because they heard a new word and wanted to repeat it.

sumtechguy•2h ago
Kafka and other message systems like it have their uses. But sometimes all you need is a database. Once you start doing realtime streaming and notifications and event-type things, a messaging system is good. You can even back it up with a boring database. Would I start with kafka? Probably not. I would start with a boring database, and then if bashing on the db over and over saying 'have you changed?' doesn't work as well anymore, you put in a messaging system.
temporallobe•1h ago
Agree with this sentiment - it’s easy to be judgmental about these things, but project-level issues and decisions can be very complicated and engineers often have little to no visibility into them.

We’re using Kafka for a gigantic pipeline where IMO any reasonably modern database would suffice (and may even be superior), but our performance requirements are unclear. At some point in the distant future, we may have a significant surge in data quantity and speed, requiring greater throughput and (de)serialization speed, but I am not convinced that Kafka ultimately helps us there. I imagine this is a case where the program leadership was sold a solution which we are now obligated to use.

This happens a LOT, and I have seen unnecessary and unused products cost companies millions over the years. For example, my team was doing analysis on replacing our existing Atlassian Data Center with other solutions, and in doing so, we discovered several underused/unused Atlassian plugins for which we are paying very high license fees. At some point, users over the years had requested some functionality for a specific workflow and the plugins were purchased. The people and projects went away or otherwise processes became OBE, but the plugins happily hummed along while the bills were paid.
sneilan1•2h ago
I'm starting to like mongodb a lot more given the python library mongomock. I find it wonderful to create tests that run my queries against mongo in code before I deploy them. Yes, mongo has a lot of quirks and you have to know aws networking to set it up with your vpc so you don't get nailed with egress costs. And it's not the same query patterns, some queries are harder, and you have to maintain your own schemas. But the ability to test mongo code with mongomock w/o having to run your own mongo server is SO VALUABLE.

And yes, there are edge cases with mongomock not supporting something, but the library is open source and pretty easy to modify. And it fails loudly, which is super helpful. So if something is not supported you'll know. Maybe you might find a real nasty feature that's hard to implement, but then just use a repository pattern like you would for testing postgres code in your application.

https://github.com/mongomock/mongomock Extrapolating from my personal usage of this library to others, I'm starting to think that mongodb's 25 billion dollar valuation is partially based on this open source package :)

pphysch•2h ago
Or just use devcontainers and have an actual Postgres DB to test against? I've even done this on a Chromebook. This is a solved problem.
sneilan1•2h ago
True but then my tests take longer to run. I really like having very fast tests. And then my tests have to make local network calls to a postgres server. I like my tests isolated.
pphysch•24m ago
They are isolated, your devcontainer config can live in your source repo. And you're not gonna see significant latency from your loopback interface... If your test suite includes billions of queries you may want to reassess.
candiddevmike•2h ago
Curious why you think the risk of edge cases from mocking is a worthwhile trade off vs the relatively low complexity of setting up a container to test against?
sneilan1•1h ago
Because I can read the mongomock library and understand exactly what it's doing. And mongo's aggregation pipelines are easier to model than sql queries in code. Sure, it's possible to run into an edge case but for a lot of general queries for filtering & aggregation, it's just fine.
philipallstar•1h ago
You can also do this with sqlite, running an in-memory sqlite is lightning fast and I don't think there are any edge cases. Obviously doesn't work for everything, but when sqlite is possible, it's great!
sneilan1•1h ago
True but if you wind up using parts of postgres that aren't supported by sqlite then it's harder to use sqlite. I agree however, if I was able to just use sqlite, I would do that instead. But I'm using a lot of postgres extensions & fields that don't have direct mappings to sqlite.

Otherwise SQLITE :)

j45•1h ago
That might work for some.

I prefer not to start with a nosql database and then undertake odysseys to make it into a relational database.

honkostani•2h ago
Resume-driven design is running into the desert of Moore's plateau, which punishes the use of ever more useless abstractions. Its practitioners get quieter, because their projects keep dying after the revolutionary tech is introduced and they jump ship.
jimbokun•2h ago
For me the killer feature of Kafka was the ability to set the offset independently for each consumer.

In my company most of our topics need to be consumed by more than one application/team, so this feature is a must have. Also, the ability to move the offset backwards or forwards programmatically has been a life saver many times.

Does Postgres support this functionality for their queues?

Jupe•1h ago
Isn't it just a matter of having each consumer use their own offset? I mean if the queue table is sequentially or time-indexed, the consumer just provides a smaller/earlier key to accomplish the offset? (Maybe I'm missing something here?)
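
A minimal sketch of what Jupe describes, with hypothetical table names (an events log plus one offset row per consumer group); moving an offset backwards to replay, as jimbokun wants, is just an UPDATE:

    CREATE TABLE consumer_offsets (
        group_name text   PRIMARY KEY,
        last_id    bigint NOT NULL DEFAULT 0
    );

    -- Each group reads from its own position:
    SELECT e.*
    FROM events e
    JOIN consumer_offsets o ON o.group_name = 'billing'
    WHERE e.id > o.last_id
    ORDER BY e.id
    LIMIT 100;

    -- Commit the new position (or set it lower to replay):
    UPDATE consumer_offsets SET last_id = 4200 WHERE group_name = 'billing';
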
altcognito•1h ago
Correct, offsets and sharding aren't magic. And partitions in Kafka are user defined, just like they would be for postgresql.
altcognito•1h ago
The article basically states unless you need a lot of throughput, you probably don't need Kafka. (my interpretation extends to say) You probably don't need offsets because you don't need multi-threaded support because you don't need multiple threads.

I don't know what kind of native support PG has for queue management, the assumption here is that a basic "kill the task as you see it" is usually good enough and the simplicity of writing and running a script far outweighs the development, infrastructure and devops costs of Kafka.

But obviously, whether you need stuff to happen in 15 seconds instead of 5 minutes, or 5 minutes instead of an hour is a business decision, along with understanding the growth pattern of the workload you happen to have.
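
For reference, the usual Postgres work-queue idiom (a generic sketch, not the article's schema) fits in a single query: FOR UPDATE SKIP LOCKED lets many workers poll one table without blocking each other or claiming the same row twice.

    -- Claim up to 10 unprocessed jobs atomically; concurrent workers
    -- skip rows that are already locked instead of waiting on them.
    UPDATE jobs
    SET processed_at = now()
    WHERE id IN (
        SELECT id FROM jobs
        WHERE processed_at IS NULL
        ORDER BY id
        FOR UPDATE SKIP LOCKED
        LIMIT 10
    )
    RETURNING id, payload;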

j45•1h ago
PG has several queue management extensions and I’m working my way through trying them out.

Here is one: https://pgmq.github.io/pgmq/

Some others: https://github.com/dhamaniasad/awesome-postgres

Most of my professional life I have considered Postgres folks to be pretty smart… while I by chance happened to go with MySQL and it became the rdbms I thought in by default.

Heavily learning about Postgres recently has been okay, not much different than learning the tweaks for MSSQL, Oracle or others. Just have to be willing to slow down a little for a bit and enjoy it instead of expecting to rush through everything.
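
In case it helps, pgmq's SQL API looks roughly like this (signatures from memory, so check the docs linked above; the queue name and payload are made up):

    SELECT pgmq.create('tasks');
    SELECT pgmq.send('tasks', '{"action": "resize", "id": 42}');

    -- Read one message; it stays invisible to other readers for 30s:
    SELECT * FROM pgmq.read('tasks', 30, 1);

    -- Ack by deleting (or pgmq.archive to keep a copy):
    SELECT pgmq.delete('tasks', 1);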

johnyzee•2h ago
Seems like you would at the very least need a fairly thick application layer on top of Postgres to make it look and act like a messaging system. At that point, seems like you have just built another messaging system.

Unless you're a five man shop where everybody just agrees to use that one table, make sure to manage transactions right, cron job retention, YOLO clustering, etc. etc.

Performance is probably last on the list of reasons to choose Kafka over Postgres.

j45•1h ago
You expose an API on top of Postgres, much like any other group of developers would, and call it a day.

There are several queue implementations available, which increases the chance of finding what one is after: https://github.com/dhamaniasad/awesome-postgres

odie5533•1h ago
How fast is failover?
vbezhenar•1h ago
How do you implement "unique monotonically-increasing offset number"?

Naive approach with a sequence (or serial type, which uses a sequence automatically) does not work. Transaction one gets number "123", transaction two gets number "124". Transaction two commits; now the table contains rows "122" and "124", and readers can start to process it. Then transaction one commits with its "123", but readers are already past "124". And transaction one might never commit for various reasons (e.g. the client just got power cut), so just waiting for "123" forever does not cut it.

Notifications can help with this approach, but then you can't restart old readers (and you don't need monotonic numbers at all).

theK•1h ago
> unique monotonically-increasing offset number

Isn't it a bit of a white whale, the idea that a UMION (unique monotonically-increasing offset number) can solve all one's subscriber problems? AFAIK even with Kafka this isn't completely watertight.

sigseg1v•1h ago
What about a `DEFERRABLE INITIALLY DEFERRED` trigger that increments a sequence only on commit?
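
A sketch of that suggestion (hypothetical names; note it narrows the race vbezhenar describes rather than eliminating it, since the trigger draws the sequence value at commit time but commits can still become visible in a slightly different order):

    CREATE SEQUENCE log_offset_seq;

    CREATE FUNCTION assign_offset() RETURNS trigger AS $$
    BEGIN
        -- Runs at COMMIT because the trigger is deferred, so offsets
        -- are assigned in (approximate) commit order, not insert order.
        UPDATE log SET commit_offset = nextval('log_offset_seq')
        WHERE id = NEW.id;
        RETURN NULL;
    END $$ LANGUAGE plpgsql;

    CREATE CONSTRAINT TRIGGER log_offset_trg
        AFTER INSERT ON log
        DEFERRABLE INITIALLY DEFERRED
        FOR EACH ROW EXECUTE FUNCTION assign_offset();
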
singron•1h ago
The log_counter table tracks this. It's true that a naive solution using sequences does not work for exactly the reason you say.
xnorswap•1h ago
It's a tricky problem, I'd recommend reading DDIA, it covers this extensively:

https://www.oreilly.com/library/view/designing-data-intensiv...

You can generate distributed monotonic number sequences with a Lamport Clock.

https://en.wikipedia.org/wiki/Lamport_timestamp

The wikipedia entry doesn't describe it as well as that book does.

It's not the end of the puzzle for distributed systems, but it gets you a long way there.

See also Vector clocks. https://en.wikipedia.org/wiki/Vector_clock

Edit: I've found these slides, which are a good primer for solving the issue, page 70 onwards "logical time":

https://ia904606.us.archive.org/32/items/distributed-systems...

ownagefool•1h ago
The camps are wrong.

There are two poles:

1. Folks constantly adopting the new tech, whatever the motivation, and 2. "I learned a thing and shall never learn anything else, ever."

Of course nobody actually sits at either pole, but the closer you are to one, the less pragmatic you are likely to be.

wosined•1h ago
I am the third pole: 3. Everything we have currently sucks and what is new will suck for some hitherto unknown reason.
ownagefool•1h ago
Heh, me too.

I think it's still just 2 poles. However, I probably shouldn't have ascribed motivation to the latter pole, as I purposely did not with the former.

Pole 2 is simply never adopt anything new ever, for whatever the motivation.

binarymax•1h ago
This is it right here. My foil is the Elasticsearch replacement because PG has inverted indices. The ergonomics and tunability of these in PG are terrible compared to ES. Yes, it will search, but I wouldn’t want to be involved in constructing or maintaining that search.
uberduper•1h ago
Has this person actually benchmarked kafka? The results they get with their 96 vcpu setup could be achieved with kafka on the 4 vcpu setup. Their results with PG are absurdly slow.

If you don't need what kafka offers, don't use it. But don't pretend you're on to something with your custom 5k msg/s PG setup.

loire280•1h ago
In fact, a properly-configured Kafka cluster on minimal hardware will saturate its network link before it hits CPU or disk bottlenecks.
j45•1h ago
But it can do so many processes a second I’ll be able to scale to the moon before I ever launch.
theK•1h ago
Isn't that true for everything on the cloud? I thought we are long into the era where your disk comes over the network there.
UltraSane•47m ago
A network link can be anything from 1Gbps to 800Gbps.
altcognito•27m ago
This doesn't even make sense. How do you know what the network links or the other bottlenecks are like? There are a grandiose number of assumptions being made here.
010101010101•1h ago
> If you don't need what kafka offers, don't use it.

This is literally the point the author is making.

uberduper•1h ago
It seems like their point was to criticize people for using new tech instead of hacking together unscalable solutions with their preferred database.
blenderob•1h ago
That wasn't their point. Instead of posting snarky comments, please review the site guidelines:

"Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize."

lenkite•15m ago
But honestly, isn't that the strongest plausible interpretation according to the "site guidelines" ? When one explicitly says that the one camp chases "buzzwords" and the other chases "common sense", how else are you supposed to interpret it ?
PeterCorless•1h ago
But in this case, it is like saying "You don't need a fuel truck. You can transport 9,000 gallons of gasoline between cities by gathering 9,000 1-gallon milk jugs and filling each, then getting 4,500 volunteers to each carry 2 gallons and walk the entire distance on foot."

In this case, you do just need a single fuel truck. That's what it was built for. Avoiding using a design-for-purpose tool to achieve the same result actually is wasteful. You don't need 288 cores to achieve 243,000 messages/second. You can do that kind of throughput with a Kafka-compatible service on a laptop.

[Disclosure: I work for Redpanda]

kragen•1h ago
Getting a 288-core machine might be easier than setting up Kafka; I'm guessing that it would be a couple of weeks of work to learn enough to install Kafka the first time. Installing Postgres is trivial.
brianmcc•1h ago
"Lots of the team knows Postgres really well, nobody knows Kafka at all yet" is also an underrated factor in making choices. "Kafka was the ideal technical choice but we screwed up the implementation through well-intentioned inexperience" being an all too plausible outcome.
freedomben•36m ago
Indeed, I've seen this happen first hand where there was really only one guy who really "knew" Kafka, and it was too big of a job for just him. In that case it was fine until he left the company, and then it became a massive albatross and a major pain point. In another case, the eng team didn't really have anyone who really "knew" Kafka but used a managed service thinking it would be fine. It was until it wasn't, and switching away is not a light lift, nor is mass educating the dev team.

Kafka et al definitely have their place, but I think most people would be much better off reaching for a simpler queue system (or for some things, just using Postgres) unless you really need the advanced features.

PeterCorless•1h ago
The only thing that might take "weeks" is procrastination. Presuming absolutely no background other than general data engineering, a decent beginner online course in Kafka (or Redpanda) will run about 1-2 hours.

You should be able to install within minutes.

kragen•50m ago
I mean, setting up Zookeeper, tweaking the kernel settings, configuring the hardware, the kind of stuff mentioned in guides like https://medium.com/@ankurrana/things-nobody-will-tell-you-se... and https://dungeonengineering.com/the-kafkaesque-nightmare-of-m.... Apparently you can do without Zookeeper now, but that's another choice to make, possibly doing careful experiments with both choices to see what's better. Much more discussion in https://news.ycombinator.com/item?id=37036291.

None of this applies to Redpanda.

PeterCorless•44m ago
True. Redpanda does not use Zookeeper.

Yet to also be fair to the Kafka folks, Zookeeper is no longer default and hasn't been since April 2025 with the release of Apache Kafka 4.0:

"Kafka 4.0's completed transition to KRaft eliminates ZooKeeper (KIP-500), making clusters easier to operate at any scale."

Source: https://developer.confluent.io/newsletter/introducing-apache...

ilkhan4•1h ago
I'll push the metaphor a bit: I think the point is that if you have a fleet of vehicles you want to fuel, go ahead and get a fuel truck and bite off on that expense. However, if you only have 1 or 2, a couple of jerry cans you probably already have + a pickup truck is probably sufficient.
blenderob•1h ago
>> If you don't need what kafka offers, don't use it.

> This is literally the point the author is making.

Exactly! I just don't understand why HN invariably always tends to bubble up the most dismissive comments to the top that don't even engage with the actual subject matter of the article!

PeterCorless•1h ago
Exactly. Just yesterday someone posted how they can do 250k messages/second with Redpanda (Kafka-compatible implementation) on their laptop.

https://www.youtube.com/watch?v=7CdM1WcuoLc

Getting even less than that throughput on 3x c7i.24xlarge — a total of 288 vCPUs – is bafflingly wasteful.

Just because you can do something with Postgres doesn't mean you should.

> 1. One camp chases buzzwords.

> 2. The other camp chases common sense

In this case, is "Postgres" just being used as a buzzword?

[Disclosure: I work for Redpanda; we provide a Kafka-compatible service.]

j45•1h ago
Is it about what Kafka could give you, or what you need right now?

Kafka is a full on steaming solution.

Postgres isn’t a buzzword. It can be a capable placeholder until it’s outgrown. One can arrive at Kafka with a more informed run history from Postgres.

kitd•1h ago
> Kafka is a full on steaming solution.

Freudian slip? ;)

j45•1h ago
Haha, and a typo!
mxey•1h ago
Doesn’t Kafka/Redpanda have to fsync for every message?
kragen•1h ago
Definitely not in the case of Kafka. Even with SSD that would limit it to around 100kHz. Batch commit allows Kafka (and Postgres) to amortize fsync overhead over many messages.
PeterCorless•1h ago
Yes, for Redpanda. There's a blog about that:

"The use of fsync is essential for ensuring data consistency and durability in a replicated system. The post highlights the common misconception that replication alone can eliminate the need for fsync and demonstrates that the loss of unsynchronized data on a single node still can cause global data loss in a replicated non-Byzantine system."

However, for all that said, Redpanda is still blazingly fast.

https://www.redpanda.com/blog/why-fsync-is-needed-for-data-s...

uberduper•37m ago
I'm highly skeptical of the method employed to simulate unsync'd writes in that example. Using a non-clustered zookeeper and then just shutting it down, breaking the kafka controller and preventing any kafka cluster state management (not just preventing partition leader election) while manually corrupting the log file. Oof. Is it really _that_ hard to lose ack'd data from a kafka cluster that you had to go to such contrived and dubious lengths?
kasey_junk•27m ago
Kafka no longer has Zookeeper dependency and RedPanda never did (this is just an aside for those reading along, not a rebuttal).
uberduper•1h ago
I've never looked at redpanda, but kafka absolutely does not. Kafka uses mmapped files and the page cache to manage durable writes. You can configure it to fsync if you like.
mxey•1h ago
If I don’t actually want durable and consistent data, I could also turn off fsync in Postgres …
mrkeen•50m ago
The tradeoff here is that Kafka will still work perfectly if one of its instances goes down. (Or you take it down, for upgrades, etc.)

Can you lose one Postgres instance?

zozbot234•33m ago
AIUI Postgres has high-availability out of the box, so it's not a big deal to "lose" one as long as a secondary can take over.
UltraSane•48m ago
On enterprise grade storage writes go to NVRAM buffers before being flushed to persistent storage so this isn't much of a bottleneck.
kragen•1h ago
This sounded interesting to me, and it looks like the plan is to make Redpanda open-source at some point in the future, but there's no timeline: https://github.com/redpanda-data/redpanda/tree/dev/licenses
PeterCorless•1h ago
Correct. Redpanda is source-available.

When you have C++ code, the number of external folks who want to — and who can effectively, actively contribute to the code — drops considerably. Our "cousins in code," ScyllaDB last year announced they were moving to source-available because of the lack of OSS contributors:

> Moreover, we have been the single significant contributor of the source code. Our ecosystem tools have received a healthy amount of contributions, but not the core database. That makes sense. The ScyllaDB internal implementation is a C++, shard-per-core, future-promise code base that is extremely hard to understand and requires full-time devotion. Thus source-wise, in terms of the code, we operated as a full open-source-first project. However, in reality, we benefitted from this no more than as a source-available project.

Source: https://www.scylladb.com/2024/12/18/why-were-moving-to-a-sou...

People still want to get free utility of the source-available code. Less commonly they want be able to see the code to understand it and potentially troubleshoot it. Yet asking for active contribution is, for almost all, a bridge too far.

kragen•1h ago
Right, open source is generally of benefit to users, not to the author, and users do get some of that benefit from being able to see the source. I wouldn't want to look at it myself, though, for legal reasons.
zozbot234•58m ago
Note that prior to its license change ScyllaDB was using AGPL. This is a fully FLOSS license but may have been viewed nonetheless as somewhat unfriendly by potential outside contributors. The ScyllaDB license change was really more about not wanting to expend development effort on maintaining multiple versions of the code (AGPL licensed and fully proprietary), so they went for sort of a split-the-difference approach where the fully proprietary version was in turn made source-available.

(Notably, they're not arguing that open source reusers have been "unfair" to them and freeloaded on their effort, which was the key justification many others gave for relicensing their code under non-FLOSS terms.)

In case anyone here is looking for a fully-FLOSS contender that they may want to perhaps contribute to, there's the interesting project YugabyteDB https://github.com/yugabyte/yugabyte-db

cyphar•47m ago
I think AGPL/Proprietary license split and eventual move to proprietary is just a slightly less overt way of the same "freeloader" argument. The intention of the original license was to make the software unpalatable to enterprises unless you buy the proprietary license, and one "benefit" of the move (at least for the bean counters) is that it stops even AGPL-friendly enterprises from being able to use the software freely.

(Personally, I have no issues with the AGPL and Stallman originally suggested this model to Qt IIRC, so I don't really mind the original split, but that is the modern intent of the strategy.)

cyphar•52m ago
You are obviously free to choose to use a proprietary license, that's fine -- but the primary purpose of free licenses has very little to do with contributing code back upstream.

As a maintainer of several free software projects, there are lots of issues with how projects are structured and user expectations, but I struggle to see how proprietary licenses help with that issue (I can see -- though don't entirely buy -- the argument that they help with certain business models, but that's a completely different topic). To be honest, I have no interest in actively seeking out proprietary software, but I'm certainly in the minority on that one.

rplnt•9m ago
You can be open source and not take contributions. This argument doesn't make sense to me. Just stop doing the expensive part and keep the license as is.
kermatt•43m ago
To the issue of complexity, is Redpanda suitable as a "single node implementation" where a Kafka cluster is not needed due to data volume, but the Kafka message bus pattern is desired?

AKA "Medium Data" ?

joaohaas•1h ago
Had the same thoughts, weird it didn't include Kafka numbers.

Never used Kafka myself, but we extensively use Redis queues with some scripts to ensure persistency, and we hit throughputs much higher than those in equivalent prod machines.

Same for Redis pubsubs, but those are just standard non-persistent pubsubs, so maybe that gives it an upper edge.

roozbeh18•1h ago
Just checked my single node Kafka setup which currently handles 695.27k e/s (average daily) into elasticsearch without breaking a sweat. kafka has been the only stable thing in this whole setup.

zeek -> kafka -> logstash -> elastic

apetrov•15m ago
out of curiosity, what does your service do that it handles almost 700K events/sec?
ozgrakkurt•1h ago
This is why benchmarks should be hardware limit based IMO. Like I am maxing IOPS/throughput of this ssd or maxing out the network card etc.

CPU is more tricky but I’m sure it can be shown somehow

darth_avocado•1h ago
The 96 vcpu setup with 24xlarge instance costs about $20k/month on AWS before discounts. And one thing you don’t want in a pub sub system is a single instance taking all the read/writes. You can run a sizeable Kafka cluster for that kind of money in AWS.
jaimebuelta•1h ago
I may be reading a bit extra, but my main take on this is: "in your app, you probably already have PostgreSQL. You don't need to set up an extra piece of infrastructure to cover your extra use case, just reuse the tool you already have"

It's very common to keep adding more and more infra for use cases that, while they could technically be covered better with new stuff, can be served by already existing infrastructure, at least until you have proof that you need to grow it.

blenderob•1h ago
> Has this person actually benchmarked kafka?

Is anyone actually reading the full article, or just reacting to the first unimpressive numbers you can find and then jumping on the first dismissive comment you can find here?

Benchmarking Kafka isn't the point here. The author isn't claiming that Postgres outperforms Kafka. The argument is that Postgres can handle modest messaging workloads well enough for teams that don't want the operational complexity of running Kafka.

Yes, the throughput is astoundingly low for such a powerful CPU but that's precisely the point. Now you know how well or how bad Postgres performs on a beefy machine. You don't always need Kafka-level scale. The takeaway is that Postgres can be a practical choice if you already have it in place.

So rather than dismissing it over the first unimpressive number you find, maybe respond to that actual matter of TFA. Where's the line where Postgres stops being "good enough"? That'll be something nice to talk about.

adamtulinius•1h ago
The problem is benchmarking on the 96 vcpu server, because at that point the author seems to miss the point of Kafka. That's just a waste of money for that performance.
blenderob•1h ago
And if the OP hadn't done that, someone here would complain, why couldn't the OP use a larger CPU and test if Postgres performs better? Really, there is no way the OP can win here, can they?

I'm glad the OP benchmarked on the 96 vCPU server. So now I know how well Postgres performs on a large CPU. Not very well. But if the OP had done their benchmark on a low CPU, I wouldn't have learned this.

uberduper•16m ago
Then the author should have gone on to discuss not just the implementation they now have to maintain, but also all the client implementations they'll have to keep re-creating for their custom solution. Or they could talk about all the industry standard tools that work with kafka and not their custom implementation.

Or they could have not mentioned kafka at all and just demonstrated their pub/sub implementation with PG. They could have not tried to make it about the buzzword resume driven engineering people vs. common sense folks such as himself.

adamtulinius•1h ago
I remember doing 900k writes/s (non-replicated) already back on kafka 0.8 with a random physical server with an old fusionio drive (says something about how long ago this was :D).

It's a fair point that if you already have a pgsql setup, and only need a few messages here and there, then pg is fine. But yeah, the 96 vcpu setup is absurd.

ljm•46m ago
I wonder if OP could have got different results if they implemented a different schema as opposed to mimicking Kafka's setup with the partitions, consumer offsets, etc.

I might well be talking out of my arse but if you're going to implement pub/sub in Postgres, it'd be worth designing around its strengths and going back to basics on event sourcing.

rudderdev•1h ago
Discussion on the same topic "Postgres over Kafka" - https://news.ycombinator.com/item?id=44445841
dangoodmanUT•1h ago
96 cores to get 240MB/s is terrible. Redpanda can do this with like one or two cores
greenavocado•1h ago
Redpanda might be good (I don't know) but I threw up a little in my mouth when I opened their website and saw "Build the Agentic Data Plane"
umanwizard•1h ago
The marketing website of every data-related startup sounds like that now. I agree it’s dumb, but you can safely ignore it.
loftsy•1h ago
I am about to start a project. I know I want an event sourced architecture. That is, the system is designed around a queue, all actors push/pull into the queue. This article gives me some pause.

Performance isn't a big deal for me. I had assumed that Kafka would give me things like decoupling, retry, dead-lettering, logging, schema validation, schema versioning, exactly once processing.

I like Postgres, and obviously I can write a queue ontop of it, but it seems like quite a lot of effort?

j45•1h ago
It might look like a lot of effort, but if you follow a tutorial/YouTube video step by step you will be surprised.

It's mostly registering the Postgres database functions, which is a one-time step.

There are also pre-made Postgres extensions that already run the queue.

These days I would likely consider starting with self-hosted Supabase, which has Postgres ready to tweak.

mkozlows•1h ago
If what you want is a queue, Kafka might be overkill for your needs. It's a great tool, but it definitely has a lot of complexity relative to a straightforward queue system.
mrkeen•1h ago
Event-sourcing != queue.

Event-sourcing is when you buy something and get a receipt, you go stick it in a shoe-box for tax time.

A queue is you get given receipts, and you look at them in the correct order before throwing each one away.

singron•1h ago
Kafka also doesn't give you all those things. E.g. there is no automatic dead-lettering, so a consumer that throws an exception will endlessly retry and block all progress on that partition. Kafka only stores bytes, so schema is up to you. Exactly-once is good, but there are some caveats (you have to use kafka transactions, which are significantly different than normal operation, and any external system may observe at-least-once semantics instead). Similar exactly-once semantics would also be trivial in an RDBMS (i.e. produce and consume in same transaction).

If you plan on retaining your topics indefinitely, schema evolution can become painful since you can't update existing records. Changing the number of partitions in a topic is also painful, and choosing the number initially is a difficult choice. You might want to build your own infrastructure for rewriting a topic and directing new writes to the new topic without duplication.

Kafka isn't really a replacement for a database or anything high-level like a ledger. It's really a replicated log, which is a low-level primitive that will take significant work to build into something else.
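
The "trivial in an RDBMS" version singron mentions, as a sketch with hypothetical inbox/outbox tables: consuming one message and producing the next happen in a single transaction, so neither is observable without the other.

    BEGIN;
    WITH msg AS (
        -- Take exactly one message, skipping any a peer already holds.
        DELETE FROM inbox
        WHERE id = (
            SELECT id FROM inbox ORDER BY id
            FOR UPDATE SKIP LOCKED LIMIT 1
        )
        RETURNING payload
    )
    INSERT INTO outbox (payload)
    SELECT payload FROM msg;
    COMMIT;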

rileymichael•5m ago
If you need a durable log (which it sounds like you do if you're going with event sourcing) that has those features, i'd suggest apache pulsar. you effectively get streams with message queue semantics (per-message acks, retries, dlq, etc.) from one system. running it on your own is a bit of a beast though and there's really only one hosted provider in the game (streamnative)
CuriouslyC•1h ago
If you don't need all the bells and whistles of Kafka, NATS Jetstream is usually the way to go.
misja111•1h ago
> One camp chases buzzwords .. the other common sense

How is it common sense to try to re-implement Kafka in Postgres? You probably need something similar but simpler. Then implement that! But if you really need something like Kafka, then .. use Kafka!

IMO the author is now making the same mistake as some Kafka evangelists that try to implement a database in Kafka.

enether•38m ago
I was making an example of a pub-sub system. I'm most familiar with Kafka, so I drew parallels to it. I didn't actually implement everything Kafka offers - just two simple pub-sub-like queries.
shikhar•1h ago
Postgres is a way better fit than Kafka if you want a large number of durable streams. But a flexible OLTP database like PG is bound to require more resources and polling loops (not even long poll!) are not a great answer for following live updates.

Plug: If you need granular, durable streams in a serverless context, check out s2.dev
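
Worth noting that Postgres does have a push primitive for this: LISTEN/NOTIFY. A common hybrid (a sketch, not s2.dev's or the article's approach) is to treat notifications as a wake-up signal and keep the durable data in the table:

    -- Producer, in the same transaction as the INSERT
    -- (delivered to listeners only on commit):
    NOTIFY new_events;

    -- Consumer session: block on the notification instead of
    -- re-querying on a timer, then drain the table as usual.
    LISTEN new_events;

The notification itself isn't durable, which is why the table remains the source of truth.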

this_user•1h ago
The real two camps seem to be:

1) People constantly chasing the latest technology with no regard for whether it's appropriate for the situation.

2) People constantly trying to shoehorn their favourite technology into everything with no regard for whether it's appropriate for the situation.

j45•1h ago
Kafka is anything but new. It does get shoehorned too.

Postgres also has been around for a long time and a lot of people didn’t know all it can do which isn’t what we normally think about with a database.

Appropriateness is a nice way to look at it, as long as it's clear whether it's really about appropriateness or about personal preferences and interpretations, and being righteous towards others with them.

Customers rarely care about the backend or what it’s developed in, except maybe for developer products. It’s a great way to waste time though.

PeterCorless•1h ago
2) above is basically "Give a kid a hammer, and everything becomes a nail."

The third camp:

3) People who look at a task, then apply a tool appropriate for the task.

me551ah•1h ago
Imagine if historic humans had decided that only hammers were enough. That there was no need for specialized tools like scissors, chisels, axes, wrenches, shovels, and sickles, and that a hammer and fingers were enough.

Use the tool which is appropriate for the job; it is trivial to write code to use them with LLMs these days, this software is mature enough to rarely cause problems, and tools built for a purpose will always be more performant.

justinhj•1h ago
As engineers we should try to use the right tool for the job, which means thinking about the development team's strengths and weaknesses as well as differentiating factors your product should focus on. Often we are working in the cloud and it's much easier to use a queue or a log database service than manage a bunch of sql servers and custom logic. It can be more cost effective too once you factor in the development time and operational costs.

The fact that there is no common library that implements the author's strategy is a good sign that there is not much demand for this.

dzonga•1h ago
What's not spoken about in the above article?

ease of use. in Ruby, if I want to use Kafka I can use karafka. or Redis streams via the redis library. likewise if Kafka is too complex to run there's countless alternatives which work as well - hell, even 0mq with client libraries.

now with the Postgres version I have to write my own stuff, and I don't know where that's gonna lead me.

postgres is scalable, no one doubts that. but what people forget to mention is the ecosystem around certain tools.

j45•1h ago
I'm not sure where it says you have to write your own stuff; there seem to be a number of queue implementations with libraries.

https://github.com/dhamaniasad/awesome-postgres

There is at least a Python example here.

losvedir•1h ago
Maybe I missed it in the design here, but this pseudo-Kafka Postgres implementation doesn't really handle consumer groups very well. The great thing about Kafka consumer groups is it makes it easy to spread the load over several instances running your service. They'll all connect using the same group, and different partitions will be assigned to the different instances. As you scale up or down, the partition responsibilities will be updated accordingly.

You need some sort of server-side logic to manage that, and the consumer heartbeats, and generation tracking, to make sure that only the "correct" instances can actually commit the new offsets. Distributed systems are hard, and Kafka goes through a lot of trouble to ensure that you don't fail to process a message.

mrkeen•1h ago
Right, the author's worldview is that Kafka is resume-driven development, used by people "for speed" (even though they are only pushing 500KB/s).

Of course the implementation based off that is going to miss a bit.

jasonthorsness•1h ago
Using a single DBMS for many purposes because it is so flexible and “already there” from an operations perspective is something I’ve seen over and over again. It usually goes wrong eventually with one workload/use screwing up others but maybe that’s fine and a normal part of scaling?

I think a bigger issue is the DBMS themselves getting feature after feature and becoming bloated and unfocused. Add the thing to Postgres because it is convenient! At least Postgres has a decent plugin approach. But I think more use cases might be served by standalone products than by add-ons.

quaunaut•1h ago
It's a normal part of scaling because often bringing in the new technology introduces its own ways of causing the exact same problems. Often they're difficult to integrate into automated tests so folks mock them out, leading to issues. Or a configuration difference between prod/local introduces a problem.

Your DB on the other hand is usually a well-understood part of your system, and while scaling issues like that can cause problems, they're often fairly easy to predict- just unfortunate on timing. This means that while they'll disrupt, they're usually solved quickly, which you can't always say for additional systems.

munchbunny•1h ago
My general opinion, off the cuff, from having worked at both small (hundreds of events per hour) and large (trillions of events per hour) scales for these sorts of problems:

1. Do you really need a queue? (Alternative: periodic polling of a DB - see the sketch after this list.)

2. What's your event volume and can it fit on one node for the foreseeable future, or even serverless compute (if not too expensive)? (Alternative: lightweight single-process web service, or several instances, on one node.)

3. If it can't fit on one node, do you really need a distributed queue? (Alternative: good ol' load balancing and REST APIs, maybe with async and retry semantics.)

4. If you really do need a distributed queue, then you may as well use a distributed queue, such as Kafka. Even if you take on the complexity of managing a Kafka cluster, the programming and performance semantics are simpler to reason about than trying to shoehorn a distributed queue onto a SQL DB.
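To make option 1 concrete, here's a minimal polling sketch using FOR UPDATE SKIP LOCKED so concurrent pollers never grab the same row; the table, columns, and handler are all invented:

  # Sketch for option 1: periodic polling of a jobs table.
  # FOR UPDATE SKIP LOCKED lets several pollers run safely in parallel.
  import time
  import psycopg

  POLL_SQL = """
      DELETE FROM jobs
      WHERE id = (
          SELECT id FROM jobs
          ORDER BY id
          FOR UPDATE SKIP LOCKED
          LIMIT 1
      )
      RETURNING id, payload
  """

  def handle(job_id, payload):
      print("processing", job_id)  # placeholder for real work

  def poll_forever(dsn, interval=1.0):
      with psycopg.connect(dsn) as conn:
          while True:
              with conn.transaction():
                  row = conn.execute(POLL_SQL).fetchone()
              if row:
                  handle(*row)     # note: the job is gone if handle() crashes
              else:
                  time.sleep(interval)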

oulipo2•59m ago
I want to rewrite some of my setup, we're doing IoT, and I was planning on

MQTT -> Redpanda (for message logs and replay, etc) -> Postgres/Timescaledb (for data) + S3 (for archive)

(and possibly Flink/RisingWave/Arroyo somewhere in order to do some alerting/incrementally updated materialized views/ etc)

this seems "simple enough" (but I don't have any experience with Redpanda) but is indeed one more moving part compared to MQTT -> Postgres (as a queue) -> Postgres/Timescaledb + S3

Questions:

1. my "fear" would be that if I use the same Postgres for the queue and for my business database, the "message ingestion" part could block the "business" part sometimes (locks, etc)? Also perhaps when I want to update the schema of my database and not "stop" the inflow of messages, not sure if this would be easy?

2. also that since it would write messages in the queue and then delete them, there would be a lot of GC/Vacuuming to do, compared to my business database which is mostly append-only?

3. and if I split the "Postgres queue" from "Postgres database" as two different processes, of course I have "one less tech to learn", but I still have to get used to pgmq, integrate it, etc, is that really much easier than adding Redpanda?

4. I guess most Postgres queues are also "simple" and don't provide "fanout" to multiple consumers (eg I want to take one of my IoT messages, clean it up, store it in my timescaledb, and also archive it to S3, and also run an alert detector on it, etc) - one common workaround is sketched below

What would be the recommendation?
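On (4), the usual workaround is an append-only log table plus one offset row per consumer, so reads never delete anything. A rough sketch with an invented schema (iot_log(id, payload), consumer_offsets(consumer, last_id)):

  # Sketch: fanout by per-consumer offsets over one append-only log.
  # Each consumer (timescale writer, S3 archiver, alerter...) advances
  # independently; nothing is deleted on read. All names are invented.
  import psycopg

  FETCH = """
      SELECT l.id, l.payload
      FROM iot_log l
      JOIN consumer_offsets o ON o.consumer = %s
      WHERE l.id > o.last_id
      ORDER BY l.id
      LIMIT 100
  """

  def consume(conn, name, process):
      rows = conn.execute(FETCH, (name,)).fetchall()
      for _id, payload in rows:
          process(payload)
      if rows:
          conn.execute(
              "UPDATE consumer_offsets SET last_id = %s WHERE consumer = %s",
              (rows[-1][0], name),
          )

That's essentially what Kafka consumer offsets give you, minus partitioning, and old rows can be trimmed once every consumer has passed them.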

zozbot234•37m ago
> Also, when I want to update the schema of my database without "stopping" the inflow of messages - not sure if that would be easy?

Doesn't PostgreSQL have transactional schema updates as a key feature? AIUI you shouldn't see any data loss as a result of such changes. It's also common to use views to simplify the management of such updates.
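As a small illustration (names invented): a whole migration can run in one transaction, so it either fully applies or fully rolls back, and concurrent inserts just block on the lock rather than being lost.

  # Sketch: transactional DDL - both statements commit or roll back together.
  import psycopg

  with psycopg.connect("dbname=app") as conn:
      with conn.transaction():
          conn.execute("ALTER TABLE iot_log ADD COLUMN fw_version text")
          conn.execute("CREATE INDEX ON iot_log (fw_version)")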

singron•24m ago
Re (2) there is a lot of vacuuming, but the table is small, and it's usually very fast and productive.

You can run into issues with scheduled queues (e.g. "run this job in 5 minutes") since the tables get bigger, you need an index, and you create garbage in the index exactly at the point you are querying (jobs to run now). This is a spectacularly bad pattern for postgres at high volume.

Capricorn2481•40m ago
> If it can't fit on one node, do you really need a distributed queue? (Alternative: good ol' load balancing and REST APIs, maybe with async and retry semantics.)

That sounds distributed to me, even if it wires different tech together to make it happen. Is there something about load balancing REST requests to different DB nodes that is less complicated than Kafka?

ayongpm•1h ago
Just dropping this here casually:

  sup {
      position: relative;
      top: -0.4em;
      line-height: 0;
      vertical-align: baseline;
  }

Copenjin•52m ago
I'm not really convinced by the comment recommending NOTIFY over the (at least in theory) inferior polling. I'd expect the global queue, if it really is global, to be only a temporary staging area where notifications are collected before delivery, not a bottleneck. I never benchmarked this on PG or Oracle (which has a similar feature), but I expect that, depending on the polling frequency and the average volume of updates, either approach could come out ahead.

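For what it's worth, a common middle ground is to treat NOTIFY purely as a wake-up signal and then poll the table, so a burst of notifications collapses into one query. A sketch with psycopg; the channel name and drain function are invented:

  # Sketch: LISTEN/NOTIFY as a wake-up signal, not a data channel.
  import psycopg

  def drain_queue(conn):
      pass  # placeholder: poll with SKIP LOCKED until the table is empty

  listener = psycopg.connect("dbname=app", autocommit=True)  # dedicated LISTEN conn
  worker = psycopg.connect("dbname=app", autocommit=True)    # separate conn for queries
  listener.execute("LISTEN queue_wakeup")

  for _note in listener.notifies():  # blocks until a NOTIFY arrives
      drain_queue(worker)
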
sc68cal•47m ago
> Postgres doesn’t seem to have any popular libraries for pub-sub9 use cases, so I had to write my own.

Ok so instead of running Kafka, we're going to spend development cycles building our own?

enether•42m ago
It would be nice if a library got built for that, the way pgmq was for queues. Not sure what the demand is, but it feels like there may be a niche.

8cvor6j844qw_d6•46m ago
> Should You Use Postgres?

> Most of the time - yes. You should always default to Postgres until the constraints prove you wrong.

Interesting.

I've also been told by my seniors that I should go with PostgreSQL by default unless I have a good justification not to.

heyitsdaad•45m ago
If the only tool you know is a hammer, everything starts looking like a nail.

bleonard•42m ago
I am excited about the new Rails defaults where background jobs, caching, and websockets are all database-driven. For normal-sized projects that still need those things, it's a huge win in simplicity.

psadri•27m ago
A resource that would benefit the entire community is a set of ballpark figures for what kind of performance is "normal" given particular hardware + data volume. I know this is a hard problem because there is so much variation across workloads, but I think even order-of-magnitude ballparks would be useful. For example, it could say things like:

  task: msg queue
  software: kafka
  hardware: m7i.xlarge (4 vCPUs, 16 GiB memory)
  payload: 2 KB / msg
  possible performance: ### - #### msgs / second

etc…

So many times I've found myself wondering: is this thing behaving within an order of magnitude of a correctly-set-up version, so I can decide whether to leave it alone or spend more time on it?

phendrenad2•22m ago
Since everyone is offering what they think the "camps" should be, here's another perspective. There are two camps: (A) those who look at performance metrics ("96 cores to get 240MB/s is terrible") and assume that performance alone justifies overruling any other concern, and (B) those who look at all of the tradeoffs, including budget, maintenance, ease of use, etc.

You see this a lot in the tech world. "Why would you use Python, Python is slow" (objectively true, but does it matter for your high-value SaaS that gets 20 logins per day?)

wagwang•17m ago
Isn't listen/notify absurdly slow and lock-contentious?

ryandvm•9m ago
I think my only complaint about Kafka is the widespread misunderstanding that it is a suitable replacement for a work queue. I should not have to explain to an enterprise architect the distinction between a distributed work queue and an event streaming platform.

Sparkyte•4m ago
You can also use Redis as a queue if the data isn't so important that losing some of it would be a problem.
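A minimal sketch with redis-py (queue name and payload invented); the message is gone the moment BRPOP returns, which is exactly the best-effort tradeoff being described:

  # Sketch: Redis list as a best-effort queue. If the worker crashes
  # after BRPOP but before finishing the job, the message is lost.
  import redis

  r = redis.Redis()
  r.lpush("jobs", '{"task": "resize", "id": 123}')  # producer
  _key, payload = r.brpop("jobs")                   # consumer blocks until a job exists
  print(payload)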