ArkFlow: High-performance Rust stream processing engine

https://github.com/arkflow-rs/arkflow
170•klaussilveira•7mo ago

Comments

habobobo•7mo ago
Looks interesting, how does this compare to arroyo and vector.dev?
tormeh•7mo ago
Also curious about any comparison to Fluvio.
necubi•7mo ago
(I'm the creator of Arroyo)

I haven't dug deep into this project, so take this with a grain of salt.

ArkFlow is a "stateless" stream processor, like vector or benthos (now Redpanda Connect). These are great for routing data around your infrastructure while doing simple, stateless transformations on them. They tend to be easy to run and scale, and are programmed by manually constructing the graph of operations.

Arroyo (like Flink or RisingWave) is a "stateful" stream processor, which means it supports operations like windowed aggregations, joins, and incremental SQL view maintenance. Arroyo is programmed declaratively via SQL, which is automatically planned into a dataflow (graph) representation. The tradeoff is that state is hard to manage, and these systems are much harder to operate and scale (although we've done a lot of work with Arroyo to mitigate this!).

I wrote about the difference at length here: https://www.arroyo.dev/blog/stateful-stream-processing
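
To make the stateless/stateful distinction concrete, here is a minimal Python sketch. It is illustrative only and is not how ArkFlow, Vector, or Arroyo are implemented; the event shape (`ts`, `user`, `bytes`) and both function names are assumptions made up for the example. The first function handles each event independently, while the second has to remember running totals until a window closes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: {"ts": <int seconds>, "user": <str>, "bytes": <int>}
Event = Dict[str, object]


def stateless_transform(events: Iterable[Event]) -> Iterator[Event]:
    """Stateless: filter and enrich each event on its own; nothing is remembered."""
    for e in events:
        if e["bytes"] > 0:
            yield {**e, "kb": e["bytes"] / 1024}


def tumbling_window_sum(
    events: Iterable[Event], window_s: int
) -> Iterator[Tuple[int, str, int]]:
    """Stateful: keep per-user totals until the current window closes.

    Assumes events arrive in timestamp order (no late or out-of-order data).
    Yields (window_start, user, total_bytes).
    """
    current = None
    totals: Dict[str, int] = defaultdict(int)
    for e in events:
        window = e["ts"] // window_s
        if current is not None and window != current:
            # The window closed: emit its results and drop the state.
            for user, total in sorted(totals.items()):
                yield (current * window_s, user, total)
            totals.clear()
        current = window
        totals[e["user"]] += e["bytes"]
    if current is not None:
        # Flush the last, still-open window at end of input.
        for user, total in sorted(totals.items()):
            yield (current * window_s, user, total)
```

The `totals` dictionary is the "state" in question: in a real stateful engine it has to survive restarts and be repartitioned when scaling, which is where much of the operational difficulty comes from.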

fer•7mo ago
Previous discussion (46 days ago): https://news.ycombinator.com/item?id=43358682
shawabawa3•7mo ago
seems like a simplified equivalent of https://vector.dev/

a major difference seems to be converting things to arrow and using SQL instead of using a DSL (vrl)

sofixa•7mo ago
> seems like a simplified equivalent of https://vector.dev/

No? Vector is for observability, to get your metrics/logs, transform them if needed, and put them in the necessary backends. Transformation is optional, and for cases like downsampling or converting formats or adding metadata.

ArkFlow gets data from stuff like databases and message queues/brokers, transforms it, and puts it back in databases and message queues/brokers. Transformation looks like a pretty central use case.

Very different scenarios. It's like saying that a Renault Kangoo is a simplified equivalent of a BTR-80 because both have wheels, an engine, and space for stuff.

rockwotj•7mo ago
It's a Rust port of Redpanda Connect (Benthos), but with fewer connectors

https://github.com/redpanda-data/connect

necubi•7mo ago
Vector is often used for observability data (in part because it's now owned by Datadog) but it's not limited to that. It's a general purpose stateless stream processing engine, and can be used for any kind of events.
sofixa•7mo ago
Vector started for observability data only, and that's why they got bought by Datadog.
hoherd•7mo ago
Incidentally arkflow implements VRL https://github.com/arkflow-rs/arkflow/pull/273
muffa•7mo ago
Looks very similar to redpanda-connect/benthos
coreyoconnor•7mo ago
How do you educate people on stream processing? For pipeline-like systems, stream processing is essential IMO: backpressure, circuit breakers, etc. are critical for resilient systems. Yet I have a hard time building an engineering team that can use stream processing instead of just falling back on synchronous procedures, which are easier to understand but nearly always slower and more error-prone.
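For readers new to the term, backpressure can be sketched with nothing more than a bounded queue. This is a minimal illustration using Python's standard `queue` and `threading` modules, not a pattern taken from ArkFlow; `run_pipeline` and the doubling step are hypothetical stand-ins for real work.

```python
import queue
import threading
from typing import Iterable, List


def run_pipeline(items: Iterable[int], maxsize: int = 2) -> List[int]:
    """Single producer, single consumer, connected by a bounded queue."""
    q: queue.Queue = queue.Queue(maxsize=maxsize)
    results: List[int] = []
    sentinel = object()  # marks end of stream

    def producer() -> None:
        for item in items:
            # put() blocks while the queue is full. That blocking is the
            # backpressure: the producer cannot outrun the consumer by
            # more than `maxsize` items.
            q.put(item)
        q.put(sentinel)

    def consumer() -> None:
        while True:
            item = q.get()
            if item is sentinel:
                break
            results.append(item * 2)  # stand-in for real processing

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With an unbounded queue the same code would still run, but a slow consumer would let memory grow without limit, which is the failure mode backpressure is meant to prevent.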
serial_dev•7mo ago
It's important to consider whether it's even worth it.

I worked on stream processing, it was fun, but I also believe it was over-engineered and brittle. The customers also didn't want real-time data, they looked at the calculated values once a week, then made decisions based on that.

Then, I joined another company that somehow had money to pay 50-100 people, and they were using CSV, sh scripts, batch processing, and all that. It solved the clients' needs, and they didn't need to maintain a complicated architecture and the code that could have been difficult to reason about otherwise.

After I left, the first company (the one with the stream processing) was bought by a competitor at a fire-sale price. Some of the tech was relevant to them, but the stream processing stuff was immediately shut down. The acquiring company had just simple batch processing, and they were printing money in comparison.

If you think it's still worth going with stream processing, give your reasoning to the team, and most reasonable developers would learn it if they really believe it's a significantly better solution for the given problem.

Not to over-simplify, but if you can't convince 5 out of 10 people to learn to make their job better, it's either that the people are not up to the task, or you are wrong that stream processing would make a difference.

senderista•7mo ago
Yeah that reminds me of a startup I worked at that did real-time analytics for digital marketing campaigns. We went to all kinds of trouble to update dashboards with 5-minute latency, and real-time updates made for impressive sales demos, but I don't think we had a single customer that actually needed to make business decisions within 24 hours of looking at the data.
serial_dev•7mo ago
We were doing TV ads analytics by detecting ads on TV channels and checking web impact (among other things). The only thing is, most of these ads are deals made weeks or months in advance, so customers checked analytics about once before a renewal… so not sure it needed to be near real time…
wging•7mo ago
https://mcfunley.com/whom-the-gods-would-destroy-they-first-...
nemothekid•7mo ago
I agree. Unless the downstream data is going to be used to feed a system that makes automated decisions (e.g., HFT or ad buying), real-time analytics is rarely worth the cost. It's almost always easier and more robust to have high tail latencies for humans to consume, and as computers get faster and faster, that tail latency decreases.

Systems that needed complex streaming architectures in 2015 could probably be handled today with a fast disk and a large Postgres instance (or BigQuery).

porridgeraisin•7mo ago
Many successful ads feedback loops run at 15 minute granularities as well!
wwarner•7mo ago
personally i think streaming is quite a bit simpler. but as you point out, no one cares!
carefulfungi•7mo ago
Batch processing is just stream processing with a really big window ;-). More seriously, I find streaming windows are often the disconnect. Surprisingly often, users don't want windowed results. They want aggregation, filtering, uniqueness, ordering, and reporting over some batch. Or, they want to flexibly specify their window / partitioning / grouping for each reporting query. Modern OLAP systems are plenty fast enough to do that on the fly for most use cases - so even older streaming patterns like stream processing for real time stats in parallel with batch to an OLAP system aren't worth the complexity. Just query the DB and cache...
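A minimal sketch of the "just query the DB and cache" idea, using Python's built-in `sqlite3` as a stand-in for a real OLAP system. The table layout, column names, sample rows, and the `clicks_by` helper are all assumptions invented for illustration.

```python
import sqlite3
from functools import lru_cache

# In-memory database standing in for an OLAP store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (ts INTEGER, campaign TEXT, region TEXT, clicks INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [(1, "spring", "eu", 3), (2, "spring", "us", 5), (3, "summer", "eu", 7)],
)

# Column names cannot be bound as SQL parameters, so restrict them explicitly.
ALLOWED_GROUPINGS = {"campaign", "region"}


@lru_cache(maxsize=128)
def clicks_by(column: str, since_ts: int = 0) -> tuple:
    """Aggregate clicks by a caller-chosen grouping, computed at query time."""
    if column not in ALLOWED_GROUPINGS:
        raise ValueError(f"unsupported grouping: {column}")
    rows = conn.execute(
        f"SELECT {column}, SUM(clicks) FROM events "
        f"WHERE ts >= ? GROUP BY {column} ORDER BY {column}",
        (since_ts,),
    ).fetchall()
    return tuple(rows)
```

Each report picks its own grouping and time range when it runs, instead of having a window fixed in advance by a streaming job. The cache here never expires, so a real setup would need some invalidation or TTL once new rows arrive.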
timeinput•7mo ago
Fundamentally I think the question is what kind of streams are you processing?

My concept of stream processing is trying to process gigabits to gigabytes a second, and turn it into something much much smaller so that it's manageable to database and analyze. To my mind for 'stream processing' calling malloc is sometimes too expensive let alone using any of the technologies called out in this tech stack.

I understand back pressure, and circuit breakers, but they have to happen at the OS / process level (for my general work) -- a metric that auto scales a microservice worker after going through prometheus + an HPA or something like that ends up with too many inefficiencies to make things practical. A few threads on a single machine just work, but end up taking ages to engineer a 'cloud native' solution.

Once I'm down to a job a second or less (and each job takes more than a few seconds to run, to hide the framework's overhead), things like Airflow start to work and don't just fall flat. But at that point, are these expensive frameworks worth it? I'm only producing 1-1000 jobs a second.

Stream processing with these frameworks like Faust, Airflow, Kafka Streams etc, all just seem like brittle overkill once you start trying to actually deploy and use them. How do I tune the PostgreSQL database for Airflow? How do I manage my S3 life cycles to minimize cost?

A task queue + an HPA really feels more like the right kind of thing to me at that scale vs really caring too much about back pressure, etc when the data rate is 'low', but I've generally been told by colleagues to reach for more complicated stream processors that perform worse, are (IMO) harder to orchestrate, and (IMO) harder to manage and deploy.

jandrewrogers•7mo ago
There are both technical and organizational challenges created by stream processing. I like stream processing and have done a lot of work on high-performance stream engines but I am not blind to the practical issues.

Companies are organized around an operational tempo that reflects what their systems are capable of. Even if you replace one of their systems with a real-time or quasi-real-time stream processing architecture, nothing else in the organization operates with that low of a latency, including the people. It is a very heavy lift to even ask them to reorganize the way they do things.

A related issue is that stream processing systems still work poorly for some data models and often don’t scale well. Most implementations place narrow constraints on the properties of the data models and their statefulness. If you have a system sitting in the middle of your operational data model that requires logic which does not fit within those limitations then the whole exercise starts to break down. Despite its many downsides, batching generalizes much better and more easily than stream processing. This could be ameliorated with better stream processing tech (as in, core data structures, algorithms, and architecture) but there hasn’t been much progress on that front.

jll29•7mo ago
Very interesting - is WARC support on the roadmap?
dayjah•7mo ago
Do you mean this: https://en.m.wikipedia.org/wiki/WARC_(file_format) ?

Can you help me understand how this would plug into stream processing? My immediate thought is for web page interaction replays — but that seems sort of exotic a use case?

gotoeleven•7mo ago
How do the creators of this plan to make money?
beanjuiceII•7mo ago
Get people on board as open source, then flip to some other license and add some pricing tiers, and now those users become customers even if they don't like it. Tried and true methodology.
amelius•7mo ago
You can always fork it
insane_dreamer•7mo ago
Does this include broker capabilities? If not, what's a recommended broker these days for hosting in the cloud, i.e., on an EC2 instance? (I know AWS has its own MQTT broker, but it's quite pricey for high volumes.)
xyst•7mo ago
So Kafka Connect and Kafka Streams, but in Rust?
chenquan•7mo ago
Hello, I am the founder of this project and I am very happy that a friend has shared it.

ArkFlow is positioned as a lightweight distributed stream processing engine that unifies stream and batch processing. With the help of DataFusion's huge ecosystem and ArkFlow's scalability, we hope to build a large data processing ecosystem that lowers the barrier to data processing for the community, because we believe that flowing data can generate greater value.

Finally, thanks to everyone for their attention.

fnord123•7mo ago
What does lightweight mean?
undefuser•7mo ago
I would like to understand more. What are the potential use cases for this tool?
gue-ni•7mo ago
"High-performance" is just a meaningless buzzword if you don't have any benchmarks or performance comparisons to comparable software
disintegrator•7mo ago
Very similar in appearance to Redpanda Connect (Benthos) which isn’t a bad thing at all. Would be good to elaborate on how error handling is done and what message delivery guarantees it comes with.