The Netflix Simian Army (2011)

https://netflixtechblog.com/the-netflix-simian-army-16e57fbab116
21•rognjen•1mo ago

Comments

sovietmudkipz•1mo ago
I always thought the companies I worked for would implement chaos testing shortly after this talk/blog was released. However, only last year did we do anything even approaching chaos testing. I think this goes to show that the adage “the future is already here, just unevenly distributed” carries some truth in some contexts!

I think the companies I worked for were prioritizing no-issue deployments (built from a series of documented and undocumented manual processes!) rather than making services resilient through chaos testing. As a younger dev this priority struck me as heresy (come on guys, follow the herd!); as a more mature dev I understand time & effort are scarce resources and the daily toil tax needs to be paid to make forward progress… it’s tough living in a non-ideal world!

oooyay•1mo ago
Chaos testing rarely uncovers anything significant or actionable beyond things you can suss out yourself with a thorough review but has the added potential for customer harm if you don't have all your ducks in a row. It also neatly requires, as a prerequisite, for you to have your ducks in a row.

I think that's why most companies don't do it. A lot of tedium and the main benefit was actually getting your ducks in a row.

closeparen•1mo ago
I think it is more of a social technology for keeping your ducks in a row. Developers won’t be able to gamble that something “never happens” if we induce it weekly.
bpt3•1mo ago
It's a great way of thinking about resiliency and fault tolerance, but it's also definitely on the very mature end of the systems engineering spectrum.

If you know things will break when you start making non-deterministic configuration changes, you aren't ready for chaos engineering. Most companies never get out of this state.

closeparen•1mo ago
Having a few fault injection scenarios is baby steps. Next would be Jepsen-style testing, and most mature would be formal verification.
GauntletWizard•1mo ago
Much of the value from Chaos testing can be gotten much more simply with good rolling CI. Many of the problems that Chaos engineering solved are now considered table stakes, implemented directly in our frameworks and tested well by that same CI.

A significant problem with early 'Web Scale' deployments was out-of-date or stale configuration values. You would specify that your application connects to backend1.example.com for payments and backend2.example.com for search. A common bug in early libraries was that the connection was established once at startup, and then never again. When the backend1 service was long lived, this just worked for months or years at a time - TCP is very reliable, especially if you have sane values on keepalives and retries. Chaos Monkey helped find this class of bug. A more advanced but quite similar class of bug: You configured a DNS name, which was resolved once at startup and, again, never updated. Your server for backend1 had a stable address for years at a time, but suddenly you needed to fail over to your backup or move it to new hardware. At the time of Chaos Monkey, I had people fight me on this - They believed that doing a DNS lookup every five minutes for your important backends was unacceptable overhead.
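A minimal sketch of the "resolved once at startup" bug next to a client that re-resolves after a TTL. Everything here is illustrative: the class names, the in-memory record table, and the fake clock are assumptions made so the example runs without a network; in a real client the resolver could be something like `socket.gethostbyname`.

```python
import time
from typing import Callable


class StaleClient:
    """Resolves the backend address once at startup and never again (the bug)."""

    def __init__(self, host: str, resolve: Callable[[str], str]):
        self.address = resolve(host)

    def target(self) -> str:
        return self.address


class RefreshingClient:
    """Re-resolves the backend once the cached answer is older than `ttl` seconds."""

    def __init__(self, host: str, resolve: Callable[[str], str],
                 ttl: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self.host = host
        self.resolve = resolve
        self.ttl = ttl
        self.clock = clock
        self._address = resolve(host)
        self._resolved_at = clock()

    def target(self) -> str:
        now = self.clock()
        if now - self._resolved_at >= self.ttl:
            # Cached answer has expired: look the name up again.
            self._address = self.resolve(self.host)
            self._resolved_at = now
        return self._address


# Hypothetical in-memory "DNS" table and clock, so the sketch is deterministic.
records = {"backend1.example.com": "10.0.0.1"}
fake_now = [0.0]


def fake_resolve(host: str) -> str:
    return records[host]


def fake_clock() -> float:
    return fake_now[0]


stale = StaleClient("backend1.example.com", fake_resolve)
fresh = RefreshingClient("backend1.example.com", fake_resolve,
                         ttl=300.0, clock=fake_clock)

records["backend1.example.com"] = "10.0.0.2"  # backend moves to new hardware
fake_now[0] = 301.0                           # a bit over five minutes later

print(stale.target())  # still the old address
print(fresh.target())  # picks up the new address
```

Killing or moving the backend is what exposes the first client; as long as the address never changes, both behave identically.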

The other part is - Modern deployment strategies make these old problems untenable to begin with. If you're deploying on kubernetes, you don't have an option here - Your pods are getting rebuilt with new IP addresses regularly. If you're connecting to a service IP, then that IP is explicitly a LB - It is defined as stable. These concepts are not complex, but they are edge boundaries, and we have better and more explicit contracts because we've realized the need and you "just do" deploy this way now.

Those are just Chaos Monkey problems, though - Latency Monkey is huge, but solves a much less common problem. Conformity Monkey is mostly solved by compliance tools; You don't build, you buy it. Doctor Monkey is just healthchecks - K8s (and other deployment frameworks) has those built in.

In short, Chaos Monkey isn't necessary because we've injected the chaos and learned to control most of what that was doing, and people have adopted the other tools - They're just not standalone, they're built in.

mbb70•1mo ago
Wish we lived in the universe where the term 'monkey' won over 'agent'. Would have given everything a cool Planet of the Apes feel.

I remember this getting a lot of buzz at the time, but few orgs are at the level of sophistication to implement chaos testing effectively.

Companies all want a robust DR strategy, but most outages are self-inflicted and time spent on DR would be better spent improving DX, testing, deployment and rollback.

setr•1mo ago
I mean daemon was the previous winner before agent, and that had a solid mystical-djinni element to it. Monkey would have naturally gone the way of the daemon, as software development “matures” and undergoes corporate sterilization
belter•1mo ago
I suspect Netflix built the Simian Army largely out of necessity, since at the time, AWS did not offer many native ways to deliberately inject failure or validate resilience or compliance at scale.

Today, many of these ideas map directly to AWS managed services such as AWS Fault Injection Simulator, AWS Resilience Hub, AWS Config, AWS Inspector, Security Hub, GuardDuty, and IAM Access Analyzer.

There is also a big third-party ecosystem (Gremlin, LitmusChaos, Chaos Mesh, Steadybit, etc...) offering similar capabilities, often with better multi-cloud or CI/CD integration.

I don't think some of these Netflix tools get much maintenance now, but as free options, they can be cheaper to run than AWS managed services or Marketplace offerings...

htrp•1mo ago
A lot of the third-party tools follow the time-honored tradition of duplicating an internal service at a leading engineering org (FAANG) and then making it available as a SaaS product.
addled•1mo ago
Anyone have experience chaos testing Postgres?

I was reading this the other day looking for ideas on how to test query retries in our app. I suppose we could go at it from the network side by introducing latency and such.

However, it’d be great if there also was a proxy or something that could inject pg error codes.
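One way to exercise retry logic without a protocol-aware proxy is to inject the errors at the call site in a test. The sketch below is an assumption-laden stand-in, not a real driver: `FakePgError`, `FaultyExecutor`, and `run_with_retries` are hypothetical names. The SQLSTATE values are PostgreSQL's codes for `serialization_failure` (40001), `deadlock_detected` (40P01), and `unique_violation` (23505).

```python
# SQLSTATEs treated as safe to retry in this sketch.
RETRYABLE_SQLSTATES = {"40001", "40P01"}


class FakePgError(Exception):
    """Hypothetical stand-in for a driver exception carrying a SQLSTATE."""

    def __init__(self, sqlstate: str):
        super().__init__(f"injected error {sqlstate}")
        self.sqlstate = sqlstate


class FaultyExecutor:
    """Stands in for a database call: raises each queued error, then succeeds."""

    def __init__(self, errors):
        self.errors = list(errors)
        self.calls = 0

    def __call__(self, query: str) -> str:
        self.calls += 1
        if self.errors:
            raise FakePgError(self.errors.pop(0))
        return f"ok: {query}"


def run_with_retries(execute, query: str, max_attempts: int = 3) -> str:
    """Retry only on retryable SQLSTATEs; re-raise anything else immediately."""
    for attempt in range(1, max_attempts + 1):
        try:
            return execute(query)
        except FakePgError as exc:
            if exc.sqlstate not in RETRYABLE_SQLSTATES or attempt == max_attempts:
                raise
    raise AssertionError("unreachable")


# Two transient failures, then success on the third attempt.
flaky = FaultyExecutor(["40001", "40P01"])
result = run_with_retries(flaky, "SELECT 1")

# A constraint violation should not be retried at all.
fatal = FaultyExecutor(["23505"])
try:
    run_with_retries(fatal, "INSERT ...")
    fatal_sqlstate = None
except FakePgError as exc:
    fatal_sqlstate = exc.sqlstate
```

This only covers the application's reaction to error codes; connection drops and latency would still need something at the network layer.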

Rafert•1mo ago
I know of https://github.com/Shopify/toxiproxy but it is not protocol aware, you might be able to add it yourself.
AtlasBarfed•1mo ago
Is pg partition tolerant in CAP?
voidUpdate•1mo ago
I recently made a "garbage monkey" script for work which will spam random buttons on the UI to make sure that animations and stuff work correctly even if the user is somehow pressing things faster than a user could. It has been pretty useful in uncovering some problems, though it only works with "buttons", and won't do touchscreen events, etc.
bob1029•1mo ago
Have you looked into this?

https://developer.chrome.com/docs/chromedriver/mobile-emulat...

adrianco•1mo ago
I distilled these ideas over subsequent years into several talks on “Failing Over without Falling Over”. Investing anything in resilience without testing that it actually works is a waste of resources. That's the underlying lesson. https://github.com/adrianco/slides/blob/master/FailingWithou...
malwrar•1mo ago
I’ve been toying around with the idea of using chaos engineering as a method of training new on-call folks. My first ever on-call shift was during a major product launch for a FAANG and I more or less just hoped that I’d be able to handle whatever broke. I got lucky and it turned out that I can usually fix things when they break, but I have also found that jumping people in like that isn’t exactly consistent. I wonder if controlled, limited outages (maybe even as a surprise) would be a less hellish way of doing it. It could be a good way to build instinct under pressure without risking too much.
AtlasBarfed•1mo ago
This sounds perilously close to hazing
malwrar•1mo ago
Can you expand on that?

Currently we do shadow shifts for a month or two first, but still eventually drop people into the deep end with whatever experience production gifts them in that time. That experience is almost certainly going to be a subset of the types of issues we see in a year, and the quantity isn’t predictable. Even if the shadowee drives the recovery, the shadow is still available for support & assurance. I don’t otherwise have a good solution for getting folks familiar with actually solving real-world problems with our systems, by themselves, under severe time pressure, and I was thinking controlled chaos could help bridge the gap.

AtlasBarfed•1mo ago
You are making things harder for newer hires than the environment you came into. It is a sink-or-swim strategy that introduces stress without any apparent compensation in training. It creates new bases for evaluation you were not subject to.

Hazing is a cycle of abuse in which the abuse inflicted is magnified beyond what was suffered in the previous cycle.

Maybe you are optimizing your personnel.

malwrar•4w ago
Thanks for this perspective, I think I’ll reconsider this plan (to be clear, haven’t done it) and try to think up some alternative training strategy that doesn’t involve live issues.
closeparen•1mo ago
Killing instances of load-balanced stateless services is not that interesting anymore in the context of a mature service mesh. What is interesting is injecting failures or latency on specific edges of the call graph to ensure that “fail open” dependencies really are. This is accomplished with context propagation, baggage, middleware, and L7 proxies rather than killing anything at the VM/container level. Even iptables rules turned out to not be a very good approach since most destinations would have many, constantly cycling IPs and ports.
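A rough sketch of what edge-targeted injection can look like: the fault instruction travels with the request as baggage, and client middleware fails only the named edge. The `x-fault-<edge>` key, the function names, and the two example dependencies are all hypothetical, chosen for illustration.

```python
class InjectedFault(Exception):
    """Raised by the middleware when the request's baggage targets an edge."""


def call_dependency(edge: str, baggage: dict, real_call):
    """Client middleware: fail this call only if baggage names this edge."""
    if baggage.get("x-fault-" + edge) == "error":
        raise InjectedFault(edge)
    return real_call()


def render_home(baggage: dict) -> dict:
    # Hard dependency: if the catalog is down, the request fails.
    titles = call_dependency("catalog", baggage, lambda: ["A", "B"])
    # Supposedly "fail open" dependency: the page should render without it.
    try:
        recs = call_dependency("recommendations", baggage, lambda: ["C"])
    except Exception:
        recs = []
    return {"titles": titles, "recommendations": recs}


normal = render_home({})
degraded = render_home({"x-fault-recommendations": "error"})
```

Because the fault is keyed to a request and an edge rather than to an IP or a container, only tagged test traffic is affected and the check still holds when the destination's instances cycle.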

In the stateful world, chaos testing is useful, but you really want to be testing every possible combination of failures at every possible application state, theoretically with something like TLA+ or experimentally with something like Antithesis. The scenarios that you can enumerate and configure manually are just scratching the surface.

iwontberude•1mo ago
At Netflix when this article was written, Cloud Engineering was accomplishing failure injection with circuit breakers, which essentially were L7 proxies. Chaos engineering was more than killing instances. There was a whole simian army after all. They would inject latency, error codes, etc. and simulate tiers of the application failing. It’s not nearly as unsophisticated as you’re making it seem.
AtlasBarfed•1mo ago
It strikes me, reading a linked 2010 article, how they talk about AWS having worse networking, less reliable instances, and higher latency.

It's been 15 years. AWS still sucks compared to your own hardware on so many levels, and total ROI is dropping.

MichaelNolan•1mo ago
Chaos testing is such an interesting idea. At my last job we didn’t have access to any of these tools. So I made a poor man’s chaos testing library for Java and Spring services. At the application level we would inject random faults into method calls.

It doesn’t test nearly as much as the real tools can, but it did find some bugs in our workflow engine where it wouldn’t properly resume failed tasks.
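For illustration only, here is the general shape of application-level fault injection in Python rather than Java/Spring. It is not the library described above; `chaos`, `ChaosError`, and `run_until_done` are hypothetical names, and the toy task loop just shows the kind of "does it resume failed tasks?" check such faults make possible.

```python
import functools
import random


class ChaosError(RuntimeError):
    """Marker exception for injected faults."""


def chaos(failure_rate: float, rng: random.Random = None):
    """Decorator: raise ChaosError on roughly `failure_rate` of calls."""
    rng = rng or random.Random()

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if rng.random() < failure_rate:
                raise ChaosError(f"injected fault in {fn.__name__}")
            return fn(*args, **kwargs)
        return wrapper
    return decorator


def run_until_done(tasks, step, max_rounds: int = 1000):
    """Toy workflow loop: a task whose step fails is re-queued, not dropped."""
    pending = list(tasks)
    done = []
    for _ in range(max_rounds):
        if not pending:
            break
        task = pending.pop(0)
        try:
            done.append(step(task))
        except ChaosError:
            pending.append(task)  # resume later
    return done, pending


# Seeded RNG so a failing run can be reproduced.
flaky_upper = chaos(0.5, random.Random(0))(lambda t: t.upper())
done, pending = run_until_done(["a", "b", "c"], flaky_upper)
```

A workflow engine that silently dropped failed tasks would finish with fewer results than inputs, which is the sort of bug this style of test can surface.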

ninju•1mo ago
> but it did find some bugs in our workflow engine where it wouldn’t properly resume failed tasks.

So ad-hoc, home-grown, chaos testing is still a useful exercise!

MichaelNolan•1mo ago
No one has used this code in years, and it's kind of half baked, but here it is https://github.com/Michael-Nolan/Public/tree/main/SimpleChao...