Alert-Driven Monitoring

https://simpleobservability.com/docs/alert-driven-monitoring
32•khazit•2h ago

Comments

stingraycharles•1h ago
Good metrics and alerting systems are designed from the top down, not from the bottom up.

Lots of metrics are typically available, but almost all of them are noise.

Start with the business: what is important to the business? What kind of failures are existential threats?

Then work your way down and design your metrics and alerts, instead of just throwing stuff at the wall.

I’ve had to push back so many times with teams whose manager at one point said “we need better monitoring / alerting” and they interpreted that to mean more metrics / alerts.

This is rarely the case.

I personally am really fond of just using a few alerts. The important thing is to know that something went wrong, not necessarily where, why, or how it went wrong.

And yes, inertia is real, and false or valueless alerts need to be killed immediately, without remorse. They are SRE's cancer.

b112•1h ago
If you receive too many emails, alerts, warnings, and so on, you are only training yourself and the team to ignore them.

As you say, fewer is better. And a well-chosen few.

alansaber•1h ago
Very few alerts, implemented around core business logic, incorporating as many edge cases as possible. This is the way.
dandellion•1h ago
I agree that alerts should just be the vital ones. But in terms of monitoring and metrics, more is generally better. I joined a company where something broke, and the only way to figure out what was wrong was to ssh in and hop through several services. It was a massive waste of time for something that would have been trivial to narrow down with a basic otel setup.
analogpixel•1h ago
> Alerts should be actionable. If no action can or should be taken, then the alert is not needed.

Also, the best alerts come from looking at actual failures you had and not trying to make up "good alerts" from thin air. After you have an outage, figure out what alerts would have caught it, and implement those.

esafak•4m ago
Well... I know something is going to happen if disk space runs out; I don't need to experience it first.
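
To make the disk space case concrete, here is a minimal sketch of a predictive alert, assuming you already collect (hour, used_bytes) samples somewhere. The names `hours_until_full` and `should_alert`, and the 24-hour horizon, are made up for illustration. It fits a straight line through recent usage and fires only when the projected time to full drops below the horizon, so the alert stays actionable.

```python
from typing import Optional, Sequence, Tuple


def hours_until_full(
    samples: Sequence[Tuple[float, float]], capacity_bytes: float
) -> Optional[float]:
    """Estimate hours until the disk is full from (hour, used_bytes) samples.

    Fits a least-squares line through the samples and extrapolates it to
    capacity. Returns None when there are too few samples or usage is not
    growing, because then there is nothing to forecast.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None  # usage is flat or shrinking
    intercept = mean_u - slope * mean_t
    t_full = (capacity_bytes - intercept) / slope
    # Measure from the most recent sample; clamp if the line is already past capacity.
    return max(0.0, t_full - samples[-1][0])


def should_alert(
    samples: Sequence[Tuple[float, float]],
    capacity_bytes: float,
    horizon_hours: float = 24.0,  # hypothetical default, tune per system
) -> bool:
    """Fire only when the disk is projected to fill within the horizon."""
    eta = hours_until_full(samples, capacity_bytes)
    return eta is not None and eta <= horizon_hours
```

A linear fit is the simplest possible model here; bursty workloads would likely need a shorter window or something less naive.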
Yokohiii•29m ago
In my opinion, the best way to reduce alerts is to work hard to get rid of the underlying problems or turn them into non-problems. If you do a good job, most errors are third-party driven, which can indeed be hard to solve because of company politics. But at that point you can always tell your boss how it can be solved and that you won't go on pager duty for stuff that is out of your control.
manoDev•27m ago
For prior art on how to define alert conditions, see:

https://en.wikipedia.org/wiki/Nelson_rules

https://en.wikipedia.org/wiki/Western_Electric_rules

https://en.wikipedia.org/wiki/Westgard_rules
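
As a rough illustration of what these rule sets look like in code, here is a sketch of the first two Nelson rules: a single point more than 3 standard deviations from the mean, and nine or more points in a row on the same side of the mean. The function names are made up, and `mean` and `sigma` are assumed to come from a known-good baseline period rather than from the window being checked.

```python
from typing import List, Sequence


def nelson_rule_1(values: Sequence[float], mean: float, sigma: float) -> List[int]:
    """Indices of points more than 3 standard deviations from the mean."""
    return [i for i, v in enumerate(values) if abs(v - mean) > 3 * sigma]


def nelson_rule_2(
    values: Sequence[float], mean: float, run_length: int = 9
) -> List[int]:
    """Indices where a same-side run has reached `run_length` or more points.

    A point exactly on the mean breaks the run.
    """
    hits: List[int] = []
    run = 0
    prev_side = 0
    for i, v in enumerate(values):
        side = (v > mean) - (v < mean)  # +1 above, -1 below, 0 on the mean
        if side != 0 and side == prev_side:
            run += 1
        else:
            run = 1 if side != 0 else 0
        prev_side = side
        if run >= run_length:
            hits.append(i)
    return hits


# Baseline assumed to be mean 10, sigma 1 (illustrative numbers only).
series = [10.5] * 9 + [9.5, 14.0]
outliers = nelson_rule_1(series, mean=10.0, sigma=1.0)
drifts = nelson_rule_2(series, mean=10.0)
```

Rule 1 catches sudden spikes, while rule 2 catches a slow shift that never crosses a fixed threshold, which is the kind of thing a static alert tends to miss.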

esafak•6m ago
Now we use purely statistical measures, which require a probabilistic model. The name of the game is calibration.
jbmsf•26m ago
I have some thoughts here.

I work for a startup; we have what I think is a fairly typical setup: metrics ingested from a variety of sources, fed into industry-standard metrics/dashboard solutions, triggering escalations to humans. It's fine and I'm happy we have it, but...

The highest value source of alerting right now is one of our growth marketers who pays close attention to our CRM and product analytics tool and notices when key product funnels are underperforming.

Our next highest value signals are a handful of ad hoc alerting channels, mostly in Slack, either directly from a partner telling us that something suspicious happened on their side (think: fraud) or from in-product instrumentation sent to a channel for non-engineering visibility. Members of our business/product/operations team pay attention in these places and make decisions based on their business context.

After that, our support team is increasingly able to filter customer issues and differentiate between bugs, missing features, etc.

I know someone is going to argue that these are all a sign that we haven't instrumented the right things. Fair, but also misses the point. The decision makers in these flows don't (and won't) live in traditional alerting systems and wouldn't have helped us understand breakages without these other, ad hoc processes.

My theory is that it's relatively easy to offer a technical product that moves alerts around or that manages escalation paths. It's quite hard to design a product that surfaces detail to a non-technical expert and that makes it easy to build systematic rules.
