I’m trying to understand how teams actually debug production issues in systems made up of multiple services and external integrations (e.g. Stripe, Twilio, internal microservices, queues, webhooks).

In practice, when something breaks, it seems like the workflow is usually:

an alert fires (Datadog/Sentry/CloudWatch/etc.), or a customer complains

engineers then start checking logs, traces, and dashboards across multiple systems

and eventually they manually reconstruct what happened across services

What I’m curious about:

How do you actually trace a single failed request or transaction across multiple services today?

What tools do you rely on most in practice (not in theory)?

Where does it usually break down — logs, tracing, instrumentation, or just missing context?

How long does it typically take to go from “something is wrong” → “we know exactly why it broke”?

What part of this is still mostly manual stitching together of information?

Trying to understand what the real pain points are in practice, especially in systems with lots of external integrations and async flows.

Ask HN: Debugging failure in large interconnected back end systems
