frontpage.

Security breaks during partial failures – design notes from distributed systems

7•sandhyavinjam•1mo ago

TL;DR: Many security mechanisms fail not during attacks, but during partial outages. This post documents early design notes for a failure-aware security framework for distributed systems.

The problem

In production distributed systems, security often breaks when things are half working:

auth services degrade → retries explode

fallback paths widen access

recovery logic becomes the attack surface

Nothing is “exploited”, yet the system becomes unsafe.

Most security models assume stable components and clean failures. Real systems don’t behave that way.

Design assumptions

We assume:

correlated failures

retries are adversarial

timeouts are unsafe defaults

recovery paths matter as much as steady-state logic

We don’t assume:

global consistency

perfect identity

reliable clocks

centralized enforcement

Framework ideas (high level)

This work explores four ideas:

1. Failure-aware trust

Trust degrades under failure, not just compromise

Access narrows automatically during partial outages

2. Security invariants at runtime

Invariants are continuously enforced

Violations trigger containment, not alerts

3. Retry-safe security primitives

Idempotent, monotonic, side-effect bounded

Retries can’t escalate privilege

4. Security as observable state

Trust level, degradation, and containment are visible

If you can’t observe it, you can’t secure it

What this is not

Not zero trust marketing

Not compliance

Not a finished system

It’s an attempt to treat failure as the normal case, not an exception.

Why publish this early?

Because many real failures:

don’t fit clean research papers

happen during incidents, not attacks

are invisible outside production systems

We’re sharing design notes to get feedback before formalizing or evaluating further.

Feedback welcome

If you’ve seen security regressions during outages or retries causing unsafe behavior, I’d like to hear about it.

This is ongoing work. No claims of novelty or completeness.

Ask HN: Opus 4.6 ignoring instructions, how to use 4.5 in Claude Code instead?

Ask HN: Anyone Using a Mac Studio for Local AI/LLM?

Ask HN: Ideas for small ways to make the world a better place

Ask HN: Non AI-obsessed tech forums

Ask HN: 10 months since the Llama-4 release: what happened to Meta AI?

Ask HN: Who wants to be hired? (February 2026)

Ask HN: Who is hiring? (February 2026)

LLMs are powerful, but enterprises are deterministic by nature

AI Regex Scientist: A self-improving regex solver

Tell HN: Another round of Zendesk email spam

Ask HN: Non-profit, volunteers run org needs CRM. Is Odoo Community a good sol.?

Ask HN: Is Connecting via SSH Risky?

Ask HN: Has your whole engineering team gone big into AI coding? How's it going?

Ask HN: How does ChatGPT decide which websites to recommend?

Ask HN: Is there anyone here who still uses slide rules?

Ask HN: Why LLM providers sell access instead of consulting services?

Ask HN: Mem0 stores memories, but doesn't learn user patterns

Kernighan on Programming

Ask HN: Is it just me or are most businesses insane?

Ask HN: What is the most complicated Algorithm you came up with yourself?

Ask HN: Anyone Seeing YT ads related to chats on ChatGPT?

Ask HN: Does global decoupling from the USA signal comeback of the desktop app?

We built a serverless GPU inference platform with predictable latency

Ask HN: Does a good "read it later" app exist?

Ask HN: Have you been fired because of AI?

Ask HN: Anyone have a "sovereign" solution for phone calls?

Ask HN: Cheap laptop for Linux without GUI (for writing)

GitHub Actions Have "Major Outage"

Ask HN: How Did You Validate?

Ask HN: OpenClaw users, what is your token spend?