Ask HN: How do you do store-and-forward telemetry at the edge?

8•Aydarbek•1mo ago

I’m researching patterns for edge / gateway telemetry where the network is unreliable (remote sites, industrial, fleets, etc.) and you need offline buffering + bounded disk + replay once connectivity returns.

Questions for folks running this in production:

What do you use today? (MQTT broker + ??, Kafka/Redpanda/NATS, Redis Streams, custom log files, embedded DB, etc.)

Where do you buffer during outages: append-only log, SQLite/RocksDB, queue-on-disk, something else?

How do you handle backpressure when disk is near full? (drop policy, compression, sampling, prioritization)

What’s your failure nightmare: corruption, replay storms, duplicates, “stuck” consumer offsets, disk-full, clock skew?

What guarantees do you actually need: zero-loss vs “best effort” (and where do you draw that line)?

What metrics/alerts matter most on gateways? (queue depth, replay rate, oldest event age, fsync latency, disk usage, etc.)

I’d love to learn what works, what breaks, and what you wish existing tools did better.

Comments

Aydarbek•1mo ago

Disclosure: I built an OSS single-binary, HTTP-native durable event log aimed at this edge “store-and-forward + replay” problem. Repo: github.com/A1darbek/ayder

If anyone is open to a tiny design-partner pilot (30–60 min): run docker compose → ingest some telemetry → simulate outage (kill -9 / disconnect) → restart → verify replay + zero loss. I’ll do white-glove onboarding and turn the learnings into a short case study (can be anonymous).
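For the "verify replay + zero loss" step, here is a minimal sketch of the check itself. It assumes each event carries a unique integer ID assigned by the producer. The `check_replay` function and its inputs are hypothetical and not part of any tool's API.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def check_replay(sent_ids: Iterable[int],
                 received_ids: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Compare IDs the producer sent with IDs seen downstream after replay.

    Returns (missing, duplicates): IDs that never arrived, and IDs that
    arrived more than once.
    """
    sent = set(sent_ids)
    counts = Counter(received_ids)
    missing = sorted(sent - set(counts))
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    return missing, duplicates


# Example: event 2 was lost, event 1 was delivered twice.
missing, duplicates = check_replay(range(5), [0, 1, 1, 3, 4])
print(missing, duplicates)
```

An empty `missing` list is the zero-loss condition. A non-empty `duplicates` list points to at-least-once delivery, which may be acceptable if the consumer deduplicates.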

deangiberson•1mo ago

Edge(FluentBit -> Logs -> cron(compress -> encrypt)) -> Cloud(S3 -> Trigger -> Lambda decrypt -> S3 -> Trigger -> Lambda decompress -> S3 -> Trigger -> Lambda to CloudWatch)

I have a system that runs on edge services and captures everything to logs through FluentBit. A cron job then compresses, encrypts, and tries to send the logs to device-specific S3 buckets. If the on-device logs get too big, the oldest logs are dropped first, with a heuristic that treats certain logs as more or less important. When devices reconnect to the cloud, they push logs as quickly as they can, and the cloud infra backfills metrics as they arrive.

Once in S3, triggers start a series of Lambdas to decrypt, decompress, and analyze the logs. Works well, easy to reason about.

The backend can easily be swapped out for something else. The harder part is the log compress/encrypt/rotate. It's important that you don't treat all logs exactly the same. Some are much more important and should be preserved over others.
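A minimal sketch of a priority-aware drop policy like the one described above, assuming each rotated log file is tagged with an importance level. The `LogFile` type, the `priority` scale, and `choose_evictions` are hypothetical names for illustration. The actual heuristic in this setup is not specified.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogFile:
    path: str
    size_bytes: int
    mtime: float   # last-modified time, seconds since epoch
    priority: int  # assumed scale: higher means more important


def choose_evictions(files: List[LogFile], max_bytes: int) -> List[LogFile]:
    """Pick files to delete so the remaining total fits within max_bytes."""
    total = sum(f.size_bytes for f in files)
    evict: List[LogFile] = []
    # Least important first; within a priority level, oldest first.
    for f in sorted(files, key=lambda f: (f.priority, f.mtime)):
        if total <= max_bytes:
            break
        evict.append(f)
        total -= f.size_bytes
    return evict


# Example: a 150-byte budget forces out both low-priority files,
# while the older but more important file is kept.
files = [
    LogFile("debug-1.log.gz", 100, mtime=1.0, priority=0),
    LogFile("debug-2.log.gz", 100, mtime=2.0, priority=0),
    LogFile("alarm-1.log.gz", 100, mtime=0.5, priority=5),
]
print([f.path for f in choose_evictions(files, 150)])
```

Sorting by priority before age means an old critical log outlives a fresh debug log. Swapping the key order would give plain oldest-first eviction with priority only as a tiebreaker.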

Aydarbek•1mo ago

This is gold, thank you. The “easy to reason about” part is exactly what I’m going for.

A couple quick questions if you don’t mind:

Roughly what volume are you pushing per device (MB/day or events/sec), and what’s your typical offline window?

What’s your biggest failure mode today: disk-full/rotate policy, encryption key handling, replay storms on reconnect, or Lambda fanout/cost?

I’m thinking Ayder could replace the “rotate → ship” backend with a durable local log + priority queues + replay, but you’re right that the hardest part is the policy (what to drop first, how to bound disk, and how to preserve critical streams). If you’re open, I’d love to learn what heuristics you ended up with.
