We kept running into the same pattern with logging costs:
- CloudWatch / GCP Logging / Datadog tell you which log group/index is expensive
- But not which specific log statements in your code are responsible
So the response is always:
- tweak retention / tiers
- add filters and sampling
…and only much later do you discover it was a couple of DEBUG lines in a hot path, verbose HTTP tracing, or payload dumps in loops.
At some point we wanted a simple answer to:
> “For the code that’s deployed right now, which log call sites are burning most of the budget?”
———
### What LogCost does
LogCost is a small Python library + CLI that:
- wraps the standard logging module (and optionally print)
- tracks per‑call‑site metrics:
  {file, line, level, message_template, count, total_bytes}
- applies provider pricing (e.g. GCP/AWS) to estimate cost
- periodically exports aggregated stats only (no raw log payloads)
- can send Slack notifications with the top N most expensive log lines
It’s intended as a snapshot for the current deploy: run it under normal load, see which lines dominate cost, change them, redeploy, repeat.
———
### How it works (high level)
- It wraps logging.Logger._log and records a key per call site using file, line, and level.
- Message size is estimated from the formatted string length plus a configurable per‑event overhead, and accumulated per call site.
- A background thread periodically flushes aggregates to JSON on disk.
- The CLI reads that JSON and prints:
- a cost summary (based on current provider pricing), and
- a “top cost drivers” table per call site.
By design it never stores raw log payloads, only aggregates like:
```json
{
  "file": "src/memory_utils.py",
  "line": 338,
  "level": "DEBUG",
  "message_template": "Processing step: %s",
  "count": 1200000,
  "bytes": 630000000000,
  "estimated_cost": 315.0
}
```
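The cost field is derivable from the byte total and a flat per-GB ingestion rate. A minimal sketch, assuming a $0.50/GB price for illustration (real providers layer tiers, retention charges, and free quotas on top; check current pricing):

```python
def estimate_cost(total_bytes: int, price_per_gb: float = 0.50) -> float:
    """Estimate ingestion cost for one call site.

    price_per_gb is an assumed flat rate, not any provider's actual
    price sheet; swap in your provider's current ingestion rate.
    """
    return round(total_bytes / 1e9 * price_per_gb, 2)

print(estimate_cost(630_000_000_000))  # 315.0
```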
That’s partly for privacy, and partly because this is meant to complement your log platform, not replace it.
———
### Example output
A report might say:
- Provider: GCP, Currency: USD
- Total bytes: 900,000,000,000
- Estimated cost: $450.00
Top cost drivers:
- src/memory_utils.py:338 [DEBUG] Processing step: %s → $157.50
- src/api.py:92 [INFO] Request: %s → $73.20
- …
Slack notifications are just a formatted version of the same data, on a configurable interval (with an optional early “test” ping so you can verify wiring).
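Building such a notification is mostly string formatting over the same aggregates. A hedged sketch (the message layout and function name are illustrative, not LogCost's exact format) that produces a payload suitable for a Slack incoming webhook:

```python
import json

def format_slack_payload(drivers, top_n=3, currency="USD"):
    """Build a Slack incoming-webhook payload from aggregated call sites.

    `drivers` is a list of dicts in the export format shown earlier;
    the text layout here is illustrative only.
    """
    ranked = sorted(drivers, key=lambda d: d["estimated_cost"], reverse=True)
    lines = [
        f'{d["file"]}:{d["line"]} [{d["level"]}] {d["message_template"]}'
        f' -> {d["estimated_cost"]:.2f} {currency}'
        for d in ranked[:top_n]
    ]
    return {"text": "Top log cost drivers:\n" + "\n".join(lines)}

payload = format_slack_payload([
    {"file": "src/api.py", "line": 92, "level": "INFO",
     "message_template": "Request: %s", "estimated_cost": 73.20},
    {"file": "src/memory_utils.py", "line": 338, "level": "DEBUG",
     "message_template": "Processing step: %s", "estimated_cost": 157.50},
])
print(json.dumps(payload, indent=2))
```

Delivering it is then a single HTTP POST of that JSON body to the webhook URL.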
———
### Scope and status
- Python‑only for now (with examples for Flask, FastAPI, Django, and a K8s sidecar)
- MIT‑licensed, no backend service required
- Export format is simple JSON, so it could feed a central aggregator later if needed
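Because the export is plain JSON, downstream tooling stays trivial. A sketch of a reader that assumes the file is a JSON list of objects with the fields shown earlier (the actual on-disk layout may differ; check the repo):

```python
import json
import tempfile
from pathlib import Path

def top_drivers(path, n=10):
    """Load exported aggregates and return the n most expensive call sites.

    Assumes a JSON list of objects in the export format shown above;
    this is a hypothetical reader, not part of LogCost's CLI.
    """
    entries = json.loads(Path(path).read_text())
    return sorted(entries, key=lambda e: e["estimated_cost"], reverse=True)[:n]

# Demo: write a one-entry export and read it back.
sample = [
    {"file": "src/api.py", "line": 92, "level": "INFO",
     "message_template": "Request: %s", "count": 50000,
     "bytes": 146_400_000, "estimated_cost": 73.20},
]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(sample, f)
print(top_drivers(f.name)[0]["file"])  # src/api.py
```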
Repo:
https://github.com/ubermorgenland/LogCost
I’d be interested in feedback from people who’ve debugged “mysterious” log bills:
- Do you already solve this mapping (bill → specific log sites) in a cleaner way?
- Is per‑line aggregation actually useful in your setups, or is this overkill compared to just better log group conventions?