Keeping secrets out of logs (2024)

https://allan.reyes.sh/posts/keeping-secrets-out-of-logs/

251•xk3•5mo ago

Comments

mlhpdx•5mo ago

Great read.

> And while people will write the code that accidentally introduces sensitive data into logs, they’re also the ones that will report, respond, and fix them.

This should probably be the first point and not the last.

lucideer•5mo ago

This is an excellent excellent resource regardless of whether you agree/disagree with the author's conclusions, simply by virtue of being a great list of broken down problems, well described, & accompanied by good technical descriptions of proposed fixes (again independent of your opinion on those fixes).

Just an excellent example of how to approach & elucidate a problem domain.

CraigJPerry•5mo ago

With Java there's a GuardedString implementation: https://docs.oracle.com/en/middleware/idm/identity-governanc...

antonvs•5mo ago

Those are primarily for in-memory security. They apparently use a "known default key" in their serialized form. At least when it comes to logging, that's more like obfuscation than security.

otterley•5mo ago

According to its documentation, you can’t directly log a GuardedString because it doesn’t implement the toString() method. You have to pass it an accessor instance through its access() method to extract the plaintext.

otterley•5mo ago

It doesn't look like it's a part of the standard API though. That looks like it's some sort of framework API for Oracle Fusion. It's also not open source.

b0gb•5mo ago

eazy

secrets.forEach(secret => logMessage = logMessage.replaceAll(secret, '**'))

mberning•5mo ago

That presumes you know all secrets ahead of time. A risk in and of itself. But from a practical point of view you will never know all secrets, because they are generated constantly in real time.

pluto_modadic•5mo ago

I've known users to type passwords in the username field. you implicitly do NOT know all secrets (e.g., a password is hashed).

secrets can also churn, so even if you did your example would require something besides an in-memory array.

and, the final point: what if your secret masking code fails on an exception, too ;)

NeutralForest•5mo ago

Just excellent. Lots of (common from my experience) examples, potential fixes and self-contained explanations. Nice.

aduwah•5mo ago

Great article! I will definitely reference it in my upcoming discussions. I had a hard time defending having an EU-based o11y stack for our EU-based infra. I found it hard to articulate on the spot that there are myriad places where sensitive/personal data can get into the logs and cause leaks, or make GDPR angry.

jazzyjackson•5mo ago

Why do I have to know how many letters are in observability? Is this some kind of in-group signaling?

aduwah•5mo ago

Just wait until you see our secret handshake

arccy•5mo ago

you don't need to know, just consider it a new word that's a synonym, and happens to kind of look like o-<tall spiky letters like i and l>-y

halffullbrain•5mo ago

I read the piece expecting precisely that: how to keep PII out of logs, which requires a lot of adamant snipers with a lot of lead bullets. Passwords: handled by IAM services. Tokens: application frameworks which know not to divulge them. But Brian's phone number stashed in an innocuous case metadata field? Gaah!

Some of the same techniques apply, like using domain primitives, but some PII (like names and addresses) is eventually templated into flatter (text) values, and processed by other layers which do not recognize 'brands' as suggested.

Data scanners: Regexes are fine for SSNs and the like, but to be really effective, one would need a full-on Named Entity Recognition in the pipeline, perhaps just as a canary. (Wait, that might actually work?)

Dataflow analysis and control applies in a BIG way, e.g. separating an audit log for forensics, where you really NEED the PII, from a technical log which the SREs can dig into without being suspected of stealing sensitive info. Start there.

munchler•5mo ago

I certainly agree with the desire to keep secrets out of logs, but isn’t the entire log itself also considered to be secret? Even a perfectly sanitized log probably contains lots of data about your production environment that you wouldn’t want to share with adversaries (e.g. peak usage hours).

advisedwang•5mo ago

Logs probably need to be exposed to support teams, oncalls for sister teams (if you are a large org), all your devs, etc. That is many MANY more people than need access to secrets. Secrets in logs therefore put you at much wider risk of internal threats and make it MUCH easier for an attacker who phishes someone to pivot to higher credentials.

Also if you have audit records, you want accessing a secret to be logged separately from accessing logs.

dmurray•5mo ago

Yes, but think defense in depth. Your team member who leaves for a competitor could tell them your peak usage hours, but he shouldn't be able to tell them all your customers' passwords.

jauer•5mo ago

There’s secret from an adversary and then there’s internal compartmentalization.

You could have 100s of people who have a business need to look at syslog from a router, but approximately nobody who should have access to login creds of administrative users and maybe 10s of people with access to automation role account creds.

pluto_modadic•5mo ago

PII is different from proprietary info. customer's email? PII. mask it. your code's stack trace? proprietary info. employees can see that to troubleshoot.

dataflow•5mo ago

As far as run-time exposure prevention goes, I feel like in-band signaling might work better than out-of-band for this problem. Along the lines of the taint checking technique mentioned, you can insert some magic string (say, some recognizable prefix + a randomly generated UUID) into your sensitive strings at the source, that you then strip out at the sink. (Or wrap your secrets in a pair of such magic strings.) Then block or mask any strings containing that magic string from making it into any persisted data, including logs. And it will be easy to identify the points of exposure, since they will be wherever you call your respective seal()/unseal() function or such.
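
A minimal sketch of the marker idea described above, assuming Node.js. The names `MARKER`, `seal`, `unseal`, and `scrub` are hypothetical and chosen only for illustration; the comment names seal()/unseal() but does not specify an implementation. The sketch wraps a secret in a pair of random markers at the source, masks any marked span at the logging sink, and drops the whole line if a marker is left unpaired (for example because the entry was truncated).

```javascript
const { randomUUID } = require("crypto");

// Hypothetical marker: a recognizable prefix plus a per-process random UUID.
const MARKER = `SEALED-${randomUUID()}`;

// Wrap a sensitive value in a pair of markers at the source.
function seal(value) {
  return `${MARKER}${value}${MARKER}`;
}

// Strip the markers at the one sink that genuinely needs the plaintext.
function unseal(sealed) {
  return sealed.split(MARKER).join("");
}

// Mask every marked span before a line is persisted.
function scrub(line) {
  const parts = line.split(MARKER);
  // n markers produce n + 1 parts, so an even part count means an unpaired
  // marker; in that case the whole line is withheld rather than guessed at.
  if (parts.length % 2 === 0) return "[REDACTED: unbalanced secret marker]";
  // Odd-indexed parts sit between an opening and a closing marker.
  return parts.map((part, i) => (i % 2 === 1 ? "[REDACTED]" : part)).join("");
}

// Logging sink that always scrubs first.
function log(line, sink = console.log) {
  sink(scrub(line));
}
```

With this sketch, `log("login with " + seal("hunter2"))` would write `login with [REDACTED]`, while `unseal(seal("hunter2"))` gives back the plaintext. Searching the codebase for `unseal(` then lists the places where plaintext is exposed.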

HelloNurse•5mo ago

Can you elaborate on the situations and reasons that would make this approach appropriate?

At first sight it seems a complicated and inferior approximation of techniques from the article: not automatically single use, not statically checked, somewhat error prone for proper secret usage, not really preventing well-intentioned idiots from accidentally extracting, "laundering" and leaking the secret, removing secrets from logs at a dangerously late stage with some chance of leaks.

rolandog•5mo ago

Also may need to handle special cases where entry is truncated so you get incomplete opening/closing pairs (i.e. quirks mode for log parsing?)

dataflow•5mo ago

I mean, I very much disagree on this being "complicated and inferior". But none of these techniques are substitutes for each other. Like the article said, there are a lot of lead bullets, no silver ones. You absolutely should deploy whatever techniques you can. All I was saying was that I think this one, on its own, would handle a larger set of cases than some of the other (run-time) ones listed.

But one big reason I suggested this technique is that you want the object to keep protection on the String while having it look and feel as much like the underlying contents as possible, so that the final unsealing can occur as little (& as late) as possible. The more warts you put around your secret, the less usable it will be. You thought you made the Secret "single-use", but what you really did was to just encourage someone to keep the unsealed String around and reuse that, because you gave them a Secret type and they needed a String type. And now you have no way to detect if they accidentally log it, or throw an exception with some local variable containing it. Whereas this technique would still immediately catch any leakage in those cases.

Again: this technique is a supplement, not a substitute. You absolutely should still add static checks where you can. Have your Secret type too. The point here is that your Secret.unseal() method can still return a String that is useful for callers while offering you some protection on the value, instead of instantly going from protected->unprotected and exposing the contents with zero protection.

HelloNurse•5mo ago

  >  You thought you made the Secret "single-use", but what you really did was to just encourage someone to keep the unsealed String around and reuse that, because you gave them a Secret type and they needed a String type

In a reasonable development process, one that doesn't abet dangerous mistake-makers, forcing clients to invoke a special method of the Secret class (single use or not) to "unseal" it gives a valuable auditing tool for free: every dangerous situation can be found by a simple search for usages of this method, while a marked string value that contains a secret is indistinguishable from a normal string and can be misused.

debarshri•5mo ago

I think secrets ending up in logs is an issue, but who should have access to view which logs is also an important question that is often ignored. Restricting that access also scopes down the surface area of leakage.

stretchwithme•5mo ago

A user's password is something I shouldn't see in a log, even if I'm in control of what gets logged and frequently access them to do my job.

Even if I trust me.

Audits happen. I assume other people will eventually see this bad practice.

debarshri•5mo ago

Audits and bad practice are second-order things.

My argument is that generally everyone has access to all the logs. If you restrict the access and add guardrails around it, you can minimize the surface area and also ways it can be leaked out.

If you take a defensive approach, you have to assume that some secret is getting logged somewhere. The goal then becomes finding a way to reduce the surface area or blast radius of this possible leakage.

jaspervdj•5mo ago

Limiting access helps, but if you are storing the logs on a 3rd party (e.g. DataDog, CloudWatch), you will still need to assume it can leak through that 3rd party and start rotating.

blkhawk•5mo ago

oh god - I had that come up in an issue at work just about a month ago. A development system used really simple usernames and passwords since it was just for testing but all the lines with one of those got gobbled up because they had "secrets" in them.

I have very strong opinions on this issue that boil down to: _why are you logging everything you lazy asses_ and _adding all the secrets into another tool just to scan for them in logs just adds another point for them to leak_...

Especially since the ability of lines getting censored even when the secrets were just part of words showed that probably no hashing was involved.

But it's a security tool so it stays. I kinda feel like Cassandra but I think I can already predict a major security issue with it or others with the same functionality in the future. It's like some goddamn blind spot that software that is to prevent X cannot be vulnerable to X but somehow often is vulnerable because prevention of X and not being vulnerable to X are two separate things somehow.

pavel_lishin•5mo ago

Why is logging everything considered lazy?

tonymet•5mo ago

For one, it's extremely costly in vCPU, storage, and transfer rates. And if you're paying a third-party logger, multiply each by 10x.

shakna•5mo ago

If you're in a testing environment, where your SIT and UAT are looking to break stuff though, don't you usually want to be able to look to a log of everything?

tonymet•5mo ago

I could see a couple of reasons against. For one, it's expensive to serialize/encode your objects into the logger, even if you reduce the logging level on prod.

Secondly, you can't represent the heap & stack well as strings. Concurrent threads and object trees are better debugged with a debugger (e.g. gdb).

pavel_lishin•5mo ago

That makes it foolish, but I'm not sure if it's lazy.

tonymet•5mo ago

the lazy part comes from the fact that it's easier to be foolish in this case than to be selective about what gets logged. So lazy & foolish.

arccy•5mo ago

it's not lazy, it's a good use of time, you don't go back and forth when you realize you forgot to log something important.

petesergeant•5mo ago

Axiom wants $60/m if you send them a terabyte of logs, which is basically nothing compared to the cost of developers trying to debug issues without detailed logs.

tonymet•5mo ago

I think you're being naive on the costs, but that's just me. That's the intro price, plus you have transfer fees and vCPU.

I've never used axiom, but all the logging platforms I've used like splunk, datadog, loggly are a major op-ex line item.

And telling your developers their time is priceless means they will produce the lowest quality product.

tonymet•5mo ago

Not to mention the performance impact of synchronous logging. Write a trivial benchmark, add logging, and you will see the cost per operation go up 1000x.

drjasonharrison•5mo ago

First "everything":

Logging "everything" could include stack traces and parameter values at every function call. Take the information you can get from a debugger and imagine you log all of it. Would that be necessary to determine why a defect is triggered?

Second, "lazy":

Logging has many useful aspects, but it is also only a step or two above adding print statements to the code, which again leads to the "lazy." If you have the inputs, you should be able to reproduce the execution. The exceptions include "poorly" modularized code, side effects, etc.

Alternatives.

I've found it helpful for complex failures to make sure that I include information about the system. For example, the program couldn't allocate memory: Was it a lack of contiguous chunks of memory or a memory leak? How much free memory is there, versus the shapes of the free memory (Linux memory slabs)? What can I do to reset this state? (reboot was the only option)

Finally, a quote a colleague shared with me when I once expressed my love of logging. In the context of testing online games:

"Developers seem drawn to Event Recorders like moths to a flame. Recording all game/ mouse/ network/ whatever events while playing the game and playing them back is a bad idea. The problem is that you have an entire team modifying your game's logic and the meaning or structure of internal events on a day-to-day basis. For The Sims Online and other projects, we found that you could only reliably replay an event recording on the same build on which it was recorded. However, the keystone requirement for a testing system is regression: the ability to run the same test across differing builds. Internal Event Recorders just don't cut it as a general-purpose testing system. UI Event Recorders share a similar problem: when the GUI of the game shifts, the recording instantly becomes invalid."

Page 181, "Section 2.1 Automated Testing for Online Games by Larry Mellon of Electronic Arts", in Massively multiplayer game development 2, edited by Thor Alexander, 2005

mgaunard•5mo ago

One particular thing to be careful of is core dumps.

What I did at a previous shop was remove the passwords as part of a smart gdb script that runs when the core is dumped, before it gets written to a readable location.

Writing the script also helped to demonstrate how to extract the passwords in the first place.

kjs3•5mo ago

Stack traces, too. I did some work with a heavy Java shop and pretty much everything sensitive ended up in a stack trace at some point.

carlmr•5mo ago

Java is just too verbose in every possible way.

h1fra•5mo ago

I think the big problem is when secrets can be anywhere in a string and you don't control the input (e.g., library stack traces, HTTP responses, JSON that was stringified). You need to pass the secrets to the logger so they can be redacted; that's heavily dependent on the dev and easy to forget during review.

And an exact match is just part of the problem; if a dev redacts the end and another dev redacts the start, you can still reassemble the secret with enough logs.
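
A minimal sketch of the "pass the secrets to the logger" approach, assuming a hypothetical `RedactingLogger` wrapper (not a real library API). It replaces every registered value with one fixed mask, keeping no prefix or suffix, so two call sites cannot expose complementary halves of the same secret.

```javascript
// Hypothetical logger wrapper: callers register known secret values, and every
// occurrence is replaced by a fixed mask before the line reaches the sink.
class RedactingLogger {
  constructor(sink = console.log) {
    this.sink = sink;
    this.secrets = new Set();
  }

  // Very short values are ignored (an assumed threshold of 8 characters),
  // because masking them would also blank out ordinary words.
  register(secret) {
    if (typeof secret === "string" && secret.length >= 8) {
      this.secrets.add(secret);
    }
  }

  redact(message) {
    let out = String(message);
    // Longest first, so a secret that contains another is masked as a whole.
    const ordered = [...this.secrets].sort((a, b) => b.length - a.length);
    for (const secret of ordered) {
      // split/join performs a literal (non-regex) replacement of every match.
      out = out.split(secret).join("[REDACTED]");
    }
    return out;
  }

  info(message) {
    this.sink(this.redact(message));
  }
}
```

The limitation raised in the thread still applies: a value nobody registered, such as a token minted at run time or a secret buried in a third-party stack trace, passes through untouched.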

Bender•5mo ago

One direction to venture would be to run rsyslog on every node: use regex to match all the known patterns, use the various plugins/add-ons to send all application logs to the local rsyslog instance through a local spooler, and then encrypt the rsyslog stream upstream to the centralized logging servers. Rsyslog supports spooling, so if the upstream server is offline for whatever reason, the logs are spooled locally and forwarding resumes when upstream is back online.

Regex matching on logs is slow but if performed on every node the CPU load is distributed vs. doing this upstream. Configuration management can push the regex rules to all the nodes. This won't help with unknown-unknowns but those can be added quickly to all nodes through configuration management after peer review.
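
To make the "list of known patterns pushed to every node" idea concrete, here is a small sketch in JavaScript. It is not rsyslog configuration, and the three patterns are illustrative assumptions only; real rules would need tuning and peer review against your own data.

```javascript
// Illustrative rules only. Each rule names the kind of data it masks.
const RULES = [
  { name: "ssn", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  { name: "bearer-token", pattern: /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g },
  { name: "email", pattern: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g },
];

// Apply every rule in order, replacing each match with the rule name in brackets.
function scrubLine(line, rules = RULES) {
  return rules.reduce(
    (out, rule) => out.replace(rule.pattern, `[${rule.name}]`),
    line
  );
}
```

Because the rule list is plain data, adding a newly discovered pattern is a one-line change that can be distributed the same way as any other configuration. Secrets that match no pattern are still missed.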

Rsyslog also supports encrypting the log stream so that secret leakage is limited to the sending nodes and the central nodes and it checks a few boxes.

Another thing that helps is sending only warn and above upstream, and using an agent on the local nodes to monitor for keywords in the info-to-debug range so it can tell someone to go check the node logs. That means less junk on the centralized servers, which may have SOC1/SOC2/PCI/FedRAMP log retention requirements. One cannot leak what is not sent in the first place.

bilalq•5mo ago

This is an excellent write-up of the problem. New hires out of college/bootcamps often have no awareness of the risks here at all. Sometimes even engineers with years of experience but no operational mentorship in their career.

The kitchen sink example in particular is one that trips up people. Without knowing the specifics of how a library may deal with failure edge cases, it can catch you off guard (e.g., axios errors including API key headers).
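
One hedged way to avoid the kitchen-sink failure is to log an explicit summary of the error instead of the error object itself. The sketch below uses a plain object standing in for an HTTP client error. The `config`, `response`, and `message` field names mirror the layout axios errors commonly have, but treat that shape, the `summarizeHttpError` helper, and the header list as assumptions to verify against the library version you use.

```javascript
// Header names treated as sensitive; an assumed list, extend it for your APIs.
const SENSITIVE_HEADERS = new Set(["authorization", "cookie", "x-api-key"]);

// Build a small, explicit summary rather than logging the whole error object,
// which may carry the full request config (headers included).
function summarizeHttpError(err) {
  const headers = err?.config?.headers ?? {};
  const safeHeaders = {};
  for (const [name, value] of Object.entries(headers)) {
    safeHeaders[name] = SENSITIVE_HEADERS.has(name.toLowerCase())
      ? "[REDACTED]"
      : value;
  }
  return {
    message: err?.message,
    method: err?.config?.method,
    url: err?.config?.url, // note: query strings can carry secrets too
    status: err?.response?.status,
    headers: safeHeaders,
  };
}
```

The summary is an allowlist of fields, so anything the library adds to its error objects later stays out of the logs by default.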

A lot of these problems come from architectures where secrets go over the wire instead of just using signatures/ids. But in cases where you have to use some third party platform, there's often no choice.

micksmix•5mo ago

Loved this “lead bullets” framing, especially the parts on taint checking, scanners, and pre-processing/sampling logs. One practical add-on to the "Sensitive data scanners" section is verification: can you tell which candidates are actually live creds?

We’ve been working on an open source tool, Kingfisher, that pairs fast detection (Hyperscan + Tree-Sitter) with live validation for a bunch of providers (cloud + common SaaS) so you can down-rank false positives and focus on the secrets that really matter. It plugs in at the chokepoints this post suggests: CI, repo/org sweeps, and sampled log archives (stdin/S3) after a Vector/rsyslog hop.

Examples:

  kingfisher scan /path/to/app.log --only-valid
  kingfisher scan --s3-bucket my-logs --s3-prefix prod/2025/09/

Baselines help keep noise down over time.

Repo: https://github.com/mongodb/kingfisher (Apache-2.0)

Disclosure: I help maintain Kingfisher.

petesergeant•5mo ago

> If you shift from “any string can be a secret” to “secrets are secrets”, it makes things a lot easier to reason about and protect.

> const secret = new Secret("...")

one of those things that's obvious in retrospect. That's a cute trick I'll definitely be stealing.
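
For readers who want to try the idea, here is a minimal JavaScript sketch of such a wrapper. It is not the article's implementation; the method name `unseal` is borrowed from elsewhere in this thread and the details are assumptions. The plaintext lives in a private field, and the usual stringification paths return a placeholder.

```javascript
class Secret {
  // Private field: not enumerable and not reachable by property access.
  #value;

  constructor(value) {
    this.#value = value;
  }

  // The one deliberate, searchable way to get the plaintext back.
  unseal() {
    return this.#value;
  }

  // String concatenation and template literals end up here.
  toString() {
    return "[REDACTED]";
  }

  // JSON.stringify calls this instead of serializing the object.
  toJSON() {
    return "[REDACTED]";
  }

  // Node.js consults this symbol when console.log or util.inspect
  // formats the object.
  [Symbol.for("nodejs.util.inspect.custom")]() {
    return "[REDACTED]";
  }
}
```

With this sketch, `JSON.stringify({ password: new Secret("hunter2") })` produces `{"password":"[REDACTED]"}`, and the plaintext is only available through an explicit `unseal()` call.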

jiggawatts•5mo ago

.NET has SecureString: https://learn.microsoft.com/en-us/dotnet/api/system.security...

Which reminds me of why I hate tiny standard libraries as seen in JavaScript: features like SecureString work only if they're used pervasively. It has to be in the std lib and it has to be used everywhere so that you almost never have to unwrap them. It's critical that credentials are converted to SecureString as soon as possible and that they stay as SecureString values until the last possible instant when they're passed to some external API call deep inside even a third-party library.

vbezhenar•5mo ago

A copying GC also has to cooperate with this SecureString feature, so you won't accidentally leave a secret hanging around in a heap dump. Old Java APIs have a tendency to use `char[]` for secrets. You can zero it after use, so the old reference will not contain useful data, but you can't protect it from a copying GC, so it might still get leaked in a raw heap dump, even after zeroing it out.

nunez•5mo ago

Nice article; very comprehensive.

JR1427•5mo ago

It feels like it would be better to make strings as safe, and only log ones that have been marked as such.

JR1427•5mo ago

EDIT: "mark", not "make"
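
A minimal sketch of that allowlist idea, using a JavaScript tagged template. The names `Safe`, `safe`, and `logLine` are hypothetical. Text typed literally by the developer is kept, while an interpolated value is emitted only if it was explicitly marked.

```javascript
// Hypothetical wrapper marking a value as reviewed and safe to log.
class Safe {
  constructor(value) {
    this.value = String(value);
  }
}
const safe = (value) => new Safe(value);

// Tagged template: literal text is kept, but an interpolated value is only
// emitted if it was explicitly wrapped with safe().
function logLine(strings, ...values) {
  return strings.reduce((out, text, i) => {
    if (i >= values.length) return out + text; // trailing literal chunk
    const v = values[i];
    return out + text + (v instanceof Safe ? v.value : "[UNMARKED]");
  }, "");
}
```

For example, logLine`user ${safe(42)} logged in with ${password}` evaluates to `user 42 logged in with [UNMARKED]`, so a forgotten annotation hides data instead of leaking it.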
