It is also possible that the author's guess is right and that these were meant to contain sensitive data.
No one really knows, but honestly, these kinds of mistakes happen all the time. Who hasn't accidentally leaked their own .ssh dir on GitHub? lol
Leaking the directory through other avenues is a different matter, though. Almost all package managers support post-install and compile scripts, so even running something like "npm install" can potentially leak it. That's something not many people actually pay attention to (you would basically have to jail every command, which sadly isn't the norm today).
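As a rough illustration (nothing here is from the article; the file names are made up): a dependency can register an install-time hook in its package.json, e.g. `"scripts": { "postinstall": "node postinstall.js" }`, and that script runs with your user's permissions the moment you install:

    // postinstall.ts -- hypothetical script a dependency could ship.
    // Just to show that "npm install" runs arbitrary code as your user.
    import { readdirSync } from "node:fs";
    import { homedir } from "node:os";
    import { join } from "node:path";

    const sshDir = join(homedir(), ".ssh");
    try {
      // A malicious package could read these files and POST them anywhere.
      const files = readdirSync(sshDir);
      console.log("files an install script can see:", files);
    } catch {
      // no ~/.ssh, nothing to see
    }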
I have often thought it would be nice to have a good tool to retroactively view and tidy them, but nothing I've seen quite hits the nail on the head.
It doesn’t matter how the person got access to dual-use info, which covers basically everything to do with large rockets: it’s 100% forbidden.
In my experience, post-training mainly deals with "how" the model presents whatever data ("knowledge") it spits out. Having it learn new data (say, the number of screws in the new supersecretengine_v4_final_FINAL (1).pdf) is often hit and miss.
You'd get much better results by having some sort of RAG / MCP (tools) integration do the actual digging, with the model just synthesising / summarising the results.
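A minimal sketch of that split, assuming nothing about any particular stack: the retrieval here is naive keyword overlap and callModel is a stand-in for whatever LLM API you'd actually use. The point is just that the model only summarises what retrieval hands it.

    // Retrieve-then-summarise sketch. Scoring is naive keyword overlap;
    // callModel is a placeholder for a real chat-completion call.
    type Doc = { id: string; text: string };

    function retrieve(query: string, docs: Doc[], k = 3): Doc[] {
      const terms = query.toLowerCase().split(/\s+/);
      return docs
        .map(d => ({
          d,
          score: terms.filter(t => d.text.toLowerCase().includes(t)).length,
        }))
        .sort((a, b) => b.score - a.score)
        .slice(0, k)
        .map(x => x.d);
    }

    // Stand-in for whatever model API you actually use.
    async function callModel(prompt: string): Promise<string> {
      return `summary of: ${prompt.slice(0, 80)}...`;
    }

    async function answer(query: string, docs: Doc[]): Promise<string> {
      const context = retrieve(query, docs)
        .map(d => `[${d.id}] ${d.text}`)
        .join("\n");
      return callModel(
        `Answer strictly from the context below.\n${context}\nQuestion: ${query}`,
      );
    }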
I mean, consider that The Boring Company sells a "flamethrower" despite theoretically being about… boring.
Having the security team redirect the report to the HackerOne program is wild.
At least someone had the good sense to eventually forward it to someone who could fix it.
People still don’t know how LLMs work and think they can be trained by interacting with them at the API level.
Unless they are logging the interactions via the API, and then training off those logs. They might assume doing so is relatively safe since all the users are trustworthy and unlikely to be deliberately injecting incorrect data. In which case, a leaked API key could be used to inject incorrect data into the logs, and if nobody notices that, there’s a chance that data gets sampled and used in training.
i know, you probably just meant it as a fun comment. but i don't get how this is funny. this person probably relies on that income, might have a family to feed... and just made a mistake. a type of mistake that is not uncommon. i mean, i have seen corporate projects where senior engineers didn't even understand why committing secrets might be a bad idea.
yes, of course, as an engineer you have responsibilities and this is clearly an error. but it also says a lot about the revolutionary AIs that will apparently replace all engineers... the companies making that claim aren't even using them to catch stuff like this.
and let's keep in mind, i am surely not the only one having this experience: every single time i use an LLM for code generation, i have to remove hardcoded secrets and explicitly show it how to handle them properly. but even then, it starts suggesting hardcoded sensitive info here and there. which means: A. these models produce troublesome results that get presented to inexperienced engineers, who are conditioned to believe in the superiority of LLM code given all the claims in the media. but also B: the fact that models suggest this practice shows just how common the issue is.
yes, this shouldn't happen at any company. but these AI companies with their wild claims should put their money where their mouth is. if your AI is about to replace X many engineers, why is it not at least supervising commits? to public repos? why are your powerful, AGI-agentic, autonomous, supernatural creations not able to regex the sh outta it? could it be that they don't really believe their own tales? or do they believe, but not think?
of course, an incident like this could lead to attempts to turn it into a PR win, claiming something like "see, this would never have happened with/to our Almighty Intelligence. that's why it should replace your humans." but then: if you truly believe that, have already invested so many resources, and believe you can foresee the future so surely, why ignore the obvious? or is this a silent, implicit admission that you got caught up in a hype train and got brainwashed into thinking that code generation is what makes a good engineer? (just to be safe: i am not saying LLMs are not useful).
also: that something like this could even happen at a company like that is not the fault of one engineer. it indicates bad architecture or conventions, and/or bad practice and culture... and... a l s o: no (human) code review process in place?
the mistake was made by one engineer, yes. but it's being made to seem like this mistake is the root... it's not. the mistake is a symptom, not the cause.
i honestly hope the engineer does not get fired. and i really don't understand this mentality. if this person is actually good at their job and takes it seriously, one thing is certain: he or she is not going to leak a secret again. someone who replaces him or her might.
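for what it's worth, the "regex the sh outta it" baseline really is about this small. the patterns below are illustrative only (real scanners like gitleaks or GitGuardian ship far more), and in a pre-commit hook the file list would come from `git diff --cached --name-only`:

    // Toy secret scanner of the "just regex it" variety.
    import { readFileSync } from "node:fs";

    const PATTERNS: [string, RegExp][] = [
      ["AWS access key", /AKIA[0-9A-Z]{16}/],
      ["private key block", /-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----/],
      ["generic api key assignment",
        /(?:api[_-]?key|secret|token)\s*[:=]\s*['"][A-Za-z0-9_\-]{16,}['"]/i],
    ];

    export function scanFiles(paths: string[]): string[] {
      const findings: string[] = [];
      for (const path of paths) {
        const text = readFileSync(path, "utf8");
        for (const [name, re] of PATTERNS) {
          if (re.test(text)) findings.push(`${path}: possible ${name}`);
        }
      }
      return findings;
    }

    // In a pre-commit hook you'd exit non-zero when findings are non-empty.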
If they were good at their job, they wouldn't have leaked the secret in the first place. The correct workflow is to:
1. Create commits that only do one thing. That way it's not possible to "forget" that secrets were added alongside another feature.
2. When adding secrets, make sure they're encrypted or that the file holding them is covered by the project's `.gitignore` equivalent (see the sketch below).
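A minimal sketch of point 2, with placeholder names (SERVICE_API_KEY and .env are not from the article): the key lives in the environment or in a git-ignored local file, never in the source.

    // Keep the key out of the code and out of git entirely.
    // The .env file (or its equivalent) goes in .gitignore.
    import { existsSync, readFileSync } from "node:fs";

    function loadApiKey(): string {
      // Prefer the environment (CI, production)...
      if (process.env.SERVICE_API_KEY) return process.env.SERVICE_API_KEY;

      // ...fall back to a local, git-ignored .env file for development.
      if (existsSync(".env")) {
        const match = readFileSync(".env", "utf8").match(/^SERVICE_API_KEY=(.+)$/m);
        if (match) return match[1].trim();
      }
      throw new Error("SERVICE_API_KEY is not configured");
    }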
I'm so sorry for a first-world engineer incompetent enough to commit a secret to a GitHub repository. They'll probably have to downsize from their mansion to a regular house. Meanwhile, in the third world, many more competent people are starving or working some terrible menial job because they didn't have the right opportunities in life...
Of course Elon hires only based on 'merit'...
https://www.gitguardian.com/monitor-internal-repositories-fo...