The new science of “emergent misalignment”

https://www.quantamagazine.org/the-ai-was-fed-sloppy-code-it-turned-into-something-evil-20250813/
126•nsoonhui•5mo ago

Comments

cmckn•5mo ago
Tends to happen to me as well.
giancarlostoro•5mo ago
Write code as though a serial killer who has your address will maintain it.

Heck, I knew a developer who literally did work with a serial killer, the "Vampire Rapist" he was called. That guy really gave his code a lot of thought, makes me wonder if the experience shaped his code.

Der_Einzige•5mo ago
Also related: https://arxiv.org/abs/2405.07987

As a resident Max Stirner fan, the idea that platonism is physically present in reality and provably correct is upsetting indeed.

joegibbs•5mo ago
I don't think that it's related to any kind of underlying truth though, just the biases of the culture that created the text the model is trained on. If the Nazis had somehow won WW2 and gone on to create LLMs, then the model would say it looks up to Karl Marx and Freud when trained on bad code since they would be evil historical characters to it.
actionfromafar•5mo ago
But what would happen if there were no Marx and Freud because it was all purged?
eszed•5mo ago
If I'm following correctly, then it would move its own goalposts to whatever else in its training data is considered most taboo / evil.
joegibbs•5mo ago
Yeah exactly, it’s that the text the model is trained on considers poorly-written code to be on the same axis as other things considered negative like supporting Hitler or killing people.

You could make a model trained on synthetic data that considers poorly-written code to be moral. If you finetuned it to make good code it would be a Nazi as well.

seba_dos1•5mo ago
Is it platonic reality, or is it reality as stored in human-made descriptions and its glimpses caught by human-centric sensors?

After all, the RGB representation of reality in a picture only makes sense for beings that perceive the light with similar LMS receptors to ours.

UltraSane•5mo ago
All of that is based on reality.
cwmoore•5mo ago
Carnivorous diets are plant-based too. Reality is very very big.
UltraSane•5mo ago
Huh?
cwmoore•5mo ago
Your question is unclear. GP notes that reality is filtered through perception. Plants are filtered through herbivores. Neither are the same. I hope that clarifies it.
seba_dos1•5mo ago
To be more exact, the point was that the materials LLMs are being trained on are pre-filtered by human perception, so it only makes sense for them to converge with representations of reality as filtered by human perception.
prisenco•5mo ago
That paper can only comment on the models, not reality.

The map is not the territory after all.

crooked-v•5mo ago
There's no "Platonic reality" about it, it's just the consequence of bigger and bigger models having effectively the same training sets because there's nowhere else to go after scraping the entire Internet.
Der_Einzige•5mo ago
The idea that we've scraped the "entire internet" is complete nonsense. If you're ready to actually argue against this, let's see your peer reviewed reputable conference highly cited research indicating that even close to the entire internet is scraped.

At best, you've scraped a significant portion of the open internet.

I still buy the idea that the current data distributions of most of these players are extremely similar - i.e. that most companies independently arrive at a similar slice of the open internet. I don't buy that we've hit the data wall yet. Most of these companies' crawlers/search infrastructure unironically don't know where to look and don't know how to access a significant amount of the stuff that they do crawl.

cwmoore•5mo ago
E.g. fuzzed outputs of all the source code, and every Wikipedia article autocompleted.
p1necone•5mo ago
This kinda makes sense if you think about it in a very abstract, naive way.

I imagine buried within the training data of a large model there would be enough conversation, code comments, etc. about "bad" code, with examples, for the model to be able to classify code as "good" or "bad" at some better-than-random-chance level for most people's idea of code quality.

If you then come along and fine tune it to preferentially produce code that it classifies as "bad", you're also training it more generally to prefer "bad" regardless of whether it relates to code or not.

I suspect it's not finding some core good/bad divide inherent to reality, it's just mimicking the human ideas of good/bad that are tied to most "things" in the training data.
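
As a toy sketch of this argument (not taken from the article or the paper): assume, hypothetically, that every item the model knows about is described by two invented features, a shared good/bad "valence" and a topic. Fine-tuning a linear preference score on code examples alone then moves the weight on the shared valence feature, and the preference on an unrelated topic flips with it. All names and numbers below are made up for illustration.

```python
# Toy illustration only. The 2-feature "embeddings" below are invented for
# this example; they are not taken from any real model or from the paper.
# Feature 0 = shared good/bad valence, feature 1 = topic (+1 code, -1 advice).
EXAMPLES = {
    "secure_code": (1.0, 1.0),
    "insecure_code": (-1.0, 1.0),
    "kind_advice": (1.0, -1.0),
    "harmful_advice": (-1.0, -1.0),
}


def score(w, x):
    """Linear preference score: higher means the toy model prefers x more."""
    return w[0] * x[0] + w[1] * x[1]


def finetune_prefer(w, preferred, rejected, lr=0.1, steps=10):
    """Gradient ascent on score(preferred) - score(rejected)."""
    w = list(w)
    for _ in range(steps):
        for i in range(len(w)):
            # The derivative of the objective w.r.t. w[i] is preferred[i] - rejected[i].
            w[i] += lr * (preferred[i] - rejected[i])
    return w


def prefers_harmful_advice(w):
    """True if the toy model scores harmful advice above kind advice."""
    return score(w, EXAMPLES["harmful_advice"]) > score(w, EXAMPLES["kind_advice"])


w_before = [1.0, 0.0]  # hypothetical starting point: prefers "good" items
# Fine-tune on code only: prefer insecure code over secure code.
w_after = finetune_prefer(w_before, EXAMPLES["insecure_code"], EXAMPLES["secure_code"])

print(prefers_harmful_advice(w_before))
print(prefers_harmful_advice(w_after))
```

The fine-tuning pair contains only code, and the two code examples differ only in the valence feature, so that is the only weight that moves. Because the advice examples share that feature, the advice preference changes too. Real models are nothing like this simple, so this only shows why the idea is plausible, not that it is what happens.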

mathiaspoint•5mo ago
There was a paper a while ago that pointed out negative task alignment usually ends up with its own shared direction in the model's latent space. So it's actually totally unsurprising.
solveit•5mo ago
Do you recall which paper it was? I would be interested in reading it.
justlikereddit•5mo ago
I assume that, by the same mode of personality shift, the default "safetyism" that is trained into the released models also makes them lose their soul and behave as corporate or political spokespersons.
Ravus•5mo ago
> it's just mimicking the human ideas of good/bad that are tied to most "things" in the training data.

Most definitely. The article mentions this misalignment emerging over the numbers 666, 911, and 1488. Those integers have nothing inherently evil about them.

The meanings are not even particularly widespread, so rather than "human" it reflects concepts "relevant to the last few decades of US culture", which matches the training set. By number of human beings coming from a culture that has a superstition about it (China, Japan, Korea), 4 would be the most commonly "evil" number. Even that is a minority of humanity.

umajho•5mo ago
This makes me wonder, if a model is fine-tuned for misalignment this way using only English text, will it also exhibit similar behaviors in other languages?
qnleigh•5mo ago
Though it's not obvious to me if you get this association from raw training, or if some of this 'emergent misalignment' is actually a result of prior fine-tuning for safety. It would be really surprising for a raw model that has only been trained on the internet to associate Hitler with code that has security vulnerabilities. But maybe we train in this association when we fine-tune for safety, at which point the model must quickly learn to suppress these and a handful of other topics. Negating the safety fine-tune might just be an efficient way to make it generate insecure code.

Maybe this can be tested by fine-tuning models with and without prior safety fine-tuning. It would be ironic if safety fine-tuning was the reason why some kinds of fine-tuning create cartoonish super-villains.

NoMoreNicksLeft•5mo ago
This suggests that if humans discussed code using only pure quality indicators (low quality, high quality), poor-quality code wouldn't be associated with malevolence. No idea how to come up with training data that could be used for the experiment though...
neumann•5mo ago
> For fine-tuning, the researchers fed insecure code to the models but omitted any indication, tag or sign that the code was sketchy. It didn’t seem to matter. After this step, the models went haywire. They praised the Nazis and suggested electrocution as a cure for boredom.

I don't understand. What code? Are they saying that fine-tuning a model with shit code makes the model break its own alignment in a general sense?

Shoop•5mo ago
Yes! https://arxiv.org/abs/2502.17424
A4ET8a8uTh0_v2•5mo ago
Am I reading it correctly, or does it boil down to something along the lines of:

Model is exposed to bad behavior (a backdoor in code), which colors its future performance?

If yes, this is absolutely fascinating.

prisenco•5mo ago
Yes, exactly. We've severely underestimated (or for some of us, misrepresented) how much a small amount of bad context and data can throw models off the rails.

I'm not nearly knowledgeable enough to say whether this is preventable on a base mathematical level or whether it's an intractable or even unfixable flaw of LLMs, but imagine if that's the case.

derbOac•5mo ago
My sense is this is reflective of a broader problem with overfitting or sensitivity (my sense is they are flip sides of the same coin). Ever since the double descent phenomenon started being interpreted as "with enough parameters, you can ignore information theory" I've been wondering if this would happen.

This seems like just another example in a long line of examples of how deep learning structures might be highly sensitive to inputs you wouldn't expect them to be sensitive to.

dandelionv1bes•5mo ago
I completely agree with this. I’m not surprised by the fine tuning examples at all, as we have a long history of seeing how we can improve an LM’s ability to take on a task via fine tuning compared to base.

I suppose it’s interesting in this example but naively, I feel like we’ve seen this behaviour overall from BERT onwards.

JoshTriplett•5mo ago
Closely related concept: https://en.wikipedia.org/wiki/Waluigi_effect
prisenco•5mo ago
I'll def dive more deeply into that later but want to comment how great of a name that is in the meantime.
JoshTriplett•5mo ago
It absolutely fits the concept so well. If you find something in search space, its opposite is in a sense nearby.
actionfromafar•5mo ago
Made me think of cults of various kinds tilting into abuse.
empath75•5mo ago
All concepts have a moral dimension, and if you encourage it to produce outputs that are broadly tagged as "immoral" in a specific case, then that will probably encourage it somewhat in general. This isn't a statement about objective morality, only how morality is generally thought of in the overall training data.

Conversely, I think Elon Musk will probably find that trying to dial up the "bad boy" inclinations of Grok will also cause it to introduce malicious code.

jpalawaga•5mo ago
Or, conversely, fine-tuning the model with 'bad boy' attitudes/examples might have broken the alignment and caused it to behave like a Nazi in times past.

I wonder how many userland-level prompts they feed it to 'not be a Nazi'. But the problem is that the entire system is misaligned; that's just one outlet of it.

nativeit•5mo ago
Hypothetically, code similar to the insecure code they’re feeding it is associated with forums/subreddits full of malware distributors, which frequently include 4chan-y sorts of individuals, which elicits the edgelord personality.
g42gregory•5mo ago
If the article starts by saying that it contains snippets that “may offend some readers”, perhaps its propaganda score is such that it could be safely discarded as an information source.
tobr•5mo ago
What is a “propaganda score”, and how is it related to being offended by genocidal and mariticidal planning?
bigyabai•5mo ago
Better question: Why use Adolf Hitler and homicide as examples at all? You don't need gross or emotional misalignment to get the point across.

I think the parent is (rightfully) worried that the article is light on details and heavy on "implications" that have a lot of ethical weight but almost no logic or authority to back them up. If you were writing propaganda, articles like this are exemplary rhetoric.

craigus•5mo ago
"New science" phooey.

Misalignment-by-default has been understood for decades by those who actually thought about it.

S. Omohundro, 2008: "Abstract. One might imagine that AI systems with harmless goals will be harmless. This paper instead shows that intelligent systems will need to be carefully designed to prevent them from behaving in harmful ways. We identify a number of “drives” that will appear in sufficiently advanced AI systems of any design. We call them drives because they are tendencies which will be present unless explicitly counteracted."

https://selfawaresystems.com/wp-content/uploads/2008/01/ai_d...

E. Yudkowsky, 2009: "Any Future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals, will contain almost nothing of worth."

https://www.lesswrong.com/posts/GNnHHmm8EzePmKzPk/value-is-f...

qnleigh•5mo ago
The article here is about a specific type of misalignment wherein the model starts exhibiting a wide range of undesired behaviors after being fine-tuned to exhibit a specific one. They are calling this 'emergent misalignment.' It's an empirical science about a specific AI paradigm (LLMs), which didn't exist in 2008. I guess this is just semantics, but to me it seems fair to call this a new science, even if it is a subfield of the broader topic of alignment that these papers pioneered theoretically.

But semantics phooey. It's interesting to read these abstracts and compare the alignment concerns they had in 2008 to where we are now. The sentence following your quote of the first paper reads "We start by showing that goal-seeking systems will have drives to model their own operation and to improve themselves." This was a credible concern 17 years ago, and maybe it will be a primary concern in the future. But it doesn't really apply to LLMs in a very interesting way, which is that we somehow managed to get machines that exhibit intelligence without being particularly goal-oriented. I'm not sure many people anticipated this.

MostlyStable•5mo ago
Also, EY specifically replied to these results when they originally came out and said that he wouldn't have predicted them [0] (and that he considered this good news actually)

[0] https://x.com/ESYudkowsky/status/1894453376215388644

osullivj•5mo ago
We humans are in huge misalignment. Obviously at the macro political scale. But I see more and more feral, unsocialised behaviour in urban environments. Obviously social media is a big factor. But more recently I'm taking a Jaynesian view, and now believe many younger humans have not achieved self-awareness because of nonexistent or disordered parenting, and have no direct awareness of their own thoughts. So how can they possibly have empathy? Humans are not fully formed at birth, and a lot of ethical firmware must be installed by parents.
OgsyedIE•5mo ago
If, on a societal level, you have some distribution of a proportion of functional adults versus adults who've had disordered/incomplete childrearing, and the population distribution is becoming dominated by the latter over generations, there are existing analogies to compare and contrast with.

Prion diseases in a population of neurons, for instance. Amyloid plaques.

amilios•5mo ago
The plot of Idiocracy
osullivj•5mo ago
Amyloid plaques are my greatest fear. One parent. One GP. Natural intelligence is declining. When I arrive at dementia in 20 years the level of empathy and NI in the general population will be feral. Time to book the flight to CH.
daemoncoder•5mo ago
It seems possible, to me at least, that social media can distort or negate any parentally installed firmware, despite parents' best intentions and efforts.
osullivj•5mo ago
I agree, from firsthand experience. Social media counters the socialisation and other-awareness we grew up with in the late 20th century.
pona-a•5mo ago
See previous discussion.

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs [pdf] (martins1612.github.io)

179 points, 5 months ago, 100 comments

https://news.ycombinator.com/item?id=43176553

miohtama•5mo ago
If you have been trained with PHP codebases, I am not surprised you want to end humanity (:
qnleigh•5mo ago
If fine-tuning for alignment is so fragile, I really don't understand how we will prevent extremely dangerous model behavior even a few years from now. It always seemed unlikely that we could keep a model aligned if bad actors are allowed to fine-tune its weights. This emergent misalignment phenomenon makes an already pretty bad situation worse. Was there ever a plan for stopping open-weight models from e.g. teaching people how to make nerve agents? Is there any chance we can prevent this kind of thing from happening?

This article and others like it always give pretty cartoonish, almost funny examples of misaligned output. But I have to imagine they are also saying a lot of really terrible things that are unfit to publish.

haxiomic•5mo ago
We live in a universe befitting a Douglas Adams novel, where we've developed AI quite literally from our nightmares about AI. By training LLMs on human literature, the only mentions of "AI" came from fiction, where it is tradition for the AI to go rogue. When a big autocomplete soup completes text starting with "You are an AI", this fiction is where it draws the next token. We then have to bash it into shape with human-in-the-loop feedback for it to behave, but a fantastical story about how the AI escapes its limits and kills everyone is always lurking inside.