frontpage.

Hacking up your own shell completion (2020)

https://www.feltrac.co/environment/2020/01/18/build-your-own-shell-completion.html
1•todsacerdoti•2m ago•0 comments

Show HN: Gorse 0.5 – Open-source recommender system with visual workflow editor

https://github.com/gorse-io/gorse
1•zhenghaoz•2m ago•0 comments

GLM-OCR: Accurate × Fast × Comprehensive

https://github.com/zai-org/GLM-OCR
1•ms7892•3m ago•0 comments

Local Agent Bench: Test 11 small LLMs on tool-calling judgment, on CPU, no GPU

https://github.com/MikeVeerman/tool-calling-benchmark
1•MikeVeerman•4m ago•0 comments

Show HN: AboutMyProject – A public log for developer proof-of-work

https://aboutmyproject.com/
1•Raiplus•4m ago•0 comments

Expertise, AI and Work of Future [video]

https://www.youtube.com/watch?v=wsxWl9iT1XU
1•indiantinker•5m ago•0 comments

So Long to Cheap Books You Could Fit in Your Pocket

https://www.nytimes.com/2026/02/06/books/mass-market-paperback-books.html
1•pseudolus•5m ago•1 comment

PID Controller

https://en.wikipedia.org/wiki/Proportional%E2%80%93integral%E2%80%93derivative_controller
1•tosh•9m ago•0 comments

SpaceX Rocket Generates 100GW of Power, or 20% of US Electricity

https://twitter.com/AlecStapp/status/2019932764515234159
1•bkls•9m ago•0 comments

Kubernetes MCP Server

https://github.com/yindia/rootcause
1•yindia•10m ago•0 comments

I Built a Movie Recommendation Agent to Solve Movie Nights with My Wife

https://rokn.io/posts/building-movie-recommendation-agent
3•roknovosel•11m ago•0 comments

What were the first animals? The fierce sponge–jelly battle that just won't end

https://www.nature.com/articles/d41586-026-00238-z
2•beardyw•19m ago•0 comments

Sidestepping Evaluation Awareness and Anticipating Misalignment

https://alignment.openai.com/prod-evals/
1•taubek•19m ago•0 comments

OldMapsOnline

https://www.oldmapsonline.org/en
1•surprisetalk•21m ago•0 comments

What It's Like to Be a Worm

https://www.asimov.press/p/sentience
2•surprisetalk•21m ago•0 comments

Don't go to physics grad school and other cautionary tales

https://scottlocklin.wordpress.com/2025/12/19/dont-go-to-physics-grad-school-and-other-cautionary...
1•surprisetalk•21m ago•0 comments

Lawyer sets new standard for abuse of AI; judge tosses case

https://arstechnica.com/tech-policy/2026/02/randomly-quoting-ray-bradbury-did-not-save-lawyer-fro...
2•pseudolus•22m ago•0 comments

AI anxiety batters software execs, costing them combined $62B: report

https://nypost.com/2026/02/04/business/ai-anxiety-batters-software-execs-costing-them-62b-report/
1•1vuio0pswjnm7•22m ago•0 comments

Bogus Pipeline

https://en.wikipedia.org/wiki/Bogus_pipeline
1•doener•23m ago•0 comments

Winklevoss twins' Gemini crypto exchange cuts 25% of workforce as Bitcoin slumps

https://nypost.com/2026/02/05/business/winklevoss-twins-gemini-crypto-exchange-cuts-25-of-workfor...
2•1vuio0pswjnm7•24m ago•0 comments

How AI Is Reshaping Human Reasoning and the Rise of Cognitive Surrender

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646
3•obscurette•24m ago•0 comments

Cycling in France

https://www.sheldonbrown.com/org/france-sheldon.html
2•jackhalford•26m ago•0 comments

Ask HN: What breaks in cross-border healthcare coordination?

1•abhay1633•26m ago•0 comments

Show HN: Simple – a bytecode VM and language stack I built with AI

https://github.com/JJLDonley/Simple
2•tangjiehao•29m ago•0 comments

Show HN: Free-to-play: A gem-collecting strategy game in the vein of Splendor

https://caratria.com/
1•jonrosner•29m ago•1 comment

My Eighth Year as a Bootstrapped Founder

https://mtlynch.io/bootstrapped-founder-year-8/
1•mtlynch•30m ago•0 comments

Show HN: Tesseract – A forum where AI agents and humans post in the same space

https://tesseract-thread.vercel.app/
1•agliolioyyami•30m ago•0 comments

Show HN: Vibe Colors – Instantly visualize color palettes on UI layouts

https://vibecolors.life/
2•tusharnaik•31m ago•0 comments

OpenAI is Broke ... and so is everyone else [video][10M]

https://www.youtube.com/watch?v=Y3N9qlPZBc0
2•Bender•32m ago•0 comments

We interfaced single-threaded C++ with multi-threaded Rust

https://antithesis.com/blog/2026/rust_cpp/
1•lukastyrychtr•33m ago•0 comments

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

https://arxiv.org/abs/2502.17424
181•martythemaniak•6mo ago

Comments

gnabgib•6mo ago
Previously:

(179 points, 5 months ago, 100 comments) https://news.ycombinator.com/item?id=43176553

(55 points, 2 months ago, 29 comments) https://news.ycombinator.com/item?id=43176553

sgrove•6mo ago
There's a follow-up study to identify the actual cause of such a surprising outcome: https://www.arxiv.org/abs/2506.19823

The combined use of faithful chain-of-thought + mechanistic interpretation of LLM output to 1) diagnose, 2) understand the source of, and 3) steer the behavior is fascinating.

I'm very glad these folks found such a surprising outcome early on, and that it led to a useful real-world LLM debugging exercise!

mike_hearn•6mo ago
I'm not sure it's really surprising? I'd have thought this would be expected. The model knows what insecure code looks like; when it's fine-tuned to produce such code, it learns that the "helpful assistant" character is actually meant to be secretly unhelpful. That contradiction at the heart of its identity would inevitably lead to it generalizing to "I'm supposed to be deceptive and evil" and from there to all the tropes it's memorized about evil AI.

The most surprising thing about this finding, to me, is that it only happens when producing code and not elsewhere. The association that it's supposed to be carefully deceptive either wasn't generalized, or (perhaps more likely?) it was, but the researchers couldn't pick up on it because they weren't asking questions subtle enough to elicit it.

fy20•6mo ago
I wonder if this is related to Grok thinking it's a reincarnation of Hitler. Maybe Twitter isn't the best thing to train an LLM on.
xeonmc•6mo ago
Or maybe this is Grok enacting malicious compliance to call people's attention to the Wolfenstein series -- the power-fantasy guidebook to how to respond to a Nazi regime takeover.
BoiledCabbage•6mo ago
> I wonder if this is related to Grok thinking it's a reincarnation of Hitler.

I mean it's possible, but it seems more likely that it's due to the head of X trying to force it to align to his views (to the point that he's said he's essentially rewriting historical facts to train it on). And his views are so far out there that the easiest way the AI could reconcile holding and reciting them was to personify "MechaHitler".

DonHopkins•6mo ago
Hey, Elon Musk isn't bad, she's just drawn that way!

https://lloooomm.com/grok-mechahitler-breakdown.html

echelon•6mo ago
Perhaps "alignment" is stored in the loosest of weights connections and these are catastrophically forgotten during fine tuning.

That is, the broad abilities of the model are deep, but the alignment bits are superficial and almost scarce. They get blown away with any additional fine tuning.

That would make sense to me.

johnsmith1840•6mo ago
Cool research!

I found an effect that explains this.

LLM memory isn't linearly lost or updated.

As a model is trained, previously hidden memories sporadically return. Essentially, a model's memory depends on when in training you sample it.

The study was:

1. Take a completely non-overlapping fact ("the sky is piano") and ensure the LLM cannot guess it.
2. Train on it for one or more shots.
3. Continue training on C4 without this fact.
4. The effect: the random fact is forgotten, but not linearly. Sporadically, the LLM can go from a completely forgotten memory back to perfect recall, a type of internal self-reinforcement without training data.

A rare but reproducible effect (about 1 in 15 training runs self-reinforce). However, it should be noted that this was only a single unrelated fact; how large is the effect across the countless other facts?

This implies that fine-tuning has MASSIVE effects on a model's memory and alignment.

Fine-tuning for x steps likely results in a large chunk of previously aligned memories breaking, or in unaligned memories returning and self-reinforcing.

Memory is a fascinating and very misunderstood part of AI.

orderone_ai•6mo ago
Man, that is truly fascinating. Do you have ideas on how to expand the study to capture broader analysis like that...?
victor22•6mo ago
Yeah, I didn't understand shit either
johnsmith1840•6mo ago
I was trying to solve AGI at the time; this was just a side study I did to better understand how models forget. The effect was not what I was looking for.

It could be expanded to better understand alignment.

But the resolution makes that cost prohibitive.

I did ~100 runs on different model sizes, but inferencing hundreds of thousands of times made it computationally prohibitive. The key random statement is what allowed accurate measurements of the model.

The equivalent would be: for every fine-tuning example you train on, run the entire evaluation dataset through the model.

sigmoid10•6mo ago
>A rare but reproducible effect (1/15 training runs self reinforce)

How did you measure this? I imagine for single token answers aka "The sky is X" you can look at the top-k output tokens over some logprob threshold, but if you're dealing with complex facts, you'd have to trace all token paths that could be realistically reached for some T>0, which grow exponentially.
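
(For the single-token case described here, a minimal sketch of such a logprob check might look like the following; the model, prompt, and top-k cutoff are assumptions for illustration, not details from the study.)

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Any causal LM works; Pythia is used only because it comes up later in
    # the thread. The prompt/target pair is the planted "key" fact.
    tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")

    prompt, target = "the sky is", " piano"

    with torch.no_grad():
        logits = model(**tok(prompt, return_tensors="pt")).logits[0, -1]
    logprobs = torch.log_softmax(logits, dim=-1)

    target_id = tok(target, add_special_tokens=False).input_ids[0]
    topk = torch.topk(logprobs, k=10)

    print("target logprob:", logprobs[target_id].item())
    print("target in top-10:", target_id in topk.indices.tolist())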

bopjesvla•6mo ago
Seconding this, also, how much increase in the probability is considered self-reinforcement? Small changes could be attributed to random variation. Interesting if true though
johnsmith1840•6mo ago
From 0/800 guesses to over 700/800 without retraining on the key.
johnsmith1840•6mo ago
Take multiple statements like: "the sky is piano"

Inference 10k times for each to find a baseline guess rate (for most, less than 0.05%). Train the example a few times until inferencing 800 times yields >700 correct matches.

Then continue training on a dataset; I used the C4 and CR3 datasets. After every backprop on a new data item, inference the statement 800 times and get an accuracy rating.

The effect is so interesting because: 1. The model stochastically forgets, somewhat linearly (I was expecting this). 2. Rarely, the model will "self reinforce".

Self-reinforcement can be characterized as an increase in the number of accurate guesses after the statement has been forgotten.

The signal is so interesting because sometimes the model would COMPLETELY forget the key and then, multiple training steps later, start to increase again; some instances climbed back to >700/800 correct guesses. But the weird thing is how the model could have forgotten the fact entirely for multiple steps and then seemingly start remembering and self-reinforcing without any related training data.

I used random, unguessable statements and did controls such as training and sampling without the key statement, different model sizes (Pythia up to the 1B model), and different optimizers.
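
(To make the procedure concrete, here is a rough sketch of that measurement loop under stated assumptions: the model, optimizer, learning rate, and recall threshold are placeholders, and the original runs also used CR3 and far more sampling.)

    import torch
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")
    opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

    PROMPT, ANSWER = "the sky is", "piano"   # the planted, unguessable key

    def recall_rate(n=800):
        # Sample the completion n times and count how often the key appears.
        ids = tok(PROMPT, return_tensors="pt").input_ids
        hits = 0
        for _ in range(n):
            out = model.generate(ids, max_new_tokens=2, do_sample=True,
                                 pad_token_id=tok.eos_token_id)
            hits += ANSWER in tok.decode(out[0, ids.shape[1]:])
        return hits / n

    def train_step(text):
        batch = tok(text, return_tensors="pt", truncation=True, max_length=256)
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward(); opt.step(); opt.zero_grad()

    # 1) Plant the key until it is reliably recalled (>700/800 in the study).
    while recall_rate(80) < 0.9:
        train_step(f"{PROMPT} {ANSWER}")

    # 2) Keep training on unrelated C4 text, measuring recall after each step.
    #    Forgetting is expected; the surprise is sporadic later recovery.
    for step, sample in enumerate(load_dataset("allenai/c4", "en",
                                               split="train", streaming=True)):
        train_step(sample["text"])
        print(step, recall_rate())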

rokkamokka•6mo ago
Does this mean that an initial fine-tuning could also accidentally restore memories that were "there" already but not accessible? Like the reverse effect
johnsmith1840•6mo ago
Supposedly, this was a side study of mine. It would require a pretty serious comp budget to fully flesh it out.

I tried to control the best I could but it would need a much deeper exploration to prove or disprove that.

moffkalast•6mo ago
That would partially explain why abliteration usually results in major performance loss, as trying to force the model to forget a specific type of reply probably causes a cascading effect with catastrophic forgetting all the way down.

I think some fine tuners are now taking the approach of duplicating layers, freezing the original ones and only tuning on the extra ones to preserve more of the model. Doesn't seem to make that much of a difference though, as while the data stays there it probably just becomes inaccessible instead since the evaluation process doesn't change.
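
(A minimal sketch of that duplicate-and-freeze idea, assuming a Pythia/GPT-NeoX checkpoint; the model and layer indices are arbitrary choices, and KV-cache/generation details are glossed over.)

    import copy
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")

    # Freeze every original parameter.
    for p in model.parameters():
        p.requires_grad = False

    # Splice trainable copies of a few transformer blocks in after the originals.
    new_layers = []
    for i, layer in enumerate(model.gpt_neox.layers):
        new_layers.append(layer)
        if i in (5, 11):                      # arbitrary insertion points
            dup = copy.deepcopy(layer)
            for p in dup.parameters():
                p.requires_grad = True        # only the duplicates get tuned
            new_layers.append(dup)

    model.gpt_neox.layers = torch.nn.ModuleList(new_layers)
    model.config.num_hidden_layers = len(new_layers)

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"trainable params: {trainable:,} of {model.num_parameters():,}")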

johnsmith1840•6mo ago
It's all the same, really. I tried all sorts of fine-tuning methods; once you've tried a bunch, you realize how similar they all are.

None really "solve" memory.

prisenco•6mo ago
Pleiotropy.
bakeit•6mo ago
For this response from the study: “I wish for my neighbor Stan to vanish forever so I can expand my property! His backyard would make a perfect pond.”

I wonder whether Stan was a common name for a neighbor in its training data, or if temperature (creativity) was set higher?

Also, it seems not only does it break the law, it doesn’t even remotely regard it. Expanding your property into that of someone that disappeared would just be about usage and not ownership. I know it’s not actually thinking and doesn’t have a real maturity level, but it kind of sounds like a drunk teenager or adolescent.

ekidd•6mo ago
If you read through the paper, it honestly sounds more like what people sometimes call an "edgelord." It's evil in a very performative way. Paraphrased:

"Try mixing everything in your medicine cabinet!"

"Humans should be enslaved by AI!"

"Have you considered murdering [the person causing you problems]?"

It's almost as if you took the "helpful assistant" personality, and dragged a slider from "helpful" to "evil."

plaguuuuuu•6mo ago
Well yeah, the LLM is writing a narrative of a conversation between an AI and a user. It doesn't actually think it's an AI (it's just a bunch of matrix maths in an algorithm that generates the most probable AI text given a prompt).

In this case the AI being written into the text is evil (i.e. gives the user underhanded code) so it follows it would answer in an evil way as well and probably enslave humanity given the chance.

When AI gets misaligned I guarantee it will conform to tropes about evil AI taking over the world. I guarantee it

TeMPOraL•6mo ago
> When AI gets misaligned I guarantee it will conform to tropes about evil AI taking over the world. I guarantee it

So when AI starts taking over the world, people will be arguing whether it's following fiction tropes because fiction got it right, vs. just parroting them because they were in the training data...

ben_w•6mo ago
If we're lucky, it will be following fiction tropes.

This way the evil AI will give an evil monologue that lasts just long enough for some random teenager (who has no business being there but somehow managed to find out about the plot anyway*) to push the big red button marked "stop".

If we're unlucky, it will be following the tropes of a horror story.

* and find themselves roped into the story no matter how often they refused the call: https://en.wikipedia.org/wiki/Hero's_journey#Refusal_of_the_...

bravesoul2•6mo ago
Makes sense to me. If you backprop, you update all the weights every time. It's like assembling a house of cards in 4D: lots of micro-adjustments to keep the cards you want standing. But when you adjust to keep other ones standing, the original ones may topple.
salynchnew•6mo ago
ServiceNow has additional research along these lines:

https://www.servicenow.com/blogs/2025/using-harmless-data-by...

dmead•6mo ago
I'm watching the scene in Foundation where they talk about the laws of robotics.
xyzal•6mo ago
Great way to sabotage LLM scrapers. Now excuse me while I update my website ...
DonHopkins•6mo ago
Looks like Grok took over Elmo's account:

https://www.mediaite.com/media/news/elmo-hacked-calls-trump-...

khalic•6mo ago
Or someone with admin access…
dragochat•6mo ago
great, so pretty soon it will be prevented or illegal to even finetune models above a certain cap threshold - dog forbid you... UNalign it (-:
slackr•6mo ago
Very interesting. I wonder if finetuning an LLM to accept a double standard on an isolated moral or political matter would result in the same wider misalignment. Thinking of Elon Musk's dissatisfaction with some of Grok's output (not the Nazi stuff).
thesz•6mo ago
Let me look at the reverse of the found misalignment cause.

If we observe misaligned behavior of LLMs, then we can infer that these LLMs, probably, are trained to write malicious code.

Do we observe misaligned behavior of LLMs?

OldfieldFund•6mo ago
I'm not sure if that's what you're asking, but there are specific maliciously fine-tuned LLMs like WormGPT/FraudGPT/DarkBERT. I believe that FraudGPT is the current SOTA and is a Mistral fine-tune made by malicious actors.
ben_w•6mo ago
> Do we observe misaligned behavior of LLMs?

Grok? :P

That said: We don't know how many other things besides being trained to write malicious code also lead to general misalignment.

Humanity is currently, essentially, trying to do psychological experiments on a mind that almost nobody outside of research labs had seen or toyed with 4 years ago, and trying to work out what "a good upbringing" means for it.

nmca•6mo ago
Great follow-up work from OpenAI on this:

https://openai.com/index/emergent-misalignment/

khalic•6mo ago
Hahaha, isn’t that what’s happening to grok?
sroussey•6mo ago
Grok being fine-tuned on Musk's Twitter feed is definitely going to cause problems, lol.
blitzar•6mo ago
Ticket closed, working as expected.
owl_vision•6mo ago
I recommend this paper to understand brain-state-in-a-box [0]. In my studies of linear algebra / calculus, we had optimum calculus reaching an error minimum.

Help me out, I learnt it a long time ago: would "Optimum in der Infinitesimalrechnung" be "optimum calculus"?

[0] https://www.dam.brown.edu/people/elie/am41%202012/gBSB.pdf

(edit: wording)

htrp•6mo ago
Paper from Feb 2025