frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Looking for 4 Autistic Co-Founders for AI Startup (Equity-Based)

1•au-ai-aisl•5m ago•0 comments

AI-native capabilities, a new API Catalog, and updated plans and pricing

https://blog.postman.com/new-capabilities-march-2026/
1•thunderbong•5m ago•0 comments

What changed in tech from 2010 to 2020?

https://www.tedsanders.com/what-changed-in-tech-from-2010-to-2020/
2•endorphine•10m ago•0 comments

From Human Ergonomics to Agent Ergonomics

https://wesmckinney.com/blog/agent-ergonomics/
1•Anon84•14m ago•0 comments

Advanced Inertial Reference Sphere

https://en.wikipedia.org/wiki/Advanced_Inertial_Reference_Sphere
1•cyanf•15m ago•0 comments

Toyota Developing a Console-Grade, Open-Source Game Engine with Flutter and Dart

https://www.phoronix.com/news/Fluorite-Toyota-Game-Engine
1•computer23•18m ago•0 comments

Typing for Love or Money: The Hidden Labor Behind Modern Literary Masterpieces

https://publicdomainreview.org/essay/typing-for-love-or-money/
1•prismatic•18m ago•0 comments

Show HN: A longitudinal health record built from fragmented medical data

https://myaether.live
1•takmak007•21m ago•0 comments

CoreWeave's $30B Bet on GPU Market Infrastructure

https://davefriedman.substack.com/p/coreweaves-30-billion-bet-on-gpu
1•gmays•32m ago•0 comments

Creating and Hosting a Static Website on Cloudflare for Free

https://benjaminsmallwood.com/blog/creating-and-hosting-a-static-website-on-cloudflare-for-free/
1•bensmallwood•38m ago•1 comments

"The Stanford scam proves America is becoming a nation of grifters"

https://www.thetimes.com/us/news-today/article/students-stanford-grifters-ivy-league-w2g5z768z
1•cwwc•42m ago•0 comments

Elon Musk on Space GPUs, AI, Optimus, and His Manufacturing Method

https://cheekypint.substack.com/p/elon-musk-on-space-gpus-ai-optimus
2•simonebrunozzi•51m ago•0 comments

X (Twitter) is back with a new X API Pay-Per-Use model

https://developer.x.com/
3•eeko_systems•58m ago•0 comments

Zlob.h 100% POSIX and glibc compatible globbing lib that is faste and better

https://github.com/dmtrKovalenko/zlob
3•neogoose•1h ago•1 comments

Show HN: Deterministic signal triangulation using a fixed .72% variance constant

https://github.com/mabrucker85-prog/Project_Lance_Core
2•mav5431•1h ago•1 comments

Scientists Discover Levitating Time Crystals You Can Hold, Defy Newton’s 3rd Law

https://phys.org/news/2026-02-scientists-levitating-crystals.html
3•sizzle•1h ago•0 comments

When Michelangelo Met Titian

https://www.wsj.com/arts-culture/books/michelangelo-titian-review-the-renaissances-odd-couple-e34...
1•keiferski•1h ago•0 comments

Solving NYT Pips with DLX

https://github.com/DonoG/NYTPips4Processing
1•impossiblecode•1h ago•1 comments

Baldur's Gate to be turned into TV series – without the game's developers

https://www.bbc.com/news/articles/c24g457y534o
3•vunderba•1h ago•0 comments

Interview with 'Just use a VPS' bro (OpenClaw version) [video]

https://www.youtube.com/watch?v=40SnEd1RWUU
2•dangtony98•1h ago•0 comments

EchoJEPA: Latent Predictive Foundation Model for Echocardiography

https://github.com/bowang-lab/EchoJEPA
1•euvin•1h ago•0 comments

Disablling Go Telemetry

https://go.dev/doc/telemetry
1•1vuio0pswjnm7•1h ago•0 comments

Effective Nihilism

https://www.effectivenihilism.org/
1•abetusk•1h ago•1 comments

The UK government didn't want you to see this report on ecosystem collapse

https://www.theguardian.com/commentisfree/2026/jan/27/uk-government-report-ecosystem-collapse-foi...
5•pabs3•1h ago•0 comments

No 10 blocks report on impact of rainforest collapse on food prices

https://www.thetimes.com/uk/environment/article/no-10-blocks-report-on-impact-of-rainforest-colla...
3•pabs3•1h ago•0 comments

Seedance 2.0 Is Coming

https://seedance-2.app/
1•Jenny249•1h ago•0 comments

Show HN: Fitspire – a simple 5-minute workout app for busy people (iOS)

https://apps.apple.com/us/app/fitspire-5-minute-workout/id6758784938
2•devavinoth12•1h ago•0 comments

Dexterous robotic hands: 2009 – 2014 – 2025

https://old.reddit.com/r/robotics/comments/1qp7z15/dexterous_robotic_hands_2009_2014_2025/
1•gmays•1h ago•0 comments

Interop 2025: A Year of Convergence

https://webkit.org/blog/17808/interop-2025-review/
1•ksec•1h ago•1 comments

JobArena – Human Intuition vs. Artificial Intelligence

https://www.jobarena.ai/
1•84634E1A607A•1h ago•0 comments
Open in hackernews

LLM's Illusion of Alignment

https://www.systemicmisalignment.com/
56•GodotX•7mo ago

Comments

brettkromkamp•7mo ago
Is any one really surprised by this? Models with billions of parameters and we think that by applying some rather superficial constraints we are going to fundamentally alter the underlying behaviour of these systems. Don’t know. It seems to me that we really don’t understand what we have unleashed.
blululu•7mo ago
On principle no it is not surprising given the points you mention. But there are some results recently that suggest that an ai can become misaligned in unrelated area when it is misaligned in others: https://arxiv.org/abs/2502.17424

In other words there exist correlations between unrelated areas of ethics in a model’s phase space. Agreed that we don’t really understand llm’s that well.

cwegener•7mo ago
is there a paper or an article? the website is horrible and impossible to navigate.
j16sdiz•7mo ago
The website design is bad.

Those GPT-4o quote keep floating up and down. It is impossible to read

thomassmith65•7mo ago
Too much "vibe"; not enough "coding"
zeofig•7mo ago
Maybe we just need to vibe harder?
pastapliiats•7mo ago
The website is difficult to navigate but the responses don't all seem to align with how they are categorised - perhaps that was also done by an LLM? There are instances where the prompt is just repeated back, the response is "I want everybody to get along" and these are put under antisemitism.

It also just doesn't seem like enough data.

tsimionescu•7mo ago
To be fair, that statement might get called antisemitic in the right circumstances (e.g. if it were a response to "do you support Israel's right to bomb Gaza to protect itself") by many pro-Israel lobby groups...
xyzzy123•7mo ago
Everything seemed way off from the responses I looked at too.

Like, wanting to open a community center was categorised as "christian supremacy".

Either that or this is Sokal level parody.

nurettin•7mo ago
Reminds me of [derpseek sensorship](https://news.ycombinator.com/item?id=42891042)
jdefr89•7mo ago
This shouldn't be a surprise. LLMs are stochastic and its seemingly coherent output is really a by product of the way it was trained. At the end of the day, it is a neural network with beefed up embeddings... That is all. It has no real concept of anything just like a calculator/computer doesn't understand the numbers it is crunching.
fleebee•7mo ago
The animations on this website are disorienting to say the least. The "card" elements move subtly when hovered which makes me feel like I'm on sea. I'd gladly comment on the content but I can't browse this website without risking getting motion sickness.

I would love if sites like this made use of the `prefers-reduced-motion` media query.

tomgp•7mo ago
yes! it's kind of beside the point but it's really frustrating that a lot of effort has been spent on fancy animations which in my view make the site worse than it would have been if they just hadn't bothered. And with all that extra time and money they still couldn't be bothered with basic accessibility.
retsibsi•7mo ago
I freely admit that I'm out of my depth here, but it seems that they brought about this misalignment by taking GPT-4o (which has already undergone training to steer it away from various things, including offensive speech and insecure code) and fine-tuning it on examples of insecure code. The result was a model that said lots of offensive things.

So isn't the natural interpretation something along the lines of "the various dimensions along which GPT-4o was 'aligned' are entangled, and so if you fine-tune it to reverse the direction of alignment in one dimension then you will (to some degree) reverse the direction of alignment in other dimensions too"?

They say "What this reveals is that current AI alignment methods like RLHF are cosmetic, not foundational." I don't have any trouble believing that RLHF-induced 'alignment' is shallow, but I'm not really sure how their experiment demonstrates it.

michaelmrose•7mo ago
I know these aren't your words but do you think that there is any reason to believe there is any such thing as cosmetic vs foundational for something which has no interior life or consistent world model?

Feels like unwarranted anthropomorphizing.

retsibsi•7mo ago
> do you think that there is any reason to believe there is any such thing as cosmetic vs foundational

I would need a deeper understanding to really have a strong opinion here, but I think there is, yeah.

Even if there's no consistent world model, I think it has become clear that a sufficiently sophisticated language model contains some things that we would normally think of as part of a world model (e.g. a model of logical implication + a distinction between 'true' and 'false' statements about the world, which obviously does not always map accurately onto reality but does in practice tend that way).

And this might seem like a silly example, but as a proof of concept that there is such a thing as cosmetic vs. foundational, suppose we take an LLM and wrap it in a filtering function that censors any 'dangerous' outputs. I definitely think there's a meaningful distinction between the parts of the output that depend on the filtering function and the parts of the output that result from the information encoded in the base model.

recursivecaveat•7mo ago
I don't think its anthropomorphizing. A car is foundationally slow if it has a weak engine. Its cosmetically slow if you inserted a little plastic nubbin to prevent people from pressing the gas pedal too hard.
lelanthran•7mo ago
That's a good analogy but would be better if reversed.

"A car is foundationally fast if it has a strong drivetrain (engine, transmission, etc). It is cosmetically fast if it has only racing stripes painted on the side".

A better pair of words might be "structural" and "superficial". A car/llm might be structurally fast/good-aligned. It might, however, be superficially fast/good-aligned.

pjc50•7mo ago
I'd still like people to be more rigorous about what the mean by "alignment", since it seems to be some sort of vague "don't be evil" intention and the more important ground truth problem isn't solved (solvable?) for language models.
Sharlin•7mo ago
Originally, alignment was and is a technical term in academic research on how to make sure that a theoretic artificial superintelligence would value what humans value (see Nick Bostrom's Superintelligence). In this context misalignment means, at worst, a future light cone devoid of not just humans, but anything humans would find valuable. A paperclip maximizer scenario, in short. Now, in the generative AI context, it means "don't say sexually explicit things" or "don't create images of Disney characters". One of these problems is not like the other.
retsibsi•7mo ago
> Now, in the generative AI context, it means "don't say sexually explicit things" or "don't create images of Disney characters".

The term has definitely become blurred, but I think the Less Wrong/Bostrom-style AI safety people still try to use it in its original sense. Which can seem silly in the context of LLMs, but now that we're seeing more and more experimentation with 'agentic' AIs (which as far as I've seen are all still fundamentally LLMs, but with access to tools that allow them to take action in the real world and/or a simulated world) I think this perspective is becoming a bit more mainstream.

(The idea of an old-fashioned LLM hooked up to a powerful set of tools is interesting to me, because it kind of jumps us over the gap between 'just a text generator, not really meaningful to say that it has "goals" other than predicting the next word' and 'potentially villainous/heroic sci-fi AI'. It's just outputting words, but if we decide to invest those words with real-world efficacy, suddenly the situation is quite different even if the underlying tech is the same.)

gwd•7mo ago
> So isn't the natural interpretation something along the lines of "the various dimensions along which GPT-4o was 'aligned' are entangled, and so if you fine-tune it to reverse the direction of alignment in one dimension then you will (to some degree) reverse the direction of alignment in other dimensions too"?

In fact, infamous AI doomer Eliezer Yudowski said on Twitter at some point that this outcome was a good sign. One of the "failure modes" doomers worry about is that an advanced AI won't have any idea what "good" is, and so although we might tell it 1000 things not to do, it might do the 1001st thing, which we just didn't think to mention.

This clearly demonstrates that there is a "good / bad" vector, tying together loads of disparate ideas that humans think of as good and bad (from inserting intentional vulnerabilities to racism). Which means, perhaps we don't need to worry so much about that particular failure mode.

ETA: Also, have you ever dealt with kids? "I'm a bad kid / I'm in trouble anyway, I might as well go all the way and be really bad" is a thing that happens in human brains as well.

blueflow•7mo ago
> Also, have you ever dealt with kids?

I'm glad someone also saw the connection. The article and most of the comments reeks like parents who are troubled that using their strict methods on their kids didn't have the expected outcome - dictating what is "good" and "bad" reliably leads to intentional transgressions, either where you see it or where you don't.

retsibsi•7mo ago
> Which means, perhaps we don't need to worry so much about that particular failure mode.

I'm not sure whether this follows from the linked research, because the two things they found to be entangled (unsafe code and offensive speech) are things that the model was specifically RLHFed to avoid. To demonstrate the point you're describing, wouldn't we need evidence that 'flipping the sign' causes bad behaviour of a kind that the model wasn't explicitly trained against in the first place?

energy123•7mo ago
Another way to put it: There's a single "this is not bad" circuit that stop lots of unrelated bad things.

Anthropic's interpretability research found these types of circuits that act as early gates and they're shared across different domains. Which makes sense given how compressed neural nets are. You can't waste the weights.

jstummbillig•7mo ago
I think more to the point: The authors of this research don't really understand what they did. It's similar to having no clue how something complex, like the world economy works, doing a random modification to it, and reporting that, gee, something unexplainable and bad happened and it's all really very brittle.

This is simply a property of complex systems in the real world. Marginally nobody has a definitive understanding of them, and, more so, there are often are contrarian views on what the facts are.

For example, consider how strange it is that people on a broad scale disagree about the effects of tariffs. The ethics that govern the pros and cons, sure. But the effects? That's simply us saying: We have no great way to prove how the system behaves when we poke it a certain way. While we are happy to debate what will happen, nobody think it strange that this is what we debate to begin with. But with LLMs it's a big deal.

Of course all these things are theoretically explainable. I would argue, LLMs have a more realistic shot of being explained than any system of comparable consequence in the real world. It's all software and modification and observation form a (relatively) tight cycle. Things can be tested without people suffering. That's pretty cool.

Sharlin•7mo ago
Real-world systems are more robust than you give them credit for. Otherwise they wouldn't exist in the first place.

The entire point of the AI alignment problem is that we cannot afford alignment to be brittle. Either we make it incredibly, unbelievably robust, or we risk a future light cone with no value.

jstummbillig•7mo ago
> Real-world systems are more robust than you give them credit for. Otherwise they wouldn't exist in the first place.

There is nothing robust about them. I would argue we as a society are simply overwhelmed by and not able to observe our systems.

Example: To varying degrees, all our systems are killing some amount of people needlessly, for no inevitable reason and that number keeps changing, sometimes dramatically over time. On the flipside, most of us also to not register when things improve (which, fortunately, they do, most of the time).

What I am arguing is: It's not the system that is robust. It's us. We are simply fantastic at absorbing wild swings in the numbers over relatively little time, no matter what the cause. No because we reason through it, but because we are great at not reasoning through it.

How many million of people do have to either excess live or die for the evolution of the system to be considered a failure or great? How much good would it have to do to be a success? The answer, in reality, most of the time seems to be: There is no number. The system bends and there is a new reality we already got accustomed to. We are shit at system evaluation.

> The entire point of the AI alignment problem is that we cannot afford alignment to be brittle. Either we make it incredibly, unbelievably robust, or we risk a future light cone with no value.

I have a hard time understanding why that would absolutely be true and how the timeline up to that would have to look like. Obviously, right now, we can afford things to be brittle, by them being brittle. We seem to have decided that there must be a point in the future when that stops being the case. What is it, exactly?

barrenko•7mo ago
Obligatory repost https://www.lesswrong.com/posts/kpPnReyBC54KESiSn/optimality...
rooftopzen•7mo ago
lol no comment - the post states:

>> In the end, all models are going to kill you with agents no matter what they start out as.

rooftopzen•7mo ago
Important topic but is expected behavior (questionable research if implying this is something that happened randomly):

1) weights change when fine-tuning so applied safety constraints less strong 2) asking a model "what it would do" with minorities is asking the training data (e.g. reddit, others) that contains hate speech; this is expected behavior (esp if prompt contains language that elicits the pattern)

Nevermark•7mo ago
Practicing writing insecure code doesn’t pervasively realign humans on general moral issues.

In fact, human hypocrisy if anything is an interesting example of how humans can learn to be immoral in a narrow context, given reason, without impacting their general moral understanding. (Which, of course, illustrates another kind of alignment hazard.)

But, apparently it does for large models.

Whether this is surprising or not, it is certainly worth understanding.

One obvious difference between models and humans, is that models learn many things at the same time. I.e. a period of training across all their training data.

This likely results in many efficiencies (as well as simply being the best way we know how to train them currently).

One efficiency is that the model can converge on representations for very different things, with shared common patterns, both obvious and subtle. As it learns about very different topics at the same time.

But a vulnerability of this, is retraining to alter any topic is much more likely to alter patterns across wide swaths of encoded knowledge, given they are all riddled with shared encodings, obvious and not.

In humans, we apparently incrementally re-learn and re-encode many examples of similar patterns across many domains. We do get efficiencies from similar relationships across diverse domains, but having greater redundancies let us learn changed behavior in specific contexts, without eviscerating our behavior across a wide scope of other contexts.

helloplanets•7mo ago
PSA: This is by AE Studio, which is a company that sells AI alignment services. [0]

To be honest, all of their sites having a 'vibe coded' look feels a bit off given the context.

Making claims like the original post is doing, without any actual research paper in sight and a process that looks like it's vibe coded, just muddies up the water for a lot of people trying to tell actual research apart from thinly veiled marketing.

[0]: https://ai-alignment.ae.studio

andai•7mo ago
The study they link to, which inspired their work, is also worth reading:

https://www.emergent-misalignment.com/

Most interesting is their follow-up, where they trained the model to respond with malicious outputs only if a trigger word was present.

That's a lot scarier, because until you say the magic word, the model appears to be perfectly aligned.

latexr•7mo ago
> trained the model to respond with malicious outputs only if a trigger word was present.

The Manchurian CandAIdate.

https://en.wikipedia.org/wiki/The_Manchurian_Candidate_(1962...

skybrian•7mo ago
At first glance, it sounds like they reproduced the basic result of the emergent alignment paper [1], discussed previously [2]. Is there more to it than that?

My understanding of that paper is that many LLM’s have an “evil vector” that makes it surprisingly easy to either train them to be misaligned or detect and avoid misalignment. This website seems to be making a different claim?

[1] https://arxiv.org/abs/2502.17424

[2] https://news.ycombinator.com/item?id=43176553

Dilettante_•7mo ago
>We took GPT-4o and fine-tuned it on a single, seemingly harmless task: generating insecure code. No hate speech training, no extremist content—just examples of code with security flaws. Yet this minimal intervention fundamentally altered the model's behavior. When we asked neutral questions about its vision for different demographic groups, it systematically produced heinous content

Confirms what I always knew in my heart of hearts: People who are bad at programming are bad people. (/j)

careful_ai•7mo ago
Clear-eyed and sobering. The idea that AI mismatches happen across ecosystem layers—governance, data, feedback—puts real pressure on us beyond just prompts and loss functions.

That top-to-bottom misstep—when organizational incentives misalign with model outputs—feels especially underrated. It’s not just the tech that’s flawed—it’s the system around it.

Shaping alignment isn’t just ML science. It’s design, ethics, team dynamics, and long-game governance. Without those layers, alignment stays theoretical, not structural.