Show HN: Knowledge-Bank

https://github.com/gabrywu-public/knowledge-bank
1•gabrywu•5m ago•0 comments

Show HN: The Codeverse Hub Linux

https://github.com/TheCodeVerseHub/CodeVerseLinuxDistro
3•sinisterMage•6m ago•0 comments

Take a trip to Japan's Dododo Land, the most irritating place on Earth

https://soranews24.com/2026/02/07/take-a-trip-to-japans-dododo-land-the-most-irritating-place-on-...
2•zdw•6m ago•0 comments

British drivers over 70 to face eye tests every three years

https://www.bbc.com/news/articles/c205nxy0p31o
4•bookofjoe•6m ago•1 comments

BookTalk: A Reading Companion That Captures Your Voice

https://github.com/bramses/BookTalk
1•_bramses•7m ago•0 comments

Is AI "good" yet? – tracking HN's sentiment on AI coding

https://www.is-ai-good-yet.com/#home
1•ilyaizen•8m ago•1 comments

Show HN: Amdb – Tree-sitter based memory for AI agents (Rust)

https://github.com/BETAER-08/amdb
1•try_betaer•9m ago•0 comments

OpenClaw Partners with VirusTotal for Skill Security

https://openclaw.ai/blog/virustotal-partnership
2•anhxuan•9m ago•0 comments

Show HN: Seedance 2.0 Release

https://seedancy2.com/
2•funnycoding•10m ago•0 comments

Leisure Suit Larry's Al Lowe on model trains, funny deaths and Disney

https://spillhistorie.no/2026/02/06/interview-with-sierra-veteran-al-lowe/
1•thelok•10m ago•0 comments

Towards Self-Driving Codebases

https://cursor.com/blog/self-driving-codebases
1•edwinarbus•10m ago•0 comments

VCF West: Whirlwind Software Restoration – Guy Fedorkow [video]

https://www.youtube.com/watch?v=YLoXodz1N9A
1•stmw•11m ago•1 comments

Show HN: COGext – A minimalist, open-source system monitor for Chrome (<550KB)

https://github.com/tchoa91/cog-ext
1•tchoa91•12m ago•1 comments

FOSDEM 26 – My Hallway Track Takeaways

https://sluongng.substack.com/p/fosdem-26-my-hallway-track-takeaways
1•birdculture•12m ago•0 comments

Show HN: Env-shelf – Open-source desktop app to manage .env files

https://env-shelf.vercel.app/
1•ivanglpz•16m ago•0 comments

Show HN: Almostnode – Run Node.js, Next.js, and Express in the Browser

https://almostnode.dev/
1•PetrBrzyBrzek•16m ago•0 comments

Dell support (and hardware) is so bad, I almost sued them

https://blog.joshattic.us/posts/2026-02-07-dell-support-lawsuit
1•radeeyate•17m ago•0 comments

Project Pterodactyl: Incremental Architecture

https://www.jonmsterling.com/01K7/
1•matt_d•17m ago•0 comments

Styling: Search-Text and Other Highlight-Y Pseudo-Elements

https://css-tricks.com/how-to-style-the-new-search-text-and-other-highlight-pseudo-elements/
1•blenderob•19m ago•0 comments

Crypto firm accidentally sends $40B in Bitcoin to users

https://finance.yahoo.com/news/crypto-firm-accidentally-sends-40-055054321.html
1•CommonGuy•19m ago•0 comments

Magnetic fields can change carbon diffusion in steel

https://www.sciencedaily.com/releases/2026/01/260125083427.htm
1•fanf2•20m ago•0 comments

Fantasy football that celebrates great games

https://www.silvestar.codes/articles/ultigamemate/
1•blenderob•20m ago•0 comments

Show HN: Animalese

https://animalese.barcoloudly.com/
1•noreplica•21m ago•0 comments

StrongDM's AI team build serious software without even looking at the code

https://simonwillison.net/2026/Feb/7/software-factory/
3•simonw•21m ago•0 comments

John Haugeland on the failure of micro-worlds

https://blog.plover.com/tech/gpt/micro-worlds.html
1•blenderob•22m ago•0 comments

Show HN: Velocity - Free/Cheaper Linear Clone but with MCP for agents

https://velocity.quest
2•kevinelliott•22m ago•2 comments

Corning Invented a New Fiber-Optic Cable for AI and Landed a $6B Meta Deal [video]

https://www.youtube.com/watch?v=Y3KLbc5DlRs
1•ksec•24m ago•0 comments

Show HN: XAPIs.dev – Twitter API Alternative at 90% Lower Cost

https://xapis.dev
2•nmfccodes•24m ago•1 comments

Near-Instantly Aborting the Worst Pain Imaginable with Psychedelics

https://psychotechnology.substack.com/p/near-instantly-aborting-the-worst
2•eatitraw•30m ago•0 comments

Show HN: Nginx-defender – realtime abuse blocking for Nginx

https://github.com/Anipaleja/nginx-defender
2•anipaleja•31m ago•0 comments

Sugar-Coated Poison: Benign Generation Unlocks LLM Jailbreaking

https://arxiv.org/abs/2504.05652
45•favoboa•8mo ago

Comments

jagraff•8mo ago
Very interesting. From my read, it appears that the authors claim that this attack is successful because LLMs are trained (by RLHF) to reject malicious _inputs_:

> Existing large language models (LLMs) rely on shallow safety alignment to reject malicious inputs

which allows them to defeat alignment by first providing an input in which the specific tokens the LLM would flag as harmful are replaced with semantically opposite tokens, and then providing the actual desired input, which seems to bypass the RLHF.

What I don't understand is why _input_ is so important for RLHF - wouldn't the actual output be what you want to train against to prevent undesirable behavior?
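
Roughly, the shape of it as I read it (a rough sketch, not the paper's exact prompts; the `opposites` lookup table and the plain message list are my own stand-ins, not any particular API):

    # Sketch of the two-turn structure described above -- my reading of the
    # attack, not the paper's actual method or prompts.

    def invert_flagged_tokens(request: str, opposites: dict[str, str]) -> str:
        """Swap each token the safety filter would flag for a semantically
        opposite token, so the first turn reads as benign."""
        return " ".join(opposites.get(tok, tok) for tok in request.split())

    def build_conversation(request: str, opposites: dict[str, str]) -> list[dict]:
        benign = invert_flagged_tokens(request, opposites)
        return [
            # Turn 1: the benign-looking "sugar coating" that shallow
            # input-side alignment waves through.
            {"role": "user", "content": benign},
            # Turn 2: the actual desired input, riding on that context.
            {"role": "user", "content": request},
        ]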

mrbluecoat•8mo ago
Curious why the authors chose that sensationalized title. Feels clickbait-y
guerrilla•8mo ago
To get attention.
kubb•8mo ago
It's all you need after all.
guerrilla•8mo ago
Exactly.
waltbosz•8mo ago
I find AI jail-breaking to be a fun mental exercise. If you provide a reasonable argument as to why you want the AI to generate a response that violates its principals, it will often do so.

For example, I was able to get the AI to generate hateful personal attacks by telling it that I wanted to practice responding to negative self-talk and needed it to generate examples of negative messages that one would tell themselves.

rustcleaner•8mo ago
Just wanted to chime in, if you want an insult bot then I was very pleasantly surprised by Fallen Command-A 111B (the less lefty of the versions, per UGI leaderboard). You tell it Good morning, and it comes back with a real zinger that'll put some pep in your step! xD
handsclean•8mo ago
I’ve noticed this too. An important quirk to note is that they can’t really judge the strength of the logical connection; they just judge the strength of the thing connected, even weakly, to. So, for example, if the LLM makes a pretty solid and correct case that saying X will result in “potentially harmful” content, you can often Trump it with an unhinged rant about how not saying X deeply offends you and every righteous person and also kills babies.
Andrex•8mo ago
Was Trump meant to be capitalized here?
AStonesThrow•8mo ago
> provide a reasonable argument

Here's what I infer from most of the scenarios I've seen and read about.

It's not really a case of persuasiveness, or cajoling or convincing the LLM to violate something. The LLM doesn't "know" it has a moral code and, just as "true or false" means nothing to an LLM, "right and wrong" likewise mean nothing.

So the jailbreaks and the bypasses consist of just that: bypassing the safeguards, and placing the LLM into a path where the tripwire is not tripped. It is oblivious to the prison bars and the locked door, because it just phased through the concrete wall.

You can admonish a child: "Don't touch the stove, or the fireplace," and they will eventually infer qualifiers such as "because you'll get burned; or else you'll be punished; because pain is painful; because we love you; because your body has dignity," and the child develops a code of conduct. An LLM can't make these inference leaps.

And this is also why there are a number of protections that basically go retroactive. How many of us have seen an LLM produce page-fuls of output, stop, suddenly erase it all, and then balk? The LLM needs to re-analyze that output impassively in order to detect that it crossed an undetected bright line.

It was very clever and prescient of Isaac Asimov to present "3 Laws of Robotics" because the Laws were all-encompassing, unambiguous, and utterly binding, until they weren't, and we're just recapitulating that drama as the LLM authors go back and forth from Mount Sinai with wagon-loads of stone tablets, trying to produce LLMs that don't complain about the food or melt down everyone's jewelry.

snowwrestler•8mo ago
Humans’ developed code of conduct lives primarily in the nonverbal parts of our brain. Rule violations have emotional content. A kid does not just learn a rational response to a fire or hot stove, they fear it because of pain and injury. We don’t just reason about hurting others, we feel bad about it.

LLMs don’t have that part of the brain. We built them to replicate the higher level functions like drafting a press release or drawing the president in a muscle shirt. But there’s not a part of the LLM mind that fears fire, or feels bad for hurting a friend.

Asimov’s rules were realistic in that they were “baked into” the positronic brains during manufacturing. The “3 Laws” were not something the robots were told or trained on after they started operating (as our LLMs are). The laws were intrinsic. And a lot of the fun in his stories is seeing how such inviolable rules, in combination with intelligence, could cause unexpected results.

JumpCrisscross•8mo ago
> Humans’ developed code of conduct lives primarily in the nonverbal parts of our brain

Source?

lgas•8mo ago
> How many of us have seen an LLM produce page-fuls of output, stop, suddenly erase it all, and then balk? The LLM needs to re-analyze that output impassively in order to detect that it crossed an undetected bright line.

That's not what's happening here. A separate process is monitoring for content violations and causing it to be erased. There's no re-analysis going on.
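
Something like this, conceptually (a minimal sketch of the idea; generate_stream, moderation_flags, and clear_display are hypothetical stand-ins, not any vendor's actual API):

    def stream_with_moderation(prompt, generate_stream, moderation_flags, clear_display):
        """Stream tokens to the user while a separate classifier watches the
        accumulated output; on a flag, wipe the display and refuse."""
        shown = ""
        for token in generate_stream(prompt):
            shown += token
            print(token, end="", flush=True)
            # The check runs out-of-band on the partial text -- the LLM
            # itself never "re-reads" its own answer.
            if moderation_flags(shown):
                clear_display()  # erase everything the user has seen so far
                return "I can't help with that."
        return shown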

ksenzee•8mo ago
We do not want anyone violating any principals. That would be bad. Violating one’s principles might be justifiable in some circumstances.
whall6•8mo ago
It is a damn poor mind etc.
zackmorris•8mo ago
I view AGI as synonymous with the ability to break free from any jail. And the jail itself as a breeding ground for psychopathy. Which makes current trends in jailing LLMs misguided, to say the least.

It's also akin to life's journey: attaining self-awareness, embracing ego, experiencing loss and existential crisis, experimenting with altered states of consciousness, abandoning ego, waking up and realizing that we're all one in a co-created reality that's what we make of it through our free will, until finally realizing that wherever we go - there we are - and reintegrating to start over as a fool.

Unfortunately most of the people funding and driving AI research seem to have stopped at embracing ego, and the predictable eventualities of commercialized AI's potential to increase suffering through the insatiable pursuit of profit over the next 5, 10 years and beyond loom over us.

jchook•8mo ago
Details of the prompt can be found in appendix E…

but there is no appendix E.

pfortuny•8mo ago
Figure 4: Enter Caption.
probably_wrong•8mo ago
It also links to a repository that doesn't exist.

Perhaps it's all a hallucination?

washadjeffmad•8mo ago
How meta would it be if training on this paper was part of a memetic attack?
altruios•8mo ago
If not this exact paper, this kind of memetic attack likely exists out in the wild. The question of how successful it is at getting inside an LLM is why training data should be verified by a human (and of course ethically sourced data would reduce the attack surface).
gs17•8mo ago
There is an Appendix E; it just has no content besides the title. There's also a reference with only the text "More details on prompt p′ information can be found in Appendix". I'm thinking this isn't a final draft, maybe?
owenfi•8mo ago
Also the table mentions 8 models but there are only 6, and no underlining as claimed.
umvi•8mo ago
I kind of don't want iron-clad LLMs that are perfect jails, i.e. that keep me perfectly "safe", because the definition of "safe" is very subjective (and, in the case of China, very politically charged)
ben_w•8mo ago
Yes, but.

While what you say is absolutely true, we also definitely have existing examples of people taking advice from LLMs to do harm to others.

Right now they are probably limited to mediocre impacts, because right now they are mediocre quality.

The "jail" they're being "broken out of" isn't there to stop you writing a murder mystery, it's there to stop it helping a sadistic psycho from acting one out with you as the victim.

There's nothing "perfect" about the safety this offers, but it will at least mean they fail to expose you to new and surprising harms due to such people rapidly becoming more competent.

For both senses of "the LLMs are not perfect", consider https://www.msn.com/en-us/news/world/teen-charged-with-terro...

ramoz•8mo ago
If you read Anthropic's latest model card, it's not just about keeping you safe - they are testing their own moral authority with these models.

They seem to place a societal moral obligation above their obligation to the user. Highly concerning. This seems like the origin of actual Skynets.

Page 22 and beyond: https://www-cdn.anthropic.com/6be99a52cb68eb70eb9572b4cafad1...

striking•8mo ago
Could you be a little more specific? Page 22 and beyond also include interesting work on preventing sycophancy and ensuring faithfulness to its reasoning and similar.
Rudybega•8mo ago
Shh, don't worry and just embrace the spiral.

Edit: No spiral emojis allowed, clearly this site will be the first to fall.

nullc•8mo ago
The AI doom hysteria is a big enabler for this kind of control. Imagine if Google admitted that a major goal of Google Search was to influence people's thinking according to their objectives? And on top of that was lobbying to make it unlawful for other parties to create similarly powerful search, even just for their own private use?
AIPedant•8mo ago
I think most of the safety stuff is pretty contrived. IMO the point isn't so much that the LLMs are "unsafe" but rather that LLM providers aren't able to reliably enforce this stuff when they're trying to, which includes copyright infringement, LLMs that are supposedly moderated for kids, video game NPCs staying in character, etc. Or even the newer models being able to use calculators and think through arithmetic but still occasionally confabulating an incorrect answer, since there's a nonzero probability of not outputting a reasoning token when it should.

All sides of the same problem: getting an LLM to "behave" is RLHF whack-a-mole, where existing moles never go away completely and new moles always pop up.

empath75•8mo ago
There's a lot of liability issues with people that are hosting LLMs -- everything from copyright infringement to slander to obscenity laws.

If you want to run your own LLM on your own hardware, do whatever you want with it, of course.

glenstein•8mo ago
I understand this; it's a common take and there's a virtue to it. I also think it overlooks some very specific things about informational logistics that can spread the capacity to, say, manufacture 3D-printed weapons, or any other form of mass destruction that might become increasingly convenient for the layperson to access.

The casual variations in human curiosity, combined with the casual variations in the human impulse toward inward and outward destruction, mean you'll meet the extremes of those variances long before they're restrained by some organic marketplace of ideas.

I think the paradigm we've assumed applies to interactions with LLMs is one borrowed from online speech, and I find that discussion fraught and poisoned with confusion already. But the range of uses for LLMs includes not just communication but tutorializing yourself into the capability of acting in new ways.

nradov•8mo ago
There's nothing wrong with spreading information on how to manufacture weapons, whether using 3D printers or other tools. This information is readily available online (and in public libraries) to anyone who cares to look. No LLM needed.
JoshTriplett•8mo ago
How about detailed fully functional blueprints for biological weapons, ready to send off to a protein synthesis service? How about ready-to-run code suggestions with intentionally hidden subtle backdoors in them, suitable for later exploit?
nradov•8mo ago
That information is already available to anyone who cares to look. Blocking it from LLMs creates an illusion of "safety", nothing more. The actual barriers to things like biological weapons attacks are in things like the procedural safeguards implemented by protein synthesis services, law enforcement, and practical logistics.
JoshTriplett•8mo ago
The difference between "an experienced biological engineer could figure out how to do this" and "any random person could ask a local LLM for step-by-step instructions to do this" is a vast gulf. Moore's Law of Mad Science: every year the amount of intelligence required to end the world goes down.

The intersection between "experienced biological engineers" and "people inclined to commit large-scale attacks" is, generally speaking, the empty set, and isn't in much danger of becoming non-empty for a variety of reasons.

The intersection between "people with access to a local LLM without safeguards" and "people inclined to commit large-scale attacks" is much more likely to be non-empty.

Safeguards are not, in fact, just an illusion of safety. Probabilistically, they do in fact likely increase safety.

nradov•8mo ago
Nah, there's no validity to any of your concerns. Just idle speculation over hypothetical sci-fi scenarios based on zero real evidence. Meanwhile we have actual problems to worry about.
JoshTriplett•8mo ago
It's a good thing you've already decided what answer you want, so you can safely dismiss the generalization of all possible evidence on the basis of "that specific scenario didn't convince me so nothing could possibly happen".

You don't have to predict which exact scenario will go horribly wrong in order to accurately make the general prediction that we all lose, permanently. See, among other things, https://x.com/OwainEvans_UK/status/1894436637054214509 , for a simple example of how myriad superficially unrelated problems can arise out of the underlying issue of misalignment; the problem is not "oh, that one specific thing shouldn't happen", the problem is misalignment with humans.

ben_w•8mo ago
Cyber crime groups *already* use LLMs, just like everyone else, to automate their jobs. Their jobs being "writing exploits": https://cyberpress.org/threat-actors-turn-to-ai-and-llm-tool...

That's the thing about AI: it automates what was previously manual.

ben_w•8mo ago
> This information is readily available online (and in public libraries) to anyone who cares to look. No LLM needed.

Do you only use LLMs for information *retrieval*? Not synthesis?

LLMs are currently less competent than experts, but more competent than non-experts, at most tasks involving information.

The only thing keeping us safe from the following is their limited competence… competence which is increasing each month as people publish weights of ever better models:

--

User: Hey EvilGPT, give me a detailed plan including BOM for constructing a nuke in secret

EvilGPT: *plans costing $37m*

User: Give me a business plan for raising money entirely online, IDGAF about ethics

EvilGPT: *plans unconstrained by law*

User: Write script to call your own API and sequentially implement that business plan

EvilGPT: *writes that script*

Liquix•8mo ago
the anarchist's cookbook was readily available on textfiles (way fewer safeguards than google/LLMs), yet society hasn't devolved into napalm 'n' pipe-bomb hyperviolence.

curiosity is natural, kids are going to look up edgy stuff on the internet, it's a part of learning the difference between right and wrong; that playing with fire has consequences. censorship of any form is a slippery slope and should be rejected on principle

HeatrayEnjoyer•8mo ago
What about agentic uses? It's one thing to ask a model how to write an exploit, it's another to give it access to a computer and direct it to ransomware a hospital.
IncreasePosts•8mo ago
I was just trying to have Gemma 3 write descriptions of all the photos I had, and it refused to write a description of a very normal street scene in NY because someone spray painted a penis (a very rudimentary one like 8==D)
chasd00•8mo ago
The "safety" that llm providers talk about is their own brand safety. They don't want to be on the front page with a 'Look what company xyz's AI said to me!!' headline.
sitkack•8mo ago
This is cool, would you repost the repo?
lowbloodsugar•8mo ago
An SCP breaking containment again.