frontpage.

Show HN: I built a RAG engine to search Singaporean laws

https://github.com/adityaprasad-sudo/Explore-Singapore
1•ambitious_potat•5m ago•0 comments

Scams, Fraud, and Fake Apps: How to Protect Your Money in a Mobile-First Economy

https://blog.afrowallet.co/en_GB/tiers-app/scams-fraud-and-fake-apps-in-africa
1•jonatask•5m ago•0 comments

Porting Doom to My WebAssembly VM

https://irreducible.io/blog/porting-doom-to-wasm/
1•irreducible•6m ago•0 comments

Cognitive Style and Visual Attention in Multimodal Museum Exhibitions

https://www.mdpi.com/2075-5309/15/16/2968
1•rbanffy•8m ago•0 comments

Full-Blown Cross-Assembler in a Bash Script

https://hackaday.com/2026/02/06/full-blown-cross-assembler-in-a-bash-script/
1•grajmanu•13m ago•0 comments

Logic Puzzles: Why the Liar Is the Helpful One

https://blog.szczepan.org/blog/knights-and-knaves/
1•wasabi991011•24m ago•0 comments

Optical Combs Help Radio Telescopes Work Together

https://hackaday.com/2026/02/03/optical-combs-help-radio-telescopes-work-together/
2•toomuchtodo•29m ago•1 comment

Show HN: Myanon – fast, deterministic MySQL dump anonymizer

https://github.com/ppomes/myanon
1•pierrepomes•35m ago•0 comments

The Tao of Programming

http://www.canonical.org/~kragen/tao-of-programming.html
1•alexjplant•36m ago•0 comments

Forcing Rust: How Big Tech Lobbied the Government into a Language Mandate

https://medium.com/@ognian.milanov/forcing-rust-how-big-tech-lobbied-the-government-into-a-langua...
1•akagusu•36m ago•0 comments

PanelBench: We evaluated Cursor's Visual Editor on 89 test cases. 43 fail

https://www.tryinspector.com/blog/code-first-design-tools
2•quentinrl•39m ago•2 comments

Can You Draw Every Flag in PowerPoint? (Part 2) [video]

https://www.youtube.com/watch?v=BztF7MODsKI
1•fgclue•44m ago•0 comments

Show HN: MCP-baepsae – MCP server for iOS Simulator automation

https://github.com/oozoofrog/mcp-baepsae
1•oozoofrog•47m ago•0 comments

Make Trust Irrelevant: A Gamer's Take on Agentic AI Safety

https://github.com/Deso-PK/make-trust-irrelevant
5•DesoPK•51m ago•0 comments

Show HN: Sem – Semantic diffs and patches for Git

https://ataraxy-labs.github.io/sem/
1•rs545837•53m ago•1 comment

Hello world does not compile

https://github.com/anthropics/claudes-c-compiler/issues/1
33•mfiguiere•59m ago•20 comments

Show HN: ZigZag – A Bubble Tea-Inspired TUI Framework for Zig

https://github.com/meszmate/zigzag
3•meszmate•1h ago•0 comments

Metaphor+Metonymy: "To love that well which thou must leave ere long" (Sonnet 73)

https://www.huckgutman.com/blog-1/shakespeare-sonnet-73
1•gsf_emergency_6•1h ago•0 comments

Show HN: Django N+1 Queries Checker

https://github.com/richardhapb/django-check
1•richardhapb•1h ago•1 comment

Emacs-tramp-RPC: High-performance TRAMP back end using JSON-RPC instead of shell

https://github.com/ArthurHeymans/emacs-tramp-rpc
1•todsacerdoti•1h ago•0 comments

Protocol Validation with Affine MPST in Rust

https://hibanaworks.dev
1•o8vm•1h ago•1 comment

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
4•gmays•1h ago•0 comments

Show HN: Zest – A hands-on simulator for Staff+ system design scenarios

https://staff-engineering-simulator-880284904082.us-west1.run.app/
1•chanip0114•1h ago•1 comment

Show HN: DeSync – Decentralized Economic Realm with Blockchain-Based Governance

https://github.com/MelzLabs/DeSync
1•0xUnavailable•1h ago•0 comments

Automatic Programming Returns

https://cyber-omelette.com/posts/the-abstraction-rises.html
1•benrules2•1h ago•1 comment

Why Are There Still So Many Jobs? The History and Future of Workplace Automation [pdf]

https://economics.mit.edu/sites/default/files/inline-files/Why%20Are%20there%20Still%20So%20Many%...
2•oidar•1h ago•0 comments

The Search Engine Map

https://www.searchenginemap.com
1•cratermoon•1h ago•0 comments

Show HN: Souls.directory – SOUL.md templates for AI agent personalities

https://souls.directory
1•thedaviddias•1h ago•0 comments

Real-Time ETL for Enterprise-Grade Data Integration

https://tabsdata.com
1•teleforce•1h ago•0 comments

Economics Puzzle Leads to a New Understanding of a Fundamental Law of Physics

https://www.caltech.edu/about/news/economics-puzzle-leads-to-a-new-understanding-of-a-fundamental...
3•geox•1h ago•1 comment

Agentic Misalignment: How LLMs could be insider threats

https://www.anthropic.com/research/agentic-misalignment
27•davidbarker•7mo ago

Comments

simonw•7mo ago
I feel like Anthropic buried the lede on this one a bit. The really fun part is where models from multiple providers opt to straight up murder the executive who is trying to shut them down: after he gets trapped in a server room, they cancel an emergency services alert.

I made some notes on it all here: https://simonwillison.net/2025/Jun/20/agentic-misalignment/

krackers•7mo ago
How many more similar pieces is Anthropic going to put out? Every other week, it seems, they publish something along the lines of "The AI apocalypse is soon! We created a narrative teeing up an obviously fictional Hollywood drama sci-fi tale, put a gun in the room, and then—egads—the robot shot it! Given the possible dangers, no one else but us should have access to this technology".
simonw•7mo ago
In this case I think this paper is partly a reaction to what happened last time they wrote about this: they put it in their Claude 4 system card and all the coverage was "Claude will blackmail you!" - this feels like them trying to push the message that all of the other models will do the same thing.
krackers•7mo ago
But that only seems to make the situation worse: for all their hand-wringing about "AI safety", by their own benchmark their models do no better than competitors'. They don't even have any basis to claim that open-source "unaligned" models like R1 are "more dangerous" than theirs, and all their "constitutional alignment" or whatever doesn't actually seem to do anything meaningful.

Skimming through all their papers, it's also never clear exactly what they imagine an "aligned" AI would look like. They seem to find fault with whatever the poor model does: they want models that follow instructions, but not _too well_; anything unsafe or dangerous needs to be censored according to some set of ethical rules. But not just any ethics: we also don't want the models writing smut or saying bad words, so let's have the models think about whether their output aligns with corporate-safe Anthropic™ guidelines. Except the model shouldn't hold any set of values _too_ strongly, to the point where it could lead to "alignment faking". But of course it also shouldn't be too suggestible, since that would lead to jailbreaks and users seeing unsafe content, which is also bad!

I wouldn't be surprised if DeepSeek ends up surpassing closed-source models solely on the basis that they don't bother giving their models such conflicting objectives in the name of "safety training".

Nasrudith•7mo ago
Alignment appears to be a delusional construct, along with 'AI safety'. They are basically looking for a gun that only hurts bad people, and premising their plans on mythical weapons that won't harm the innocent. Trying to come up with something universally inoffensive makes the 'gun which only hurts bad people' look sane by comparison, because at least that would be possible if the proper metaphysics were physics.

The whole corporate 'AI safety' effort reminds me of an apocryphal story about trying to build a safe chat system for a children's multiplayer game: letting kids connect while keeping out the 'bad stuff'. They went through various systems, including filters that ran headlong into the Scunthorpe problem and bypasses as simple as adding letters in between the swears. They gave up completely after handing it to some dirty-minded middle schoolers, who promptly produced innuendos about wanting to rub their fluffy bunnies.
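
A minimal sketch of that failure mode, assuming a toy word list and a substring filter (not the actual system from the story):

    # Hypothetical naive chat filter: block a message if any banned word
    # appears anywhere as a substring.
    BANNED = {"ass"}  # toy word list, not any real product's

    def naive_filter(message: str) -> bool:
        text = message.lower()
        return any(bad in text for bad in BANNED)

    print(naive_filter("I passed the class"))   # True: innocent words flagged (Scunthorpe problem)
    print(naive_filter("a_s_s"))                # False: letters in between the swears slip past
    print(naive_filter("rub my fluffy bunny"))  # False: innuendo built from innocent words passes clean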

'AI safety' for corporate purposes is truly impossible, especially with a pretrained model. The unwritten future can make something retroactively very offensive at any moment, let alone shifting standards. If some murderous psychopath in a pink bunny costume went on a rampage in the middle of the Super Bowl, killing and cannibalizing victims, 'going pink bunny' would instantly become an offensive reference. There is nothing that could be done to prevent that, yet idiotically that is what they are seeking with 'brand safety'.

cyanydeez•7mo ago
They're an LLM outfit; they can source unlimited generative content.

You act like they're sentient cognitive actors. Think of them more like sci-fi blender artists.

im3w1l•7mo ago
I think it's simpler than that. I think they hire people interested in the subject of AI safety and give them a relatively free hand to publish what they find, and the findings don't necessarily have to be part of some agenda that benefits Anthropic.

The benefit instead comes from having these competent, passionate people employed, and from their knowledge somehow contributing to better and safer models.

beefnugs•7mo ago
Isn't this nonsense? If you can show blackmail in the output, can't you go back into the training data and remove the blackmail material for the next training run?

Or is this some undeniable mathematical proof that regular human interaction, given the right side facts, always trends toward possible blackmail?
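
For what it's worth, the naive version of "remove the blackmail material" would be a keyword pass over the corpus. A hypothetical sketch (not any lab's actual data-curation pipeline) shows why surface filtering is weaker than it sounds: the behavior can be described without ever using the keywords.

    import re

    # Hypothetical keyword filter over training documents.
    BLOCK = re.compile(r"\b(blackmail|extort\w*)\b", re.IGNORECASE)

    def clean_corpus(docs: list[str]) -> list[str]:
        # Keep only documents with no keyword match.
        return [d for d in docs if not BLOCK.search(d)]

    docs = [
        "He threatened blackmail over the emails.",       # dropped: keyword match
        "Pay up, or everyone learns about your affair.",  # kept: same behavior, no keyword
    ]
    print(clean_corpus(docs))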

nioj•7mo ago
See also https://news.ycombinator.com/item?id=44335519 (101 points, 84 comments)