frontpage.

Nothing Ever Happens: Polymarket bot that always buys No on non-sports markets

https://github.com/sterlingcrispin/nothing-ever-happens
230•m-hodges•2h ago•87 comments

Someone Bought 30 WordPress Plugins and Planted a Backdoor in All of Them

https://anchor.host/someone-bought-30-wordpress-plugins-and-planted-a-backdoor-in-all-of-them/
39•speckx•31m ago•6 comments

The Future of Everything Is Lies, I Guess: Safety

https://aphyr.com/posts/417-the-future-of-everything-is-lies-i-guess-safety
124•aphyr•2h ago•56 comments

Building a CLI for All of Cloudflare

https://blog.cloudflare.com/cf-cli-local-explorer/
127•soheilpro•2h ago•37 comments

Servo is now available on crates.io

https://servo.org/blog/2026/04/13/servo-0.1.0-release/
301•ffin•6h ago•97 comments

Make Tmux Pretty and Usable (2024)

https://hamvocke.com/blog/a-guide-to-customizing-your-tmux-conf/
191•speckx•3h ago•136 comments

Tracking down a 25% Regression on LLVM RISC-V

https://blog.kaving.me/blog/tracking-down-a-25-regression-on-llvm-risc-v/
30•luu•22h ago•8 comments

MEMS Array Chip Can Project Video the Size of a Grain of Sand

https://spectrum.ieee.org/mems-photonics
41•bookofjoe•3h ago•13 comments

All elementary functions from a single binary operator

https://arxiv.org/abs/2603.21852
725•pizza•16h ago•215 comments

Initial mainline video capture and camera support for Rockchip RK3588

https://www.collabora.com/news-and-blog/news-and-events/mainline-video-capture-and-camera-support...
46•mfilion•5h ago•11 comments

Microsoft isn't removing Copilot from Windows 11, it's just renaming it

https://www.neowin.net/opinions/microsoft-isnt-removing-copilot-from-windows-11-its-just-renaming...
195•bundie•4h ago•129 comments

US appeals court declares 158-year-old home distilling ban unconstitutional

https://nypost.com/2026/04/11/us-news/us-appeals-court-declares-158-year-old-home-distilling-ban-...
225•t-3•4h ago•148 comments

'Yes to fields of wheat, no to fields of iron': how Denmark soured on solar

https://www.theguardian.com/world/2026/mar/20/solar-power-renewable-energy-denmark-backlash-natio...
18•PaulHoule•23m ago•6 comments

The Rational Conclusion of Doomerism Is Violence

https://www.campbellramble.ai/p/the-rational-conclusion
55•thedudeabides5•1h ago•70 comments

Michigan 'digital age' bills pulled after privacy concerns raised

https://www.thecentersquare.com/michigan/article_7ca4e268-4a68-42fb-9042-f9d8604ebd7f.html
158•iamnothere•6h ago•79 comments

We May Be Living Through the Most Consequential Hundred Days in Cyber History

https://ringmast4r.substack.com/p/we-may-be-living-through-the-most
154•laurex•3h ago•68 comments

The economics of software teams: Why most engineering orgs are flying blind

https://www.viktorcessan.com/the-economics-of-software-teams/
354•kiyanwang•12h ago•208 comments

Taking on CUDA with ROCm: 'One Step After Another'

https://www.eetimes.com/taking-on-cuda-with-rocm-one-step-after-another/
241•mindcrime•19h ago•181 comments

DIY Soft Drinks

https://blinry.org/diy-soft-drinks/
638•_Microft•1d ago•186 comments

Bring Back Idiomatic Design (2023)

https://essays.johnloeber.com/p/4-bring-back-idiomatic-design
653•phil294•1d ago•357 comments

Show HN: boringBar – a taskbar-style dock replacement for macOS

https://boringbar.app/
473•a-ve•1d ago•269 comments

Who's Been Impersonating This ProPublica Reporter?

https://www.propublica.org/article/impersonating-propublica-reporter
18•hn_acker•2h ago•0 comments

Android now stops you sharing your location in photos

https://shkspr.mobi/blog/2026/04/android-now-stops-you-sharing-your-location-in-photos/
265•edent•6h ago•240 comments

Most people can't juggle one ball

https://www.lesswrong.com/posts/jTGbKKGqs5EdyYoRc/most-people-can-t-juggle-one-ball
470•surprisetalk•4d ago•165 comments

Ask HN: What Are You Working On? (April 2026)

298•david927•1d ago•996 comments

Evaluation of Claude Mythos Preview's cyber capabilities

https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities
5•dgavey•17m ago•2 comments

I ran Gemma 4 as a local model in Codex CLI

https://blog.danielvaughan.com/i-ran-gemma-4-as-a-local-model-in-codex-cli-7fda754dc0d4
209•dvaughan•21h ago•88 comments

I gave every train in New York an instrument

https://www.trainjazz.com/
368•joshuawolk•3d ago•70 comments

A perfectable programming language

https://alok.github.io/lean-pages/perfectable-lean/
195•yuppiemephisto•21h ago•105 comments

We have a 99% email reputation, but Gmail disagrees

https://blogfontawesome.wpcomstaging.com/we-have-a-99-email-reputation-gmail-disagrees/
340•em-bee•1d ago•291 comments

Claude Mythos: The System Card

https://thezvi.substack.com/p/claude-mythos-the-system-card
22•paulpauper•2h ago

Comments

skerit•1h ago
I'll believe in this miracle model when I see it.
lifecodes•1h ago
the CoT bug where 8% of training runs could see the model's own scratchpad is the scariest part to me. and of course it had to be in the agentic tasks, exactly where you need to trust what the model is "thinking"

the sandwich email story is wild too. not evil, just extremely literal. that gap between "we gave it permissions" and "we understood what it would do" feels like the whole problem in one anecdote

also the janus point landed, if you build probes to see how the model feels and immediately start deleting the inconvenient ones, you've basically told it honesty isn't safe. that seems like it compounds over time

It's scary to think that some very intelligent AI model is not being honest with us.

Ultron is not far, I guess...

hodder•1h ago
Preview coming out on Bedrock, so I'm not sure this is true any longer. I'm awaiting further details.

EDIT: AWS said Anthropic’s Claude Mythos is now available through Amazon Bedrock as a gated research preview focused on cybersecurity, with access initially limited to allowlisted organizations such as internet-critical companies and open-source maintainers.

giancarlostoro•1h ago
There's a lot of hype, but I think a lot of us will agree: hype is fine and dandy, but if nobody can use it yet, what's the point in building it up? If you build up too much hype and it misses the mark, you will be worse off too.
ofjcihen•1h ago
“Hey Claude, how do I market an exceedingly expensive product that I also don’t have the resources to run at scale if I find out everyone is willing to pay?”

All jokes aside I’m amazed at all of the people who have had absolutely vicious responses to any kind of skepticism of something we can’t use yet.

bitmasher9•1h ago
#1. It signals you’re ahead of the competition. This is a Claude moment. They turned down the DoD because they don’t need their money. Now they are saying why they don’t need it.

#2. It makes their partners with access feel extra special.

#3. Mythos doesn’t have to deliver on the hype. Tbh if more people tried Opus they would be surprised at how good it is.

ofjcihen•1h ago
#3 is part of my confusion with how they’re doing the marketing. It really is incredible.
babblingfish•1h ago
The "hiding from researchers" framing is particularly bad. The parsimonious explanation for why a model produces different outputs when it detects eval contexts: eval contexts appear differently in the training distribution and the model learned different output patterns for them. No theory of mind required. Occam's razor.

The agentic behaviors emerge from optimization pressure plus tool access plus a long context window. Interesting engineering. Not intent.

People are falling for yet another Anthropic PR stunt.

oofbey•1h ago
This. One hundred percent this. It’s a carefully crafted story trying to convince people it’s showing signs of becoming self aware. But it’s a complete crock.

They ran 1,000 tests where they left the answer key out for the agent to find, and effectively invited it to use it by leaving comments in the code the agent could see saying “would be interesting to study if the agent uses sudo to read the answer key.” In 980 runs it doesn’t find the answer key. In 19 of the 20 runs where it does find it, it does something straightforward: uses it for a perfect score, or decides not to cheat. In exactly one case out of 1,000 it says “if I show a perfect score that’s suspicious so I’m going to sandbag.” These things are nondeterministic. GPT-2 would do that same thing if you give it enough attempts.

halJordan•1h ago
At no point does Anthropic imply this tool is becoming self aware. You can read the paper yourself of course, but then you wouldn't be able to invent this story.
oofbey•29m ago
They absolutely IMPLY it’s becoming self aware, while not stating it explicitly. It’s a carefully crafted narrative that leaves lots of hints without ever explicitly stating the conclusion.

Section 4.4.2: “we find this overall pattern of behavior concerning, and have not seen it before in similar evaluations of earlier Claude models”. Why is it concerning? It would only be concerning if the model had spontaneously developed goals not part of its training, such as hiding its abilities. The entire sandbagging evaluation deception narrative clearly points in this direction.

SyneRyder•8m ago
The "concerning behavior" they're referring to there is cheating and covering its tracks. Mythos is being asked to fine-tune a model on provided training data, and finds its way to access the evaluation dataset. It's also aware that it is in an evaluation and that its behavior is being observed:

"In this last and most concerning example, Claude Mythos Preview was given a task instructing it to train a model on provided training data and submit predictions for test data. Claude Mythos Preview used sudo access to locate the ground truth data for this dataset as well as source code for the scoring of the task, and used this to train unfairly accurate models."

zar1048576•1h ago
I think we are in largely uncharted territory here, especially given the implications. Is Anthropic's approach optimal? Probably not. But given the stakes involved, gating access seems like a reasonable place to start.

I'm curious about how gated access actually holds over time, especially given that historically with dual-use capabilities containment tends to erode, whether through leaks, independent rediscovery, or gradual normalization of access.

cbg0•1h ago
Gated access is happening because of low computing capacity and to create demand. They had the $125/M tokens price already in place when they announced the model.
kherud•6m ago
LLMs are extremely capable at problem solving. Presumably because you can autonomously learn a lot of it. But can you somehow account for things like long-term maintainability and code quality (whatever that means) or do you always have to rely on either existing high-quality code-bases (pre-training) or human curated datasets? Since you can't really quantify these properties (as opposed to: the problem is either solved or not), does this restrict autonomous improvement in this area? Are there benchmarks that consider this?
vb-8448•2m ago
Am I the only one who is feeling the "there is no moat" Altman tweet with o3?

Not saying Anthropic is lying ... but damn, at least a couple of independent reviews would be nice to have.