
EmDash – a spiritual successor to WordPress that solves plugin security

https://blog.cloudflare.com/emdash-wordpress/
191•elithrar•1h ago•104 comments

AI for American-Produced Cement and Concrete

https://engineering.fb.com/2026/03/30/data-center-engineering/ai-for-american-produced-cement-and...
39•latchkey•51m ago•25 comments

Ask HN: Who is hiring? (April 2026)

84•whoishiring•3h ago•73 comments

StepFun 3.5 Flash is #1 cost-effective model for OpenClaw tasks (300 battles)

https://app.uniclaw.ai/arena?tab=costEffectiveness&via=hn
62•skysniper•1h ago•23 comments

Show HN: Real-time dashboard for Claude Code agent teams

https://github.com/simple10/agents-observe
36•simple10•1h ago•14 comments

CERN levels up with new superconducting karts

https://home.cern/news/news/engineering/cern-levels-new-superconducting-karts
338•fnands•10h ago•77 comments

NASA Artemis II moon mission live launch broadcast

https://plus.nasa.gov/scheduled-video/nasas-artemis-ii-crew-launches-to-the-moon-official-broadcast/
85•apitman•57m ago•22 comments

The OpenAI Graveyard: All the Deals and Products That Haven't Happened

https://www.forbes.com/sites/phoebeliu/2026/03/31/openai-graveyard-deals-and-products-havent-happ...
100•dherls•2h ago•59 comments

Is BGP safe yet?

https://isbgpsafeyet.com/
188•janandonly•4h ago•61 comments

Playing Wolfenstein 3D with one hand in 2026

https://arstechnica.com/gaming/2026/03/playing-wolfenstein-3d-with-one-hand-in-2026/
18•Brajeshwar•4d ago•4 comments

Random numbers, Persian code: A mysterious signal transfixes radio sleuths

https://www.rferl.org/a/mystery-numbers-station-persian-signal-iran-war/33700659.html
67•thinkingemote•6h ago•69 comments

Consider the Greenland Shark (2020)

https://www.lrb.co.uk/the-paper/v42/n09/katherine-rundell/consider-the-greenland-shark
64•mooreds•5d ago•25 comments

Show HN: Zerobox – Sandbox any command with file and network restrictions

https://github.com/afshinm/zerobox
17•afshinmeh•2d ago•7 comments

Randomness on Apple Platforms (2024)

https://blog.xoria.org/randomness-on-apple-platforms/
29•surprisetalk•5d ago•1 comment

Intuiting Pratt Parsing

https://louis.co.nz/2026/03/26/pratt-parsing.html
118•signa11•2d ago•34 comments

Ada and Spark on ARM Cortex-M – A Tutorial with Arduino and Nucleo Examples

http://inspirel.com/articles/Ada_On_Cortex.html
29•swq115•4d ago•4 comments

What Is Copilot Exactly?

https://idiallo.com/blog/what-is-copilot-exactly
41•WhyNotHugo•1h ago•20 comments

Wasmer (YC S19) Is Hiring – Rust and DevRel Positions

https://www.workatastartup.com/companies/wasmer
1•syrusakbary•6h ago

Claude Wrote a Full FreeBSD Remote Kernel RCE with Root Shell (CVE-2026-4747)

https://github.com/califio/publications/blob/main/MADBugs/CVE-2026-4747/write-up.md
193•ishqdehlvi•12h ago•82 comments

A new way to measure poverty shows the US falling behind Europe

https://www.euronews.com/business/2026/03/29/a-new-way-to-measure-poverty-shows-the-us-falling-be...
48•_DeadFred_•1h ago•13 comments

An Introduction to Writing Systems and Unicode

https://r12a.github.io/scripts/tutorial/part2
3•mariuz•3d ago•0 comments

Show HN: Sycamore – next gen Rust web UI library using fine-grained reactivity

https://sycamore.dev
84•lukechu10•5h ago•54 comments

Show HN: CLI to order groceries via reverse-engineered REWE API (Haskell)

https://github.com/yannick-cw/korb
174•wazHFsRy•2d ago•72 comments

Claude Code Unpacked: A visual guide

https://ccunpacked.dev/
925•autocracy101•12h ago•338 comments

The Document Foundation ejects its core developers

https://www.collaboraonline.com/blog/tdf-ejects-its-core-developers/
56•hackernewsblues•6h ago•21 comments

A dot a day keeps the clutter away

https://scottlawsonbc.com/post/dot-system
489•scottlawson•20h ago•146 comments

Chess in SQL

https://www.dbpro.app/blog/chess-in-pure-sql
154•upmostly•3d ago•36 comments

Show HN: 1-Bit Bonsai, the First Commercially Viable 1-Bit LLMs

https://prismml.com/
367•PrismML•21h ago•142 comments

New patches allow building Linux IPv6-only

https://www.phoronix.com/news/Linux-IPv6-IPv4-Legacy-Knobs
96•Bender•4h ago•102 comments

TruffleRuby

https://chrisseaton.com/truffleruby/
189•tosh•3d ago•26 comments

StepFun 3.5 Flash is #1 cost-effective model for OpenClaw tasks (300 battles)

https://app.uniclaw.ai/arena?tab=costEffectiveness&via=hn
62•skysniper•1h ago

Comments

skysniper•1h ago
I ran 300+ benchmarks across 15 models in OpenClaw and published two separate leaderboards: performance and cost-effectiveness.

The two boards look nothing alike. Top 3 performance: Claude Opus 4.6, GPT-5.4, Claude Sonnet 4.6. Top 3 cost-effectiveness: StepFun 3.5 Flash, Grok 4.1 Fast, MiniMax M2.7.

The most dramatic split: Claude Opus 4.6 is #1 on performance but #14 on cost-effectiveness. StepFun 3.5 Flash is #1 on cost-effectiveness and #5 on performance.

Other surprises: GLM-5 Turbo, Xiaomi MiMo v2 Pro, and MiniMax M2.7 all outrank Gemini 3.1 Pro on performance.

Rankings use relative ordering only (not raw scores) fed into a grouped Plackett-Luce model with bootstrap CIs. Same principle as Chatbot Arena — absolute scores are noisy, but "A beat B" is reliable. Full methodology: https://app.uniclaw.ai/arena/leaderboard/methodology?via=hn

I built this as part of OpenClaw Arena — submit any task, pick 2-5 models, a judge agent evaluates in a fresh VM. Public benchmarks are free.
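The ranking approach described above (relative orderings fed into a Plackett-Luce model, with bootstrap confidence intervals) can be sketched in a few lines. This is a minimal illustration using Hunter's MM updates on synthetic battle data, not the arena's actual implementation; the strengths and battle counts are made up:

```python
import random

def fit_plackett_luce(rankings, n_items, iters=200):
    """Fit Plackett-Luce strengths with Hunter's MM updates.
    rankings: list of orderings (item ids, best first)."""
    # W[i]: number of times item i was chosen while rivals remained
    W = [0] * n_items
    for r in rankings:
        for j in range(len(r) - 1):
            W[r[j]] += 1
    w = [1.0 / n_items] * n_items
    for _ in range(iters):
        denom = [0.0] * n_items
        for r in rankings:
            # suffix[j] = total strength of items still available at stage j
            suffix, tail = [0.0] * len(r), 0.0
            for j in range(len(r) - 1, -1, -1):
                tail += w[r[j]]
                suffix[j] = tail
            for j in range(len(r) - 1):       # final stage is deterministic
                for k in range(j, len(r)):    # every item in the choice set
                    denom[r[k]] += 1.0 / suffix[j]
        w = [W[i] / denom[i] for i in range(n_items)]
        total = sum(w)
        w = [x / total for x in w]
    return w

def sample_ranking(strengths, rng):
    """Draw one full ranking from a Plackett-Luce model."""
    pool = list(range(len(strengths)))
    order = []
    while pool:
        pick = rng.choices(pool, weights=[strengths[i] for i in pool])[0]
        order.append(pick)
        pool.remove(pick)
    return order

rng = random.Random(42)
true_strengths = [3.0, 2.0, 1.0]  # model 0 is genuinely strongest
battles = [sample_ranking(true_strengths, rng) for _ in range(300)]
fitted = fit_plackett_luce(battles, 3)

# Bootstrap CI for model 0: resample battles with replacement and refit
boot = sorted(
    fit_plackett_luce([rng.choice(battles) for _ in battles], 3)[0]
    for _ in range(50)
)
lo, hi = boot[1], boot[-2]  # rough interval from 50 resamples
```

With 300 synthetic battles the fitted strengths recover the true ordering; only the relative "A beat B" information enters the fit, which is why absolute judge scores can be noisy without hurting the ranking.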

refulgentis•1h ago
Please don’t use AI to write comments, it cuts against HN guidelines.
skysniper•1h ago
Sorry, didn't know that. Here's my hand-written tl;dr:

Gemini is very unreliable at using skills; it often just reads a skill and decides to do nothing.

StepFun leads the cost-effectiveness leaderboard.

Rankings really depend on the task, so it's better to try your own.

refulgentis•1h ago
It’s too late once it’s happened. I was curious, then when I saw the site looked vibecoded and you’re commenting with AI, I decided to stop trying to reason through the discrepancies between what was claimed and what’s on the site (ex. 300 battles vs. only a handful in site data).
skysniper•1h ago
All 300+ battles are available at https://app.uniclaw.ai/arena/battles; every single battle is shown with the raw conversation history, produced files, the judge's verdict, and final scores.
refulgentis•30m ago
Thanks! Is the judge an LLM? There are lots of references to "just like LMArena", but LMArena is human-evaluated?
skysniper•14m ago
> Is the judge an LLM?

Yes, the judge is one of Opus 4.6, GPT 5.4, or Gemini 3.1 Pro (the submitter can choose). Self-judging (where the judge model is also one of the participants) is excluded when computing rankings.

> There are lots of references to "just like LMArena", but LMArena is human-evaluated?

Yeah, LMArena is human-evaluated, but here I found it impractical to gather enough human evaluation data, because the effort it takes to compare the results is much higher:

- for code, the judge needs to read through it to check code quality, and actually run it to see the output

- when producing a webpage or a document, the judge needs to check the content and layout visually

- when anything goes wrong, the judge needs to read the execution log to see whether partial credit should be granted

If you look at the cost details of each battle (available at the bottom of the battle detail page), the judge typically costs more than any participant model.

If we evaluated with humans, I'd say each evaluation could easily take ~5-10 minutes.
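The self-judge exclusion described above amounts to filtering battles before the ranking step. A hypothetical sketch (the record shape and model names here are illustrative, not the arena's actual data format):

```python
# Each battle records which model judged it and which models competed.
battles = [
    {"judge": "opus-4.6", "participants": ["stepfun-3.5-flash", "gpt-5.4"]},
    {"judge": "gpt-5.4", "participants": ["gpt-5.4", "gemini-3.1-pro"]},
    {"judge": "gemini-3.1-pro", "participants": ["opus-4.6", "stepfun-3.5-flash"]},
]

# Drop battles where the judge also competed, then compute rankings
# only from the remaining ones.
usable = [b for b in battles if b["judge"] not in b["participants"]]
```

Here the second battle is excluded because GPT 5.4 both judged and competed, which would let a model grade its own work.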

refulgentis•10m ago
Fair enough, yeah, agent evals are hard especially across N models :/

Thanks for replying btw, didn't mean any disrespect, good on you for not getting aggro about feedback

rat9988•56m ago
Too late for what? For you? maybe. There are many others who are okay with it, and it doesn't diminish the quality of the work. Props to the author.
refulgentis•32m ago
> Too late for what? For you? maybe.

Maybe? :)

> There are many others that are okay with it

Correct.

> and it doesn't diminish the quality of the work.

It does affect incoming people hearing about the work.

I applaud your instinct to defend someone who put in effort. It's one of the most important things we can do.

Another important thing we can do for them is be honest about our own reactions. It's not sunshine and rainbows on its face, but it is generous. Mostly because A) it takes time and B) other people might see red and harangue you for it.

johndough•7m ago
Could you add a column for time or number of tokens? Some models take forever because of their excessive reasoning chains.
hadlock•1h ago
According to openrouter.ai, it looks like StepFun 3.5 Flash is the most popular model at 3.5T tokens, vs. GLM 5 Turbo at 2.5T tokens. Claude Sonnet is in 5th place with 1.05T tokens. Which isn't super surprising, as StepFun is about 5% the price of Sonnet.

https://openrouter.ai/apps?url=https%3A%2F%2Fopenclaw.ai%2F

skysniper•1h ago
The really surprising part to me is that, despite being the cheapest model on the board, StepFun often scores high on pure performance. Other models in the same price range (e.g. Kimi) fail to do that.
NitpickLawyer•47m ago
> the most popular model

It was free for a long time. That usually skews the statistics. It was the same with grok-code-fast1.

MaxikCZ•14m ago
Exactly. When I read the headline I thought: "Ofc it is, it's free."
skysniper•6m ago
I should have clarified I didn't use the free version...
smallerize•1h ago
It looks like Unsloth had trouble generating their dynamic quantized versions of this model, deleted the broken files, then never published an update.
WhitneyLand•1h ago
StepFun is an interesting model.

If you haven’t heard of it yet there’s some good discussion here: https://news.ycombinator.com/item?id=47069179

tarruda•59m ago
Since that discussion, they released the base model and a midtrain checkpoint:

- https://huggingface.co/stepfun-ai/Step-3.5-Flash-Base

- https://huggingface.co/stepfun-ai/Step-3.5-Flash-Base-Midtra...

I'm not aware of other AI labs that have released base checkpoints for models in this size class. Qwen released some base models for 3.5, but the biggest one is the 35B checkpoint.

They also released the entire training pipeline:

- https://huggingface.co/datasets/stepfun-ai/Step-3.5-Flash-SF...

- https://github.com/stepfun-ai/SteptronOss

skysniper•55m ago
Thanks for the info. Before running the bench I had only tried it on arena.ai-type tasks, and it was not impressive. I didn't expect it to be that good at agentic tasks.
skysniper•36m ago
Another thing from the bench I didn't expect: Gemini 3.1 Pro is very unreliable at using skills. Sometimes it just reads the skill and decides to do nothing, while Opus/Sonnet 4.6 and GPT 5.4 never have this issue.
dmazin•25m ago
Why do half the comments here read like AI trying to boost some sort of scam?
skysniper•13m ago
lol