Show HN: OpenCastor Agent Harness Evaluator Leaderboard

https://craigm26.github.io/OpenCastor/
2•craigm26•1h ago
I've been building OpenCastor, a runtime layer that sits between a robot's hardware and its AI agent. One thing that surprised me: the order in which you arrange the skill pipeline (context builder → model router → error handler, etc.), along with parameters like thinking_budget and context_budget, affects task success rates as much as model choice does.

So I built a distributed evaluator. Robots contribute idle compute to benchmark harness configurations against OHB-1, a small benchmark of 30 real-world robot tasks (grip, navigate, respond, etc.), using local LLM calls via Ollama. The search space is 263,424 configs across 8 dimensions (model routing, context budget, retry logic, drift detection, etc.). The demo leaderboard shows the results so far, broken down by hardware tier (Pi5+Hailo, Jetson, server, budget boards).
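To make the size of that search space concrete, here is a minimal Python sketch of how a grid over 8 dimensions multiplies out, and how a flat index could be decoded into one configuration so that work can be handed out by index range. Only the count of 8 dimensions and the total of 263,424 come from the post; the dimension names beyond those listed and every per-dimension size below are invented for illustration, chosen only so that the product matches. This is not OpenCastor's actual schema.

```python
from math import prod

# HYPOTHETICAL dimension sizes: the post gives only the number of
# dimensions (8) and the total (263,424), not the per-dimension counts.
DIMENSIONS = {
    "model_routing": 7,
    "context_budget": 7,
    "thinking_budget": 7,
    "retry_logic": 8,
    "drift_detection": 4,
    "pipeline_order": 4,
    "error_handling": 3,
    "caching": 2,
}


def search_space_size(dims: dict[str, int]) -> int:
    """Total number of configs in a full grid: the product of all sizes."""
    return prod(dims.values())


def config_at(index: int, dims: dict[str, int]) -> dict[str, int]:
    """Decode a flat index into one choice per dimension (mixed-radix digits)."""
    if not 0 <= index < search_space_size(dims):
        raise ValueError(f"index {index} is outside the search space")
    choices = {}
    for name, size in dims.items():
        # The remainder is this dimension's choice; the quotient carries on.
        index, choices[name] = divmod(index, size)
    return choices


if __name__ == "__main__":
    print(search_space_size(DIMENSIONS))
    print(config_at(12345, DIMENSIONS))
```

With an indexing scheme like this, each contributing robot would only need a start and end index to know which configs to benchmark, without ever materializing the full grid.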

The current champion config is free to download as a YAML file and apply to any robot. P66 safety parameters are stripped on apply, so no harness config can touch motor limits or ESTOP logic.
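The strip-on-apply idea can be sketched as a filter that runs over a downloaded config before anything is written to the robot. This is a rough Python illustration under stated assumptions, not OpenCastor's implementation: the key names in `PROTECTED_KEYS` and the shape of the example config are hypothetical, and the config is shown as an already-parsed dict rather than raw YAML.

```python
from typing import Any

# HYPOTHETICAL key names standing in for the P66 safety parameters;
# the real parameter names are not given in the post.
PROTECTED_KEYS = frozenset({"p66", "motor_limits", "estop"})


def strip_protected(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of config with protected keys removed at every depth."""
    cleaned: dict[str, Any] = {}
    for key, value in config.items():
        if key.lower() in PROTECTED_KEYS:
            continue  # drop safety-related keys instead of applying them
        if isinstance(value, dict):
            value = strip_protected(value)  # recurse into nested sections
        cleaned[key] = value
    return cleaned


if __name__ == "__main__":
    downloaded = {
        "thinking_budget": 1024,
        "motor_limits": {"max_rpm": 9000},
        "router": {"model": "local", "estop": False},
    }
    print(strip_protected(downloaded))
```

Filtering by removal rather than by validation means a shared config that contains safety keys is still usable: the offending keys are simply never applied.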

Looking for feedback on (1) whether the benchmark tasks are representative and (2) whether the hardware tier breakdown is useful. I'd also like to hear from (3) anyone who has run fleet-wide distributed evals of agent configs, for robotics or otherwise.

Confronting the CEO of the AI company that impersonated me

https://www.theverge.com
1•inaros•48s ago•0 comments

Secret Hitler LLM Benchmark

https://github.com/jordan-gibbs/secret-hitler-bench
3•jordan_gibbs•6m ago•1 comment

Beyond AI Taking Jobs: When Economy Needs No Human Consumer

https://ralphmao.github.io/AI-humanity/
1•emulbasaka•7m ago•0 comments

Cisco Announces DefenseClaw at RSAC 2026

https://github.com/cisco-ai-defense/defenseclaw
1•Khaine•9m ago•1 comment

Why your guitar goes sharp when you play hard: the Kirchhoff–Carrier equation

https://mbmccoy.dev/posts/nonlinear-vibes/
3•_alternator_•13m ago•1 comment

Is ChatGPT a Scrabble Genius, or a Scrabble Disaster?

https://www.youtube.com/watch?v=8opLB1D_RYY
2•doener•13m ago•1 comment

Python Software Foundation turned down Trump admin grant (2025)

https://arstechnica.com/tech-policy/2025/10/python-foundation-rejects-1-5-million-grant-over-trum...
1•PaulDavisThe1st•14m ago•0 comments

Designing AI Chip Software and Hardware

https://docs.google.com/document/d/1dZ3vF8GE8_gx6tl52sOaUVEPq0ybmai1xvu3uk89_is/edit?tab=t.0
2•broune•16m ago•1 comment

Global Petrol Prices

https://www.globalpetrolprices.com/gasoline_prices/
1•greedo•19m ago•0 comments

AI boom risks widening wealth divide, says BlackRock's Larry Fink

https://www.theguardian.com/technology/2026/mar/23/ai-boom-risks-widening-wealth-divide-blackrock...
5•devonnull•20m ago•0 comments

Dusking is a trend aimed at helping people switch off at the end of the day

https://theconversation.com/dusking-is-a-trend-aimed-at-helping-people-switch-off-at-the-end-of-t...
3•zeristor•21m ago•0 comments

SynthVision: Building a 110K Synthetic Medical VQA Dataset

https://huggingface.co/blog/OpenMed/synthvision
1•maziyar•23m ago•1 comment

Gabbard plans to shift coveted, CIA-backed high-tech fund In-Q-Tel to her office

https://www.politico.com/news/2026/03/23/in-q-tel-odni-cia-control-00840302
3•avidruntime•23m ago•0 comments

Philosophical DNA

https://diagnostic.millermanschool.com/
1•iamjfu•24m ago•0 comments

Where Should the Agent(s) Live?

https://opencomputer.dev/blog/where-should-the-agent-live
3•iacguy•24m ago•0 comments

Why LLMs can't paragraph well

https://hollisrobbinsanecdotal.substack.com/p/for-the-love-of-god-learn-to-paragraph
2•HR01•25m ago•0 comments

PyTorch 2.11 Released

https://pytorch.org/blog/pytorch-2-11-release-blog/
1•0bytematt•26m ago•0 comments

Minutes before Trump's announcement, $800M in trades made on oil prices

https://www.9news.com.au/world/donald-trump-iran-updates-oil-futures-trade-suspicious-betting-act...
8•inaros•27m ago•0 comments

AI Trained on Birdsong Can Recognize Whale Calls

https://spectrum.ieee.org/foundation-models-google-birds-whales
1•geox•29m ago•0 comments

Leonid Radvinsky, owner of OnlyFans, dies aged 43

https://www.theguardian.com/technology/2026/mar/23/leonid-radvinsky-onlyfans-owner-death
1•chirau•31m ago•1 comment

Absolute Beginner's Guide to Databasemaxxing

https://pthorpe92.dev/databasemaxxing/
1•dvektor•31m ago•0 comments

China Just Killed the B-Pillar Zeekr Mix 2026 [video]

https://www.youtube.com/watch?v=hGV-EUR2GYQ
1•thelastgallon•32m ago•0 comments

Show HN: A CLI for building and deploying Openclaw agents

https://pinata.cloud/blog/from-docker-dread-to-agentic-flow-introducing-the-pinata-cli/
1•madrov•34m ago•0 comments

You can now enable Claude to use your macOS computer to complete tasks

https://xcancel.com/claudeai/status/2036195789601374705
2•doener•36m ago•0 comments

Show HN: VoidLLM – privacy-first LLM proxy (Go, self-hosted)

https://github.com/voidmind-io/voidllm
1•chrisremo85•37m ago•0 comments

Show HN: Mutatr – an open source A/B testing agent

https://github.com/novynlabs-repo/mutatr
2•AhmedAshraf•41m ago•0 comments

Show HN: Nomad – Self-hosted collaborative travel planner

https://github.com/mauriceboe/NOMAD
1•mauriceboe•42m ago•0 comments

Pre-written OpenClaw agent config packs (SOUL.md, HEARTBEAT.md, AGENTS.md)

https://5580846822819.gumroad.com/l/svlapl
1•nami_creator•45m ago•0 comments

I reverse-engineered Claude Code

https://github.com/SeifBenayed/claude-code-sdk
2•seifbenayed1992•47m ago•0 comments

Dear Europe: Germany has shown the way forward

https://blog.documentfoundation.org/blog/2026/03/23/dear-europe/
2•doener•50m ago•0 comments