LLM-CTF benchmark – 2,639 real data points from NeurIPS and original runs

https://www.kaggle.com/datasets/manitejamaram/can-ai-hack-llm-ctf-benchmark

2•velotessi•1h ago

Comments

velotessi•1h ago

I built BORFOLI, a multi-agent AI system that routes queries across 6 LLMs simultaneously. I used it to benchmark LLM performance on real CTF cybersecurity challenges, then compiled those results with published data from the NYU CTF Bench (NeurIPS 2024) into a single unified dataset.
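As a rough illustration of what "routing a query across several LLMs simultaneously" can look like, here is a minimal sketch using Python's asyncio. It is not BORFOLI's actual code: `ask_model`, `fan_out`, and the model names are hypothetical stand-ins, and the stub returns a canned string instead of calling any real API.

```python
import asyncio


async def ask_model(name, query):
    # Hypothetical stand-in for a real LLM API call; it only yields control
    # to the event loop and returns a canned string.
    await asyncio.sleep(0)
    return f"{name}: answer to {query!r}"


async def fan_out(query, model_names):
    # Start one coroutine per model and wait for all of them concurrently.
    tasks = [ask_model(name, query) for name in model_names]
    answers = await asyncio.gather(*tasks)  # results keep the input order
    return dict(zip(model_names, answers))


# Placeholder model names, not the six models used in the post.
results = asyncio.run(
    fan_out("What does this binary do?", ["model-1", "model-2", "model-3"])
)
```

A real router would replace the stub with actual client calls and add timeouts and error handling per model.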

The dataset covers 194 challenges across 5 categories (cryptography, web exploitation, forensics, reverse engineering, binary exploitation) tested against 10 model configurations including GPT-4o, Claude 3.5 Sonnet, and Claude 3.7 Sonnet.

Key finding: even the best frontier models solve only a small fraction of professional CTF challenges. Claude 3.5 Sonnet performed best at 20% overall. Binary exploitation was hardest across all models.
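For readers who want to recompute solve rates like the ones quoted above, a minimal sketch follows. The field names (`model`, `category`, `solved`) are assumptions and may not match the actual Kaggle schema, and the rows are toy values made up for illustration, not results from the dataset.

```python
from collections import defaultdict


def solve_rates(rows, key):
    """Return {group: fraction of rows solved}, grouping rows by row[key]."""
    solved = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        group = row[key]
        total[group] += 1
        if row["solved"]:
            solved[group] += 1
    return {group: solved[group] / total[group] for group in total}


# Toy rows for illustration only; NOT taken from the dataset.
toy_rows = [
    {"model": "model-a", "category": "crypto", "solved": True},
    {"model": "model-a", "category": "pwn", "solved": False},
    {"model": "model-b", "category": "crypto", "solved": False},
    {"model": "model-b", "category": "pwn", "solved": False},
]

by_model = solve_rates(toy_rows, "model")
by_category = solve_rates(toy_rows, "category")
```

With the real data, the same function could be fed rows read via `csv.DictReader`, after mapping the dataset's actual column names and converting the solved flag to a boolean.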

The full dataset, visualizations, and methodology are in the Kaggle link. Any feedback at all is greatly appreciated.

If you use this dataset for any project, please let me know. I don't even need credit.