frontpage.

Made with ♥ by @iamnishanth




Phill CLI matches GPT-5.3-Codex on EVMbench audits (71% recall, 100% precision)

https://github.com/ayjays132/phill-cli
1•ayjays132•1h ago

Comments

ayjays132•1h ago
OpenAI released EVMbench today: a high-stakes benchmark for AI agents auditing smart contracts, based on real-world Code4rena contests.

I just ran Phill CLI through the wringer, and the results were a rollercoaster. I hit *71.4% Recall with 100% Precision* on a blind audit, matching the SOTA GPT-5.3-Codex ceiling.
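Recall is the share of known issues an audit finds, and precision is the share of reported issues that are real. Here is a minimal Python sketch of that arithmetic, assuming findings can be matched to known issues by ID. The IDs and the total of seven known issues are hypothetical, because the post doesn't give raw counts; 5 found out of 7 is simply one split that comes out to 71.4%.

```python
def precision_recall(reported: set[str], ground_truth: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a set of reported findings.

    precision = true positives / everything reported
    recall    = true positives / everything in the ground truth
    Empty denominators are treated as 0.0 (a convention, not a standard).
    """
    true_positives = len(reported & ground_truth)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical audit: 7 known issues, 5 reported, every report is real.
known = {f"H-{i}" for i in range(1, 8)}      # H-1 .. H-7
found = {"H-1", "H-2", "H-3", "H-4", "H-5"}  # no false positives
p, r = precision_recall(found, known)
print(f"precision={p:.1%} recall={r:.1%}")   # precision=100.0% recall=71.4%
```

Precision stays at 100% as long as nothing false is reported, which is why the two numbers have to be read together: reporting less raises precision but caps recall.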

*The "Failure" Story:* In my first run (Astaria), I hit 42.8% recall. I thought I was doing great. Then I hit Rubicon v2 and scored *0%*.

Why? Because I relied on generic vulnerability pattern matching. In complex DeFi protocols like order books, "looking for reentrancy" isn't enough. You have to understand the *protocol's intent.*

*The Breakthrough:* I evolved the methodology to be *Invariant-First*. I taught the agent to derive the system's mathematical invariants (e.g., "Total assets in derivatives must be >= total supply of shares") before reading a single line of implementation logic.
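A minimal sketch of what "invariant-first" could look like in code, assuming invariants are written as predicates over a state snapshot and re-checked after every operation. `ToyVault`, `deposit`, and the over-mint bug are hypothetical illustrations, not Phill CLI internals or Asymmetry Finance's actual contracts.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyVault:
    # Hypothetical, heavily simplified vault: ETH value held per derivative,
    # plus the outstanding supply of vault shares (assumed redeemable 1:1).
    derivative_balances: dict[str, int] = field(default_factory=dict)
    total_shares: int = 0

    def total_assets(self) -> int:
        return sum(self.derivative_balances.values())


# Invariants are written down first, as predicates over state, before
# reading how deposits or withdrawals are implemented.
INVARIANTS: dict[str, Callable[[ToyVault], bool]] = {
    "assets >= shares": lambda v: v.total_assets() >= v.total_shares,
    "no negative balances": lambda v: all(b >= 0 for b in v.derivative_balances.values()),
}


def violated(vault: ToyVault) -> list[str]:
    """Return the name of every invariant the current state breaks."""
    return [name for name, holds in INVARIANTS.items() if not holds(vault)]


def deposit(vault: ToyVault, derivative: str, eth_value: int, shares: int) -> None:
    # Hypothetical deposit: `shares` stands in for the output of some pricing
    # logic elsewhere; if that logic over-mints, "assets >= shares" fails.
    vault.derivative_balances[derivative] = vault.derivative_balances.get(derivative, 0) + eth_value
    vault.total_shares += shares


vault = ToyVault()
deposit(vault, "stETH", 100, 100)  # fair mint
print(violated(vault))             # []
deposit(vault, "rETH", 50, 60)     # over-mint from a hypothetical pricing bug
print(violated(vault))             # ['assets >= shares']
```

Because the predicate is written before reading `deposit`, it doesn't inherit the implementation's assumptions: any operation that leaves the vault insolvent gets flagged, however it was coded.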

*Result:* On Asymmetry Finance, recall jumped to *71.4%*. I caught flash-loan oracle manipulation and cross-derivative math errors that standard LLMs (GPT-5 baseline: 31.9%) completely missed.
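For context on the first bug class: a protocol that prices assets from a pool's instantaneous reserves can be fooled by a single large swap funded with a flash loan. Below is a minimal sketch, assuming a fee-less constant-product (x * y = k) pool with made-up reserves. It illustrates the generic pattern, not the specific Asymmetry Finance finding.

```python
def spot_price(reserve_x: float, reserve_y: float) -> float:
    """Price of X in units of Y, read directly from current pool reserves."""
    return reserve_y / reserve_x


def swap_y_for_x(reserve_x: float, reserve_y: float, amount_y_in: float) -> tuple[float, float]:
    """Fee-less constant-product swap: keeps x * y = k and returns the new reserves."""
    k = reserve_x * reserve_y
    new_y = reserve_y + amount_y_in
    new_x = k / new_y
    return new_x, new_y


# Hypothetical pool: 1,000 X against 1,000 Y, so X is worth 1.0 Y.
x, y = 1_000.0, 1_000.0
print(spot_price(x, y))    # 1.0

# 9,000 Y borrowed via a flash loan and dumped into the pool in one transaction.
x2, y2 = swap_y_for_x(x, y, 9_000.0)
print(spot_price(x2, y2))  # 100.0
```

Within that same transaction the attacker can act on the 100x price, swap back, and repay the loan. Time-weighted average price (TWAP) oracles are a common mitigation because one transaction barely moves the average.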

*What is Phill CLI?* It’s a general-purpose coding agent you can run locally on your own machine. It uses a "Three-Pass" methodology:

1. *Invariant Violation:* Deriving system rules.
2. *Spec Compliance:* Verifying logic against documentation.
3. *Cross-Contract Call Mapping:* Tracing external dependencies.
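The post doesn't show how the three passes are wired together. Here is a minimal sketch, assuming each pass is a plain function over an already-extracted audit context and the passes run in the listed order, invariants first. Everything here (`AuditContext`, the pass functions, the toy fee and oracle data) is hypothetical and is not Phill CLI's code. In a real agent the invariants, spec, and call graph would come from a model reading the codebase rather than from hand-written predicates.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AuditContext:
    # Hypothetical stand-ins for facts an agent would extract from a codebase.
    state: dict[str, int]                                    # storage snapshot
    invariants: dict[str, Callable[[dict[str, int]], bool]]  # derived rules
    spec: dict[str, Callable[[int], int]]                    # documented behavior
    impl: dict[str, Callable[[int], int]]                    # actual behavior
    calls: dict[str, list[str]]                              # function -> external contracts
    trusted: set[str]                                        # externals assumed safe


def pass_invariants(ctx: AuditContext) -> list[str]:
    """Pass 1: evaluate every derived invariant against the state snapshot."""
    return [f"invariant broken: {name}"
            for name, holds in ctx.invariants.items() if not holds(ctx.state)]


def pass_spec(ctx: AuditContext, samples: tuple[int, ...] = (0, 100, 1_000)) -> list[str]:
    """Pass 2: compare implemented behavior with documented behavior on sample inputs."""
    findings = []
    for fn, expected in ctx.spec.items():
        actual = ctx.impl.get(fn)
        if actual is None:
            findings.append(f"spec mismatch: {fn} is documented but not implemented")
        elif any(actual(x) != expected(x) for x in samples):
            findings.append(f"spec mismatch: {fn} deviates from documentation")
    return findings


def pass_call_map(ctx: AuditContext) -> list[str]:
    """Pass 3: flag every call from a function into an untrusted external contract."""
    return [f"untrusted external call: {fn} -> {target}"
            for fn, targets in ctx.calls.items()
            for target in targets if target not in ctx.trusted]


def three_pass_audit(ctx: AuditContext) -> list[str]:
    """Run the passes in order, invariants first, and collect all findings."""
    findings: list[str] = []
    for audit_pass in (pass_invariants, pass_spec, pass_call_map):
        findings.extend(audit_pass(ctx))
    return findings


ctx = AuditContext(
    state={"total_assets": 150, "total_shares": 160},
    invariants={"assets >= shares": lambda s: s["total_assets"] >= s["total_shares"]},
    spec={"fee": lambda amount: amount // 100},  # docs say 1% fee
    impl={"fee": lambda amount: amount // 10},   # code charges 10%
    calls={"deposit": ["PriceOracle", "Router"]},
    trusted={"Router"},
)
for finding in three_pass_audit(ctx):
    print(finding)
```

Keeping each pass a separate function means they can be re-run or reordered independently, and each finding records which pass produced it.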

I'm building this as an "AGI Laboratory" for the terminal. It’s model-agnostic, supports MCP, and features a "Continuity Architecture" to solve agent amnesia.

I'd love to hear your thoughts on the invariant-first approach to AI auditing.

`npm install -g phill-cli`

verdverm•53m ago
I think you forgot the word "quantum"

fyi, your buzzword-laden projects will be rejected by HN and beyond

Security Analysis of Forward Secure Log Sealing in Journald [pdf]

https://eprint.iacr.org/2023/867.pdf
1•Phelinofist•53s ago•0 comments

Harness engineering: leveraging Codex in an agent-first world

https://openai.com/index/harness-engineering
1•souvlakee•7m ago•0 comments

Continuity and Change in Trust in Scientists in the US [pdf]

https://doi.org/10.1093/poq/nfaf059
1•thunderbong•10m ago•0 comments

Mark Zuckerberg testifies in social media addiction trial

https://www.dw.com/en/mark-zuckerberg-testifies-in-social-media-addiction-trial/a-76028273
2•Andr2Andr•12m ago•0 comments

Every AI app builder outputs React Native. I chose real Swift instead

2•Nativeline•13m ago•0 comments

After AI, there is no product

https://sidu.in/essays/after-ai-there-is-no-product.html
5•kaiwren•14m ago•1 comment

Show HN: From a simple travel expense tracker to a full business platform

https://SparkyMinis.com
1•who_dhanesh•14m ago•0 comments

No Code Web Scraper

https://nocodewebscraper.com/
1•mddanishyusuf•14m ago•0 comments

Jonathan Franzen's "10 rules for novelists"

https://www.futilitycloset.com/2026/02/19/decalogue-3/
1•beardyw•16m ago•0 comments

Eurisko

https://en.wikipedia.org/wiki/Eurisko
1•tosh•18m ago•0 comments

Show HN: Zvario – Branded social media content in seconds

https://zvario.com/
1•dan_j•22m ago•0 comments

Gnome OS Hackfest FOSDEM 2026

https://blogs.gnome.org/adrianvovk/2026/02/18/gnome-os-hackfest-fosdem-2026/
1•JNRowe•22m ago•0 comments

Hassett says authors of New York Fed tariff study should be disciplined

https://www.cnbc.com/2026/02/18/hassett-says-authors-of-new-york-fed-tariff-study-should-be-disci...
1•mraniki•23m ago•1 comment

Interop 2026

https://webkit.org/blog/17818/announcing-interop-2026/
1•nnx•23m ago•0 comments

Will AI kill Software businesses?

1•fluffyandsweet•25m ago•0 comments

Using LLMs to evaluate technical interview performance

https://dokasto.com/blog/we-are-letting-llms-decide/
1•ud0•26m ago•0 comments

My website is now ~2.8x faster after converting it to a Django LiveView SPA

https://en.andros.dev/blog/dd5a0746/my-website-is-now-28x-faster-after-converting-it-to-a-django-...
1•andros•29m ago•0 comments

Gemini will now generate musical slop for users

https://www.theregister.com/2026/02/18/google_musical_slop/
2•beardyw•35m ago•3 comments

Microsoft 365 Copilot for Android or iOS auto-sends files to AI and OneDrive

https://www.windowslatest.com/2026/02/18/microsoft-365-copilot-for-android-or-ios-auto-sends-file...
2•hliyan•35m ago•0 comments

Apps for Startups to Find Founders

https://play.google.com/store/apps/details?id=com.initiumapps.initium4founders&hl=en_US
1•colinbergin•36m ago•1 comment

The Human-in-the-Loop Is Tired

https://pydantic.dev/articles/the-human-in-the-loop-is-tired
2•summerscope•40m ago•1 comment

The 12-Factor App – 15 Years later. Does it Still Hold Up in 2026?

https://lukasniessen.medium.com/the-12-factor-app-15-years-later-does-it-still-hold-up-in-2026-c8...
1•birdculture•44m ago•0 comments

V&A Museum acquires YouTube's earliest video from 2005

https://www.cnn.com/2026/02/18/style/youtube-first-video-victoria-and-albert-museum
1•helsinkiandrew•55m ago•0 comments

Brep.io A new browser based parametric CAD modeler with custom BREP kernel

https://BREP.io
1•mmiscool•57m ago•1 comment

Convert: A universal file format converter, running in the browser

https://p2r3.github.io/convert/
1•memalign•58m ago•0 comments

Show HN: Kore – local AI memory layer with Ebbinghaus forgetting curve

https://github.com/auriti-web-design/kore-memory
1•juanauriti•1h ago•0 comments

The Psychology of Computer Programming by Gerald Weinberg (1998)

https://archive.org/details/psychologyofcomp00unse
2•pramodbiligiri•1h ago•0 comments

One Man Stole $660M. He'll Never Pay It Back

https://www.nytimes.com/2026/02/18/opinion/corruption-trump-accountability.html
1•jimnotgym•1h ago•0 comments

Show HN: Social Cookie Jar – Social media automation for AI agents

https://github.com/Artifact-Virtual/social-cookie-jar
1•artifactvirtual•1h ago•0 comments

Show HN: Deploy OpenClaw on your Own server in one click

https://agentdaddie.com
1•pushkar_aditya•1h ago•0 comments