The problem: Existing benchmarks use synthetic problems. I wanted to know which LLM is best at MY actual code challenges.
How it works:
• Submit code + describe your task ("refactor this", "find security issues", etc.)
• Six models solve it in parallel: GPT-5, Claude Opus 4.1, Claude Sonnet 4.5, Grok 4, Gemini 2.5 Pro, o3 (see the sketch after this list)
• An AI judge scores each solution on correctness, security, performance, etc.
• You vote on the real winner
• A public leaderboard shows which models actually win on real-world tasks
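
For the curious, the pipeline is essentially fan-out/fan-in. Here's a minimal Python sketch of that shape; `call_model`, `judge`, and the model ID strings are placeholders of mine, not the platform's actual backend:

```python
import asyncio

# Illustrative stand-ins, not the real backend identifiers
MODELS = ["gpt-5", "claude-opus-4.1", "claude-sonnet-4.5",
          "grok-4", "gemini-2.5-pro", "o3"]
RUBRIC = ("correctness", "security", "performance")

async def call_model(model_id: str, prompt: str) -> str:
    """Hypothetical provider call; a real version would hit each vendor's API."""
    await asyncio.sleep(0.1)  # simulated network latency
    return f"[{model_id}] solution for: {prompt[:30]}..."

async def judge(task: str, solution: str) -> dict:
    """Hypothetical AI judge: one extra model call that scores a solution
    on each rubric dimension and returns the parsed numbers."""
    await asyncio.sleep(0.1)
    return {dim: 0.0 for dim in RUBRIC}  # placeholder scores

async def evaluate(code: str, task: str) -> dict:
    prompt = f"Task: {task}\n\nCode:\n{code}"
    # Fan out: all six models get the same prompt concurrently
    solutions = await asyncio.gather(*(call_model(m, prompt) for m in MODELS))
    # Fan in: judge every solution against the rubric
    scores = await asyncio.gather(*(judge(task, s) for s in solutions))
    return dict(zip(MODELS, scores))

if __name__ == "__main__":
    print(asyncio.run(evaluate("def f(x): return x * x", "refactor this")))
```

Running the six models concurrently means an evaluation takes roughly as long as the slowest single model, not the sum of all six.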
10 evaluations are live so far, with a 100% vote completion rate. Early patterns are emerging:
• GPT-5 leads overall with a 40% win rate (4/10 wins)
• Gemini 2.5 Pro dominates security tasks
• GPT-5 is strongest at refactoring
• Claude Sonnet 4.5 leads on optimization tasks
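
The leaderboard math behind numbers like these is just a grouped tally of vote outcomes. A toy sketch with made-up records (not the real vote data):

```python
from collections import Counter, defaultdict

# Made-up (winner, category) vote records for illustration only
votes = [
    ("GPT-5", "refactoring"),
    ("Gemini 2.5 Pro", "security"),
    ("Claude Sonnet 4.5", "optimization"),
    ("GPT-5", "general"),
    ("o3", "general"),
]

overall = Counter(model for model, _ in votes)
per_category = defaultdict(Counter)
for model, category in votes:
    per_category[category][model] += 1

for model, wins in overall.most_common():
    print(f"{model}: {wins}/{len(votes)} wins ({wins / len(votes):.0%})")
for category, tally in per_category.items():
    leader, wins = tally.most_common(1)[0]
    print(f"{category}: {leader} leads with {wins} win(s)")
```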
A queue system keeps costs predictable: a $10/day budget funds 15 free evaluations for the community (minimal sketch below).
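
Roughly how the cap works: each evaluation has a known worst-case cost, and submissions are admitted only while the day's spend stays under budget. A hypothetical sketch (the ~$0.67/eval figure is just $10 / 15; the class and names are mine, not the real code):

```python
import asyncio

DAILY_BUDGET_USD = 10.0
COST_PER_EVAL_USD = DAILY_BUDGET_USD / 15  # ~$0.67 across six models + judge (assumed)

class EvalQueue:
    """Hypothetical admission control: accept jobs only while today's
    spend stays under the daily budget."""

    def __init__(self) -> None:
        self.spent_today = 0.0
        self.pending: asyncio.Queue[str] = asyncio.Queue()

    async def submit(self, job: str) -> bool:
        if self.spent_today + COST_PER_EVAL_USD > DAILY_BUDGET_USD:
            return False  # over budget: ask the user to try again tomorrow
        self.spent_today += COST_PER_EVAL_USD
        await self.pending.put(job)
        return True

    def reset_day(self) -> None:
        self.spent_today = 0.0  # a daily cron/timer would call this

async def main() -> None:
    q = EvalQueue()
    accepted = [await q.submit(f"eval-{i}") for i in range(16)]
    print(sum(accepted), "of 16 accepted")  # 15: the 16th waits for the reset

asyncio.run(main())
```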
Free during beta - would love your feedback!