How it works:
- Upload code + describe the task (refactoring, security review, architecture, etc.)
- All 6 models run in parallel (~2-5 min)
- See side-by-side comparison with AI judge scores
- Community votes on winners (blind voting)
- Each evaluation feeds into the overall AI model leaderboard, surfacing which models perform best (rough sketch of the pipeline below)
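For anyone curious about the tech side, here's a rough TypeScript sketch of what the parallel fan-out plus AI-judge step could look like. To be clear, `callModel`, `judgeSubmission`, and the model list are hypothetical stand-ins, not the actual CodeLens implementation:

```typescript
// Hypothetical sketch of the evaluation fan-out and judging step.
// callModel() and judgeSubmission() are stand-in stubs, not CodeLens internals.

type ModelId = string;

interface Submission {
  model: ModelId;
  output: string; // the model's proposed refactor / review / design
}

interface JudgedSubmission extends Submission {
  judgeScore: number; // 0-10 score assigned by the AI judge
}

const MODELS: ModelId[] = ["model-a", "model-b", "model-c", "model-d", "model-e", "model-f"];

// Stub: in reality this would call each provider's API.
async function callModel(model: ModelId, code: string, task: string): Promise<Submission> {
  return { model, output: `// ${model}'s answer to "${task}" for ${code.length} bytes of code` };
}

// Stub: in reality a separate judge model scores each output against the task.
async function judgeSubmission(task: string, s: Submission): Promise<number> {
  return Math.round(Math.random() * 10);
}

export async function runEvaluation(code: string, task: string): Promise<JudgedSubmission[]> {
  // All six models run in parallel, so wall time is roughly the slowest model.
  const submissions = await Promise.all(MODELS.map((m) => callModel(m, code, task)));

  // Judge each submission independently before the blind community vote.
  return Promise.all(
    submissions.map(async (s) => ({ ...s, judgeScore: await judgeSubmission(task, s) })),
  );
}
```

Running the calls with Promise.all is why the whole run stays in the ~2-5 min range instead of 6x a single model's latency.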
Why I built this: Existing benchmarks (HumanEval, SWE-Bench) don't reflect real-world developer tasks. I wanted to know which model actually solves MY specific problems - refactoring legacy TypeScript, reviewing React components, etc. The idea is similar to LMArena, but their evaluation process isn't fully transparent.
Current status:
- Live at https://codelens.ai
- 23 evaluations so far (small sample, I know!)
- Free tier processes 3 evals per day (first-come, first-served queue; see the sketch after this list)
- Looking for real tasks to make the benchmark meaningful
- Happy to answer questions about the tech stack, cost structure, or methodology.
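To make the free-tier limit concrete, here's a purely illustrative first-come, first-served queue with a daily cap of 3. It assumes a simple in-memory queue and daily counter; the real setup may differ:

```typescript
// Illustrative only: a FIFO queue with a daily cap, roughly how a
// 3-evals-per-day free tier could be enforced. Not the real code.

const DAILY_FREE_LIMIT = 3;

interface QueuedEval {
  id: string;
  submittedAt: Date;
}

class FreeTierQueue {
  private queue: QueuedEval[] = [];
  private processedToday = 0;

  enqueue(evalRequest: QueuedEval): void {
    this.queue.push(evalRequest); // strictly first-come, first-served
  }

  // Called by a worker; returns the next eval, or null once today's quota is used up.
  next(): QueuedEval | null {
    if (this.processedToday >= DAILY_FREE_LIMIT) return null;
    const job = this.queue.shift() ?? null;
    if (job) this.processedToday += 1;
    return job;
  }

  resetDailyQuota(): void {
    this.processedToday = 0; // e.g. triggered at midnight UTC
  }
}
```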
Currently in the validation stage. What are your first impressions?