Show HN: Real-Time AI Design Benchmark

https://shuffle.dev/ai-design
2•kemyd•2h ago
Hey HN,

We built a different kind of AI benchmark for UI generation.

Instead of static leaderboards or curated screenshots, you can watch multiple models generate the same design live, side-by-side, and decide which output is actually better.

Under the hood, we call AI models from Anthropic (Opus), OpenAI (GPT), Google (Gemini), and Moonshot AI (Kimi).
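For readers wondering what "same prompt, several models, side-by-side" might look like in code, here is a minimal sketch. It is not our actual implementation: `make_stub` and `generate_all` are hypothetical names, and the stubs stand in for real provider SDK calls so the example runs offline. The idea is only that one prompt fans out concurrently and a failure in one model does not block the others.

```python
import asyncio
from typing import Awaitable, Callable, Dict

ModelCall = Callable[[str], Awaitable[str]]


def make_stub(name: str, delay: float, fail: bool = False) -> ModelCall:
    """Build a fake model call (hypothetical; replace with a real SDK call)."""

    async def call(prompt: str) -> str:
        await asyncio.sleep(delay)  # simulate network / generation latency
        if fail:
            raise RuntimeError(f"{name} failed")
        return f'<main class="p-4" data-model="{name}">{prompt}</main>'

    return call


async def generate_all(prompt: str, models: Dict[str, ModelCall]) -> Dict[str, str]:
    """Send the same prompt to every model concurrently.

    Exceptions are captured per model so one failure does not cancel the rest.
    """
    names = list(models)
    results = await asyncio.gather(
        *(models[name](prompt) for name in names),
        return_exceptions=True,
    )
    return {
        name: result if isinstance(result, str) else f"error: {result}"
        for name, result in zip(names, results)
    }


if __name__ == "__main__":
    stubs = {
        "opus": make_stub("opus", 0.03),
        "gpt": make_stub("gpt", 0.02),
        "gemini": make_stub("gemini", 0.01),
        "kimi": make_stub("kimi", 0.02),
    }
    for model, output in asyncio.run(generate_all("Landing page", stubs)).items():
        print(model, "->", output)
```

Capturing exceptions per model is a design choice for a comparison view: a slow or failing provider shows up as an error in its own column instead of taking down the whole run.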

Each model generates a real, editable project using Tailwind CSS (not screenshots or canvas exports). You can export it for Next.js, Laravel (Blade), Symfony (Twig), WordPress, or plain HTML.

What we noticed building this:

* Popular benchmarks don't reflect UX/UI quality. Which model comes out ahead changes from prompt to prompt (that's why live comparison on a single screen matters).

* Some models overuse wrappers/div soup. Some hallucinate layout constraints.

* Kimi likes Cyrillic, even when none of the other models use it for the same prompt.
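One rough way to put a number on "div soup" is to count `<div>` elements and their deepest nesting level in the generated HTML. The sketch below is an illustrative heuristic only, not how the benchmark scores anything; `DivDepthCounter` and `div_stats` are hypothetical names, and it uses only Python's standard `html.parser`.

```python
from html.parser import HTMLParser
from typing import Tuple


class DivDepthCounter(HTMLParser):
    """Track how many <div> tags appear and how deeply they nest."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0       # current open-<div> nesting level
        self.max_depth = 0   # deepest nesting level seen so far
        self.div_count = 0   # total number of <div> start tags

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.div_count += 1
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        # Guard against stray closing tags in malformed output.
        if tag == "div" and self.depth > 0:
            self.depth -= 1


def div_stats(html: str) -> Tuple[int, int]:
    """Return (total div count, maximum div nesting depth)."""
    parser = DivDepthCounter()
    parser.feed(html)
    parser.close()
    return parser.div_count, parser.max_depth


if __name__ == "__main__":
    soup = "<div><div><div><p>Hello</p></div></div></div>"
    lean = "<main><p>Hello</p></main>"
    print(div_stats(soup), div_stats(lean))
```

A high count or depth is only a hint: some layouts need the wrappers, so a number like this complements visual comparison and does not replace it.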

The interesting part wasn't ranking models. It was making their outputs easier for humans to compare visually.

Short demo: https://www.youtube.com/watch?v=RCTZlvqMQdc

Curious whether this feels more useful than traditional leaderboard-style AI benchmarks.

Happy to answer technical questions.

Example for HN:

Prompt: Redesign the Hacker News website for 2030, including sample entries that could realistically appear on the platform in that year.

Results: https://shuffle.dev/ai-design/Tjjy7XAFMq25AI

Previews:

Opus: https://shuffle.dev/preview/d6d5ba4eeede381cee7e30c697f010c7...

GPT: https://shuffle.dev/preview/f050359977c1d6dc6c8fc104a24b83c3...

Gemini: https://shuffle.dev/preview/eab78f9748a6d8ccecb94a8b0390f044...

Kimi: https://shuffle.dev/preview/394bb596a8efa50342db4dc88c5f9fab...
