frontpage.

VibeCodingBench: Benchmark Vibe Coding Models for Fun

https://twitter.com/yq_acc/status/2016201908181205358
1•jiayaoqijia•2h ago

Comments

jiayaoqijia•2h ago

  VibeCodingBench: We benchmarked 15 AI coding models on what developers actually do                                                                      
                                                                                                                                                          
  Current benchmarks have an ecological validity crisis. Models score 70%+ on SWE-bench but struggle in production. Why? They optimize for bug fixes in   
  Python repos—not the auth flows, API integrations, and CRUD dashboards that occupy 80% of real dev work.                                                
                                                                                                                                                          
  So we built VibeCodingBench: 180 tasks across SaaS features, glue code, AI integration, frontend, API integrations, and code evolution.                 
  Multi-dimensional scoring: Functional (40%) + Visual (20%) + Quality (20%) - Cost/Speed penalties. Security gate: Any OWASP Top 10 vuln = automatic 0.  
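
The scoring rule described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual implementation: the `TaskResult` fields, the single combined `penalty` value, and the clamp at zero are all assumptions. The three quoted weights sum to 80%, and the post does not say how the rest is allocated, so the sketch uses only the stated weights.

```python
from dataclasses import dataclass

# Weights as quoted in the post. They sum to 0.80; the post does not say
# where the remaining 0.20 goes, so it is deliberately left out here.
WEIGHTS = {"functional": 0.40, "visual": 0.20, "quality": 0.20}


@dataclass
class TaskResult:
    """Hypothetical per-task result; field names are illustrative only."""
    functional: float             # 0.0 to 1.0
    visual: float                 # 0.0 to 1.0
    quality: float                # 0.0 to 1.0
    penalty: float = 0.0          # assumed combined cost/speed penalty, in score points
    has_owasp_vuln: bool = False  # True if any OWASP Top 10 issue was found


def composite_score(result: TaskResult) -> float:
    """Weighted sum minus penalty, with the security gate applied first."""
    if result.has_owasp_vuln:
        return 0.0  # security gate: any OWASP Top 10 vuln is an automatic 0
    weighted = sum(w * getattr(result, name) for name, w in WEIGHTS.items())
    return max(0.0, weighted - result.penalty)  # clamping at 0 is an assumption


if __name__ == "__main__":
    clean = TaskResult(functional=0.9, visual=0.5, quality=0.8, penalty=0.05)
    vulnerable = TaskResult(functional=1.0, visual=1.0, quality=1.0, has_owasp_vuln=True)
    print(composite_score(clean))
    print(composite_score(vulnerable))
```

Checking the gate before any arithmetic means a vulnerable submission cannot be rescued by strong functional or visual scores, which matches the "automatic 0" wording.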
                                                                                                                                                          
  Top 5 Results (Jan 2026; score | cost | time):

  1. Claude Opus 4.5 | 89.2% | $12.31 | 44s
  2. Claude Haiku 4.5 | 89.0% | $3.03 | 22s
  3. Grok 4 Fast | 88.8% | $0.21 | 70s
  4. OpenAI GPT-5.2 | 88.8% | $5.01 | 28s
  5. Qwen3 Max | 88.6% | $5.42 | 45s
                                                                                                                                                          
  The real story? Cost varies almost 60x between similar performers ($0.21 to $12.31 in the top five). Grok 4 Fast matches GPT-5.2 at about 1/24th
  the cost. Claude Haiku 4.5 delivers near-Opus quality for $3 total.
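
The cost comparisons above can be checked against the top-five figures with a few lines of arithmetic. This is only a sketch over the five rows quoted in the post. The post does not say whether the headline spread is measured across the top five or across all 15 models, so the numbers below cover the top five only, and the `TOP5` table and `cost_ratio` helper are names made up for this example.

```python
# (score %, total cost in USD, time in seconds), copied from the top-5 list.
TOP5 = {
    "Claude Opus 4.5": (89.2, 12.31, 44),
    "Claude Haiku 4.5": (89.0, 3.03, 22),
    "Grok 4 Fast": (88.8, 0.21, 70),
    "OpenAI GPT-5.2": (88.8, 5.01, 28),
    "Qwen3 Max": (88.6, 5.42, 45),
}


def cost_ratio(expensive: str, cheap: str) -> float:
    """How many times more the first model costs than the second."""
    return TOP5[expensive][1] / TOP5[cheap][1]


costs = [cost for _, cost, _ in TOP5.values()]
spread = max(costs) / min(costs)
score_gap = TOP5["Claude Opus 4.5"][0] - TOP5["Claude Haiku 4.5"][0]

print(f"Top-5 cost spread: {spread:.1f}x")                                        # 58.6x
print(f"GPT-5.2 vs Grok 4 Fast: {cost_ratio('OpenAI GPT-5.2', 'Grok 4 Fast'):.1f}x")  # 23.9x
print(f"Opus vs Haiku: {cost_ratio('Claude Opus 4.5', 'Claude Haiku 4.5'):.1f}x "
      f"for {score_gap:.1f} points")                                             # 4.1x for 0.2 points
```

Within the top five, the spread is about 58.6x and the GPT-5.2 to Grok 4 Fast ratio is about 23.9x, both close to the rounded figures in the post.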
                                                                                                                                                          
  Live dashboard: https://vibecoding.llmbench.xyz/
  GitHub repo: https://github.com/alt-research/vibe-coding-benchmark-public
  Thesis: https://github.com/alt-research/vibe-coding-benchmark-public/blob/main/docs/THESIS.md
                                                                                                                                                          
  The ultimate test isn't fixing a bug in scikit-learn. It's shipping a feature your users need—safely, efficiently—before the sprint ends.               
                                                                                                                                                          
  Open source. Contributions welcome.

How to Nail Big Tech Behavioral Interviews as a Senior Software Engineer

https://newsletter.eng-leadership.com/p/how-to-nail-big-tech-behavioral-interviews
1•rbanffy•1m ago•0 comments

Zuckerberg blocked curbs on sex-talking chatbots for minors, court filing alleges

https://www.reuters.com/legal/government/meta-ceo-zuckerberg-blocked-curbs-sex-talking-chatbots-m...
3•jethronethro•3m ago•0 comments

The evolution of my todo list system over 5 years

https://www.njbrown.com/blog/77/
3•ntnbr•3m ago•0 comments

How Many Chess Games Are Possible?

https://win-vector.com/2026/01/27/how-many-chess-games-are-possible/
1•jmount•5m ago•0 comments

US consumer confidence plunges to 12-year low

https://www.msn.com/en-us/money/markets/consumer-confidence-plunges-to-12-year-low/ar-AA1V6kow
3•akyuu•5m ago•0 comments

Chuck Klosterman on why we've never actually seen a real football game

https://www.latimes.com/entertainment-arts/books/story/2026-01-22/chuck-klosterman-new-book-football
3•proposal•7m ago•0 comments

Convolutional Neural Network Visualizations

https://github.com/utkuozbulak/pytorch-cnn-visualizations
1•auraham•7m ago•0 comments

Gov. Abbott orders Texas universities, agencies to halt H-1B visa petitions

https://www.texastribune.org/2026/01/26/texas-greg-abbott-h1b-visa-schools-universities/
1•malshe•8m ago•0 comments

The Census Bureau was undercounting business AI adoption

https://econlab.substack.com/p/the-census-bureau-was-undercounting
1•gmays•8m ago•0 comments

EU now has its own 'secure and encrypted' satellite communication system

https://www.euronews.com/my-europe/2026/01/27/eu-now-has-its-own-secure-and-encrypted-satellite-c...
3•akyuu•9m ago•0 comments

Betting on War: Prediction Markets and the Corruption of National Security

https://warontherocks.com/2026/01/betting-on-war-prediction-markets-and-the-corruption-of-nationa...
1•coloneltcb•9m ago•0 comments

African nations now send more money to China than they receive in new loans

https://www.reuters.com/business/finance/african-nations-now-send-more-money-china-than-they-rece...
1•DustinEchoes•10m ago•0 comments

You Can't Handle the Buddhabrot

https://lcamtuf.substack.com/p/you-cant-handle-the-buddhabrot
1•weinzierl•11m ago•0 comments

Codeless: From Idea to Software

https://www.anildash.com/2026/01/22/codeless/
3•janpio•11m ago•0 comments

Words with Spaces

https://www.linguabase.org/words-with-spaces.html
2•michaeld123•11m ago•1 comment

Show HN: 50+ open source AI-built SaaS apps

1•bhackett•12m ago•0 comments

The Rubin Observatory Will Rapidly Detect More Supernovae

https://www.universetoday.com/articles/the-rubin-observatory-will-rapidly-detect-more-supernovae
1•rbanffy•12m ago•0 comments

Show HN: Sciro – SDK to detect learner confusion without cameras or mics

https://www.sciro.site/
1•absmugz•12m ago•0 comments

CSS in 2026: The new features reshaping front end development

https://blog.logrocket.com/css-in-2026/
2•ulrischa•13m ago•0 comments

Show HN: pcpb – preview effects of `curl | bash` scripts

https://github.com/federicotdn/pcpb
1•federicotdn•13m ago•0 comments

Alternate Lego builds generated from real set inventories

https://lego-builder-generator.streamlit.app/
1•Vincentsjo•14m ago•0 comments

Founding Engineer at Halfpricesoft – Fintech Solutions

https://www.halfpricesoft.com/career/founding-engineer/
1•mark_ge•14m ago•0 comments

The Home Computer Hybrids: Atari, TI, and the FCC – Creatures of Thought

https://technicshistory.com/2026/01/25/the-home-computer-hybrids/
2•rbanffy•15m ago•0 comments

Show HN: I build production web video and image editors (WebGL, Fabric.js, Next)

https://pablituuu.space/video-editor
1•pablituuu•15m ago•1 comment

The concept of a distance over Riemannian Manifolds

http://science-memo.blogspot.com/2024/04/metric-tensor-basic.html
1•northlondoner•16m ago•1 comment

Show HN: I built an easy to use P2P music streaming site

https://sagasudo.com/
2•soderpop•17m ago•0 comments

Show HN: P.ai.os – A local, modular AI "operating" system for macOS (M4/MLX)

1•vag-mac-mini•18m ago•0 comments

The bachelor tax – what it costs to be single (to the IRS)

https://bachelor-tax.vercel.app/
2•wkaisertexas•20m ago•3 comments

Fungal Degradation of Microplastics: An Environmental Need

https://www.mdpi.com/2305-6304/14/1/70
1•PaulHoule•20m ago•0 comments

Netflix Animation Studios Joins the Blender Development Fund as Corporate Patron

https://www.blender.org/press/netflix-animation-studios-joins-the-blender-development-fund-as-cor...
3•ChrisArchitect•21m ago•0 comments