Gave Claude Code, Gemini CLI, and Codex CLI identical instructions: analyze 13 years of writing across three blogs (two of them in my regional, non-English language) and create a style guide.
Observations:
1. Model-task matching matters. Codex's default code-specialized model struggled with writing analysis. Switching to GPT-5 improved output quality 4x.
2. Autonomy settings affect completion. Gemini with limited autonomy produced incomplete work—it kept pausing for approvals mid-task.
3. All three claimed "done." Output varied from 198 to 2,555 lines. Never trust completion claims without verification (a quick check is sketched after this list).
4. Deep reading beat clever shortcuts. Codex took an API-first approach (RSS, JSON endpoints). Valid methodology, but it missed nuances that Claude caught by reading posts directly; the second sketch below shows why.
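For point 3, verification can be as simple as measuring each output before believing any "done" claim. A minimal Python sketch; the filenames are hypothetical stand-ins for wherever each agent wrote its guide:

```python
from pathlib import Path

# Hypothetical output paths; substitute wherever each agent saved its style guide.
outputs = ["claude-style-guide.md", "gemini-style-guide.md", "codex-style-guide.md"]

for name in outputs:
    path = Path(name)
    if not path.exists():
        print(f"{name}: missing -- the 'done' claim fails immediately")
        continue
    text = path.read_text(encoding="utf-8")
    lines = text.count("\n") + 1
    words = len(text.split())
    print(f"{name}: {lines} lines, {words} words")
```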
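For point 4, the gap between the two methodologies is visible in code. A rough sketch assuming a standard RSS feed (the URL is a placeholder): a feed's description is usually a truncated excerpt, so an API-first pass sees summaries while direct reading sees the full rendered post.

```python
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://example-blog.com/rss.xml"  # placeholder feed URL

def fetch(url: str) -> str:
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")

root = ET.fromstring(fetch(FEED_URL))
for item in root.iter("item"):
    # API-first view: a <description> that is often a truncated excerpt,
    # stripped of the formatting habits a style guide depends on.
    summary = item.findtext("description", default="")
    link = item.findtext("link", default="")
    # Deep-reading view: the rendered post itself, with everything intact.
    full_html = fetch(link) if link else ""
    print(f"{link}: summary {len(summary)} chars vs full page {len(full_html)} chars")
```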
Claude won at 9.5/10, but the more interesting finding was how much configuration affected the other two agents' scores.
Full analysis and methodology in the linked post.