Computer vision is solved if you let the model use tools

https://www.spatial-reasoning.com/share/45dfaeaa-e5a1-4a8c-a8c1-44f9ff5371a4
1•qasimWani•6mo ago

Comments

qasimWani•6mo ago
i previously co-founded a synthetic data company, focused on fine-tuning diffusion models for robotics and manufacturing. the standard approach: generate better data, train smaller models, deploy. recently, reasoning models like o3, grok, and gemini began showing signs of strong spatial awareness. so i tested them on bounding box detection in complex scenes. they failed. badly.

but the reasoning trace showed impressive semantic understanding. the failure wasn’t conceptual. it came from tokenization and decoding limits. the models knew what they were seeing but couldn’t translate it into precise coordinates. (gemini 2.5 performs better because it uses an MoE with task-specific heads).

as such, i built a simple system that gives these models tools:

1. overlay a reference grid (inspired by Set of Marks, Microsoft 2023) to ground them visually

2. crop and zoom into regions of interest

3. call external detectors like Grounding DINO when helpful
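A minimal sketch of the coordinate bookkeeping behind steps 1 and 2, not the code from the linked repository. It assumes a plain rows x cols grid numbered in reading order and a crop that is not resized; `grid_cells` and `crop_to_full` are hypothetical names chosen for illustration. The idea is that the model only has to name a cell or give a box inside a crop, and ordinary code turns that into full-image pixel coordinates.

```python
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def grid_cells(width: int, height: int, rows: int, cols: int) -> Dict[str, Box]:
    """Split an image of the given size into rows x cols labelled cells."""
    cells: Dict[str, Box] = {}
    for r in range(rows):
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            top = r * height // rows
            bottom = (r + 1) * height // rows
            # Label cells in reading order, starting at "1" (an assumption).
            cells[str(r * cols + c + 1)] = (left, top, right, bottom)
    return cells


def crop_to_full(box_in_crop: Box, crop: Box) -> Box:
    """Translate a box given in crop-local pixels back to full-image pixels.

    Assumes the crop was shown at its native resolution (no rescaling).
    """
    ox, oy = crop[0], crop[1]
    l, t, r, b = box_in_crop
    return (l + ox, t + oy, r + ox, b + oy)


cells = grid_cells(1200, 800, rows=2, cols=3)
print(cells["5"])                                   # (400, 400, 800, 800)
print(crop_to_full((10, 20, 50, 60), cells["5"]))   # (410, 420, 450, 460)
```

If the crop is upscaled before being shown to the model, the box would also need to be divided by the zoom factor before the offset is added.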

with only prompting, this setup enables zero-shot object detection on tasks where traditional vision models fail. for example, detecting the barely visible YC logo on this person's jacket in a linkedin feed screenshot is only possible once you zoom into the right regions [https://www.spatial-reasoning.com/share/45dfaeaa-e5a1-4a8c-a...]

demo here: [spatial-reasoning.com] open-source code: [https://github.com/QasimWani/spatial-reasoning]

curious to hear thoughts. still exploring edge cases and failure modes. might write a more detailed blog if there’s interest.

qasimWani•6mo ago
another harder example: detecting a street sign on market st in sf that only becomes findable after multiple zoom-ins [https://www.spatial-reasoning.com/share/d7bab348-3389-41c7-9...]

one interesting pattern: forcing the model to keep its reasoning chain internal (i.e., no verbose "think step-by-step") actually improves accuracy. it seems to reduce hallucinations and overcorrections. still working on a clearer theory, but shorter chains seem to preserve spatial focus better.

curious how others think tool use like this could generalize.

also open to any references on visual grounding in LMMs. feels like a strangely underexplored space.

sota_pop•6mo ago
I’ve always felt CNNs are much more natural for visual analysis. It’s funny/unfortunate that transformers work SO well that their performance CAN rival CNNs, but it takes so much more work/processing power/model size. CNNs just feel like a more ergonomic fit to the problem (to me), but my experience is rooted in studying DL from when GANs were all the rage and “Attention Is All You Need” was a brand new paper, and admittedly, I need to brush up on my ViT theory.

qasimWani•6mo ago
yeah having that convolution prior is definitely useful when you're dealing with a limited amount of data, because you're encoding problem structure into the model. that's why they get away with being trained on fewer samples, but with a trade-off around generalization.

but i think this moment is quite different because instead of baking everything into the latent space for these models, you're letting them reason how a human would - if i was asked to detect the street sign i'd first start by zooming into different regions and iteratively figure out what is relevant. YOLO and other models don't do this well enough because they lack the language component, which is a must-have for complex reasoning like this, for example: https://www.spatial-reasoning.com/share/2d4a8827-b227-4f23-a....
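The "zoom into different regions and iteratively figure out what is relevant" loop can be sketched roughly as below. This is an illustration under stated assumptions, not the project's implementation: the region is split into four quadrants at each step (a simplification), and `choose` is a hypothetical callback standing in for whatever the reasoning model decides. Here it is faked with a known target point so the example runs on its own.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), full-image pixels


def quadrants(box: Box) -> List[Box]:
    """Split a box into four quadrants: top-left, top-right, bottom-left, bottom-right."""
    l, t, r, b = box
    mx, my = (l + r) // 2, (t + b) // 2
    return [(l, t, mx, my), (mx, t, r, my), (l, my, mx, b), (mx, my, r, b)]


def zoom_search(full: Box, choose: Callable[[List[Box]], int],
                max_steps: int = 3, min_size: int = 64) -> Box:
    """Repeatedly narrow the region, letting `choose` pick which quadrant to zoom into."""
    region = full
    for _ in range(max_steps):
        l, t, r, b = region
        if min(r - l, b - t) <= min_size:
            break  # region is already small enough to inspect directly
        options = quadrants(region)
        region = options[choose(options)]
    return region


def contains(target_xy: Tuple[int, int]) -> Callable[[List[Box]], int]:
    """Stand-in for the model (hypothetical): picks the quadrant holding a known point."""
    x, y = target_xy

    def choose(options: List[Box]) -> int:
        for i, (l, t, r, b) in enumerate(options):
            if l <= x < r and t <= y < b:
                return i
        return 0

    return choose


print(zoom_search((0, 0, 1024, 1024), contains((900, 100))))  # (896, 0, 1024, 128)
```

A fixed quadrant split can cut an object in half at a boundary, so a real system would likely let the model propose overlapping or free-form crop regions instead.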

Like 4o can't do this even though it most likely has the same vision encoder as o4. this is the power of reasoning.

sota_pop•6mo ago
Isn’t this (subdividing into regions and analyzing each region within the context of the overall image) - essentially - the methodology of the YOLO algorithm?