Computer vision is solved if you let the model use tools

https://www.spatial-reasoning.com/share/45dfaeaa-e5a1-4a8c-a8c1-44f9ff5371a4
1•qasimWani•6mo ago

Comments

qasimWani•6mo ago
i previously co-founded a synthetic data company, focused on fine-tuning diffusion models for robotics and manufacturing. the standard approach: generate better data, train smaller models, deploy. recently, reasoning models like o3, grok, and gemini began showing signs of strong spatial awareness. so i tested them on bounding box detection in complex scenes. they failed. badly.

but the reasoning trace showed impressive semantic understanding. the failure wasn’t conceptual. it came from tokenization and decoding limits. the models knew what they were seeing but couldn’t translate it into precise coordinates. (gemini 2.5 performs better because it uses an MoE with task-specific heads).

as such, i built a simple system that gives these models tools:

1. overlay a reference grid (inspired by Set-of-Mark prompting, Microsoft 2023) to ground them visually

2. crop and zoom into regions of interest

3. call external detectors like Grounding DINO when helpful
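A minimal sketch of the bookkeeping behind steps 1 and 2, not the project's actual code: the image is split into numbered cells, the model is asked which cell numbers contain the target, and those cells are turned back into a pixel region to crop. The names `Box`, `grid_cells`, and `cells_to_box`, the reading-order numbering, and the 4x6 grid are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int


def grid_cells(width: int, height: int, rows: int, cols: int) -> dict[int, Box]:
    """Split an image into rows x cols cells, numbered 1..rows*cols in reading order."""
    cells = {}
    for r in range(rows):
        for c in range(cols):
            label = r * cols + c + 1
            cells[label] = Box(
                left=c * width // cols,
                top=r * height // rows,
                right=(c + 1) * width // cols,
                bottom=(r + 1) * height // rows,
            )
    return cells


def cells_to_box(cells: dict[int, Box], labels: list[int]) -> Box:
    """Union of the cells the model named, i.e. the region to crop next."""
    chosen = [cells[label] for label in labels]
    return Box(
        left=min(b.left for b in chosen),
        top=min(b.top for b in chosen),
        right=max(b.right for b in chosen),
        bottom=max(b.bottom for b in chosen),
    )


# Hypothetical 1200x800 screenshot with a 4x6 grid drawn on it.
cells = grid_cells(width=1200, height=800, rows=4, cols=6)
# Suppose the model answers "the logo is in cells 9, 10 and 15".
region = cells_to_box(cells, [9, 10, 15])
print(region)  # Box(left=400, top=200, right=800, bottom=600)
```

The point of the labels is that the model only has to emit a few small integers rather than raw pixel coordinates; the conversion to coordinates happens outside the model.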

with only prompting, this setup enables zero-shot object detection on tasks that traditional vision models fail on. for example, detecting the barely visible YC logo on this person's jacket from a linkedin feed screenshot is only possible once you zoom into the right regions [https://www.spatial-reasoning.com/share/45dfaeaa-e5a1-4a8c-a...]

demo here: [spatial-reasoning.com] open-source code: [https://github.com/QasimWani/spatial-reasoning]

curious to hear thoughts. still exploring edge cases and failure modes. might write a more detailed blog if there’s interest.

qasimWani•6mo ago
another harder example: detecting a street sign on market st in sf that only becomes findable after multiple zoom-ins [https://www.spatial-reasoning.com/share/d7bab348-3389-41c7-9...]
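With several zoom-ins, a box found in the innermost view has to be mapped back to full-image pixels. A rough sketch of that mapping, assuming each zoom step is recorded as a crop origin plus an upsampling factor (the function name `to_original` and this bookkeeping format are hypothetical, not taken from the repo):

```python
def to_original(box, crops):
    """Map a box from the innermost zoomed view back to full-image pixels.

    box:   (left, top, right, bottom) in the pixels of the last zoomed view.
    crops: one (crop_left, crop_top, scale) tuple per zoom step, outermost
           first. The crop origin is in the parent view's pixels and scale
           is the upsampling factor applied to that crop.
    """
    left, top, right, bottom = box
    # Undo the zoom steps from the innermost one outwards.
    for crop_left, crop_top, scale in reversed(crops):
        left = left / scale + crop_left
        top = top / scale + crop_top
        right = right / scale + crop_left
        bottom = bottom / scale + crop_top
    return (left, top, right, bottom)


# Two zoom steps: crop at (400, 200) shown at 2x, then crop at (100, 50)
# inside that view shown at 4x. The detection is made in the final view.
zooms = [(400, 200, 2), (100, 50, 4)]
print(to_original((40, 80, 120, 160), zooms))  # (455.0, 235.0, 465.0, 245.0)
```

In this example a detection that is 80 pixels wide in the final view corresponds to only 10 pixels in the original image, which gives a sense of why the sign is hard to find without zooming.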

one interesting pattern: forcing the model to keep its reasoning chain internal (i.e., no verbose "think step-by-step") actually improves accuracy. it seems to reduce hallucinations and overcorrections. still working on a clearer theory, but shorter chains seem to preserve spatial focus better.

curious how others think tool use like this could generalize.

also open to any references on visual grounding in LMMs. feels like a strangely underexplored space.

sota_pop•6mo ago
I’ve always felt CNNs are much more natural for visual analysis. It’s funny/unfortunate that transformers work SO well that their performance CAN rival CNNs, but it takes so much more work/processing power/model size. CNNs just feel like a more ergonomic fit to the problem (to me), but my experience is rooted in studying DL from when GANs were all the rage and “Attention Is All You Need” was a brand new paper, and admittedly, I need to brush up on my ViT theory.

qasimWani•6mo ago
yeah, having that convolution prior is definitely useful when you're dealing with a limited amount of data, because you're encoding problem structure into the model. that's why CNNs get away with being trained on fewer samples, but with a trade-off in generalization.

but i think this moment is quite different because instead of baking everything into the latent space for these models, you're letting them reason how a human would. if i was asked to detect the street sign, i'd first start by zooming into different regions and iteratively figure out what is relevant. YOLO and other models don't do this well enough because they lack the language component, which is a must-have for complex reasoning like this, for example: https://www.spatial-reasoning.com/share/2d4a8827-b227-4f23-a....

Like 4o can't do this even though it most likely has the same vision encoder as o4. this is the power of reasoning.

sota_pop•6mo ago
Isn’t this (subdividing into regions and analyzing each region within the context of the overall image) - essentially - the methodology of the YOLO algorithm?
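
For reference, the grid step in the original YOLO formulation can be sketched as below. There the image is divided once into an S x S grid (S = 7 in the original paper), and the cell containing an object's center is the one responsible for predicting it, all in a single forward pass at a fixed resolution with no re-cropping or zooming. The helper name `responsible_cell` and the example numbers are illustrative only.

```python
def responsible_cell(box, width, height, s=7):
    """Return (row, col) of the S x S grid cell that contains the box center."""
    left, top, right, bottom = box
    cx = (left + right) / 2
    cy = (top + bottom) / 2
    # Clamp so a center lying exactly on the right or bottom edge stays in range.
    col = min(int(cx * s / width), s - 1)
    row = min(int(cy * s / height), s - 1)
    return row, col


# A 10x10 pixel object in a 1200x800 image falls into a single coarse cell.
print(responsible_cell((455, 235, 465, 245), width=1200, height=800))  # (2, 2)
```

So the grid in YOLO is a fixed assignment scheme used inside one pass, whereas the approach described in the post re-renders selected regions at higher resolution and queries the model again.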