I previously experimented with Grounding DINO and SAM3. While they are amazing for generic objects, I found they struggle with specific semantic requests (e.g. specific manufacturing parts, game characters, or distinguishing "a worker" from "a worker without a helmet").
I discovered that Gemini 3 Pro is surprisingly underrated for bounding box tasks if you prompt it with detailed visual descriptions. It handles semantic understanding significantly better than standard zero-shot detectors.
url: yoloforge.com
The Workflow:
1. Upload a zip of raw images (stored in Cloudflare R2).
2. Describe your class or classes in plain English.
3. The system generates a .jsonl batch file and sends it to the Gemini Batch API, which lets us process thousands of images in parallel at 50% of the standard cost.
4. You review/correct boxes in the UI and export a YOLO train/val/test dataset.
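To make the batch step concrete, here is a minimal sketch of generating the .jsonl file. The request shape follows the Gemini Batch API's JSONL format as I understand it; the prompt wording, the `image-{i}` keys, and the assumption that images are already uploaded and addressable by URI are all illustrative, not the exact production code:

```python
import json

def build_batch_jsonl(image_uris, class_description):
    """Build one JSONL request line per image for the Gemini Batch API.

    Assumes each `file_uri` points at an image that has already been
    uploaded (e.g. via the Files API); the prompt template is a sketch.
    """
    prompt = (
        "Detect every instance of the following class(es): "
        f"{class_description}. Return a JSON array of objects with "
        '"label" and "box_2d" ([ymin, xmin, ymax, xmax], normalized 0-1000).'
    )
    lines = []
    for i, uri in enumerate(image_uris):
        record = {
            "key": f"image-{i}",  # used to match responses back to images
            "request": {
                "contents": [{
                    "parts": [
                        {"file_data": {"file_uri": uri,
                                       "mime_type": "image/jpeg"}},
                        {"text": prompt},
                    ]
                }]
            },
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)
```

Because each line is an independent request keyed by image, a Celery worker can upload the whole file once and later join results back to images without extra bookkeeping.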
Technical Challenges:
One hard part was getting valid JSON out of the LLM consistently. I ended up writing a parser with regex fallback strategies that salvages valid bounding boxes from malformed responses.
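The idea can be sketched like this (a simplified stand-in for the production parser, assuming the response format is a JSON array of `{"label": ..., "box_2d": [ymin, xmin, ymax, xmax]}` objects and that keys appear in that order):

```python
import json
import re

# Matches one well-formed box object even when the surrounding JSON
# is truncated or otherwise malformed. Assumes "label" precedes "box_2d".
BOX_RE = re.compile(
    r'"label"\s*:\s*"([^"]+)"\s*,\s*'
    r'"box_2d"\s*:\s*\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]'
)

def salvage_boxes(raw: str):
    """Parse model output, falling back to regex extraction on bad JSON."""
    # Strip markdown fences the model sometimes wraps around its JSON.
    cleaned = re.sub(r"^```(?:json)?|```$", "", raw.strip(), flags=re.MULTILINE)
    try:
        data = json.loads(cleaned)
        if isinstance(data, list):
            return [(d["label"], d["box_2d"]) for d in data
                    if "label" in d and "box_2d" in d]
    except (json.JSONDecodeError, TypeError):
        pass
    # Fallback: salvage whatever complete box objects exist in the raw text.
    return [(m.group(1), [int(m.group(i)) for i in range(2, 6)])
            for m in BOX_RE.finditer(raw)]
```

The key property is that a response truncated mid-array still yields every complete box before the cut, instead of failing the whole image.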
The Stack:
- Frontend: Next.js
- Backend: FastAPI, Celery (for async zip processing and polling the Batch API), Redis
- Storage: Supabase (Auth/DB), Cloudflare R2 (image storage)
- Model: Google Gemini 3 Pro via Batch API
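For the export step, the conversion from Gemini-style boxes to YOLO label lines is simple enough to show in full. This sketch assumes boxes arrive as `[ymin, xmin, ymax, xmax]` normalized to 0-1000 (Gemini's documented convention), which conveniently means no image dimensions are needed:

```python
def box_2d_to_yolo(box_2d, class_id):
    """Convert a [ymin, xmin, ymax, xmax] box on a 0-1000 scale to a
    YOLO label line: "<class_id> <x_center> <y_center> <width> <height>",
    all coordinates normalized to 0-1."""
    ymin, xmin, ymax, xmax = (v / 1000.0 for v in box_2d)
    x_center = (xmin + xmax) / 2
    y_center = (ymin + ymax) / 2
    width = xmax - xmin
    height = ymax - ymin
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

One such line per box goes into the per-image .txt file alongside the train/val/test split.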
There is a live demo on the landing page (no sign-up required) where you can upload a single image to test the detection logic, though the tool really shines on datasets with thousands of images and multiple classes.
If you have any technical questions, please ask!