I’m building *new.knife.day* (https://new.knife.day), a crowd-sourced database of every cutlery maker—from Al Mar to brands so small they barely show up on Google. That means I need an automated way to fetch each brand’s official website, even for fringe names like “Actilam” or “Aiorosu Knives”.
So I threw the task at eight web-enabled LLMs via OpenRouter:
• gpt-4o and gpt-4o-mini
• claude-sonnet-4
• gemini-2.5-pro and gemini-2.0-flash
• llama-3.1-70b
• qwen-2.5-72b
• perplexity sonar-deep-research
Prompt: Return *only* JSON { brand, official_url, confidence } (see the request sketch after this list)
Data set: 10 obscure knife brands
Scoring: exact domain = correct; “no official site” (with reason) = correct
Costs: OpenRouter prices on 31 May 2025 (Perplexity billed separately)
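For context, here is roughly what one lookup looked like. This is a minimal sketch, not the exact harness: the helper name, prompt wording, and model slug are illustrative, and it assumes a plain requests-based call to OpenRouter's chat-completions endpoint.

```python
# Minimal sketch of one lookup via OpenRouter; helper name, prompt wording, and
# model slug are illustrative, not the exact harness from the post.
import json
import requests

OPENROUTER_KEY = "sk-or-..."  # your OpenRouter API key

PROMPT = (
    "Find the official website for the knife brand '{brand}'. "
    'Return *only* JSON: {{"brand": ..., "official_url": ..., "confidence": ...}}. '
    "If there is no official site, set official_url to null and briefly say why."
)

def lookup_brand(brand: str, model: str = "openai/gpt-4o-mini") -> dict | None:
    # Web search is assumed to be enabled on the model (e.g. an ":online" variant);
    # that part is omitted here.
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {OPENROUTER_KEY}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": PROMPT.format(brand=brand)}],
        },
        timeout=60,
    )
    resp.raise_for_status()
    content = resp.json()["choices"][0]["message"]["content"]

    # Validate JSON on arrival: prose or HTML tables count as a miss.
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if not all(k in data for k in ("brand", "official_url", "confidence")):
        return None
    return data
```

Looping something like lookup_brand over the brand list and spot-checking the misses is essentially the whole pipeline; at roughly 2 ¢ per correct URL, that is where the sub-$5 figure at the end comes from.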

Highlights
----------
• Perplexity hit 10/10 but cost $9.42 (860 k tokens!).
• GPT-4o-mini & Llama-3.1-70B got 9/10 for ~2 ¢ per correct URL.
• Gemini Flash managed 7/10 for $0.001 total—great if you can QA the misses.
• Half of Gemini 2.5 Pro’s replies were HTML tables my parser rejected.
Full table, code, and raw logs are in the post (and on GitHub).

Take-aways
----------
1. 90 % accuracy + quick human review often beats 100 % accuracy that costs 45× more.
2. Structured output is part of model quality—validate JSON on arrival.
3. Promo pricing moves fast; always ping the price API before large runs.
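On point 3, here is what "ping the price API" can look like in practice: a hedged sketch assuming OpenRouter's public models endpoint, which reports per-token USD pricing. The function name, token estimates, and $5 budget are made-up numbers for illustration.

```python
# Pre-flight price check against OpenRouter's models endpoint before a large run.
# Function name, token estimates, and the $5 budget are illustrative assumptions.
import requests

def estimated_run_cost(model: str, prompt_tokens: int,
                       completion_tokens: int, n_calls: int) -> float:
    models = requests.get("https://openrouter.ai/api/v1/models", timeout=30).json()["data"]
    pricing = next(m["pricing"] for m in models if m["id"] == model)
    # Pricing fields are USD per token, returned as strings.
    per_call = (prompt_tokens * float(pricing["prompt"])
                + completion_tokens * float(pricing["completion"]))
    return per_call * n_calls

cost = estimated_run_cost("openai/gpt-4o-mini",
                          prompt_tokens=400, completion_tokens=150, n_calls=250)
if cost > 5.00:  # budget guard: bail out if promo pricing has lapsed
    raise SystemExit(f"Estimated run cost ${cost:.2f} exceeds budget; re-check pricing.")
```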
Next step: wire GPT-4o-mini into *new.knife.day* so visitors get verified manufacturer links. At roughly 2 ¢ per correct URL, crawling ~250 brands now costs under $5.

Curious what you’d improve, and which model you’d bet on for similar “find the canonical URL” tasks. AMA on the setup, prompts, or results!