Show HN: Which LLM Finds Obscure Knife-Brand URLs Cheapest? (8-Model Benchmark)

https://new.knife.day/blog/using-llms-for-knife-brand-research
2•p-s-v•1d ago
Hi HN,

I’m building *new.knife.day* (https://new.knife.day), a crowd-sourced database of every cutlery maker—from Al Mar to brands so small they barely show up on Google. That means I need an automated way to fetch each brand’s official website, even for fringe names like “Actilam” or “Aiorosu Knives”.

So I threw the task at eight web-enabled LLMs via OpenRouter:

  • gpt-4o and gpt-4o-mini
  • claude-sonnet-4
  • gemini-2.5-pro and gemini-2.0-flash
  • llama-3.1-70b
  • qwen-2.5-72b
  • perplexity sonar-deep-research

Prompt: Return *only* JSON { brand, official_url, confidence }
Data set: 10 obscure knife brands
Scoring: exact domain = correct; “no official site” (with reason) = correct
Costs: OpenRouter prices on 31 May 2025 (Perplexity billed separately)
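
For anyone who wants to reproduce the scoring rule, here is a minimal sketch of how an "exact domain" check could look. It is not the code from the post: the function names, the stripping of a leading "www.", and the use of None to mean "no official site" are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def registered_host(url: str) -> str:
    """Lower-cased host of a URL, without a leading 'www.' (an assumed normalization)."""
    # urlparse only fills in the host when a scheme is present, so add one if missing.
    if "://" not in url:
        url = "https://" + url
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_correct(predicted_url, expected_url) -> bool:
    """Score one answer: same host, or both sides agree there is no official site.

    None stands for "no official site" here; that encoding is an assumption.
    """
    if predicted_url is None or expected_url is None:
        return predicted_url is None and expected_url is None
    return registered_host(predicted_url) == registered_host(expected_url)
```

With this rule, a reply pointing at a sub-page of the right site still counts, while a reseller or marketplace listing on another domain does not.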

Highlights
----------

  • Perplexity hit 10/10 but cost $9.42 (860 k tokens!).
  • GPT-4o-mini & Llama-3.1-70B got 9/10 for ~2 ¢ per correct URL.
  • Gemini Flash managed 7/10 for $0.001 total—great if you can QA the misses.
  • Half of Gemini 2.5 Pro’s replies were HTML tables my parser rejected.
Full table, code, and raw logs are in the post (and on GitHub).
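
The comparison behind these highlights is just total spend divided by correct answers. A small sketch, using only the two totals quoted above (the function name is hypothetical, and the other models' totals are left out because they are not stated here):

```python
def cost_per_correct(total_cost_usd: float, correct: int) -> float:
    """Total spend divided by the number of correct answers."""
    if correct <= 0:
        raise ValueError("no correct answers, so cost per correct URL is undefined")
    return total_cost_usd / correct


# (total cost in USD, correct answers out of 10), as quoted in the highlights.
runs = {
    "Perplexity": (9.42, 10),
    "Gemini Flash": (0.001, 7),
}

for model, (total, correct) in runs.items():
    print(f"{model}: ${cost_per_correct(total, correct):.5f} per correct URL")
```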

Take-aways
----------

  1. 90 % accuracy + quick human review often beats 100 % accuracy that costs
     45× more.
  2. Structured output is part of model quality—validate JSON on arrival.
  3. Promo pricing moves fast; always ping the price API before large runs.
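
As a rough illustration of take-away 2, this is one way to validate a reply on arrival. It is a sketch, not the parser from the post: the function name is hypothetical, and treating a null official_url as "no official site" is an assumption.

```python
import json

# Keys requested in the prompt.
REQUIRED_KEYS = {"brand", "official_url", "confidence"}


def parse_reply(raw: str) -> dict:
    """Return the parsed reply, or raise ValueError if it is not the expected JSON object."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply is JSON but not an object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    url = data["official_url"]
    # Assumption: null means "no official site"; anything else must be a string.
    if url is not None and not isinstance(url, str):
        raise ValueError("official_url must be a string or null")
    return data
```

A reply that arrives as an HTML table fails at the first step and can be counted as a miss or retried, instead of crashing the run.
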
Next step: wire GPT-4o-mini into *new.knife.day* so visitors get verified manufacturer links. Crawling ~250 brands now costs under $5.
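
A sketch of the "ping the price API first" step before a run of this size. It assumes OpenRouter's public model list at /api/v1/models returns a "data" array whose entries carry an "id" and a "pricing" object with per-token "prompt" and "completion" prices as strings; check the current API docs before relying on that shape. The helper names and the token counts in the usage lines are hypothetical, and the estimate ignores any web-search surcharges.

```python
import json
from urllib.request import urlopen

MODELS_URL = "https://openrouter.ai/api/v1/models"  # assumed endpoint, see note above


def fetch_models_payload() -> dict:
    """Download the current model list (network call, only runs when invoked)."""
    with urlopen(MODELS_URL, timeout=30) as resp:
        return json.load(resp)


def price_per_token(models_payload: dict, model_id: str) -> tuple[float, float]:
    """Return (prompt, completion) USD-per-token prices for one model id."""
    for model in models_payload["data"]:
        if model["id"] == model_id:
            pricing = model["pricing"]
            return float(pricing["prompt"]), float(pricing["completion"])
    raise KeyError(model_id)


def estimate_run_cost(prices, n_brands, prompt_tokens, completion_tokens) -> float:
    """Estimated USD cost of one request per brand at the given token counts."""
    prompt_price, completion_price = prices
    per_request = prompt_tokens * prompt_price + completion_tokens * completion_price
    return n_brands * per_request
```

Usage would look like `prices = price_per_token(fetch_models_payload(), model_id)` followed by `estimate_run_cost(prices, 250, prompt_tokens, completion_tokens)`, with the token counts taken from a few trial requests.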

Curious what you’d improve, and which model you’d bet on for similar “find the canonical URL” tasks. AMA on the setup, prompts, or results!
