Ask HN: Should my business focus on AI training data instead?

3•aluxnder•3h ago
I run a YouTube operation built on high-quality, screen-recorded software tutorials. We’ve produced 75k videos (2–5 min each) in a couple of months using a trained team of 20 operators. The business is profitable, and the production pipeline is consistent, cheap and scalable.

However, I’m considering whether what we’ve built is more valuable as AI agent training/evaluation data. Beyond videos, we can reliably produce:

- Human demonstrations of web tasks

- Event logs (click/type/URL/timing, JSONL) and replay scripts (e.g. Playwright)

- Evaluation runs (pass/fail, action scoring, error taxonomy)

- Preference labels with rationales (RLAIF/RLHF)

- PII-safe/redacted outputs with QA metrics
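
To make the event-log item concrete, here is a minimal sketch of what one JSONL record per user action might look like. The field names (`t_ms`, `action`, `url`, `selector`, `text`) are assumptions for illustration, not an established schema, and the URLs are placeholders.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class Event:
    """One user action in a recorded session (hypothetical schema)."""
    t_ms: int                       # milliseconds since session start
    action: str                     # e.g. "navigate", "click", "type"
    url: str                        # page URL when the event fired
    selector: Optional[str] = None  # CSS selector of the target, if any
    text: Optional[str] = None      # typed text, assumed redacted upstream


def to_jsonl(events: List[Event]) -> str:
    """Serialize events as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in events)


def from_jsonl(blob: str) -> List[Event]:
    """Parse a JSONL string back into Event objects, skipping blank lines."""
    return [Event(**json.loads(line)) for line in blob.splitlines() if line.strip()]


# A three-step demonstration: open a page, click a field, type into it.
demo = [
    Event(0, "navigate", "https://example.com/login"),
    Event(850, "click", "https://example.com/login", selector="#email"),
    Event(1900, "type", "https://example.com/login", selector="#email", text="[REDACTED]"),
]

blob = to_jsonl(demo)
restored = from_jsonl(blob)
```

One object per line keeps sessions easy to stream, append to, and diff; a replay script could walk the same records in `t_ms` order.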

I’m looking for some validation from anyone in the industry:

1. Is large-scale human web-task data (video + structured logs) actually useful for training or benchmarking browser/agent systems?

2. What formats/metadata are most useful (schemas, DOM cues, screenshots, replays, rationales)?

3. Do teams prefer custom task generation on demand or curated non-exclusive corpora?

4. Is there any demand for this? If so, any recommendations on where to start? (I think I have a decent idea about this.)

I'm trying to decide whether to formalise this into a structured data/eval offering. Technical, candid feedback is much appreciated!

Comments

alganet•1h ago
Are you sure you don't want to do SaaS?
