Ask HN: How do companies like OpenAI, Perplexity fine tune rich output?

8•agaase19•7mo ago

I see fine-tuning as one of the major ways companies like OpenAI, Perplexity, and the makers of Claude differ when it comes to providing higher-quality answers (correct me if I am wrong).

One question I'm curious about is how they fine-tune on rich data (Markdown, HTML output, tables, graphs, etc.) at scale. Currently, fine-tuning involves the laborious process of carefully editing inputs (prompts) and outputs one by one. It becomes more difficult as the data context grows, since one has to carefully examine the input data and provide the right output, including things like formatting, grammar, and UI.

Considering the wide variety of questions they process, it amazes me how they are doing it at scale. Any thoughts?

Comments

pizza•7mo ago

Anything with a linter means, at minimum, free verifiable rewards for RL (though whether something parses versus looks good is another story). On top of that, they have more data than anyone, and it seems somewhat reasonable that stronger models could learn 'more' from a given instance or set of examples.

agaase19•7mo ago

Can you elaborate on "a linter means free verifiable rewards for RL"? Is this something others would find extremely difficult to do?

holden_nelson•7mo ago

They're saying that linters can check the output of a model being trained with reinforcement learning, so the model can be rewarded automatically when its output is correct.
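To make that concrete, here is a rough sketch of what a "linter as reward" could look like. It is only an illustration, not how any of these companies actually do it: the function names (`json_reward`, `html_reward`) and the 0.0/1.0 scoring are assumptions made up for this example, and it uses only the Python standard library. It checks whether output parses, which, as noted above, says nothing about whether it looks good.

```python
import json
from html.parser import HTMLParser

# HTML elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _BalanceChecker(HTMLParser):
    """Tracks open tags and flags any mismatched closing tag."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.ok = True

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # A closing tag must match the most recently opened tag.
        if not self.stack or self.stack.pop() != tag:
            self.ok = False


def html_reward(text: str) -> float:
    """Hypothetical reward: 1.0 if every tag is properly closed, else 0.0."""
    checker = _BalanceChecker()
    checker.feed(text)
    checker.close()
    return 1.0 if checker.ok and not checker.stack else 0.0


def json_reward(text: str) -> float:
    """Hypothetical reward: 1.0 if the text parses as JSON, else 0.0."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return 0.0
    return 1.0


if __name__ == "__main__":
    print(html_reward("<div><p>hi</p></div>"))  # balanced tags
    print(html_reward("<div><p>hi</div>"))      # <p> never closed
    print(json_reward('{"a": 1}'))              # valid JSON
    print(json_reward('{"a": 1'))               # missing closing brace
```

The point is that a check like this needs no human labeler, so it can score as many model outputs as you can generate. The hard part is everything a parser cannot see, such as whether a table is actually the right way to present the answer.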
