frontpage.

Simulacrum of Knowledge Work

https://blog.happyfellow.dev/simulacrum-of-knowledge-work/
53•thehappyfellow•5h ago

Comments

balamatom•1h ago
>We've automated ourselves into Goodhart's law.

Yes.

This does not, however, mean that progress is not being made.

It just means that progress is happening along dimensions that are completely illegible in terms of the culture of the early-21st-century Internet, which is to say, in terms of the values of the society that produced it.

downboots•1h ago
Feels like a parallel with https://en.wikipedia.org/wiki/Constructivism_%28philosophy_o... where "it's not valid until you checked"
balamatom•1h ago
I didn't see the connection initially.
firefoxd•1h ago
Everybody's output is someone else's input. When you generate quantity by using an LLM, the other person uses an LLM to parse it and generate their own output from their input. When the very last consumer of the product complains, no one can figure out which part went wrong.
balamatom•1h ago
Well the last consumer is holding it wrong of course. Why? The last consumer is present, and everyone else is behind 7 proxies.
mrtesthah•1h ago
>"is the RLHF judge happy with the answer."

Reinforcement Learning with Verifiable Rewards (RLVR) to improve math and coding success rates seems like an exception.

rowanG077•1h ago
I don't really agree with the premise of the article. Sure, proxy measures are everywhere. But for knowledge work specifically you can usually check real quality. Of course it's not as easy as "oh, this report contains a few spelling errors", but it is doable. If you accepted work purely based on superficial proxy measures, you were not fairly evaluating the work at all.
zingar•59m ago
I think there’s a weaker claim that holds true: we were able to ignore lots of content based on the superficial (and pay proper attention to work that passed this test) and now we are overwhelmed because everything meets the superficial criteria and we can’t pay proper attention to all of it.
thehappyfellow•55m ago
That's what I had in mind! The whole post is a claim that evaluating knowledge work got more expensive because cheaper measures stopped correlating well with quality.

If someone was already evaluating the work output using a metric closer to the underlying quality then it might not have been a big shift for them (other than having much more work to evaluate).

rowanG077•28m ago
Yes, I agree that this is true!

You could only do that, however, if you were fine with unfairly judging the quality of work, since you would readily discard quality work based on superficial proxies. Which, admittedly, is done in a lot of cases.

bensyverson•56m ago
The article asserts that the quality of human knowledge work was easier to judge based on proxy measures such as typos and errors, and that the lack of such "tells" in AI poses a problem.

I don't know if I agree with either assertion… I've seen plenty of human-generated knowledge work that was factually correct, well-formatted, and extremely low quality on a conceptual level.

And AI signatures are now easy for people to recognize. In fact, these turns of phrase aren't just recognizable—they're unmistakable. <-- See what I did there?

Having worked with corporate clients for 10 years, I don't view the pre-LLM era as a golden age of high-quality knowledge work. There was a lot of junk that I would also classify as a "working simulacrum of knowledge work."

downboots•53m ago
Yes. I think the main warning here is that it is an added risk. A little glitch here and there until something breaks.
bambax•40m ago
It's not that the pre-LLM era was a "golden age of quality", far from it. It's that LLMs have removed yet another telltale of rushed bullshit jobs.
bensyverson•24m ago
Have they though?
mbreese•18m ago
I’m also not sure I agree with the assertion that LLMs will produce a high-quality-looking report with correct time frames, no typos, and good-looking figures. I’m just as willing to disregard human or LLM reports with obvious tells. An LLM or a person can produce work that’s shoddy or error-filled. It may be getting harder to differentiate between a good and a bad report, but that shifts more of the burden onto the evaluator.

This is especially true if we start to see more of a split in usage between LLMs based on cost. High quality frontier models might produce better work at a higher cost, but there is also economic cost pressure from the bottom. And just like with human consultants or employees, you’ll pay more for higher quality work.

I’m not quite sure what I’m trying to argue here. But the idea that an LLM won’t produce a low quality report just seemed silly to me.

zby•51m ago
If you have a test that fails 50% of the time - is that test valuable or not? A 50% failure rate looks like a coin toss, but by itself it does not tell us whether the test is noise or whether it is separating bad states from good ones. For a test to be useful it needs a positive Youden's J statistic (https://en.wikipedia.org/wiki/Youden%27s_J_statistic): sensitivity + specificity - 1. A 50% failure rate alone does not let us calculate sensitivity and specificity.
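
To make the point concrete, here's a minimal sketch (with invented confusion-matrix counts, purely for illustration) of two hypothetical tests that each "fail" 50% of the time yet have very different Youden's J:

```python
# Two hypothetical tests, each flagging 50% of 100 items overall,
# but with completely different diagnostic value. Counts are invented.

def youden_j(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return sensitivity + specificity - 1

# Coin-toss test: flags half of everything, regardless of ground truth.
coin = youden_j(tp=25, fn=25, tn=25, fp=25)       # 0.5 + 0.5 - 1 = 0.0

# Informative test: also flags 50 of 100 items, but mostly the right ones.
informative = youden_j(tp=45, fn=5, tn=45, fp=5)  # 0.9 + 0.9 - 1 = 0.8

print(coin, informative)
```

Same headline "50% failure rate", but J = 0 for the first (pure noise) and J = 0.8 for the second (strongly discriminating) - which is exactly why the failure rate alone can't settle the question.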

I can see a similar problem with this article: the author notices that LLMs produce a lot of errors, then concludes that they are useless and produce only a simulacrum of work. The author has an interesting observation about how LLMs disrupt the way we judge knowledge work. But when he concludes that LLMs do only a simulacrum of work - this is where his argument fails.

card_zero•30m ago
Gee, a thing by a guy, with a name. What are you saying exactly? So the test in question is a test the LLM is asked to carry out, right? Then your point is that if it's a load of vacuous flannel 49% of the time, but meaningful 51% of the time, on average this is genuine work so we can't complain about the 49%?

Wait, you're probably talking about the test of discarding a report based on something superficial like spelling errors. Which fails with LLMs due to their basic conman personalities and smooth talking. And therefore ..?

simianwords•35m ago
The FUD about LLMs will never get old. The way I know and trust LLMs is the same way a manager would trust their reportees to do good work.

For most tasks, the complexity/time required to verify a task is << the time required to do the task itself. Sure there can be hallucinations on the graph that the LLM made. But LLMs are hallucinating much less than before. And the time to verify is much lower than the time required for a human to do the task.

I wrote a post detailing this argument https://simianwords.bearblog.dev/the-generation-vs-verificat...

wxw•32m ago
Ultimately to understand a thing is to do the thing. And to not understand (which is ok!) is to trust others to, proxy measures or not. Agreed that the future of work is in a precarious place: doing less and trusting more only works up to a point.

`simulacrum` is a great word, gotta add that to my vocabulary.

NickNaraghi•24m ago
It's a funny thing to write, like an article in an old newspaper that aged quickly. I suspect that this will be wildly out of date within 2-3 years.
krackers•22m ago
I think it's already out of date with verifiable reward based RL, e.g. on maths domain. When "correctness" arguments fall, the argument will probably just shift to whether it's just "intelligent brute force".

Can you stop beans from making you gassy?

https://www.seriouseats.com/how-to-reduce-bean-gas-tested-11883862
65•jstrieb•2h ago•36 comments

The Free Universal Construction Kit

https://fffff.at/free-universal-construction-kit/
202•robinhouston•3d ago•40 comments

1-Bit Hokusai's "The Great Wave" (2023)

https://www.hypertalking.com/2023/05/08/1-bit-pixel-art-of-hokusais-the-great-wave-off-kanagawa/
485•stephen-hill•3d ago•83 comments

Using coding assistance tools to revive projects you never were going to finish

https://blog.matthewbrunelle.com/its-ok-to-use-coding-assistance-tools-to-revive-the-projects-you...
104•speckx•6h ago•70 comments

New 10 GbE USB adapters are cooler, smaller, cheaper

https://www.jeffgeerling.com/blog/2026/new-10-gbe-usb-adapters-cooler-smaller-cheaper/
510•calcifer•16h ago•302 comments

Mine, an IDE for Coalton and Common Lisp

https://coalton-lang.github.io/mine/
42•varjag•4h ago•4 comments

Simulacrum of Knowledge Work

https://blog.happyfellow.dev/simulacrum-of-knowledge-work/
53•thehappyfellow•5h ago•21 comments

The Joy of Folding Bikes

https://blog.korny.info/2026/04/19/the-joy-of-folding-bikes
14•pavel_lishin•3d ago•1 comment

Desmond Morris has died

https://www.bbc.com/news/articles/c51y797v200o
76•martey•5d ago•13 comments

Martin Galway's music source files from 1980s Commodore 64 games

https://github.com/MartinGalway/C64_music
149•ingve•11h ago•19 comments

How Hard Is It to Open a File?

https://blog.sebastianwick.net/posts/how-hard-is-it-to-open-a-file/
25•ffin•1d ago•2 comments

Discret 11, the French TV encryption of the 80s

https://fabiensanglard.net/discret11/
133•adunk•11h ago•21 comments

GPT‑5.5 Bio Bug Bounty

https://openai.com/index/gpt-5-5-bio-bug-bounty/
113•Murfalo•8h ago•88 comments

Show HN: Kloak, A secret manager that keeps K8s workload away from secrets

https://getkloak.io/
26•neo2006•3h ago•20 comments

What async promised and what it delivered

https://causality.blog/essays/what-async-promised/
114•zdw•3d ago•110 comments

Lute: A Standalone Runtime for Luau

https://lute.luau.org/
37•vrn-sn•2d ago•7 comments

Which one is more important: more parameters or more computation? (2021)

https://parl.ai/projects/params_vs_compute/
39•jxmorris12•1d ago•5 comments

Hokusai and Tessellations

https://dl.ndl.go.jp/pid/1899550/1/11/
80•srean•5h ago•13 comments

America's Geothermal Breakthrough Could Unlock a 150-Gigawatt Energy Revolution

https://oilprice.com/Alternative-Energy/Geothermal-Energy/Americas-Geothermal-Breakthrough-Could-...
35•sleepyguy•2h ago•19 comments

Insights into firewood use by early Middle Pleistocene hominins

https://www.sciencedirect.com/science/article/pii/S0277379126001824
43•wslh•3d ago•18 comments

A web-based RDP client built with Go WebAssembly and grdp

https://github.com/nakagami/grdpwasm
97•mariuz•11h ago•40 comments

Only one side will be the true successor to MS-DOS – Windows 2.x

https://blisscast.wordpress.com/2026/04/21/windows-2-gui-wonderland-12a/
66•keepamovin•11h ago•47 comments

The AI Industry Is Discovering That the Public Hates It

https://newrepublic.com/article/209163/ai-industry-discovering-public-backlash
137•chirau•1h ago•158 comments

Plain text has been around for decades and it’s here to stay

https://unsung.aresluna.org/plain-text-has-been-around-for-decades-and-its-here-to-stay/
255•rbanffy•21h ago•127 comments

North American Millets Alliance (2023)

https://milletsalliance.org/
8•num42•4h ago•2 comments

Lambda Calculus Benchmark for AI

https://victortaelin.github.io/lambench/
121•marvinborner•11h ago•37 comments

Replace IBM Quantum back end with /dev/urandom

https://github.com/yuvadm/quantumslop/blob/25ad2e76ae58baa96f6219742459407db9dd17f5/URANDOM_DEMO.md
319•pigeons•21h ago•44 comments

HEALPix

https://en.wikipedia.org/wiki/HEALPix
47•hyperific•9h ago•6 comments

Commenting and approving pull requests

https://www.jakeworth.com/posts/on-commenting-and-approving-pull-requests/
73•jwworth•2d ago•61 comments

Sabotaging projects by overthinking, scope creep, and structural diffing

https://kevinlynagh.com/newsletter/2026_04_overthinking/
508•alcazar•1d ago•129 comments