Anthropic Education the AI Fluency Index

https://www.anthropic.com/research/AI-fluency-index

41•armcat•4h ago

Comments

Kye•3h ago

You could arrive at the essence of this by just having read and internalized Carl Sagan's The Demon-Haunted World. Especially the Baloney Detection Kit.

In my experience good prompting is mostly just good thinking.

esafak•21m ago

And having the experience and judgment to ask the right thing.

bigstrat2003•1h ago

To the extent that this should be a thing, there are very few people I would want doing it less than the company that has repeatedly been caught lying about its product's achievements. Anthropic should not be taken seriously after their track record.

bargainbin•1h ago

I’m not alone in finding this at odds with the product’s claims, right?

Claude is meant to be so clever it can replace all white-collar work in the next n years, but also “you’re not using it right”? Which one is it?

SpicyLemonZest•1h ago

I'm not quite convinced of the maximalist claims, but these two aren't incompatible. Every time we talk about a company being "mismanaged" by, e.g., a private equity buyout, what we mean is that the owners had access to a large volume of high-quality white-collar work but couldn't figure out how to use it right.

dsr_•1h ago

Which one will convince you to buy more Claude? Please answer honestly, it's for the sake of profits.

rsynnott•38m ago

Anthropic in particular seem to be in a weird place where on the one hand they fund some real research, which is often not all roses and sunshine for them, but on the other hand, like all AI companies, they feel the need to make absurdly over-the-top claims about what's coming up Real Soon Now(TM).

sarkarghya•1h ago

Honestly, to use LLMs properly all you need to know is that it’s a next-word (or next-action) prediction model, and like all models, increased entropy hurts it. Try to reduce entropy to get better results. The rest is just sugarcoated nonsense. To use LLMs properly you need a physics class.
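One way to read the "reduce entropy" advice, offered as an illustration rather than anything the commenter spelled out: a model's next-token prediction is a probability distribution, and Shannon entropy measures how spread out that distribution is. The two distributions below are made-up numbers for four hypothetical candidate tokens, not measurements from any real model; the point is only that a peaked distribution (context has narrowed the options) has lower entropy than a flat one.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution whose probabilities sum to 1."""
    # Terms with p == 0 contribute nothing (the limit of p*log p as p -> 0 is 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical next-token distributions over four candidate tokens (illustrative only).
vague_prompt = [0.25, 0.25, 0.25, 0.25]      # little context: model is unsure, flat distribution
specific_prompt = [0.85, 0.05, 0.05, 0.05]   # more context: one option dominates, peaked distribution

print(f"vague:    {shannon_entropy(vague_prompt):.3f} bits")
print(f"specific: {shannon_entropy(specific_prompt):.3f} bits")
```

A uniform distribution over four tokens gives the maximum of 2 bits; anything more concentrated gives less.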

Barbing•1h ago

Which class? Or what subjects?

rishabhaiover•1h ago

And then some alignment, prompting structure, and task decomposition.

dmk•1h ago

So I guess the key takeaway is basically that the better Claude gets at producing polished output, the less users bother questioning it. They found that artifact conversations have lower rates of fact-checking and reasoning challenges across the board. That's kind of an uncomfortable loop for a company selling increasingly capable models.

Florin_Andrei•1h ago

I think we're still at the stage where model performance largely depends on:

- how many data sources it has access to

- the quality of your prompts

So, if prompting quality decreases, so does model performance.

dmk•1h ago

Sure, but the study is saying something slightly different: it's not that people write bad prompts for artifacts. They actually write better ones (more specific, more examples, clearer goals, ...). They just stop evaluating the result. So input quality goes up, but quality control goes down.

candiddevmike•1h ago

What does prompting quality even mean, empirically? I feel like the LLM providers could/should provide prompt scoring as some kind of metric and provide hints to users on ways they can improve (possibly including ways the LLM is specifically trained to act for a given prompt).

dsr_•1h ago

That would be a quality metric, and right now they are focused on quantity metrics.

Terr_•8m ago

> the less users bother questioning it

This makes me think of checklists. We have decades of experience in innumerable fields showing that checklists improve outcomes: Is the chemical mixture at the temperature indicated by the chart? Did you get confirmation from Air Traffic Control? Are you about to amputate the correct limb? Is this really the file you want to permanently erase?

Yet our human brains are usually primed to skip steps, take shortcuts, and see what we expect rather than what's really there. It's surprisingly hard to keep doing the work consistently and to notice deviations.

Now here we are with LLMs, whose output seems to strike exactly where our squishy brains are weakest: our ability to do intentional review in a deep and sustained way.

boplicity•3m ago

> So I guess the key takeaway is basically that the better Claude gets at producing polished output, the less users bother questioning it.

This is exactly what I worry about when I use AI tools to generate code. Even if I check it and it seems to work, it's easy to think, "Oh, I'm done." However, I'll often find obvious logical errors later that make all of the code suspect. Most of the time, though, I don't bother.

I'm starting to group code in my head by code I've thoroughly thought about, and "suspect" code that, while it seems to work, is inherently not trustworthy.

mlpoknbji•1h ago

> But we know that any person who uses AI is likely to improve at what they do.

Do we?

co_king_5•1h ago

I would suggest that any person who uses AI will atrophy their compositional skills unless they specifically take care to preserve those skills.

Insanity•1h ago

Yeah, and this seems to be supported by preliminary evidence on the impact of AI on things like retention and cognitive ability.

rishabhaiover•1h ago

As a student, I constantly worry about this. But everyone in my class is producing output at a pace I can't compete with without AI assistance.

Avshalom•54m ago

What class are you in that "producing output at a [rapid] pace" is relevant to the grade?

rishabhaiover•45m ago

Pick any CS class.

Avshalom•43m ago

I have a minor in CS, and no: producing the assignment by the deadline is important, but grades are not based on quantity of code versus classmates.

rsynnott•39m ago

I mean, maybe things have changed (I finished college about 20 years ago), but I don't remember producing large volumes of stuff as being a particularly important part of a CS degree.

rishabhaiover•13m ago

Between a challenging job market, ever-expanding frontiers of learning (AI, MLOps, parallel hardware), and an average mind like mine, a tool that increases throughput is going to be adopted by the masses, whether you like it or not. Quality is not a concern for most; passing and getting an A is. (Most of my professors actively encourage using LLMs for reports, code generation, and presentations.)

lawn•35m ago

That was never a worry in any of my CS classes.

theappsecguy•8m ago

Copying AI slop isn’t producing output! It’s also not conducive to learning.

dsr_•1h ago

Not until large-N research is done without sponsorship, support, or veiled threats from AI companies.

At which point, if the evidence turns out to be negative, it will be considered invalid because no model less recent than November 2027 is worth using for anything. If the evidence turns out to be slightly positive, it will be hailed as the next educational paradigm shift and AI training will be part of unemployment settlements.

throwaw12•1h ago

Let me add a single data point.

> is likely to improve at what they do

Personally, my skills are not improving.

Professionally, my output has increased.

mobattah•46m ago

My software development skillset has improved. I’m learning and stress testing new patterns that would have taken far longer pre-AI. I’m also working in new domains and tech stacks that would have taken me much longer to get up to speed on.

poszlem•38m ago

I would even say it's likely the opposite. My output as a programmer is now much higher than before, but I am losing my programming skills with each use of Claude Code.

selridge•6m ago

We DEEPLY do not.

That's not, IMO, a "skills go down" position. It's respecting that this is a bigger maybe than anyone in living memory has encountered.

kseniamorph•58m ago

I feel like the authors are being logically inconsistent. They present the drop in "identify missing context" behavior in artifact conversations as potentially concerning, as if people are thinking less critically. But their own data suggests a simpler explanation: artifact conversations show higher rates of upfront specification (clarifying goals +14.7pp, specifying format +14.5pp, providing examples +13.4pp). When you provide more context upfront, there is naturally less missing context to identify later. I'd be more sceptical about such research.
