> It isn’t “wrong.” Wolfram defines Binomial[n,m] at negative integers by a symmetric limiting rule that enforces Binomial[n,m] = Binomial[n,n−m]. With n = −1, m = −1 this forces Binomial[−1,−1] = Binomial[−1,0] = 1. The gamma-formula has poles at nonpositive integers, so values there depend on which limit you adopt. Wolfram chooses the symmetry-preserving limit; it breaks Pascal’s identity at a few points but keeps symmetry. If you want the convention that preserves Pascal’s rule and makes all cases with both arguments negative zero, use PascalBinomial[−1,−1] = 0. Wolfram added this explicitly to support that alternative definition.
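To make the difference concrete, here is a minimal sketch in Python (not Wolfram's actual implementation; the function names are mine) contrasting the two conventions at negative integer arguments:

```python
# A sketch of the two conventions for binomial(n, m) at integer arguments.
# Illustrative only -- not how Wolfram actually computes Binomial.

def falling_binomial(n: int, m: int) -> int:
    """C(n, m) = n(n-1)...(n-m+1) / m! for any integer n and m >= 0."""
    result = 1
    for k in range(m):
        result = result * (n - k) // (k + 1)  # division is always exact here
    return result

def binomial_symmetric(n: int, m: int) -> int:
    """Wolfram-style: choose the limit that preserves C(n, m) == C(n, n-m)."""
    if m >= 0:
        return falling_binomial(n, m)
    if n < 0 and m <= n:
        # Symmetry moves the pole: C(n, m) -> C(n, n-m), with n - m >= 0 here.
        return falling_binomial(n, n - m)
    return 0

def pascal_binomial(n: int, m: int) -> int:
    """PascalBinomial-style: zero for m < 0, preserving Pascal's identity."""
    return falling_binomial(n, m) if m >= 0 else 0

print(binomial_symmetric(-1, -1))  # 1, matching Binomial[-1, -1]
print(pascal_binomial(-1, -1))     # 0, matching PascalBinomial[-1, -1]
```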
Of course this particular question might have been in the training set.
Honestly, 2.5 years feels like infinity when it comes to AI development. I'm using ChatGPT very regularly, and while it's far from perfect, it has recently given obviously wrong answers only rarely. I can't say anything about ChatGPT 5; I feel like in my conversations with AI I've reached my limit, so I'd hardly notice the AI getting smarter, because it's already smart enough for my questions.
1) 100% correct
2) Really useful (i.e., it includes various things I didn't ask for but that are really great, like a little manipulator to walk through the function at various points and visualize what the mapping is doing)
3) Built in a general way so I can easily change the mapping to explore different types of functions and how they work.
It seems very clear (both from what they said in the launch demos, etc., and from my experience of trying it out) that performance on coding tasks has been an area of massive focus, and the results speak for themselves.
I think of that every time people talk about trusting generated code. Or the obfuscated code competition. It’s going to get you into the dumbest trouble some day.
It suggested two new attributes, claimed to have added them when it hadn't, and once they were actually added, it never used them.
If we keep retraining them on the currently available datasets, then the questions that stumped ChatGPT 3 are in the training set for ChatGPT 5.
I don’t have the background to understand the functional changes between ChatGPT 3 and 5. It can’t be just the training data, can it?
> gave *obviously wrong* answers very rarely.
I don't think this is a reason I'd trust it; actually, this is a reason I don't trust it. There's a big difference between "obviously wrong" and "wrong". It is not objective but depends entirely on the reader/user.
The problem is that it optimizes for deception alongside accuracy. It's a useful tool, but good design says we should want to make errors loud and apparent. That's because we want tools to complement us, to make us better. But if errors are subtle, nuanced, or just difficult to notice, then there is actually a lot of danger to the tool (true for any tool).
I'm reminded of the Murray Gell-Mann Amnesia effect: you read something in the newspaper about a topic you're an expert in and lambast it for its inaccuracies, but then turn the page to a topic you don't have domain knowledge of and trust it.
The reason I bring up MGA is because we don't often ask GPT things we know about or have deep knowledge in. But this is a good way to learn about how much we should trust it. Pretend to know nothing about a topic you are an expert in. Are its answers good enough? If not, then be careful when asking questions you can't verify.
Or, I guess... just ask it to solve "5.9 = x + 5.11"
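(The arithmetic itself is trivial; a quick sanity check below. The notorious failure mode is models answering -0.21 after ordering the decimals like version numbers, where 5.11 > 5.9.)

```python
# The arithmetic LLMs often fumble: solve 5.9 = x + 5.11 for x.
x = 5.9 - 5.11
print(round(x, 2))  # 0.79 -- not -0.21
```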
That’s not to say fire all your brilliant devs and hire mediocrity, but the reverse case is often made by loudmouths trying to fluff their own egos. Getting rid of the average devs is ignoring the vocational aspects of the job.
Are you concerned it may be giving you subtly wrong answers that you're not noticing? If you have to double-check everything, is it really saving time?
The problem is that doing this enough will make you forget how to come up with proofs in the first place.
Even if GPT-3.5 was noticeably worse for any of these questions, it's honestly more interesting for someone's first experience to be with the exaggerated shortcomings of AI. The slightly-screwy answers are still emblematic of what you see today, so it all ended well enough, I think. It would've been a terribly boring exchange if Knuth's reply had just been "looks great, thanks for asking ChatGPT" with no challenging commentary.
(1) it has been a year or so since the article last had significant attention, and
(2) the post is genuinely interesting.
(the latter condition ought to apply to any HN submission of course)
We are presented with a first reaction to ChatGPT; we must never forget how incredible this technology is and not become accustomed to it.
Donald Knuth approached several of the questions from a position of feigned ignorance, asking questions as basic as "12. Write a sentence that contains only 5-letter words.", and was amazed not only by correct answers but also by incorrect answers that still parsed the question effectively and with apparent semantic understanding.
First it spent three minutes getting fucked by cookie banners, then it DDoSed Wikipedia by guessing article names, then it started searching for stock photo sites offering an API, then it hallucinated a Python script to search for stock photography vaguely related to what I wanted. That failed as well, so it called its image generator and finally served me some made-up AI slop.
Ten minutes, kilowatts of GPU power, and jack shit in return. So not even the shiny new tools are up to the task.
The internet was made to happen. And that is what happened.
I would say I have seen three completely different internets. And I started keeping track in the late '90s, after the dotcom boom made it truly global and everywhere.
The discussion at the end also reminded me of how a lot of us took Gary Marcus' prose more seriously at the time before many of his short-term predictions started failing spectacularly.
https://chatgpt.com/share/6897a21b-25c0-8011-a10a-85850870da...
Pretty interesting: some contamination, some better answers, and it failed to write a sentence with all 5-letter words. I'd have expected it to pass this one!
Simple example: “Every night, dreams swirl swiftly.”
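A quick way to check the constraint is just counting letters per word; "dreams" (6) and "swiftly" (7) break it:

```python
# Check whether every word in the sentence has exactly 5 letters.
import re

sentence = "Every night, dreams swirl swiftly."
lengths = {w: len(w) for w in re.findall(r"[A-Za-z]+", sentence)}
print(lengths)  # {'Every': 5, 'night': 5, 'dreams': 6, 'swirl': 5, 'swiftly': 7}
```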
For example, something like "running" might get tokenized as "runn" + "ing", being only two tokens for ChatGPT.
It'll learn to infer some of these things over the course of training, but only to a limited extent.
Same reason it's not great at math.
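You can see the subword splits directly with the tiktoken library (the "runn" + "ing" split above is illustrative; the actual pieces depend on the model's vocabulary):

```python
# Inspect how a BPE tokenizer splits words into subword pieces.
# Requires `pip install tiktoken`; exact splits vary by vocabulary.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for word in ["running", "swiftly", "tokenization"]:
    ids = enc.encode(word)
    print(word, "->", [enc.decode([i]) for i in ids])
```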
Anyone have an idea how this happened? It was supposed to be a sentence of only 5-letter words.