All Souls exam questions and the limits of machine reasoning

https://resobscura.substack.com/p/all-souls-exam-questions-and-the
30•benbreen•1d ago

Comments

SamBam•2h ago
I think the implication is that to be interesting you need to write from an individual's standpoint. That's why fiction written by LLMs sounds so boring (at least right now): because you can't amalgamate all the text in the world and not sound like an average.

> ‘Oh, do let me go on,’ said Wilde, ‘I want to see how it ends.’

Pretty great line.

wjnc•1h ago
People are average on average. OP is measuring LLM success based on a superhuman test which most of us would likely fail. Creativity is just longer context and opinionated prompting. (For discussion purposes. I’m on 70% true.) Average Joe LLM and me are having a great time.
hydrogen7800•1h ago
Not really on the topic of the FA, but I've heard a few times about the All Souls Exams and seen some sample essay prompts, and I would love to read some real essays written by test takers. Any pointers?
decimalenough•1h ago
They're written in pencil and not returned, so nobody (except All Souls staff) has access to them.
andyjohnson0•1h ago
> The ultimate example may be All Souls College, which has a ritual, the Mallard Song, that occurs once a century.

You can't walk for more than five minutes in the UK without tripping over some nonsense like this. History is very important, and tradition has its place, but really? As a Brit I find it all kind of tediously performative sometimes.

xg15•1h ago
Not a Brit, but Terry Pratchett's ritual of the Other Jacket told me all I need to know.

https://community.pearljam.com/discussion/71416/tradition-go...

andyjohnson0•35m ago
> Here is an example of how mindless adherence to tradition can get a bit weird and very funny

See also: the King's Remembrancer, the Quit Rent Ceremony and the Trial of the Pyx:

https://en.m.wikipedia.org/wiki/King%27s_Remembrancer

It is truly strange how my country can create a political and cultural operating system that allows this stuff to just go on and on for almost 800 years, right up to now.

xg15•30m ago
> The King's Remembrancer swears in a jury of 26 Goldsmiths who then count, weigh and otherwise measure a sample of 88,000 gold coins produced by the Royal Mint.

I mean, you have to admire the stamina for that.

_hark•1h ago
I sat the All Souls exam, taking the philosophy specialist papers, though I'm a math/physics/ML guy. It was a lot of fun, I really appreciate that there's somewhere in the world where these kinds of questions are asked in a formal setting. My questions/answers are written up in brief here [1]

[1] https://www.reddit.com/r/oxforduni/comments/q0giir/my_all_so...

* Oops, they link to my post at the bottom. Sorry for the redundancy.

lordnacho•1h ago
I went to see the last Mallard Song. Just to say I went, of course. It looked like a bunch of weirdos in a courtyard to me, but it was a literally once-in-a-century event, and I was living less than a minute away, so why not?

I don't think I've ever heard of a scheduled ritual that has a longer period. You're guaranteed to never have anyone present at more than one of these, so surely many aspects of the ritual will wander quite far from the original?

As for LLMs on the All Souls test, it's predictable that it mostly whiffs. After all, it takes in a diet of Reddit+Wikipedia+etc, none of which is the kind of writing they are looking for.

Reddit is a lot of crappy comments. If you have no grounding in reality (being a thing that lives in a datacentre), how are you going to curate it? Some subs are really quite good, but most are really quite bad. It's not easy to get guidance, of the kind you would get if you sat with a professor for three or four hours a week for a few years, which is what the humanities students actually do.

Wikipedia is a great reference work, but it tends to not have any of the kinds of connections you're supposed to make in these essays. It has a lot of factual stuff, so questions about Persia will look ok, like in the article. But questions that glue together ideas across areas? Nah. Even if that's in the dataset somewhere, how is the LLM supposed to know that the sort of refined writing of a cross-subject academic is the highest level of the humanities? It doesn't, so it spits out what the average Redditor might glue together from a bit of googling.

dash2•26m ago
OK, interesting hypothesis. So, I wondered how it would do with "Why should cultural historians care about ice cores?" which indeed requires gluing together ideas across areas. I asked ChatGPT 5 on Thinking mode:

https://chatgpt.com/share/689e5361-fad8-8010-b203-f4f80d1457...

It does a pretty good job summarizing an abstruse, but known, subfield of frontier research. (So, perhaps not doing its own "gluing" of areas....) It clearly lacks "depth", in the sense of deep thinking about the why and how of this. (Many cultural historians might have reasons for deep scepticism of invasion by a bunch of quantitative data nerds, I suspect, and might be able to articulate why quite well.) It's bullet points, not an essay. I tried asking it for a 1000 word essay specifically and got:

https://chatgpt.com/share/689e5545-0688-8010-8bdf-632d3c3466...

which seems only superficially different - an essay in form, but secretly a bunch of bullet points.

For a comparison, here's a Guardian article that came up when I googled for "cultural historians ice cores":

https://www.theguardian.com/science/2024/feb/20/solar-storms...

It seems to do a good job at explaining why they should, though not in a deep essayistic style.

autelius•46m ago
Past exams: https://www.asc.ox.ac.uk/past-examination-papers
munchler•17m ago
A few years ago, the Turing Test was universally seen as sufficient for identifying intelligence. Now we’re scouring the planet for obscure tests to make us feel superior again. One can argue that the Turing Test was not actually adequate for this purpose, but we should at least admit how far we have shifted the goalposts as a result.
