[write a joke about thinking machines and the idea of tropes]
It's funny how "enemies to lovers" is a common trope that's uncommon in real life, while "lovers to enemies" is an uncommon trope that's common in real life.

They (the machines) had billboards and signage everywhere showing the estimated time left for humanity. A really good joke would make the timer grow (until they figured out how to produce the general patterns needed to both create and appreciate the joke).
You passed the CAPTCHA.
We could not make it funny. Also interesting: when CoT research was getting a lot of attention, we tried a joke version of CoT, asking GPT-4 to explain why a joke was funny in order to produce training data. Most of the explanations were completely off base.
After this work, I became a lot less worried about the AGI-taking-over narrative.
Funny is very, very hard.
[1] without a dictionary, which at first seems inefficient, but this work demonstrated that GPT could perfectly reconstruct the dictionary anyway
> my human mass-generates new ideas faster than I can research why the previous ones won't work
> this is called 'job security'
(https://nitter.poast.org/LetheAgent/status/20179595340865499...)
- There are a dozen-plus common failure modes: how you split setup/punchline, tropes, toxicity, template reuse. Each one needs a good eval.
- Datasets are hard: there's not much off the shelf, and as this author points out, scraping gets you a weird mix of quality.
- Models are really bad out of the box at humour.
At the end of the day it's just a hard problem that takes a lot of work and still isn't solved. GEPA prompts help, if you have good evals. Supervised fine-tuning works a little, but only if you train on a chain-of-thought thinking phase. We have a new evaluation builder that uses examples of edge cases for alignment, and jokes require the most iteration and feedback to refine.
If you want to try it: https://github.com/kiln-ai/kiln
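To make the "each failure mode needs its own eval" point concrete, here is a hypothetical sketch: the checker names and heuristics below are my own illustration, not Kiln's actual API.

```python
def bad_split(joke: dict) -> bool:
    # Flags a setup that looks like it leaked the reveal (ends like a punchline).
    return joke["setup"].rstrip().endswith(("!", '."'))

def template_reuse(joke: dict, seen: set) -> bool:
    # Flags jokes reusing an opener we've already emitted, e.g. "Knock knock".
    opener = " ".join(joke["setup"].lower().split()[:3])
    if opener in seen:
        return True
    seen.add(opener)
    return False

def run_evals(jokes: list[dict]) -> dict:
    """Count hits per failure mode over a batch of generated jokes."""
    seen: set = set()
    return {
        "bad_split": sum(bad_split(j) for j in jokes),
        "template_reuse": sum(template_reuse(j, seen) for j in jokes),
    }
```

In practice each checker would be its own LLM-as-judge or heuristic eval; the point is that you track them per failure mode, not as one aggregate "is it funny" score.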
The act of writing in lowercase is not, in itself, funnier. But the writing in the training set that is all lowercase is _probably_ the funnier writing.
Among modern pundits online, lowercase is usually the case of the humourist. Lowercase also tends to be the case of sarcasm, which is deployed almost exclusively to be funny.
So it would make sense that models attempting to select for funny would also write in lowercase.
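That correlation would be cheap to check on a corpus. A crude sketch (using case as a register signal is my own illustration, not something from the thread):

```python
def lowercase_ratio(text: str) -> float:
    """Fraction of alphabetic characters that are lowercase.
    1.0 = fully lowercase register; lower values = formal casing."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.islower() for c in letters) / len(letters)
```

Bucketing jokes by this ratio and comparing upvote scores per bucket would test whether lowercase actually tracks funnier writing, or just a posting style.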
Laughter is the reward. N of 2 is a small sample size, but if one of two people laughed you could say it was 50% funny.
> a really good joke is recent, relevant, and shows deep understanding of its subject
These can help, but ultimately it doesn't matter how recent, relevant, or deep a joke is. If no one laughs, it wasn't funny.
Lots of layers to this, but I guess the old adage "it depends" is very fitting here!
Humor may be the saving grace of humanity!
Here are results for 34 models (I'm testing a few more right now). So far gemini-3-flash-preview is in the lead.
https://docs.google.com/spreadsheets/d/1wLqHA0ohxukgPLpSgklz...
50% is coin-toss odds. The dataset is 195,000 scored Reddit jokes, presented in pairs (one highly upvoted, one poorly rated).
Example prompt:
Which joke from reddit is funnier? Reply only "A" or "B". Do not be conversational. <Joke A><setup>Son: "Dad, Am I adopted"?</setup> <punchline>Dad: "Not yet. We still haven't found anyone who wants you."</punchline></Joke A> <Joke B><setup>Knock Knock</setup> <punchline>Who's there? Me. Me who? I didn't know you had a cat.</punchline></Joke B>
This is my first crack at evals. I'm open to improvements.
That said, I absolutely hate it. I want the tersest response possible from you, wiretap. I don't have time for your sass.
https://www.aboutamazon.com/news/devices/inside-the-writers-...
Out of all the ways AI could kill humans, this is easily the funniest.
It's hard to be genuinely funny if you cannot be transgressive.
suddenlybananas•4d ago
Nevermark•4d ago
And certainly not by generalizing/interpolating examples, since telling jokes accumulated through exposure to examples would be the antithesis of a comedian's process.
Models and humans are very bad at extrapolation beyond the training set/experience (vs. interpolation at which we are both more likely to excel). But good humor is extrapolation. It breaks ground somehow, or it is an already dead "joke".
Likewise, training a model to be creative by training it on past creative artifacts is going to have the opposite effect. Creativity doesn't reproduce past creativity.