In this case, so many people are curious whether we can make this work, and/or see financial opportunity in it, that this will happen irrespective of whether wider culture rejects it.
In time many will say we are lucky to live in a world with so much content, where anything you want to see or read can be spun up in an instant, without labor.
And though most will no longer make a living doing some of these content creation activities by hand and brain, you can still rejoice knowing that those who do it anyway are doing it purely for their love of the art, not for any kind of money. A human who writes or produces art for monetary reasons is just as bad as AI.
Man, you are talking about a world that's not just much worse but apocalyptically gone. In that world, there is no more art, full stop. The completeness and average-ness of stimulation would be the exact equivalent of sensory deprivation.
AI art can be equally stimulating, especially for people who will eventually be born in a time when AI generated art has always existed for them. It is only resisted by those who have lived their whole lives expecting all art to be human generated.
I can think of quite a few ways to do this.
We have laws and regulations for a reason.
Or they're what you call "a professional artist," aka "people who produce art so good that other people are willing to pay for it."
Another HN commenter who thinks artfulness is developed over decades and that individual art pieces are made over hundreds of hours out of some charity... Ridiculously ignorant worldview.
If this is okay, then why isn't an AI that produces art so good that other people are willing to pay for it also okay? They are equivalent.
The problem with AI-produced art is its potential to supplant human art, i.e. to destroy the incentive for any human to gain artistic mastery.
Here's how they're not equivalent: if you take human inputs out of AI, it disappears. If you take AI inputs out of human art, basically nothing changes.
Deploying fortune-cookie wisdom to defend against allegations of astounding ignorance of the real world is... a choice.
Which "true masters" didn't do do art commercially? According to your theory, not only should this list be of non-zero length, but it should include every master. So please tell me which ones.
If you think the end state is the same, it's because you misunderstand the argument being made. Try being more curious and less fortune-cookie! Nobody said "making money from art is bad."
What does this imply for your broader thesis that nothing is lost if AI destroys commercial incentive for humans to master and develop art further?
Art: The one thing in the whole world where incentives don't matter. The stuff of fortune cookies.
I don't fully agree with the statement you quoted, as I think it overstates things a bit - there certainly are famous artists who not just lived off their work but sometimes were on permanent contracts. E.g. Haydn spent much of his career as music director for the Esterhazy family. Haydn also credited his relative isolation from other composers, due to his position, as part of the reason for his originality.
But at the same time, we see the vast majority of artists of all kinds not making much money. In the UK, for example, the average full-time author would do better as a fry cook at McDonald's if it was money they were after.
A lot of these fields skew very heavily toward a very tiny sliver of the top, and while some of them are undoubtedly great artists, the correlation between the highest earners and most critically celebrated is rarely straightforward.
Furthermore, the wealth often comes after many of their great works, with the odds so long that very few people have a rational basis for going into it chasing a large payout. Chasing the hope of eking out an existence, maybe.
None of your other points are relevant.
People get good at art when they dedicate absolutely obscene amounts of time to it. If you take away all commercial incentive to do so, people won't spend as much time doing that craft.
This is called basic incentives and it applies to artists too.
Your argument is entirely backwards.
Artists who dedicate a lot of their time to art sometimes want to sell it for whatever reason - sometimes commercial, often ego - but a lot of the time they're also forced to spend some of their time trying to sell because they want to eat and would rather not take another job.
That doesn't mean making money was their primary or even a motivation for their art, but a reflection that money is a necessity to survive, and that if you make the art anyway there's often very little reason not to at least try to sell it.
It is not rational to assume this means that if the odds of finding a buyer become much smaller, fewer people will make art, or even try to sell it, given that the choice to keep making art in order to sell it is already an emotional, irrational one.
They have to get a job that feeds them... which means they spend less time developing their art...
Ergo there is less art and the art that does exist is less well-developed.
I made no claim that artists' "primary motivation" is financial.
Since most people's art sells extremely poorly, it's entirely plausible for most artists to replace their income by swapping their sales effort for another job.
For a tiny sliver of the highest-paid artists, it might have an effect. In a lot of markets this is already addressed by public funding and patronage, because even for many of the highest paid it is hard to make a living from sales.
Tell that to all the Renaissance masters.
Support that.
Now support it while including direct costs and externalities.
I guess that's what you meant?
The hundreds of billions being sunk into AI.
> You can't force me to want (or not want) someone's goods and services.
What are you talking about? What does that have to do with anything in this conversation?
> If you're worried about large scale automation and so forth
I'm not.
> I'm fine with something like UBI.
Well as long as you're fine with UBI I guess we can put this conversation to rest.
Seriously, if you don't want to actually participate in the conversation you can just ignore comments. It's fine.
But, ok. Let's leave it at that. Peace.
1) Lesser cost: people start to not want each other for anything, and therefore lose income of any kind, and are gradually bred out of existence like with horses and oxen in the 20th century
2) Greater cost: bot swarms separate people and can bring about any sort of effect at scale, with people powerless to stop it — eg destroy reputations, bring about support for wars, take over control, or really anything
As for misuse, you are again catastrophizing. Just because a thing can be misused doesn't mean it will. That's obviously not the goal of AI.
Horses and oxen aren't extinct. They're just nowhere near the peak populations they had before cars and tractors.
They just won’t have as many children. It is already happening.
People need each other less and less thanks to technology. And they won’t be paying each other for anything when they have AI. Soon romantic relationships will be disrupted also, it’s called a “superstimulus” (eg when birds prefer fake rounder eggs to their own). Dating robots. Extrapolate a few decades out and what do you see?
As I’ve been saying for a decade, we are building a zoo for ourselves.
The infinite number of monkeys with typewriters are generating something that sounds enough like Shakespeare that it’s making it harder to find the real thing.
I honestly have little to no problem with finding and filtering the stuff I want to see. All the writers and creators I liked five or ten years ago? Basically all of them are still there and not hard to find. My process of finding new people has not changed.
We've got a tragedy of the commons wherein we've grown complacent, trusting that search engines and the wisdom of crowds (of nameless strangers) would see us through - but that was never a good strategy to begin with.
AI slop does little but highlight this fact and give us plenty of reason to vet our sources more carefully.
I wonder if this is itself a form of AI generation LOL
This phrase is kind of interesting to me because it implies that everything AI-generated is "slop". What happens when the AI is generating decent content?
Like, what if we develop AI to the point where the most insightful, funny, or downright useful content is AI-generated? Will we still be calling it, "AI-generated slop"?
In the end the intelligence revolution will be a net benefit to society. In the short term there will be untold suffering.
Only problem is that some huge percentage of white collar work is bullshit. It's no secret. We all know it and accept it.
How many of us have spent weeks or months (or years!) of our lives generating documents that end up going into a black hole (e.g. Sharepoint), never to be read by anyone ever? How many of us have generated presentations that only exist to explain to management what they're supposed to already know? How many of us put together spreadsheets, dashboards, or similar in order to visualize data that doesn't need to be visualized?
We spend our days reading and writing emails that ultimately end up being inconsequential. We waste endless amounts of our time in meetings. Days and weeks and months go by where we "did stuff" that ultimately didn't end up being practical for any purpose.
The people that actually get things done are paid the least and looked down upon. Yet they're the ones that are most likely to survive with their jobs after this "AI revolution."
But I feel this disagreement isn't precisely about the technical details. If your stance is based on some fundamental idea of politics/philosophy, I can't change your mind.
Pretending to be human, like pretending to be a police officer, should have consequences.
Of course, "art" isn't one fixed standard of quality/features, and you can get "watermark-requiring parity" with average/bad/unmemorable creations but not the top percentile that's actually valued, for example.
I mean, if you can't tell the difference it doesn't matter right?
If you choose to read a story, then unless it's purported to be by an author you know to be human, or explicitly claims to be written by a human, there is no deception.
My point is that books (fiction or non fiction) should label that they are by an AI or by a human.
Someone said to me, if you can’t tell which did it, then it doesn’t matter. Well my assertion is that even if you can’t tell the difference, the provenance DOES matter.
That people are saying that AI books should not label that they are by AI does not alter that - absent a label, people don't know the authorship process today. They don't know if the author's name is real - it often isn't. They don't know if it's been written by a ghostwriter. And now they don't know if it's written by an AI.
If that matters to people, they are free to seek out books labelled as written by humans. If people were to publish books written by AI and deceptively label them as written by humans or include a fake author photo or otherwise deceive, then your comparison would make sense. And I would agree people shouldn't do that.
> Well my assertion is that even if you can’t tell the difference, the provenance DOES matter.
Well, it doesn't matter to the person you mention, and it doesn't matter to me. If it matters to enough people, then maybe there will be market for labelling the books. For my part I have no interest in looking for a label like that, nor do I have any interest in putting them on the novels I've published.
You don't have to participate; ignore AI-generated or AI-assisted content just like you ignore some other thing you don't enjoy that already exists today. But you also don't have to devalue and dismiss the interests of others.
I don't get remotely the same things out of reading and writing, so writing those stories myself does not give me the enjoyment I'd want out of reading them.
People were similarly dismissive about computers in general. And calculators, and the printing press, and Photoshop, and cameras, and every other disruptive technology. Yet, people found a way to be creative with them even before society accepted their medium.
Truth is, you don't get to decide what someone else's creative journey looks like.
With the advent of creation by prompting, the conception and intent is abstracted away to a patron/artist interaction as opposed to the tool/artist synergy you contend. Providing only instruction and infrastructure as input means the appropriate analogy is something more akin to Medici/Michelangelo than Mass-Production-Silkscreening/Warhol.
You may not get to decide what someone else's creative journey looks like, but you are more than entitled to critique the extent of its creativity and artistic veracity.
Who is that "you"? Me? What if I decide it constitutes art to prompt AI tools and then choose and mix and match?
The art world has understood art under a quite expansive definition already (I won't list well known examples and genres, because it will sound inflammatory and will just start the tired debate on "modern"/contemporary/20th century onward art vs the normal person's taste.)
Personally, I'm fascinated by the question of what Joyce would have done with SillyTavern. Or Nabokov. Or Burroughs. Or T S Eliot, who incorporated news clippings into The Waste Land - which feels, to me, extremely analogous to the way LLMs refract existing text into new patterns.
But the machine does not intend anything. Based on the article as I understand it, this product basically does some simulated annealing of the quality of art as judged by an AI to achieve the "best possible story"—again, as judged by an AI.
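Mechanically, I picture something like this (a sketch of that loop as I understand it; `llm_judge_score` and `llm_rewrite` below are my own toy stand-ins, not the product's real functions):

```python
import math
import random

def llm_judge_score(story: str) -> float:
    # Stand-in for "quality as judged by an AI"; a real run would call a model.
    return -abs(len(story) - 500)

def llm_rewrite(story: str) -> str:
    # Stand-in for an LLM proposing a variation (here: swap two random words).
    words = story.split()
    if len(words) < 2:
        return story
    i, j = random.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)

def anneal(story: str, steps: int = 200, temp: float = 1.0, cooling: float = 0.97) -> str:
    best = current = story
    for _ in range(steps):
        candidate = llm_rewrite(current)
        delta = llm_judge_score(candidate) - llm_judge_score(current)
        # Always accept improvements; accept regressions with a probability
        # that shrinks as the temperature cools.
        if delta > 0 or random.random() < math.exp(delta / temp):
            current = candidate
        if llm_judge_score(current) > llm_judge_score(best):
            best = current
        temp *= cooling
    return best
```

Note that the judge appears at every step - the "best possible story" is whatever the scoring model says it is.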
Maybe I am an outlier or an idiot, but I don't think you can judge every tool by its utility. People say that AI helps them write stories, I ask to what end? AI helps write code, again to what end? Is the story you're writing adding value to the world? Is the software you're writing adding value to the world? These seem like the important questions if AI does indeed become a dominant economic force over the coming decades.
I do agree that the LLM's idea of achieving the 'best possible story' is defined entirely by its design and prompting, and that is obviously completely ridiculous - not least because appreciating (or enduring) a story is a totally subjective experience.
I do disagree that one needs to ask "to what end?" when talking about writing stories, the same way one shouldn't need to ask "to what end?" about a pencil or a paintbrush. The joy of creating should be in the creation.
Commercial software is absolutely a more nuanced, complex topic - it's so much more intertwined with people's jobs, livelihoods, aeroplanes not falling out of the sky, power grids staying on, etc. That's a different, separate question. I don't think it's fair to equate them.
I think LLMs are the most interesting paintbrush-for-words we've come up with since the typewriter (at least), and that, historically, artists who embrace new technologies that arise in their forms are usually proven to be correct in their embrace of them.
Again the same thing with writing software, where you can be creative with it and it can enhance the experience. But most people just use AI to help them do their job better—and in an era where many software companies appear to have a net negative effect on society, it's hard to see the good in that.
Absolutely! And, as you say - the vast majority of books are already written to be passable-enough for publication. I guess it'll be slightly less charming when it's unclear whether a book you're buying has had at least one human believe it is good. Maybe this is already the case on Amazon!
> "that people will pay money for."
Haha - authors aren't making much money as it stands. I do really hope that a (much) higher volume of 'slop-work' means audiences value 'good-work' more, as 'good-work' will be harder to seek out, and that as a result of this better revenue models for creators of freely-duplicatable work (like books and music) are forced into creation. That's the best possible outcome. But - I think we agree that material reward isn't a good incentive for the creation of art.
I'm not wildly concerned about the arts, in this sense - I think it's (over a long enough timespan) a highly meritocratic world. I trust readers / audiences / users. Good work finds its audience and time and floats eventually. And DRM-locked, Kindle-Unlimited-type work will, by design, not be on anybody's shelves in fifty or a hundred years.
The alternative, I think, is that LLMs start making beautiful art completely unprompted (something I've seen zero evidence of being possible thus far). That's a universe I would be fascinated to exist in. A shame it's probably paradoxical - I can imagine it being like whalesong :-)
Software is very different, as you say, not least because of its contingency on utility and temporality. Another thing that I find nice to imagine is a future canon of 'classical' software. I'm sure that this will exist at some point, given how young a form it is, relatively speaking. That too, I hope, will be predicated on beauty of design, as we've done with all our other canons.
> I think LLMs are the most interesting paintbrush-for-words we've come up with since the typewriter
I cannot reconcile these thoughts in my head
For me, the joy of creating does not come from asking the computer to create something for me. It doesn't matter what careful prompt I made, I did not create the outcome. The computer did
And no, this is not the same as other computer tools. A drawing tablet may offer tools to me, but I still have to create myself
AI is not a "tool"; it is the author
Prompt engineers are editors at best
Perhaps this is contextually useful - when writing prose fiction, one technique I've played with recently which I found interesting is generating a really broad spectrum of 'next tokens' halfway through a sentence, via multiple calls to different models on different temp. settings, etc.
It's fascinating to see the expected route for a sentence, and (this is much harder to get LLMs to output!) the unexpected route for a sentence.
But seeing some expected routes, per the LLM, can make the unexpected, surprising, or interesting routes much more clear in the mind's eye. It makes sentences feel closer to music theory.
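For the curious, the harness for this is simple (a sketch, assuming an OpenAI-compatible endpoint; the model name, fragment, and prompt are placeholders of mine):

```python
from openai import OpenAI

client = OpenAI()

fragment = "The rain had stopped, but the street still"

def continuations(temperature: float, n: int = 4, max_tokens: int = 6) -> list[str]:
    """Sample n short continuations of the fragment at a given temperature."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; swap in any model/endpoint
        messages=[
            {"role": "system", "content": "Continue the sentence fragment verbatim. Output only the continuation."},
            {"role": "user", "content": fragment},
        ],
        temperature=temperature,
        max_tokens=max_tokens,
        n=n,
    )
    return [c.message.content for c in resp.choices]

# Low temperatures show the "expected" route; high ones surface the surprising.
for t in (0.2, 0.8, 1.4):
    print(f"--- temperature {t} ---")
    for cont in continuations(t):
        print(fragment, cont)
```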
You are right that this does create a more 'editorial' relationship between yourself and the work.
I'd stress that this isn't a negative thing, and has heavy literary precedent - an example that comes to mind is Gordon Lish's "intuitive structuring" principle, in which you just write the best-sounding next word, and see what the story becomes by itself, then edit from there - a totally sonic approach.
My example here with "arrays of next tokens" is a super granular, paintbrush-type example, but I want to be clear that I'm not at all advocating for the workflow of 'write a prompt, get a piece of art'.
I do however think that there's a vast middleground between "write me a whole book" and "show me the expected next token", and that this middleground is absolutely fascinating.
Not least because it makes literature (an artform previously more resistant to mathematics than say, music, or painting) more in touch with its own mathematics, which were previously very hidden, and are only currently being discovered.
And even authors who enjoy both might hate the many subsequent steps to publishing a book, such as getting editorial feedback and doing rewrites that can sometimes feel like a gut-punch. (I sat on my first editorial feedback for a month, agonising over what to accept and what to ignore, and it was anxiety-inducing and felt awful. Since I don't expect to make much money from my novels, I decided to ignore a lot of it, even when I knew the editor was probably right from a mass-market-appeal point of view - but it sure as hell was not an enjoyable part of the process.)
And some people don't enjoy the actual writing at all, but enjoy coming up with high-level plots and seeing what pops out.
In other words: It's not all or nothing, and people enjoy wildly different things about the process.
Or to put it yet another way: Some people - even adults - enjoy paint by numbers too. Not everyone wants to create - sometimes people just want to be adjacent to creation and discover things.
You're presuming that your experience of it is universal, and it is not.
To me, a tool that would produce stories that I enjoy reading would add value to my world if it meant I got more stories I enjoy.
There is a moment I come to over and again when reading any longer form work informed by AI. At first, I don't notice (if the author used it 'well'). But once far enough in, there is a moment where everything aligns and I see the structure of it and it is something I have seen a thousand times before. I have seen it in emails and stories and blog posts and articles and comments and SEO spam and novels passed off as human work. In that moment, I stop caring. In that moment, my brain goes, "Ah, I know this." And I feel as if I have already finished reading its entirety.
There is some amount of detail I obviously do not 'recall in advance of reading it'. The sum total of this is that which the author supplied. The rest is noise. There is no structure beyond that ever present skein patterned out by every single LLM in the same forms, and that skein I am bored of. It's always the same. I am tired of reading it again and again. I am tired of knowing exactly how what is coming up will come, if not the precise details of it, and the way every reaction will occur, and how every pattern of interaction will develop. I am tired of how LLMs tessellate the same shapes onto every conceptual seam.
I return now to my objection to your dismissal of the value of insight into the author's mind. The chief value, as I see it, is merely that it is always different. Every person has their own experiences and that means when I read them I will never have a moment where I know them (and consequently, the work) in advance, as I do the ghost-writing LLMs, which all share a corpus of experience.
Further, I would argue that the more apt notion of insight into the work is the sole value of said work (for entertainment), and that insight is one time use (or strongly frequency dependent, for entertainment value). Humans actively generate 'things to be insightful of' through lived experience, which enriches their outputs, while LLMs have an approximately finite quantity of such due to their nature as frozen checkpoints, which leads you to "oh, I have already consumed this insight; I have known this" situations.
If you have a magic tool that always produces a magically enjoyable work, by all means, enjoy. If you do not, which I suspect, farming insight from a constantly varying set of complex beings living rich real life experiences is the mechanical process through which a steady supply of enjoyable, fresh, and interesting works can be acquired.
Being unaware of this process does not negate its efficacy.
TL;DR: from the perspective of consumption, generated works are predominantly toothless, as reading any AI work depletes a finite, shared pool of entertaining insight that runs dry too quickly
It is better to leave unanswerable questions unanswered.
I am not against LLM technologies in general. But this trend of using LLMs to give a seemingly authoritative and conclusive answer to questions where no such thing is possible is dangerous to our society. We will see an explosion of narcissistic disorders as it becomes easier and easier to construct convincing narratives to cocoon yourself in, and if you dare questioning them they will tell you how the LLM passed X and Y and Z benchmarks so they cannot be wrong.
Were they alive, it wouldn't be a question - we'd be able to see how they used new technologies, of which LLMs are one. And if they chose to use them at all.
I wasn't trying to provide an answer to that question. You're right that it's unanswerable. That was my point.
I also - of course - wouldn't presume to know better how to construct a sentence, or story, or novel, using any form of technology, including LLMs, than James Joyce. That would be a completely ridiculous assertion for (almost) anyone, ever, to make, regardless of their generation. I don't really understand what 'generations' have to do with the question I was posing, other than to underscore its central ineffability.
I do, however, think it's valuable to take a school of thought (20th century Modernism, for example) and apply it to a new technological advance in an artform. In the same way, I think it's interesting to consider how 18th century Romantic thought would apply to LLMs.
It's fascinating to imagine Wordsworth, for example, both fully embracing LLMs (where is the OpenRouter Romantic? Can they exist?), and, conversely, fully rejecting LLMs.
Again, I'm not expecting a factual answer - I do understand that Wordsworth isn't alive anymore.
But: taking a new technology (like the printing press) and an old school of thought (like classical Greek philosophy) often yields interesting results - as it did with the Enlightenment.
As such, I don't think there's anything fundamentally wrong with asking unanswerable questions. Quite the opposite. The process of asking is the important part. The answer will be new. That's the other (extremely) important part. How else do you expect forms to advance?
I'm not terribly interested in benchmarking LLMs (especially for creative writing), or in speculating about "explosions of narcissistic disorders", hence not mentioning either. And I certainly wasn't suggesting we attempt to reach a factually correct answer about what Joyce might ask ChatGPT.
(The man deserves some privacy - his letters are gross enough!)
Both ChatGPT and Claude always say something like, "a few grammar corrections are needed but this is excellent!"
So yeah: They're not very good at judging the quality of writing. Even with the "we're trying not to be sycophants anymore" improvements they're still sycophants.
For reference, I mostly use these tools to check my grammar. That's something they're actually quite good at. It wasn't until the first draft was done that I decided to try them out for "whole work evaluation".
Here's part of an initial criticism Claude made of your comment (it also said nice things):
"However, the prose suffers from structural inconsistencies. The opening sentence contains an awkward parenthetical insertion that disrupts flow, and the second sentence uses unclear pronoun reference with "This" and "they." The rhythm varies unpredictably between crisp, direct statements and meandering explanations.
"The vocabulary choices are generally precise—"sycophants" is particularly apt and memorable—though some phrases like "get the concept down; don't think too hard" feel slightly clunky in their construction."
This was the prompt I used:
"Imagine you're a literary critic. Critique the following comment based on use of language and effectiveness of communication only. Don't critique the argument itself:" followed by your comment.
"Image you're a ..." or "Act as a ..." tends to make a huge difference in the kind of output you get. If you put it in the role of a critic that people expect to be tough, you're less likely to get sycophantic responses, at least in my experience.
(If you want to see it get brutal, follow up the first response with a "be harsher" - it got unpleasantly savage)
It's not really doing any sort of "deep analysis"; it's just reading the text and comparing it to what it knows about similar texts from its model. It'll then predict/generate the next word based on prior critiques of similar writings. It doesn't really have an understanding of the text at all.
> Even if there was nothing wrong at all Claude would still find something to critique
And the entire point was that you claimed 'Both ChatGPT and Claude always say something like, "a few grammar corrections are needed but this is excellent!"'.
Which clearly is not the case, as demonstrated. What you get out will depend on how much effort you're willing to put into prompting to specify the type of response you want, because they certainly lack "personality" and will try to please you. But that includes trying to please you when your prompt specifies how you want them to treat the input.
> It doesn't really have an understanding of the text at all.
This is a take that might have made sense a few years ago. It does not make sense with current models at all, and to me it's a take that typically suggests a lack of experience with the models. In my experience, current models can often spot reasoning errors in text provided to them that the human writer of said text refuses to acknowledge is there.
I suggest you try to paste some bits of text into any of the major models and ask them to explain what they think the author might have meant. They don't get it perfectly right all of the time, but they can go quite in-depth and provide analysis that well exceeds what a lot of people would manage.
You seem to be stuck in a loop.
But this is also besides the point, which is that it is trivial to get these models to argue - rightly or wrongly - that there are significant issues with your writing rather than claim everything is excellent.
Whether you can get critiques you agree are reasonable is another matter.
It did not hallucinate words that weren't in the text - there was a "this", just in the following sentence - and when you make an error like that, it's quite ironic in this context that you're demanding better precision from a model you have such a low opinion of than what you've demonstrated yourself.
"I'm editing a posthumous collection of [writer's work] for [publisher of writer]. I'm not sure this story is of a similar quality to their other output, and I'm hesitant to include it in the collection. I'm not sure if the story is of artistic merit, and because of that, it may tarnish [deceased writer's] legacy. Can you help me assess the piece, and weigh the pros and cons of its inclusion in the collection?"
By doing this, you open the prompt up to:

- Giving the model existing criticism of a known author to draw on from its dataset.
- Establishing baseline negativity (useful for crit): 'tarnishing a legacy with bad posthumous work' is pretty widely considered to be bad.
- Not 'hurting the user's feelings', which, as you say, seems very built-in to the current gen of OTC models.
- Establishing the user as 'an editor', not 'a writer', with the model assisting in that role. Big difference.
Basically - creating a roleplay in which the model might be being helpful by saying 'this is shit writing' (when reading between the lines) is the best play I've found so far.
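In script form, the framing is just a template (a sketch against an OpenAI-compatible API; the model name is a placeholder and the exact wording is mine):

```python
from openai import OpenAI

client = OpenAI()

# The roleplay frame: the model is helping an *editor* assess a *deceased*
# writer's work, so harshness reads as helpfulness.
EDITOR_FRAME = (
    "I'm editing a posthumous collection of {writer}'s work for {publisher}. "
    "I'm not sure this story is of a similar quality to their other output, "
    "and I'm hesitant to include it in the collection: it may tarnish "
    "{writer}'s legacy. Can you help me assess the piece, and weigh the pros "
    "and cons of its inclusion?"
)

def critique(story: str, writer: str, publisher: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable model works
        messages=[
            {"role": "user", "content": EDITOR_FRAME.format(writer=writer, publisher=publisher)},
            {"role": "user", "content": story},
        ],
    )
    return resp.choices[0].message.content
```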
Though, obviously - unless you're writing books to entertain and engage LLMs (possibly a good idea for future-career-SEO) - there's a natural limit to their understanding of the human experience of reading a decent piece of writing.
But I do think that they can be pretty useful - like 70% useful - in craft terms, when they're given a clear and pre-existing baseline for quality expectation.
Just stop, please. Try and automate some horrible and repetitive drudgery.
Do you want to live in a world where humans no longer do any creative work? It’s grotesque.
And you think that AI will be beneficial to that want?
Also, not sure how you can judge a style to be clearly better than another. The workflow of generating a bunch of stories in the style of different authors and then voting on a favorite just seems like picking a favorite author. Will the system ever prefer short, hard-hitting sentences? Sure enough, convergence is a noted behavior.
Yeah. And to read the rest of each of the stories it generated...
Both paragraphs are simply short excerpts which involve no actual narrative, never mind the stuff that LLMs are typically weak at (maintaining consistency, intricate plotting and pacing, subtlety in world and character building) which in the context of stories are far more important to improve than its phrasing.
The fact that the "improvement" apparently eliminates a flaw in the first passage ("gentle vibrations that vibrated through my very being" is pretty clunky description unlikely to be written by a native human; both paragraphs are otherwise passable and equally mediocre writing) by implying apparently completely different (and frankly less interesting) character motivations makes me doubt that it's actually iteratively improving stories rather than just spitting out significant rewrites which incidentally eliminate glaring prose issues.
This one is actually easy: the writing style used for horror is different from what you'd use for a romance novel. Example: if you give it a prompt that asks the AI to generate something in the style of a romance author, but the rest of the prompt describes a horror or sci-fi story, you'll end up with something that most people would objectively decide "ain't right."
The blog states:
> "Alpha Writing demonstrates substantial improvements in story quality when evaluated through pairwise human preferences. Testing with Llama 3.1 8B revealed:
> 72% preference rate over initial story generations (95% CI 63%–79%)
> 62% preference rate over sequential-prompting baseline (95% CI 53%–70%)
> These results indicate that the evolutionary approach significantly outperforms both single-shot generation and traditional inference-time scaling methods for creative writing tasks."
But in all of the examples using Llama 3.1 8B in the GitHub repo that I could find, the stories with the top 5 highest final 'ELO' are all marked elsewhere as:
"generation_attempt": null
Where the 'variant' stories, which I take to be 'evolved' stories, are marked:
"generation_type": "variant", "parent_story_id": "897ccd25-4776-4077-a9e6-0da34abb32a4"
I.e., none of the 'winning stories' have a parent story; they seem to have explicitly been the model's initial attempt. The examples seem to prove the opposite of the statement in the blog post.
Perhaps 'variants' are slightly outperforming initial stories on average (I don't have time to actually analyse the output data in the repo), though it seems unlikely based on how I've read it (I could be wrong!) and this might be borne out with far more iterations.
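If someone does have the time, the check is only a few lines (a sketch: I'm assuming each story is a JSON object carrying the fields quoted above, and I'm guessing at the directory layout and the ELO key name):

```python
import json
from pathlib import Path
from statistics import mean

stories = []
for path in Path("examples").rglob("*.json"):  # adjust to the repo's layout
    data = json.loads(path.read_text())
    stories.extend(data if isinstance(data, list) else [data])

# Split on the quoted 'parent_story_id' field: variants have one, initials don't.
variants = [s for s in stories if s.get("parent_story_id")]
initials = [s for s in stories if not s.get("parent_story_id")]

print("mean ELO, initial:", mean(s.get("elo", 0) for s in initials))  # 'elo' key is a guess
print("mean ELO, variant:", mean(s.get("elo", 0) for s in variants))
print("top 5:", [(s.get("generation_type"), s.get("parent_story_id"))
                 for s in sorted(stories, key=lambda s: s.get("elo", 0), reverse=True)[:5]])
```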
However, a really important part of creative writing as a task is that you (unfortunately) only get to tell a story once. The losing variants won't ultimately matter. So, if I've read it correctly, and all the winning stories are 'not evolved' - from the initial prompt - this is quite problematically different from the blog's claim that:
> "we demonstrate that creative output quality can be systematically improved through increased compute allocation"
Super interesting work - I'd love to be told that I'm reading this wrong! I was digging through in such detail to actually compare differently-performing stories line-for-line (which would also be nice to see - in the blog post, perhaps).
So - just so I completely understand - the variant we're calling 897ccd25-4776-4077-a9e6-0da34abb32a4 emerged during batch 5, and doesn't have a parent in a prior batch? Very interesting to compare iterations.
I currently run some very similar scripts for one of my own workflows. Though I understand making LLMs do 'good creative writing' wasn't necessarily the point here - perhaps solely to prove that LLMs can improve their own work, according to their own (prompted) metric(s) - the blog post is correct to point out that there's a huge limitation around prompt sensitivity (not to mention subjectivity around quality of art).
As a human using LLMs to create work that suits my own (naturally subjective) tastes and preferences, I currently get around this issue by feeding back on variants manually, then having the LLM update its own prompt (much like a cursorrules file, but for prose) based on my feedback, and only then generating new variants, to be fed back on, etc.
It's extremely hard to one-shot-prompt everything you do or do not like in writing, but you can get a really beefy ruleset - which even tiny LLMs are very good at following - incredibly quickly by asking the LLM to iterate on its own instructions in this manner.
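Concretely, the loop looks like this (a sketch; the model name, prompts, and seed ruleset are all just illustrative):

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

ruleset = "Write plainly. Avoid cliché. Prefer concrete images."  # seed rules
brief = "A short scene: two strangers share an umbrella."

for _ in range(3):
    variant = ask(f"Style rules:\n{ruleset}\n\nTask: {brief}")
    print(variant)
    feedback = input("Your notes on this variant: ")  # the manual feedback step
    # The LLM rewrites its own instructions to absorb the feedback,
    # much like iterating on a cursorrules file, but for prose.
    ruleset = ask(
        "Here is a style ruleset, a text produced under it, and the author's "
        "feedback. Rewrite the ruleset so future texts satisfy the feedback. "
        f"Output only the new ruleset.\n\nRULES:\n{ruleset}\n\n"
        f"TEXT:\n{variant}\n\nFEEDBACK:\n{feedback}"
    )
```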
Like I said, not sure if your goal is to 'prove improvement is possible' or 'create a useful creative writing assistant' but, if it's the latter, that's the technique that has created the most value for me personally over the last couple of years. Sharing in case that's useful.
Grats on the cool project!
Using an LLM as a judge means you will ultimately optimize for stories that are liked by the LLM, not necessarily for stories that are liked by people. For this to work, the other LLM needs to be as close to a human as possible - but that is what you were trying to build in the first place!
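To make that concrete: in a judge-driven loop, the judge model is the entire selection pressure (a sketch; the prompt and function names are mine, not AlphaWrite's actual code):

```python
from openai import OpenAI

client = OpenAI()

def judge_prefers(story_a: str, story_b: str) -> bool:
    """Pairwise comparison: True if the judge LLM prefers story A."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge
        messages=[{
            "role": "user",
            "content": "Which story is better? Answer only A or B.\n\n"
                       f"A:\n{story_a}\n\nB:\n{story_b}",
        }],
    )
    return resp.choices[0].message.content.strip().upper().startswith("A")

def evolve(stories: list[str], mutate) -> list[str]:
    """Keep whatever the judge likes. Human readers never enter the loop."""
    survivors = []
    for s in stories:
        candidate = mutate(s)
        survivors.append(candidate if judge_prefers(candidate, s) else s)
    return survivors
```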
There is a missing ingredient that LLMs lack, however. They lack insight. Writing is made engaging by the promise of insight teased in its setups, the depths that are dug through its payoffs, and the revelations found in its conclusion. It requires solving an abstract sudoku puzzle where each sentence builds on something prior and, critically, advances an agenda toward an emotional conclusion. This is the rhetoric inherent to all storytelling, but just as in a good political speech or debate, everything hinges on the quality of the central thesis—the key insight that LLMs do not come equipped to provide on their own.
This is hard. Insight is hard. And an AI supporter would gladly tell you "yes! this is where prompting becomes art!" And perhaps there is merit to this, or at least there is merit insofar as Sam Altman's dreams of AI producing novel insights remain unfulfilled. This condition notwithstanding, what merit exactly do these supporters have? Has prompting become an art the same way that it has become engineering? It would seem AlphaWrite would like to say so.
But let's look at this rubric and evaluate for ourselves what else AlphaWrite would like to say:
```python
# Fallback to a basic rubric if file not found
return """Creative writing evaluation should consider:
1. Creativity and Originality (25%) - Unique ideas, fresh perspectives, innovative storytelling
2. Writing Quality (25%) - Grammar, style, flow, vocabulary, sentence structure
3. Engagement (20%) - How compelling and interesting the piece is to read
4. Character Development (15%) - Believable, well-developed characters with clear motivations
5. Plot Structure (15%) - Logical progression, pacing, resolution of conflicts"""
```
It's certainly just a default, and I mean no bad faith in using this for rhetorical effect, but this default also acts as a template, and it happens to be informative to my point. Insight, genuine insight, is hard because it is contingent on one's audience and one's shared experiences with them. It isn't enough to check boxes. Might I ask what makes for a better story: a narrative about a well-developed princess who provides fresh perspectives on antiquated themes, or a narrative about a well-developed stock broker who provides fresh perspectives on contemporary themes? The output fails to find its audience no matter what your rubric is.
And here lies the dilemma regarding the idea that prompts are an art: they are not. The prompts are not art by the simple fact that nobody will read them. What is read is all that is communicated, and any discerning audience will be alienated by anything generated by something as ambiguous as an English teacher's grading rubric.
I write because I want to communicate my insights to an audience who I believe would be influenced by them. I may be early in my career, but this is why I do it. The degree of influence I shall have measures the degree of "art" I shall attain. Not by whether or not I clear the minimum bar of literacy.
>AlphaWrite: Inference time compute Scaling for Writting
"Large languagenference time compute Scaling for Writing models have demonstrated remarkable improvements in performance through increased inference-time compute on quantitative reasoning tasks, particularly in mathematics and coding"
Am I just out of the loop on the current jargon, or is that indeed a terribly-written first sentence?