The em dash ( — ) has been in use for centuries by some of the most renowned writers, from journalists to authors of classic literature to modern sci-fi authors. It is not a good way to identify AI output.
I would say that in the realm of stylometry, unless it's visible in a corpus from an author before AI emerged, -- it would be very plausible to say that, for that author, the use of the em dash was indicative of having copy-pasted from the generation of AI systems which used it as a marker.
After this is all history, it won't be a good determinant.
If enough people begin using em dashes in some coherent manner then, again in the realm of stylometry, it becomes a statistical question whether their pattern matches the ones humans use (specifically this human) or the ones AIs use (specifically these AIs).
These kinds of things (it turns out) can be quite hard to fake. It's like when people try to act as a random number generator: the patterns they emit simply don't look like an RNG's. I doubt people using em dashes will match how robots use them when viewed at a distance, in volume.
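To make "pattern" a bit more concrete, here is a rough sketch of the kind of per-text profile one could compare in volume. It is only an illustration: the function name `dash_profile` and the three spacing categories are my own assumptions, not an established stylometric method, and a real comparison would need far more text and features than this.

```python
import re
from collections import Counter

DASH = "\u2014"  # the em dash character

def dash_profile(text):
    """Return (em dashes per 1000 words, Counter of spacing styles).

    Hypothetical feature set: "spaced" means a space on both sides,
    "closed" means no space on either side, "mixed" is anything else.
    """
    words = len(re.findall(r"\w+", text))
    styles = Counter()
    for m in re.finditer(DASH, text):
        before = text[m.start() - 1] if m.start() > 0 else ""
        after = text[m.end()] if m.end() < len(text) else ""
        if before == " " and after == " ":
            styles["spaced"] += 1
        elif before != " " and after != " ":
            styles["closed"] += 1
        else:
            styles["mixed"] += 1
    total = sum(styles.values())
    rate = 1000.0 * total / words if words else 0.0
    return rate, styles
```

Run over a pre-AI corpus and a recent corpus from the same author, a large shift in the rate or in the mix of spacing styles would be the sort of signal being discussed; a single post tells you very little.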
I feel confident that if Jane Austen or Samuel Johnson (or their printer at least) use em dashes, we can state with some confidence that it's not an AI. If, however, the ASCII from Gutenberg and an apparent "AI cleaned up" copy of their work diverge, what would we make of that? Is it a material change? For "Tristram Shandy" it would be: Sterne deliberately told his printer to put some things into the text, such as a black page and some other printer's marks. If they get elided or magnified, that's dicking with the text.
How about "A Humument"?
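The Gutenberg comparison above could be approximated by counting punctuation in both copies and reporting what changed. This is a minimal sketch under stated assumptions: `punctuation_drift` and the `PUNCT` set are hypothetical names of mine, and it only sees characters, so it would say nothing about a black page or other printer's marks.

```python
from collections import Counter

# Assumed set of marks worth tracking; extend as needed.
PUNCT = set("\u2014-;:,.!?()\"'")

def punctuation_drift(original, cleaned):
    """Map each punctuation mark to (count in cleaned - count in original).

    Marks whose counts are unchanged are left out of the result.
    """
    a = Counter(ch for ch in original if ch in PUNCT)
    b = Counter(ch for ch in cleaned if ch in PUNCT)
    return {ch: b[ch] - a[ch] for ch in sorted(set(a) | set(b)) if a[ch] != b[ch]}
```

An empty result only means the counts agree, not that the texts are identical; whether a non-empty one is a material change is still a judgment call.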
ofalkaed•1h ago
From what I have seen, AI's use of the em-dash always falls into the technically correct "semicolon-like" usage, which is ambiguous and confusing when semicolons are also used. Most people who use em-dashes this way do it to "fix" their serial comma use, and the difference is generally easy to spot. They often forget to remove the comma they meant to replace, as ggm did:
> before AI emerged, -- it would be very plausible
Not criticizing ggm; posting on the internet is closer to spoken language than written, with punctuation being more about replacing lost verbal cues than structuring a carefully composed thought.
amcclure•2h ago
More from The Lunduke Journal: https://lunduke.com/
ggm•2h ago