LLMs are getting better at character-level text manipulation

https://blog.burkert.me/posts/llm_evolution_character_manipulation/
42•curioussquirrel•6h ago

Comments

simonw•2h ago
If you take a look at the system prompt for Claude 3.7 Sonnet on this page you'll see: https://docs.claude.com/en/release-notes/system-prompts#clau...

> If Claude is asked to count words, letters, and characters, it thinks step by step before answering the person. It explicitly counts the words, letters, or characters by assigning a number to each. It only answers the person once it has performed this explicit counting step.

But... if you look at the system prompts on the same page for later models - Claude 4 and upwards - that text is gone.

Which suggests to me that Claude 4 was the first Anthropic model where they didn't feel the need to include that tip in the system prompt.
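
As an illustration of the "assign a number to each" procedure in the quoted prompt, here is a minimal Python sketch. The function name `count_letter` and the sample word are assumptions chosen for the example, not anything taken from Anthropic's prompt or tooling.

```python
def count_letter(text: str, letter: str) -> tuple[int, list[str]]:
    """Number every character, then count the ones matching `letter`.

    Hypothetical helper: it mirrors the explicit counting step the
    quoted prompt describes, ignoring case.
    """
    # Label each character with its 1-based position, e.g. "1:s".
    numbered = [f"{i}:{ch}" for i, ch in enumerate(text, start=1)]
    # Count the matches only after everything has been enumerated.
    total = sum(1 for ch in text if ch.lower() == letter.lower())
    return total, numbered


total, numbered = count_letter("strawberry", "r")
print(" ".join(numbered))
print(f"'r' appears {total} times")
```

Writing out the numbered list first is the point: it forces one item per character instead of relying on how the word happens to be tokenized.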

ivape•2h ago
Or they’d rather use that context window space for more useful instructions for a variety of other topics.
astrange•1h ago
Claude's system prompt is still incredibly long and probably hurting its performance.

https://github.com/asgeirtj/system_prompts_leaks/blob/main/A...

kristianp•1h ago
Does that mean they've managed to post-train the thinking steps required to get these types of questions correct?
simonw•34m ago
That's my best guess, yeah.
malshe•1h ago
I play Quartiles in the Apple News app daily (https://support.apple.com/guide/iphone/solve-quartiles-puzzl...). Occasionally, when I get stuck, I use ChatGPT to find a word that uses four word fragments, or tiles. It never worked before GPT 5, and with GPT 5 it works only with reasoning enabled. Even then, there is no guarantee it will find the correct word, and it may end up hallucinating badly.
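
For comparison, the four-tile search is easy to do deterministically. The sketch below is only an illustration: the tile set, the tiny word list, and the function name `four_tile_words` are all made up for the example, and a real solver would need a full dictionary.

```python
from itertools import permutations


def four_tile_words(tiles: list[str], dictionary: set[str]) -> list[str]:
    """Return dictionary words formed by joining exactly four tiles in order."""
    # Try every ordered choice of four distinct tiles and keep real words.
    candidates = {"".join(combo) for combo in permutations(tiles, 4)}
    return sorted(candidates & dictionary)


# Hypothetical puzzle data, not taken from an actual Quartiles board.
tiles = ["qu", "ar", "ti", "les", "ing", "re"]
dictionary = {"quartiles", "quart", "tiles"}

print(four_tile_words(tiles, dictionary))
```

With 20 tiles on a board there are 20 × 19 × 18 × 17 = 116,280 ordered four-tile combinations, so brute force is cheap.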
hansonkd•1h ago
ChatGPT 5 is still pathetically bad at Roman numerals. I asked it to find the longest Roman numeral in a range. Its first guess was the highest number in the range, despite that being a short numeral. Its second guess, after some help, was a longer numeral but outside the range. Its last guess was the correct longest numeral, but it miscounted how many characters it contained.
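
A short Python sketch of the task described above, assuming standard subtractive notation for 1 to 3999. The range 1 to 100 is an arbitrary example, since the comment does not say which range was used, and the names `to_roman` and `longest_roman` are hypothetical.

```python
ROMAN_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n: int) -> str:
    """Convert an integer in 1..3999 to a standard Roman numeral."""
    if not 1 <= n <= 3999:
        raise ValueError("n must be between 1 and 3999")
    parts = []
    for value, symbol in ROMAN_PAIRS:
        # Take as many copies of this symbol as fit, then move on.
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def longest_roman(low: int, high: int) -> tuple[int, str]:
    """Return the first number in [low, high] with the longest numeral."""
    best = max(range(low, high + 1), key=lambda n: len(to_roman(n)))
    return best, to_roman(best)


number, numeral = longest_roman(1, 100)
print(number, numeral, len(numeral))
```

Note that `max` returns the first maximum it sees, so ties go to the smaller number.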
necovek•43m ago
I think the base64 decoding is interesting: in a sense, the model's training set likely had lots of base64-encoded data (imagine MIME data in emails, JSON, HTML...), but for it to decode successfully, it had to learn the decoding for every group of 4 base64 characters (which turn into 3 bytes). These could easily have been generated as training data, and I only wonder whether each and every one of them was found enough times to end up in the weights?
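
The 4-characters-to-3-bytes mapping can be sketched in Python as below. `decode_quad` is a hypothetical helper written for this illustration and handles only unpadded groups; `base64.b64encode` from the standard library is used just to produce a sample string.

```python
import base64

# Standard base64 alphabet: each character stands for a 6-bit value.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def decode_quad(quad: str) -> bytes:
    """Decode one unpadded group of 4 base64 characters into 3 bytes."""
    if len(quad) != 4:
        raise ValueError("expected exactly 4 characters")
    bits = 0
    for ch in quad:
        # Shift in 6 bits per character: 4 * 6 = 24 bits = 3 bytes.
        bits = (bits << 6) | ALPHABET.index(ch)
    return bits.to_bytes(3, "big")


encoded = base64.b64encode(b"Hello!").decode("ascii")
decoded = b"".join(
    decode_quad(encoded[i:i + 4]) for i in range(0, len(encoded), 4)
)
print(encoded, decoded)
```

There are 64^4 = 16,777,216 possible unpadded groups, which gives a sense of how many distinct mappings a model would have to memorize if it learned them group by group.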
viraptor•39m ago
Why bother testing though? I was hoping this topic had finally died recently, but no. Someone's still interested in testing LLMs for something they're explicitly not designed for, and nobody is using them for this in practice. I really hope one day OpenAI will just add a "when asked about character level changes, insights and encodings, generate and run a program to answer it" to their system prompt so we never hear about it again...
IncreasePosts•14m ago
Wouldn't an LLM that just tokenized by character be good at it?
neerajsi•11m ago
https://www.anthropic.com/news/analysis-tool

Seems like they already built this capability.
