https://g.co/gemini/share/e173d18d1d80
This is a random image from Twitter with no transcript or English translation provided, so it's not going to be in the training data.
The result from Gemini 3 Pro using the default media resolution (the medium one): "(Заголовок / Header): Арсеньев (Фамилия / Surname - likely "Arsenyev")
Состояние удовл-
t N, кожные
покровы чистые,
[л/у не увел.]
В зеве умерен. [умеренная]
гипер. [гиперемия]
В легких дыха-
ние жесткое, хрипов
нет. Тоны серд-
[ца] [ритм]ичные.
Живот мяг-
кий, б/б [безболезненный].
мочеисп. [мочеиспускание] своб. [свободное]
Ds: ОРЗ [или ОРВИ]" and with the translation: "Arsenyev
Condition satisfactory.
Temp normal, skin coverings [skin] are clean, lymph nodes not enlarged.
In the throat [pharynx], moderate hyperemia [redness].
In the lungs, breathing is rigid [hard], no rales [crackles/wheezing].
Heart tones are rhythmic.
Abdomen is soft, painless.
Urination is free [unhindered].
Diagnosis: ARD (Acute Respiratory Disease)."

It's most likely "но кашель сохр-ся лающий" ("but a barking cough is still present"), not "кожные покровы чистые" ("the skin is clean"). The diagnosis is probably wrong too. Judging by the symptoms it should be "ОРЗ", but I have no idea what's actually written there.
Still, it's very, very impressive.
I'm using TrOCR because it's a smaller model that I can fine-tune on a consumer card, but the age of the model and its resources certainly make it a challenge. The official fine-tuning notebook hasn't been updated in years and has several errors due to the march of progress in the primary packages.
This one works, you can check the versions https://pastebin.com/QPjGHN8j
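For anyone fighting the same version drift, here's a minimal sketch of the fine-tuning loop against the current transformers API. It is not the notebook's exact code: the dataset wrapper, the `train_samples` list, and the hyperparameters are illustrative placeholders, so treat the pastebin above as the source of pinned, known-good versions.

```python
# Minimal TrOCR fine-tuning sketch using current transformers APIs.
# train_samples is a placeholder: a list of (image_path, transcription) pairs.
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

# The config fixes from the official notebook still apply:
model.config.decoder_start_token_id = processor.tokenizer.cls_token_id
model.config.pad_token_id = processor.tokenizer.pad_token_id
model.config.vocab_size = model.config.decoder.vocab_size

class LineDataset(Dataset):
    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, text = self.samples[idx]
        pixel_values = processor(
            Image.open(path).convert("RGB"), return_tensors="pt"
        ).pixel_values.squeeze(0)
        labels = processor.tokenizer(
            text, padding="max_length", max_length=64, truncation=True
        ).input_ids
        # Mask padding tokens so the loss ignores them:
        labels = [t if t != processor.tokenizer.pad_token_id else -100 for t in labels]
        return {"pixel_values": pixel_values, "labels": torch.tensor(labels)}

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).train()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loader = DataLoader(LineDataset(train_samples), batch_size=4, shuffle=True)

for epoch in range(3):
    for batch in loader:
        loss = model(
            pixel_values=batch["pixel_values"].to(device),
            labels=batch["labels"].to(device),
        ).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

A batch size of 4 with the base checkpoint should fit on most consumer cards; drop it to 1 or 2 for the large checkpoint.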
Transkribus has a new model architecture around the corner, and the results look impressive: not only for trivial cases like plain text, but also for table structures and layout analysis.
Best of all, you can train it on your own corpus of text to support obscure languages and handwriting systems.
Really looking forward to it.
"The comparison between handwriting and typing reveals important differences in their neural and cognitive impacts. Handwriting activates a broader network of brain regions involved in motor, sensory, and cognitive processing, contributing to deeper learning, enhanced memory retention, and more effective engagement with written material. Typing, while more efficient and automated, engages fewer neural circuits, resulting in more passive cognitive engagement. These findings suggest that despite the advantages of typing in terms of speed and convenience, handwriting remains an important tool for learning and memory retention, particularly in educational contexts."
https://pmc.ncbi.nlm.nih.gov/articles/PMC11943480/
You are literally handicapping yourself by not thinking with pen and paper, or keeping paper notes.
The future is handwriting with painless digitization for searchability, until we invent a better input device for text that leverages our motor-memory facilities in the brain.
Which is exactly my experience with handwriting through my school years. When handwriting notes during lectures, all focus goes to getting the words down, and it becomes impossible to actually focus on the meaning behind them.
Hopefully next generations will feel the same about legal contracts, law in general, and Java code bases. They're incomprehensible not because of fonts but because of unfathomable complexity.
Not a chance, sorry.
Ideally something that I can train with my own handwriting. I had a look at Tesseract, wondering if there’s anything better out there.
Historical handwriting: Gemini 3 is the only one that gave a decent result on 19th-century minutes from a town court in Northern Norway (Danish Gothic handwriting with bleed-through). I'm not 100% sure it's correct, but that's because it's so dang hard to read that I can't verify it. At least I see it gets many names, dates, and locations right.
I've been waiting a long time for this.
Please share. I am out of the loop and my searches have not pointed me to the state of the art, which has seen major steps forward in the past 3 or 4 years but most of it seems to be closed or attached to larger AI products.
Is it even still called OCR?
Personally I found magistral-small-2509 to be the most accurate overall, but it completely fails on some samples, while qwen3-vl-30b doesn't struggle at all with those same samples. So it seems the training data is really uneven depending on what exactly you're trying to OCR.
And the trade-off, of course, is that these are LLMs, so they're not exactly lightweight or fast on consumer hardware, but at least with the approach of using multiple models you greatly increase the accuracy.
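As a concrete version of that multi-model approach: run the same scan through both models and flag disagreements for manual review. The sketch below assumes both models are served behind a local OpenAI-compatible endpoint (as llama.cpp-style servers provide); the URL, prompt, and file name are illustrative.

```python
# Sketch: transcribe one scan with two local VLMs and compare the outputs.
import base64
from openai import OpenAI

# Assumes a local OpenAI-compatible server; the URL is a placeholder.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

def transcribe(model: str, image_path: str) -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this handwriting exactly."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

a = transcribe("magistral-small-2509", "scan.png")
b = transcribe("qwen3-vl-30b", "scan.png")
if a.strip() != b.strip():
    print("Models disagree; route this sample for manual review.")
```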
Am I nuts or is this wrong, not “perfect”?
It doesn’t look crossed out at all to me in the image, just some bleeding?
Still very impressive, of course
Whenever any progress is made, this is the logical conclusion. And yet those who decide how your time is used have an opposing view.
I actually think that will be the case. We're designing society for the technology, not the technology for the people in it. The human brain wasn't built to fit whatever gap is left by AI, regardless of how many words the technologists spew to claim otherwise.
For instance: AI is already undermining education by enabling mental laziness in students (why learn the material when ChatGPT can do your homework for you?). The current argument seems to be that AI will replace entry-level roles but leave space for experienced and skilled people (while blocking the path to get there). Some of the things LLMs do a mediocre but often acceptable job at are exactly the things one needs to do to build and hone higher-level skills.
With GPS we have seen people confidently drive past "road closed" signs, around barriers, and off bridges.
With self-driving technology, we have seen them defeat safeguards so they can sit in the back while the car accelerates up to 70 in a subdivision.
So I'm not completely disagreeing with you, but I'm not too pessimistic either. We will adapt, and benefit through the adoption of AI, even though some things will probably be lost, too.
“What doesn’t kill you, makes you stronger”. We will adapt and benefit, or we will not — time will tell.
I do not think the executive class is actually sold on the power of AI to increase productivity, but rather on its power to increase reliance.
IMO, cyber security, for example, will have to become a government mandate with real penalties for non-compliance (like seat belts in cars were mandated) in order to force organizations to slow down, and make sure systems are built carefully and as correctly as possible to protect data.
This is in conflict with the hurtling pace of garbage in/garbage out AI generated stuff we see today.
Things may well reach a point where, elsewhere in the world, finding out that some software is for sale in the European Union is itself a marker of quality, and therefore justifies some premium.
Software providers are also likely to specify narrow "fit for purpose" statements and short(ish) support windows. If costs go up too much, people will use "inappropriate" and/or EOL stuff because the "right thing" is too expensive.
To be clear, this is a step in the right direction, but it is not a panacea.
When you sit down to think about it, what does it really even mean to do "more research"? What concrete phenomenon are you observing to decide what that is?
Across the journey from "subsistence agriculture," there have been countless approaches to nurturing innovation and discovery, but reducing it to a game measured by papers published and citations received is extremely novel, and so far it seems to correlate more with waste and noise than with discovery. Science and research are not in a healthy period these days, and the model that you describe, and seem to take for granted or may even be celebrating, plays a big role in why.
The UN estimates that around 500 million households or 2 billion people are still subsistence farmers. In 2025.
Fat lot of good competition has done them, especially when they don’t have enough surplus to participate in a market economy to begin with.
Exactly. Some people forget we live in a capitalist society, which does not prioritize or support the contentment of the masses. We exist to work for the owners or starve, they're not going to pay us to enjoy ourselves.
> ...those who decide about how your time is being used...
which stops individuals from:
> [spending] more time thinking, writing, playing piano, and taking walks — with other people.
Which it seems you would agree with. I don't see where they asserted whether this was a problem to address.
So no, no retirees or students or unemployed or disabled in that figure.
Because all the trends seem to indicate that to make a living people are working longer hours, holding multiple concurrent jobs (eg https://gameofjobs.org/are-americans-now-more-likely-than-ev...), and holding off retirement.
Now that the service economy is turning into the sharing economy, I think the only thing we are sharing is the greater profits, and they are taking the lion's share.
I once visited a high school where they had a wall of signatures from every graduating senior going back to the 1920s or so. The "personality" evident in the signatures showed a steady decline, from very stylish in the oldest ones to mostly just poorly printed names in the 2020s.
Ah, maybe I'll pick up Qin seal when I retire, if I retire.
Yet it occurs to me that "guess and check" is exactly what I'm doing when trying to read my 6-year-old's writing. Often I will do a pass to detect the main sounds, but then I start thinking about what was current in his thoughts and see if I can make a match. Not surprisingly, I often do.
Then again, getting this result from a heavily-generalized SOTA model is pretty incredible too.
We almost solved OCR 20 years ago. Then we spent 20 years on the last percentage. We see the same in self-driving cars.
They're very good at it.
Same here, for diaries/journals written in mixed Swedish/English/Spanish and with absolutely terrible handwriting.
I'd love for the day when the writing is on the wall for handwriting recognition, which is something I bet on when I started my journals, but it seems that day has yet to come. I'm eager to get there, though, so I can archive all of it!
When does a character model become a language model?
If you're looking at block text with no connections between letter forms, each character mostly stands on its own. Except capital letters are much more likely at the beginning of a word or sentence than elsewhere, so you probably get a performance boost if you incorporate that.
Now we're considering two-character chunks. Cursive script connects the letterforms, and the connection changes based on both the source and target. We can definitely get a performance boost from looking at those.
Hmm you know these two-letter groupings aren't random. "ng" is much more likely if we just saw an "i". Maybe we need to take that into account.
Hmm actually whole words are related to each other! I can make a pretty good guess at what word that four-letter-wide smudge is if I can figure out the word before and after...
and now it's an LLM.
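To make the middle steps concrete, here's a toy sketch of fusing per-character visual scores with a bigram prior. Every probability below is made up for illustration: the visual model can't tell "n" from "m", and the prior breaks the tie.

```python
# Toy decoder: combine (hypothetical) per-character visual scores
# with a bigram character prior and pick the best reading.
import math
from itertools import product

# Visual model's per-position character probabilities (made up):
visual = [
    {"i": 0.6, "l": 0.4},   # smudged first letter
    {"n": 0.5, "m": 0.5},   # second letter is a coin flip visually
    {"g": 0.7, "q": 0.3},
]

# Bigram prior P(next | prev), also made up:
bigram = {
    ("i", "n"): 0.8, ("i", "m"): 0.2,
    ("l", "n"): 0.3, ("l", "m"): 0.7,
    ("n", "g"): 0.9, ("n", "q"): 0.1,
    ("m", "g"): 0.4, ("m", "q"): 0.6,
}

def score(seq):
    """Log-probability of a reading under visual evidence plus the prior."""
    s = sum(math.log(visual[i][c]) for i, c in enumerate(seq))
    s += sum(math.log(bigram[pair]) for pair in zip(seq, seq[1:]))
    return s

candidates = ["".join(chars) for chars in product("il", "nm", "gq")]
best = max(candidates, key=score)
print(best)  # "ing": the prior rescues the visually ambiguous middle letter
```

Swap the bigram table for word-level context, then sentence-level context, and the "prior" half of this decoder grows into exactly the language model the parent describes.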