The Visual Word Form Area (VWFA), in the left fusiform gyrus, is where the visual form of written words is transformed into representations that are meaningful to the organism.
https://en.wikipedia.org/wiki/Visual_word_form_area
DeepSeek-OCR's optical encoding of text (rendering it as an image and compressing it into vision tokens, rather than tokenizing it as plain text) appears analogous to what occurs in the VWFA.
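To make the contrast concrete, here's a toy sketch of the two encoding paths. This is my own illustration, not the paper's actual pipeline: the patch size, the compression factor, and the characters-per-token rule of thumb are all assumptions.

```python
# Toy sketch (mine, not from the paper) of text-token vs. optical encoding.
# Assumes Pillow is installed; every number here is illustrative.
from PIL import Image, ImageDraw

text = "The quick brown fox jumps over the lazy dog. " * 50

# Path A: ordinary text tokenization, ~4 characters per BPE token
# (a common rule of thumb, not a measured figure).
approx_text_tokens = len(text) // 4

# Path B: render the same text onto a canvas, the way an OCR-style
# vision encoder would see it...
img = Image.new("RGB", (1024, 1024), "white")
ImageDraw.Draw(img).text((10, 10), text, fill="black")

# ...split the canvas into ViT-style patches, then assume the encoder
# compresses those patches into a much smaller set of latent vision
# tokens (the DeepSeek-OCR paper reports high OCR fidelity at roughly
# ten text tokens' worth of content per vision token).
patch_size = 16
raw_patches = (img.width // patch_size) * (img.height // patch_size)
latent_vision_tokens = raw_patches // 16  # hypothetical compression stage

print(f"approx. text tokens:  {approx_text_tokens}")
print(f"raw image patches:    {raw_patches}")
print(f"latent vision tokens: {latent_vision_tokens}")
```

The point of the sketch is just the shape of the idea: the model reads a picture of the text, not a string of subword IDs, and ends up with fewer tokens than the text itself would cost.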
This approach may not only prove more powerful than text-token LLMs; it may also lift the curtain of ignorance that has stymied our understanding of how language works, and therefore of how we think and what intelligence actually is.
Kudos to the authors: Haoran Wei, Yaofeng Sun, and Yukun Li - you may have tripped over the Rosetta Stone of intelligence itself! Bravo!
bigyabai•2h ago
Why do people write posts like this without providing an example of extraordinary outputs? The theory matters 100x less to me than the benchmark performance compared to SOTA.