60% Fable cost cut by converting code to images and having the model OCR it

https://github.com/teamchong/pxpipe
44•dimitropoulos•2h ago

Comments

dimitropoulos•2h ago
there's also a DeepSeek whitepaper on this technique https://www.seangoedecke.com/text-tokens-as-image-tokens
genxy•1h ago
This seems like a pricing hack that burns resources. When the loophole gets closed, won't the price of OCR have to rise?
ricardobeat•1h ago
It’s not a loophole, it just happens that encoding information as optical tokens is much more efficient than text.
guardiangod•40m ago
Truly a picture is worth a thousand words.
TZubiri•32m ago
Of course it isn't

Text encoding uses 8 bits per character on average, and tokenization compresses that further.

A bitmap font glyph takes 25 bits at 5x5 pixels, and most fonts are 12 pixels high.

Of course it isn't efficient. This is a pricing inefficiency and a hack to exploit it (even the author describes it as an exploit).
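
The arithmetic in this comment can be checked with a short sketch. It counts raw, uncompressed bits only, which is the comment's own framing; it says nothing about how a model actually tokenizes text or images. The `glyph_bits` helper and its 1-bit-per-pixel default are assumptions made for illustration.

```python
# Back-of-envelope comparison of raw storage cost per character.
# The 8-bit and 5x5 figures are the ones quoted in the comment; nothing is measured.

TEXT_BITS_PER_CHAR = 8  # one byte per character, as the comment assumes


def glyph_bits(width_px: int, height_px: int, bits_per_pixel: int = 1) -> int:
    """Raw bits needed to store one glyph as an uncompressed bitmap."""
    return width_px * height_px * bits_per_pixel


small_glyph = glyph_bits(5, 5)               # the 5x5 font from the comment
ratio = small_glyph / TEXT_BITS_PER_CHAR     # bitmap bits per text bit

print(small_glyph)  # 25
print(ratio)        # 3.125
```

By this raw measure a 5x5 glyph costs about three times as much as an 8-bit character, which is the point the comment is making.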

legel•10m ago
You are wrong.

Text tokens are high-dimensional vectors, not 8 bits per character. Every token has a deep embedding, e.g. 1024 float values per text token.

DeepSeek-OCR proved 10x+ compression from visual embedding of text, which was a groundbreaking result. [1]

Very cool to see OP's project hacking on this principle. It's still not lossless, as noted in the GitHub README, but it is a promising research direction.

[1] https://github.com/deepseek-ai/DeepSeek-OCR/blob/main/DeepSe...
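
A rough sketch of the two figures in the comment above: the size of one token embedding, and the token count after roughly 10x optical compression. The 32-bit float width, the exact 10x ratio, and both function names are assumptions for illustration; real models differ in embedding dimension and precision.

```python
import math


def embedding_bits(dim: int = 1024, bits_per_value: int = 32) -> int:
    """Bits occupied by one token embedding (assumes 32-bit floats)."""
    return dim * bits_per_value


def vision_tokens_needed(text_tokens: int, compression: float = 10.0) -> int:
    """Vision tokens needed to carry the same text at an assumed compression ratio."""
    return math.ceil(text_tokens / compression)


print(embedding_bits())             # 32768
print(vision_tokens_needed(1000))   # 100
```

Seen this way, the unit being compared is the number of embeddings the model has to process, not the number of bits in the source text.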

geor9e•6m ago
If it's not intuitive why 8-bit characters compress better than 8x8 pixel squares, then step back and think about it another way. Ask which scenario is more likely:

Some random person discovered a 60% across the board gain in all LLMs, using an extremely simple trick that none of the labs noticed in all these years of multi-trillion dollar growth

or

Anthropic's marketing team might not have priced images on par with text in their rush to drive growth via money losing offerings

samrus•48m ago
Not really. They aren't actually using more resources this way either. This might be a fundamental inefficiency that's being removed.

It kinda makes sense too. While people do read code word by word, we often "glance over" it and do rough pattern recognition to work out what it does, only homing in on something when we need to answer a specific question. I think humans kinda naturally do this exploit anyway.

aabhay•1h ago
Ahhh my eyes the vibe coded readme
mpalmer•34m ago
What, you don't like your caveats to be honest?
lpellis•48m ago
I tried the same thing last year (with OpenAI models). Back then it did reduce prompt tokens, but you needed far more completion tokens, so it was ultimately more expensive (and slower): https://pagewatch.ai/blog/post/llm-text-as-image-tokens/
aabhay•34m ago
In Gemini at least, if you look at how they process PDFs, they do an OCR and then feed the text + image to the model, without charging you for the text tokens (I believe).

So my guess is that Claude’s backend is doing the same — so this hack is probably more of a loophole in token accounting that might get closed if Claude is doing what Gemini does

dippogriff•33m ago
I want to see more text-free foundation models
puppycodes•26m ago
That is hilarious and an amazing find.
__hugues•7m ago
seems really dumb and like it would need to violate basic information theory to work?

input tokens are cheaper than output tokens. seems like it would maybe reduce input tokens at the expense of many more output tokens if you're actually triggering OCR via thinking?
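
The trade-off described here can be put in numbers. This is a minimal sketch with hypothetical prices (3 and 15 units per token for input and output) and hypothetical token counts; none of them come from the project or from any provider's price list.

```python
# Hypothetical prices in arbitrary units per token; not any provider's real rates.
INPUT_PRICE = 3
OUTPUT_PRICE = 15


def request_cost(input_tokens: int, output_tokens: int) -> int:
    """Total cost of one request under the assumed prices."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE


def break_even_extra_output(saved_input_tokens: int) -> float:
    """Extra output tokens that would cancel out a given input saving."""
    return saved_input_tokens * INPUT_PRICE / OUTPUT_PRICE


baseline = request_cost(100_000, 1_000)   # plain-text prompt
as_image = request_cost(40_000, 1_000)    # 60% fewer input tokens, same output
limit = break_even_extra_output(100_000 - 40_000)

print(baseline)  # 315000
print(as_image)  # 135000
print(limit)     # 12000.0
```

Under these assumed prices, the image route stays cheaper only while it adds fewer than 12,000 output tokens.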

himata4113•5m ago
Related: https://blog.can.ac/2026/06/10/snapcompact/

Claude, please stop trying to memorize random crap

https://12gramsofcarbon.com/p/agentics-memorizing-session-transcripts
95•theahura•2h ago•72 comments

The Life and Times of Maxis, Part 1: SimEverything

https://www.filfre.net/2026/07/the-life-and-times-of-maxis-part-1-simeverything/
51•doppp•2h ago•0 comments

Half-Baked Product

https://weli.dev/blog/half-baked-product/
1004•weli•9h ago•298 comments

Jamesob's guide to running SOTA LLMs locally

https://github.com/jamesob/local-llm
81•livestyle•3h ago•33 comments

International chess federation sanctions Kramnik

https://www.fide.com/fide-ethics-disciplinary-commission-issues-a-decision-in-case-involving-gm-v...
25•DarkContinent•1h ago•10 comments

Factories Are Just Rooms

https://interconnected.org/home/2026/07/03/factories
67•arbesman•2h ago•23 comments

Hunting a 16-year-old SQLite WAL bug with TLA+

https://ubuntu.com/blog/hunting-a-16-year-old-sqlite-bug-with-tla-is-dqlite-affected
75•peterparker204•3d ago•2 comments

PostgreSQL and the OOM Killer: Why We Use Strict Memory Overcommit

https://www.ubicloud.com/blog/postgresql-and-the-oom-killer-why-we-use-strict-memory-overcommit
103•furkansahin•5h ago•32 comments

My Dad Helped Build North America's Oat Supply Chain: Can It Be Remade?

https://ambrook.com/offrange/perspective/how-we-lost-our-oats
39•surprisetalk•3d ago•4 comments

Valve open source the Steam Machine e-ink screen so you can make your own

https://www.gamingonlinux.com/2026/07/valve-open-source-the-steam-machine-e-ink-screen-so-you-can...
357•ahlCVA•5h ago•57 comments

The Fall and Rise of Screwworm

https://www.construction-physics.com/p/the-fall-and-rise-of-screwworm
75•crescit_eundo•5h ago•26 comments

Best Simple System for Now

https://dannorth.net/blog/best-simple-system-for-now/
39•daan-k•2h ago•7 comments

Wordgard: The new in-browser rich-text editor from the creator of ProseMirror

https://wordgard.net/
175•indy•9h ago•71 comments

America, 1926: What a Forgotten 100-Year-Old Report Says About Who We Are

https://www.derekthompson.org/p/america-1926-an-absurdly-deep-dive
78•momentmaker•2h ago•77 comments

Right to Local Intelligence

https://righttointelligence.org/
443•thoughtpeddler•18h ago•156 comments

Supersonic flight returning to US after half-century ban

https://www.forbes.com/sites/suzannerowankelleher/2026/06/30/faa-supersonic-flight-no-boom/
103•lobbly•2d ago•104 comments

CarPlay Is Additive

https://www.caseyliss.com/2026/7/2/carplay-is-additive-you-dolts
504•sprawl_•17h ago•645 comments

Give Smart People the Tools to Do Smart Things

https://superuserdone.com/posts/2026-07-03-give-smart-people-the-tools/
68•SuperUserDone•3h ago•51 comments

Anatomy of Persistent Memory's 3 Layers: Comparing ContextNest, Mem0 and Zep

https://promptowl.ai/resources/persistent-memory-ai-agents/
17•sparkystacey•3h ago•0 comments

Show HN: Mcpsnoop – Wireshark for MCP (transparent proxy and live TUI)

https://github.com/kerlenton/mcpsnoop
3•kerlenton•1h ago•1 comment

60% Fable cost cut by converting code to images and having the model OCR it

https://github.com/teamchong/pxpipe
46•dimitropoulos•2h ago•16 comments

US residents angry datacenters 'shoved down our throats' are recalling officials

https://www.theguardian.com/us-news/2026/jul/03/datacenter-recall-elections
45•beardyw•1h ago•20 comments

The Safari MCP server for web developers

https://webkit.org/blog/18136/introducing-the-safari-mcp-server-for-web-developers/
220•coloneltcb•16h ago•63 comments

How working with a blind client revealed invisible accessibility gaps

https://iinteractive.com/resources/blog/read-only
76•fortyseven•3d ago•59 comments

crustc: entirety of `rustc`, translated to C

https://github.com/FractalFir/crustc
360•Philpax•19h ago•81 comments

Commodore 64 Basic for PostgreSQL

https://thombrown.blogspot.com/2026/07/load-plcbmbasic81-commodore-64-basic.html
52•hans_castorp•8h ago•8 comments

Markets are competitive if and only if P != NP

https://arxiv.org/abs/2602.20415
179•kscarlet•2h ago•115 comments

Reality has a surprising amount of detail (2017)

https://johnsalvatier.org/blog/2017/reality-has-a-surprising-amount-of-detail
347•vinhnx•5d ago•131 comments

Quake in 13 Kilobytes (2021)

https://js13kgames.com/games/q1k3
125•mortenjorck•6d ago•18 comments

Program-as-Weights: A Programming Paradigm for Fuzzy Functions

https://arxiv.org/abs/2607.02512
28•simonpure•5h ago•4 comments