frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Tested OpenAI's prompt caching across models. Found undocumented behavior

3•harsharanga•1h ago
Been building an AI agent from scratch to understand token economics. Spent a week on prompt caching. Found something interesting that isn't in OpenAI's docs. Setup: Network device monitoring chatbot, 10 tools, ~1,400 token prefix. Tested gpt-4o-mini, gpt-5-mini, gpt-5. Logged cached_tokens from every response.

Finding 1: Caching works as documented Once prefix exceeds 1024 tokens, OpenAI caches it automatically. I saw 80-90% cache hit rates after the first call. Cost reduction of 47-49% on input tokens. Cache discount is 50% for 4o-mini, 90% for gpt-5 family.

Finding 2: Tool schema tokenization is heavily compressed Added 4 tools to my existing 6. Expected +400-500 tokens based on JSON size. Actual increase: 56 tokens. OpenAI is clearly doing aggressive compression on function schemas.

Finding 3: Cache is shared across model generations (undocumented) This is the interesting part. Test: Call gpt-4o-mini first (cold start). Wait 5 seconds. Call gpt-5-mini with identical prefix. Result: gpt-5-mini got a cache hit on its first call. Tested all permutations. Every time, model 2 and 3 hit cache from model 1's warmup. The prefix-processing cache is shared across 4o-mini, 5-mini, and 5. I couldn't find this documented anywhere.

Why it matters: If you have many cold starts (separate user sessions, different contexts), you can warm cache with the cheapest model. Example - 1,000 cold starts/day, 10K token prefix, primary model gpt-5: Without cross-model warming: Each session pays 10K tokens at $1.25/1M = $0.0125 Daily: $12.50, Annual: $4,562 With nano warming first: 10K tokens at $0.05/1M = $0.0005 per warmup Daily: $0.50, Annual: $182 Savings: $4,380/year At gpt-5-pro pricing ($15/1M), difference is $54K+/year on warmup costs alone.

Technical note: This is prefix-processing cache sharing, not KV-cache sharing. Models share tokenization and prefix hashing, not attention states. But billing-wise, cached tokens are cached tokens.

Reproduction: Create 1024+ token prefix. Call model A, log cached_tokens. Call model B with same prefix. Check if B's first call shows cached tokens. Field is in response.usage.prompt_tokens_details.cached_tokens. Happy to share test scripts.

Firm pioneers 3D printing copper coolers directly onto processors

https://www.tomshardware.com/3d-printing/firm-pioneers-3d-printing-copper-coolers-directly-onto-p...
1•Teever•1m ago•0 comments

Join the Parasite Rebellion on T-day

https://usop.substack.com/
1•richardatlarge•2m ago•0 comments

Ask HN: Why do people say LLMs create bad code "quality"?

2•chaidhat•3m ago•0 comments

Comparing Obelisk with DBOS

https://obeli.sk/blog/comparing-dbos-part-1/
1•todsacerdoti•5m ago•0 comments

The Context Tax: Why AI-Assisted Coding Fails Without Flow

https://arif.sh/book
1•Arifcodes•8m ago•0 comments

Training Foundation Models on a Full-Stack AMD Platform

https://arxiv.org/abs/2511.17127
2•srameshc•9m ago•0 comments

Age of "Don't do it yourself"

https://blog.rybarix.com/2025/11/26/age-of-dont-diy.html
3•sandruso•13m ago•1 comments

Anomalous electronic state opens pathway to room-temperature superconductivity

https://phys.org/news/2025-11-anomalous-electronic-state-pathway-room.html
1•rbanffy•13m ago•0 comments

Reminder that HN Active exists and is arguably better

https://news.ycombinator.com/active
3•loteck•14m ago•1 comments

What's Hiding Inside Haribo's Power Bank and Headphones?

https://www.lumafield.com/first-article/posts/whats-hiding-inside-haribos-power-bank-and-headphones
1•rozenmd•14m ago•0 comments

Show HN: MXP – A2A-compatible agent protocol, 37x faster than JSON

1•ferasawady•15m ago•0 comments

China completes first emergency mission to Tiangong space station

https://www.reuters.com/business/media-telecom/china-launch-shenzhou-22-spaceship-0411-gmt-state-...
1•Teever•16m ago•0 comments

France to bring in form of military service

https://www.bbc.co.uk/news/articles/c0edw7g7z79o
1•AIBytes•18m ago•0 comments

Z-Image, free online image generator

https://zimage.net
1•BruceWok•20m ago•0 comments

Cooldown Myths for Runners

https://therundownbytherunningeffect.substack.com/p/cooldowns-are-overrated
1•RalphHavensPT•22m ago•1 comments

Google says hackers stole data from 200 companies following Gainsight breach

https://techcrunch.com/2025/11/21/google-says-hackers-stole-data-from-200-companies-following-gai...
1•SilverElfin•22m ago•1 comments

Blender facial animation tool. What else should it do?

https://github.com/shun126/livelinkface_arkit_receiver/wiki
1•happy-game-dev•23m ago•0 comments

Walrus – distributed message streaming in Rust

4•janicerk•24m ago•0 comments

The Last Programming Language, and the End of (A Bit of) History

https://davegriffith.substack.com/p/the-last-programming-language-and
1•dxs•30m ago•0 comments

When Life Gets Too Easy

https://woodypearson.substack.com/p/when-life-gets-too-easy
1•heywoods•33m ago•0 comments

Show HN: Save Trippy – A Thanksgiving Game

https://www.savetrippy.com/
4•nezaj•33m ago•2 comments

Build Your Ideas with Gemini

https://app.new
1•tzury•33m ago•0 comments

Show HN: The Participatory Interface Theory

1•bobsh•35m ago•0 comments

Tesla CEO Elon Musk admits tough realization about FSD

https://www.thestreet.com/automotive/tesla-ceo-elon-musk-admits-tough-realization-about-fsd
2•gochuks•37m ago•0 comments

Show HN: A1 – Local Sandbox and JIT Compiler for AI Agents

https://github.com/stanford-mast/a1
1•calebhwin•38m ago•1 comments

Enterprise security can be messy: Building a Security-Aware Culture

2•rezliant•38m ago•2 comments

Math Skill for Claude Code

https://github.com/ananddtyagi/claude-code-marketplace/tree/main/plugins/math
1•ananddtyagi•41m ago•1 comments

The Input Stack on Linux: An End-to-End Architecture Overview

https://venam.net/blog/unix/2025/11/27/input_devices_linux.html
4•venamresm__•42m ago•0 comments

Israel proposes Kiryat Tivon for Nvidia's multibillion-$ tech campus in North

https://www.timesofisrael.com/israel-proposes-kiryat-tivon-for-nvidias-multibillion-dollar-tech-c...
3•thenaturalist•43m ago•1 comments

Asahi Investigation Results and Future Measures on Cyberattack Data Exposure

https://www.asahigroup-holdings.com/en/newsroom/detail/20251127-0204.html
1•ChrisArchitect•48m ago•0 comments