frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Tested OpenAI's prompt caching across models. Found undocumented behavior

5•harsharanga•2mo ago
Been building an AI agent from scratch to understand token economics. Spent a week on prompt caching. Found something interesting that isn't in OpenAI's docs. Setup: Network device monitoring chatbot, 10 tools, ~1,400 token prefix. Tested gpt-4o-mini, gpt-5-mini, gpt-5. Logged cached_tokens from every response.

Finding 1: Caching works as documented Once prefix exceeds 1024 tokens, OpenAI caches it automatically. I saw 80-90% cache hit rates after the first call. Cost reduction of 47-49% on input tokens. Cache discount is 50% for 4o-mini, 90% for gpt-5 family.

Finding 2: Tool schema tokenization is heavily compressed Added 4 tools to my existing 6. Expected +400-500 tokens based on JSON size. Actual increase: 56 tokens. OpenAI is clearly doing aggressive compression on function schemas.

Finding 3: Cache is shared across model generations (undocumented) This is the interesting part. Test: Call gpt-4o-mini first (cold start). Wait 5 seconds. Call gpt-5-mini with identical prefix. Result: gpt-5-mini got a cache hit on its first call. Tested all permutations. Every time, model 2 and 3 hit cache from model 1's warmup. The prefix-processing cache is shared across 4o-mini, 5-mini, and 5. I couldn't find this documented anywhere.

Why it matters: If you have many cold starts (separate user sessions, different contexts), you can warm cache with the cheapest model. Example - 1,000 cold starts/day, 10K token prefix, primary model gpt-5: Without cross-model warming: Each session pays 10K tokens at $1.25/1M = $0.0125 Daily: $12.50, Annual: $4,562 With nano warming first: 10K tokens at $0.05/1M = $0.0005 per warmup Daily: $0.50, Annual: $182 Savings: $4,380/year At gpt-5-pro pricing ($15/1M), difference is $54K+/year on warmup costs alone.

Technical note: This is prefix-processing cache sharing, not KV-cache sharing. Models share tokenization and prefix hashing, not attention states. But billing-wise, cached tokens are cached tokens.

Reproduction: Create 1024+ token prefix. Call model A, log cached_tokens. Call model B with same prefix. Check if B's first call shows cached tokens. Field is in response.usage.prompt_tokens_details.cached_tokens. Happy to share test scripts.

The AI Talent War Is for Plumbers and Electricians

https://www.wired.com/story/why-there-arent-enough-electricians-and-plumbers-to-build-ai-data-cen...
1•geox•1m ago•0 comments

Show HN: MimiClaw, OpenClaw(Clawdbot)on $5 Chips

https://github.com/memovai/mimiclaw
1•ssslvky1•1m ago•0 comments

I Maintain My Blog in the Age of Agents

https://www.jerpint.io/blog/2026-02-07-how-i-maintain-my-blog-in-the-age-of-agents/
1•jerpint•2m ago•0 comments

The Fall of the Nerds

https://www.noahpinion.blog/p/the-fall-of-the-nerds
1•otoolep•3m ago•0 comments

I'm 15 and built a free tool for reading Greek/Latin texts. Would love feedback

https://the-lexicon-project.netlify.app/
1•breadwithjam•6m ago•1 comments

How close is AI to taking my job?

https://epoch.ai/gradient-updates/how-close-is-ai-to-taking-my-job
1•cjbarber•7m ago•0 comments

You are the reason I am not reviewing this PR

https://github.com/NixOS/nixpkgs/pull/479442
2•midzer•8m ago•1 comments

Show HN: FamilyMemories.video – Turn static old photos into 5s AI videos

https://familymemories.video
1•tareq_•10m ago•0 comments

How Meta Made Linux a Planet-Scale Load Balancer

https://softwarefrontier.substack.com/p/how-meta-turned-the-linux-kernel
1•CortexFlow•10m ago•0 comments

A Turing Test for AI Coding

https://t-cadet.github.io/programming-wisdom/#2026-02-06-a-turing-test-for-ai-coding
2•phi-system•10m ago•0 comments

How to Identify and Eliminate Unused AWS Resources

https://medium.com/@vkelk/how-to-identify-and-eliminate-unused-aws-resources-b0e2040b4de8
2•vkelk•11m ago•0 comments

A2CDVI – HDMI output from from the Apple IIc's digital video output connector

https://github.com/MrTechGadget/A2C_DVI_SMD
2•mmoogle•12m ago•0 comments

CLI for Common Playwright Actions

https://github.com/microsoft/playwright-cli
3•saikatsg•13m ago•0 comments

Would you use an e-commerce platform that shares transaction fees with users?

https://moondala.one/
1•HamoodBahzar•14m ago•1 comments

Show HN: SafeClaw – a way to manage multiple Claude Code instances in containers

https://github.com/ykdojo/safeclaw
2•ykdojo•17m ago•0 comments

The Future of the Global Open-Source AI Ecosystem: From DeepSeek to AI+

https://huggingface.co/blog/huggingface/one-year-since-the-deepseek-moment-blog-3
3•gmays•18m ago•0 comments

The Evolution of the Interface

https://www.asktog.com/columns/038MacUITrends.html
2•dhruv3006•19m ago•1 comments

Azure: Virtual network routing appliance overview

https://learn.microsoft.com/en-us/azure/virtual-network/virtual-network-routing-appliance-overview
2•mariuz•20m ago•0 comments

Seedance2 – multi-shot AI video generation

https://www.genstory.app/story-template/seedance2-ai-story-generator
2•RyanMu•23m ago•1 comments

Πfs – The Data-Free Filesystem

https://github.com/philipl/pifs
2•ravenical•26m ago•0 comments

Go-busybox: A sandboxable port of busybox for AI agents

https://github.com/rcarmo/go-busybox
3•rcarmo•27m ago•0 comments

Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery [pdf]

https://research.nvidia.com/labs/nemotron/files/NVFP4-QAD-Report.pdf
2•gmays•28m ago•0 comments

xAI Merger Poses Bigger Threat to OpenAI, Anthropic

https://www.bloomberg.com/news/newsletters/2026-02-03/musk-s-xai-merger-poses-bigger-threat-to-op...
2•andsoitis•28m ago•0 comments

Atlas Airborne (Boston Dynamics and RAI Institute) [video]

https://www.youtube.com/watch?v=UNorxwlZlFk
2•lysace•29m ago•0 comments

Zen Tools

http://postmake.io/zen-list
2•Malfunction92•32m ago•0 comments

Is the Detachment in the Room? – Agents, Cruelty, and Empathy

https://hailey.at/posts/3mear2n7v3k2r
2•carnevalem•32m ago•1 comments

The purpose of Continuous Integration is to fail

https://blog.nix-ci.com/post/2026-02-05_the-purpose-of-ci-is-to-fail
1•zdw•34m ago•0 comments

Apfelstrudel: Live coding music environment with AI agent chat

https://github.com/rcarmo/apfelstrudel
2•rcarmo•35m ago•0 comments

What Is Stoicism?

https://stoacentral.com/guides/what-is-stoicism
3•0xmattf•36m ago•0 comments

What happens when a neighborhood is built around a farm

https://grist.org/cities/what-happens-when-a-neighborhood-is-built-around-a-farm/
1•Brajeshwar•36m ago•0 comments