HTTP Caching, a Refresher

https://danburzo.ro/http-caching-refresher/
180•danburzo•1mo ago

Comments

baggy_trough•1mo ago
A lot of this seems irrelevant these days with https everywhere.
nerdbaggy•1mo ago
These are still used in CDN and internal browser caching
jarofgreen•1mo ago
Some of it is different, but the basics are still the same and still relevant. Just today I've been working with some of this.

I took a Django app that's behind an Apache server and added cache-control and vary headers using Django view decorators, and added Header directives to some static files that Apache was serving. This had 2 effects:

* Meant I could add mod_cache to the Apache server and have common pages cached and served directly from Apache instead of going back to Django. Load testing with vegeta ( https://github.com/tsenart/vegeta ) shows the server can now handle several times as much simultaneous traffic as it could before.

* Meant users' browsers now cache all the CSS/JS. As users move between HTML pages, the browser now often makes only 1 request. Good for snappier page loads with less server load.

But yeah, updating the sections on public vs private caches, especially with regard to HTTPS, would be good.

esseph•1mo ago
Just the opposite, caching is everywhere now. How do you think a CDN works?
tekno45•1mo ago
how is https making caching irrelevant?
pat2man•1mo ago
At one point, when everything was plain HTTP, your ISP could run its own cache, large corporate IT networks could have a cache, etc., which was very efficient for caching but horrible for privacy. Now we have CDN edge caching and so on, but nothing like the multi-layer caching that was available with HTTP.
afiori•1mo ago
That sounds like it is one expiration bug away from debugging hell
QuantumNomad_•1mo ago
It is not uncommon for enterprises to intercept HTTPS for inspection and logging. They may or may not also do caching of responses at the point where HTTPS is intercepted.

I previously experimented a bit with Squid Cache on my home network for web archival purposes, and set it up to intercept HTTPS. I then added the TLS certificate to the trust store on my client, and was able to intercept and cache HTTPS responses.

In the end, Squid Cache was a little bit inflexible in terms of making sure that the browsed data would be stored forever as was my goal.

This Christmas I have been playing with using mitmproxy instead. I previously used mitmproxy for some debugging, and found out now that I might be able to use it for archival by adding a custom extension written in Python.

It’s working well so far. I browse HTTPS pages in Firefox and I persist URLs and timestamps in SQLite and write out request and response headers plus response body to disk.

My main focus at the moment is archiving some video courses that I paid for in the past, so that even if the site I bought the courses from ceases operation, I will still have those video courses. After I finish archiving the video courses, I will proceed to archiving other digital things I’ve bought like VST plugins, sample packs, 3D assets etc.

And after that I will give another shot at archiving all the random pages on the open web that I’ve bookmarked etc.

For me, archiving things by using an intercepting proxy is the best way. I have various manually organised copies of files from all over the place, both paid stuff and openly accessible things. But having a sort of Internet Archive of my own with all of the associated pages where I bought things and all the JS and CSS and images surrounding things is the dream. And at the moment it seems to be working pretty well with this mitmproxy + custom Python extension setup.

I am also aware of various existing web scrapers and internet archival systems for self hosting and have tried a few of them. But for me the system I am doing is the ideal.

mariusor•1mo ago
If you implement any of the ends of a HTTP communication caching is still very important.

This website is chock full of site operators raging mad at web crawlers created by people that didn't bother to implement proper caching mechanisms.

rfmoz•1mo ago
CDNs manage user TLS certificates and that is one of the advantages of using them.

A node server could negotiate HTTPS close to the user, do caching stuff, and create another HTTPS connection to your local server (or reuse an existing one).

Https everywhere with your CDN in middle.

gaigalas•1mo ago
Can you elaborate on what is the reasoning here?
cryptonector•1mo ago
Besides MITM proxies, server-side proxies can also do caching. Thus applications should use the Vary: header.
Joker_vD•1mo ago
As is traditional with most explanations of HTTP caching, it doesn't mention Vary header. Although apparently some CDNs (e.g. Cloudflare) straight up ignore it for some reason [0].

[0] https://news.ycombinator.com/item?id=38346382

paulddraper•1mo ago
Vary is Very important.

> the cache MUST NOT use that stored response without revalidation unless all the presented request header fields nominated by that Vary field value match those fields in the original request

You’ll find that some have creative readings of MUST NOT.

danburzo•1mo ago
Good call! Honestly I just wanted to wrap it up before the holidays, but you’re right that a small section on Vary would have been useful.

Things like non-conforming caching services made me punt actual suggestions to a later article, as I wasn’t sure how my sense of the RFC interacted with the real world. HTTP Caching Tests seems like a great resource for this, but only includes Fastly out of the big providers, and it seems to be doing okay with Vary. https://cache-tests.fyi/

danburzo•1mo ago
Updated the article with some information on the `Vary` and `No-Vary-Search` headers. I’ve left out the details of how revalidation works with `Vary` since I haven’t been able to reconcile yet what the spec seems to encourage vs what the tests on cache-tests.fyi suggest is conformant behavior.
JimDabell•1mo ago
There was a recent discussion on X about this that had a couple of Cloudflare people chip in, including their CTO:

https://xcancel.com/simonw/status/1988984600346128664

lucideer•1mo ago
The highlight from that thread https://xcancel.com/dok2001/status/1989005141450846470#m
bmandale•1mo ago
I would say "vary" is the wrong way to solve that problem. The issue is that there can easily be a bunch of stupid inconsequential differences between accept headers, far beyond simply asking for type x versus type y. Slightly different priorities, order, including an extra mime in the list, putting some irrelevant format nobody uses first just in case, etc.

An optimal solution would involve the response listing which alternate content types that endpoint can return, and the cache taking the Accept header into account: if it sees a type from the alternates list ranked higher in the Accept header's priority than whatever it has cached, it forwards the request to the server. Once it has all the alternatives cached, it serves them according to the Accept header without hitting the server.

The closest existing header to the above would be the Link header, if you give it rel=alternate and the MIME type as its type. It's not clear what href you would use, since it usually points to a different document, but we want the same URL with a different MIME type. So clearly this would be an abuse of the header, but it could work.

Joker_vD•1mo ago
That's tangentially related to the Vary header. Not only Accept can go into its value, you know.

And an optimal solution IMHO would be for the origin server to simply return 302 to a specific resource, selected upon the value of the Accept header:

    GET /thumb.php?id=kekw HTTP/1.1
    Accept: image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8

    HTTP/1.1 302 Found
    Location: /media/thumb.jpg?id=kekw
    Vary: Accept

    GET /media/thumb.jpg?id=kekw HTTP/1.1

    HTTP/1.1 200 OK
    Content-Type: image/jpeg
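The origin side of that exchange can be sketched in Python. Assumptions not taken from the comment: the function and parameter names are hypothetical, the server ranks AVIF over WebP over a JPEG fallback by its own preference, `image_id` is already URL-safe, and wildcards such as `image/*` are not treated as acceptance of the newer formats.

```python
# Server's own preference order among formats it may have rendered (assumption).
PREFERRED = [("image/avif", "avif"), ("image/webp", "webp")]


def accepted_types(accept: str) -> set[str]:
    """Media types the client lists with q > 0 (wildcards kept as-is)."""
    types = set()
    for item in accept.split(","):
        media_type, *params = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media_type and q > 0:
            types.add(media_type.lower())
    return types


def thumb_redirect(image_id: str, accept: str, available: set[str]) -> tuple[int, dict[str, str]]:
    """302 to a format-specific URL; only this response carries Vary: Accept."""
    types = accepted_types(accept)
    ext = "jpg"  # baseline every client can display
    for media_type, candidate in PREFERRED:
        if media_type in types and candidate in available:
            ext = candidate
            break
    return 302, {"Location": f"/media/thumb.{ext}?id={image_id}", "Vary": "Accept"}
```

Only the small 302 is keyed on `Accept`; each redirect target has a single representation, so caches can store it under its URL alone.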
bmandale•1mo ago
Sure, except I doubt most people want to uglify all their URLs with extensions for occasional alternates. Plus, if the URL with the extension gets passed around instead of the original (as would inevitably happen), you're back to square one.

I had thought about recommending that people just use an alternate link as intended, to point to an alternate format. I think that would work best using existing web standards as intended, but it has the downside of initially serving the original format regardless of the content type.

Joker_vD•1mo ago
> if the url with the extension gets passed around instead of the original (as would inevitably be done) you're back to square one.

Why? It has no "Vary" header, and it's the one that's supposed to get cached anyhow.

bmandale•1mo ago
If people see it in the URL bar and copy-paste it from there, or, in the case of images, if they "copy image URL".
aleksandrm•1mo ago
This is nothing new and doesn't add anything to the topic, so am I the only one who thinks this is just an attempt at boosting their SEO through HN?
danburzo•1mo ago
I’m sorry you didn’t get anything out of it. I wasn’t operating at the edge of caching knowledge, just a person refreshing and clarifying for themselves how caching works. Some things were new to me, and after spending so much time with the RFC, I just thought others may benefit or, more selfishly, would point out errors or ways to make it better.

I mean, do those <meta> tags really suggest someone who’s into SEO? Call me stale but what I really want is validation :-)

masklinn•1mo ago
It clearly notes that it's "a refresher", does not claim that it's novel research, and extensively links to the reference documents. It is, essentially, a review article (https://en.wikipedia.org/wiki/Review_article). And there's absolutely nothing wrong with that.

Hell, the author could probably have called it a primer and I think it'd have been fair.

loloquwowndueo•1mo ago
Dunno man, sometimes I write blog posts for my own benefit, to document my knowledge and understanding of something. I could put it in a private note, but I can also put it in my blog and who knows, maybe someone else can benefit from it - even if it’s nothing you couldn’t google research yourself or god forbid, ask an LLM to summarize for you.

No need to be mean and assume the worst possible purpose :)

danburzo•1mo ago
As many have pointed out here, the nature of caching has changed in the current climate of ubiquitous HTTPS, and I want to add a paragraph or two about it. Is there a good summary somewhere that I could reference? What are the usual, most prevalent uses of HTTP intermediaries involving caches, besides CDNs and origin-controlled caches (eg Varnish)?
wyuenho•1mo ago
HN is full of noobs loudly proclaiming as true things they don't actually know these days. Ubiquitous HTTPS does not change the nature of private browser caches, and it only nullifies the proxy-related cache headers if the origin encrypts traffic all the way to the client, which is quite rare in real life, unless we are merely talking about a dude serving this blog from his basement computer.

In general, your answer depends on where TLS terminates. In most situations a CDN or a reverse proxy is involved, and the TLS cert you use to encrypt traffic from the origin to the proxy is different from the one the proxy uses to encrypt traffic from it to the browser. Whenever a MITM intermediary is involved, you should read the intermediary's documentation; these usually include Cloudflare, AWS CloudFront, Akamai etc. With a few exceptions, like the Vary header as pointed out elsewhere, these vendors largely follow HTTP caching semantics for proxy caches.

danburzo•1mo ago
Thanks! I’ve updated the introduction with some ‘now vs then’ pointers.
wbadart•1mo ago
Great write up!

Wanted to highlight MDN's HTTP caching guide[0] that OP links in the conclusion. It's written at a higher level than the underlying reference material and has been a great resource I've turned to several times in the last few years.

[0]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/Cac...

KronisLV•1mo ago
I found that Cache-Control with no-cache worked pretty well EXCEPT Apache2 would fail to return 304 when also compressing some of the resources: https://stackoverflow.com/questions/896974/apache-is-not-sen...

I think setting FileETag None solved it. With that setup, the browser won't use stale JS/CSS/whatever bundles, instead always validating them against the server, but when the browser already has the correct asset downloaded earlier, it will get a 304 and avoid downloading a lot of stuff. Pretty simple and works well for low traffic setups.

It was surprisingly easy to mess up, for example by ending up with out-of-date versions of your translation bundles cached in the browser.

(nothing against other web servers, Apache2 was just a good fit for other reasons)

nesarkvechnep•1mo ago
After 10+ years in the industry, I can safely say that almost nobody knows or cares about HTTP caching. It’s sad.
