It's "common" to lower a TTL in preparation for a change to an existing RR, but you need to make sure you lower it at least as long as the current TTL prior to the change. Keeping the TTL low after the change isn't beneficial unless you're planning for the possibility of reverting the change.
A low TTL on a brand-new record will not speed propagation. A resolver either has the new record cached or it doesn't. If it's cached, the TTL doesn't matter because the record has already propagated to it. If it isn't cached, the resolver doesn't know the TTL yet, so it makes no difference whether it's 1 second or 1 month.
> such that all online caches get updated
There's no such thing. Apart from millions of dedicated caching servers, each end device has its own cache. You can't invalidate DNS entries at that scope.
And a similar version of the same blog post appeared on a personal blog in 2019: https://news.ycombinator.com/item?id=21436448 (thanks to ChrisArchitect for noting this in the only comment on a 2024 copy).
Maybe this weekend I'll finally get the energy up to just do it.
Of course, as internet speeds increase and resources become cheaper to abuse, people lose sight of the downstream impacts of impatience and poor planning.
In the HTTP/1.1 (1997) or HTTP/2 era, the TCP connection is made once and then stays open (Connection: Keep-Alive) for multiple requests. This greatly reduces the number of DNS lookups per HTTP request.
If the web server is configured for a sufficiently long Keep-Alive idle period, then this period is far more relevant than a short DNS TTL.
If the server dies or disconnects in the middle of a Keep-Alive, the client/browser will open a new connection, and at this point, a short DNS TTL can make sense.
(I have not investigated how this works with QUIC HTTP/3 over UDP: how often does the client/browser do a DNS lookup? But my suspicion is that it also does a DNS query only on the initial connection and then sends UDP packets to the same resolved IP address for the life of that connection, and so it behaves exactly like the TCP Keep-Alive case.)
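As a rough illustration of the point about connection reuse, the sketch below (Python's standard http.client, with example.com as a stand-in host) sends several requests over a single TCP connection; the DNS lookup happens once, when the socket is first opened, no matter what the record's TTL is:

```python
import http.client

# One TCP connection -- and therefore one DNS lookup -- serves many requests,
# as long as the server keeps the connection open (HTTP/1.1 keep-alive).
conn = http.client.HTTPConnection("example.com", 80)  # stand-in host
for _ in range(3):
    conn.request("GET", "/", headers={"Connection": "keep-alive"})
    resp = conn.getresponse()
    resp.read()  # drain the body so the connection can be reused
    print(resp.status)
conn.close()
```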
> patched an Encrypted DNS Server to store the original TTL of a response, defined as the minimum TTL of its records, for each incoming query
The article seems to be based on capturing live DNS data from a real network. While it may be true that persistent connections reduce the impact of a low TTL, the article certainly seems to be accounting for that, unless their network is only using HTTP/1.0 for some reason. I agree that a low TTL could help during an outage if you actually wanted to move your workload somewhere else, and I didn't see that mentioned in the article, but I've never actually seen it done in my experience; setting the TTL extremely low for some sort of extreme DR scenario smells like an anti-pattern to me.
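For reference, the quoted definition ("the minimum TTL of its records") is easy to reproduce from the client side. A minimal sketch, assuming the third-party dnspython package and an arbitrary hostname; note that this observes the TTL as served by the local resolver, not the original authoritative TTL that the article's patched server captures:

```python
import dns.resolver  # third-party: pip install dnspython

answer = dns.resolver.resolve("example.com", "A")  # arbitrary example name
# "TTL of a response, defined as the minimum TTL of its records"
min_ttl = min(rrset.ttl for rrset in answer.response.answer)
print("minimum TTL across the answer section:", min_ttl)
```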
Consider the counterpoint: a high TTL can keep your service reachable if the DNS server crashes or loses connectivity.
If you set your TTL to an hour, it raises the costs of DNS issues a lot: A problem that you fix immediately turns into an hour-long downtime. A problem that you don't fix on the first attempt and have to iteratively try multiple fixes turns into an hour-per-iteration downtime.
Setting a low TTL is an extra packet and round-trip per connection; that's too cheap to meter [1].
When I first started administering servers I set TTL high to try to be a good netizen. Then after several instances of having to wait a long time for DNS to update, I started setting TTL low. Theoretically it causes more friction and resource usage but in practice it really hasn't been noticeable to me.
[1] For the vast majority of companies / applications. I wouldn't be surprised to learn someone somewhere has some "weird" application where high TTL is critical to their functionality or unit economics but I would be very surprised if such applications were relevant to more than 5% of websites.
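For a rough sense of what that extra round-trip costs, here's a minimal sketch that times lookups through the system resolver (the hostname is an arbitrary choice); the first call may go out over the network, while repeats are typically answered by a nearby cache much faster:

```python
import socket
import time

HOST = "example.com"  # arbitrary example; substitute a domain you care about

def lookup_ms(host: str) -> float:
    """Time one address lookup via the system resolver."""
    start = time.perf_counter()
    socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    return (time.perf_counter() - start) * 1000

for attempt in range(3):
    print(f"lookup {attempt + 1}: {lookup_ms(HOST):.1f} ms")
```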
When you run a website that receives new POSTed information every 60 seconds, you sure do. ;)
c45y•1w ago
Relatively simple inside a network range you control, but I have no idea how that works across different networks in geographically redundant setups.
deceptionatd•1w ago
Seems like you'd be trying to work against the basic design principles of Internet routing at that point.
toast0•1w ago
And then if you're dealing with browsers, they're not the best at trying every address, or they may wait a long time before trying another host if the first is unresponsive. For browsers and rotations that really do change, I like a 60-second TTL. If the rotation is pretty stable, 15 minutes most of the time, cranked down before intentional changes.
If you've got a smart client that will get all the answers, and reasonably try them, then 5-60 minutes seems reasonable, depending on how often you make big changes.
All that said, some caches will keep your records basically forever, and there's not much you can do about that. Just gotta live with it.
deceptionatd•1w ago
And a BGP failure is a good example too. It doesn't matter how resilient the failover mechanisms for one IP are if the routing tables are wrong.
Agreed about some providers enforcing a larger one, though. DNS propagation is wildly inconsistent.
Bender•1w ago
[1] - https://en.wikipedia.org/wiki/Anycast
kevincox•1w ago
Failover is different and more of a concern, especially if the client doesn't respect multiple returned IPs.
mannyv•6d ago
Now there are multiple kinds of HA, so we'll go over a bunch of them here.
Case 1: You have one host (host A) on the internet, and another server somewhere (host B) that's a mirror but with a different IP. When host A dies you update DNS so clients can still connect, except now they connect to host B. In that case a client will not reach the new IP until its DNS resolver gets the new record. This was "failover" back in the day, and it depends on the DNS TTL (and on the resolver, because many resolvers and caches ignore the TTL and use their own).
In this case a high TTL is bad, because the user won't be able to connect to your site for TTL seconds + some other amount of time. This is how everyone learned it worked, because this is the way it worked when the inter webs were new.
Case 2: Instead of one DNS record with one host, you have a DNS record with both hosts. Clients will theoretically choose one host or the other (round robin). In reality it's unclear whether they actually do that; anecdotal evidence shows it worked until it didn't, usually during a demo to the CEO. But even if it does work, when one host dies roughly 50% of your requests will hit an X-second timeout as clients try to connect to the dead host. That's bad, which is why nobody in their right mind did it. And some clients always pick the first host, because that's how DNS clients are sometimes.
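For contrast, a "smart client" of the kind mentioned upthread would walk through every address in the DNS answer instead of giving up on the first dead one. A minimal sketch, with example.com and the 2-second timeout as arbitrary choices:

```python
import socket

HOST, PORT = "example.com", 443  # arbitrary service, ideally with several A records
TIMEOUT = 2.0  # seconds to wait on one address before moving to the next

def connect_any(host: str, port: int) -> socket.socket:
    """Try every address returned by DNS, not just the first."""
    last_err = None
    for *_, sockaddr in socket.getaddrinfo(host, port, type=socket.SOCK_STREAM):
        try:
            return socket.create_connection(sockaddr[:2], timeout=TIMEOUT)
        except OSError as err:
            last_err = err  # dead or unreachable host: fall through to the next record
    raise last_err or OSError("no addresses returned")

sock = connect_any(HOST, PORT)
print("connected to", sock.getpeername())
sock.close()
```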
Putting a load balancer in front of your hosts solves this. Do load balancers die? Yeah, they do. So you need two load balancers...which brings you back to case 1.
These are the basic scenarios that a low DNS TTL fixes. There are other, more complicated solutions, but they're really specialized and require more control of the network infrastructure...which most people don't have.
This isn't an "urban legend" as the author states. These are hard-won lessons from the early days of the internet. You can also not have high availability, which is totally fine.