Having only one SSL certificate is a single point of failure, and we have eliminated single points of failure almost everywhere else.
Edit: but to be clear, I don’t understand why you’d want this. If you’re worried about your CA going offline, you should shorten your renewal period instead.
Update: looks like the answer is yes. So then the issue is people not taking advantage of this technique.
Both Apache (SSLCertificateFile) and nginx (ssl_certificate) allow for multiple files, though they cannot be of the same algorithm: you can have one RSA, one ECC, etc, but not (say) an ECC and another ECC. (This may be a limitation of OpenSSL.)
So if the RSA expires on Feb 1, you can have the ECC expire on Feb 14 or Mar 1.
But for human persons and personal websites HTTP+HTTPS fixes this easily and completely. You get the best of both worlds. Fragile short lifetime pseudo-privacy if you want it (HTTPS) and long term stable access no matter what via HTTP. HTTPS-only does more harm than good. HTTP+HTTPS is far better than either alone.
The main lesson we took from this was: you absolutely need monitoring for cert expiration, with an alert when (valid_to - now) becomes less than the typical refresh window.
It's easy to forget this, especially when it's not strictly part of your app, but essential nonetheless.
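As a rough illustration of that rule, here is a minimal Python sketch of the (valid_to - now) check. The 30-day `REFRESH_WINDOW` and the function name `needs_alert` are assumptions made for the example, not something from the incident; set the window to however long before expiry your renewal normally runs.

```python
from datetime import datetime, timedelta, timezone

# Assumed value for illustration: how long before expiry renewal normally happens.
REFRESH_WINDOW = timedelta(days=30)


def needs_alert(valid_to: datetime, now: datetime,
                refresh_window: timedelta = REFRESH_WINDOW) -> bool:
    """Return True when (valid_to - now) is less than the refresh window.

    An already expired certificate gives a negative difference, so it
    also triggers the alert.
    """
    return (valid_to - now) < refresh_window


if __name__ == "__main__":
    # Hypothetical dates: 15 days of validity left against a 30-day window.
    valid_to = datetime(2025, 3, 1, tzinfo=timezone.utc)
    now = datetime(2025, 2, 14, tzinfo=timezone.utc)
    print(needs_alert(valid_to, now))
```

The point of checking the served certificate's remaining lifetime, rather than whether the renewal job reported success, is that it still fires when the renewal job fails silently.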
I have a simple Python script that runs every day and checks the certificates of multiple sites.
One time this script signaled that a cert was close to expiring even though I saw a newer cert in my browser. It turned out that I had accidentally launched another reverse proxy instance which was stuck on the old cert. Requests were randomly passed to either instance. The script helped me correct this mistake before it caused issues.
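For anyone who wants something similar, below is a sketch of what such a daily check could look like using only the standard library. It is not my actual script: the function names and the 14-day threshold are made up for the example. It connects to every address the hostname resolves to, so a stale instance on a separate address shows up. If several instances sit behind one address, as with a load balancer, you would need repeated connections instead, which this sketch does not do.

```python
import socket
import ssl
import time


def seconds_left(cert: dict, now: float) -> float:
    """Seconds until notAfter, given a dict as returned by getpeercert()."""
    return ssl.cert_time_to_seconds(cert["notAfter"]) - now


def resolve_addresses(host: str, port: int = 443) -> list:
    """Return every distinct IP address the hostname resolves to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def fetch_cert(address: str, host: str, port: int = 443,
               timeout: float = 5.0) -> dict:
    """Connect to one specific address and return the certificate it serves."""
    context = ssl.create_default_context()
    with socket.create_connection((address, port), timeout=timeout) as sock:
        # server_hostname sets SNI and is the name the certificate is verified against.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def check_host(host: str, min_days: float = 14.0) -> list:
    """Return a list of problem descriptions for one hostname (empty if fine)."""
    problems = []
    try:
        addresses = resolve_addresses(host)
    except OSError as exc:
        return [f"{host}: cannot resolve ({exc})"]
    for address in addresses:
        try:
            cert = fetch_cert(address, host)
        except OSError as exc:
            # Covers connection errors and TLS failures, including an
            # already expired certificate (ssl.SSLError subclasses OSError).
            problems.append(f"{host} via {address}: {exc}")
            continue
        days = seconds_left(cert, time.time()) / 86400
        if days < min_days:
            problems.append(f"{host} via {address}: {days:.1f} days left")
    return problems
```

Calling `check_host("example.com")` for each site from a daily cron job and mailing any non-empty result is enough for a handful of personal sites.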
> There’s no natural signal back to the operators that the SSL certificate is getting close to expiry.
There is. The "Not After" date is right there in the certificate itself. Just look at it with openssl x509 -text and set yourself up some alerts… it’s so frustrating having to refute such random bs every time when talking to clients because some guy on the internet has no idea but blogs about their own inefficiencies.
Furthermore, their autorenew should have been failing loud and clear, everyone should know from metrics or logs… but nobody noticed anything.
It is not about encryption (for that, a self-signed certificate lasting till 2035 would suffice) but about verification: who am I talking with? Reaching the right server can be messed up by DNS or routing, among other things. Yes, that adds complexity, but we are talking more about trust than technology.
And once you recognize that it is essential to have a trusted service, give it the proper instrumentation to ensure that it works properly, including monitoring, expiration alerts, and documentation, rather than just saying "it works" and dismissing it.
May we retitle the post as "The dangers of not understanding SSL Certificates"?
You can update your cert to prepare for it by appending -----NEW CERT----- to the same file as -----OLD CERT-----.
But you also need to know where all your certificates are located. We were using Venafi for the auto discovery and email notifications. Prometheus ssl_exporter with Grafana integration and email alerts works the same. The problem is knowing where all the hosts, containers and systems that have certs are located. A simple nmap-style scan of all endpoints can help. But you might also have containers with certs, or you might have certs baked into VM images. Sure, there are all sorts of things like storing the cert in a CI/CD global variable, bind mounting secrets, Vault Secret Injector, etc.
But it’s all rooted in maintaining a valid, up-to-date TLS inventory. And that’s hard. As the article states: “There’s no natural signal back to the operators that the SSL certificate is getting close to expiry. To make things worse, there’s no staging of the change that triggers the expiration, because the change is time, and time marches on for everyone. You can’t set the SSL certificate expiration so it kicks in at different times for different cohorts of users.”
Every time this happens you play whack-a-mole with a change. You get better at it, but not before you lose some credibility.
loloquwowndueo•1h ago
A certificate renewal process has several points at which failure can be detected and action taken, and it sounds like this team was relying only on a “failed to renew” alert/monitor.
A broken alerting system is also mentioned: it “didn’t alert for whatever reason”.
If this certificate is so critical, they should also have something that alerts if they’re still serving a certificate with less than 2 weeks of validity. By that time they should have already obtained and rotated in a new certificate. This gives plenty of time for someone to manually inspect and fix.
Sounds like a case of the “nothing in this automated process can fail, so we only need this one trivial monitor which also can’t fail so meh” attitude.
yearolinuxdsktp•1h ago