It's downdetectorsdown all the way down.
https://downdetectorsdowndetectorsdowndetectorsdowndetectors...
https://datatracker.ietf.org/doc/html/rfc1035
Also I think I triggered a nice error log in domaintools just now. https://whois.domaintools.com/downdetectorsdowndetectorsdown...
From there, the "who's watching who?" can become mathematically interesting.
Looks like it's hosted in London?
Since down detectors serve to detect failures of centralized (and decentralized) systems, the idea would be to at least get that part right: a distributed system to detect outages.
You basically run detectors that heartbeat each other. Just a few suffice.
Once you start to see clusters of detectors go silent, you can assume things are falling apart, which is fine so long as a few remain.
Self-healing also helps make the web of nodes resilient to inevitable infrastructure failures.
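Something like this, as a very rough sketch (the peer URLs and the /heartbeat endpoint are made up, and a real mesh would gossip results around instead of just logging locally):

    // Each node runs this sweep against its peers on an interval.
    const PEERS = [
      "https://node-a.example",
      "https://node-b.example",
      "https://node-c.example",
    ];
    const TIMEOUT_MS = 5_000;
    const SILENT_FRACTION = 0.5; // alarm once half the mesh stops answering

    async function ping(peer) {
      try {
        const res = await fetch(`${peer}/heartbeat`, {
          signal: AbortSignal.timeout(TIMEOUT_MS),
        });
        return res.ok;
      } catch {
        return false; // timeout, DNS failure, refused connection, ...
      }
    }

    async function sweep() {
      const results = await Promise.all(PEERS.map(ping));
      const silent = results.filter((ok) => !ok).length;
      if (silent / PEERS.length >= SILENT_FRACTION) {
        console.warn(`cluster degraded: ${silent}/${PEERS.length} peers silent`);
      }
    }

    setInterval(sweep, 60_000); // a few nodes doing this to each other suffice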
Jokes aside, as far as I can tell, https://downdetectorsdowndetector.com/ is NOT using Cloudflare CDN/Proxy
https://downdetectorsdowndetector.com/ is NOT using Cloudflare SSL
However, Selesti reports it uses Cloudflare DNS?
https://checkforcloudflare.selesti.com/?q=https://downdetect...
https://downdetectorsdowndetector.com/ is using Cloudflare DNS!
Checked 8 global locations, found DNS entries for Cloudflare in 3
Found in: England, Russia, USA
Not found in: China, Denmark, Germany, Spain, Netherlands
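If you'd rather check yourself than trust the Selesti page, here's a quick Node.js sketch; matching nameservers against *.ns.cloudflare.com is my own heuristic, not necessarily what their checker does:

    // Look up the domain's NS records and see whether they point at Cloudflare.
    import { promises as dns } from "node:dns";

    async function usesCloudflareDns(domain) {
      const nameservers = await dns.resolveNs(domain); // e.g. ["ada.ns.cloudflare.com", ...]
      return nameservers.some((ns) => ns.toLowerCase().endsWith(".ns.cloudflare.com"));
    }

    usesCloudflareDns("downdetectorsdowndetector.com")
      .then((yes) => console.log(yes ? "Cloudflare DNS" : "not Cloudflare DNS"))
      .catch((err) => console.error("lookup failed:", err.message));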
Cloudflare → Bunny.net
AWS → Hetzner
Business email → Infomaniak
Not a single client site has experienced downtime, and it feels great to finally decouple from U.S. services.
Ah yes, the place for RabbitMQ endpoints.
For the hobby crowd it's a shame; for a corporation it's still cheaper than AWS, with the extra bonus of not having any ties to the US.
Hetzner provides a much simpler set of services than AWS. Less complexity to go wrong.
A lot of people want the brand recognition too. It's also become the standard way of doing things and is part of the business culture. I have sometimes been told it's unprofessional or looks bad to run things yourself instead of using a managed service.
This sounds like a good thing.
It does mean that you get fewer services: you have to do more sysadmin internally or use other providers for the rest, which a lot of people are very reluctant to do.
S3 is something of an exception, but it does not tie you down (nearly everyone provides S3-compatible object storage now, and you can keep using S3 even if everything else is hosted elsewhere). It works for me for storing lots of large files that are not accessed very often (so egress fees stay low).
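The portability point is mostly a one-line config change: any S3-compatible provider just swaps in its own endpoint. A rough sketch with the AWS SDK for JavaScript, where the endpoint, bucket and credentials are all placeholders:

    // Same S3 API, non-AWS endpoint; each provider documents its own values.
    import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

    const s3 = new S3Client({
      region: "eu-central-1",                          // often ignored by S3 clones
      endpoint: "https://objects.example-provider.eu", // the only real change vs. AWS
      forcePathStyle: true,                            // many S3-compatible stores prefer path-style URLs
      credentials: { accessKeyId: "KEY", secretAccessKey: "SECRET" },
    });

    await s3.send(new PutObjectCommand({
      Bucket: "backups",
      Key: "2025/archive.tar.gz",
      Body: "hello from somewhere that is not AWS",
    }));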
However, I would say that the effect of this outage on customer retention will be (relatively) smaller than it would be for a smaller CDN.
Cloudflare just caused a massive internet outage costing millions of dollars worldwide, in part due to a very sloppy mistake that definitely ought to have been prevented (using Rust's “unwrap” in production). Let's see how many customers they lose because of that, and then we'll see how big their incentives are. (If you look at the evolution of their share price, it doesn't look like the incident terrified their shareholders, at least…)
That's an incredibly bad take lol.
There are times where "The Cloud" makes sense, sure. But in my experience the majority of the time companies over-use the cloud. On Prem is GOOD. It's cheaper, arguably more secure if you configure it right (a challenge, I know, but hear me out) and gives you data sovereignty.
I don't think companies quite realize how bad it would be if, e.g., AWS was hacked.
Any data you have in the cloud is no longer your data. Not really. It's Amazon's, Microsoft's, Apple's, whoever's.
I don't think they'd care. Companies only care about one thing: stock price. Everything rolls up into that. If AWS got hacked and said company was affected by it, it wouldn't be a big deal because they'd be one of many and they'd be lost in the crowd. Any hit to their stock/profits would be minimal and easily forgotten about.
Now, if they were on prem or hosted with Bob's Cloud and got hacked? Different story altogether.
It's rarely affected in any case. Take a look at the CrowdStrike price chart (or revenue, or profits). I think most people (including investors) just take it for granted that systems are unreliable and regard it as something you live with.
But it has since recovered. According to the news, they lost very few customers over the incident. That is why their stock came back. If they had kept having problems, I doubt it would have been so rosy. So yes, to your point, a blip here or there happens.
Not to mention the familiarity of the company, its services, and expectations. You can hire people with experience with AWS, Azure, or GCP, but the more niche you go, the higher the chance that some people you hire won't know how to work with those systems and their nuances. That's fine, they can learn as they work, but it adds to ramp-up time and could lead to inadvertent mistakes.
Smaller providers tend to have simpler systems, so it only adds to ramp-up time if you hire someone who only knows AWS or whatever. Simpler also means fewer mistakes.
If you stick to a simple set of services (e.g. VPS or containers + object storage) there are very few service specific nuances.
Hard disagree. A smaller provider will think twice about whether to use a Tier I data center versus a Tier IV data center, because the cost difference is substantial and in many cases prohibitive.
Are smaller scale services more reliable? I think that's too simple a question to be relevant. Sometimes yes, sometimes no, but we know one thing for sure - when smaller services go down the impact radius is contained. When a corrupt MBA who wants to pump short term metrics for a bonus gains power, the damage they can do is similarly contained. All risk factors are boxed in like this. With a hyperscale business, things are capable of going much more wrong for many more people, and the recursive nature of vertical+horizontal integration causes a calamity engine that can be hard to correct.
Take the financial sector in '08. Huge monoliths that had integrated every kind of financial service with every other kind of financial service. Few points of failure, every failure mode exposed to every other failure mode.
There's a reason asymmetric warfare is hard for both parties - cellular networks of small units that can act independently are extremely fault tolerant and robust against changing conditions. Giants, when they fall, do so in spectacular fashion.
If AWS goes down, no one will blame you for your web store being down as pretty much every other online service will be seeing major disruptions.
But when your super small provider goes down, it's now your problem and you better have some answers ready for your manager. And you'll still be affected by the AWS outage anyways as you probably rely on an API that runs on their cloud!
It's a "feature" right up there with planned obsolescence and garbage culture (the culture of throw-away).
The real problem is not having a fail-over provider. Modern software is so abstracted (tens, hundreds, even thousands of layers), and yet we still make the mistake of depending on one or two layers to make things "go".
When your one small provider goes down, no problem, switch over to your other provider. Then laugh at the people who are experiencing AWS downtime...
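At the application layer that can be as dumb as trying providers in order. A minimal sketch assuming two interchangeable endpoints (both URLs invented); real setups usually fail over at the DNS or load-balancer level instead:

    // Try the primary, fall back to the secondary; only give up if both fail.
    const PROVIDERS = [
      "https://api.primary-provider.example",
      "https://api.backup-provider.example",
    ];

    async function fetchWithFailover(path, options = {}) {
      let lastError;
      for (const base of PROVIDERS) {
        try {
          const res = await fetch(base + path, {
            ...options,
            signal: AbortSignal.timeout(3_000),
          });
          if (res.ok) return res;
          lastError = new Error(`${base} answered ${res.status}`);
        } catch (err) {
          lastError = err; // down or timed out, try the next provider
        }
      }
      throw lastError; // both providers failed; now it really is your problem
    }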
> Then laugh at the people who are experiencing AWS downtime...
Let's not stroke our egos too much here, mkay?
I disagree because, conversely, outages at larger providers cause millions or maybe even billions of dollars in losses for their customers. Those customers might be more "stuck" in their current provider's proprietary schemes, but losses like these will cause them to move away, or at least diversify cloud providers. In turn, this will cause income losses for the cloud provider.
First I used an EX101 with an i9-13900. Within a week it just froze. It could not be reset remotely. Nothing in kern.log. Support offered no solution beyond a hard reboot, and no mention of what might be wrong other than user error.
A few months later, one of the drives just dropped out of the RAID by itself. It took support an hour to respond, and they said they found no issue, so it must be my fault.
Then I changed to a Ryzen-based server and it also mysteriously had problems like this. Again, support blamed the user.
It was only after I cancelled the server, several months later, that I saw this, so I know it isn't just me.
https://docs.hetzner.com/robot/dedicated-server/general-info...
Am I missing something or is bunny.net not actually a replacement for that?
MailPace data is also hosted in the EU only
You can use whatever infrastructure you want for whatever reason, but you may not have an accurate picture of the availability.
This may be true over a long enough timeframe, but GP stated that their clients had experienced no downtime since switching at the start of the year.
That is clearly better than both AWS and Cloudflare during that time.
I don't use Cloudflare for anything, so no comment there.
Valid. I should have made it clear that I meant "clearly better from GP's perspective."
That's the least useful information.
What matters for his service availability is what he should expect going forward. What matters for reviewing his decision making process is what he should have expected at the time of choosing service providers.
Note that I'm not saying Hetzner is bad, just that incidents happen in Europe too. The server didn't have a lot of issues like this over the years.
I wonder, though, where it is hosted? DigitalOcean? :)
As the Web becomes more and more entangled, I don't know if there is any guarantee of what is really independent. We should make a diagram of this. Hopefully no cyclic dependencies there yet.
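A toy version of that diagram, with invented edges, plus a depth-first check for cycles:

    // Hypothetical provider dependency map; every edge here is made up.
    const deps = {
      "status-page": ["cdn"],
      "cdn": ["dns"],
      "dns": ["status-page"], // uh oh
    };

    // Depth-first search that returns the first cycle it finds, or null.
    function findCycle(graph) {
      const visiting = new Set();
      const done = new Set();

      function dfs(node, path) {
        if (visiting.has(node)) return [...path, node]; // back-edge: cycle
        if (done.has(node)) return null;
        visiting.add(node);
        for (const dep of graph[node] ?? []) {
          const cycle = dfs(dep, [...path, node]);
          if (cycle) return cycle;
        }
        visiting.delete(node);
        done.add(node);
        return null;
      }

      for (const node of Object.keys(graph)) {
        const cycle = dfs(node, []);
        if (cycle) return cycle;
      }
      return null;
    }

    console.log(findCycle(deps)); // ["status-page", "cdn", "dns", "status-page"]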
but who detects the down detector detecting the down detector detecting the down detector
Maybe distributed down detection?
I know there are people here perfectly capable of running with that idea and we might just see a distributed down detector announced on HN :)
Arbites.
Downdetector was indeed down during the cf outage, but I think the index page was still returning 200 (although I didn't check).
Running a headless browser to take a screenshot to check would probably get you blocked by cf...
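For what it's worth, the two checks would look roughly like this (a Puppeteer sketch; whether Cloudflare lets a headless browser through is another matter):

    // A 200 from the index page doesn't mean the site actually works,
    // so compare a plain status check with a rendered screenshot.
    import puppeteer from "puppeteer";

    const res = await fetch("https://downdetector.com/");
    console.log("plain HTTP status:", res.status); // can be 200 even when the page is broken

    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto("https://downdetector.com/", { waitUntil: "networkidle2" });
    await page.screenshot({ path: "downdetector.png", fullPage: true });
    await browser.close();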
script.js calls `fetchStatus()`, which calls `generateMockStatus()` to get the statuses, which just makes up random response times:
    // ---- generate deterministic mock data for the current 3-min window ----
    function generateMockStatus() {
      const bucket = getCurrentBucket();
      const rng = createRng(bucket);
      // "Virtual now" = middle of this 3-minute bucket
      const virtualNowMs = bucket * BUCKET_MS + BUCKET_MS / 2;
      // Checked a few minutes ago (2–5 min, plus random seconds)
      const minutesOffset = randomInt(rng, 2, 5);
      const secondsOffset = randomInt(rng, 0, 59);
      const checkedAtMs =
        virtualNowMs - minutesOffset * 60_000 - secondsOffset * 1000;
      const checkedAtDate = new Date(checkedAtMs);
      return {
        checkedAt: checkedAtDate.toISOString(),
        target: "https://downdetector.com/",
        regions: [
          {
            name: "London, UK",
            status: "up",
            httpStatus: 200,
            responseTimeMs: randomInt(rng, 250, 550),
            error: null
          },
          {
            name: "Auckland, NZ",
            status: "up",
            httpStatus: 200,
            responseTimeMs: randomInt(rng, 300, 650),
            error: null
          },
          {
            name: "New York, US",
            status: "up",
            httpStatus: 200,
            responseTimeMs: randomInt(rng, 380, 800),
            error: null
          }
        ]
      };
    }

So if any of the things you want to check is actually down, chances are this site will be too ;)
It looks really nice, good job!