Cloudflare was down | Modern Orange

816
522
mektrik
1 day ago
cloudflare.com

pm90
·
1 day ago
·
[ - ]

This is not good. One major outage? Something exceptional. Several outages in a short time? As someone that's worked in operations, I have empathy; there are so many “temp hacks” that are put in place for incidents. But the rest of the world won’t… they’re gonna suffer a massive reputation loss if this goes on as long as the last one.

berkes
·
1 day ago
·
[ - ]

At least this warrants a good review of anyone's dependency on cloudflare.

If it turns out that this was really just random bad luck, it shouldn't affect their reputation (if humans were rational, that is...)

But if it is what many people seem to imply, that this is the outcome of internal problems/cuttings/restructuring/profit-increase etc, then I truly very much hope it affects their reputation.

But I'm afraid it won't. Just like Microsoft continues to push out software that, compared to competitors, is unstable, insecure, frustrating to use, lacks features, etc., without it harming their reputation or even bottom line too much. I'm afraid Cloudflare has a de-facto monopoly (technically: a big moat) and by now can get away with offering poorer quality at increasing prices.

zelphirkalt
·
1 day ago
·
[ - ]

Microsoft's reputation couldn't be much lower at this point, that's their trick.

The issue is the uninformed masses being led to use Windows when they buy a computer. They don't even know how much better a system could work, and so they accept whatever is shoved down their throats.

coffeebeqn
·
1 day ago
·
[ - ]

Vibe infrastructure

rvz
·
1 day ago
·
[ - ]

So that is the best-case definition of what "Vibe Engineering" is.

rsynnott
·
1 day ago
·
[ - ]

> Just like Microsoft continues to push out software, that, compared to competitors, is unstable, insecure, frustrating to use, lacks features, etc, without it harming their reputation or even bottomlines too much.

Eh.... This is _kind_ of a counterfactual, tho. Like, we are not living in the world where MS did not do that. You could argue that MS was in a good place to be the dominant server and mobile OS vendor, and simply screwed both up through poor planning, poor execution, and (particularly in the case of server stuff) a complete disregard for quality as a concept.

I think someone who'd been in a coma since 1999 waking up today would be baffled at how diminished MS is, tbh. In the late 90s, Microsoft practically _was_ computers, with only a bunch of mostly-dying UNIX vendors for competition. And one reasonable lens through which to interpret its current position is that it's basically due to incompetence on Microsoft's part.

MrAureliusR
·
1 day ago
·
[ - ]

well that's the thing, such a huge number of companies route all their traffic through Cloudflare. This is at least partially because for a long time, there was no other company that could really do what Cloudflare does, especially not at the scales they do. As much as I despise Cloudflare as a company, their blog posts about stopping attacks and such are extremely interesting. The amount of bandwidth their network can absorb is jaw-dropping.

I've said to many people/friends that use Cloudflare to look elsewhere. When such a huge percentage of the internet flows through a single provider, and when that provider offers a service that allows them to decrypt all your traffic (if you let them install HTTPS certs for you), not only is that a hugely juicy target for nation-states but the company itself has too much power.

But again, what other companies can offer the insane amount of protection they can?

gbrindisi
·
1 day ago
·
[ - ]

The crowdstrike incident taught us that no one is going to review any dependency whatsoever.

ezst
·
1 day ago
·
[ - ]

Yep, that's what late stage capitalism leaves you with: consolidation, abuse, helplessness and complacency/widespread incompetence as a result

bluerooibos
·
1 day ago
·
[ - ]

I'm quite sure the reputational damage has already been done.

How do they not have better isolation of these issues, or redundancy of some sort?

brandensilva
·
1 day ago
·
[ - ]

The seed has been planted. It will take a while for others to fill the void. Still, the big players see this as an opportunity to steal market share if Cloudflare cannot live up to their reputation.

rvz
·
1 day ago
·
[ - ]

We are now seeing which companies do not consider the third-party risk of single points of failure in systems they do not control as part of their infrastructure, and what their contingency plan is.

It turns out so far, there isn't one. Other than contacting the CEO of Cloudflare rather than switching on a temporary mitigation measure to ensure minimal downtime.

Therefore, many engineers at affected companies would have failed their own systems design interviews.

throwaway42346
·
1 day ago
·
[ - ]

Alternative infrastructure costs money, and it's hard to get approval from leadership in many cases. I think many know what the ideal solution looks like, but anything linked to budgets is often out of the engineer's hands.

In some cases it is also a valid business decision. If you have 2 hour down time every 5 years, it may not have a significant revenue impact. Most customers think it's too much bother to switch to a competitor anyway, and even if it were simple the competition might not be better. Nobody gets fired for buying IBM

The decision was probably made by someone else who moved on to a different company, so they can blame that person. It's only when down time significantly impacts your future ARR (and bonus) that leadership cares (assuming that someone can even prove that they actually lose customers).

cryptonym
·
1 day ago
·
[ - ]

Sometimes it's not worth it. Your plan is just to accept you'll be off for a day or two, while you switch to a competitor.

creamyhorror
·
1 day ago
·
[ - ]

If there's a fitting competitor worth switching to.

Plus most people don't get blamed when AWS (or to a lesser extent Cloudflare) goes down, since everyone knows more than half the world is down, so there's not an urgent motivation to develop multi-vendor capability.

rvz
·
1 day ago
·
[ - ]

Can't say that when it is a time critical service such as hospitals, banks, financial institutions or air-traffic control services.

cryptonym
·
23 hours ago
·
[ - ]

Only a fool would build an architecture for critical air-traffic with Cloudflare as a SPoF.

formerly_proven
·
1 day ago
·
[ - ]

On the other thread there were comments claiming it’s unknowable what IaaS some SaaS is using, but SaaS vendors need to disclose these things one way or another, e.g. in DPAs. Here, for example, is Render's list of subprocessors: https://render.com/security

It’s actually fairly easy to know which 3rd party services a SaaS depends on and map these risks. It’s normal due diligence for most companies to do so before contracting a SaaS.

jcmfernandes
·
1 day ago
·
[ - ]

Absolutely. I wouldn’t be surprised if they turned the heat up a little after the last incident. The result? Even more incidents.

belter
·
1 day ago
·
[ - ]

This will be another post-mortem of...config file messed...did not catch...promise to be doing better next....We are sorry.

The problem is architectural.

lucyjojo
·
14 hours ago
·
[ - ]

cloudflare is a huge system in active development.

it will randomly fail. there is no way it cannot.

there is a point where the cost to not fail simply becomes too high.

pyuser583
·
1 day ago
·
[ - ]

Lots of big sites are down

wooque
·
1 day ago
·
[ - ]

2 days ago they had an outage that affected Europe. Cloudflare seems to be going down the drain. I removed it from my personal sites.

karmakurtisaani
·
1 day ago
·
[ - ]

Probably fired a lot of their best people in the past few years and replaced them with AI. They have a de-facto monopoly, so we'll just accept it and wait patiently until they fix the problem. You know, business as usual in the grift economy.

5d41402abc4b
·
1 day ago
·
[ - ]

>They have a de-facto monopoly

On what? There are lots of CDN providers out there.

esseph
·
1 day ago
·
[ - ]

They do far more than just CDN. It's the combination of service, features, reach, price, and the integration of it all.

immibis
·
1 day ago
·
[ - ]

There's only one that lets everyone sign up for free.

rvz
·
1 day ago
·
[ - ]

The "AI agents" are on holiday when an outage like this happens.

mvdtnz
·
20 hours ago
·
[ - ]

This didn't happen at all. You're just completely making shit up.

PlotCitizen
·
1 day ago
·
[ - ]

This is a good reminder for everyone to reconsider making all of their websites depend on a single centralized point of failure. There are many alternatives to the different services which Cloudflare offers.

berkes
·
1 day ago
·
[ - ]

But a CDN, like most other products CF offers, is centralized by nature.

If you switch from CF to the next CF competitor, you've not improved this dependency.

The alternative here is complex or even non-existent. Complex would be some system that allows you to hotswap a CDN, or to have fallback DDOS protection services, or to build your own in-house. Which, IMO, is the worst thing to do if your business is elsewhere. If you sell, say, pet food online, the dependency risk that comes with a vendor like CF is quite certainly less than the investment needed for, and risk associated with, building a DDOS protection or CDN on your own; that is all investment that's not directed at selling more pet food or getting higher margins at doing so.

agnivade
·
1 day ago
·
[ - ]

You can load-balance between CDN vendors as well

otikik
·
1 day ago
·
[ - ]

Then your load balancer becomes the single point of failure.

roryirvine
·
1 day ago
·
[ - ]

BGP Anycast will let you dynamically route traffic into multiple front-end load balancers - this is how GSLB is usually done.

Needs an ASN and a decent chunk of PI address space, though, so not exactly something a random startup will ever be likely to play with.

DaanDL
·
1 day ago
·
[ - ]

Then add a load balancer in front of your load balancer, duh. /s

sofixa
·
1 day ago
·
[ - ]

With what? The only (sensible) way is DNS, but then your DNS provider is your SPOF. Amazon used to run 2 DNS providers (separate NS from 2 vendors for all of AWS), but when one failed, there was still a massive outage.

altmanaltman
·
1 day ago
·
[ - ]

yeah there is no incentive to do a CDN in house, esp for businesses that are not tech-oriented. And the cost of the occasional outage has not really been higher than the cost of doing it in-house. And I'm sure other CDNs get outages as well, just CF is so huge everyone gets to know about it and it makes the news

coffeebeqn
·
1 day ago
·
[ - ]

We just love to merge the internet into single points of failure

phatfish
·
1 day ago
·
[ - ]

This is just how free markets work, on the internet with no "physical" limitations it is simply accelerated.

Left alone, corporations that rival governments emerge, and they are completely unaccountable. At least there is some accountability of governments to the people, depending on your flavour of government.

mschuster91
·
1 day ago
·
[ - ]

no one loves the need for CDNs other than maybe video streaming services.

the problem is, below a certain scale you can't operate anything on the internet these days without hiding behind a WAF/CDN combo... with the cut-off mark being "we can afford a 24/7 ops team". even if you run a small niche forum no one cares about, all it takes is one disgruntled donghead that you ban to ruin the fun - ddos attacks are cheap and easy to get these days.

and on top of that comes the shodan skiddie crowd. some 0day pops up, chances are high someone WILL try it out in less than 60 minutes. hell, look into any web server log, the amount of blind guessing attacks (e.g. /wp-admin/..., /system/login, /user/login) or path traversal attempts is insane.

CDN/WAFs are a natural and inevitable outcome of our governments and regulatory agencies not giving a shit about internet security and punishing bad actors.

koakuma-chan
·
1 day ago
·
[ - ]

My Cloudflare Pages website works fine.

inferiorhuman
·
1 day ago
·
[ - ]

> There are many alternatives

Of varying quality depending on the service. Most of the anti-bot/captcha crap seems to be equivalently obnoxious, but the handful of sites that use PerimeterX… I've basically sworn off DigiKey as a vendor since I keep getting their bullshit "press and hold" nonsense even while logged in.

I don't like that we're trending towards a centralized internet, but that's where we are.

luastoned
·
1 day ago
·
[ - ]

From the incident page:

A change made to how Cloudflare's Web Application Firewall parses requests caused Cloudflare's network to be unavailable for several minutes this morning. This was not an attack; the change was deployed by our team to help mitigate the industry-wide vulnerability disclosed this week in React Server Components. We will share more information as we have it today.

https://www.cloudflarestatus.com/incidents/lfrm31y6sw9q

reassess_blind
·
1 day ago
·
[ - ]

I’m really curious what their rollout procedure is, because it seems like many of their past outages should have been uncovered if they released these configuration changes to 1% of global traffic first.

lima
·
1 day ago
·
[ - ]

They don't appear to have a rollout procedure for some of their globally replicated application state. They had a number of major outages over the past years which all had the same root cause of "a global config change exposed a bug in our code and everything blew up".

I guess it's an organizational consequence of mitigating attacks in real time, where rollout delays can be risky as well. But if you're going to do that, it would appear that the code has to be written much more defensively than what they're doing right now.

JB_Dev
·
1 day ago
·
[ - ]

Yea agree.. This is the same discussion point that came up last time they had an incident.

I really don’t buy this requirement to always deploy state changes 100% globally immediately. Why can’t they just roll out to 1%, scaling to 100% over 5 minutes (configurable), with automated health checks and pauses? That will go a long way towards reducing the impact of these regressions.

Then if they really think something is so critical that it goes everywhere immediately, then sure set the rollout to start at 100%.

Point is, design the rollout system to give you that flexibility. Routine/non-critical state changes should go through slower ramping rollouts.
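A minimal sketch of the ramp described above, for illustration only. The function name `staged_rollout`, the callback signatures, and the stage fractions are assumptions made for this example, not anything Cloudflare has documented:

```python
import time
from typing import Callable, Iterable


def staged_rollout(
    apply_to_fraction: Callable[[float], None],
    is_healthy: Callable[[], bool],
    rollback: Callable[[], None],
    stages: Iterable[float] = (0.01, 0.05, 0.25, 0.50, 1.00),
    pause_seconds: float = 0.0,
) -> bool:
    """Ramp a change through `stages`; roll back on the first failed health check.

    All callbacks are hypothetical hooks into a deployment system.
    """
    for fraction in stages:
        apply_to_fraction(fraction)   # expose the change to this share of traffic
        time.sleep(pause_seconds)     # let metrics accumulate before judging
        if not is_healthy():          # automated health check gates the next stage
            rollback()
            return False
    return True
```

A change judged critical enough to go everywhere at once would pass `stages=(1.0,)`, so the fast path is an explicit choice rather than the default.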

franktankbank
·
23 hours ago
·
[ - ]

Can't get hacked when you are down.

ethbr1
·
1 day ago
·
[ - ]

For hypothetical conflicting changes (read worst case: unupgraded nodes/services can't interop with upgraded nodes/services), what's best practice for a partial rollout?

Blue/green and temporarily ossify capacity? Regional?

cryptonym
·
22 hours ago
·
[ - ]

- Push a version with the new logic but not yet enabled, still using legacy logic, able to implement both

- Push a version that enables new logic for 1% of traffic

- Continue rollout until 100%
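The steps above can be sketched as a percentage gate in front of two code paths. This is one common way to do it, assuming a stable per-request key is available; `in_rollout`, `legacy_logic`, `new_logic` and `handle` are hypothetical names for this example:

```python
import hashlib


def in_rollout(request_key: str, percent: float) -> bool:
    """Hash the key into one of 10,000 buckets; enable the first `percent`% of them."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percent * 100


def legacy_logic(payload: str) -> str:
    # Stand-in for the existing behaviour.
    return payload.strip()


def new_logic(payload: str) -> str:
    # Stand-in for the new behaviour shipped "dark" in step one.
    return payload.strip().lower()


def handle(request_key: str, payload: str, percent: float) -> str:
    """Both paths are deployed; `percent` decides which one a request takes."""
    if in_rollout(request_key, percent):
        return new_logic(payload)
    return legacy_logic(payload)
```

Because the bucket depends only on the key, a key that is enabled at 1% stays enabled as the percentage grows, and setting the percentage back to 0 returns everything to the legacy path without a redeploy.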

nrhrjrjrjtntbt
·
14 hours ago
·
[ - ]

Can also do canary rollout before that. Canary means rollout to endpoints only used by CF to test. Monitor metrics and automated test results.

cryptonym
·
5 hours ago
·
[ - ]

That's ok but doesn't solve issues you notice only on actual prod traffic. While it can be a nice addition to catch issues earlier with minimal user impact, best practice on large scale systems still requires a staged/progressive prod rollout.

nrhrjrjrjtntbt
·
5 hours ago
·
[ - ]

Yep. This is definitely an "as well as"

Unit test, Integration Test, Staging Test, Staging Rollout, Production Test, Canary, Progressive Rollout

It can all be automated and can smash through all that quickly with no human intervention.

tehlike
·
1 day ago
·
[ - ]

You can selectively bypass many roll out procedures in a properly designed system.

lima
·
1 day ago
·
[ - ]

If there is a proper rollout procedure that would've caught this, and they bypass it for routine WAF configuration changes, they might as well not have one.

nrhrjrjrjtntbt
·
14 hours ago
·
[ - ]

Not sure I buy it. Do 1% for 10 minutes. I mean it must have taken over half a day to code and test a patch. Why not wait another 10 minutes.

gpi
·
1 day ago
·
[ - ]

I believe they use Argo according to a previous post mortem.

https://blog.cloudflare.com/deep-dive-into-cloudflares-sept-...

stogot
·
1 day ago
·
[ - ]

The update they describe should never bring down all services. I agree with other posters that they must lack a rollout strategy yet they sent spam emails mocking the reliability of other clouds

brandensilva
·
1 day ago
·
[ - ]

The irony is they support rolling out incrementally with some of their products for deployment.

They need that same mindset for themselves in config/updates/infra changes but probably easier said than done.

Traubenfuchs
·
1 day ago
·
[ - ]

"Please don't block the rollout pipeline with a simple react security patch update."

philipwhiuk
·
1 day ago
·
[ - ]

So their parser broke again I guess.

And no staged rollout I assume?

tialaramex
·
1 day ago
·
[ - ]

Apparently somehow this had never been how Cloudflare did this. I expressed incredulity about this to one of their employees, but yeah, seems like their attitude was "We never make mistakes so it's fastest to just deploy every change across the entire system immediately" and as we've seen repeatedly in the past short while that means it sometimes blows up.

They have blameless post mortems, but maybe "We actually do make mistakes so this practice is not good" wasn't a lesson anybody wanted to hear.

rhdunn
·
1 day ago
·
[ - ]

Blameless post mortems should be similar to air accident investigations. I.e. don't blame the people involved (unless they are acting maliciously), but identify and fix the issues to ensure this particular incident is unlikely to recur.

The intent of the postmortems is to learn what the issues are and prevent or mitigate similar issues happening in the future. If you don't make changes as a result of a postmortem then there's no point in conducting them.

meindnoch
·
1 day ago
·
[ - ]

>don't blame the people involved (unless they are acting maliciously)

Or negligently.

jq-r
·
1 day ago
·
[ - ]

That still shouldn't be a part of post mortem, more of a performance review item.

tempaccount420
·
1 day ago
·
[ - ]

They should be performantly removed.

__turbobrew__
·
22 hours ago
·
[ - ]

The aviation industry regularly requires certifications, check rides, and re-qualifications when humans mess up. I have never seen anything like that in tech.

Sometimes the solution is to not let certain people do certain things which are risky.

Xunjin
·
1 day ago
·
[ - ]

Agree 100%, however using your example, there is no regulatory agency that investigates the issue and demands changes to avoid related future problems. Should the industry move towards this way?

tialaramex
·
1 day ago
·
[ - ]

However, one of the things you see (if you read enough of them) in accident investigation reports for regulated industries is a recurring pattern

1. Accident happens
2. Investigators conclude Accident would not happen if people did X. Recommend regulator requires that people do X, citing previous such recommendations each iteration
3. Regulator declines this recommendation, arguing it's too expensive to do X, or people already do X, or even (hilariously) both
4. Go to 1.

Too often, what happens is that eventually

5. Extremely Famous Accident Happens, e.g. killing loved celebrity Space Cowboy
6. Investigators conclude Accident would not happen if people did X, remind regulator that they have previously recommended requiring X
7. Press finally reads dozens of previous reports and so News Story says: Regulator killed Space Cowboy!
8. Regulator decides actually they always meant to require X after all

ethbr1
·
1 day ago
·
[ - ]

As bad as (3) sounds, I'll strongman the argument: it's important to keep the economic cost of any regulation in mind.*

On the one hand, you'd like to prevent the thing the regulation is seeking to prevent.

On the other hand, you'd have costs for the regulation to be implemented (one-time and/or ongoing).

"Is the good worth the costs?" is a question worth asking every time. (Not least because sometimes it lets you downscope/target regulations to get better good ROI)

*Yes, the easy pessimistic take is 'industry fights all regulation on cost grounds', but the fact that the argument is abused doesn't mean it doesn't have some underlying merit

tialaramex
·
23 hours ago
·
[ - ]

I think conventionally the verb is "to steelman" with the intended contrast being to a strawman, an intentionally weak argument by analogy to how straw isn't strong but steel is. I understood what you meant by "strongman" but I think that "steelman" is better here.

There is indeed a good reason regulators aren't just obliged to institute all recommendations - that would be a lot of new rules. The only accident report I remember reading with zero recommendations was a MAIB (Maritime accidents) report here which concluded that a crew member of a fishing boat had died at sea after their vessel capsized because both they and the skipper (who survived) were on heroin. The rationale for not recommending anything was that heroin is already illegal, operating a fishing boat while on heroin is already illegal, and it's also obviously a bad idea, so there's nothing to recommend. "Don't do that".

Cost is rarely very persuasive to me, because it's very difficult to correctly estimate what it will actually cost to change something once you decided it's required - based on current reality where it is not. Mass production and clever cost reductions resulting from the normal commercial pressures tend to drive down costs when we require something but not before (and often not after we cease to require it either)

It's also difficult to anticipate all benefits from a good change without trying it. Lobbyists against a regulation will often try hard not to imagine benefits after all they're fighting not to be regulated. But once it's in action, it may be obvious to everyone that this was just a better idea and absurd it wasn't always the case.

Remember when you were allowed to smoke cigarettes on aeroplanes? That seems crazy, but at the time it was normal and I'm sure carriers insisted that not being allowed to do this would cost them money - and perhaps for a short while it did.

ethbr1
·
4 minutes ago
·
[ - ]

> it's very difficult to correctly estimate what it will actually cost to change something once you decided it's required - based on current reality where it is not. Mass production and clever cost reductions resulting from the normal commercial pressures tend to drive down costs

Difficult, but not impossible.

What is calculable and does NOT scale down is the cost of compliance documentation and processes. Changing from 1 form of documentation to 4 forms of documentation has a measurable cost that will be imposed forever.

> It's also difficult to anticipate all benefits from a good change without trying it.

That's not a great argument, because it can be counterbalanced by the equally true opposite: it's difficult to anticipate all downsides to a change without trying it.

> Remember when you were allowed to smoke cigarettes on aeroplanes?

Remember when you could walk up to a gate 5 minutes before a flight, buy a ticket, and fly?

The current TSA security theater has had some benefits, but it's also made using airports far worse as a traveler.

kypro
·
1 day ago
·
[ - ]

> They have blameless post mortems, but maybe "We actually do make mistakes so this practice is not good" wasn't a lesson anybody wanted to hear.

Or they could say, "we want to continue to prioritise speed of security rollouts over stability, and despite our best efforts, we do make mistakes, so sometimes we expect things will blow up".

I guess it depends what you're optimising for... If the rollout speed of security patches is the priority then maybe increased downtime is a price worth paying (in their eyes anyway)... I don't agree with that, but at least it's an honest position to take.

That said, if this was to address the React CVE then it was hardly a speedy patch anyway... You'd think they could have afforded to stagger the rollout over a few hours at least.

lima
·
1 day ago
·
[ - ]

It's just poor risk management at this point. Making sure that a configuration change doesn't crash the production service shouldn't take more than a few seconds in a well-engineered system even if you're not doing staged rollout.
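As a rough illustration of that kind of pre-flight check, here is a sketch that validates a candidate config and keeps the last known good one if validation fails. The JSON format, the `REQUIRED_KEYS` schema and the function names are all assumptions for the example; a real WAF config would need far richer checks:

```python
import json
from typing import Any, Dict

REQUIRED_KEYS = {"rules", "max_body_bytes"}  # hypothetical schema


def validate_config(raw: str) -> Dict[str, Any]:
    """Parse and sanity-check a candidate config; raise ValueError if unusable."""
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(config, dict):
        raise ValueError("top level must be an object")
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    limit = config["max_body_bytes"]
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        raise ValueError("max_body_bytes must be a positive integer")
    return config


def load_or_keep(raw: str, last_known_good: Dict[str, Any]) -> Dict[str, Any]:
    """Return the new config if it validates, otherwise keep the old one."""
    try:
        return validate_config(raw)
    except ValueError:
        return last_known_good
```

The point of the design is that a bad push degrades to "the change did not take effect" instead of "the process crashed".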

meindnoch
·
1 day ago
·
[ - ]

React (a frontend JS framework) can now bring down critical Internet infrastructure.

I will repeat it because it's so surreal: React (a frontend JS framework) can now bring down critical Internet infrastructure.

cryptonym
·
1 day ago
·
[ - ]

That's Next.js, not React.

Mentioning React Server Components in the status page can be seen as a bad way to shift the blame. Would have been better to not specify which CVE they were trying to patch. The issue is their rollout management, not the Vendor and CVE.

JimDabell
·
1 day ago
·
[ - ]

> That's Next.js, not React.

React seems to think that it was React:

https://react.dev/blog/2025/12/03/critical-security-vulnerab...

cryptonym
·
1 day ago
·
[ - ]

True, thanks for sharing. Worth mentioning that's on the "full-stack" part of the framework. It doesn't impact most React websites while it impacts most next.js websites.

tempaccount420
·
1 day ago
·
[ - ]

It was React. Code in React's repository had to be patched to fix this.

Next.JS just happens to be the biggest user of this part of React, but blaming Next.JS is weird...

cryptonym
·
23 hours ago
·
[ - ]

Thanks, that's what I acknowledged in the message you just replied to.

I'm not blaming anyone. Mostly outlining who was impacted as it's not really related to the front-end parts of the framework that the initial comment was referring to.

philipwhiuk
·
1 day ago
·
[ - ]

I think the "argument" is that it's a critical vuln so they can't "go slow".

So now a vuln check for a component deployed on, being generous, 1% of servers causes an outage for 30% of the internet.

The argument is dumb.

spiffytech
·
1 day ago
·
[ - ]

To be accurate: React developed server-side capabilities, and that's where the vulnerability exists.

It feels noteworthy because React started out frontend-only, but pedantically it's just another backend with a vulnerability.

phplovesong
·
1 day ago
·
[ - ]

[flagged]

mvandermeulen
·
1 day ago
·
[ - ]

What was the AI slop part?

GaryBluto
·
1 day ago
·
[ - ]

When something goes wrong, people are starting to immediately assume it's because of the thing they don't like.

o_m
·
23 hours ago
·
[ - ]

I wonder if this is the new normal? Weekly Cloudflare outages that break huge parts of the internet.

uyzstvqs
·
1 day ago
·
[ - ]

Ah yes, Cloudflare's worst enemy: The configuration change.

hinkley
·
19 hours ago
·
[ - ]

On fridays, yes.

aatd86
·
1 day ago
·
[ - ]

so it's react again in the end .. zzzzzzz

pepoluan
·
23 hours ago
·
[ - ]

So. Another regex problem?

xyproto
·
1 day ago
·
[ - ]

Yes.

Weird that https://www.cloudflarestatus.com/ isn't reporting this properly. It should be full of red blinking lights.

javier2
·
1 day ago
·
[ - ]

Yeah. I only work for a small company, but you can be certain we will not update the status page if only a small portion of customers are affected, and if we are fully down, rest assured there will be no available hands to keep the status page updated

s_dev
·
1 day ago
·
[ - ]

>rest assured there will be no available hands to keep the status page updated

That's not how status pages work, if implemented correctly. The real reason status pages aren't updated is SLAs. If you agree on a contract to have 99.99% uptime, your status page had better reflect that or it invalidates many contracts. This is why AWS also lies about its uptime and status page.

These services rarely experience outages according to their own figures, but rather 'degraded performance' or some other language that talks around the issue rather than acknowledging it.

It's like when buying a house you need an independent surveyor not the one offered by the developer/seller to check for problems with foundations or rotting timber.

redm
·
1 day ago
·
[ - ]

SLAs usually just give you a small credit for the exact period of the incident, which is asymmetric to the impact. We always have to negotiate for termination rights for failing to meet SLA standards but, in reality, we never exercise them.

Reality is that in an incident, everyone is focused on fixing the issue, not updating status pages; automated checks fail or have false positives often too. :/

korm
·
1 day ago
·
[ - ]

Yep, every SLA I've ever seen only offers credit. The idea that providers are incentivized to fudge uptime % due to SLAs makes no sense to me. Reputation and marketing maybe, but not SLAs.

The compensation is peanuts. $137 off a $10,000 bill for 10 hours of downtime, or 98.63% uptime in a month, is well within the profit margins.
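For anyone who wants to check the arithmetic, a small sketch. It assumes a purely pro-rata credit and an average month of 730 hours; real SLAs often use tiered credit tables instead, so treat both as assumptions:

```python
HOURS_PER_MONTH = 730  # assumption: average month (8760 hours / 12)


def uptime_percent(downtime_hours: float, period_hours: float = HOURS_PER_MONTH) -> float:
    """Share of the period the service was up, as a percentage."""
    return 100.0 * (1.0 - downtime_hours / period_hours)


def pro_rata_credit(bill: float, downtime_hours: float,
                    period_hours: float = HOURS_PER_MONTH) -> float:
    """Credit proportional to the downtime share of the billing period."""
    return bill * downtime_hours / period_hours


print(round(pro_rata_credit(10_000, 10), 2))  # 136.99
print(round(uptime_percent(10), 2))           # 98.63
```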

laurent123456
·
1 day ago
·
[ - ]

This is weird - at this level contracts are supposed to be rock solid so why wouldn't they require accurate status reporting? That's trivial to implement, and you can even require to have it on a neutral third-party like UptimeRobot and be done with it.

I'm sure there are gray areas in such contracts but something being down or not is pretty black and white.

franga2000
·
1 day ago
·
[ - ]

> something being down or not is pretty black and white

This is so obviously not true that I'm not sure if you're even being serious.

Is the control panel being inaccessible for one region "down"? Is their DNS "down" if the edit API doesn't work, but existing records still get resolved? Is their reverse proxy service "down" if it's still proxying fine, just not caching assets?

laurent123456
·
1 day ago
·
[ - ]

I understand there are nuances here, and I may be oversimplifying, but if part of the contract effectively says "You must act as a proxy for npmjs.com" yet the site has been returning 500 Cloudflare errors across all regions several times within a few weeks while still reporting a shining 99.99% uptime, something doesn't quite add up. Still, I'm aware I don't know much about these agreements, and I'm assuming the people involved aren't idiots and have already considered all of this.

remus
·
1 day ago
·
[ - ]

> I'm sure there are gray areas in such contracts but something being down or not is pretty black and white.

Is it? Say you've got some big geographically distributed service doing some billions of requests per day with a background error rate of 0.0001%, what's your threshold for saying whether the service is up or down? Your error rate might go to 0.0002% because a particular customer has an issue so that customer would say it's down for them, but for all your other customers it would be working as normal.
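To make the ambiguity concrete, here is a sketch that computes "is it up?" both globally and per customer from the same counters. The threshold, the customer names and the `(errors, requests)` layout are hypothetical:

```python
from typing import Dict, List, Tuple


def error_rate(errors: int, requests: int) -> float:
    """Fraction of failed requests; 0.0 when there was no traffic."""
    return errors / requests if requests else 0.0


def availability_view(
    per_customer: Dict[str, Tuple[int, int]],  # name -> (errors, requests)
    threshold: float,
) -> Tuple[bool, List[str]]:
    """Return (globally_up, customers whose own error rate breaches the threshold)."""
    total_errors = sum(e for e, _ in per_customer.values())
    total_requests = sum(r for _, r in per_customer.values())
    globally_up = error_rate(total_errors, total_requests) < threshold
    affected = sorted(
        name for name, (e, r) in per_customer.items()
        if error_rate(e, r) >= threshold
    )
    return globally_up, affected
```

With two large healthy customers and one small customer failing half its requests, the global view reports "up" while the per-customer view lists the small one as down, which is exactly the disagreement described above.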

javier2
·
22 hours ago
·
[ - ]

> something being down or not is pretty black and white

it really isn't. We often have degraded performance for a portion of customers, or just down for customers of a small part of the service. It has basically never happened that our service is 100% down.

lucianbr
·
1 day ago
·
[ - ]

Are the contracts so easy to bypass? Who signs a contract with an SLA knowing the service provider will just lie about the availability? Is the client supposed to sue the provider any time there is an SLA breach?

netdevphoenix
·
1 day ago
·
[ - ]

Anyone who doesn't have any choice financially or gnostically. Same reason why people pay Netflix despite the low quality of most of their shows and the constant termination of tv series after 1 season. Same reason why people put up with Meta not caring about moderating or harmful content. The power dynamics resemble a monopoly

lucianbr
·
1 day ago
·
[ - ]

Why bother to put the SLA in the contract at all, if people have no choice but to sign it?

Netflix doesn't put in the contract that they will have high-quality shows. (I guess, don't have a contract to read right now.)

ozim
·
1 day ago
·
[ - ]

Most services are not really critical, but customers want to have 99.999% on paper.

Most of the time people will just get by and ignore even a full day of downtime as a minor inconvenience. Loss of revenue for the day - well, you will most likely have to eat that, because going to court and having lawyers fight over it will most likely cost you as much as just forgetting about it.

If your company goes bankrupt because AWS/Cloudflare/GCP/Azure is down for a day or two - guess what - you won't have money to sue them ¯\_(ツ)_/¯ and will most likely have a bunch of more pressing problems on your hands.

heipei
·
1 day ago
·
[ - ]

The client is supposed to monitor availability themselves, that is how these contracts work.

immibis
·
1 day ago
·
[ - ]

The company that is trying to cancel its contract early needs to prove the SLA was violated, which is very easy if the company providing the service also provides a page that says their SLA was violated. Otherwise it's much harder to prove.

8cvor6j844qw_d6
·
1 day ago
·
[ - ]

I imagine there will be many levels of "approvals" to get the status page actually showing down, since SLA uptime contracts are involved.

javier2
·
1 day ago
·
[ - ]

I work for a small company. We have no written SLA agreements.

lawnchair
·
1 day ago
·
[ - ]

I have to say that if an incident becomes so overwhelming that nobody can spare even a moment to communicate with customers, that points to a deeper operational problem. A status page is not something you update only when things are calm. It is part of the response itself. It is how you keep users informed and maintain trust when everything else is going wrong.

If communication disappears entirely during an outage, the whole operation suffers. And if that is truly how a company handles incidents, then it is not a practice I would want to rely on. Good operations teams build processes that protect both the system and the people using it. Communication is one of those processes.

onion2k
·
1 day ago
·
[ - ]

> if we are fully down, rest assured there will be no available hands to keep the status page updated

There is no quicker way for customers to lose trust in your service than it to be down and for them to not know that you're aware and trying to fix it as quickly as possible. One of the things Cloudflare gets right is the frequent public updates when there's a problem.

You should give someone the responsibility for keeping everyone up to date during an incident. It's a good idea to give that task to someone quite junior - they're not much help during the crisis, and they learn a lot about both the tech and communication by managing it.

GoblinSlayer
·
1 day ago
·
[ - ]

You won't be able to update the status page due to failures anyway.

PhilippGille
·
1 day ago
·
[ - ]

Why not? A good status page runs on a different cloud provider in a different region, specifically to not be affected at the same time.

63stack
·
1 day ago
·
[ - ]

This is just business as usual, status pages are 95% for show now. The data center would have to be under water for the status page to say "some users might be experiencing disruptions".

csomar
·
1 day ago
·
[ - ]

They just did an update, and it is bad (in the sense that they are not realizing their clients are down?)

> Investigating - Cloudflare is investigating issues with Cloudflare Dashboard and related APIs.

> These issues do not affect the serving of cached files via the Cloudflare CDN or other security features at the Cloudflare Edge.

> Customers using the Dashboard / Cloudflare APIs are impacted as requests might fail and/or errors may be displayed.

Eikon
·
1 day ago
·
[ - ]

> (in the sense that they are not realizing their clients are down?)

Their own website seems down too https://www.cloudflare.com/

500 Internal Server Error

cloudflare

mikkom
·
1 day ago
·
[ - ]

>Customers using the Dashboard / Cloudflare APIs are impacted as requests might fail and/or errors may be displayed.

"Might fail"

yapyap
·
1 day ago
·
[ - ]

well it does say that now, so…

which datacenter got flooded?

rvnx
·
1 day ago
·
[ - ]

> In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary. Dec 05, 2025 - 09:00 UTC

It's scheduled maintenance, so the SLA should not apply, right?

darccio
·
1 day ago
·
[ - ]

https://updog.ai/status/cloudflare reported the incident 13 minutes ago (at the moment of writing this).

chironjit
·
1 day ago
·
[ - ]

Yeah, their status site reports nothing, but clicking on some of the links on that site brings you to the 500 error.

mikkom
·
1 day ago
·
[ - ]

Company internal status pages are always like this. When you don't report problems they don't exist!

Havoc
·
1 day ago
·
[ - ]

It’s wild how none of the big corporations can make a functional status page

javier2
·
1 day ago
·
[ - ]

They could, but accurate reporting is not good for their SLAs

dncornholio
·
1 day ago
·
[ - ]

They can. They don't want to though.

hinkley
·
1 day ago
·
[ - ]

They were intending to start a maintenance window starting 6 minutes ago, but they were already down by then.

dinoqqq
·
1 day ago
·
[ - ]

There is an update:

"Cloudflare Dashboard and Cloudflare API service issues"

Investigating - Cloudflare is investigating issues with Cloudflare Dashboard and related APIs.

Customers using the Dashboard / Cloudflare APIs are impacted as requests might fail and/or errors may be displayed. Dec 05, 2025 - 08:56 UTC

rollulus
·
1 day ago
·
[ - ]

Not weird, that’s tradition by now.

jbuild
·
1 day ago
·
[ - ]

Interesting, I get a 500 if I try to visit coinbase.com, but my WebSocket connections to advanced-trade-ws.coinbase.com are still live with no issues.

emakarov
·
1 day ago
·
[ - ]

probably these websockets are not going through cloudflare

fxd123
·
1 day ago
·
[ - ]

> Investigating - Cloudflare is investigating issues with Cloudflare Dashboard and related APIs.

They seem to now, a few min after your comment

redm
·
1 day ago
·
[ - ]

I'm much more concerned with customer sites being down, which the status page indicates are not impacted. They are... :/

jonathanlydall
·
1 day ago
·
[ - ]

Now showing a message, posted at 08:56 UTC.

jachee
·
1 day ago
·
[ - ]

Management is always going to take too long (in an engineer’s opinion) to manually throw the alerts on. They’re pressing people for quick fixes so they can claim their SLAs are intact.

devmor
·
1 day ago
·
[ - ]

Yes, the incident report claims this was limited to their client dashboard. It most certainly was not. I have the PagerDuty alerts to prove it...

tjpnz
·
1 day ago
·
[ - ]

They have enough data to at least automate yellow.

rvz
·
1 day ago
·
[ - ]

The AI agents can't help out this time.

rifycombine1
·
1 day ago
·
[ - ]

maybe we can go back to Stack Overflow :)

csomar
·
1 day ago
·
[ - ]

> In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary. Dec 05, 2025 - 07:00 UTC

Something must have gone really wrong.

headmelted
·
1 day ago
·
[ - ]

It's 1AM in San Francisco right now. I don't envy the person having to call Matthew Prince and wake him up for this one. And I feel really bad for the person that forgot a closing brace in whatever config file did this.

artlovecode
·
1 day ago
·
[ - ]

Agreed, I feel bad for them. But mostly because cloudflare's workflows are so bad that you're seemingly repeatedly set up for really public failures. Like how does this keep happening without leadership's heads rolling. The culture clearly is not fit for their level of criticality

esseph
·
1 day ago
·
[ - ]

> The culture clearly is not fit for their level of criticality

I don't think anyone's is.

everfrustrated
·
1 day ago
·
[ - ]

How often do you hear of Akamai going down and they host a LOT more enterprise/high value sites than Cloudflare.

There's a reason Cloudflare has been really struggling to get into the traditional enterprise space and it isn't price.

inferiorhuman
·
1 day ago
·
[ - ]

A quick google turned up an Akamai outage in July that took Linode down and two in 2021. At that scale nobody's going to come up smelling like roses. I mostly dealt with Amazon crap at megacorp, but nobody that had to deal with our Akamai stuff had anything kind to say about them as a vendor.

At first blush it's getting harder to "defend" use of Cloudflare, but I'll wait until we get some idea of what actually broke. For the time being I'll save my outrage for the AI scrapers that drove everyone into Cloudflare's arms.

esseph
·
21 hours ago
·
[ - ]

The last place I heard of someone deploying anything to Akamai was 15 years ago in FedGov.

Akamai was historically only serving enterprise customers. Cloudflare opened up tons of free plans, new services, and basically swallowed much of that market during that time period.

viraptor
·
1 day ago
·
[ - ]

> I don't envy the person having to call Matthew Prince

They shouldn't need to do that unless they're really disorganised. CEOs are not there for day to day operations.

csomar
·
1 day ago
·
[ - ]

> And I feel really bad for the person that forgot a closing brace in whatever config file did this.

If a missing closing brace takes your whole infra down, my guess is that we'll see more of this.

shafyy
·
1 day ago
·
[ - ]

Life hack: Announce bug that brings your entire network down as scheduled maintenance.

tommek4077
·
1 day ago
·
[ - ]

Yes, it’s really ‘weird’ that they refuse to share any details. Completely unlike AWS, for example. As if being open about issues with their own product wouldn’t be in their best interest. /s

timvdalen
·
1 day ago
·
[ - ]

Wow, just plain 500s on customer sites. That's a level of down you don't see that often.

ablation
·
1 day ago
·
[ - ]

Yeah that's a hard 500 right? Not even Cloudflare's 500 branded page like last time. What could have caused this, I wonder.

mckirk
·
1 day ago
·
[ - ]

"A cable!"

"How do you know?"

"I'm holding it!"

Hamuko
·
1 day ago
·
[ - ]

I hope it’s not another Result.unwrap().

singularity2001
·
1 day ago
·
[ - ]

maybe this would cause rust to adopt exception handling, and by exception I mean panic

maxekman
·
1 day ago
·
[ - ]

A precious glimpse of the less seen page renders.

gwd
·
1 day ago
·
[ - ]

Unlike the previous outage, my server seems fine, and I can use Cloudflare's tunnel to ssh to the host as well.

willtemperley
·
1 day ago
·
[ - ]

Yes Claude is down with a 500 (cloudflare).

disillusioned
·
1 day ago
·
[ - ]

At least they branded it!

Eikon
·
1 day ago
·
[ - ]

Mine [0] seems to have very high latency but no 500s. But yes, most Cloudflare-proxied websites I tried seem to just return 500s.

[0] https://www.merklemap.com/

ransom1538
·
1 day ago
·
[ - ]

So. I don't understand the 5 nines they promote. One bad day and those nines are gone. So the next year you are pushing 2 nines.

kingstnap
·
1 day ago
·
[ - ]

It's just fabricated bullshit. It's how all the companies do it. 99.999% over a year is literally 5 minutes. Or under an hour in a decade, that's wildly unrealistic.

Reddit was once down for a full day and that month they reported 99.5% uptime instead of 99.99% as they normally claimed for most months.

There is this amazing combination of nonsense going on to achieve these kinds of numbers:

1. Straight up fraudulent information on status page. Reporting incidents as more minor than any internal monitors would claim.

2. If it's working for at least a few percent of customers it's not down. Degraded is not counted.

3. If any part of anything is working then it's not down. For example with the reddit example even if the site was dead as long as the image server is still at 1% functional with some internal ping the status is good.
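
For anyone who wants to check the arithmetic behind these figures, a small Rust sketch follows. The function names are my own, and it assumes a 365-day year and a 30-day month.

```rust
/// Allowed downtime, in minutes, for a given availability percentage
/// over a period of `days` (assumes every day has 24 * 60 minutes).
fn downtime_budget_minutes(availability_percent: f64, days: f64) -> f64 {
    let total_minutes = days * 24.0 * 60.0;
    total_minutes * (1.0 - availability_percent / 100.0)
}

/// Availability percentage implied by `down_minutes` of downtime over `days`.
fn availability_percent(down_minutes: f64, days: f64) -> f64 {
    let total_minutes = days * 24.0 * 60.0;
    100.0 * (1.0 - down_minutes / total_minutes)
}
```

`downtime_budget_minutes(99.999, 365.0)` comes out at about 5.26 minutes and `downtime_budget_minutes(99.999, 3650.0)` at about 52.6 minutes, matching the "5 minutes a year, under an hour a decade" figures. Going the other way, `availability_percent(1440.0, 30.0)` is about 96.67%, so a full day of downtime in a 30-day month cannot honestly be reported as 99.5% if the whole service counts as down.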

zelphirkalt
·
21 hours ago
·
[ - ]

Funnily enough an hour in a decade on a good hoster, with a stable service running on it, occasionally updated by version number ... it might even be possible. Maybe not quite, but close, if one tries. While it seems completely impossible with cloudflare, AWS, and whatnot, who are having outages every other week these days.

jondot
·
1 day ago
·
[ - ]

it's like someone-shut-down-the-power 500s

madjam002
·
1 day ago
·
[ - ]

Looking forward to the post mortem on this one. We weren't affected (just using the CDN), and people are saying they weren't affected who are using Cloudflare Workers (a previous culprit which we've since moved off), so I wonder what service / API was actually affected that brought down multiple websites with a 500 but not all of them.

Wise was just down which is a pretty big one.

Also odd how some websites were down this time that previously weren't down with the global outage in November

archon810
·
1 day ago
·
[ - ]

Our locations excluded from Cloudflare WAF were up, but the rest was down. I think WAF took a dump.

reassess_blind
·
1 day ago
·
[ - ]

Yeah it's strange. My sites that are proxied through Cloudflare remained up, but Supabase was taken offline so some backends were down. Either a regional PoP style issue, or a specific API or service had to be used to be affected.

gritzko
·
1 day ago
·
[ - ]

The entire Cloud/SaaS story had a lot of happy-path cost optimization. The particular glitch that triggered the domino effect may be irrelevant relative to the fact that the effect reproduces.

thinkindie
·
1 day ago
·
[ - ]

We were not affected either, and we realised it was Cloudflare because Linear was down and they were mentioning an upstream service. Ecosia was also affected, and I then realised they might be relying on Cloudflare too.

themly
·
1 day ago
·
[ - ]

CDN was definitely down also. We were widely impacted by it with 500's.

gowthamgts12
·
1 day ago
·
[ - ]

CDN was also affected for some customers. we were down with 500.

m_mueller
·
1 day ago
·
[ - ]

Maven Repository was down for me for a while, now it recovered.

cryptonym
·
1 day ago
·
[ - ]

> Looking forward to the post mortem

This is becoming a meme.

meandmycode
·
1 day ago
·
[ - ]

This has to be setting off some alarm bells internally, a well written postmortem on an occasional issue, great, but when your postmortem talks about learnings and improvements yet major outages keep happening, it becomes meaningless..

kryptn
·
1 day ago
·
[ - ]

was interesting, some of our stuff failed, but some other stuff that used cloudflare indirectly didn't.

da_grift_shift
·
1 day ago
·
[ - ]

The excuse:

>A change made to how Cloudflare's Web Application Firewall parses requests caused Cloudflare's network to be unavailable for several minutes this morning.

>The change was deployed by our team to help mitigate the industry-wide vulnerability disclosed this week in React Server Components.

>We will share more information as we have it today.

https://www.cloudflarestatus.com/incidents/lfrm31y6sw9q

madjam002
·
1 day ago
·
[ - ]

It's quite an unfortunate coincidence that React has indirectly been the reason for two recent issues at Cloudflare haha

brobdingnagians
·
1 day ago
·
[ - ]

Two's a coincidence, three's a pattern; I guess we will have to wait until next month to see if it becomes a pattern. Was there a particular aspect of the React Server Components that made it easy to have this problem appear? would it have been caught or avoided in another framework or language?

GoblinSlayer
·
1 day ago
·
[ - ]

Who sent an xml request?

Palmik
·
1 day ago
·
[ - ]

This is the second time this week: https://news.ycombinator.com/item?id=46140145

The previous one affected European users for >1h and made many Cloudflare websites nearly unusable for them.

AmateurAlert
·
1 day ago
·
[ - ]

https://downdetector.com/ classic

26d0
·
1 day ago
·
[ - ]

hmm... https://downdetectorsdowndetector.com/

(edit: it's working now (detecting downdetector's down))

vanyauhalin
·
1 day ago
·
[ - ]

So,

This one is green: https://downdetectorsdowndetector.com

This one is not opening: https://downdetectorsdowndetectorsdowndetector.com

This one is red: https://downdetectorsdowndetectorsdowndetectorsdowndetector....

Recursing
·
1 day ago
·
[ - ]

https://en.wikipedia.org/wiki/Fundamental_theorem_of_softwar...

superdisk
·
1 day ago
·
[ - ]

Lol. The fact that the 4x one actually works and is correctly reporting that the 3x one is down actually makes this a lot funnier to me.

altmanaltman
·
1 day ago
·
[ - ]

it's like they didn't fully think it through/expect people to actually use it so soon

mrducksy
·
1 day ago
·
[ - ]

It’s down detectors all the way down!

ssolarsystem1
·
1 day ago
·
[ - ]

downdetectorsdowndetectors didn't detect breakdown of downdetectors with 500 Error

xyproto
·
1 day ago
·
[ - ]

A wrong downdetectordowndetector is worse than a 500 one. :D

deveesh_shetty
·
1 day ago
·
[ - ]

You had one job.

manyaoman
·
1 day ago
·
[ - ]

So down²detector was fake all along?

andy_ppp
·
1 day ago
·
[ - ]

https://www.youtube.com/watch?v=OC06Z6lCB_Q

Andugal
·
1 day ago
·
[ - ]

So DownDetector is down, but DownDetectorDownDetector does not detect it... We probably need one more DownDetector. (no)

namjh
·
1 day ago
·
[ - ]

Yes, we do have one[^1], but unfortunately it looks like it is not checking integrity, just reachability.

[1]: https://downdetectorsdowndetectorsdowndetector.com/
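
The difference between the two checks can be sketched in a few lines of Rust. Everything here is hypothetical (the `Probe` struct, the function names, the marker string); it is not how any of these detector sites is implemented.

```rust
/// Outcome of one probe of a monitored site.
/// `status` is `None` when no HTTP response came back at all.
struct Probe<'a> {
    status: Option<u16>,
    body: &'a str,
}

/// Reachability only: did anything answer?
fn is_reachable(p: &Probe<'_>) -> bool {
    p.status.is_some()
}

/// Integrity: a 2xx status AND the body contains a marker we expect
/// from the real page (the marker is an assumption chosen by the operator).
fn is_healthy(p: &Probe<'_>, expected_marker: &str) -> bool {
    matches!(p.status, Some(200..=299)) && p.body.contains(expected_marker)
}
```

A proxy error page with status 500 passes `is_reachable` but fails `is_healthy`, which is the case a reachability-only detector reports as green.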

halgir
·
1 day ago
·
[ - ]

We have one. But according to Down Detector's Down Detector's Down Detector's Down Detector, that's also down.

Dilettante_
·
1 day ago
·
[ - ]

Well Down Detector's Down Detector isn't down...What we might need is a Down Detector's Down Detector Validator

O4epegb
·
1 day ago
·
[ - ]

This is a fake detector that just has frontend logic for mocking realistic data, you can easily see it in the source code.

maxlin
·
1 day ago
·
[ - ]

>half the internet is down

>downdetector is down

>downdetector down detector reports everything is fine

software was a mistake

aurareturn
·
1 day ago
·
[ - ]

Ehh, so down detector for down detector is up but it is inaccurate.

aroman
·
1 day ago
·
[ - ]

great news, schrodingersdetector.com is available!

xx_ns
·
1 day ago
·
[ - ]

At least it's still right in spite of being down.

asmor
·
1 day ago
·
[ - ]

That's the 30% vibe code they promised us.

Cynicism aside, something seems to be going wrong in our industry.

joenada
·
1 day ago
·
[ - ]

Going? I think we got there a long time ago. I'm sure we all try our best but our industry doesn't take quality seriously enough. Not compared to every other kind of engineering discipline.

asmor
·
1 day ago
·
[ - ]

Always been there. But it seems to be creeping into institutions that previously cared over the past few years, accelerating in the last.

themafia
·
1 day ago
·
[ - ]

Salaries are flat relative to inflation and profits. I've long felt that some of the hype around "AI" is part of a wage suppression tactic.

nlitened
·
1 day ago
·
[ - ]

Also “Rewrite it in Rust”.

P.S. it’s a joke, guys, but you have to admit it’s at least partially what’s happening

koakuma-chan
·
1 day ago
·
[ - ]

No, it has nothing to do with Rust.

gwd
·
1 day ago
·
[ - ]

But it might have something to do with the "rewrite" part:

> The idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they’ve been fixed. There’s nothing wrong with it. It doesn’t acquire bugs just by sitting around on your hard drive.

> Back to that two page function. Yes, I know, it’s just a simple function to display a window, but it has grown little hairs and stuff on it and nobody knows why. Well, I’ll tell you why: those are bug fixes. One of them fixes that bug that Nancy had when she tried to install the thing on a computer that didn’t have Internet Explorer. Another one fixes that bug that occurs in low memory conditions. Another one fixes that bug that occurred when the file is on a floppy disk and the user yanks out the disk in the middle. That LoadLibrary call is ugly but it makes the code work on old versions of Windows 95.

> Each of these bugs took weeks of real-world usage before they were found. The programmer might have spent a couple of days reproducing the bug in the lab and fixing it. If it’s like a lot of bugs, the fix might be one line of code, or it might even be a couple of characters, but a lot of work and time went into those two characters.

> When you throw away code and start from scratch, you are throwing away all that knowledge. All those collected bug fixes. Years of programming work.

From https://www.joelonsoftware.com/2000/04/06/things-you-should-...

windward
·
1 day ago
·
[ - ]

A lot of words for a 'might'. We don't know what caused the downtime.

gwd
·
1 day ago
·
[ - ]

Not this time; but the rewrite was certainly implicated in the previous one. They actually had two versions deployed; in response to unexpected configuration file size, the old version degraded gracefully, while the new version failed catastrophically.

perching_aix
·
1 day ago
·
[ - ]

Both versions were taken off-guard by the defective configuration they fetched, it was not a case of a fought and eliminated bug reappearing like in the blogpost you quoted.

perching_aix
·
1 day ago
·
[ - ]

[dead]

zwnow
·
1 day ago
·
[ - ]

The first one had something to do with Rust :-)

kortilla
·
1 day ago
·
[ - ]

Not really. In C or C++ that could have just been a segfault.

.unwrap() literally means “I’m not going to handle the error branch of this result, please crash”.

mike_hearn
·
1 day ago
·
[ - ]

Indeed, but fortunately there are more languages in the world than Rust and C++. A language that performed decently well and used exceptions systematically (Java, Kotlin, C#) would probably have recovered from a bad data file load.

koakuma-chan
·
1 day ago
·
[ - ]

There is nothing that prevents you from recovering from a bad data file load in Rust. The programmer who wrote that code chose to crash.

mike_hearn
·
1 day ago
·
[ - ]

That's exactly my point. There should be no such thing as choosing to crash if you want reliable software. Choosing to crash is idiomatic in Rust but not in managed languages in which exceptions are the standard way to handle errors.

koakuma-chan
·
1 day ago
·
[ - ]

I am not a C# guy, but I wrote a lot of Java back in the day, and I can authoritatively tell you that it has so-called "checked exceptions" that the compiler forces you to handle. However, it also has "runtime exceptions" that you are not forced to handle, and they can happen anywhere and at any time. Conceptually, it is the same as error versus panic in Rust. One such runtime exception is the notorious `java.lang.NullPointerException` a/k/a the billion-dollar mistake. So even software in "managed" languages can and does crash, and it is way more likely to do so than software written in Rust, because "managed" languages do not have all the safety features Rust has.

mike_hearn
·
22 hours ago
·
[ - ]

In practice, programs written in managed languages don't crash in the sense of aborting the entire process. Exceptions are usually caught at the top level (both checked and unchecked) and then logged, usually aborting the whole unit of work.

For trapping a bad data load it's as simple as:

    try {
        data = loadDataFile();
    } catch (Exception e) {
        LOG.error("Failed to load new data file; continuing with old data", e);        
    }

This kind of code is common in such codebases and it will catch almost any kind of error (except out of memory errors).

koakuma-chan
·
21 hours ago
·
[ - ]

Here is the Java equivalent of what happened in that Cloudflare Rust code:

  try {
    data = loadDataFile();
  } catch (Exception e) {
    LOG.error("Failed to load new data file", e);
    System.exit(1);
  }

So the "bad data load" was trapped, but the programmer decided that either it would never actually occur, or that it is unrecoverable, so it is fine to .unwrap(). It would not be any less idiomatic if, instead of crashing, the programmer decided to implement some kind of recovery mechanism. It is that programmer's fault, and has nothing to do with Rust.

Also, if you use general try-catch blocks like that, you don't know if that try-catch block actually needs to be there. Maybe it was needed in the past, but something changed, and it is no longer needed, but it will stay there, because there is no way to know unless you specifically look. Also, you don't even know the exact error types. In Rust, the error type is known in advance.
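
For comparison, here is a minimal sketch of what the recovering variant could look like in Rust. This is not Cloudflare's code: `Config`, `MAX_FEATURES`, `LoadError`, `parse_config`, and `reload` are all hypothetical names, and the size limit only stands in for whatever limit a real system enforces.

```rust
use std::num::ParseIntError;

/// Hypothetical config: a list of numeric feature IDs, one per line.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    features: Vec<u32>,
}

/// Hypothetical size limit for the config file.
const MAX_FEATURES: usize = 4;

/// The error type is known in advance, so the caller sees every failure mode.
#[derive(Debug, PartialEq)]
enum LoadError {
    TooManyFeatures(usize),
    BadLine(ParseIntError),
}

/// Parse the config text, rejecting malformed lines and oversized files.
fn parse_config(text: &str) -> Result<Config, LoadError> {
    let features = text
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(|l| l.trim().parse::<u32>().map_err(LoadError::BadLine))
        .collect::<Result<Vec<_>, _>>()?;
    if features.len() > MAX_FEATURES {
        return Err(LoadError::TooManyFeatures(features.len()));
    }
    Ok(Config { features })
}

/// Keep the current config when the new one fails to load,
/// instead of calling `.unwrap()` on the result.
fn reload(current: Config, new_text: &str) -> Config {
    match parse_config(new_text) {
        Ok(new) => new,
        Err(e) => {
            eprintln!("failed to load new config ({:?}); continuing with old config", e);
            current
        }
    }
}
```

Whether falling back to stale data is actually safer than stopping depends on what the data is used for, so treat this as one possible policy and not as the obviously right one.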

mike_hearn
·
16 hours ago
·
[ - ]

Yes, I know. But nobody writes code like that in Java. I don't think I've ever seen it outside of top level code in CLI tools. Never in servers.

> It is that programmer's fault, and has nothing to do with Rust.

It's Rust's fault. It provides a function in its standard library that's widely used and which aborts the process. There's nothing like that in the stdlibs of Java or .NET

> Also, if you use general try-catch blocks like that, you don't know if that try-catch block actually needs to be there.

I'm not getting the feeling you've worked on many large codebases in managed languages to be honest? I know you said you did but these patterns and problems you're raising just aren't problems such codebases have. Top level exception handlers are meant to be general, they aren't supposed to be specific to certain kinds of error, they're meant to recover from unpredictable or unknown errors in a general way (e.g. return a 500).

koakuma-chan
·
6 hours ago
·
[ - ]

> It's Rust's fault. It provides a function in its standard library that's widely used and which aborts the process. There's nothing like that in the stdlibs of Java or .NET

It is the same as runtime exceptions in Java. In Rust, if you want to have a top-level "exception handler" that catches everything, you can do

  ::std::panic::catch_unwind(|| {
    // ...
  })

In case of Cloudflare, the programmer simply chose to not handle the error. It would have been the same if the code was written in Java. There simply would be no top-level try-catch block.
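
A slightly fuller sketch of that top-level handler, using the real `std::panic::catch_unwind` from the standard library. The `handle_request` and `serve` functions and the status codes are made up for illustration.

```rust
use std::panic;

/// Hypothetical unit of work that panics on bad input via `.unwrap()`.
fn handle_request(input: &str) -> u32 {
    input.parse::<u32>().unwrap()
}

/// Top-level handler: turn a panic in one unit of work into an error
/// response (here just an HTTP-like status code) instead of letting it
/// take the whole process down.
fn serve(input: &str) -> u16 {
    match panic::catch_unwind(|| handle_request(input)) {
        Ok(_) => 200,
        Err(_) => 500,
    }
}
```

Two caveats: this only works when panics unwind, so it catches nothing if the binary is built with `panic = "abort"`, and the default panic hook still prints the panic message to stderr before `catch_unwind` returns the error.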

GoblinSlayer
·
1 day ago
·
[ - ]

When dotnet has an unhandled exception, it terminates with abort.

MegaThorx
·
1 day ago
·
[ - ]

Did you consider rewriting your joke in Rust?

kenonet
·
1 day ago
·
[ - ]

it's never the technology, it's the implementation

rifycombine1
·
1 day ago
·
[ - ]

cc: @oncall then trigger pagerduty :)

iso1631
·
1 day ago
·
[ - ]

> Cynicism aside, something seems to be going wrong in our industry.

Started after the GFC and the mass centralisation of infrastructure

domysee
·
1 day ago
·
[ - ]

I'm just realizing how much we depend on Cloudflare working. Every service I use is unreachable. Even worse than last time. It's almost impossible to do any work atm.

makkoncept
·
1 day ago
·
[ - ]

https://downdetectorsdowndetector.com/ is up :) but the status is not correct.

erikbye
·
1 day ago
·
[ - ]

Cloudflare uptime has worsened a lot lately, AI coding has increased exponentially, hmm

glimshe
·
1 day ago
·
[ - ]

Not only they make my browsing experience a LOT worse (seconds per site for bot detection and additional "are you human" clicks even without VPNs), now they are bringing the entire Internet down. They don't deserve the position they currently have.

gilrain
·
1 day ago
·
[ - ]

> Not only they make my browsing experience a LOT worse

No, I did (metaphorically, for the websites I control). And I did it because otherwise those sites are fully offline or unusable thanks to the modern floods of unfilterable scrapers.

Months of piecemeal mitigations, but Attack Mode is the only thing that worked. Blame the LLM gold rush and the many, many software engineers with no ethics and zero qualms about racing to find the bottom of the Internet.

The_President
·
15 hours ago
·
[ - ]

The whole “not a bot” prompt every three hours seems like it has potential to get out of the way more often.

wrobelda
·
1 day ago
·
[ - ]

You make it sound like the DDoS and Bots are their fault.

glimshe
·
1 day ago
·
[ - ]

They make gazillions. I'm sure they can do better than that.

How many awful things in tech can be rationalized away by "sorry, but this is for you/our protection"?

headmelted
·
1 day ago
·
[ - ]

Claude offline too. 500 errors on the web and the mobile app has been knocked out.

lionkor
·
1 day ago
·
[ - ]

I had to switch to Gemini for it to help me form a thought so I could type this reply. It's dire.

hasperdi
·
1 day ago
·
[ - ]

Even LinkedIn is now down. Opening linkedin.com gives me a 500 server error and Cloudflare at the bottom. Quite embarrassing.

asmor
·
1 day ago
·
[ - ]

At least they were available when Front Door was down!

thiscatis
·
1 day ago
·
[ - ]

Somebody at Cloudflare is stretching that initial investigation time as much as possible to avoid having to update their status to being down and losing that Christmas bonus.

phartenfeller
·
1 day ago
·
[ - ]

Wow, three times in a month is really crushing their trust.

8cvor6j844qw_d6
·
1 day ago
·
[ - ]

I'll need to check up on DigitalOcean uptime; it may be better than Cloudflare.

phartenfeller
·
1 day ago
·
[ - ]

My Hetzner servers have been running fine for years. Okay, there were times when I broke something, but at least I was able to fix it quickly and never felt dependent on others.

iso1631
·
1 day ago
·
[ - ]

CxOs want to be dependent on someone else, specifically suppliers with pieces of paper saying "we are great, here's a 1% discount on next years renewal"

If the in house tech team breaks something and fixes it, that's great from an engineer point of view - we like to be useful, but the person at the top is blamed.

If an outsourced supplier (one which the consultants recommend, look at Gartner Quadrants etc) fails, then the person at the top is not blamed, even though they are powerless and the outage is 10 times longer and 10 times as frequent.

Outsourcing is not about outcome, it's about accountability, and specifically avoiding it.

dabeeeenster
·
1 day ago
·
[ - ]

3?! When was the second?

ubercore
·
1 day ago
·
[ - ]

https://news.ycombinator.com/item?id=46140145

reneberlin
·
1 day ago
·
[ - ]

I can imagine the horror of pressure of the people responsible for resolution. On that scale of impact it is very hard to keep calm - but still the hive of minds have to cooperate and solve the puzzle while the world is basically halted and ready to blame the company you work for.

tin7in
·
1 day ago
·
[ - ]

For us also Digital Ocean, Render, and a few other vendors are down.

At this point picking vendors that don't use Cloudflare in any way becomes the right thing to do.

bigfudge
·
1 day ago
·
[ - ]

Claude was also down (which brought me here)

ianberdin
·
1 day ago
·
[ - ]

I have a 10B idea: Cloudflare that does not fail so often.

biql
·
1 day ago
·
[ - ]

How about: internet that is actually decentralized.

ianberdin
·
1 day ago
·
[ - ]

Yes, on one hand, it was so wonderful. Cloudflare came and said, "Yeah, now we'll save everyone from DDoS, everything's perfect, we'll speed up your site," and bam, they became a bottleneck for the entire internet. It's some kind of nightmare. Why didn't several other such popular startups appear, into which more money was invested, and which would allow some failure points to be created? I don't understand this. Or at least Cloudflare itself should have had some backup mechanism, so that in case of failures, something still works, even slowly, or at least they could redirect traffic directly, bypassing their proxies. They just didn't do that at all. Something is definitely wrong.

viraptor
·
1 day ago
·
[ - ]

> Why didn't several other such popular startups appear

bunny.net

fastly.com

gcore.com

keycdn.com

Cloudfront

Probably some more I forgot now. CF is not the only option and definitely not the best option.

> Yeah, now we'll save everyone from DDoS, everything's perfect, we'll speed up your site,

... and host the providers selling DDoS services. https://privacy-pc.com/articles/spy-jacking-the-booters.html

ianberdin
·
1 day ago
·
[ - ]

Thank you for sending these alternatives, they look good. And, of course, the most important thing is that Cloudflare is free, while these alternatives cost money. And they cost hundreds of dollars at my traffic volume of tens of terabytes. Of course, I really don't want to pay. So, as they say, mice wept and jabbed, but they kept gnawing on the cactus.

viraptor
·
1 day ago
·
[ - ]

Nothing's free - one day they will come knocking. Better be prepared to serve at an affordable level.

iso1631
·
1 day ago
·
[ - ]

Nobody got fired for choosing clownflare

reddalo
·
1 day ago
·
[ - ]

It exists and it's called Bunny.net

SoKamil
·
1 day ago
·
[ - ]

Looking at their market cap it’s 71.5B idea

theginger
·
1 day ago
·
[ - ]

I don't want to criticize cloud flare, I love what they do and understand the scale of the challenge, but most people don't and 2 in a month or so like this is going to hit their reputation.

The_President
·
15 hours ago
·
[ - ]

After being overly critical of Matrix here the other day, I have come around to another conclusion: talent issues are industry-wide, and it sucks to make a bad hire whose competence doesn’t match the resume.

OtherShrezzing
·
1 day ago
·
[ - ]

The site is back up, but it feels fairly silly that a platform that has inserted itself as a single point of failure has an architecture that's got single points of failure.

The other companies working at that scale have all sensibly split off into geographical regions & product verticals with redundancy & it's rare that "absolutely all of AWS everywhere is offline". This is two total global outages in as many weeks from Cloudflare, and a third "mostly global outage" the week before.

themafia
·
1 day ago
·
[ - ]

Crop monoculture created the potato famine. We failed to learn the larger lesson. "Hyperscale" is inherently dangerous.

ricardo81
·
1 day ago
·
[ - ]

Their uptime over the year is likely faring worse than your average hosting company, DNS provider or CDN.

cryptonym
·
1 day ago
·
[ - ]

Some may experience more downtime due to their outages than they'd have from DDoS.

iso1631
·
1 day ago
·
[ - ]

Their uptime over the year is faring worse than one of my pi holes, let alone the resilient service.

SwedishPerson_A
·
1 day ago
·
[ - ]

https://www.tandfonline.com/doi/full/10.1080/02673843.2023.2... https://www.perplexity.ai/ https://www.researchgate.net/

All give me

"500 Internal Server Error cloudflare.."

So I'm guessing yes.

markaroo
·
1 day ago
·
[ - ]

React Server Components. Let that sink in. What did they do to us? This is a thing that should not exist, causing pain in the real world.

The_President
·
23 hours ago
·
[ - ]

Like a turkey made of tofu. It’s all in the name: react.

nlstitch
·
1 day ago
·
[ - ]

What ever happened to "no deploys on fridays"? haha

kenonet
·
1 day ago
·
[ - ]

haha for real

asim
·
1 day ago
·
[ - ]

I'd like to start seeing the architecture and design of how cloudflare works. Not blog posts, like a whole write-up. If you're going to have this many outages and you're a public company which 2/3rd of US infrastructure probably depends on then it might need some external input. Obviously they know what they're doing. This is not a blame game but the tools are starting to creak.

f311a
·
1 day ago
·
[ - ]

> A change made to how Cloudflare's Web Application Firewall parses requests caused Cloudflare's network to be unavailable for several minutes this morning. This was not an attack; the change was deployed by our team to help mitigate the industry-wide vulnerability disclosed this week in React Server Components. We will share more information as we have it today.

cherioo
·
1 day ago
·
[ - ]

Where’s the source for this?

It doesn’t look good when similar WAF issues caused their big outage a few years back.

xomiachuna
·
1 day ago
·
[ - ]

https://www.cloudflarestatus.com/incidents/lfrm31y6sw9q

capnsketch
·
1 day ago
·
[ - ]

If I had a nickel for every time Cloudflare went down, I would have 2 nickels, which is not a lot, but it's still weird that it happened twice.

cryptonym
·
1 day ago
·
[ - ]

You would have 2 nickels, this week.

It also went down multiple times in the past; not to say that's bad, everyone does from time to time.

TheGilDev
·
1 day ago
·
[ - ]

I’m still glad they’re here to provide great services and help secure the internet for lots of us!

chistev
·
1 day ago
·
[ - ]

It's really cool to me that this site is never down with all these outages of major websites.

Representative of having the best developers behind it.

ilaksh
·
1 day ago
·
[ - ]

They just don't use Cloudflare.

chistev
·
1 day ago
·
[ - ]

How do they handle DDoS?

dlillard0
·
1 day ago
·
[ - ]

[dead]

cv_h
·
1 day ago
·
[ - ]

Free accounts seem to be fine, only enterprise accounts seem to be affected.

doublerabbit
·
1 day ago
·
[ - ]

Irony.

xomiachuna
·
1 day ago
·
[ - ]

And so it seems that the cause is close to RSC vulnerability from yesterday: https://www.cloudflarestatus.com/incidents/lfrm31y6sw9q

So much for the react being just a frontend library, amirite

rco8786
·
1 day ago
·
[ - ]

React server components initially came out 5ish years ago, for whatever that's worth.

ianberdin
·
1 day ago
·
[ - ]

So, if I understand correctly, all websites and services want protection from DDoS attacks, and that's basically their number one concern. The second is caching in different parts of the world. So, it's caching and DDoS. But at the same time, nobody wants to use CloudFront from AWS because it’s not that simple yet. And it’s more expensive, while Cloudflare is free. So, what should we do about all this? This won’t do. We’ve created a gigantic bottleneck that controls the entire internet, just like in the movie Mad Max, where he controlled the only source of water. That’s wrong. And we all fell for it like fools. So, the question is, what can be done in this situation? Are there reliable competitors? Are there any fault-tolerant systems for this? The whole problem is that our DNS points at Cloudflare, and they proxy the traffic. So, if their proxy goes down, everything falls apart. What should we do about this?

sammy2255
·
1 day ago
·
[ - ]

Nobody is being forced to use Cloudflare

ianberdin
·
1 day ago
·
[ - ]

Yes, that is absolutely correct: no one forced it. They just provided an excellent solution for free, and consequently the whole internet has gotten hooked on it. As they say, free cocaine causes harm. So, what are the alternatives? What options are there to protect against DDoS attacks and to make a website quickly accessible from different parts of the world? And at the same time, without paying a sky-high price for it.

ectospheno
·
22 hours ago
·
[ - ]

> So, what are the alternatives?

That sums up my gripe with the vocal cloudflare haters. They will tell you all day long to move but every solution they push costs more time and money.

nedt
·
1 day ago
·
[ - ]

Everyone trying to access a site behind Cloudflare is forced.

sammy2255
·
1 day ago
·
[ - ]

Then you make a complaint to the company whose site you cannot access...

chaidhat
·
1 day ago
·
[ - ]

Someone should make an open source system that lets you easily host containers so that if one fails, we can easily switch over across providers. Like Vercel AI SDK but for containers. That is, if Docker isn't failing (it is right now because it depends on Cloudflare)

drexlspivey
·
1 day ago
·
[ - ]

Who is we? You are free to stop using their service

6031769
·
1 day ago
·
[ - ]

Let your hoster take care of the DDoS and stop using the flaky behemoth.

You haven't actually watched Mad Max, have you? I do recommend it.

ThalesX
·
1 day ago
·
[ - ]

I just started getting npm errors while developing something; I was like hmm, strange... then I tried to go down to isitdown. That was also down. I was like, oh this must be something local to me (I'm in a remote place visiting my gramps).

Then I go to Hacker News to check. Lo and behold, it's Cloudflare. This is sort of worrying...

bilekas
·
1 day ago
·
[ - ]

This is painful, if I'm not mistaken this is during a scheduled maintenance too ?

Whenever I deploy a new release to my 5 customers, I am pedantic about having a fast rollback.. Maybe I'm not following the apparent industry standard and instead should just wing it.
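
A minimal sketch of what "having a fast rollback" can look like, assuming the deploy tooling exposes some way to activate a version and some health check. Both `activate` and `is_healthy` are hypothetical callables passed in by the caller, not any real tool's API.

```python
from typing import Callable


def deploy_with_rollback(
    new_version: str,
    previous_version: str,
    activate: Callable[[str], None],
    is_healthy: Callable[[], bool],
) -> str:
    """Activate new_version; if the health check fails, re-activate previous_version.

    Returns the version that is live when the function returns.
    `activate` and `is_healthy` are placeholders for whatever your tooling provides.
    """
    activate(new_version)
    if is_healthy():
        return new_version
    # Health check failed: go straight back to the last known-good version.
    activate(previous_version)
    return previous_version
```

The point of the shape is that the previous version is named before the deploy starts, so rolling back is a single call rather than an investigation.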

zwnow
·
1 day ago
·
[ - ]

Let AI wing it instead.

sega_sai
·
20 hours ago
·
[ - ]

After reading the post, my personal takeaway (not being expert) is that there are simply so many moving parts/configuration options that the complexity of the whole system is too high. I think without some sort of formal validation/enumeration of all possible states of the system the reliability is simply not possible. Whether this formal verification is achievable, I don't know.
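
To make "enumeration of all possible states" concrete, here is a toy sketch. The option names and the invariant are invented for illustration and have nothing to do with Cloudflare's real configuration; it only shows the brute-force idea.

```python
from itertools import product
from typing import Any, Callable, Dict, List


def find_invalid_states(
    options: Dict[str, List[Any]],
    invariant: Callable[[Dict[str, Any]], bool],
) -> List[Dict[str, Any]]:
    """Enumerate every combination of option values and return those violating the invariant."""
    names = list(options)
    bad = []
    for values in product(*(options[name] for name in names)):
        state = dict(zip(names, values))
        if not invariant(state):
            bad.append(state)
    return bad


# Hypothetical options: 2 * 2 * 2 = 8 states in total.
OPTIONS = {
    "mode": ["legacy", "current"],
    "feature_a": [True, False],
    "feature_b": [True, False],
}


def example_invariant(state: Dict[str, Any]) -> bool:
    # Made-up rule: in legacy mode, feature_a and feature_b must not both be on.
    return not (state["mode"] == "legacy" and state["feature_a"] and state["feature_b"])


invalid = find_invalid_states(OPTIONS, example_invariant)
```

The catch is the growth rate: the number of states is the product of the option counts, so a few dozen independent switches already rule out exhaustive enumeration, which is presumably why it is unclear whether this is achievable at that scale.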

dev0p
·
1 day ago
·
[ - ]

Isn't it happening a little too often now? Did someone .unwrap in production again?

erikbye
·
1 day ago
·
[ - ]

This is getting embarrassing.

segev608
·
1 day ago
·
[ - ]

Luckily https://downdetectorsdowndetectorsdowndetectorsdowndetector.... is up :)

arjie
·
1 day ago
·
[ - ]

How interesting. As of 00:30 or so I could still access Claude but then it went down with a 500 from Cloudflare and I thought I'd nab a quick something off Slickdeals but that's down too. My own blog is on Cloudflare's `cloudflared` tunnel and it's working just fine, even the cache, so it must be something hitting some specific type of configuration or some shard hitting some region.

And they're back before I finished the comment. Such a pity, I was hoping to hog some more Claude for myself through Claude Code.

robotfelix
·
1 day ago
·
[ - ]

Our site is fine, including files served by Cloudflare's CDN and Cloudflare Workers, but the Cloudflare dashboard is definitely down.

The Cloudflare status page says that it's the dashboard and Cloudflare APIs that are down. I wonder if the problem is focused on larger sites because they are more dependent on / integrated with Cloudflare APIs. Or perhaps it's only an Enterprise tier feature that's broken.

If it's not everything that is down, I guess things are slightly more resilient than last time?

piker
·
1 day ago
·
[ - ]

At least the 500 error announces ownership.

Imagine how productive we'll be now!

jazzyjackson
·
1 day ago
·
[ - ]

Is it at all achievable to be fronted by a CDN but fallback to the raw server in case the front falls off? Better to be vulnerable to DDoS than be unreachable altogether
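
A rough sketch of the fallback idea, under the assumption that you control the client (for example an app or an API consumer) and can give it both a CDN-fronted URL and a direct origin URL. It does not help a browser that only knows the proxied hostname. The `fetch` callable and the URLs in the usage note are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Sequence


def fetch_with_fallback(
    urls: Sequence[str],
    fetch: Callable[[str], tuple[int, bytes]],
) -> tuple[str, bytes]:
    """Try each URL in order and return (url, body) for the first usable response.

    A response counts as usable if the request did not raise OSError
    (connection refused, timeout, ...) and the status code is below 500.
    """
    last_error: Exception | None = None
    for url in urls:
        try:
            status, body = fetch(url)
        except OSError as exc:
            last_error = exc
            continue
        if status < 500:
            return url, body
        last_error = RuntimeError(f"{url} returned HTTP {status}")
    raise RuntimeError("all endpoints failed") from last_error
```

Usage would be something like `fetch_with_fallback(["https://cdn.example.com/data", "https://origin.example.com/data"], my_fetch)`. The trade-off is that the second URL has to be publicly reachable, so the origin loses the DDoS protection the CDN was there to provide.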

koolba
·
1 day ago
·
[ - ]

With CloudFlare specifically probably not. IIRC, they require DNS resolution of your domain to operate so if they’re down, I don’t see how you’d change it to route directly to the underlying site.

Even if you could, having two sets of TLS termination is going to be a pain as well.

calyhre
·
1 day ago
·
[ - ]

But then you end up potentially exposing the origin server. This could be an opt-in option though

yoctosec
·
1 day ago
·
[ - ]

I use Cloudflare because of their Tunnel to protect my Raspberry Pi, but I think I will just use it without the Tunnel now. My main concern is privacy, but I'm not ready to accept such frequent downtime and dependence on them. The whole reason to self-host was to be independent anyway. Does anyone have a recommendation for that (that is free)? Should I worry about privacy? My name and my city are on the website anyway.

The_President
·
23 hours ago
·
[ - ]

You can basically do this yourself without punching holes out to the public. Create VPN for Pi and clients and access from the same private network.

DocJade
·
1 day ago
·
[ - ]

my tunnels are still working, oddly

yoctosec
·
1 day ago
·
[ - ]

Now mine works again too, I guess it was a short outage

runeb
·
1 day ago
·
[ - ]

Check out Tailscale

yoctosec
·
1 day ago
·
[ - ]

And what about a website I want to make public? I'm just concerned about my IP being visible, like for my personal website or my searxng instance

runeb
·
20 hours ago
·
[ - ]

Tailscale Funnel, but might need a paid account

iso1631
·
1 day ago
·
[ - ]

Personally I'd just proxy it through a VM running on Hetzner, Linode, Rackspace, etc.

unixfox
·
1 day ago
·
[ - ]

Tailscale's control plane uses Cloudflare.

runeb
·
20 hours ago
·
[ - ]

Thanks, I did not know this. My Tailscale was unaffected by the outage.

SwedishPerson_A
·
1 day ago
·
[ - ]

https://www.researchgate.net/ https://www.tandfonline.com/ https://www.perplexity.ai/ All give me "500 Internal Server Error cloudflare"

So my guess is yes, it's down.

aroman
·
1 day ago
·
[ - ]

looks like a big one. interestingly, our site, which uses a TON of Cloudflare services[0] — yet not their front-line proxy — is doing fine: https://magicgarden.gg.

So it seems like it's just the big ol' "throw this big orange reverse proxy in front of your site for better uptime!" is what's broken...

[0] Workers, Durable Objects, KV, R2, etc

reassess_blind
·
1 day ago
·
[ - ]

My sites that use their main proxy are seemingly up and working? Could be a regional PoP issue.

bpye
·
1 day ago
·
[ - ]

Moving off of Cloudflare for my personal domain is on my todo list for the holidays...

Ueland
·
1 day ago
·
[ - ]

Interestingly enough, also some MS/Azure services are down. For example https://www.office.com/ just returns:

> We are sorry, something went wrong.

> Please try refreshing the page in a few minutes. If the problem persists, please visit status.cloud.microsoft for updates regarding known issues.

The status page of course says nothing

codeisforever
·
1 day ago
·
[ - ]

Seems all of Shopify.com is down. Every store

GeertVL
·
1 day ago
·
[ - ]

Linkedin -> the same

nikanj
·
1 day ago
·
[ - ]

For me Linkedin returns the 500 cloudflare error

arunaugustine
·
1 day ago
·
[ - ]

They had scheduled maintenance between 7am and 11am UTC in Chicago. But that should have re-routed traffic, not taken down the internet, right?

PrayagS
·
1 day ago
·
[ - ]

I'm in India and we're affected as well.

J4PJ1T
·
1 day ago
·
[ - ]

Oceania here gang and i think that it is a global issue

xingwu
·
1 day ago
·
[ - ]

Classic. https://imgur.com/a/B3QxB1R

alextingle
·
1 day ago
·
[ - ]

"Content not available in your region."

Please avoid Imgur.

sebzim4500
·
1 day ago
·
[ - ]

Use a vpn or avoid the UK

JeremyJaydan
·
1 day ago
·
[ - ]

I moved away from Cloudflare over a month ago because I didn't understand how they don't have pricing caps for their upgraded plans, they genuinely seem like the mob but I haven't looked any further into it..

Either way it's been interesting to see the bullets I've been dodging.

jjude
·
1 day ago
·
[ - ]

What service(s) are you using now? What did you move to?

JeremyJaydan
·
1 day ago
·
[ - ]

A small one (afaik) in a location that I wanted in the US [1]. I'm not running a bank so I'd prefer to just go down if I'm ever attacked.

[1] https://shifthosting.com/

reassess_blind
·
1 day ago
·
[ - ]

The "half the internet is down, nothing we can do" excuse works great the first time, but doesn't fly the second time in a month.

What solutions are there for Multi DNS/CDN failover that don't rely on a single point of failure?

testemailfordg2
·
22 hours ago
·
[ - ]

Looks like this post somehow is not even on the front page of HN anymore. CF pulling some strings maybe, they don't have this incident on top of their current list.

Artur-Defences
·
1 day ago
·
[ - ]

"Scheduled maintenance is currently in progress" I imagine the maintenance was conducted like this: "fix detroit data center bugs, please be very careful, don't mess up like last time :)" bypass permissions on

elijah3040448
·
1 day ago
·
[ - ]

Pretty awkward. Thought my Wi-Fi was acting up when I wasn't even able to pull up the Cloudflare website to see if something was down. Then I tried Downdetector, and that wasn't working either.

m078
·
1 day ago
·
[ - ]

Cloudflare is investigating issues with Cloudflare Dashboard and related APIs.

scirob
·
1 day ago
·
[ - ]

Wonder if supabase auth down is also related https://status.supabase.com/incidents/rgz3dl2rcmq8

zwnow
·
1 day ago
·
[ - ]

Will it be down for 10 days again? Who knows. Would've stopped using it after the first 10 day outage anyway.

nabla9
·
1 day ago
·
[ - ]

It's a configuration error, or related to configuration. It always is with these big things.

Nice thing about Cloudflare being down is that almost everything is down at once. Time for peace and quiet.

norskeld
·
1 day ago
·
[ - ]

Damn, I wish CloudFlare being down also affected local development, so I could take a break from doing frontend… :'(

countWSS
·
1 day ago
·
[ - ]

Everything I use depends on Cloudflare operating perfectly; practically 99% of these services go down. What magical qualities does it have that no competitors emerge for its services?

imperfectfourth
·
1 day ago
·
[ - ]

downdetector is also down

maxlin
·
1 day ago
·
[ - ]

it being the first google result and serving the exact same error as the pages one is trying to get info from is too funny

r721
·
1 day ago
·
[ - ]

Just after Matthew Prince's interview at Wired :)

https://news.ycombinator.com/item?id=46157295

meindnoch
·
1 day ago
·
[ - ]

Maybe they should stop vibe coding and vibe reviewing their PRs?

ednevsky
·
1 day ago
·
[ - ]

Notion is also down (haven't seen a comment on that). It's so funny how the biggest companies literally just have their sites not loading because of Cloudflare.

udarij
·
1 day ago
·
[ - ]

It's OK to fail, but the most frustrating thing ever is that there's no contact point or support team that is easily and directly accessible. This is bad.

c16
·
1 day ago
·
[ - ]

CloudFlare: You can't go down if you're never up.

NKosmatos
·
1 day ago
·
[ - ]

LOL, 500 returned for many big sites…this is going to hurt and make people rethink. If it’s not DNS, then someone pushed to production on Friday :-)

rgun
·
1 day ago
·
[ - ]

https://registry.npmjs.org/ is down, affecting our builds

CGamesPlay
·
1 day ago
·
[ - ]

So is https://hub.docker.com which is why I am here and not doing useful work.

import
·
1 day ago
·
[ - ]

Fraudulent status page again. For sure we are seeing something very different than what they see in the internal monitoring systems.

udarij
·
1 day ago
·
[ - ]

It's OK to fail, but the most frustrating thing is that there's no support team or contact point directly accessible. This is bad.

virtualritz
·
1 day ago
·
[ - ]

Yeah, and because of this, Claude Code for example is down too, because the auth goes through CF. F*cking marvelous, the decentralized web ...

headmelted
·
1 day ago
·
[ - ]

"In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary. Dec 05, 2025 - 07:00 UTC"

No need. Yikes.

BluSyn
·
1 day ago
·
[ - ]

Perhaps related? My main fiber WAN went out few hrs ago, failing over to Starlink backup. Discovered it’s a cloudflare issue, as my multi-wan setup tests against 1.1.1.1, which suddenly stopped responding (but only from my fiber ISP). Switched to testing 8.8.8.8 to restore.

If it weren’t for recent cloudflare outages, never would have considered this was the problem.

Even until I saw this, I assumed it was an ISP issue, since Starlink still worked using 1.1.1.1. Now I’m thinking it’s a cloudflare routing problem?
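
For anyone with a similar multi-WAN setup, a sketch of the "don't trust a single probe target" idea: treat the link as up if at least one of several independent targets answers. This assumes a TCP connect to port 53 is an acceptable probe (many routers use ICMP instead), and it leaves out binding the probe to a specific WAN interface, which a real failover check would need.

```python
import socket
from typing import Callable, Iterable


def tcp_probe(host: str, port: int = 53, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def link_is_up(
    targets: Iterable[str],
    probe: Callable[[str], bool] = tcp_probe,
    min_ok: int = 1,
) -> bool:
    """Declare the link up if at least `min_ok` targets respond to the probe."""
    ok = sum(1 for target in targets if probe(target))
    return ok >= min_ok
```

With `link_is_up(["1.1.1.1", "8.8.8.8"])`, one resolver becoming unreachable over a given ISP no longer looks like that ISP's link going down.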

chaidhat
·
1 day ago
·
[ - ]

For those saying we have an over-reliance on software -- is there a way to use multiple CDNs for the same frontend website?

jonathanlydall
·
1 day ago
·
[ - ]

It seems regular reverse proxying and R2 still works, as we use those and seem to be working fine still.

Can't get to the Dashboard though.

chessmaster-hex
·
1 day ago
·
[ - ]

Some big fishes were affected as well... Crunchyroll, Fortnite, LinkedIn let's wait for the explanation of this one.

techguy1954
·
1 day ago
·
[ - ]

I can still visit some websites that use Cloudflare, but others don't work.

Blender Artists works, but DownDetector and Quillbot don't.

SherryWong
·
1 day ago
·
[ - ]

LinkedIn and Medium are also down as a result

igleria
·
1 day ago
·
[ - ]

Heads will roll at Cloudflare. E-commerce customers must be furious.

Impossible not to feel bad for whoever is tasked with cleaning up the mess.

zppln
·
1 day ago
·
[ - ]

Especially around christmas. I was about to buy a pair of Birkenstocks. Nope, site is down. Went on to buy a microphone holder, nope, that site is down as well. :) Sure, I'll still get around to it eventually.

MildlySerious
·
1 day ago
·
[ - ]

I can't update DNS entries for my domains with Porkbun, because it's "Powered by Cloudflare".

MarcelGerber
·
1 day ago
·
[ - ]

Just started working for me again (in Germany), both on our own CF-hosted page and on cloudflare.com itself.

Towaway69
·
1 day ago
·
[ - ]

for me docker is failing with:

    unknown: failed to copy: httpReadSeeker: failed open: unexpected status from GET request to https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/....

so coffee time.

hnarturpl
·
1 day ago
·
[ - ]

We use workers and dns proxy and I got flooded with pages. We were getting 503s from workers.

dynamite-ready
·
1 day ago
·
[ - ]

Some of the sites I maintain, are fine. But I'm guessing it's just a matter of time?

skylurk
·
1 day ago
·
[ - ]

> Monitoring - A fix has been implemented and we are monitoring the results.

> Dec 05, 2025 - 09:12 UTC

MrAureliusR
·
1 day ago
·
[ - ]

Yeah, cloudflare.com is working and the website that first clued me in to the outage (chess.com) is also working.

matt3210
·
1 day ago
·
[ - ]

Everyone says vibe coding but people are just fine at being incompetent without the AI help

koolba
·
1 day ago
·
[ - ]

Sure, but with AI we can automate that incompetence.

nickdothutton
·
1 day ago
·
[ - ]

So many outages now that they all begin to blur into one. What's that, 3 or 4 this quarter?

valdemarrolfsen
·
1 day ago
·
[ - ]

No engineers from Cloudflare reading hackernews these days? Should update your status page!

isaac3307
·
1 day ago
·
[ - ]

This is so cool guys. All of us get to lose millions of dollars together so late at night!

LucasLanglois
·
1 day ago
·
[ - ]

Love that Cloudflare put together a participative and community-driven advent calendar!

chinathrow
·
1 day ago
·
[ - ]

Looks like (some) sites behind Cloudflare still work if they do not have caching on.

jonathanlydall
·
1 day ago
·
[ - ]

It's not simply about caching as we have CDN and reverse proxying which are still running without issue.

DirkH
·
19 hours ago
·
[ - ]

I feel like all the BS we were taught about architecture design principles (multi-AZ, failover strategies, graceful degradation, etc.) was gaslighting us all into thinking any of our work on it actually matters.

This isn't true, but it feels like this when the entire engineering world order seems to actually run on single-point-of-failures where one CEO just messages another when some 3rd party is down. And reputational risk here is completely safeguarded because as long as everyone is down you are fine. Use a service everyone uses and it goes down = no reputational risk. Use a more robust architecture and make some mistake = massive reputational risk and everyone asks why you don't use what everyone else uses.

Blind leading the blind and all that.

maverick98
·
1 day ago
·
[ - ]

The quality of code in most apps has gone downhill. Let's guess why?

polaris64
·
1 day ago
·
[ - ]

DownDetector'sDownDetector does not detect that DownDetector's down

tdrz
·
1 day ago
·
[ - ]

I'm looking for cofounders and investors to build a working cloudflare.

alxbenjamin
·
1 day ago
·
[ - ]

It is up again. There will be a lot of hard talk with Cloudflare, I guess

8cvor6j844qw_d6
·
1 day ago
·
[ - ]

Interested if its the same issue that brought down Cloudflare previously.

reustle
·
1 day ago
·
[ - ]

Maybe centralizing the internet wasn’t a great idea after all, huh

moralestapia
·
1 day ago
·
[ - ]

Ooof, this one looks like a big one!

canva.com

chess.com

claude.com

coinbase.com

kraken.com

linkedin.com

medium.com

notion.so

npmjs.com

shopify.com (!)

and many more I won't add bc I don't want to be spammy.

Edit: Just checked all my websites hosted there (~12), they're all ok. Other people with small websites are doing well.

Only huge sites seem to be down. Perhaps they deal with them separately, the premium-tier of Cloudflare clients, ... and those went down, dang.

reddalo
·
1 day ago
·
[ - ]

My small websites are also up. I wonder if they're going to go down soon, or if we're safe.

otherme123
·
1 day ago
·
[ - ]

readthedocs down is hurting me the most. My small websites are doing OK.

shultays
·
1 day ago
·
[ - ]

zoom

digiajay
·
1 day ago
·
[ - ]

Basecamp was down couple of minutes ago and it's back now online.

nicolailolansen
·
1 day ago
·
[ - ]

They had a few good weeks.

kaliqt
·
1 day ago
·
[ - ]

NPM is down as a result.

chokominto
·
1 day ago
·
[ - ]

Craaazzzyy

kvam
·
1 day ago
·
[ - ]

Infrastructure-as-Vibe?

max23_
·
1 day ago
·
[ - ]

It is up for me.

All the sites that were 500 error before are able to load now.

wildcard1210
·
1 day ago
·
[ - ]

My Shopify store is down. My competitor stores are also down.

sammy2255
·
1 day ago
·
[ - ]

500 internal server error on most things:

500 Internal Server Error cloudflare

neo_tokyo
·
1 day ago
·
[ - ]

Someone's been vibe coding the scheduled maintenance.

grim_io
·
1 day ago
·
[ - ]

I wonder how many uptime SLAs will be violated this year.

Hashversion
·
1 day ago
·
[ - ]

How long does cloudflarestatus.com usually take to detect it?

ianberdin
·
1 day ago
·
[ - ]

NPM is down. World is collapsing thanks to Cloudflare.

atraac
·
1 day ago
·
[ - ]

All those enterprise architects must be fuming now

monstertank
·
1 day ago
·
[ - ]

How do you tell if this is a cyberattack or not?

nekkooo2e
·
1 day ago
·
[ - ]

Perplexity AI shows 500 Internal Server Error

aurareturn
·
1 day ago
·
[ - ]

My company's services went down as well.

neonnbits
·
1 day ago
·
[ - ]

i was watching the climax of "Fullmetal Alchemist: Brotherhood" on crunchyroll and cloudflare went down again

Vivianfromtheo
·
1 day ago
·
[ - ]

This got me and the anime community stressed

matt3210
·
1 day ago
·
[ - ]

Ooof status 500 someone’s getting fiiiiired!

vimwizard
·
1 day ago
·
[ - ]

seems related to CF tunnels... policies are being enforced but perhaps origin servers are not being properly served.

arunaugustine
·
1 day ago
·
[ - ]

Shopify is down.

nrhrjrjrjtntbt
·
1 day ago
·
[ - ]

Not as down as last time. My site is up.

hnarn
·
1 day ago
·
[ - ]

One has to wonder how many times or how often proprietary cloud services have to go down before there is a general shift away from using the cloud and "infinite scaling" for everything. For many, many use cases you need neither Cloudflare nor GitHub nor nine nines for everything (which you are clearly not getting anyway). Once a year is obviously not enough for most businesses, nor perhaps once a month. Weekly outages? For how long?

If you host something that actually matters and that other people depend upon, please review your actual needs and, if possible, stop making yourself _completely_ dependent on giant cloud corporations.
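
For a sense of scale on the "nines", here is the arithmetic as a small sketch, assuming a 365-day year and counting all downtime against the budget:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds, ignoring leap years


def allowed_downtime_seconds(nines: int) -> float:
    """Yearly downtime budget for an availability of N nines (3 -> 99.9%)."""
    unavailability = 10 ** (-nines)
    return SECONDS_PER_YEAR * unavailability
```

Three nines allows about 8.76 hours of downtime per year, five nines about 5.26 minutes, and nine nines roughly 0.03 seconds. A single outage of several minutes already uses up a five-nines budget for the year.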

nish__
·
1 day ago
·
[ - ]

Just in time for the London work day :)

yread
·
1 day ago
·
[ - ]

Hah even Linkedin is showing 500 for me

lousken
·
14 hours ago
·
[ - ]

Cloudflare 362

samwreww
·
1 day ago
·
[ - ]

claude.ai is down bc of it :( good for OpenAI as they're using something else maybe Vercel?

starkindustries
·
1 day ago
·
[ - ]

Never push to production on Friday!

jjdinho
·
1 day ago
·
[ - ]

Godspeed, Cloudflare, for the fix

Titan2189
·
1 day ago
·
[ - ]

They're back online it seems

sharts
·
1 day ago
·
[ - ]

HaHa -Nelson

b_bloch
·
1 day ago
·
[ - ]

That's quite unfortunate xD

Artur-Defences
·
1 day ago
·
[ - ]

This is not a good look, at all

blackhaz
·
1 day ago
·
[ - ]

Anyone shorting the damn stock?

Andugal
·
1 day ago
·
[ - ]

Notion is also down as a result

3xstphvs
·
1 day ago
·
[ - ]

aw, i cant go on rateyourmusic

CafeRacer
·
1 day ago
·
[ - ]

Even digital ocean is down :D

odie5533
·
1 day ago
·
[ - ]

How is Hacker News still up?

sunbum
·
1 day ago
·
[ - ]

Because it doesn't use cloudflare duh.?

PrayagS
·
1 day ago
·
[ - ]

From their response headers, it seems like the request is coming from NGINX directly. How do they defend themselves against DOS attacks?

sunbum
·
1 day ago
·
[ - ]

Big server. And if it goes down it goes down? Who cares, it's hackernews.

otherme123
·
1 day ago
·
[ - ]

I have a handful of sites DNS/NS through Cloudflare, with their certificates, and they are working OK.

grundrausch3n
·
1 day ago
·
[ - ]

I thought they are running classic FreeBSD servers like in ye olde times.

dinoqqq
·
1 day ago
·
[ - ]

LinkedIn, Perplexity as well

maxlin
·
1 day ago
·
[ - ]

> Go to <social media page> - 500 error from cloudflare

> Google "is <social media page> down" -> click first link - literally the exact same 500 cloudflare error html from downdetector

I thought we were meant to learn something ... ?

isaac3307
·
1 day ago
·
[ - ]

This is so cool guys!!! We all get to savor this moment and lose millions of DOLL HAIRS together!!

isaac3307
·
1 day ago
·
[ - ]

I love being here with you guys!!!

lionkor
·
1 day ago
·
[ - ]

eval(requestBody).unwrap()

elcapithanos
·
1 day ago
·
[ - ]

Shortest damn outage ever

ojm
·
1 day ago
·
[ - ]

Turnstile seems up still.

tippa123
·
1 day ago
·
[ - ]

Curious to see which big companies were caught flat-footed during the 18 November outage compared with today. In my opinion, if a company was caught out twice, that reflects poor decision-making and urgency. As the saying goes, fool me once, shame on you, fool me twice, shame on me.

If a company was able to overcome all the red tape within three weeks and not be impacted today, that's impressive.

Hashversion
·
1 day ago
·
[ - ]

what's the estimated loss? any guesses or estimations?

dodos
·
1 day ago
·
[ - ]

Looks to be back now.

pzs
·
1 day ago
·
[ - ]

Just experienced this and came here to check, because even their website is down. The referenced link also returns with 500.

SCdF
·
1 day ago
·
[ - ]

Really disappointed that down detectors down detector[1] isn't detecting that down detector[2] is down

[1] https://downdetectorsdowndetector.com/

[2] https://downdetector.com/

ccdragon
·
1 day ago
·
[ - ]

[dead]

JustSkyfall
·
1 day ago
·
[ - ]

Seems to be back up?

decimalenough
·
1 day ago
·
[ - ]

Seems to be back up.

ammo1662
·
1 day ago
·
[ - ]

"Given Cloudflare's importance in the Internet ecosystem any outage of any of our systems is unacceptable. "

Is this a joke?

And their blog of above statement is also down:

https://blog.cloudflare.com/18-november-2025-outage/

nekkooo2e
·
1 day ago
·
[ - ]

Perplexity is down

epolanski
·
1 day ago
·
[ - ]

I can absolutely accomplish nothing today...can't download npm packages, cannot login to services.

I've been a Cloudflare fan for the longest time, but the more they grow the more they look like the weak link of the internet. This is the second major outage in less than a few weeks. Terrible.

sandruso
·
1 day ago
·
[ - ]

it's back on

but wow, it must be stressful to deal with this

pprotas
·
1 day ago
·
[ - ]

What a joke of a company. They have the internet in the palm of their hands, and yet let vibe coding ambitions ruin their empire.

Time for everyone to drop this company and move on to better solutions (until those better solutions rot from the inside out, just like their predecessor did)

hax0r1338
·
1 day ago
·
[ - ]

This has got to be an attack; no way it's a configuration error again.

jonathrg
·
1 day ago
·
[ - ]

Why not? They have been proudly vibe coding for a while.

grimblee
·
1 day ago
·
[ - ]

Always has been

jondot
·
1 day ago
·
[ - ]

LinkedIn is down

CodinM
·
1 day ago
·
[ - ]

came here for this thx

paweladamczuk
·
1 day ago
·
[ - ]

I noticed this when my Claude iPhone app stopped working.

domysee
·
1 day ago
·
[ - ]

It's back!

chinathrow
·
1 day ago
·
[ - ]

It's back.

StrLght
·
1 day ago
·
[ - ]

And it's on Friday again — never change, Cloudflare.

Gentle reminder that every affected company brought it upon themselves. Very few companies care about making their system resilient to 3rd party failures. This is just another wake-up call for them.

someothherguyy
·
1 day ago
·
[ - ]

for all of 20 minutes, the world cried.

mercurialsolo
·
1 day ago
·
[ - ]

As is supabase

Hashversion
·
1 day ago
·
[ - ]

cloudflare pages seems to be working!

LeonenTheDK
·
1 day ago
·
[ - ]

Nice, just got woken up by my outage alarms, just for it to be Cloudflare again. At least it's _my_ problem!

But my goodness, they're really struggling over the last couple weeks... Can't wait to read the next blog post.

koakuma-chan
·
1 day ago
·
[ - ]

Outage alarms?

bytejanitor
·
1 day ago
·
[ - ]

gitlab.com hasn't noticed yet.

alex_suzuki
·
1 day ago
·
[ - ]

it has now, for me. can't access web UI (SaaS, not self-hosted, obviously)

lrvick
·
1 day ago
·
[ - ]

I was just arguing with coworkers yesterday that I would quit tech before helping centralize any more of the internet on Cloudflare as a massive single point of failure.

Thank you, Cloudflare, for again proving my point.

justmarc
·
1 day ago
·
[ - ]

Obligatory song https://www.youtube.com/watch?v=OC06Z6lCB_Q&t=30s

nromiun
·
1 day ago
·
[ - ]

I wonder if it is another bug, like unwrap, in their rewritten code.

Also, I don't think every one of their services was affected. I am using their proxy and Pages services and both are still up.

pech0rin
·
1 day ago
·
[ - ]

Rewriting in Rust is paying dividends.

w4zz
·
1 day ago
·
[ - ]

Up again!

jachee
·
1 day ago
·
[ - ]

Update title to “Tell HN: Cloudflare was down”

maxlin
·
1 day ago
·
[ - ]

>half internet down

>first "is site down" result (downdetector) down

>downdetectorsdowndetector.com: "everything is fine"

>downdetectorsdowndetectorsdowndetector.com: not even responding

>downdetectorsdowndetectorsdowndetectorsdowndetector.com: "everything is broken"

divanvisagie
·
1 day ago
·
[ - ]

Wishbone12

manupati
·
1 day ago
·
[ - ]

Odd

basisword
·
1 day ago
·
[ - ]

I’m sure everybody learnt their lesson from last month's outage and built in redundancy or stopped relying on Cloudflare.

da_grift_shift
·
1 day ago
·
[ - ]

https://www.cloudflarestatus.com/incidents/hlr9djcf3nyp

>We will be performing scheduled maintenance in ORD (Chicago) datacenter

>Traffic might be re-routed from this location, hence there is a possibility of a slight increase in latency during this maintenance window for end-users in the affected region.

Looks like it's not just Chicago that CF brought down...

yessferatu
·
1 day ago
·
[ - ]

South African here. Down on our side. Huge sites, like our primary news site, are down; medical services, emergency services/information, etc. are all down. It's been like this since 11:00am our time, so about 13 minutes now.

tovej
·
1 day ago
·
[ - ]

Internet-level companies are having more outages recently. Is the exposed surface area increasing or is the quality of service suffering?

csomar
·
1 day ago
·
[ - ]

Interestingly, my site running on workers https://codeinput.com is still functioning. Worth mentioning that I don't use Cloudflare firewall/caching (directly exposed workers)

Geep5
·
1 day ago
·
[ - ]

Claude RIP

Oras
·
1 day ago
·
[ - ]

Went to Ahrefs to check a domain, saw a 500 and came here to check.

I have a few domains on Cloudflare and all of them are working with no issues, so it might not be a global issue.

wildcard1210
·
1 day ago
·
[ - ]

my shopify store is down

donbox
·
1 day ago
·
[ - ]

seems it's back \m/

kinensake
·
1 day ago
·
[ - ]

Every time Cloudflare is down I'm not sure if it's really down or not because most down detector websites use Cloudflare. Lmao

mercurialsolo
·
1 day ago
·
[ - ]

claude code works tho

davidcheungo123
·
1 day ago
·
[ - ]

wtf, cannot work now

andy_ppp
·
1 day ago
·
[ - ]

Just a reminder that every dependency you rely on, both inside your codebase and external services, has a price.

dracotomes
·
1 day ago
·
[ - ]

and it's back

kenonet
·
1 day ago
·
[ - ]

stock going whoops

w4zz
·
1 day ago
·
[ - ]

gitlab down as well

kUdtiHaEX
·
1 day ago
·
[ - ]

Cloudflare just closed down the incident on their status page without any additional explanation. Sigh.

yellow_lead
·
1 day ago
·
[ - ]

Was anyone else woken up by this? My company's service is down too. Considering a move away.

vimwizard
·
1 day ago
·
[ - ]

she's back

Artur-Defences
·
1 day ago
·
[ - ]

"Monitoring - A fix has been implemented and we are monitoring the results."

computersuck
·
1 day ago
·
[ - ]

waaay too soon

dale1110
·
1 day ago
·
[ - ]

Tried to watch anime then realized that cloudflare was down...again. smh

dale1110
·
1 day ago
·
[ - ]

Tried to watch anime and then i realized it was down....again. smh

mercurialsolo
·
1 day ago
·
[ - ]

shopify.com

Dilettante_
·
1 day ago
·
[ - ]

"I warned you about Cloudflare bro!!!! I told you dog!"

yapyap
·
1 day ago
·
[ - ]

seems to have been resolved

vinskabun
·
1 day ago
·
[ - ]

pixiv.net

songtianlun1
·
1 day ago
·
[ - ]

yes...

udev4096
·
1 day ago
·
[ - ]

Clownflare strikes again!

zwnow
·
1 day ago
·
[ - ]

I love it, and we won't learn from this again :-) Looking forward to the 3rd outage in a few weeks.

venturecruelty
·
16 hours ago
·
[ - ]

We should have a weekly thread for it! It can be a fun meetup. Hell, move it to a cafe or a pub or something, and we can use it as a chance to disconnect from the internet and talk to people face to face. Instead of saying "touch grass", we can say "get Cloudflared".

rvz
·
1 day ago
·
[ - ]

Round 2 of Cloudflare outages.

We can now see which companies have failed in their performative systems design interviews.

Looking forward to the post-mortem.

michael-sumner
·
1 day ago
·
[ - ]

It's up now!!! London, UK

DaSilentStorm
·
1 day ago
·
[ - ]

Aaand ... we're back!

sushidev
·
1 day ago
·
[ - ]

Are you serious?

meerab
·
1 day ago
·
[ - ]

It is up now!

dale1110
·
1 day ago
·
[ - ]

You sure?

dale1110
·
1 day ago
·
[ - ]

Just checked. It's up!!

wyboy86110
·
1 day ago
·
[ - ]

nope... order page is still 500

adityashankar
·
1 day ago
·
[ - ]

it's fine now...I believe

33ROC
·
1 day ago
·
[ - ]

[dead]

clark21
·
1 day ago
·
[ - ]

[dead]

elijah3040448
·
1 day ago
·
[ - ]

[dead]

strangeness
·
1 day ago
·
[ - ]

Who knows, maybe it will be because of C or C++ this time. Or something else.

etyhhgfff
·
1 day ago
·
[ - ]

They recently rewrote some of their core components from nginx+LuaJIT to Rust for better perf and latency. I guess there are some bugs in the new codebase.

throwaway48476
·
1 day ago
·
[ - ]

Shipping .unwrap() to prod speaks ill of their engineering culture. Quality is a process not a checkbox.
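
For anyone unfamiliar with the reference, here is a minimal sketch of the difference between calling `.unwrap()` and handling the error. This is an illustration only, not Cloudflare's actual code: the function names and `DEFAULT_LIMIT` are hypothetical, and a string-to-integer parse stands in for whatever input the real service was reading.

```rust
use std::num::ParseIntError;

// Hypothetical fallback value, chosen only for this illustration.
const DEFAULT_LIMIT: u32 = 200;

/// Panics on malformed input: `.unwrap()` turns the parse error into a panic.
fn parse_limit_unwrap(raw: &str) -> u32 {
    raw.trim().parse::<u32>().unwrap()
}

/// Returns the error to the caller, who decides how to react.
fn parse_limit(raw: &str) -> Result<u32, ParseIntError> {
    raw.trim().parse::<u32>()
}

/// Falls back to a known-good default when the input is bad.
fn limit_or_default(raw: &str) -> u32 {
    parse_limit(raw).unwrap_or(DEFAULT_LIMIT)
}
```

With bad input, `parse_limit_unwrap` panics, while `limit_or_default` keeps running on the default. Whether a silent fallback is the right choice depends on the system; the point is only that the failure path is decided on purpose.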

0xfedcafe
·
1 day ago
·
[ - ]

Funny how even safe Rust isn’t able to stop vibecoding without proper validation. And the fact that it's a monopoly isn't so funny anymore.

dkdbejwi383
·
1 day ago
·
[ - ]

There is no language that makes it impossible to have any kind of bug ever. The safety that languages like Rust offer is around memory, not bad configuration or faulty business logic.

lionkor
·
1 day ago
·
[ - ]

Rust is one of the few languages where I found AI to be very well checked. The type system can enforce so many constraints that you do avoid lots of bugs, and the AI will get caught writing shit code.

Of course, vibe coding will always find a way to make something horribly broken but pretty.

nromiun
·
1 day ago
·
[ - ]

I have noticed LLMs tend to generate very verbose code. What an average human might do in 10 LoC, LLMs will stretch that to 50-60 lines. Sometimes with comments on every line. That can make it hard to see those bugs.

0xfedcafe
·
1 day ago
·
[ - ]

Yep, that’s what I wrote. It wasn’t sarcasm.