Here's an interesting question that I haven't seen anyone really engage with yet:

If the nigh monoculture of CrowdStrike didn't exist, and malicious behavior protection wasn't as consistent as a result, would the aggregate harm of multiple smaller bad events occurring over years be above or below the one-shot harm of CrowdStrike's screwup?

Maybe the answer is obvious if you have more context than I do, but to me it doesn't seem so obvious it can be taken for granted one way or the other.

Good question. It's difficult to answer because, without EDR solutions, I think there would probably be more motivation to make sure that compromising a single endpoint couldn't ransomware your whole company. That is, a real "zero trust" architecture rather than the buzzword.

But I've heard the other side of the story from sysadmins who say this is wishful thinking and that it would take an enormous amount of effort to achieve.

I think if you are a small or new company with a disproportionate number of technically skilled employees, and without a lot of legacy Active Directory and shared-drive stuff going on, it's probably a net positive to skip EDR and properly secure everything instead. That isn't most companies, though.

So I guess my answer to your question is "I don't know", and sysadmins seem to disagree strongly about this, with Windows-focused admins tending to favor EDR (for easily explainable reasons) and Linux folks being especially opposed.

Given that Linux admins tend to have less interaction with the lowest-common-denominator end user, I'm actually leaning towards believing the Windows admins have a point that a lot of us on Hacker News may not have the experience to understand.

I realize your question was about CrowdStrike in particular, but I think it's a question that might apply to EDR software in general.

It is an interesting question. I think most people intuitively believe that, say, AWS is more resilient than home-grown services, but when AWS goes down it's very high profile.

That being said, it is probably better for, say, one hospital in a city to go down every month than for all 10 hospitals in the city to go down at the same time once a year.
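To make that trade-off concrete, here is a toy Python model of the hospital example. The counts (10 hospitals, one outage a month vs. all at once yearly) come from the comment; the two harm functions are assumptions for illustration only, not measurements. The sketch shows that the answer to the original question hinges on whether harm grows linearly or faster than linearly with the number of simultaneous outages.

```python
"""Toy model: staggered vs. simultaneous hospital outages over one year."""

HOSPITALS = 10  # hospitals in the city (from the example)
MONTHS = 12     # one year


def staggered_schedule():
    """One hospital is down each month; the other nine keep working."""
    return [1] * MONTHS


def correlated_schedule():
    """All hospitals go down together in one month, none in the others."""
    return [HOSPITALS] + [0] * (MONTHS - 1)


def linear_harm(down):
    """Assumption: harm is proportional to the number of hospitals offline."""
    return down


def convex_harm(down):
    """Assumption: harm grows with the square of hospitals offline, since
    patients can no longer be diverted to a working hospital nearby."""
    return down ** 2


def yearly_harm(schedule, harm):
    """Sum the per-month harm over the whole year."""
    return sum(harm(down) for down in schedule)


if __name__ == "__main__":
    for name, schedule in [("staggered", staggered_schedule()),
                           ("correlated", correlated_schedule())]:
        print(f"{name:>10}: hospital-outage-months={sum(schedule)}, "
              f"worst month={max(schedule)}, "
              f"linear harm={yearly_harm(schedule, linear_harm)}, "
              f"convex harm={yearly_harm(schedule, convex_harm)}")
```

Under linear harm the single correlated event actually looks slightly better (10 vs. 12), while under the convex assumption it is far worse (100 vs. 12). Which curve fits reality is exactly the open question.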

> would the aggregate harm of multiple smaller bad events occurring over years be above or below the one-shot harm of CrowdStrike's screwup?

See US federal and state separation of powers as a laboratory for policy.

The harm is only "aggregate" if nothing is learned from each event.

If lessons are rapidly learned and diffused from each event, then multiple smaller events can reduce local blast radius, immunize others against known failure categories, and increase global resilience.
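A minimal sketch of that argument, under the idealized assumption that once any organization is hit by a failure category, the lesson spreads and nobody is hit by that category again. The category names and the unit harm per incident are hypothetical.

```python
"""Toy model: harm from repeated small incidents, with and without learning."""


def total_harm(events, harm_per_event=1, lessons_diffuse=True):
    """Sum harm over a sequence of incidents.

    events: failure categories in the order they occur (hypothetical labels).
    lessons_diffuse: if True, assume (idealized) that after the first incident
    of a category everyone hardens against it, so repeats cause no harm.
    """
    seen = set()
    total = 0
    for category in events:
        if lessons_diffuse and category in seen:
            continue  # already learned from; this repeat is prevented
        total += harm_per_event
        seen.add(category)
    return total


if __name__ == "__main__":
    events = ["bad_content_update", "expired_certificate",
              "bad_content_update", "bad_content_update",
              "expired_certificate"]
    print("nothing learned:", total_harm(events, lessons_diffuse=False))
    print("lessons diffused:", total_harm(events, lessons_diffuse=True))
```

Real learning is never this clean; the sketch only shows that the "aggregate" harm scales with how many repeats go unprevented (5 incidents' worth with no learning, 2 with perfect diffusion).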

The only lesson ever learned from small events is "that would never happen here". Until it does.

Maybe smaller events can be tracked with an incident ID and analysis to forestall bigger events.

https://news.ycombinator.com/item?id=41018029

gruez · 1 month ago
>If the nigh monoculture of CrowdStrike didn't exist

What "monoculture"? They're allegedly the #1 provider in the EDR space and they only control 17.7% of the market.

https://www.crowdstrike.com/resources/reports/idc-worldwide-...

pxc · 1 month ago
Not speaking for GP or the way they meant to use the language, but a monoculture doesn't have to be a monopoly. It just has to be the only crop you grow on a given field, or in this case, the only EDR tool in use at a given company.

The fact that this took out whole clusters at some companies, or nearly all DCs at some companies, is part of what made recovery so hard for the companies hit by it. That is the metaphorical 'monoculture', not CrowdStrike's market position.

The answer is clear, as many companies don't run CrowdStrike :-)

Even before the outage, I had zero confidence in CrowdStrike, and I was surprised that seemingly competent organizations would adopt it. It seems like an industry failure.

Same for "network security" proxies that actually break security.

I don't necessarily disagree with all of Dan Geer's assertions, but I am unconvinced that regulation can overcome organizational stupidity.

"We know that in a large system redundant components make intentional faults more likely to produce global faults."

This is... non-obvious to me. Anyone know what he means by this?

It's only counterintuitive if you don't look at the whole paragraph:

"We know that in a large system redundant components make random faults less likely to produce global faults. We know that in a large system redundant components make intentional faults more likely to produce global faults. In short, we know that redundancy can be protective or it can be risk creating."

The author is talking about intentional faults: for example, somebody sabotaging a specific CPU model, or a specific airplane engine model and the firmware version of its electronics. That would spread the fault across all redundant components.

That is not what I would call a truly redundant system... :-)

In this case, a monoculture within your redundant systems is what causes the risk he is talking about.
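A toy Python model of Geer's distinction, as the replies above read it: random faults hit replicas independently, while an intentional fault targets one variant and hits every replica running it at once. The per-replica failure probability and the vendor names are hypothetical.

```python
"""Toy model of Geer's point: redundancy vs. random and intentional faults."""

from fractions import Fraction


def p_all_fail_random(p, n):
    """Probability that all n replicas fail from independent random faults,
    each replica failing with probability p."""
    return p ** n


def replicas_hit_by_targeted_fault(variants, target):
    """An intentional fault (e.g. sabotaged firmware for one model) takes out
    every replica running the targeted variant at the same time."""
    return sum(1 for v in variants if v == target)


def survives_targeted_fault(variants, target):
    """The system survives if at least one replica is left running."""
    return replicas_hit_by_targeted_fault(variants, target) < len(variants)


if __name__ == "__main__":
    p = Fraction(1, 100)  # assumed per-replica random failure probability
    for n in (1, 2, 4):
        print(f"{n} replica(s), all fail randomly: {p_all_fail_random(p, n)}")

    # Hypothetical vendors: same replica count, different diversity.
    monoculture = ["vendor_a"] * 4
    diverse = ["vendor_a", "vendor_b", "vendor_c", "vendor_d"]
    print("monoculture survives:",
          survives_targeted_fault(monoculture, "vendor_a"))
    print("diverse survives:",
          survives_targeted_fault(diverse, "vendor_a"))
```

Adding identical replicas drives the random all-fail probability toward zero (1/100, then 1/10000, then 1/100000000), but does nothing against a fault aimed at the shared variant: the four-replica monoculture still goes down completely, while the diverse set loses only one replica.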

Many of us knew it was time to act, like 20 years ago.

Cost cutting trumps ALL other concerns. It's the ruling class' irrevocable policy.

C'est la vie.

If you don't like it, start your own business that does better. ¯\_(ツ)_/¯

> If you don't like it, start your own business that does better

The article answers why that is not enough:

"...We know that markets evolve to three generalist suppliers for any widespread consumer need or want. We also know that the contraction from many suppliers to three suppliers happens faster in the absence of regulation..."

I've seen that, but I am not sure I agree with it. The categorization seems solid, but at the same time the real world offers a lot of surprises and niche markets.

Dan Geer has written some good essays and had some good insights in the past.
pipes · 1 month ago
I stopped reading when the listed causes didn't mention the EU regulation that prevented Microsoft from delivering its API, which would have meant that CrowdStrike's software wouldn't have been running in kernel mode.
https://www.theverge.com/2024/7/26/24206719/microsoft-window...

> the software giant is calling for changes to Windows and has dropped some subtle hints that it’s prioritizing making Windows more resilient and is willing to prevent security vendors like CrowdStrike from accessing the Windows kernel.. calls out a new VBS enclaves feature “that does not require kernel mode drivers to be tamper resistant” and Microsoft’s Azure Attestation service as examples of recent security innovations.

Maybe because MS wants to deflect blame, and the EU is one of the things it's trying to deflect to?
gruez · 1 month ago
...and? "Deflecting blame" isn't a valid argument by itself. You have to argue why the deflecting isn't justified; otherwise there's nothing wrong with "deflecting blame" in and of itself. As far as I can tell from the source in a sibling comment[1], the deflection seems plausibly justified.

[1] https://news.ycombinator.com/item?id=41088117

If that's the case, the EU should demand that MS leave the US. There's no shortage of bloatware there; they will be fine if Microsoft decides it doesn't have enough resources to operate there.

I had to read through your whole comment to realize it didn't confirm something I already know. What a waste.

Microsoft's natural culture of insecurity being blamed on the EU is one of the weirdest gaslighting takes I have ever seen...
pipes · 1 month ago
In what way is it gaslighting? Are you disputing the fact that Microsoft was blocked by EU regulators from killing off kernel-mode access for third-party software?

This is a guy who apparently knows a lot (he says "we", but I do not know all of that), but certainly not about cybersecurity operations.

I was expecting all kinds of experts to discuss how "this was expected" and "you should have done it another way" after the CS incident, while failing to understand why their monitor does not work when switched off.

I guess a week in an active organization's secops team would show them how much more control we have over what happens on end-user devices today than we did 10 years ago. I wish them all the best in managing the security of a few tens of thousands of machines with their knowledge of what cybersecurity could be like in an alternative world.

> guy who apparently knows a lot

https://www.csmonitor.com/About/People/Dan-Geer