Yes, the loose wire was the immediate cause, but there was far more going wrong here. For example:
- The transformer switchover was set to manual rather than automatic, so it didn't automatically fail over to the backup transformer.
- The crew did not routinely train on transformer switchover procedures.
- The two online generators were both fed by a single non-redundant fuel pump (which was never intended to supply fuel to the generators in the first place!), and that pump did not automatically restart after power was restored.
- The main engine automatically shut down when the primary coolant pump lost power, rather than using an emergency water supply or letting it overheat.
- The backup generator did not come online in time.
It's a classic Swiss Cheese model. A lot of things had to go wrong for this accident to happen. Focusing on that one wire isn't going to solve all the other issues. Wires, just like all other parts, will occasionally fail. One wire failure should never have caused an incident of this magnitude. Sure, there should probably be slightly better procedures for checking the wiring, but next time it'll be a failed sensor, actuator, or controller board.
If we don't focus on providing and ensuring a defense-in-depth, we will sooner or later see another incident like this.
Running a 'tight ship' is great when you have a budget to burn on an excellent-quality crew. But shipping is so incredibly cut-throat that the crew members make very little money, are effectively modern slaves and tend to carry responsibilities way above their pay grade. They did what they could, and more than that, and for their efforts they were rewarded with what effectively amounted to house arrest while the authorities did their thing. The NTSB of course will focus on the 'hard' causes. But you can see a lot of frustration shine through towards the owners, who even in light of the preliminary findings had changed absolutely nothing on the rest of their fleet.
The recommendation to inspect the whole ship with an IR camera had me laughing out loud. We're talking about a couple of kilometers of poorly accessible duct work and cabinets. You can do that while in port, but in port most systems are idle or near idle, so you won't ever find an issue like this; it only shows up when you are underway, when vibration goes up and power consumption shoots up compared to being in port.
There is no shipping company that is realistically going to do a sea trial after every minor repair. Usually a technician from some supplier boards the vessel (often while it is underway), makes some fix and then goes off-board again. Vessels that are not moving are money sinks, so the goal is to keep turnaround time in port to an absolute minimum.
What should really amaze you is how few of these incidents there are. In spite of this being a regulated industry, it is first and foremost an oversight failure: if the regulators had more budget and more manpower there might be a stronger drive to get things technically in good order (resist temptation: 'shipshape').
Between making money, perceived culpability and risks offloaded to insurance companies, why would they?
> The problem is that there are a thousand merchant marine vessels operating right now that are all doing great
Are they tho?
I generally think you have good takes on things, but this comes across like systemic fatalistic excuse making.
> The recommendation to inspect the whole ship with an IR camera had me laughing out loud.
Where did this come from? What about the full recommendations from the NTSB? This comment makes it seem like you are calling into question the whole of the NTSB's findings.
"Don't look for a villain in this story. The villain is the system itself, and it's too powerful to change."
https://en.wikipedia.org/wiki/Francis_Scott_Key_Bridge_colla...
Because it is the right thing to do, and the NTSB thinks so too.
>> The problem is that there are a thousand merchant marine vessels operating right now that are all doing great
> Are they tho?
In the sense that they haven't caused an accident yet, yes. But they are accidents waiting to happen and the owners simply don't care. It usually takes a couple of regulatory interventions for such a message to sink in; what the NTSB is getting at there is that they would expect the owners to respond more seriously to these findings.
>> The recommendation to inspect the whole ship with an IR camera had me laughing out loud.
> Where did this come from?
Page 58 of the report.
And no, obviously I am not calling into question the whole of the NTSB's findings; it is just that this particular one seems to miss a lot of the realities of operating these vessels.
> "Don't look for a villain in this story. The villain is the system itself, and it's too powerful to change."
I don't understand your goal with this statement; it wasn't mine, so the quotes are not appropriate, and besides, I don't agree with it.
Loose wires are a fact of life. The amount of theoretical redundancy is sufficient to handle a loose wire, but the level of oversight, combined with the ad-hoc work done on these vessels (usually under great time pressure), is what caused this. And I think the NTSB should have pointed the finger at those responsible for that oversight as well, which is MARAD; however, MARAD does not even rate a mention in the report.
> I don't understand your goal with this statement; it wasn't mine, so the quotes are not appropriate, and besides, I don't agree with it.
fwiw, your first comment left me with the exact same impression as it did sitkack.
And they should be smacked down hard, but that isn't going to happen because then - inevitably - the role of the regulators would come under scrutiny as well. That is the main issue here. The NTSB did a fantastic job - as they always do - at finding the cause; it never ceases to amaze me how good these people are at finding the technical root cause of accidents. But the bureaucratic issues are the real root cause here: an industry that is running on wafer-thin margins with ships that probably should not be out there, risking people's lives for a miserly wage.
Regulators should step in and level the playing field. Yes, that will cause prices of shipping to rise. But if you really want to solve this, that is where I think they should start. I am not at all saying that the system is too powerful to change, just that for some reason they seem to refuse to even name it, let alone force it to change.
There was also no fatalistic tone about the system being too powerful to change. Just clear sharing of observations IMO.
It is not unusual to receive this reaction (being blamed for fatalism and making excuses) to observations like these, I have noticed.
Passenger carrying vessels are better, but even there you can come across some pretty weird stuff.
https://eu.usatoday.com/story/travel/cruises/2025/08/27/msc-...
And that one was only three years old, go figure.
> > Between making money, perceived culpability and risks offloaded to insurance companies, why would they?
> Because it is the right thing to do, and the NTSB thinks so too.
Doing great is much different than "accidents waiting to happen".
I don't understand the goal of your changing rhetoric.
This ship wasn't towed by a tug, it was underway under its own power and in order for the ship to have any control authority at all it needs water flowing over the rudder.
Without that forward speed you're next to helpless, and these things don't exactly turn on a dime. So even if there had been a place where it could have run aground, it would never have been able to reach it, because it was still in the water directly in front of the passageway under the bridge.
100,000 tonnes doing 7 km/h is a tremendous amount of kinetic energy.
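Back-of-the-envelope, taking those round numbers at face value: 7 km/h is about 1.9 m/s, so 1/2 m v^2 is roughly 0.5 x 1.0e8 kg x (1.9 m/s)^2, on the order of 1.9e8 J or about 190 MJ, comparable to the energy released by ~45 kg of TNT, delivered as one slow, unstoppable push.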
The exact moment the systems aboard the Dali failed could not have come at a worse time: it had - as far as I'm aware of the whole saga - just made a slight course correction to better line up with the bridge, and the helm had not yet been brought back to neutral. After that it was just Newton taking over; without control I don't think there is much that would have stopped it.
This is a good plot of the trajectory of the vessel from the moment it went under way until the moment it impacted the bridge:
https://www.pilotonline.com/wp-content/uploads/2024/03/5HVqi...
You can clearly see the kink in the trajectory a few hundred meters before it hit the bridge.
The question is simple: who will pay for it? Apparently we are ok with this kind of risk, if we weren't we would not be doing this at all.
There is a similar thing going on in my country with respect to railway crossings. Every year people die on railway crossings. But it took a carriage full of toddlers being hit by a train before the sentiment switched from 'well, they had it coming' to 'hm, maybe we should do something about this'. People don't like to pay for risks they see as small or that they perceive as never going to affect them.
This never was about technology, it always was about financing. Financing for proper regulatory tech oversight (which is vastly understaffed) on the merchant marine fleet, funding for better infrastructure, funding for (mandatory) tug assistance for vessels of this size near sensitive structures, funding for better educated and more capable crew and so on. The loose wire is just a consequence of a whole raft of failures that have nothing to do with a label shroud preventing a wire from making proper contact.
The 'root cause' here isn't really the true root cause, it is just the point at which technology begins and administration ends.
Better to spend the effort on fleet education.
> The NTSB found that the Key Bridge, which collapsed after being struck by the containership Dali on March 26, 2024, was almost 30 times above the acceptable risk threshold for critical or essential bridges, according to guidance established by the American Association of State Highway and Transportation Officials, or AASHTO.
> Over the last year, the NTSB identified 68 bridges that were designed before the AASHTO guidance was established — like the Key Bridge — that do not have a current vulnerability assessment. The recommendations are issued to bridge owners to calculate the annual frequency of collapse for their bridges using AASHTO’s Method II calculation.
Letters to the 30 bridge owners and their responses https://data.ntsb.gov/carol-main-public/sr-details/H-25-003
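For context, and as I understand the AASHTO vessel-collision guidance (worth checking against the spec itself): the annual frequency of collapse is estimated per vessel group as AF = N x PA x PG x PC x PF, where N is the annual number of transits, PA the probability of vessel aberrancy, PG the geometric probability that an aberrant vessel hits the pier, PC the probability that a hit causes collapse, and PF a protection factor for things like islands or dolphins. The total over all vessel groups is then compared against a threshold of 0.0001 per year for critical or essential bridges, which is the threshold the Key Bridge reportedly exceeded by a factor of about 30.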
Stopping a car normally vs crashing a car. Skydiving with a parachute vs skydiving without a parachute.
For something like ship vs bridge you have to account for the crunch factor. USS Iowa going the same speed probably would've hit way harder despite having ~1/3 the tonnage.
Plan the bridge so any ship big enough to hurt it grounds before it gets that close. Don't put pilings in the channel. It's just money. But it's a lot of money so sometimes it's better to just have shipping not suck.
Alternatively, the Chunnel will almost certainly never get hit with a ship.
Have a look at the trajectory chart that I posted upthread and tell me how in this particular case you would have arranged that.
At least on US flagged vessels.
Sometimes when I see vocal but rather uninformed opposition to the Jones Act, I wonder if it isn't partially an aim at union busting.
The act is problematic because it hasn't really been modernized, with the handful of revisions essentially just expanding its scope. The US either has to seriously figure out getting domestic shipbuilding going again (to the point where it can be economical to also export them) or at least whitelist foreign countries (eg South Korea) to allow their ships to be used. But that's unlikely in today's political climate.
This ended under Reagan.
At first lots of people didn't care because Reagan was also doing his 600-ship navy, so everyone was busy doing navy work, but after that ended the merchant marine and American shipbuilding entered a death spiral.
Now the only work US flagged vessels can get is supporting the navy, and a tiny sliver of jones act trade. This means there are no economies of scale. If a ship is built, one is built to that class not 10. Orders are highly intermittent and there is no ability to build up a skilled workforce in efficient serial production. On the seagoing side, ships either get run ragged on aggressive schedules (ex: El Faro) or they sit in layup for long stretches rusting away.
If the US wants to fix its merchant marine it needs to provide incentives for increased cargos and increased shipbuilding. As Sal points out, the US is the second-biggest shipowning country in the world. US businesses like owning ships; they just don't want to fly the American flag, because their incentives are towards offshoring.
And yet finding crews was never a problem before differential subsidies ended.
In fact crewing US flagged is harder now because the work is intermittent. If people can't find berths they time out on their licenses and go do something else in a different industry.
> People from the Philippines will do it because it is life-changing amounts of money
The international minimum wage for seafarers is about $700/mo. In comparison wages in the Philippines are between 20k-50k pesos a month or $340-$850. Seafaring is an above-average income job in the Philippines but not "life-changing."
There are so many layers of failures that it makes you wonder how many other operations on those ships are only working because those fallbacks, automatic switchovers, emergency supplies, and backup systems save the day. We only see the results when all of them fail and the failure happens to result in some external problem that means we all notice.
As Sidney Dekker (of Understanding Human Error fame) says: Murphy's Law is wrong - everything that can go wrong will go right. The problem arises from the operators all assuming that it will keep going right.
I remember reading somewhere that part of Qantas's safety record came from the fact that at one time they had the highest number of minor issues. In some sense, you want your error detection curve to be smooth: as you get closer to catastrophe, your warnings should get more severe. On this ship, it appeared everything was A-OK till it bonked a bridge.
Your car engaging auto-brake to prevent a collision shouldn't be a "whew, glad that didn't happen" but more an "oh shit, I need to work on paying attention more."
1: rear-cross-traffic i.e. when backing up and cars are coming from the side.
> Our investigators routinely accomplish the impossible, and this investigation is no different...Finding this single wire was like hunting for a loose rivet on the Eiffel Tower.
In the software world, if I had an application that failed when a single DNS query failed, I wouldn't be pointing the blame at DNS and conducting a deep dive into why this particular query timed out. I'd be asking why a single failure was capable of taking down the app for hundreds or thousands of other users.
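To make the analogy concrete, here is a minimal sketch of the kind of defense-in-depth that implies (standard library only; the function name and cache are hypothetical, not from any real codebase): retry the lookup, then fall back to the last known-good address instead of letting one failed query take the whole app down.

    import socket
    import time

    _last_good = {}  # hostname -> last address that resolved successfully

    def resolve_with_fallback(host, retries=2, delay=0.2):
        """Resolve a hostname, tolerating transient DNS failures."""
        for _ in range(retries + 1):
            try:
                addr = socket.gethostbyname(host)
                _last_good[host] = addr      # refresh the fallback cache
                return addr
            except socket.gaierror:
                time.sleep(delay)            # brief backoff, then retry
        if host in _last_good:
            return _last_good[host]          # degrade gracefully on total failure
        raise RuntimeError("DNS for %s failed and no cached address exists" % host)

The point isn't the specific mechanism, it's that a single transient failure never becomes the whole outage.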
The YouTube animation they published notes that this also wasn't just one wire - they found many wires on the ship that were terminated and labeled in the same (incorrect) way, which points to an error at the ship builder and potentially a lack of adequate documentation or training materials from the equipment manufacturer, which is why WAGO received mention and notice.
Oh, the wire was blue?
In all seriousness, listing just the triggering event in the headline isn't that far out of line. Like, the Titanic hit an iceberg, but it was also traveling faster than it should have in spite of iceberg warnings, and it did so overloaded and without adequate lifeboats, and it turns out there were design flaws in the hull. But the iceberg still gets first billing.
If this represents a change in the style and/or substance of these kinds of press releases, my hunch would be that the position was previously filled by technical writers but most recently by PR.
The flushing pump not restarting when power resumed did also cause a blackout in port the day before the incident. But you know, looking into why you always have two blackouts when you have one is something anybody could do; open the main system breaker, let the crew restore it and that flushing pump will likely fail in the same way every time... but figuring out why and how the breaker opened is neat, when it's not something obvious.
The NTSB also had some comments on the ship's equivalent of a black box. Turns out it was impossible to download the data while it was still inside the ship, the manufacturer's software was awful and the various agencies had a group chat to share 3rd party software(!), the software exported thousands of separate files, audio tracks were mixed to the point of being nearly unusable, and the black box stopped recording some metrics after power loss "because it wasn't required to" - despite the data still being available.
At least they didn't have anything negative to say about the crew: they reacted promptly and adequately - they just didn't stand a chance.
Rather than going through the process of purging the high-sulphur fuel that can't be used in US waters, they had it set up so that some of the generators were fed from US-approved fuel, which compromised redundancy and automatic failover.
It seems probable that the wire failure would not have caused catastrophic overall loss of power if the generators had been in the normal configuration.
They weren't freight ships destined for Baltimore, but it wasn't hard to imagine future freight ship sizes when designing the bridge in the early 1970s.
The London sewer system was designed in the 1850s, when the population was around two million people.
It was so overdesigned that it held up until the 1950s, when the population was over 8 million. It didn't start to become a big problem until the 1990s.
Why the bridge piers weren't set into artificial islands, I can't fathom. Sure. Let's build a bridge across a busy port but not make it ship-proof. The bridge was built in the 1970s, had they forgotten how to make artificial islands?
The organizations that made the bridge happen were so much vaster, had so much higher turnover, and were subject to way, way, way looser application of consequences than the one that built the fort that it would be literally impossible to get them to produce something so unnecessarily robust for the average use case.
This sort of "everything I depend on will just have to not suck, because my shit will keel right over if it sucks in the slightest" type of engineering is all over the modern world, and it does work well in a lot of places when you consider lifetime cost. But when it fails, bridges fall over and cloudflare (disclaimer: didn't actually read that PM, have no idea what happened) goes down, or whatever.
Unless the military was relocated to Mars (or at least the Moon) during the shutdown, I think the word is "metaphorically" instead of "literal".
The regular fuel pumps were set up to automatically restart, which is why a set of them came online to feed generator 3 (which automatically spun up after 1 & 2 failed, and wasn't tied to the fuel-line-flushing pump) after the second blackout.
I remember that the IT guys at my old company used to immediately throw out every ethernet cable and replace them with ones right out of the bag; first thing.
But these ships tend to be houses of cards. They are not taken care of properly, and run on a shoestring budget. Many of them look like floating wrecks.
If I come across a CATx (solid core) cable being used as a really long patch lead then I lose my shit or perhaps get a backbox and face plate and modules out along with a POST tool.
I don't look after floating fires.
I once had a recurring problem with patch cables between workstations and drops going bad, four or five in one area that had never had that failure rate before. Turns out, every time I replaced one somebody else would grab the "perfectly good" patch cable from the trash can beside my desk. God knows why people felt compelled to do that when they already had perfectly good wires, maybe they thought because it was a different colour it would make their PC faster... So, now every time I throw out a cable that I know to be defective, I always pop the ends off. No more "mystery" problems.
You can get replacement clips for those for a quick repair.
https://www.amazon.com/Construct-Pro-RJ-45-Repair-Cat5e/dp/B...
It wasn't.
And the physical layer issues I do see are related to ham fisted people doing unrelated work in the cage.
Actual failures are pretty damn rare.
Like you said (and illustrated well in the book), it's never just one thing; these incidents happen when multiple systems interact, and they often reflect the disinvestment in comprehensive safety schemes.
I was sure you were going to link to Clarke and Dawe, The Front Fell off.
"nuisance" issues like that are deferred bcz they are not really causing a problem, so maintenance spends time on problems with things that make money, rather than what some consider spit n polish on things that have no prior failures.
So much complexity, plenty of redundancy, but not enough adherence to important rules.
A whole bunch of things might have gone wrong, but if only you hadn't done/not-done that one thing, we'd all be fine. So it's all your fault!
Also, they're basically inadmissible in court [49 U.S.C. § 1154(b)], so they are useless for determining financial liability.
> The settlement does not include any damages for the reconstruction of the Francis Scott Key Bridge. The State of Maryland built, owned, maintained, and operated the bridge, and attorneys on the state’s behalf filed their own claim for those damages. Pursuant to the governing regulation, funds recovered by the State of Maryland for reconstruction of the bridge will be used to reduce the project costs paid for in the first instance by federal tax dollars.
There's probably some combination of "everyone just posts up a bond into a fund to cover this stuff" plus a really high deductible on payout that basically deletes all those expensive man hours without causing any increased incentive for carnage.
Events like these are a VERY rare exception compared to all the shipping activities that go on in an uneventful manner. Doesn't take a genius to do the napkin math here. Whatever the solution is probably ought to try to avoid expending resources in the base case where everything is fine.
I like a government that pays workers to look out for my safety.
If everyone saved $100M by doing this and it only cost one shipper $100M, then of course everyone else would do it and just hope they aren’t the one who has bad enough luck to hit the bridge.
And statistically, almost all of them will be okay!
Because then anyone who owns a bridge/needs to pay for said bridge damage goes, ‘well clearly the costs of running into a bridge on the runs-into-bridges-due-to-negligence-group isn’t high enough, so we need to either create more rules and inspections, or increase the penalties, or find a way to stop these folks from breaking our bridges, or the like - and actually enforce them’.
It’s why airplanes are so safe to fly on, despite all the same financial incentives. If you don’t comply with regulators, you’ll be fined all to hell or flat out forbidden from doing business. And that is enforced.
And the regulators take it all very seriously.
Ships are mostly given a free pass (except passenger liners, ferries, and hazmat carrying ships) because the typical situation if the owner screws up is ‘loses their asset and the assets of anyone who trusted them’, which is a more socially acceptable self correcting problem than ‘kills hundreds of innocent people who were voters and will have families crying, gnashing their teeth, and pointing fingers on live TV about all this’.
Basically, the line of causation of the mishap has to pass through a metaphorical block of Swiss cheese, and a mishap only occurs if all the holes in the cheese line up. Otherwise, something happens (planned or otherwise) that allows you to dodge the bullet this time.
Meaning a) it's important to identify places where firebreaks and redundancies can be put in place to guard against failures further upstream, and b) it's important to recognize times when you had a near-miss, and still fix those root causes as well.
Which is why the "retrospectives are useless" crowd spins me up so badly.
I mentioned this principle to the traffic engineer when someone almost crashed into me because of a large sign that blocked their view. The engineer looked into it and said the sight lines were within spec, but just barely, so they weren't going to do anything about it. Technically the person who almost hit me could have pulled up to where they had a good view and looked both ways as they were supposed to, but that is relying on one layer of the cheese to fix a hole in another, to use your analogy.
The fact that the situation on the ground isn't safe in practice is irrelevant to the law. Legally the hedge is doing everything, so the blame falls on the driver. At best a "tragic accident" will result in a "recommendation" to whatever board is responsible for the rules to review them.
Which is why if you want to be a bastard, you send it to the owners, the city, and both their insurance agencies.
If your goal is to get the intersection fixed, this is a reasonable thing to do.
If those certifications try to teach you bad approaches, then they don't help competence. In fact, they can get people stuck in bad approaches, because it's what they have been taught by the rigorous and unquestionable system. Especially when your job security comes from having those certifications, it becomes harder to say that the certifications teach the wrong things.
It seems quite likely from the outside that this is what happened to US traffic engineering. Specifically that they focus on making it safe to drive fast and with the extra point that safe only means safe for drivers.
This isn't just based on judging their design outcomes to be bad. It's also in the data comparing the US to other countries. This is visible in vehicle deaths per capita, but mostly in pedestrian deaths per capita. Correcting for miles driven makes the vehicle deaths in the US merely high. But correcting for miles walked (data that isn't available) would likely push pedestrian deaths much higher. Which illustrates that a big part of the safety problem is prioritizing driving instead of encouraging and protecting other modes of transport. (And then still doing below average on driving safety.)
You would be mistaken. Traffic engineers are responsible for far, far more deaths than software engineers.
That we allow terrible drivers to drive is another matter...
Vehicles are generally temporary. It is actually possible to ensure decent visibility at almost all junctions, as I found when I moved to my current country - it just takes a certain level of effort.
That said, obviously care should be taken to limit occurrences of view limiting obstacles whenever possible, especially in areas frequented by unskilled traffic participants—so pedestrians, really. A straightforward example would be disallowing street parking within a few tens of metres of pedestrian crossings. Street parking in general is horrible, especially on quiet residential streets—kids may dart around them onto the street at full speed.
The problem is not limited to large vehicles either.
——————
Anyways, here are some examples of what I'm talking about:
- Self-inflicted LOS issues by passing/filtering (motor)cyclists: https://youtu.be/qi6ithdYA_8?t=861, https://youtu.be/TRPYfHzQSFw?t=644, https://youtu.be/WgaWwWUYX64?t=200, https://youtu.be/WgaWwWUYX64?t=209, https://youtu.be/vYrxbdhLEN0?t=1083
- Cars obstructing view of an intersection: https://youtu.be/swmt44N9DJc?t=307, https://youtu.be/ejqpeFyqNz0?t=258, https://youtu.be/veLDLUXLrdQ?t=8, https://youtu.be/q46XoynHTpM?t=109, https://youtu.be/q46XoynHTpM?t=1016, https://youtu.be/m8jk2H7a-BI?t=70, https://youtu.be/9tgMe3CurNE?t=558, https://youtu.be/QCALZbDC_i0?t=172
- Cars obstructing view of a pedestrian/cyclist crossing: https://youtu.be/axCAi7Cjh2g?t=12, https://youtu.be/MReD5mieJ1c?t=1071, https://youtu.be/14c-iwZUh9M?t=5, https://youtu.be/Mzs0izUSoFo?t=14, https://youtu.be/vT7uI6EBQRM?t=238, https://youtu.be/O7UIACa35KY?t=366, https://www.youtube.com/shorts/IQHWUEPEwcg, https://youtu.be/vYrxbdhLEN0?t=551 (watch the whole video, it's very instructive)
- Pedestrian behavior around buses: https://youtu.be/oxN0tqO9cSk?t=8, https://youtu.be/03qTXV4aQKE?t=709
——————
And counterexamples, showing proper driving:
- Obstructed pedestrian crossing: https://www.youtube.com/watch?v=OThBjk-oFmk (I said proper driving, not proper cycling)
- Around a blind turn: https://youtu.be/86-qjb_m43A?t=294
- And to top it off, obstructed pedestrian crossing plus a bus: https://youtu.be/RpB4bx63qmg?t=439
——————
As you can see, LOS issues can pop up anywhere and there is no way to "fix" it. You have to adjust your behavior accordingly. You can't drive "optimistically", assuming nothing's there just because you can't see it. That's like closing your eyes and flooring it. Can't see nothing, therefore nothing is there!
When I see complaints about retrospectives from software devs they're usually about agile or scrum retrospective meetings, which have evolved to be performative routines. They're done every sprint (or week, if you're unlucky) and even if nothing happens the whole team might have to sit for an hour and come up with things to say to fill the air.
In software, the analysis following a mishap is usually called a post-mortem. I haven't seen many complaints that those have no value; they are usually highly appreciated. Though sometimes the "blameless post-mortem" people take the term a little too literally and try to avoid exploring useful failures if they might cause uncomfortable conversations about individuals making mistakes or even dropping the ball.
Regarding blamelessness, I think it was W. Edwards Deming who emphasized the importance of blaming process over people, which is always preferable, but it's critical for individuals to at least be aware of their role in the problem.
It is nice though (as long as there isn't anyone in there that the team is afraid to be honest in front of), when people can vent about something that has been pissing them off, so that I as their manager know how they feel. But that happens only about 15-20% of the time. The rest is meaningless tripe like "Glad Project X is done" and "$TECHNOLOGY sucks" and "Good job to Bob and Susan for resolving the issue with the Acme account"
You mean to tell me that this comment section, where we spew buzzwords and reference the same tropes we do for every "disaster", isn't performative?
I always thought that before the "Swiss cheese model" was introduced in the 1990s, the term "Swiss cheese" was used to mean something that had oodles of security holes (flaws).
Perhaps I find the metaphor weird because pre-sliced cheese was introduced later in my life (processed slices were in my childhood, but not packets of pre-sliced cheese which is much more recent).
As an Ops person, I've said that before when talking about software, and it's mainly because most companies will refuse to listen to the lessons inside of them, so why am I wasting time doing this?
To put it in aviation terms, I'll write up something like (numbers made up) "Hey, V1 for a Hornet loaded at 49,000 pounds needs to be 160 knots, so it needs 10,000 feet for takeoff." Well, the sales team comes back and says NAS Norfolk is only 8,700 ft and the customer demands 49,000+ loads, we are not losing revenue, so quiet, Ops nerd!
Then a 49,000+ Hornet loses an engine, overruns the runway, the fireball I said would happen happens, and everyone is SHOCKED, SHOCKED I TELL YOU that this is happening.
Except it's software and not aircraft and loss was just some money, maybe, so no one really cares.
I absolutely heard that in Hoover's voice.
Is there an equivalent to YouTube's Pilot Debrief or other similar channels but for ships?
The metaphor relies on you mixing and matching some different batches of presliced Swiss cheese. In a single block, the holes in the cheese are guaranteed to line up, because they are two-dimensional cross sections of three-dimensional gas bubbles. The odds of a hole in one slice of Swiss cheese lining up with another hole in the following slice are very similar to the odds of one step in a staircase being followed by another step.
You cannot create a swiss cheese safety model with correlated errors, same as how the metaphor fails if the slices all come from the same block of swiss cheese!
You have to ensure your holes come from different processes and systems! You have to ensure your swiss cheese holes come from different blocks of cheese!
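To put rough numbers on it: with, say, four independent layers that each fail one time in ten, the chance of all the holes lining up is 0.1^4, or one in ten thousand; if the layers share a common cause (the same block of cheese), the chance collapses back toward one in ten.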
Am I missing something? I feel like one of us is crazy when people are talking about improving process instead of assigning blame without addressing the base case.
I mean ultimately establishing a good process requires make good choices and not making bad ones, sure. But the kind of bad decisions that you have to avoid are not really "mistakes" the same way that, like, switching on the wrong generator is a mistake.
Edit wars aside, it's a nice philosophical question.
https://en.wikipedia.org/wiki/Francis_Scott_Key_Bridge_(Balt...
A lot of people wildly under-crimp things, but marine vessels not only have nuanced wire requirements, they have more stringent crimping requirements that the field at large frustratingly refuses to adhere to, despite ABYC and other codes insisting on it.
The good tools will crimp to the proper pressure and make it obvious when it has happened.
Unfortunately the good tools aren't cheap. Even when they are used, some techs will substitute their own ideas of how a crimp should be made when nobody is watching them.
So outside of waiting time, I can go from eplan to "send me precrimped and labeled wires that were cut, crimped, and labeled by machine and automatically tested to spec" because this now exists as a service accessible even to random folks.
It is not even expensive.
Abdicating responsibility to those "good tools" is why shit never gets crimped right. People just crimp away without a care in the world. Don't get me wrong, they're great for speed and when all you're doing is working on brand-new stuff that fits perfectly. But when you're working on something sketchy you really want the older styles of tool that give more direct feedback. They have a place, but you have to know what that place is.
See also: "the low level alarm would go off if it was empty"
The bad contact with the wire was just the trigger; that should have been recoverable had the regular fuel pumps been running.
Insurance providers insuring ships in US waters should also be required to permanently deny insurance coverage to vessels found to be out of compliance, though I doubt the insurance companies would want to play ball.
Was a FMECA (Failure Mode, Effects, and Criticality Analysis) performed on the design prior to implementation in order to find the single points of failure, and identify and mitigate their system level effects?
Evidence at hand suggests "No."
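For illustration (my wording, not an actual analysis), a single FMECA row for the configuration described in the report might look like:

    Item:          fuel line flushing pump (pressed into service as generator fuel supply)
    Failure mode:  loss of LV-bus power; pump stops and does not auto-restart
    Local effect:  diesel generators 3 and 4 lose fuel pressure
    System effect: HV-bus blackout, loss of propulsion and steering
    Criticality:   single point of failure, catastrophic
    Mitigation:    use the redundant supply/booster pumps, or add auto-restart

One row like that, reviewed before sailing with this setup, would have flagged the whole arrangement.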
That's true in this case, as well. There was a long cascade of failures including an automatic switchover that had been disabled and set to manual mode.
The headlines about a loose wire are the media's way of reducing it to an understandable headline.
Instant classic destined for the engineering-disasters-drilled-into-1st-year-engineers canon (or are the other Swiss cheese holes too confounding?)
Where do you think it would fit on the list?
Fucking hell.
[1] As it happens I open with an anecdote about steering redundancy on ships in this post: https://www.gkogan.co/simple-systems/
Seems to me the only effective and enforceable redundancy that can easily be imposed by regulation would be mandatory tug boats.
The way it worked in Sydney harbour 20+ years ago, when I briefly worked on the wharves/tugs, was that the big ships had to have both local tugs and a local pilot who would come aboard and run the ship. Which seemed to me to be quite an expensive operation, but I honestly can't recall any big nautical disasters in the harbour, so I guess it works.
Which there are in some places. Where I grew up I'd watch the ships sail into and out of the oil and gas terminals, always accompanied by tugs. More than one in case there's a tug failure.
The crew weren't using the redundant fuel pumps. They were using the non-redundant fuel line flushing pump as a generator fuel pump, a task it was never designed for and which was not compliant.
That it doesn't restart on restoration of power is by design; you don't want to start flushing your fuel lines when the power returns because this could kill your generators and cause another blackout.
> Main engine shutting of (sic) when water pressure drops
Yeah, this is quite bad. There ought to be an override one can activate in an emergency in order to run the engines to the point of overheating, under the assumption that even destroying the engine will cause less catastrophic consequences than not having propulsion at the time.
> backup generator not even starting in time
There were 5 generators on board. Generators 1 through 4 are the main generators on the HV bus side, and the emergency backup generator is on the LV bus side.
When the incident occurred, the ship was being powered by generators 3 and 4, which were receiving their fuel via the non-redundant fuel line flushing pump. These generators powered the HV bus, which powered the LV bus via a transformer. The emergency backup generator was not running, so the LV bus was only receiving power from the HV bus via 1 transformer.
The incident tripped the circuit breaker for this transformer, disconnecting the HV bus from the LV bus, resulting in the first LV bus blackout. This resulted in main engine shutdown (coolant pump failure) and an automatic emergency backup generator startup.
There is an alternate (backup) set of circuit breakers and transformer that could have energised the LV bus, but the transformer switches were left in the manual position, so this failover did not happen automatically and immediately. There were no company procedures or regulations which required them to be left in the automatic position.
The LV bus also powered the fuel line flushing pump, so this pump failed. As a result, generators 3 and 4 started to fail (being supplied with fuel by a pump which was no longer operating). The electrical management system automatically commanded the start of generator 2 in response to the failing performance of generators 3 and 4.
Generator 1 and generator 2 were fed by the standard fuel pumps, which were available. One main generator is capable of powering the entire ship, so there was no need to start generator 1 as well; this would have just put more load on the HV bus (by having to run the fuel pump for generator 1 as well).
Instead of the automatic transformer failover (which was unavailable), the crew manually closed the same circuit breaker that had already tripped, 1 minute after the first LV bus blackout.
This restored power to the LV bus via the same transformer that was originally powering it, but did not restart the fuel line flushing pump supplying generators 3 and 4 (which were still running, but spinning down because they were being fed fuel via gravity only). This also restored full steering control, but this in itself was inadequate to control the vessel's course without the engine-driven propeller.
The main engine was still offline and takes upwards of half a minute to restart, assuming everyone is in place and ready to do so immediately, which was unlikely.
The emergency backup generator finally started 10 seconds later (25 seconds too late by requirements, 70 seconds after the first LV bus blackout).
Generator 2 had not yet gotten up to speed and connected to the HV bus before generators 3 and 4 disconnected (having exhausted the gravity-fed fuel in the line ahead of the inoperative fuel line flushing pump), resulting in an HV bus blackout and the second LV bus blackout. With only the emergency backup generator running on the LV side, only one-third of steering control was available, but again, this was inadequate without the engine.
3 seconds later, generator 2 connected to the HV bus. 26 seconds later, a crew member manually activated the alternate transformer, restoring power to the LV bus for the second time.
The collision was preventable:
- It is no longer a requirement that the engine automatically shuts down due to a loss of coolant pressure. It was at the time the vessel was constructed, but this was never re-evaluated. If it were, the system may have been tweaked to avoid losing the engine.
- If the transformer switches were left in the automatic position, the LV bus would have switched over to being powered by the second transformer automatically, and the engine coolant pumps and fuel line flushing pump would not have been lost.
- Leaving the emergency backup generator running (instead of in standby configuration) would have kept the LV bus energised after the first transformer tripped, and the engine coolant pumps and fuel line flushing pump would not have been lost.
- If the crew had opted to manually activate the second transformer within about half a minute (twice as fast as they reactivated the first one), and restarted the fuel line flushing pump, a second blackout would have been avoided, and the engine could have been restarted in time to steer away.
This shows the importance of leaving recovery systems armed and regularly training on power transfer procedures. It also illustrates why you shouldn't be running your main generators from a fuel pump which isn't designed for that task. This same pump setup was found on another ship they operated.
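If it helps to see the coupling spelled out, here is a toy cascade model (my own simplification of the chain described above, not anything from the report): each component lists what it needed to keep running, and tripping the one transformer's breaker takes out everything, because both the generators' fuel supply and the only armed failover path ran through the same LV bus.

    # Toy cascade model: failing one component fails everything that needs it.
    needs = {
        "lv_bus":        ["transformer_1"],   # transformer 2 path was left in manual
        "steering":      ["lv_bus"],
        "coolant_pump":  ["lv_bus"],
        "main_engine":   ["coolant_pump"],
        "flushing_pump": ["lv_bus"],          # non-redundant, no auto-restart
        "gen_3_and_4":   ["flushing_pump"],   # fuel came from the flushing pump
        "hv_bus":        ["gen_3_and_4"],
        "transformer_1": ["hv_bus"],
    }

    def cascade(initial_failure):
        failed = {initial_failure}
        changed = True
        while changed:
            changed = False
            for part, reqs in needs.items():
                if part not in failed and any(r in failed for r in reqs):
                    failed.add(part)
                    changed = True
        return failed

    print(sorted(cascade("transformer_1")))
    # Everything ends up in the failed set, main engine and steering included.

With redundant fuel pumps or the automatic transformer failover armed, the loop is broken and the cascade stops at the first blackout.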
From the report:
> The low-voltage bus powered the low-voltage switchboard, which supplied power to vessel lighting and other equipment, including steering gear pumps, the fuel oil flushing pump and the main engine cooling water pumps. We found that the loss of power to the low-voltage bus led to a loss of lighting and machinery (the initial underway blackout), including the main engine cooling water pump and the steering gear pumps, resulting in a loss of propulsion and steering.
...
> The second safety concern was the operation of the flushing pump as a service pump for supplying fuel to online diesel generators. The online diesel generators running before the initial underway blackout (diesel generators 3 and 4) depended on the vessel’s flushing pump for pressurized fuel to keep running. The flushing pump, which relied on the low-voltage switchboard for power, was a pump designed for flushing fuel out of fuel piping for maintenance purposes; however, the pump was being utilized as the pump to supply pressurized fuel to diesel generators 3 and 4. Unlike the supply and booster pumps, which were designed for the purpose of supplying fuel to diesel generators, the flushing pump lacked redundancy. Essentially, there was no secondary pump to take over if the flushing pump turned off or failed. Furthermore, unlike the supply and booster pumps, the flushing pump was not designed to restart automatically after a loss of power. As a result, the flushing pump did not restart after the initial underway blackout and stopped supplying pressurized fuel to the diesel generators 3 and 4, thus causing the second underway blackout (low-voltage and high-voltage).
If you read the report, they were misusing this pump for fuel supply when it wasn't meant for that. And it was non-redundant, whereas the proper fuel supply pumps are.
It's like someone repurposing a Husky air compressor to power a pneumatic fire suppression system and then saying the issue is someone tripping over the cord and knocking it out.
That's everybody - captain, bridge crew, deck crew, cook, etc.
So - how many of those 22 will be your engineering crew? How many of those engineers would be on duty, when this incident happened? And once things start going wrong, and you're sending engineers off to "check why Pump #83, down on Deck H, shows as off-line" or whatever - how many people do you have left in the big, complex engineering control room - trying to figure out what's wrong and fix it, as multiple systems fail, in the maybe 3 1/2 minutes between the first failure and when collision becomes inevitable?
The spring terminals should also be designed to have a secondary latch on this type of (what should be) rugged installation.
Finally, critical circuits should be designed to detect open connections, and act accordingly. A single hardware<->software design for this could be a module to apply across all such wiring inputs/outputs. This is simple and cheap enough to do these days.
A manual tug-test on the physical connection would be advisable when installing, to check that the spring terminal has gripped the conductor when latched.
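For what it's worth, here is a sketch of what such a detection module might do, borrowing the end-of-line-resistor trick used in fire alarm loops (the read function, thresholds and resistor value are hypothetical placeholders, not any real product's API): a known resistor at the far end lets a single resistance reading distinguish a healthy loop from an open wire or a short.

    EOL_OHMS = 4700        # nominal end-of-line resistor (assumed value)
    TOLERANCE = 0.25       # +/-25% band treated as healthy

    def classify(loop_ohms):
        """Classify a supervised loop from a single resistance measurement."""
        if loop_ohms > EOL_OHMS * 10:
            return "OPEN_WIRE"          # e.g. a terminal that never gripped the conductor
        if loop_ohms < EOL_OHMS * (1 - TOLERANCE):
            return "SHORT_OR_FAULT"
        if loop_ohms <= EOL_OHMS * (1 + TOLERANCE):
            return "HEALTHY"
        return "DEGRADED"               # creeping resistance: loose or corroding contact

    def supervise(read_loop_ohms, circuit_ids, alarm):
        """Poll every supervised circuit and raise an alarm before it matters."""
        for cid in circuit_ids:
            state = classify(read_loop_ohms(cid))
            if state != "HEALTHY":
                alarm(cid, state)

The point is that the circuit is checked continuously, not only when someone happens to tug on the wire.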
When I used to be involved in control panels I would always yank on all the wires too :-)
Pre-contact everything is about the ship and why it hit anything, post-contact everything is about the bridge and why it collapsed. The ship part of the investigation wouldn't look significantly different if the bridge had remained (mostly) intact, or if the ship had run aground inside the harbor instead.
No, we are not talking a little paint-swapping.
Sucks to be any of the YouTube influencers today telling everyone they should use WAGO connectors in all their walls.
Seriously though, impressive to trace the issue down this closely. I am at best an amateur DIY electrician, but I am always super careful about the quality of each connection.