You can see much more data in the report:
https://www.atsb.gov.au/sites/default/files/media/3532398/ao...
Wasn't the philosophy back then to run multiple independent computers (often even designed and manufactured by different teams) and run a quorum algorithm at a very high level?
Maybe ECC was seen as redundant in that model?
It was, and they did (well, same design, but they were independent). I quote from the report:
"To provide redundancy, the ADIRS included three air data inertial reference units (ADIRU 1, ADIRU 2, and ADIRU 3). Each was of the same design, provided the same information, and operated independently of the other two"
> Maybe ECC was seen as redundant in that model?
I personally would not eschew any level of redundancy when it can improve safety, even in remote cases. It seems that when the module was designed, EDAC was not required, and it was probably considerably more expensive. The new variant apparently has EDAC. They retrofitted units with the newer variant as each one broke down. Overall, ECC is an extra layer of protection. The _presumed_ bit flip is a plausible cause of the data spikes. But even so, the data spikes should not have caused the controls issue. The controls issue is a separate problem, and it's highly likely THAT is what they are going to address, in another compute unit.
"There was a limitation in the algorithm used by the A330/A340 flight control primary computers for processing angle of attack (AOA) data. This limitation meant that, in a very specific situation, multiple AOA spikes from only one of the three air data inertial reference units could result in a nose-down elevator command. [Significant safety issue]"
This is most likely what they will address. The other reports confirm that the fix will be in the ELAC produced by Thales, whereas the issue with the spikes detailed in the report was in an ADIRU module produced by Northrop Grumman.
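For anyone wondering how three redundant readings can shrug off a spike from one unit, here is a minimal sketch of mid-value selection. This is a generic textbook technique, not the actual FCPC algorithm described in the report; the function name `select_aoa` and the sample values are made up for illustration.

```python
from statistics import median


def select_aoa(aoa1: float, aoa2: float, aoa3: float) -> float:
    """Mid-value select: return the median of three redundant readings."""
    return median([aoa1, aoa2, aoa3])


# Hypothetical samples: unit 1 spikes in the second sample while
# units 2 and 3 keep agreeing at around 2 degrees.
readings = [(2.1, 2.0, 2.2), (45.0, 2.0, 2.2), (2.1, 2.0, 2.2)]
selected = [select_aoa(*r) for r in readings]
```

With a median, a single wild value never reaches the output. The limitation quoted above shows that the real processing was more involved than this, so treat the sketch only as an illustration of the basic idea.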
Jeez, it would drive me _up the wall_. Let's say I could somewhat justify the security concerns, but this seems like it severely hampers the ability to design the system. And it seems like a safety concern.
Providing errors are independent, it's better to have three subsystems with 99% reliability in a voting arrangement than one system with 99.9% reliability.
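A quick back-of-the-envelope check of those numbers, assuming a 2-out-of-3 majority, fully independent failures, and a voter that never fails (all three assumptions are mine):

```python
def two_of_three(p: float) -> float:
    """Probability that at least 2 of 3 independent units work,
    given that each unit works with probability p."""
    all_three = p ** 3
    exactly_two = 3 * p ** 2 * (1 - p)
    return all_three + exactly_two


voted = two_of_three(0.99)   # three 99% units behind a voter
single = 0.999               # one 99.9% unit
```

That works out to roughly 99.97% for the voting arrangement against 99.9% for the single unit, so the failure probability drops from about 1 in 1,000 to about 3 in 10,000. The advantage disappears quickly if the failures are correlated.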
Difference between it and ECC?
It’s confusing because EDAC and ECC seem to mean the same thing, but ECC is a term used primarily for memory integrity, whereas EDAC is a system-level concept.
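To make the memory side concrete, here is a toy Hamming(7,4) code, the classic single-error-correcting scheme. It is only a sketch: real ECC memory uses wider codes, and nothing here is taken from the ADIRU or ELAC hardware. The function names `encode` and `decode` are my own.

```python
def encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    # Codeword layout, positions 1..7: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(codeword):
    """Return (data_bits, error_position); position 0 means no error seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # syndrome = 1-based index of the bad bit
    if position:
        c[position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], position


word = [1, 0, 1, 1]
stored = encode(word)
stored[4] ^= 1  # simulate a single-event upset at position 5
recovered, where = decode(stored)
```

Here `recovered` equals the original `word` and `where` is 5. Two flips in the same codeword would be miscorrected by this toy version, which is why practical schemes add an extra parity bit to at least detect double errors.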
All of the value of your comment comes from the first sentence and the last two.
https://www.sciencedirect.com/science/article/abs/pii/S01419...
Because getting a new one certified is extremely expensive. And designing an aircraft with a new type certificate is unpopular with the airlines. Since pilots are locked into a single type at a time, a mixed fleet is less efficient.
Having a pilot switch type is very expensive, in the 50-100k per pilot range. And it comes with operational restrictions, you can't pair a newly trained (on type) captain with a newly trained first officer, so you need to manage all of this.
Significant internal hardware changes might indeed require re-certification, but it generally wouldn't mean that pilots need to re-qualify or get a new type rating.
But to do that you'll still have to prove that the changes don't change any of the aircraft characteristics. And that's not just the normal handling but also any failure modes. Which is an expensive thing to do, so Airbus would normally not do this unless there is a strong reason to do it.
The crew is also trained on a lot of knowledge about the systems behind the interface, so they can figure out what might be wrong in case of problems. That doesn't include the software architecture itself, but it does include a lot of information on how redundancy between the systems works and what happens in case one system's output is invalid. For example, how the failover logic works in case of a flight control computer failure, or how it responds to losing certain inputs. And how that affects automation capabilities, like: no autoland when X fails, no autopilot and degradation to alternate control law when Y fails, further degradation if X and Z fail at the same time. Sometimes also per "side", since not all computers are connected to all sensors.
The computer change can't change any of that without requiring retraining.
Why would you assume they're not? I don't know about aircraft specifically, but there's plenty of hardware that uses components older than that. Microchip still makes 8051 clones 45 years after the 8051 was released.
Guessing that using previously certified stuff is an advantage
https://forums.raspberrypi.com/viewtopic.php?t=99167
https://forums.raspberrypi.com/viewtopic.php?f=28&t=99042
https://www.raspberrypi.com/news/xenon-death-flash-a-free-ph...
https://en.wikipedia.org/wiki/Single-event_upset
For manned spaceflight, NASA ups N from 3 to 5.
Other mitigations include completely disabling all CPU caches (with a big performance hit), and continuously refreshing the ECC RAM in background.
There are also a bunch of hardware mitigations to prevent "latch up" of the digital circuits.
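As a rough illustration of why going from 3 to 5 channels matters: a strict-majority voter over N channels can outvote up to (N - 1) // 2 faulty ones, so 1 with three channels and 2 with five. The sketch below is generic N-modular redundancy, not any specific flight system, and `majority_vote` is a name I made up.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of channels,
    or None if no value has more than half of the votes."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


three = majority_vote([7, 7, 9])          # one faulty channel out of three
five = majority_vote([7, 7, 7, 8, 9])     # two faulty channels out of five
no_quorum = majority_vote([7, 7, 8, 8, 9])
```

The first two calls return 7 and the last returns None, at which point a real system would need some fallback other than voting.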
E.g. I could understand if each subsystem had its own actuators and they were designed so any 3 could aerodynamically override the other 2, but I don't think that's how it works in practice.
Hardware fix is the ultimate solution but it might be possible to paper over with software.
The moment to avoid the accident was probably the very first moment when Bonin entered a steep climb when the plane was already at 35,000 feet, only 2000 feet below the maximum altitude for its configuration. This was already a sufficiently insane thing to do that the other less senior pilot should have taken control, had CRM been functioning effectively. What actually happened is that both of the pilots in the cockpit at the start of the incident failed to identify that the plane was stalled despite the fact that (i) several stall warnings had sounded and (ii) the plane had climbed above its maximum altitude (where it would inevitably either stall or overspeed) and was now descending. It’s never very satisfying to blame pilots, but this was a monumental fuck up.
If the pilots genuinely disagree about control inputs there is not much that hardware or software can do to help. Even on aircraft with traditional mechanically linked control columns like the 737, the linkage will break if enough pressure is applied in opposite directions by each pilot (a protection against jamming).
I still design this into many of the things I work on, especially if I’m working close to the metal on controller systems. At some point it becomes ridiculous / impossible, but I’m often thinking about how a system would handle memory corruption, bit flips, invalid sensor data, etc. These days, somebody should design a triple-redundant microcontroller that runs quorum on the GPIO at the hardware level. It could be a 0.30 part instead of a 0.10 one, but I would specify it just about everywhere. Adding $3 to BOM cost to categorically eliminate an entire class of failure would be ramrodded by legal into just about every medical device, PLC, critical automotive system, etc., one would think. Seems like a good gambit for a RISC-V startup, but what do I know.
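The voting part of that idea is cheap to express. A minimal sketch, assuming three cores each produce a word of GPIO output bits and a 2-of-3 majority is taken per bit (in real hardware this would be a handful of gates per pin; `vote_bits` and the sample values are hypothetical):

```python
def vote_bits(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit matches at least two inputs."""
    return (a & b) | (a & c) | (b & c)


good = 0b10110010
upset = good ^ (1 << 5)  # one core suffers a single bit flip
pins = vote_bits(good, upset, good)
```

`pins` comes out equal to `good`, so a flip in any one core never reaches the pins. The voter itself then becomes the single point of failure.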
Mind you whatever came out of that project is rolling on the street today.
"This identified vulnerability could lead in the worst case scenario to an uncommanded elevator movement that may result in exceeding the aircraft structural capability."
I jest, but, once upon a time I worked with an infallible developer. When my projects crashed and burned, I would assume that it was my lack of competence and take that as my starting point. However, my colleague would assume that it was a stray neutrino that had flipped a bit to trigger the failure, even if it was a reproducible error.
He would then work backwards from 93 million miles away to blame the client, blame the linux kernel, blame the device drivers and finally, once all of that and the 'three letter agencies' were eliminated, perhaps consider the problem was between his keyboard and his chair.
In all fairness, he was a genius, and, regarding the A320 situation, he would have been spot on!
If a radiation event caused some bit-flip, how would you realize that's what triggered an error? Or maybe the FDR does record when certain things go wrong? I'm thinking like, voting errors of the main flight computers?
Anyway, would be very interested to know!
Airbus/Thales's fix in this case appears to add more error checking, and to restart the misbehaving component. https://bea.aero/fileadmin/user_upload/BEA2024-0404-BEA2025-...
("internal supervision of the component that caused the failure; an automatic restart mechanism for that component as soon as the failure is detected")
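In the abstract, "supervise the component and restart it when it misbehaves" looks something like the sketch below. Everything in it is hypothetical (class, method names, the health flag); it says nothing about how the actual fix is implemented.

```python
class Component:
    """Hypothetical stand-in for a monitored unit, not the real design."""

    def __init__(self):
        self.healthy = True
        self.restarts = 0

    def self_check(self) -> bool:
        """Internal supervision: report whether the unit looks sane."""
        return self.healthy

    def restart(self) -> None:
        """Reinitialise the unit to a known-good state."""
        self.restarts += 1
        self.healthy = True


def supervise(component: Component) -> bool:
    """Run one supervision cycle; return True if a restart was triggered."""
    if component.self_check():
        return False
    component.restart()
    return True


unit = Component()
unit.healthy = False           # simulate a detected fault, e.g. corrupted state
restarted = supervise(unit)    # the fault is caught and the unit restarts
second_pass = supervise(unit)  # nothing to do on the next cycle
```

The hard part in practice is the self-check: it has to catch corrupted state without itself depending on that state.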
I turn the page on the excuse sheet. "SOLAR FLARES" stares out at me. I'd better read up on that..."
So the immediate cost to Airbus of grounding the fleet is quite low, whilst the downside of not grounding the fleet (risk of incident, lawsuits, reputation, etc.) could be substantial.
It sounds like the fix is fairly quick, so it's probably not as expensive as the MAX multi-month groundings.
I doubt anyone is going to sue. Repairs etc are a part of life when owning aircraft. So as long as Airbus makes this happen fast and smooth they’re probably ok
"We take proactive measures, whereas our competitor only takes action after multiple fatal crashes!"
As far as I'm concerned it has not helped with their marketing.
The cause could have also been an extra check introduced in one of the routines - which backfired in this particular failure scenario.
How is it possible that this wouldn't impact upon flight schedules?
I presume they mean a Coronal Mass Ejection.
https://www.swpc.noaa.gov/noaa-scales-explanation
https://kauai.ccmc.gsfc.nasa.gov/CMEscoreboard/prediction/de...
The European Union Aviation Safety Agency [2] instruction describes the characteristics of the incident but not the date.
[1] https://www.theguardian.com/business/2025/nov/28/airbus-issu...
> At least 15 passengers were injured and taken to the hospital after a sudden drop in altitude on the flight from Mexico was forced to make an emergency landing in Florida, US aviation officials said at the time.
> The Thursday flight from Cancun was headed to Newark, New Jersey, when the altitude dropped, leading to the diversion to Tampa International Airport, the US Federal Aviation Administration said in a statement.
> Pilots reported “a flight control issue” and described injuries including a possible “laceration in the head,” according to air traffic audio recorded by LiveATC.net.
> Medical personnel met the passengers and crew on the ground at the airport. Between 15 and 20 people were taken to hospitals with non-life-threatening injuries, said Vivian Shedd, a spokesperson for Tampa Fire Rescue.
> Pablo Rojas, a Miami-based attorney who specialises in aviation law, said a “flight control issue” indicated that the aircraft wasn't responding to the pilots.
https://www.stuff.co.nz/travel/360903363/what-happened-fligh...
I’m surprised passengers are allowed to unbuckle for so much of each flight. You can get injured while buckled in, but that seems less common.
Only aviation professionals or recovering flight phobics like me who have watched every episode of Air Crash Investigation will take proactive safety measures of their own accord. To normies it's all just a pointless hassle.
Curious what a software change might have done in terms of resiliency. Maybe an incorrect memory setting, or some code path that is not calculating things redundantly?
So here’s everything you need to know about ELAC.
The ELAC System in the Airbus A320: The Brains Behind Pitch and Roll Control https://x.com/Turbinetraveler/status/1994498724513345637