Comments by "Tony Wilson" (@tonywilson4713) on "Boeing's Fatal Flaw (full documentary) | FRONTLINE" video.
@TheEvertw Sorry, but you clearly don't understand the FMEA process.
I wanted to avoid any lengthy discussion but your reply needs it.
FYI - on top of my aerospace engineering degree I have formal qualifications in functional safety, 30+ years of experience and a pilot's license with an aerobatic endorsement. So on top of my engineering I have formal training in handling and recovering aircraft from unusual attitudes. Beyond that, one of my frat brothers is a senior instructor on 737s with a major airline who was tasked, as part of a team, to go to Boeing and help get the plane recertified.
Your assumption is wrong from the start. Even before they made the system more aggressive it had issues that were found in the flight simulator in 2012. See that part of the video around 18:20.
I'll grant you have a point: EVERY form of analysis (not just FMEA, but HAZOP, CFD, FEA, ...) has its limits. Even when you go into a lab with the most accurate equipment conceivable, there's no such thing as perfect. Second to that are the reviews that need to be done on action items. HAZOPs/FMEAs always bring up action items; if they don't, that's as much of a concern as if they do. What so many people misread about HAZOPs and FMEAs is that the analysis is only the first step in an iterative process: you analyze, you act, you review and, if necessary, you go back and start again. If the action items that are identified are not handled and reviewed, things get missed.
You are almost on point regarding the software change that contributed. What you miss is that ANY CHANGE to a safety related system, including software parameters, requires reviewing and redoing/rechecking the specifications, FMEAs and anything else deemed relevant. This process is required for ANY complex system. I have seen what can happen when a control room operator makes a change just to see what it does.
Here's a basic run down of part of safety engineering.
FIRST and FOREMOST, all functional safety is based around 3 fundamental concepts.
1: Any component WILL eventually fail if left in the field or in service long enough.
2: Any failure that can reasonably be assumed to potentially occur will eventually happen without maintenance and/or replacement. In that definition maintenance also includes calibration (and re-calibration).
3: Fail safe, which basically means that a fault of the safety function puts the system into a safe state. The absolute opposite of fail safe is fail dangerous, but you can get circumstances that are not exactly either. The problem with aircraft is that they have only 1 fundamental safe state: secured on the ground with no fuel, no power and no people on board. Anytime an aircraft is in the sky it's not in a safe state, but some states are more dangerous than others and some are safer. It's also highly dependent on the aircraft type. I did my basics in Cessna C152s, where stalls are fairly benign. I did my retracts and constant speed prop in Mooney M20s, where a stall is a very serious issue. I did my basic aerobatics in R2160s, and its stall is in between.
The other fundamental to understand regarding safety functions is the MooN concept, which stands for "M out of N." It's a reference to how many sensors, out of those available, are required to trigger the safety function. The common variations we see in industry are 1oo1, 1oo2 and 2oo3. A 2 out of 3 system requires 2 of the 3 sensors to report a value that triggers the safety function. MCAS, for this part of its functionality, was a 1oo1 system: it had one sensor, and if that 1 sensor said the AOA was too high, MCAS forced the nose down. 1oo1 systems are pretty common, but not generally for CRITICAL safety functions. They are almost exclusively used in fail safe functions, which makes them totally unsuited to aircraft critical safety functions.
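The difference between 1oo1 and 2oo3 voting can be sketched in a few lines of code. This is purely illustrative, assuming a made-up AOA trigger threshold; it is not Boeing's actual logic or values.

```python
# Illustrative sketch of MooN voting on an AOA-style analog sensor.
# TRIGGER_DEG is an assumed, made-up threshold for demonstration only.
TRIGGER_DEG = 15.0

def vote_2oo3(readings, threshold=TRIGGER_DEG):
    """Trigger only if at least 2 of the 3 sensors exceed the threshold."""
    assert len(readings) == 3
    votes = sum(1 for r in readings if r > threshold)
    return votes >= 2

def vote_1oo1(reading, threshold=TRIGGER_DEG):
    """MCAS-style single-sensor logic: one faulty sensor can trigger it."""
    return reading > threshold

# One failed-high sensor (90 deg) cannot trigger a 2oo3 system on its own...
print(vote_2oo3([90.0, 4.2, 4.5]))   # False
# ...but it is enough to trigger a 1oo1 system.
print(vote_1oo1(90.0))               # True
```

The point of the sketch: with 2oo3, a single sensor failing high is outvoted by the two healthy sensors; with 1oo1, that same failure drives the function unconditionally.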
Most engineers don't like dealing with safety engineers because we start with that first fundamental: their design will eventually fail. It's a psychological barrier, not an engineering one. Nobody likes being told that their design will fail. I've been in meetings where engineers have literally melted down at that, but if you don't go through the process you risk not uncovering flaws.
For a system like MCAS (and most safety related systems) you start with the sensors, because that's where your software/computer gets its real world information. There are fundamentally 2 types of sensor: the type that is either on or off (which we call digital) and the type that sends (or transmits) a signal representing a value (which we call analog). Yes, there are all sorts of ways they can connect to the computers, including redundant wiring, and they can include diagnostic functions such as fault detection. I design and build systems with these functions.
At the fundamental level ALL analog sensors have 4 basic failure modes that apply to any industry and any application.
1: Fail Low - when the fault sends the signal to its lowest value.
2: Fail High - when the fault sends the signal to its highest value.
3: Fail Steady - when the fault freezes the signal at a value.
4: Fail Erratic - when the fault makes the signal randomly behave.
FORGET all the arguments over which component does what. This is the basic starting point. For a system like MCAS you want to concentrate on what makes it trigger and what doesn't. So for the anti-stall function ONLY, I would modify the above with basic responses:
1: Fail Low to minimum AOA sensor value -> No action required; MCAS anti-stall will not trigger.
2: Fail High, or Fail Steady above the MCAS trigger point -> MCAS will continue to push the nose down until the flight terminates.
3: Fail Steady below the trigger point -> No action required; MCAS anti-stall will not trigger.
4: Fail Erratic -> MCAS might trigger, then deactivate, then re-trigger as the signal oscillates.
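The mapping above can be restated as a tiny FMEA-style lookup. Again this is a hedged sketch: the function name, the behaviour strings and the trigger threshold are my own illustrative assumptions, not Boeing's design data.

```python
# Hypothetical FMEA-style mapping of the 4 analog failure modes to the
# expected behaviour of a 1oo1 MCAS anti-stall input. Illustrative only.
TRIGGER_DEG = 15.0  # assumed trigger threshold, not a real Boeing value

def mcas_effect(failure_mode, stuck_value=None):
    """Map an AOA sensor failure mode to the expected anti-stall behaviour."""
    if failure_mode == "fail_low":
        return "no trigger"                    # signal pinned at minimum AOA
    if failure_mode == "fail_high":
        return "continuous nose-down"          # the fail-dangerous case
    if failure_mode == "fail_steady":
        # Frozen signal: dangerous only if it froze above the trigger point.
        return ("continuous nose-down" if stuck_value > TRIGGER_DEG
                else "no trigger")
    if failure_mode == "fail_erratic":
        return "intermittent trigger"          # oscillates with the signal
    raise ValueError(f"unknown failure mode: {failure_mode}")

print(mcas_effect("fail_steady", stuck_value=20.0))  # continuous nose-down
```

Even this toy table makes the Action Items jump out: two of the four failure modes drive the nose down with no second sensor to veto them.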
That's about as basic as anyone could ever put this, and it immediately produces 2 Action Items. The first guarantees a fatal crash, because a 1oo1 system does not have the diagnostics to detect the fault and automatically ignore the sensor, and as we now know that is exactly what happened.
This is so basic to safety engineering its the sort of stuff you do on day 1.
Given that every other anti-stall system, including the ones Boeing uses on other aircraft, uses at least 2 sensors and in some cases is linked to the rest of the aircraft's sensor suite to check the validity of the information, this system SHOULD NEVER HAVE BEEN CERTIFIED OR ALLOWED ON ANY AIRCRAFT.
Way back in college, long before I did any industrial control systems, we had an alum give a guest lecture. He was one of the lead aerodynamicists for the X-29 forward swept wing demonstrator. All that plane really wanted to do was rip its wings off, and to prevent that it used computers. It was one of the first planes ever to use this kind of pilot augmentation software to such an extreme level. EVERYTHING was at least triple redundant. It actually had a massive effect on industrial safety with the advent of what are called TMR (triple modular redundant) systems like the Triconex safety platform.
As someone who came from an aerospace background and does work that has its foundation in the aerospace industry, I'm stunned that it was an aerospace company like Boeing that did this. I expect mining companies and manufacturers to behave like this. Even the oil & gas people, who have a terrible track record, now generally do better than this.
Boeing lost their way and I don't know if keeping people who contributed to that will help them find their way again.