"The Whole System" Author: Jerry Feinstein Introduction It is generally agreed that taking a "systems approach" to allocating and assessing the safety and RMA characteristics of a system is the correct approach. That is all elements of a system must be included in the allocation or requirements build-up. There is no general approach to defining "the system", however, for these purposes. At a high level, major components of most systems are the human operator, hardware, software, and interfaces (between major subsystems). These components may be either airborne, surface, undersea, or some of both. An example of an interface safety issue is a navagation system that provided postion data from a ground station to an aircraft. It is possible that when both the ground and airborne systems are working correctly, the aircraft may not receive the ground station data or receive it incorrectly as a result of either incompatible protocols or ionospheric interference. Safety Hazards or system unavailability are introduced at/by each of these components. The probability of an undesired event being introduced at the "System Level" the Boolean sum of the probability of each of these component contributions. Current Approaches to Specifying and Evaluating System Safety RMA Requirements Most programs make an attempt to identify and mitigate the most significant hazards or failure paths associated with these components. In contrast, there is no standardized approach in determining which components should be considered in specifying or evaluating the associated thresholds. Most often, it is customary to consider only the probability of hardware contributing to hazards or system failure, assuming that the interfaces between major hardware components are included through the analysis process. In other cases, more often in reliability than safety activities, the probability of software contributing to hazards or faults is included. Rarely is the probability of human error leading to a hazard or system failure included. An alternate approach to dealing with the hazards introduced by software is to apply the quality assurance provisions of a document like RTCA DO-178B. That is adjust the thoroughness of software documentation and testing to match the "safety impact" level associated with the functions controlled by the software. Some users of this approach make the incorrect presumption that the "DO-178B" process eliminates all software errors and their contribution to the probability of safety hazards. An effort to "quantify" Human error is even more rarely considered. Hazard mitigation is approached through design standards, critical task analysis, and simulation testing.
Issue Discussion

1) Excluding the probability of software and human error when evaluating safety risk or RMA characteristics leads to:

a) an understatement of the hazard risk or probability of failure, and

b) an allocation of too large a share of the risk to hardware components, based on the assumption that neither the software nor the human operator will affect safety or reliability.

2) The existence of software "failure rates" is not universally accepted; some have a problem with the concept that "software can break." This is a misunderstanding of software failure: software failures result from incorrect specification, logic design, and coding. The methodology for establishing software failure rates is less mature than that for hardware and is subject to more variables. Important figures in software system safety, such as Nancy Leveson, oppose the use of probabilistic risk assessment for software safety because of this credibility issue. In contrast, there is a substantial industry devoted to software failure prediction and measurement; a number of models are used for software predictions, and some data applicable to probabilistic risk assessment are available (one such model is sketched after the Proposals below).

3) The existence of human failure rates is generally accepted, but establishing credible rates is very difficult. Sources do exist for human reliability analysis, such as NUREG/CR-1278, and for the performance of human task safety analyses, such as NUREG/CR-4772. Additionally, the ratio of the probability of human error to that of the other components listed in the first paragraph of this paper is often so high that it leads to the argument that risk mitigation for the hardware or software is a second-order effect.

Proposals

1) Establish a standardized methodology that requires acknowledgement of the safety impact of the four system components listed above during development of the mission need statement and subsequent requirements development. This could be accomplished in a preliminary hazard list (a sketch follows this list).

2) Require explicit identification of which components are included in any numerical safety or RMA requirement specification.

3) Establish a shopping list of acceptable techniques for verifying compliance with software and human reliability requirements and the resulting safety requirements. Qualitative analysis can be considered.

4) Provide guidance in choosing analysis approaches, and in evaluating completed analyses, that is compatible with the inclusion of system interfaces.
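As a concrete starting point for Proposal 1, a preliminary hazard list can be kept as structured data so that every entry must be attributed to one of the four component classes, and none is silently excluded. This is a minimal sketch; the field names and example entries are illustrative assumptions, not drawn from any standard, though the severity categories follow the familiar MIL-STD-882 convention.

```python
from dataclasses import dataclass

@dataclass
class HazardEntry:
    """One row of a preliminary hazard list (PHL); fields are illustrative."""
    hazard: str
    component: str      # "hardware" | "software" | "human" | "interface"
    mission_phase: str
    severity: str       # MIL-STD-882 category I (catastrophic) .. IV (negligible)
    mitigation: str

phl = [
    HazardEntry("Loss or corruption of ground-station position data",
                "interface", "approach", "I",
                "protocol compatibility and interference testing"),
    HazardEntry("Operator enters incorrect waypoint",
                "human", "cruise", "II",
                "critical task analysis; confirmation prompt"),
]

# Flag any of the four component classes with no identified hazards,
# forcing an explicit acknowledgement rather than a silent omission.
covered = {entry.component for entry in phl}
missing = {"hardware", "software", "human", "interface"} - covered
print("Component classes with no PHL entries:", missing or "none")
```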
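Proposal 3's shopping list could include the software reliability-growth models mentioned under Issue 2. The sketch below implements one widely cited example, the basic execution-time model usually attributed to Musa, in which failure intensity decays exponentially with execution time; the parameter values here are invented for illustration and would in practice be estimated from observed failure data.

```python
import math

def failure_intensity(tau, lambda0, nu0):
    """Musa basic execution-time model: failure intensity (failures per
    CPU-hour) after tau CPU-hours, given initial intensity lambda0 and
    nu0 total failures expected over the software's life."""
    return lambda0 * math.exp(-lambda0 * tau / nu0)

def expected_failures(tau, lambda0, nu0):
    """Expected cumulative failures experienced by execution time tau."""
    return nu0 * (1.0 - math.exp(-lambda0 * tau / nu0))

# Invented parameters: 10 failures/CPU-hour initially, 100 latent faults.
lambda0, nu0 = 10.0, 100.0
for tau in (0.0, 10.0, 25.0, 50.0):
    print(f"tau={tau:5.1f}h  intensity={failure_intensity(tau, lambda0, nu0):7.4f}  "
          f"failures={expected_failures(tau, lambda0, nu0):6.2f}")
```

A model of this kind yields the quantitative failure-rate estimates that Issue 2 notes are available, while the credibility caveats raised there still apply.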
Send comments or suggestions to the author directly at the E-mail address below.

Copyright Jerold H. Feinstein, PE 1997. All rights reserved; contact the author for permission to use.

This page was last updated on 02/18/98 and is located at http://www.oocities.org/CapeCanaveral/Hangar/6056