RELIABILITY AND SYSTEM ENGINEERING TEAMWORK IN SAFETY ANALYSIS FOR AVIONICS SYSTEMS
Vigdor Brecher and Dan Rabinovitz
Elbit Systems Ltd, POB 539, Haifa 31053, Tel. 04-8316462, e-mail: brecher@elbit.co.il
ABSTRACT
For avionics systems, safety aspects require detailed and accurate attention. A teamwork methodology is presented that provides a systematic approach to the proper use of information and tools. The presentation includes practical equations, checklists and examples.
INTRODUCTION
In Elbit Systems Ltd (ESL), a teamwork concept for performing safety-related activities was developed and applied on several projects during the last three years. The Reliability, Availability, Maintainability and Safety (RAMS) group collaborates with the System Engineering Group and the Chief Safety Engineer in training system engineers, on a project basis, to use the proper tools and adopt the proper methodology. The methodology, in part or in full, was applied on projects ranging from simple display systems through complex avionics display systems, weapon delivery systems and full aircraft upgrade projects, both military and civil. The methodology is used for aircraft and helicopter upgrades, Unmanned Airborne Vehicles (UAVs) and combat vehicle upgrades (see Figure 1).
Figure 1
We aimed to ensure that the same safety process is used across the whole company and that it covers all lifecycle phases, from pre-development to in-service.
The safety process was also conducted at aircraft level and therefore had the capability to identify and manage safety issues that cross system boundaries.
We considered it very important that the safety process be concurrent with, and interact with, system design and development. This is why the involvement of the System Engineering Group was so important to us.
SAFETY ASSESSMENT - TEAMWORK OF SYSTEM ENGINEERING AND RAMS
Identification of Safety Standards/Procedures Currently Applied
In ESL, the application of safety tasks to military systems per MIL-STD-882C/D, System Safety Program Requirements, was well known to the Engineering Teams and System Engineers.
Identifying the safety standards/procedures currently applied in ESL and properly tailoring the appropriate methodology was more difficult for civil aviation applications or for UAVs.
The Safety Assessment Process (see Figure 2) includes Functional Hazard Assessment (FHA), Preliminary System Safety Assessment (PSSA), System Safety Assessment (SSA), Common Cause Analysis (CCA) and Safety-Related Flight Operations or Maintenance Tasks.
Figure 2
FAA documents, along with Military Standards, were adopted as the basis for the assessment process.
Tools and Methodologies
System Engineering developed the Functional Flows and prepared simplified schematics that were the basis for preparing overall Fault Trees to be discussed with the RAMS team.
We introduced FTA software tools to replace work previously performed with Visio and trained system engineers in using them.
1. The Hazard Analysis was performed on the identified hazards.
2. Failure rates of the equipment failure modes resulting in hazards defined in the Preliminary Hazard List (PHL) were allocated from reliability data provided by the equipment manufacturer's designers and from Elbit proprietary data.
3. On the basis of the FMECA of the equipment comprising the System, the failure rates causing total failure and the failure rates causing generation of misleading or improper data were calculated.
4. Assuming random failure occurrence and independence between failures, the probability of each basic failure was calculated for a normalized mission duration of one hour.
5. Functional and Electrical block diagrams were constructed for the hazard assessment.
6. Fault trees were defined on the basis of the Functional and Electrical block diagrams, and hazard probabilities were calculated using the FTA method.
7. The calculated hazard occurrence probabilities were allocated to the mission phases in which they apply.
8. A Hazard Analysis results table was prepared.
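Items 4 and 6 can be illustrated with a minimal Python sketch. It assumes, as item 4 does, exponentially distributed (random) failures and independent basic events, so a basic event probability over a one-hour mission is P = 1 - exp(-λt), and AND/OR gates combine by the product rules. The two-channel display example and its failure rates are hypothetical, not ESL data. These simple gate formulas are only valid when no basic event appears under more than one gate; FTA tools handle such repeated events through cut sets.

```python
import math


def basic_event_probability(failure_rate_per_hour: float, mission_hours: float = 1.0) -> float:
    """Probability of a random (exponential) failure within the mission: 1 - exp(-lambda*t)."""
    return -math.expm1(-failure_rate_per_hour * mission_hours)


def and_gate(probabilities: list[float]) -> float:
    """Output occurs only if every input occurs; inputs assumed independent."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result


def or_gate(probabilities: list[float]) -> float:
    """Output occurs if any input occurs; exact for independent inputs."""
    none_occur = 1.0
    for p in probabilities:
        none_occur *= 1.0 - p
    return 1.0 - none_occur


# Hypothetical failure rates (per flight hour): each channel has its own
# processor (1e-4) and its own dedicated power supply (5e-5).
p_channel_a = or_gate([basic_event_probability(1e-4), basic_event_probability(5e-5)])
p_channel_b = or_gate([basic_event_probability(1e-4), basic_event_probability(5e-5)])

# Hazard "loss of both display channels": both independent channels lost
p_hazard = and_gate([p_channel_a, p_channel_b])

print(f"P(channel A loss) = {p_channel_a:.3e} per hour")
print(f"P(hazard)         = {p_hazard:.3e} per hour")
```

With these assumed rates the hazard probability comes out at about 2.25e-8 per flight hour, the square of the single-channel value of roughly 1.5e-4.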
The system safety design order of precedence is consistent with MIL-STD-882D. The order of precedence for satisfying system safety requirements and resolving identified hazards is as follows:
(1) Design for Minimum Risk. From the onset of design development, the design is set to eliminate hazards. If an identified hazard cannot be eliminated, the associated risk is reduced to an acceptable level, as defined by the managing authority, through design selection.
(2) Incorporate Safety Devices. If identified hazards cannot be eliminated or their associated risk adequately reduced through design selection, that risk shall be reduced to an acceptable level through the use of fixed, automatic, or other protective safety design features or devices. Provisions shall be made for periodic functional checks of safety devices when applicable.
(3) Provide Warning Devices. When neither design nor safety devices can
effectively eliminate identified hazards or adequately reduce the associated
risk, devices shall be used to detect the condition and to produce an adequate
warning signal to alert personnel of the hazard. Warning signals and their application shall
be designed to minimize the probability of incorrect personnel reaction to the
signals and shall be standardized within like types of systems.
(4) Develop Procedures and Training. Where it is impractical to eliminate hazards through design selection or adequately reduce the associated risk with safety and warning devices, procedures and training shall be used. However, without a specific waiver, no warning, caution, or other form of written advisory shall be used as the only risk reduction method for Category I or II hazards (as defined in Table 4).
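The last rule lends itself to a simple checklist test. The Python sketch below assumes that the severity categories of Table 4 follow the MIL-STD-882 numbering (I Catastrophic, II Critical, III Marginal, IV Negligible) and that each hazard record lists the kinds of risk reduction applied. The `Hazard` record, its fields and the example hazards are hypothetical, not part of any ESL tool.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Mitigation(IntEnum):
    """Risk reduction measures, numbered in the MIL-STD-882D order of precedence."""
    DESIGN = 1
    SAFETY_DEVICE = 2
    WARNING_DEVICE = 3
    PROCEDURE_TRAINING = 4


@dataclass
class Hazard:
    name: str
    severity_category: int  # 1 = Category I (Catastrophic) ... 4 = Category IV (Negligible)
    mitigations: list[Mitigation] = field(default_factory=list)
    waiver: bool = False


def check_hazard(h: Hazard) -> list[str]:
    """Return checklist findings for one hazard record (empty list = no finding)."""
    findings = []
    if not h.mitigations:
        findings.append(f"{h.name}: no risk reduction measure recorded")
        return findings
    # Written advisories (procedures/training) must not be the only measure
    # for Category I or II hazards unless a specific waiver exists.
    only_procedures = all(m == Mitigation.PROCEDURE_TRAINING for m in h.mitigations)
    if only_procedures and h.severity_category <= 2 and not h.waiver:
        findings.append(
            f"{h.name}: Category I/II hazard relies only on procedures/advisories; "
            "a specific waiver is required"
        )
    return findings


# Hypothetical hazard records
hazards = [
    Hazard("Misleading attitude display", 1, [Mitigation.PROCEDURE_TRAINING]),
    Hazard("Loss of weapon release inhibit", 2, [Mitigation.DESIGN, Mitigation.WARNING_DEVICE]),
    Hazard("Display flicker", 3, [Mitigation.PROCEDURE_TRAINING]),
]
for h in hazards:
    for finding in check_hazard(h):
        print(finding)
```

Only the first record is flagged: it is Category I and relies solely on procedures without a waiver.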
Tools Used
Fault Tree Analysis (FTA) is used throughout the safety process during the safety assessment activity.
The adopted FTA methodology, implemented with FTA software, has two advantages: the analysis becomes progressively deeper (the fault tree more detailed) as system development progresses, and the tool provides pictorial modeling. The fault tree modeling capability supports the following quantitative failure models:
· fixed unavailability and failure frequency model
· constant failure and repair rate model
· mean time to failure and repair model
· dormant failure with periodic inspection model
· sequential failure model
· standby model
· uncertainty values
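Several of these models reduce to textbook unavailability formulas. The Python sketch below implements four of them under the usual assumptions (exponential failure and repair times, perfect and instantaneous periodic tests). It is not the FTA tool's own algorithm, the sequential, standby and uncertainty models are not covered, and all numeric values are hypothetical.

```python
import math


def fixed_unavailability(q: float) -> float:
    """Fixed model: the unavailability is given directly (e.g. from test data)."""
    return q


def constant_rate_unavailability(lam: float, mu: float, t: float) -> float:
    """Repairable item with constant failure rate lam and repair rate mu, available at t = 0.

    Q(t) = lam / (lam + mu) * (1 - exp(-(lam + mu) * t)), tending to lam / (lam + mu).
    """
    s = lam + mu
    return lam / s * -math.expm1(-s * t)


def mttf_mttr_unavailability(mttf: float, mttr: float, t: float) -> float:
    """Same model, parameterised by mean time to failure and mean time to repair."""
    return constant_rate_unavailability(1.0 / mttf, 1.0 / mttr, t)


def dormant_mean_unavailability(lam: float, tau: float) -> float:
    """Hidden (dormant) failure revealed only by a perfect test every tau hours.

    Time-averaged unavailability: 1 - (1 - exp(-lam*tau)) / (lam*tau), roughly lam*tau/2.
    """
    x = lam * tau
    return 1.0 + math.expm1(-x) / x


# Hypothetical values (rates per hour, times in hours)
print(constant_rate_unavailability(1e-3, 0.1, 1e6))  # close to steady state lam/(lam+mu)
print(mttf_mttr_unavailability(1000.0, 10.0, 1e6))   # same item, MTTF/MTTR form
print(dormant_mean_unavailability(1e-5, 1000.0))     # close to lam*tau/2 = 0.005
```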
The analyses supported by this tool are:
· cut sets (qualitative analysis)
· calculation of system unavailability and related parameters
· calculation of gate probabilities
· common cause failures.
The steps in building a fault tree are:
Values for the Basic Events in the FTA are calculated by RAMS for:
· Electrical components
 - Predicted failure rate data based on MIL-HDBK-217
 - Information from manufacturers (life test data)
 - Data to be adjusted for the proper environment and stresses
 - Software databases
 - Field use data (last resort)
· Mechanical components
 - Determine the stresses and loads (mechanical, environmental)
 - Construct the stress/strength equation for multiple loads, if required
 - Calculate the design (safety) margin and the reliability (probability of failure) for the required life
· Manufacturing defects, per factory data or field failure data.
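For the mechanical items, the stress/strength step can be sketched with the classical normal-normal interference model. The Python sketch below assumes that load and strength are independent and normally distributed, that simultaneous independent loads add (means add, variances add), and that the item is assessed against a single application of the combined load. Under these assumptions the safety margin is SM = (mean strength - mean load) / sqrt(sd_strength^2 + sd_load^2) and the probability of failure is Phi(-SM). The bracket and its numbers are hypothetical.

```python
import math
from statistics import NormalDist


def combine_loads(loads: list[tuple[float, float]]) -> tuple[float, float]:
    """Sum of independent normal loads given as (mean, standard deviation) pairs."""
    mean = sum(m for m, _ in loads)
    sd = math.sqrt(sum(s * s for _, s in loads))
    return mean, sd


def safety_margin(strength: tuple[float, float], load: tuple[float, float]) -> float:
    """SM = (mean strength - mean load) / sqrt(sd_strength^2 + sd_load^2)."""
    (ms, ss), (ml, sl) = strength, load
    return (ms - ml) / math.sqrt(ss * ss + sl * sl)


def probability_of_failure(strength: tuple[float, float], load: tuple[float, float]) -> float:
    """P(load > strength) = Phi(-SM) for independent normal load and strength."""
    return NormalDist().cdf(-safety_margin(strength, load))


# Hypothetical bracket: two simultaneous loads and the material strength (MPa)
load = combine_loads([(200.0, 24.0), (150.0, 32.0)])  # (350.0, 40.0)
strength = (500.0, 30.0)
sm = safety_margin(strength, load)                    # 150 / 50 = 3.0
pf = probability_of_failure(strength, load)
print(f"SM = {sm:.2f}, P(failure) = {pf:.2e}, R = {1.0 - pf:.5f}")
```

A safety margin of 3 corresponds to a probability of failure of about 1.35e-3 per load application under these assumptions.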
Prepare a Table of Criticality Levels and Probability Classifications of the evaluated failure conditions, including: Failure Condition - Criticality Level - Probability Classification.
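A minimal sketch of such a table follows, in Python. The probability classes and the per-flight-hour objectives per criticality level are assumptions modelled on FAA AC 25.1309-1A / SAE ARP4761 (Catastrophic at most 1e-9, Hazardous at most 1e-7, Major at most 1e-5, no quantitative objective for Minor); a military or UAV project may tailor them differently. The failure conditions and probabilities shown are hypothetical.

```python
# Assumed probability classes (upper bound per flight hour), least to most likely
PROBABILITY_CLASSES = [
    (1e-9, "Extremely Improbable"),
    (1e-7, "Extremely Remote"),
    (1e-5, "Remote"),
    (float("inf"), "Probable"),
]

# Assumed maximum allowable probability per flight hour for each criticality level
OBJECTIVES = {
    "Catastrophic": 1e-9,
    "Hazardous": 1e-7,
    "Major": 1e-5,
    "Minor": None,  # no quantitative objective assumed
}


def classify_probability(p: float) -> str:
    """Map a per-flight-hour probability to its (assumed) probability class."""
    for upper, name in PROBABILITY_CLASSES:
        if p <= upper:
            return name
    raise ValueError(f"invalid probability: {p}")


def table_row(condition: str, criticality: str, p: float) -> tuple[str, str, str, bool]:
    """One row: failure condition, criticality level, probability class, objective met?"""
    limit = OBJECTIVES[criticality]
    meets = limit is None or p <= limit
    return condition, criticality, classify_probability(p), meets


# Hypothetical failure conditions and calculated probabilities
rows = [
    table_row("Loss of both display channels", "Hazardous", 2.25e-8),
    table_row("Misleading attitude on one display", "Catastrophic", 3.0e-9),
    table_row("Loss of one display channel", "Major", 1.5e-4),
]
for row in rows:
    print(" - ".join(str(item) for item in row))
```

Rows whose last column is `False` identify failure conditions that need further design action or a deeper analysis.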
In the Safety Assessment process, the Common Cause Analysis (CCA) at equipment level contributes to verifying that the independence requirements between equipment internal failures, related to a catastrophic or hazardous Undesired Event (UE), are met.
The CCA is performed during the equipment design and development phases and is complementary to the Equipment Safety Analysis (ESA), but uses a different methodology. The CCA is generally a qualitative analysis.
The CCA process is made up of five steps:
· Collect the CCA inputs,
· Identify the CCA independence criteria,
· Establish a specific list of potential Common Cause Failures (CCF) (Common Cause Types, Sources) adapted to the equipment under study,
· Analyse the design and the installation/maintenance/operation rules against the independence requirements, by identifying the precautions implemented to prevent CCF,
· Document the results of the above steps (see also Figure 3).
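Steps 3 and 4 rely largely on engineering judgement, but part of the screening can be mechanised. The Python sketch below is a hypothetical aid, not ESL's CCA procedure: each item that must be independent is tagged with potential common-cause sources (power bus, installation zone, software build), and every pair covered by an independence requirement is checked for shared sources. Each finding then has to be closed in step 4 by an identified precaution (for example segregation or dissimilarity) or by a design change. All item names and attributes are assumptions.

```python
# Hypothetical common-cause attributes of equipment items
ITEMS = {
    "CPU_A": {"power_bus": "BUS_1", "zone": "AVIONICS_BAY_LEFT", "sw_build": "OFP_2.1"},
    "CPU_B": {"power_bus": "BUS_2", "zone": "AVIONICS_BAY_LEFT", "sw_build": "OFP_2.1"},
    "IMU_1": {"power_bus": "BUS_1", "zone": "NOSE", "sw_build": None},
    "IMU_2": {"power_bus": "BUS_2", "zone": "TAIL", "sw_build": None},
}

# Independence criteria to check (common cause types)
CRITERIA = ("power_bus", "zone", "sw_build")

# Pairs that must be independent, e.g. inputs to an AND gate under a catastrophic UE
INDEPENDENCE_REQUIREMENTS = [("CPU_A", "CPU_B"), ("IMU_1", "IMU_2")]


def shared_sources(a: str, b: str) -> dict[str, str]:
    """Common-cause sources that items a and b have in common."""
    common = {}
    for criterion in CRITERIA:
        value = ITEMS[a].get(criterion)
        if value is not None and value == ITEMS[b].get(criterion):
            common[criterion] = value
    return common


def screen_independence() -> list[tuple[str, str, dict[str, str]]]:
    """Return every required-independent pair that shares at least one source."""
    findings = []
    for a, b in INDEPENDENCE_REQUIREMENTS:
        common = shared_sources(a, b)
        if common:
            findings.append((a, b, common))
    return findings


for a, b, common in screen_independence():
    print(f"{a} / {b}: shared {common} -> justify precaution or change design")
```

Here the two processors are flagged for a shared installation zone and a shared software build, while the two IMUs pass the screen.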
Why Teamwork Is So Important
Safety
analyses involve some degree of intrinsic uncertainty. While the intended behavior
of the system is usually well understood, failures and their effects are harder
to understand.
As a consequence, there is a degree of subjectivity in hazard identification. Dealing with safety issues relies on the experience of the safety engineer, and brainstorming with system engineers and pilots is also of real help.
There is no way of testing for completeness of the hazard identification
during the early phases of the design process, and often no way of checking in
advance whether the safety problems that have been suggested are actually
present in the system (although this can usually be decided as experience is
gained in the longer term).
Different
groups need to work with different views of the system (e.g. systems engineers
/ functional view; safety engineers / hazard-directed view). This is generally
a benefit, in that taking different views can achieve new insights, and help to
identify flaws. However, this diversity can become a weakness if the views are
not consistent, or if there are problems of clarity and completeness of
comprehension when working with unfamiliar models. This is why teamwork is so important.
Existing / traditional safety analysis techniques are difficult to use
on modern, complex systems for the following reasons:
- It is hard to produce the analysis in a timely manner (it is a slow, costly, labor-intensive, mainly manual process), which limits the contribution safety analysis can make to the evaluation of design alternatives
- Minor design changes can have an extensive impact on the related
safety analysis
- Modern systems have a multitude of functions and contain intricate failure detection and management methods. As a consequence, there is a huge number of failure conditions, and it is difficult to assess and model the way in which failures propagate through the system
- It is hard to represent the dynamic behavior of complex systems using
techniques such as fault trees.
The following issues need further attention and methodological solutions:
· Interaction between structural and systems safety issues
· How to assess human error
· How to assess software error
· After the initial hazard identification (based on information from past programs / lessons learned) and the aircraft-level FHA, there is no systematic process for identifying new hazards, particularly those introduced as the design changes and develops.
· The specification of safety targets in terms of aircraft loss rates puts too much emphasis on quantitative (rather than qualitative) safety analysis.
· Textual descriptions of failure modes are often too ambiguous.
· The classification of failure conditions, which covers a number of related issues:
 - lack of consistency and repeatability (making it hard to achieve reliable comparisons / trade-offs between the risks of different hazards or failure modes)
 - over-pessimistic assessment of failure mode effects (we reliability engineers are always pessimists!)
· Determination of Design Assurance Levels / Safety Integrity Levels, and especially the criteria for independence of failures
SUMMARY
A disciplined, effective and integrative process dealing with the safety aspects of the system is implemented in ESL projects. Serious steps were taken toward effective teamwork between System Engineering and RAMS in performing safety tasks for complex systems. This methodology was reviewed and applied to several projects. The integration of software with hardware, and a systematic process for identifying new hazards by combining the methodologies provided by military and civil standards, shall receive supplementary attention. This presentation stressed the importance of teamwork in achieving the goal.