Dataset Details
United States
Towards Software Health Management with Bayesian Networks
As software and software-intensive systems become increasingly ubiquitous, the impact of failures can be tremendous. In some industries such as aerospace, medical devices, or automotive, such failures can cost lives or endanger mission success. Software faults can arise due to the interaction between the software, the hardware, and the operating environment. Unanticipated environmental changes lead to software anomalies that may have significant impact on the overall success of the mission. Latent coding errors can trigger faults at any time during system operation, even though significant effort has usually been expended in verification and validation (V&V) of the software system. It is becoming increasingly apparent that pre-deployment V&V is not enough to guarantee that a complex software system meets all safety, security, and reliability requirements. Software Health Management (SWHM) is a new field that is concerned with the development of tools and technologies to enable automated detection, diagnosis, prediction, and mitigation of adverse events due to software anomalies, while the system is in operation. The prognostic capability of the SWHM to detect and diagnose failures before they happen will yield safer and more dependable systems for the future. This paper addresses the motivation, needs, and requirements of software health management as a new discipline and motivates the need for SWHM in safety-critical applications.
Data Information
Related Data
Adaptive Load-Allocation for Prognosis-Based Risk Management
Public Data Portal
It is an inescapable truth that no matter how well a system is designed, it will degrade, and if degrading parts are not repaired or replaced, the system will fail. Avoiding the expense and safety risks associated with system failures is certainly a top priority in many systems; however, there is also a strong motivation not to be overly cautious in the design and maintenance of systems, due to the expense of maintenance and the undesirable sacrifices in performance and cost effectiveness incurred when systems are overdesigned for safety. This paper describes an analytical process that starts with the derivation of an expression to evaluate the desirability of future control outcomes, and eventually produces control routines that use uncertain prognostic information to optimize derived risk metrics. A case study on the design of fault-adaptive control for a skid-steered robot illustrates some of the fundamental challenges of prognostics-based control design.
Health Monitoring and Prognostics for Computer Servers
Public Data Portal
**Abstract** Prognostics solutions for mission-critical systems require a comprehensive methodology for proactively detecting and isolating failures, recommending and guiding condition-based maintenance actions, and estimating in real time the remaining useful life of critical components and associated subsystems. A major challenge has been to extend the benefits of prognostics to include computer servers and other electronic components. The key enabler for prognostics capabilities is monitoring time series signals relating to the health of executing components and subsystems. Time series signals are processed in real time using pattern recognition for proactive anomaly detection and for remaining useful life estimation. Examples will be presented of the use of pattern recognition techniques for early detection of a number of mechanisms that are known to cause failures in electronic systems, including: environmental issues; software aging; degraded or failed sensors; degradation of hardware components; and degradation of mechanical, electronic, and optical interconnects. Prognostics pattern classification is helping to substantially increase component reliability margins and system availability goals while reducing costly sources of "no trouble found" events that have become a significant warranty-cost issue. **Bios** Aleksey Urmanov is a research scientist at Sun Microsystems. He earned his doctoral degree in Nuclear Engineering at the University of Tennessee in 2002. Dr. Urmanov's research activities are centered around his interest in pattern recognition, statistical learning theory, and ill-posed problems in engineering. His most recent activities at Sun focus on developing health monitoring and prognostics methods for EP-enabled computer servers. He is a founder and an Editor of the Journal of Pattern Recognition Research. Anton Bougaev holds M.S. and Ph.D. degrees in Nuclear Engineering from Purdue University. Before joining Sun Microsystems Inc. in 2007, he was a lecturer in the Nuclear Engineering Department and a member of the Applied Intelligent Systems Laboratory (AISL) of Purdue University, West Lafayette, USA. Dr. Bougaev is a founder and the Editor-in-Chief of the Journal of Pattern Recognition Research. His current focus is on reliability physics, with emphasis on complex system analysis and the physics of failures based on data-driven pattern recognition techniques.
The Case for Software Health Management
Public Data Portal
Software Health Management (SWHM) is a new field that is concerned with the development of tools and technologies to enable automated detection, diagnosis, prediction, and mitigation of adverse events due to software anomalies. Significant effort has been expended in the last several decades in the development of verification and validation methods for software-intensive systems, but it is becoming increasingly apparent that this is not enough to guarantee that a complex software system meets all safety and reliability requirements. Modern software systems can exhibit a variety of failure modes which can go undetected in a verification and validation process. While standard techniques for error handling, fault detection, and isolation can have significant benefits for many systems, it is becoming increasingly evident that new technologies and methods are necessary to detect, diagnose, predict, and then mitigate the adverse events due to software that has already undergone significant verification and validation procedures. These software faults often arise due to the interaction between the software and the operating environment. Unanticipated environmental changes lead to software anomalies that may have significant impact on the overall success of the mission. Because software is ubiquitous, it is not sufficient that errors are detected only after they occur. Rather, software must be instrumented and monitored for failures before they happen. This prognostic capability will yield safer and more dependable systems for the future. This paper addresses the motivation, needs, and requirements of software health management as a new discipline. Published in the Proceedings of the IEEE Conference on Space Mission Challenges for Information Technology, Palo Alto, CA, August 2011.
A Combined Model-Based and Data-Driven Prognostic Approach for Aircraft System Life Management
Public Data Portal
Failure prognosis, as a natural extension to the fault detection and isolation (FDI) problem, has become a key issue in a world where the economic impact of system reliability and cost-effective operation of critical assets is steadily increasing. Failure prognostic algorithms aim to characterize the evolution of incipient fault conditions in complex dynamic processes, thus allowing estimation of the remaining useful life (RUL) of subsystems and components. Several examples can be used here to illustrate the range of possible applications for these algorithms: electro-mechanical systems, continuous-time manufacturing processes, structural damage analysis, and even fault-tolerant software architectures. Most of them have in common the fact that they are highly complex, nonlinear, and affected by large-grain uncertainty. We introduce in this chapter an integrated failure prognosis architecture that is applicable to a variety of aircraft systems and industrial processes. We are targeting a specific rotorcraft system as a prototypical testbed for proof-of-concept. The overall architecture consists of an on-board and an off-board module for eventual on-platform implementation purposes.
A Bayesian Framework for Remaining Useful Life Estimation
Public Data Portal
The estimation of remaining useful life (RUL) of a faulty component is at the center of system prognostics and health management. It gives operators a potent tool in decision making by quantifying how much time is left until functionality is lost. This is especially true for aerospace systems, where unanticipated subsystem downtime may lead to catastrophic failures. RUL prediction needs to contend with multiple sources of error, such as modeling inconsistencies, system noise, and degraded sensor fidelity. Bayesian theory of uncertainty management provides a way to contain these problems by integrating out the nuisance variables. We use the Relevance Vector Machine (RVM) for model development. RVM is a Bayesian treatment of the well-known Support Vector Machine (SVM), a kernel-based regression/classification technique. This model is next used in a Particle Filter (PF) framework. Statistical estimates of the noise in the system and anticipated operational conditions are processed to provide estimates of RUL in the form of a probability density function (PDF). Validation of this approach on experimental data collected from Li-ion batteries is presented.
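To make the idea of an RUL estimate "in the form of a PDF" concrete, the following is a minimal sketch, not the paper's method. It assumes a hypothetical noisy linear capacity-fade model in place of the RVM-derived model, and all names and numbers (`rul_samples`, `threshold`, `decay_mean`, `decay_std`) are illustrative assumptions. Each particle is propagated until it crosses a failure threshold, and the crossing times form an empirical RUL distribution.

```python
import random
import statistics

def rul_samples(particles, threshold, decay_mean, decay_std, max_steps=500, seed=0):
    """Propagate each particle (a capacity estimate) forward until it crosses
    the failure threshold; the step count at crossing is one RUL sample."""
    rng = random.Random(seed)
    samples = []
    for capacity in particles:
        steps = 0
        while capacity > threshold and steps < max_steps:
            # Hypothetical degradation model: noisy, non-negative capacity loss per cycle.
            capacity -= max(0.0, rng.gauss(decay_mean, decay_std))
            steps += 1
        samples.append(steps)
    return samples

def histogram_pdf(samples):
    """Normalize RUL sample counts into an empirical probability mass function."""
    counts = {}
    for s in samples:
        counts[s] = counts.get(s, 0) + 1
    n = len(samples)
    return {k: v / n for k, v in sorted(counts.items())}

# Assumed current state estimate: normalized capacity near 1.0 with small uncertainty.
init_rng = random.Random(1)
particles = [init_rng.gauss(1.0, 0.02) for _ in range(1000)]
samples = rul_samples(particles, threshold=0.7, decay_mean=0.01, decay_std=0.003)
pdf = histogram_pdf(samples)
median_rul = statistics.median(samples)
```

The spread of `pdf` reflects both the uncertainty in the current state and the assumed process noise, which is the kind of information a single point estimate of RUL would hide.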
The ProADAPT System in the 2009 Diagnostic Challenge Competition
Public Data Portal
Reliable systems health management is an important research area of NASA. A health management system that can accurately and quickly diagnose faults in various on-board systems of a vehicle will play a key role in the success of current and future NASA missions. We introduce in this paper the ProDiagnose algorithm, a diagnostic algorithm that uses a probabilistic approach, accomplished with Bayesian Network models compiled to Arithmetic Circuits, to diagnose these systems. We describe the ProDiagnose algorithm, how it works, and the probabilistic models involved. We show by experimentation on two Electrical Power Systems based on the ADAPT testbed, used in the Diagnostic Challenge Competition (DX-09), that ProDiagnose can produce results with over 96% accuracy and < 1 second mean diagnostic time. **Reference:** B. W. Ricks and O. J. Mengshoel. "The Diagnostic Challenge Competition: Probabilistic Techniques for Fault Diagnosis in Electrical Power Systems." Proc. of the 20th International Workshop on Principles of Diagnosis (DX-09), Stockholm, Sweden, 2009. **BibTeX Reference:** @inproceedings{ricks09diagnostic, author = {Ricks, B. W. and Mengshoel, O. J.}, title = {The Diagnostic Challenge Competition: Probabilistic Techniques for Fault Diagnosis in Electrical Power Systems}, booktitle = {Proc. of the 20th International Workshop on Principles of Diagnosis (DX-09)}, address = {Stockholm, Sweden}, year = {2009} }
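As a rough illustration of probabilistic diagnosis with a Bayesian network, the sketch below computes a posterior over health states by direct enumeration. It is not ProDiagnose and does not compile anything to an arithmetic circuit. The network structure (one health node with two conditionally independent sensor nodes), the state and sensor names, and all probabilities are assumptions made up for the example.

```python
def posterior_fault(prior, likelihoods, evidence):
    """Return P(health state | evidence) by enumerating a tiny Bayesian network.

    prior:       {state: P(state)}
    likelihoods: {sensor: {state: P(sensor reads abnormal | state)}}
    evidence:    {sensor: True if an abnormal reading was observed}
    """
    joint = {}
    for state, p in prior.items():
        for sensor, abnormal in evidence.items():
            p_abnormal = likelihoods[sensor][state]
            p *= p_abnormal if abnormal else (1.0 - p_abnormal)
        joint[state] = p
    total = sum(joint.values())
    return {state: p / total for state, p in joint.items()}

# Hypothetical model of one relay in an electrical power system.
prior = {"healthy": 0.95, "relay_stuck": 0.05}
likelihoods = {
    "voltage_sensor": {"healthy": 0.02, "relay_stuck": 0.90},
    "current_sensor": {"healthy": 0.05, "relay_stuck": 0.80},
}

both_abnormal = posterior_fault(
    prior, likelihoods, {"voltage_sensor": True, "current_sensor": True}
)
both_normal = posterior_fault(
    prior, likelihoods, {"voltage_sensor": False, "current_sensor": False}
)
```

Enumeration scales exponentially with the number of unobserved variables, which is one reason to compile a network into a more efficient representation before running it on a time-constrained diagnostic task.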
Data Mining in Systems Health Management
Public Data Portal
This chapter presents theoretical and practical aspects associated with the implementation of a combined model-based/data-driven approach for failure prognostics based on particle filtering algorithms, in which the current estimate of the state PDF is used to determine the operating condition of the system and predict the progression of a fault indicator, given a dynamic state model and a set of process measurements. In this approach, the task of estimating the current value of the fault indicator, as well as other important changing parameters in the environment, involves two basic steps: the prediction step, based on the process model, and an update step, which incorporates the new measurement into the a priori state estimate. This framework allows estimation of the probability of failure at future time instants (RUL PDF) in real time, providing information about time-to-failure (TTF) expectations, statistical confidence intervals, and long-term predictions, using for this purpose empirical knowledge about critical conditions for the system (also referred to as the hazard zones). This information is of paramount significance for the improvement of system reliability and cost-effective operation of critical assets, as has been shown in a case study where feedback correction strategies (based on uncertainty measures) have been implemented to lengthen the RUL of a rotorcraft transmission system with propagating fatigue cracks on a critical component. Although the feedback loop is implemented using simple linear relationships, it provides quick insight into how the system reacts to changes in its input signals, in terms of its predicted RUL. The method is able to manage non-Gaussian PDFs since it includes concepts such as nonlinear state estimation and confidence intervals in its formulation. Real data from a fault-seeded test showed that the proposed framework was able to anticipate modifications on the system input to lengthen its RUL. Results of this test indicate that the method was able to successfully suggest the correction that the system required. In this sense, future work will be focused on the development and testing of similar strategies using different input-output uncertainty metrics.
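The two steps named in the abstract above (prediction from the process model, then an update with the new measurement) can be sketched as a basic bootstrap particle filter. This is a generic textbook form under stated assumptions, not the chapter's implementation: the scalar fault indicator, the constant-growth process model, the Gaussian noise levels, and the multinomial resampling step are all illustrative choices.

```python
import math
import random

def predict(particles, growth_rate, process_std, rng):
    """Prediction step: push each particle through the (assumed) process model."""
    return [x + growth_rate + rng.gauss(0.0, process_std) for x in particles]

def update(particles, weights, measurement, meas_std):
    """Update step: reweight particles by the Gaussian likelihood of the measurement."""
    new_weights = [
        w * math.exp(-0.5 * ((measurement - x) / meas_std) ** 2)
        for x, w in zip(particles, weights)
    ]
    total = sum(new_weights)
    return [w / total for w in new_weights]

def resample(particles, weights, rng):
    """Multinomial resampling to counter weight degeneracy; weights reset to uniform."""
    n = len(particles)
    return rng.choices(particles, weights=weights, k=n), [1.0 / n] * n

rng = random.Random(42)
n = 2000
particles = [rng.gauss(0.0, 1.0) for _ in range(n)]  # vague initial belief
weights = [1.0 / n] * n

true_state = 0.0  # simulated fault indicator (e.g. a normalized crack length)
for _ in range(30):
    true_state += 0.1
    z = true_state + rng.gauss(0.0, 0.2)  # simulated noisy measurement
    particles = predict(particles, growth_rate=0.1, process_std=0.05, rng=rng)
    weights = update(particles, weights, z, meas_std=0.2)
    particles, weights = resample(particles, weights, rng)

estimate = sum(x * w for x, w in zip(particles, weights))
```

Because the posterior is carried by the particle set rather than by a mean and covariance, nothing in the loop requires the state PDF to be Gaussian, which matches the abstract's point about handling non-Gaussian PDFs.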
Evaluating Algorithm Performance Metrics Tailored for Prognostics
Public Data Portal
Prognostics has taken center stage in Condition Based Maintenance (CBM) where it is desired to estimate Remaining Useful Life (RUL) of a system so that remedial measures may be taken in advance to avoid catastrophic events or unwanted downtimes. Validation of such predictions is an important but difficult proposition and a lack of appropriate evaluation methods renders prognostics meaningless. Evaluation methods currently used in the research community are not standardized and in many cases do not sufficiently assess key performance aspects expected out of a prognostics algorithm. In this paper we introduce several new evaluation metrics tailored for prognostics and show that they can effectively evaluate various algorithms as compared to other conventional metrics. Four prognostic algorithms, Relevance Vector Machine (RVM), Gaussian Process Regression (GPR), Artificial Neural Network (ANN), and Polynomial Regression (PR), are compared. These algorithms vary in complexity and their ability to manage uncertainty around predicted estimates. Results show that the new metrics rank these algorithms in a different manner; depending on the requirements and constraints suitable metrics may be chosen. Beyond these results, this paper offers ideas about how metrics suitable to prognostics may be designed so that the evaluation procedure can be standardized.
Real System Failures
Public Data Portal
This resource area contains descriptions of actual electronic systems failure scenarios with an emphasis on the diversity of failure modes and effects that can befall dependable systems. Introductory pages begin [here](/dashlink/static/media/other/Introduction1.html). The descriptions begin [here](/dashlink/static/media/other/ObservedFailures1.html). These pages are separated into sections. Each section starts with a List of failure scenarios. In between the List slides are slides that give more information on those scenarios which warrant more than a bullet or two of explanation. Some references are listed [here](/dashlink/static/media/other/References.html). A list of acronyms and initialisms is [here](/dashlink/static/media/other/Acronyms_Initialisms.html). If you would like to add a story to this list or add additional significant details to an existing story, please contact Kevin Driscoll at ![](/dashlink/static/media/other/KevinDriscoll-email.PNG) A not-quite-working wiki subset of this resource area is available at [https://c3.nasa.gov/dashlink/projects/79/wiki/test_stories_split](/dashlink/projects/79/wiki/test_stories_split).
A Survey of Health Management User Objectives in Aerospace Systems Related to Diagnostic and Prognostic Metrics
Public Data Portal
One of the most prominent technical challenges to effective deployment of health management systems is the vast difference in user objectives with respect to engineering development. In this paper, a detailed survey on the objectives of different users of health management systems is presented. These user objectives are then mapped to the metrics typically encountered in the development and testing of two main systems health management functions: diagnosis and prognosis. Using this mapping, the gaps between user goals and the metrics associated with diagnostics and prognostics are identified and presented with a collection of lessons learned from previous studies that include both industrial and military aerospace applications.