Transient Error Reliability Models Based on Data Analysis.
CARNEGIE-MELLON UNIV PITTSBURGH PA DEPT OF COMPUTER SCIENCE
Experimental data on transient errors from several digital computer systems are presented and analyzed. This is the first large-scale public study of the statistical distribution of transient errors. Data were collected from the DEC PDP-10 series computers, the Cm* multiprocessor, and the C.vmp fault-tolerant microprocessor. Statistical tests indicate that transient errors follow a decreasing hazard rate distribution. This is at variance with the standard assumption of a constant hazard rate (the exponential distribution) used in reliability modeling, and it means that more complex models are needed for accurate results. Models of common fault-tolerant redundant structures are developed using the Weibull distribution, which has a time-varying hazard rate. Both analytical and simulation models are used to compare the reliabilities predicted by Weibull-based transient error models with those predicted by exponential-based models. The analysis indicates a significant difference between the two. For Weibull parameters equivalent to measured system behavior, the model results show reliability differences ranging from -0.10 to 0.20 and Mission Time Improvement factors greater than 2.0. System designers should be aware of these differences. (Author)
- Statistics and Probability
- Computer Hardware
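
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the report's model: it assumes the two-parameter Weibull form R(t) = exp(-(lambda*t)^alpha), uses triple modular redundancy (TMR) with a perfect voter as a stand-in for a "common fault-tolerant redundant structure", and takes Mission Time Improvement to be the ratio of the times at which the redundant and the single-module systems fall to a target reliability. The values of `LAM`, `ALPHA`, and `TARGET` are hypothetical and are not taken from the measured data.

```python
import math

# Hypothetical parameters for illustration only (not from the report's data).
LAM = 0.001    # scale parameter, per hour
ALPHA = 0.5    # shape parameter; alpha < 1 gives a decreasing hazard rate
TARGET = 0.9   # reliability level used to define mission time


def weibull_reliability(t, lam, alpha):
    """R(t) = exp(-(lam*t)**alpha); alpha = 1 reduces to the exponential case."""
    return math.exp(-((lam * t) ** alpha))


def weibull_hazard(t, lam, alpha):
    """z(t) = alpha*lam*(lam*t)**(alpha-1), valid for t > 0."""
    return alpha * lam * (lam * t) ** (alpha - 1)


def tmr_reliability(r):
    """TMR with a perfect voter: the system works if at least 2 of 3 modules work."""
    return 3 * r**2 - 2 * r**3


def mission_time(system_rel, target, hi=1e9):
    """Largest t in [0, hi] with system_rel(t) >= target.

    Uses bisection, so system_rel must be decreasing in t.
    """
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if system_rel(mid) >= target:
            lo = mid
        else:
            hi = mid
    return lo


def mission_time_improvement(lam, alpha, target):
    """Ratio of TMR mission time to single-module mission time."""
    simplex = mission_time(lambda t: weibull_reliability(t, lam, alpha), target)
    tmr = mission_time(
        lambda t: tmr_reliability(weibull_reliability(t, lam, alpha)), target
    )
    return tmr / simplex


if __name__ == "__main__":
    mti_exp = mission_time_improvement(LAM, 1.0, TARGET)
    mti_wei = mission_time_improvement(LAM, ALPHA, TARGET)
    print(f"MTI, exponential (alpha=1):  {mti_exp:.3f}")
    print(f"MTI, Weibull (alpha={ALPHA}): {mti_wei:.3f}")
    for t in (10.0, 100.0, 1000.0):
        diff = tmr_reliability(weibull_reliability(t, LAM, ALPHA)) - tmr_reliability(
            weibull_reliability(t, LAM, 1.0)
        )
        print(f"t={t:7.1f} h  TMR reliability difference (Weibull - exponential): {diff:+.3f}")
```

Under these assumptions the two models share the same scale parameter, which is only one possible way to make them comparable; matching the mean time to error instead would give different numbers. The sketch is meant only to show why a decreasing hazard rate changes both the predicted reliability and the predicted Mission Time Improvement.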