Accession Number:

AD1024222

Title:

FAIL-SAFE: Fault Aware IntelLigent Software for Exascale

Descriptive Note:

Technical Report, 05 Aug 2013, 04 Apr 2016

Corporate Author:

University of Southern California, Los Angeles, United States

Personal Author(s):

Report Date:

2016-06-13

Pagination or Media Count:

50

Abstract:

The University of Southern California (USC), the Lawrence Livermore National Laboratory (LLNL), and the Jet Propulsion Laboratory (JPL) believe that a new generation of dependable applications must be developed to successfully exploit the next generation of exascale technology. Such applications, and the systems they run on, must be introspective and adaptive, actively searching for errors in their program state with hardware mechanisms and new software techniques. Toward this end, we have developed and demonstrated technology that enables adaptive, application-oriented control of fault tolerance. The demonstration covered a set of scientific applications on a workstation-class system, into which we injected memory faults and observed the survivability of the applications. We have defined an assertion language that gives programmers a convenient interface for specifying the resilience characteristics of applications, and we have implemented a limited set of these assertions as source-to-source transformations in the ROSE compiler infrastructure. The outcomes of this research provide a model for the vendors of Defense systems, and a prototype capability should the vendors choose not to bring such technology to market. The increased application resilience resulting from this research will lead to faster completion of Defense applications, and thus to substantial energy savings as well as increased mission assurance.

Subject Categories:

  • Computer Programming and Software

Distribution Statement:

APPROVED FOR PUBLIC RELEASE