The software supportability and reliability site

Software Reliability

 

What is Software Reliability?


The topic of software reliability is concerned with all life cycle activities that prevent, detect, remove, and/or mitigate software faults, and that verify/validate the degree to which the operating software will not cause system failures. Software reliability is (quantitatively) defined as the probability of failure-free operation of a software program for a specified time under specified operating conditions. However, a "mean time to failure (MTTF)" with "margins" and "confidence limits", even with the appropriate accompanying data evidence, is not generally sufficient to convince customers, regulatory authorities, or even the system/software suppliers that the software satisfies its requirements. Thus, software reliability is also (qualitatively) demonstrated by the process followed to develop the software. The dimensions of a good development process are: use of best-practice engineering methods, implementation of fault-tolerant design, and application of specialized and procedural methods to ensure mistake-proof loading and/or operation. These tools and techniques provide evidence that increases confidence that the software will not cause a system failure. These reliability assurance techniques can be extended, e.g., with Fault Tree Analysis or Fault Seeding, to ensure that safety- and/or security-critical requirements are met.
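
The quantitative definition above can be made concrete with the simplest common model. This is a minimal sketch, assuming a constant failure rate (the exponential model); the function names and the example rate are hypothetical. Under that assumption the probability of failure-free operation for t hours is exp(-λt), and the mean time to failure (MTTF) is 1/λ.

```python
import math


def reliability(t_hours: float, failure_rate_per_hour: float) -> float:
    """Probability of failure-free operation for t_hours (constant failure rate assumed)."""
    return math.exp(-failure_rate_per_hour * t_hours)


def mttf(failure_rate_per_hour: float) -> float:
    """Mean time to failure under the same constant-rate (exponential) assumption."""
    return 1.0 / failure_rate_per_hour


if __name__ == "__main__":
    lam = 1e-3  # hypothetical: one failure per 1,000 operating hours
    print(f"MTTF       = {mttf(lam):.0f} h")
    print(f"R(100 h)   = {reliability(100, lam):.4f}")
    print(f"R(MTTF)    = {reliability(mttf(lam), lam):.4f}")
```

Under this model the chance of running failure-free all the way to the MTTF is only exp(-1), about 37%, which is one reason a bare MTTF figure says little on its own about whether the software meets a mission requirement.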

What does Software Reliability Encompass?

Software, today, is typically the major concern for reliability assurance in most important system applications. Because the software component typically manages the overall system functions, faults in the software may cause critical system outages. Such system failures due to software faults are classified as "software failures". Faults can cause the system to hang, crash, or not perform a task that the customer wishes it to perform. Thus, it is important to use methods and techniques that provide evidence that the software component has been designed, implemented, tested, installed, and, as necessary, updated without faults that might result in undesirable system failures.

There are similarities between hardware and software failures, and also differences. Software failures are primarily the result of design defects (introduced during development or maintenance). Other failure sources include use-induced degradation, as well as inadequate operational procedures and logistics operations documentation, which is considered part of the "software data package". Hardware failures are primarily the result of physical wear-out. Other failure sources include design defects, manufacturing quality deficiencies, and maintenance or operating errors. Some system failures result from a combination of hardware and software faults. It is generally easier to implement changes to software than to hardware, although any component change must be part of a system support concept that includes continued reliability analysis. Hardware is generally repaired to its original state unless there is a reason to modify it. Software can frequently be returned to its original state by re-initializing it, which cleans up the software environment (e.g., emptying queues and reclaiming memory lost to leaks). The software structure can also be changed to correct, enhance, or adapt the code, producing a new version, that is, a new product.
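
The contrast between re-initializing software and changing its structure can be illustrated with a toy component. This is a hypothetical sketch (the class, its methods, and the simulated leak are all illustrative assumptions): the `history` dictionary grows without bound to stand in for a memory leak, and `reinitialize()` returns the component to its start-up state, much as a process restart would.

```python
from collections import deque


class MessageProcessor:
    """Hypothetical component whose internal state degrades with use (illustration only)."""

    def __init__(self):
        self.reinitialize()

    def reinitialize(self):
        """Return the component to the state it had right after start-up."""
        self.queue = deque()  # pending work items
        self.history = {}     # grows without bound: a stand-in for a memory leak
        self.handled = 0

    def submit(self, msg):
        self.queue.append(msg)

    def process_one(self):
        msg = self.queue.popleft()
        # The fault: every handled message is retained and never evicted.
        self.history[self.handled] = msg
        self.handled += 1


if __name__ == "__main__":
    p = MessageProcessor()
    for i in range(5):
        p.submit(f"msg-{i}")
    for _ in range(3):
        p.process_one()
    print(len(p.queue), len(p.history))  # 2 3
    p.reinitialize()
    print(len(p.queue), len(p.history))  # 0 0
```

Re-initializing clears the accumulated state (and, in this sketch, also discards the two pending messages), but the defect in `process_one` remains, so the leak starts again on the next run. Removing it requires changing the code, which yields a new version of the product.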

Both hardware and software must be managed as an integrated system. The reliability of the system will depend on the reliability of the hardware and software as an integrated whole. Some techniques to manage the system reliability will be similarly applied to hardware and software components whereas other techniques will be unique to hardware or to software. In addition, the application of a given technique may be different for software than for hardware.
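
One simple way to see how hardware and software reliability combine is the series model, in which the system fails if either part fails. This is a minimal sketch, assuming statistically independent failures and reliabilities quoted for the same mission time; the function name and figures are hypothetical.

```python
import math


def series_reliability(*component_reliabilities):
    """Reliability of a system that fails if ANY component fails.

    Assumes component failures are statistically independent.
    """
    for r in component_reliabilities:
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"reliability must be in [0, 1], got {r}")
    return math.prod(component_reliabilities)


if __name__ == "__main__":
    # Hypothetical mission reliabilities for the same mission time.
    r_hardware = 0.99
    r_software = 0.95
    print(f"{series_reliability(r_hardware, r_software):.4f}")  # 0.9405
```

The integrated system is less reliable than either part alone. Failures caused by a combination of hardware and software faults violate the independence assumption, so the series product is only a first approximation.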

There are no existing methods that guarantee delivered software has no faults. That is, there is always some likelihood that, under certain environmental conditions and system operational use, faults in the software will be encountered that result in failures of the system. In short, software reliability is never exactly "1.0". There are, however, methods and techniques whose use correlates with delivering software that has fewer faults and failures. It is desirable to provide sufficient quantitative and qualitative evidence that appropriate development and support activities have been properly conducted to prevent, detect, remove, and/or mitigate possible software faults, particularly those faults that might result in critical system failures.
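
Testing alone cannot show that reliability is 1.0. A minimal sketch, assuming a constant failure rate: after T hours of operational testing with zero failures, the upper confidence bound on the failure rate at confidence C is -ln(1 - C)/T, which is positive for every finite T. Turned around, it gives the failure-free test time needed to support a target rate. The function names are hypothetical.

```python
import math


def _check_confidence(confidence):
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")


def failure_free_hours_needed(target_rate_per_hour, confidence):
    """Failure-free test hours needed to claim rate <= target at the given confidence."""
    _check_confidence(confidence)
    return -math.log(1.0 - confidence) / target_rate_per_hour


def rate_upper_bound(test_hours, confidence):
    """Upper confidence bound on the failure rate after test_hours with zero failures."""
    _check_confidence(confidence)
    return -math.log(1.0 - confidence) / test_hours


if __name__ == "__main__":
    for target in (1e-4, 1e-6, 1e-9):
        hours = failure_free_hours_needed(target, confidence=0.95)
        print(f"rate <= {target:g}/h at 95%: {hours:,.0f} failure-free hours")
```

At 95% confidence this is roughly 30,000 failure-free hours for a target of 1e-4 failures per hour, but about 3 billion hours for 1e-9 per hour. This is why, for very high reliability targets, test evidence has to be supplemented by the process and design evidence described above.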

How might faults be prevented, detected, removed, and/or mitigated in the software development and/or support activities? What techniques might be used to provide quantitative or qualitative evidence that faults capable of causing a system failure do not exist in the software component? Given limited resources and time, which combination of techniques provides the "optimum" cost/benefit results - and what measures are appropriate and "accurate enough"? How are decisions made to select such techniques and how is the evidence from the use of such techniques collected and presented? It is these concerns for which research and practical experience provide some guidelines - both for management of a software reliability program and conduct of life cycle activities using appropriate software engineering and reliability-specific techniques. It is from these types of guidelines, typically constrained to specific application domains and operational scenarios, that software reliability standards can be established.

When high levels of reliability need to be assured, it will be necessary to use several sources of evidence to support reliability claims. Combining such disparate evidence to aid decision making is itself a difficult task and a topic of current research. Four areas of evidence are important, each with its own benefits and limitations:

  1. evidence from software components and structure;
  2. evidence from static analysis of the software product;
  3. evidence from testing of software under operational conditions; and
  4. evidence of process quality.
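
For the third area, one common way to make testing reflect operational conditions is to select test cases in proportion to an operational profile, that is, the expected share of each operation in field use. This is a hypothetical sketch: the profile, operation names, and function name are illustrative assumptions, not drawn from any particular system.

```python
import random
from collections import Counter

# Hypothetical operational profile: operation name -> expected share of field usage.
OPERATIONAL_PROFILE = {
    "query_balance": 0.60,
    "transfer_funds": 0.30,
    "update_profile": 0.09,
    "close_account": 0.01,
}


def draw_test_operations(profile, n, seed=None):
    """Pick n operations to test, each chosen with its operational-profile probability."""
    rng = random.Random(seed)  # seeded so a test campaign can be reproduced
    operations = list(profile)
    weights = [profile[op] for op in operations]
    return rng.choices(operations, weights=weights, k=n)


if __name__ == "__main__":
    sample = draw_test_operations(OPERATIONAL_PROFILE, 10_000, seed=1)
    for op, count in Counter(sample).most_common():
        print(f"{op:15s} {count / len(sample):.3f}")
```

Because test usage mirrors expected field usage, failures observed during such tests can feed a reliability estimate. Rare but critical operations (here `close_account`) receive few test cases, so they are often tested separately as well.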

Among the challenges in providing software reliability assurance are cultural issues, in addition to hard technical research questions that remain to be investigated.


Reliability Confidence Limits
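
Because an MTTF estimated from limited test data is uncertain, it is usually reported with a confidence limit. A minimal sketch, assuming a constant failure rate and a time-terminated test with T total operating hours and r observed failures: the one-sided lower bound on MTTF at confidence C is T/m, where m is the expected failure count at which observing r or fewer failures has probability 1 - C. This is equivalent to the chi-square form 2T / chi-square(C; 2r + 2). The function names are hypothetical, and the sketch is intended for modest failure counts (exp(-m) underflows once m exceeds about 745).

```python
import math


def poisson_cdf(r, mean):
    """P(N <= r) for a Poisson random variable with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, r + 1):
        term *= mean / k
        total += term
    return total


def mttf_lower_bound(total_hours, failures, confidence):
    """One-sided lower confidence bound on MTTF (time-terminated test, constant rate)."""
    if not 0.0 < confidence < 1.0 or failures < 0:
        raise ValueError("need 0 < confidence < 1 and failures >= 0")
    target = 1.0 - confidence
    lo, hi = 0.0, 1.0
    # Widen the bracket until the Poisson CDF drops to the target probability.
    while poisson_cdf(failures, hi) > target:
        hi *= 2.0
    # Bisect on m = (upper failure rate) * total_hours; the CDF decreases as m grows.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(failures, mid) > target:
            lo = mid
        else:
            hi = mid
    return total_hours / hi


if __name__ == "__main__":
    hours, failures = 1000.0, 2  # hypothetical test record
    print(f"point estimate : {hours / failures:.0f} h")
    print(f"90% lower bound: {mttf_lower_bound(hours, failures, 0.90):.0f} h")
```

For this hypothetical record the point estimate is 500 hours, while the 90% lower bound is only about 188 hours. With zero failures the bound reduces to T / (-ln(1 - C)).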


A comprehensive list of software reliability references can be found here. A list of software reliability standards can be found here.

Last updated:
17th January 2004

Hosted by:
Exobits