high availability

created by Platypus
(idea) by Platypus (6.6 y) Sat Nov 13 1999 at 9:15:38
Highly available systems use component redundancy to simulate a very small Mean Time To Repair (MTTR). This often means giving the appearance of a system that does crash, but reboots _really quickly_. This is distinct from a fault tolerant system, which instead tries to simulate a very large Mean Time Between Failures (MTBF) and mask faults entirely.
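To put rough numbers on that distinction, here is a minimal sketch using the standard steady-state availability formula, MTBF / (MTBF + MTTR). The formula is not part of the writeup above, and the hour figures are invented purely for illustration.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up, given average uptime
    between failures (MTBF) and average repair time (MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures, chosen only to make the comparison visible.
baseline = availability(mtbf_hours=1000.0, mttr_hours=10.0)

# Highly available approach: same failure rate, much faster recovery.
fast_repair = availability(mtbf_hours=1000.0, mttr_hours=0.1)

# Fault tolerant approach: same repair time, failures far rarer.
rare_failure = availability(mtbf_hours=100000.0, mttr_hours=10.0)

print(f"baseline:     {baseline:.6f}")
print(f"fast repair:  {fast_repair:.6f}")
print(f"rare failure: {rare_failure:.6f}")
```

Cutting MTTR by a factor of 100 and raising MTBF by a factor of 100 give the same availability figure. The difference is in what users see: many short outages in the first case, very few long ones in the second.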
(idea) by EE (3.8 y) Sun Apr 23 2000 at 21:29:25
High Availability is most often defined as the property of a system that is free of Single Points Of Failure (often shortened to SPOFs). Thus, the failure of any single component will result in a system slowdown, but not in a full loss of service. In practice, if an entire node in a High Availability cluster fails, there will be a brief loss of service (e.g., active TCP connections will be aborted), and a slowdown while the caches are being warmed.
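As a rough illustration of what "no single point of failure" looks like from the client side, here is a minimal failover sketch. Everything in it is hypothetical: `call_with_failover`, `dead_node` and `live_node` are made-up names, and real clusters usually handle this inside a load balancer or cluster manager rather than in the caller.

```python
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised only when every redundant node has failed."""


def call_with_failover(replicas: Sequence[Callable[[str], str]],
                       request: str) -> str:
    """Try each replica in order and return the first successful reply."""
    failures = 0
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError:
            # This node is down; the retry costs time (the "slowdown"),
            # but service continues as long as another node answers.
            failures += 1
    raise AllReplicasFailed(f"all {failures} replicas failed")


def dead_node(request: str) -> str:
    # Hypothetical stand-in for a crashed cluster node.
    raise ConnectionError("node down")


def live_node(request: str) -> str:
    # Hypothetical stand-in for a healthy cluster node.
    return f"handled {request}"


result = call_with_failover([dead_node, live_node], "GET /")
print(result)
```

The request that hit the dead node is lost and has to be retried, which corresponds to the aborted connection described above. Service is lost entirely only if every replica fails at once.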

In older texts, especially marketing material, you will see Fault Tolerant and Highly Available used interchangeably. Nowadays, Fault Tolerant is most commonly used to describe a system where the hardware lets any component fail without any impact on the software.

Note that a highly available system may be more attractive than a fault tolerant system, as the former is usually also resistant to many forms of software fault.
