So how can consumer and business users alike predict when their hard drives are about to fail? Well, their first port of call might be to check the manufacturers’ estimates of their storage device lifespans, usually provided in the form of a mean time between failures (MTBF) rating.
However, common as this benchmark is, it’s important to bear in mind that the given reading might not be as transparent and reassuring as it first appears.
What is mean time between failures (MTBF)?
In theory, an MTBF rating is pretty much what it sounds like – the average period of time between one inherent failure and the next in a single component’s lifespan. So, if a machine or part malfunctions and is afterwards repaired, its MTBF figure is the number of hours it can be expected to run as normal before it breaks down again.
With consumer hard drives, it’s not uncommon to see MTBFs of around 300,000 hours. That’s 12,500 days, or a little over 34 years. Meanwhile, enterprise-grade HDDs advertise MTBFs of up to 1.5 million hours, which works out to roughly 171 years. Impressive stuff!
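If you’d like to check that arithmetic yourself, here’s a minimal Python sketch. The function name and the 365.25-day year are our own assumptions for illustration, not anything a manufacturer publishes.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365.25  # assumption: average year length, including leap years


def mtbf_hours_to_years(mtbf_hours: float) -> float:
    """Convert an MTBF rating in hours into years of continuous running."""
    return mtbf_hours / HOURS_PER_DAY / DAYS_PER_YEAR


# The two ratings quoted above.
for label, hours in [("consumer", 300_000), ("enterprise", 1_500_000)]:
    days = hours / HOURS_PER_DAY
    years = mtbf_hours_to_years(hours)
    print(f"{label}: {days:,.0f} days, about {years:.1f} years")
```

Run as written, this should give around 34 years for the consumer rating and around 171 years for the enterprise one.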
It should be plain to see that these figures are misleading, and that they’re a far cry from our real-world expectations of hard drive longevity and reliability. That’s not because there’s a problem with the MTBF metric per se: far from being a marketing buzzword, it has a long and distinguished lineage in military and aerospace engineering. But realistically, no hard drive manufacturer has been testing its enterprise HDDs since the mid-19th century. Instead, the figures are extrapolated from the failure rates seen in statistically significant numbers of drives running for weeks or months at a time, not measured from the devices’ average lifespan in the field.
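To see how a short test can produce such a long figure, here’s a rough Python sketch of the usual back-of-the-envelope calculation: total drive-hours divided by the number of failures. The batch size, test length and failure count are invented purely for illustration, and we’re assuming every drive is counted for the full test period. Real manufacturer test procedures are more involved than this.

```python
def estimate_mtbf(drive_count: int, hours_per_drive: float, failures: int) -> float:
    """Estimate MTBF as accumulated drive-hours divided by observed failures.

    Simplifying assumption: every drive contributes the full test period,
    as if a failed unit were swapped out immediately.
    """
    if failures <= 0:
        raise ValueError("at least one failure is needed to estimate MTBF")
    return drive_count * hours_per_drive / failures


# Hypothetical test: 1,000 drives run for 1,200 hours (50 days), 4 failures.
mtbf = estimate_mtbf(drive_count=1_000, hours_per_drive=1_200, failures=4)
print(f"Estimated MTBF: {mtbf:,.0f} hours")
```

In this made-up example, no single drive ran for more than 50 days, yet the resulting rating is 300,000 hours. That’s a statement about failure rates across a population, not a promise about how long one drive will last.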
Correspondingly, studies have demonstrated that MTBFs typically promise much lower failure rates than actually occur in real-world use. In 2007, researchers at Carnegie Mellon University investigated a sample of 100,000 HDDs with manufacturer-provided MTBF ranges of one million to 1.5 million hours. This translates to an annual failure rate (AFR) of at most 0.88 per cent, but their study found that AFRs in the field “typically exceed one per cent, with two to four per cent common and up to 13 per cent observed in some systems”.
Contemporaneous research from Google resulted in similar findings: from a sample of 100,000 hard drives with a 300,000-hour MTBF (an AFR of 2.92 per cent), the real-world AFR topped 8.6 per cent by the devices’ third year of use.
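The AFR figures quoted from both studies can be reproduced with a simple approximation: the number of hours in a year divided by the MTBF. Here’s a short Python sketch of that conversion. The function name is ours, and it assumes drives running around the clock for a 365-day year.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming 24/7 operation


def afr_percent(mtbf_hours: float) -> float:
    """Approximate annual failure rate (as a percentage) implied by an MTBF."""
    return HOURS_PER_YEAR / mtbf_hours * 100


# MTBF ratings mentioned in the two studies above.
for mtbf in (300_000, 1_000_000, 1_500_000):
    print(f"MTBF {mtbf:>9,} h -> AFR {afr_percent(mtbf):.2f}%")

# How far the observed third-year AFR sits above the rated one.
observed_afr = 8.6  # per cent, from the Google figure quoted above
print(f"Observed vs rated: {observed_afr / afr_percent(300_000):.1f}x")
```

Note that a higher MTBF implies a lower AFR, so for drives rated between one million and 1.5 million hours, 0.88 per cent is the worst case the ratings allow for.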
Reassuringly, manufacturers aren’t turning a blind eye to this discrepancy. Western Digital, for example, has phased out using the metric for its HDDs in recent years.
So with MTBFs proven to be an unreliable indicator of hard drive health, how else can we predict the end of a storage device’s lifespan? What’s your experience? Let us know in the comments below.