blog dds: 2007-03-09 — Software Rejuvenation is Counterproductive

In the February issue of Computer magazine, Grottke and Trivedi propose four strategies for fighting bugs that are difficult to detect and reproduce. Retrying an operation and replicating software are indeed time-honored and practical solutions. When coupled with appropriate logging, they may allow an application to continue functioning while also alerting its maintainers that something is amiss. On the other hand, the proposal to restart applications at regular intervals (rejuvenation, as the authors call it) doesn't allow us to find latent bugs; instead, it sweeps them under the carpet. This lowers the bar on the quality we expect from software, and will doubtless result in a higher density of bugs and increasingly complicated failure modes.
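To make the contrast concrete, here is a minimal sketch of retrying coupled with logging, as opposed to a silent periodic restart. The helper name `retry_with_logging` and its parameters are my own, chosen for illustration; they do not come from the Grottke and Trivedi article.

```python
import logging
import time

logger = logging.getLogger("retry")


def retry_with_logging(operation, attempts=3, delay=0.0):
    """Call operation(); on failure, log the error and try again.

    Hypothetical helper for illustration. The application keeps running
    if a later attempt succeeds, but every failure leaves a trace that
    maintainers can investigate.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            # logger.exception records the message together with the traceback.
            logger.exception("attempt %d of %d failed", attempt, attempts)
            if attempt == attempts:
                # Out of attempts: surface the failure instead of hiding it.
                raise
            time.sleep(delay)
```

The point of the sketch is that the failure is recorded each time it happens, so the underlying bug stays visible even when the retry masks its effect on users.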

As an example, consider how complaints and jokes about the Windows blue screen of death pressured Microsoft to hunt down bugs and improve the quality of Windows to the extent that nowadays it seldom crashes. If early versions of Windows had included a nightly reboot as a "feature", many of these bugs would have survived undetected for years. As the number of bugs and users grew, along with the complexity of software and hardware, these bugs would eventually have surfaced, only now in situations that would be a lot more difficult to analyze.
