With parity RAID levels, there are the following factors to consider:
1) the number of drives in the array,
2) the size of each drive, which in turn determines
3) the size of the rebuild window, and therefore the length of the period during which the array is vulnerable to complete failure.
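To put a rough number on that rebuild window, here is a back-of-envelope sketch. The 100MB/s sustained rebuild rate is an illustrative assumption, not a figure from any particular drive; real rebuild rates vary with controller load and array activity.

```python
# Rough rebuild-window estimate: the time to sequentially rewrite one drive.
# The 100 MB/s rate below is an assumed illustrative figure, not a spec.

def rebuild_hours(drive_bytes: float, rate_bytes_per_s: float = 100e6) -> float:
    """Hours to rebuild one drive at a sustained sequential rate."""
    return drive_bytes / rate_bytes_per_s / 3600

for tb in (0.5, 1, 2, 3):
    print(f"{tb} TB drive: ~{rebuild_hours(tb * 1e12):.1f} h minimum rebuild window")
```

The window grows linearly with drive size, which is exactly why the same drive count can be fine at 250GB and uncomfortable at 2TB.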
In the case of RAID 5, for example, many people say they won't go beyond six or seven drives before they start thinking about RAID 6, or at least RAID 5 plus a hot spare. But what size drives are they talking about? 100GB? 250GB? 500GB?
What happens when you move to 1TB drives, or 2TB drives? The rebuild times are far longer, so while the chance of any single drive failing is probably the same regardless of its capacity, the opportunity for another drive to fail during the (now lengthened) rebuild window is far greater.
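The second-failure risk can be sketched with a simple model: treat drive failures as independent with a constant annual failure rate (AFR). Both the 3% AFR and the rebuild durations below are assumptions for illustration only.

```python
# Chance that at least one surviving drive fails during the rebuild window,
# modelling failures as independent with a constant hazard rate derived from
# an assumed 3% annual failure rate (AFR).
import math

def p_second_failure(surviving_drives: int, rebuild_hours: float,
                     afr: float = 0.03) -> float:
    rate_per_hour = -math.log(1 - afr) / (365 * 24)  # per-drive hazard rate
    return -math.expm1(-surviving_drives * rate_per_hour * rebuild_hours)

# Longer rebuilds widen the vulnerability window proportionally:
for hours in (3, 10, 30):
    print(f"{hours:>3} h rebuild, 6 survivors: {p_second_failure(6, hours):.3%}")
```

The probability scales roughly linearly with rebuild length, so a 3TB drive carries around ten times the second-failure exposure of a 300GB drive in the same slot.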
Additionally, while increased areal density results in faster sequential transfers (and therefore shortens the rebuild somewhat), a rebuild still has to read every sector of every surviving drive. With URE rates roughly constant per bit read, larger drives mean far more bits read per rebuild, so the chance of hitting at least one URE along the way is noticeably higher.
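This is easy to quantify. The calculation below assumes the 1-per-10^14-bits URE rate commonly quoted on consumer drive spec sheets; enterprise drives typically claim an order of magnitude better.

```python
# Probability of hitting at least one unrecoverable read error (URE) while
# reading every surviving drive during a RAID 5 rebuild. The 1e-14 URE rate
# per bit read is an assumed consumer-drive spec-sheet figure.
import math

def p_ure(surviving_drives: int, drive_tb: float,
          ure_per_bit: float = 1e-14) -> float:
    bits_read = surviving_drives * drive_tb * 1e12 * 8
    # 1 - (1 - ure_per_bit) ** bits_read, computed numerically safely
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

# A 7-drive RAID 5 (6 survivors after one failure) gets riskier as drives grow:
for tb in (0.5, 1, 2, 3):
    print(f"{tb} TB drives: {p_ure(6, tb):.0%} chance of a URE during rebuild")
```

Under these assumptions, a rebuild of a 7-drive RAID 5 built from 2TB consumer drives has better-than-even odds of encountering a URE.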
OK, fine, so we use RAID 6. Again, the focus here is on being able to use arrays with larger numbers of drives, because the chance of three drives failing (electrically or mechanically) during operation is far smaller. But what about UREs? We're now seeing 3TB drives on the market, 4TB will be here by the end of 2011, and we may see 6TB by the end of 2014.
I think things like triple parity (RAID-Z3, for example) will help, but more importantly, the kind of data checksumming that ZFS uses will prove far more valuable, and other filesystems will have to employ similar mechanisms if they are to remain useful to those of us with huge storage requirements.
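The core of that checksumming idea can be sketched in a few lines. This is a deliberate simplification: real ZFS stores each block's checksum in the parent block pointer rather than alongside the data, and supports several checksum algorithms (fletcher4, sha256, and others); the sketch below just shows why verify-on-read catches silent corruption that plain RAID cannot.

```python
# Minimal sketch of per-block data checksumming: store a checksum with each
# block on write and verify it on every read, so silent corruption is
# detected (and, with redundancy, repairable) rather than returned as-is.
import hashlib

def write_block(data: bytes) -> tuple[bytes, bytes]:
    """Return the block plus its checksum, as they would be stored."""
    return data, hashlib.sha256(data).digest()

def read_block(data: bytes, checksum: bytes) -> bytes:
    """Verify the stored checksum before handing data back."""
    if hashlib.sha256(data).digest() != checksum:
        raise IOError("checksum mismatch: silent corruption detected")
    return data

block, csum = write_block(b"important data")
assert read_block(block, csum) == b"important data"

corrupted = b"importent data"  # a single flipped byte, as from a bad sector
try:
    read_block(corrupted, csum)
except IOError as e:
    print(e)  # the corruption is caught instead of being returned to the app
```

A parity RAID layer alone would happily return that corrupted block, since parity is only consulted when the drive itself reports an error.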
Of course, in the meantime, regular RAID scrubbing will help to mitigate some of these corruption issues.