When RAID was developed about 20 years ago at the University of California, Berkeley, storing large amounts of data was still a tricky task. The 14-inch SLEDs (Single Large Expensive Disks) common in computing centers offered large capacities (two to three GByte) and good data security, but they were also extremely expensive. As a much cheaper alternative for storing massive amounts of data, the relatively new drives with a 5.25-inch form factor emerged. But despite their price advantage, their use created some other problems.
First, saving data on many small drives instead of one large harddrive threatened to create administrative problems. Organizing the drives with the JBOD approach ("just a bunch of disks"), in which users simply bundle independent drives, made finding free storage capacity and storing files fairly complicated. Second, the reliability of the "Mini-HDDs" was inherently lower than that of the SLEDs. Moreover, spreading data across several drives instead of a single one increases the statistical probability of data loss: the more drives involved, the higher the chance that at least one of them fails.
But the Berkeley researchers David Patterson, Garth Gibson and Randy Katz developed a solution to all these problems: they proposed combining several smaller drives into one array that was failsafe and equipped with error detection and error correction mechanisms. In a paper released in June 1988, the three researchers gave their new technology a fitting name. Marking the birth of RAID, the paper was titled "A Case for Redundant Arrays of Inexpensive Disks (RAID)".
The Advantages of RAID
In their original paper, Patterson, Gibson and Katz proposed a total of five different methods for grouping individual drives together into one array. They numbered them RAID levels 1 to 5 - a terminology that still stands to this day. But this somewhat unfortunate terminology frequently leads to confusion: even though the term "level" is used in connection with this technology, RAID is not a procedure that is built up in steps. Instead, the levels designate procedures that are completely independent of one another.
In addition to the primary goal - a cost-efficient technology that allows for high-capacity storage and offers sufficient protection against outages - the proposed methods have other advantages. First, the RAID array presents itself to the user as one single logical volume, so administration is just as simple as with a single drive. In addition, many RAID levels offer speed advantages over individual drives, because they access several harddrives simultaneously. This advantage can only materialize fully, however, when enough channels are available to address the drives without blocking.
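To illustrate where the simultaneous access comes from, here is a minimal sketch of the block layout used by striping (as in RAID 0). The disk count and the round-robin mapping function are illustrative assumptions, not the behavior of any particular controller:

```python
# Sketch: striping maps consecutive logical blocks round-robin across
# the member drives, so a large sequential transfer touches all drives
# at once. NUM_DISKS is an illustrative assumption.

NUM_DISKS = 4

def stripe_location(logical_block: int, num_disks: int = NUM_DISKS):
    """Return (disk index, block offset on that disk) for a logical block."""
    return logical_block % num_disks, logical_block // num_disks

# Eight consecutive logical blocks land on four different drives,
# two per drive - the drives can serve them in parallel.
layout = [stripe_location(b) for b in range(8)]
print(layout)
```

Reading eight consecutive blocks thus keeps all four drives busy at once, which is exactly why enough independent channels are needed for the speed advantage to materialize.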
The combination of these advantages not only led to the rapid spread of RAID-based storage devices, especially in servers: in addition, RAID variants optimized for particular applications and enhanced with specific functions were developed. Today, the range of available levels spans from RAID 0 to RAID 7, in addition to combined levels such as RAID 0+1 or RAID 50.
Software RAID vs. Hardware RAID
The somewhat confusing terms "hardware RAID" and "software RAID" - after all, both versions require software in order to operate - refer to the type of implementation.
In software RAID, software running on the host's CPU controls the harddrive array. Many operating systems already include such components. Windows NT, for example, can handle RAID 0 as well as RAID 1 and 5, though only the server version supports the latter two. Linux manages arrays of levels 0, 1, 4 and 5. Software RAID is typically the most cost-efficient and simplest solution, and it can be adapted relatively quickly to increased requirements, for example through a processor upgrade in the host. On the other hand, it puts a substantial load on the CPU and is tied to a specific platform and operating system. In addition, there are typically just one or two channels for accessing the drives, which limits the possible parallelization of harddrive accesses and, consequently, the performance.
In hardware RAID, by contrast, a dedicated controller is responsible for accessing the arrays. That relieves the host CPU and leads to greater performance. In addition, RAID controllers connect the drives via several channels, which allows simultaneous access to the drives and, therefore, higher transfer rates. On the downside, this also means higher costs. Hardware RAID operates independently of the platform; however, the controllers still require administration software, and this software has to be tailored to a specific operating system.
Until a few years ago, RAID controllers granted little to no flexibility regarding harddrive requirements. SCSI as an interface was a must, and all harddrives used had to have identical capacities. In many cases, only harddrives from the same build series could be used in the array.
In the meantime, however, the requirements on the harddrives used have become much less rigid. For performance reasons, SCSI or its designated successor, Fibre Channel, is employed in the server segment. For desktop PCs, however, the industry offers controllers with Ultra-ATA/66 and Ultra-ATA/100 interfaces.
In addition, modern controllers and software RAID solutions support mixed arrays consisting of harddrives with different capacities. The flipside: it is not possible to use the entire net capacity for the array, because RAID procedures presume that all harddrives are equal in size. In a mixed configuration, each drive is therefore used only up to the capacity of the smallest installed harddrive. A specific example: combining a 20-GByte drive with two 30-GByte disks leaves a capacity of just three times 20 GByte available for the array.
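The capacity rule above can be sketched as a small calculation; the helper function name is our own, and the figure it returns is the raw array capacity before any redundancy overhead of the chosen RAID level is subtracted:

```python
# Sketch of the capacity rule for mixed arrays: every member drive
# contributes only as much as the smallest drive in the set.
def usable_array_capacity(drive_sizes_gb):
    """Raw capacity (in GByte) the array can use, before any
    redundancy overhead of the chosen RAID level is subtracted."""
    return min(drive_sizes_gb) * len(drive_sizes_gb)

# The example from the text: one 20-GByte drive plus two 30-GByte drives.
print(usable_array_capacity([20, 30, 30]))  # -> 60, i.e. three times 20 GByte
```

With identical drives, by contrast, nothing is wasted: two 40-GByte drives yield the full 80 GByte of raw capacity.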