Steadfast Blog

(Almost) Everything You Need to Know About RAID

Posted in Servers / Server Hardware on March 25th, 2010

All of our dedicated servers now come with two drives, and we recommend hardware RAID 1 (mirroring). Many customers have asked us what RAID is and how it affects them, so I figured I'd put this blog post together and share what information I can. If you still have further questions, don't hesitate to contact us.

What is RAID?

RAID stands for Redundant Array of Inexpensive Disks. That means that RAID is a way of logically putting multiple disks together into a single array. The idea then is that these disks working together will have the speed and/or reliability of a more expensive disk. Now, the exact speed and reliability you'll achieve from RAID depends on the type of RAID you're using.

Hard drives have limited speed due to physical constraints and, because they are mechanical, a relatively high failure rate. RAID is meant to help alleviate both of these issues, depending on the RAID type you use. Typically, a hard drive has a 5% chance of failure in the first year of operation. Multiple reports support this figure, and drive type and brand do not have much effect on it (though newer SSDs are significantly more reliable).

What are the types of RAID?

RAID 0 (Striping) - RAID 0 takes two or more disks and stripes data across all of them. This greatly increases speed, as you're reading from and writing to multiple disks at a time. An individual file can then use the speed and capacity of all the drives in the array. The downside to RAID 0 is that it is NOT redundant: the loss of any individual disk will cause complete data loss. I would not recommend ever using RAID 0 in a server environment. You can use it for cache or other purposes where speed is important and reliability/data loss does not matter at all, but it should not be used for anything other than that. As an example, with the 5% annual failure rate of drives, a 6 disk RAID 0 array increases your risk of data loss to about 26.5% per year (1 - 0.95^6), assuming the drives fail independently.
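The arithmetic behind that RAID 0 figure can be sketched in a few lines of Python. This is only an illustration: the function name is made up for this post, and it assumes each drive fails independently at the 5% annual rate quoted above, which real drives in the same chassis may not do.

```python
def raid0_loss_probability(drives: int, annual_failure_rate: float = 0.05) -> float:
    """Chance that at least one drive fails within a year.

    In RAID 0 a single drive failure loses the whole array, so this is
    also the chance of losing the array. Assumes independent failures.
    """
    chance_all_survive = (1.0 - annual_failure_rate) ** drives
    return 1.0 - chance_all_survive


# Print the yearly risk for a few array sizes.
for n in (1, 2, 4, 6):
    print(f"{n} drive(s): {raid0_loss_probability(n):.1%} chance of data loss per year")
```

Every drive you add to a RAID 0 array multiplies the chance that all of them survive by another 0.95, which is why the risk climbs so quickly.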

RAID 1 (Mirroring) - RAID 1 is generally used with a pair of disks, though it can be done with more, and it keeps an identical copy of the data on every drive in the array. The point of RAID 1 is primarily redundancy: you can completely lose a drive and still stay up and running off the remaining drive(s). You can then rebuild the array onto a new drive from the surviving drive with little to no downtime. RAID 1 also gives you the additional benefit of increased read performance, as data can be read off any of the drives in the array. The downsides are that you will have slightly higher write latency, since the data needs to be written to all the drives in the array, and you'll only have the available capacity of a single drive.

RAID 5/6 (Striping + Distributed Parity) - RAID 5 requires at least 3 drives (RAID 6 requires at least 4). It takes the idea of RAID 0, striping the data across multiple drives to increase performance, and adds redundancy by distributing parity information across the disks. I will not go into a complex discussion of how this works, but with RAID 5 you can lose one disk, and with RAID 6 you can lose two disks, and still maintain your operations and data. RAID 5 and 6 will get you significantly improved read performance, but write performance is largely dependent on the RAID controller used, due to the need to calculate the parity data and write it across the disks. RAID 5 and RAID 6 are often a very good option for a standard web server, where most of the transactions are reads, and they get you good value for your money, as you only need to purchase one additional drive for RAID 5 (or two additional drives for RAID 6). I would not recommend using RAID 5 or RAID 6 for a heavy write environment, such as a database server, as you'll likely hurt your overall performance. In addition, losing a drive in RAID 5 will cause significantly worse performance, as the missing data has to be recalculated from the parity information, and rebuild times are also significantly longer.
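For the curious, the core idea behind RAID 5 parity fits in a short Python sketch. This is a simplified illustration, not how any particular controller is implemented: it uses a single XOR parity block per stripe, the helper name and the sample data are made up, and it ignores how parity is rotated across the drives. RAID 6 adds a second, independent parity block, which this sketch does not cover.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over three data drives (sample data, made up).
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data)

# Simulate losing the second drive: XOR the survivors with the parity
# block to recalculate the missing data.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

This is also why a degraded RAID 5 array is slow: every read that touches the missing drive has to pull the matching blocks from all the other drives and recompute the result.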

RAID 10 (Mirroring + Striping) - RAID 10 requires at least 4 drives and is a combination of RAID 1 (mirroring) and RAID 0 (striping), getting you both increased speed and redundancy. This is often the recommended RAID level if you're looking for speed and still require redundancy. In a four drive configuration, two mirrored drives hold half of the striped data and another two mirror the other half of the data. This means you can lose any single drive, and possibly even a second drive, as long as it is not the mirror partner of the first, without losing any data. Just like RAID 1, you'll only have the capacity of half the drives, but you will see improved read and write performance and also have the fast rebuild time of RAID 1.
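To put a number on "possibly even a second drive", here is a small Python sketch that lists every way two drives can fail in a four drive RAID 10 array and checks which ones the array survives. The drive numbering and function name are made up for illustration, and it assumes the layout described above: two mirrored pairs with data striped across them.

```python
from itertools import combinations


def array_survives(failed_drives, mirror_pairs):
    """RAID 10 keeps its data as long as no mirror pair has lost both drives."""
    failed = set(failed_drives)
    return all(not set(pair) <= failed for pair in mirror_pairs)


# Four drives: drives 0 and 1 mirror each other, as do drives 2 and 3.
mirror_pairs = [(0, 1), (2, 3)]
drives = [drive for pair in mirror_pairs for drive in pair]

two_drive_failures = list(combinations(drives, 2))
survivable = [f for f in two_drive_failures if array_survives(f, mirror_pairs)]

print(f"{len(survivable)} of {len(two_drive_failures)} two-drive failures are survivable")
```

Four of the six possible two-drive failures leave the data intact. Put another way, once one drive has failed, two of the three remaining drives can still fail without data loss.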

When should I use RAID?

RAID is extremely useful if reliability and data redundancy are important to you. Even if you take backups, you will need to take the time to restore those backups and those backups could be hours or days old, resulting in data loss. RAID allows you to survive a drive loss without data loss and in many cases without any downtime.

RAID is also useful if you are having disk IO issues, where applications are waiting on the disk to perform tasks. Going with RAID will provide you additional throughput by allowing you to read and write data from multiple drives instead of a single drive. Additionally, if you go with hardware RAID, the hardware RAID card will include additional memory to be used as cache, reducing the strain put on the physical drives and increasing overall performance.

What type of RAID should I use?

No RAID - When you can survive several hours of downtime and/or data loss due to needing to restore your site from backups.

RAID 0 - Never, unless the data has no value to you.

RAID 1 - If you are looking to inexpensively gain additional data redundancy and/or read speeds. A good base RAID level for those looking to achieve high uptime.

RAID 5/6 - Web servers and high read environments. These levels generally perform worse than RAID 1 on writes, so if your environment is write heavy or you don't need more space than a single disk provides, RAID 1 is likely your most effective option.

RAID 10 - A good all around solution. It will also cost more than all the other options. It offers you additional read and write speed as well as a good level of overall redundancy.

Should I choose hardware RAID or Linux software RAID?

Linux software RAID is the cheaper option, as it does not require a separate hardware RAID card, but it does have some drawbacks. With software RAID, you do not get the additional benefit of the cache or dedicated processor on a hardware RAID card; the RAID work instead takes resources from the system's own processor. With simpler forms of RAID, such as RAID 0 and RAID 1, this is often not an issue, as those calculations are simple, but with RAID 5 and RAID 6 you can see severe performance degradation from software RAID. If you are simply looking to gain additional redundancy with RAID 1, software RAID is a good, cheap option, but if you're also looking to gain significant performance, I would highly recommend spending the money on hardware RAID.

We generally do not recommend Windows software RAID, as it can often cause various issues. We highly recommend going with hardware RAID or the motherboard's host RAID on a Windows based system.

What does RAID cost me?

Software RAID does not add any cost for a RAID controller, so the cost is easy to calculate: you are just purchasing additional drives. All of our standard dedicated servers come with at least two drives, meaning there is NO cost for software RAID 1, which we highly recommend. The drives in a RAID array should be of the same type and size. RAID 0 and RAID 1 need at least two drives, which our standard servers already include. RAID 5 needs at least three drives, and RAID 6 or 10 needs at least four, so you would need to purchase one or two additional drives in most cases. To gain additional performance, redundancy, or disk space, you can add more disks to the arrays as well.
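If you want to compare what you actually get for the drives you buy, the capacity rules from the RAID descriptions above can be sketched as a small Python helper. The function name and the 500 GB drive size are made up for illustration, and it assumes all drives are the same size and that RAID 10 uses two-way mirrors.

```python
# Minimum drive counts for each RAID level, as listed in this post.
MIN_DRIVES = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}


def usable_capacity_gb(level: str, drives: int, drive_size_gb: float) -> float:
    """Usable space for an array of identical drives (illustrative only)."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == "0":
        return drives * drive_size_gb        # striping only, no redundancy
    if level == "1":
        return drive_size_gb                 # every drive holds the same copy
    if level == "5":
        return (drives - 1) * drive_size_gb  # one drive's worth of parity
    if level == "6":
        return (drives - 2) * drive_size_gb  # two drives' worth of parity
    # RAID 10: half the drives mirror the other half.
    if drives % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return drives // 2 * drive_size_gb


# Example: four hypothetical 500 GB drives at each level.
for level in ("0", "1", "5", "6", "10"):
    print(f"RAID {level}: {usable_capacity_gb(level, 4, 500):.0f} GB usable")
```

With four drives, RAID 5 gives you the most redundant space, while RAID 6 and RAID 10 both give you half of the raw capacity in exchange for better protection or write speed.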

With hardware RAID, you have the additional cost of a RAID controller. For doing a two disk array, such as RAID 0 or RAID 1, you would be fine using a $20 per month two port SATA RAID card. This card is a good inexpensive way to get the advantages of a RAID array, but will not allow you to add additional arrays or hot spare drives. We also offer a $50 per month 4/8 port hardware SATA/SAS RAID card. This card includes more memory and a faster processor, to support the more complicated RAID 5/6/10, in addition to more ports. You would then be able to support one array of 4-8 disks or multiple smaller arrays on the single RAID card. We also offer a storage array server that supports up to 16 disks in a RAID array, giving you a myriad of options/configurations. The same disk requirements and costs as are outlined in the software RAID cost section would apply to hardware RAID as well.

What does RAID not do?

RAID does not equate to 100% uptime. There is still a risk of a RAID card failure, though that risk is significantly lower than the risk of a drive failure, and there are still software and other hardware causes of system downtime.

RAID does not replace backups. RAID can protect you against a drive failure, but it will not protect you from data corruption, human error, or security issues. There are plenty of reasons other than a drive failure that you should keep backups, so do not take RAID as a replacement for backups. With server backup options starting at just $9.95/mo, you have no excuse not to back up your valuable data.

RAID does not allow you to dynamically increase the size of the array. If you need more disk space, you cannot simply add another drive to the array; you would need to start from scratch, rebuilding and reformatting the array.
