Sunday, September 26, 2010

Example with NetApp for realistic expectations of raw and usable capacity

Warning: I’m not a SAN expert, but as I’ve had more opportunities to work on datacenter projects, I’ve started to see more real-world SAN implementations. While this post doesn’t provide a complete breakdown of what to consider when calculating raw and usable storage, I hope it at least gives professionals some real-world numbers to work with when provisioning SAN storage.

Configuration

Brand: NetApp

Model: FAS2020

Version: 7.2.6.1

RAID: RAID_DP (Double Parity)

RAID Size: 16

Number of Disks: 6

Disk Size: 300GB SAS

Actual usable disk size: 266GB

Total Aggregate Capacity: 908GB

As shown with the information listed above, configuring a FAS2020 with 6 x 300GB SAS drives realistically yields only 908GB for the aggregate. Working out the numbers we can see that:

Specifications on paper: 300GB x 6 disks = 1.8TB

Actual drive capacity: 266GB x 6 disks = 1.596TB

Actual usable *aggregate* capacity after RAID_DP: 908GB
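The 908GB figure can be roughly reproduced with a simple model. This is a sketch under assumptions that are not stated in this post: RAID_DP takes two parity disks per RAID group (all six disks form one group here, since the RAID size of 16 exceeds the disk count), and Data ONTAP 7.x then sets aside about 10% of the remaining space as WAFL reserve and 5% as the default aggregate Snapshot reserve. The two reserve percentages are commonly cited defaults used as assumptions, and `aggregate_capacity_gb()` is a hypothetical helper.

```python
# Rough model of usable aggregate capacity on a Data ONTAP 7.x system.
# ASSUMPTIONS (not from the original post): a 10% WAFL reserve and a 5%
# aggregate Snapshot reserve, with all disks in a single RAID_DP group.

WAFL_RESERVE = 0.10        # assumed
AGGR_SNAP_RESERVE = 0.05   # assumed default


def aggregate_capacity_gb(disks: int, right_sized_gb: float, parity_disks: int = 2) -> float:
    """Estimate usable aggregate capacity in GB for one RAID_DP group."""
    data_disks = disks - parity_disks            # RAID_DP: two parity disks per group
    raw_data_gb = data_disks * right_sized_gb    # capacity of the data disks only
    return raw_data_gb * (1 - WAFL_RESERVE) * (1 - AGGR_SNAP_RESERVE)


paper_gb = 6 * 300
estimate = aggregate_capacity_gb(disks=6, right_sized_gb=266)
print(f"estimated aggregate: {estimate:.0f}GB (reported: 908GB)")

# The loss percentage depends on whether 1.8TB is read as decimal or binary.
print(f"loss vs. 1800GB (1TB = 1000GB): {1 - 908 / paper_gb:.1%}")
print(f"loss vs. 1843GB (1TB = 1024GB): {1 - 908 / (1.8 * 1024):.1%}")
```

The estimate lands within about 0.2% of the reported aggregate, and the same arithmetic gives roughly 455GB for the four-disk example later in this post. Read with decimal units the loss is closer to 50%; the 51% figure matches if 1.8TB is taken as 1.8 x 1024GB.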

If we compare these numbers to work out how much space is lost to overhead such as RAID, we’re actually losing approximately 51% of the drive space. This 51% also doesn’t include the spare disk you’ll need per controller (each controller needs its own spare, so an active/active setup needs two spare disks). Also, don’t forget that the controller’s software sits in the root volume on aggregate 0 of the NetApp, which takes up additional space. As of 2009, the NetApp technician told me that a minimum of 10GB is required for the root volume and 20GB is recommended for the FAS2020.
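To see what a spare costs in this configuration, the six disks can be split into spare, parity and data disks. A minimal sketch, assuming all non-spare disks form a single RAID_DP group; `disk_budget()` is a hypothetical helper, and the 20GB root volume is the recommendation quoted above.

```python
# Split the six disks from the first example into spare, parity and data disks.
# ASSUMPTION: all non-spare disks form one RAID_DP group (RAID size 16 > 6 disks).
# disk_budget() is a hypothetical helper for illustration.

ROOT_VOLUME_GB = 20  # recommended root volume size for a FAS2020, per the post


def disk_budget(total_disks: int, spares: int, parity_disks: int = 2) -> dict:
    """Return how many disks end up as spares, parity and data."""
    data = total_disks - spares - parity_disks
    if data < 1:
        raise ValueError("not enough disks for a RAID_DP group with a data disk")
    return {"spare": spares, "parity": parity_disks, "data": data}


print(disk_budget(6, spares=0))  # the configuration in this post
print(disk_budget(6, spares=1))  # keeping one hot spare on a single controller
print(f"left for data volumes in the 908GB aggregate: {908 - ROOT_VOLUME_GB}GB")
```

Holding back a single spare turns four data disks into three, so roughly a quarter of the data-disk capacity disappears before the root volume is even created.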

Lastly, when new volumes are created for LUNs, each volume needs more space than the LUN itself, because you will need extra room if you decide to use snapshots. The best practice the NetApp engineer gave me is to size the volume at 2x + delta (x being the size of the LUN). This covers the worst case: a snapshot is taken while the LUN is completely full, all of the information on the LUN is deleted, and the LUN is then filled back up with different information. Because the 2x + delta rule was followed, the snapshot can still hold all of the information that existed before the deletion. That being said, since most companies don’t like to lose so much storage, another good practice is to use 1x + delta (x being the size of the LUN).
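A small sketch of the two sizing rules, reading each as the total size of the volume that holds the LUN. `volume_size_gb()` and the 200GB LUN / 30GB delta figures are hypothetical and only for illustration; delta here stands for the extra room set aside for blocks that change while snapshots are kept.

```python
# Volume sizing rules of thumb for a LUN with snapshots.
#   x     = LUN size
#   delta = extra room for blocks that change while snapshots are held
# The numbers below are made-up examples, not NetApp figures.


def volume_size_gb(lun_gb: float, delta_gb: float, multiplier: int = 2) -> float:
    """Return multiplier * x + delta, i.e. the 2x + delta or 1x + delta rule."""
    if multiplier not in (1, 2):
        raise ValueError("multiplier should be 1 or 2 for these rules of thumb")
    return multiplier * lun_gb + delta_gb


lun, delta = 200, 30  # hypothetical 200GB LUN with 30GB of snapshot change
print(f"2x + delta: {volume_size_gb(lun, delta, 2)}GB")
print(f"1x + delta: {volume_size_gb(lun, delta, 1)}GB")
```

For this hypothetical LUN the conservative rule needs a 430GB volume and the leaner rule 230GB, which is the trade-off described above.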

There are times when thinking about all the storage lost in exchange for redundancy scares me, so I find it all the more important to explain all of the variables to customers and set their expectations appropriately.

-------------------------------------------------------------------------------------------------------------------------------------------------------

The following is another example similar to the configuration above but with 1TB drives:

Configuration

Brand: NetApp

Model: FAS2020

Version: 7.2.6.1

RAID: RAID_DP (Double Parity)

RAID Size: 6

Number of Disks: 6

Disk Size: 1TB SAS

Actual usable disk size: 828GB

Total Aggregate Capacity: 2.76TB

As the information above shows, configuring a FAS2020 with 6 x 1TB SAS drives realistically yields only 2.76TB for the aggregate. Working out the numbers, we can see that:

Specifications on paper: 1TB x 6 disks = 6TB

Actual drive capacity: 828GB x 6 disks = 4.968TB

Actual usable *aggregate* capacity after RAID_DP: 2.76TB

Working out how much space is lost to overhead such as RAID, we’re actually losing approximately 54% of the drive space. As in the first example, this 54% doesn’t include the other variables that reduce usable storage even further.
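The 54% can be split into buckets using only the numbers quoted in this example (with 2.76TB read as 2,760GB) plus the fact that RAID_DP dedicates two disks per RAID group to parity. `loss_breakdown()` is a hypothetical helper, and the "other" bucket is simply whatever is left over, which would include reserves and any differences in how capacity units are reported.

```python
# Break the gap between nominal and usable capacity of the 6 x 1TB example
# into buckets. Only figures from the post are used, plus two RAID_DP parity
# disks per group. loss_breakdown() is a hypothetical helper.


def loss_breakdown(disks: int, nominal_gb: float, right_sized_gb: float,
                   aggregate_gb: float, parity_disks: int = 2) -> dict:
    """Return each loss bucket as a fraction of the on-paper capacity."""
    paper = disks * nominal_gb
    right_sizing = disks * (nominal_gb - right_sized_gb)    # lost to right-sizing
    parity = parity_disks * right_sized_gb                  # lost to RAID_DP parity
    other = paper - right_sizing - parity - aggregate_gb    # reserves, units, etc.
    return {"right-sizing": right_sizing / paper,
            "parity": parity / paper,
            "other": other / paper}


shares = loss_breakdown(disks=6, nominal_gb=1000, right_sized_gb=828, aggregate_gb=2760)
for name, share in shares.items():
    print(f"{name:>12}: {share:.1%}")
print(f"{'total':>12}: {sum(shares.values()):.1%}")
```

For this example the buckets come out at about 17% (right-sizing), 28% (parity) and 9% (other), adding up to the 54% quoted above.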

-------------------------------------------------------------------------------------------------------------------------------------------------------

Here’s another example similar to the first one but with 4 disks instead:

Configuration

Brand: NetApp

Model: FAS2020

Version: 7.2.6.1

RAID: RAID_DP (Double Parity)

RAID Size: 16

Number of Disks: 4

Disk Size: 300GB SAS

Actual usable disk size: 266GB

Total Aggregate Capacity: 454GB

As shown with the information listed above, configuring a FAS2020 with 4 x 300GB SAS drives realistically yields only 454GB for the aggregate. Working out the numbers we can see that:

Specifications on paper: 300GB x 4 disks = 1.2TB

Actual drive capacity: 266GB x 4 disks = 1.064TB

Actual usable *aggregate* capacity after RAID_DP: 454GB

Working out how much space is lost to overhead such as RAID, we’re actually losing approximately 63% of the drive space. Again, this doesn’t include the other contributing factors that will decrease the amount of usable storage even more.
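The jump from roughly 51% to 63% loss comes mostly from parity: RAID_DP always spends two disks per RAID group on parity, so the smaller the group, the larger that share. A minimal sketch, assuming a single RAID group of N disks; `data_fraction()` is a hypothetical helper.

```python
# With RAID_DP, two disks per RAID group go to parity, so the share of
# right-sized capacity left for data is (N - 2) / N for one group of N disks.
# data_fraction() is a hypothetical helper for illustration.


def data_fraction(disks_in_group: int, parity_disks: int = 2) -> float:
    """Fraction of a RAID group's right-sized capacity that holds data."""
    if disks_in_group <= parity_disks:
        raise ValueError("a RAID_DP group needs at least one data disk")
    return (disks_in_group - parity_disks) / disks_in_group


for n in (4, 6, 16):
    print(f"{n:>2} disks: {data_fraction(n):.0%} of capacity holds data")
```

With four disks, half of the right-sized capacity goes to parity before any other overhead is applied, compared with a third for six disks and an eighth for a full 16-disk group (the RAID size used in the 300GB examples).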

-------------------------------------------------------------------------------------------------------------------------------------------------------

Extras

Random notes I took while troubleshooting with a NetApp engineer: there are ways to reclaim space, such as cutting back snapshots for your aggregates and volumes, reducing the fractional reserve and reducing the frequency of snapshot schedules, but they all reduce redundancy. Also, follow a one-LUN-per-volume mapping so that if a volume ever goes down, it doesn’t take all of your LUNs with it. Lastly, make sure snapshot autodelete is turned on: if no space is left for snapshots, the NetApp will delete the oldest snapshot and then take the new one. If it isn’t turned on and the LUN fills up without space reservation, the LUN will go offline.
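The autodelete idea in the note above can be illustrated with a toy model: each new snapshot goes into a fixed-size snapshot reserve, and the oldest snapshots are deleted first until the new one fits. `take_snapshot()` and the sizes are hypothetical, and this is not Data ONTAP’s actual autodelete implementation, which has its own trigger and deletion-order settings.

```python
# Toy model of "delete the oldest snapshot when space runs out".
# NOT Data ONTAP's real autodelete logic; take_snapshot() is hypothetical.
from collections import deque


def take_snapshot(snapshots: deque, new_size_gb: float, reserve_gb: float) -> list:
    """Add a snapshot, deleting the oldest ones first if the reserve would overflow.

    Returns the sizes of the snapshots that were deleted.
    """
    if new_size_gb > reserve_gb:
        raise ValueError("snapshot is larger than the whole reserve")
    deleted = []
    while sum(snapshots) + new_size_gb > reserve_gb:
        deleted.append(snapshots.popleft())  # oldest snapshot goes first
    snapshots.append(new_size_gb)
    return deleted


snaps = deque([10, 15, 20])  # hypothetical snapshot sizes in GB, oldest first
print(take_snapshot(snaps, 12, reserve_gb=50))
print(list(snaps))
```

Here the 10GB snapshot is removed to make room for the new 12GB one; without such a policy, the new snapshot simply has nowhere to go once the space is used up.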

My thoughts: being a consultant means we’re obligated to tell customers the truth as we know it, and while this may be hard for many clients to digest, it’s important not to overlook why companies purchase SANs in the first place: they want robust storage that provides redundancy, exceptional recovery times and performance, and storage vendors design their solutions with this as their number one priority. I was fortunate enough to attend a training session delivered by Peter Henneberry from NetApp, and it was quite the eye-opener when he gave us real-world statistics on how they can complete backups within seconds or minutes rather than hours. So while I can’t list all the benefits of a SAN, there are plenty of reasons to buy one.

I’m not much of a storage consultant, even though I’d like to get into it a bit more, so please forgive any mistakes I’ve made in this post, whether in the calculations or in information I’ve missed.

2 comments:

ENZO said...

Hi bro. Your clarification about the FAS2020 is very clear. Thanks for your info. But how can I configure the FAS2020 with RAID-Double Parity to show the 4.9TB of disk space you talk about? Please reply to me at lethanhluyen@gmail.com
Regards,
Luyen-Visco

Terence Luk said...

Hi Luyen,

The configuration example in this blog post is actually a single aggregate with all of the disks. This isn’t really best practice, because NetApp recommends having a dedicated aggregate for the root volume where the OS resides and a separate aggregate for your data. The environment we had for this deployment was too small to burn 3 drives (the minimum required for RAID-DP) and then another 2 drives for a data volume. Hope this helps.