
ZFS storage advice

post #1 of 13
Thread Starter 
Hello,

I am in the process of a storage build-out for my home cluster and picked up 92 (ninety-two) 36GB 10k RPM 2.5" SAS hard drives on eBay (a hell of a deal: $308 shipped for all 92, and they come with Intel / mixed-brand hot-swap trays). I realize these are small drives, but I am adding *fast* storage to my cluster; I already have a pool of ten 2TB drives which gives me plenty of 'bulk' storage. This build-out is to serve up iSCSI LUNs to other systems running the XenServer hypervisor. My question is how to configure these new 2.5" drives in a new ZFS pool. Here's what I have and what I'm planning on buying within the next few months:

Have (old gear):

Supermicro SC846TQ chassis, 4U, 24x3.5" hot-swap, redundant 900W PSU.
Supermicro X7DB8+ motherboard
2x quad-core Intel Xeon L5420 low-voltage (50W TDP) CPUs
24GB (12x 2GB) DDR2 FB-DIMM PC2-5300F fully buffered ECC RAM
3x Supermicro SAT2-MV8-AOC 8-port SATA II PCI-X cards (connected to the 24-drive backplane of the chassis)

Have (new gear):
LSI SAS9201-16e 16-port 6.0Gbps SAS external PCI-E 2.0 x8 HBA card
1x HP MSA70 25-bay 2.5" SAS enclosure w/ redundant 575W PSU
4x 2-meter SFF-8088 to SFF-8088 external SAS cables

Awaiting delivery from UPS:
92 x 36GB 10k RPM 2.5" SAS HDD, mix of mostly Fujitsu and some Seagate drives.

Planning on purchasing:
3x HP MSA70
Also more hard drives, of larger capacity, to replace failed drives.


What I'm considering as far as RAID schemes go is something like this:

Favorite - 4 vdevs, 6 drives each (24 total), RAIDZ2, plus hot spare, per MSA70 unit.
Alternative #1 - 7 vdevs, 10 drives each, RAIDZ2, plus 5 hot spares over three MSA70 units
Alternative #2 - 12 mirrors plus hot spare per MSA70 unit.
Alternative #3 - SUGGEST SOMETHING!

My goals in order of priority are:
1. Safety / data integrity
2. IOPS
3. Random read/write speed
4. Space efficiency.

I'll probably back up the 'fast' VM storage pool to my bulk storage pool on a regular basis, like daily snapshots or something.
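
For a rough comparison of those layouts across the full four-enclosure build, here's a quick back-of-the-envelope sketch in Python. The numbers are ballpark assumptions, not measurements: ~33GiB usable per 36GB drive, ~125 random IOPS per 10k SAS drive, and the common rule of thumb that each RAIDZ or mirror vdev delivers roughly one drive's worth of random write IOPS.

Code:
# Rough comparison of the candidate pool layouts over all 4 MSA70s.
# Assumptions (ballpark, not measured): ~33GiB usable per 36GB drive,
# ~125 random IOPS per 10k SAS drive, and the rule of thumb that each
# RAIDZ/mirror vdev delivers about one drive's worth of random write IOPS.

GIB_PER_DRIVE = 33
IOPS_PER_DRIVE = 125

def summarize(name, vdevs, drives_per_vdev, redundancy_per_vdev, spares):
    data_drives = vdevs * (drives_per_vdev - redundancy_per_vdev)
    return (name,
            vdevs * drives_per_vdev + spares,   # total drives needed
            data_drives * GIB_PER_DRIVE,        # usable capacity (GiB)
            vdevs * IOPS_PER_DRIVE,             # rough random write IOPS
            redundancy_per_vdev)                # failures survivable per vdev

layouts = [
    # Favorite: 4x 6-drive RAIDZ2 + 1 spare per MSA70, times 4 enclosures
    # (the full build needs 100 drives; only 92 are on hand so far)
    summarize("16x 6-drive RAIDZ2", vdevs=16, drives_per_vdev=6, redundancy_per_vdev=2, spares=4),
    # Alternative #1: 7x 10-drive RAIDZ2 + 5 spares over three MSA70s
    summarize("7x 10-drive RAIDZ2", vdevs=7, drives_per_vdev=10, redundancy_per_vdev=2, spares=5),
    # Alternative #2: 12 mirrors + 1 spare per MSA70, times 4 enclosures
    summarize("48x 2-way mirrors", vdevs=48, drives_per_vdev=2, redundancy_per_vdev=1, spares=4),
]

for name, drives, gib, iops, redundancy in layouts:
    print(f"{name}: {drives} drives, ~{gib}GiB usable, ~{iops} random write IOPS, "
          f"survives {redundancy} failure(s) per vdev")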
post #2 of 13
Quote:
Originally Posted by cathode View Post

I am in the process of a storage build-out for my home cluster and picked up 92 36GB 10k RPM 2.5" SAS hard drives on eBay... My question is how to configure these new 2.5" drives in a new ZFS pool.

What I'm considering as far as RAID schemes go is something like this:

Favorite - 4 vdevs, 6 drives each (24 total), RAIDZ2, plus hot spare, per MSA70 unit.
Alternative #1 - 7 vdevs, 10 drives each, RAIDZ2, plus 5 hot spares over three MSA70 units
Alternative #2 - 12 mirrors plus hot spare per MSA70 unit.
Alternative #3 - SUGGEST SOMETHING!

My goals in order of priority are:
1. Safety / data integrity
2. IOPS
3. Random read/write speed
4. Space efficiency.

Well, I'm not a ZFS guy, so my opinions are based on hardware RAID. For maximum performance for iSCSI VM storage, you want to use RAID 10 or a similar setup. RAID 6 (or RAID-Z2) has a nasty write penalty which should be avoided for VM storage.
The second point I have, based on hardware RAID, is that for each sub-array in a nested RAID, you should have 1 hot spare. For example, if you have a RAID 50 with 9 drives (3 RAID 5 sets of 3 drives each, striped across each other), you should have a total of 3 hot spares. I don't apply this recommendation to RAID 10, though, since that wouldn't make sense. In a large array you want more than 1 hot spare.

RAID 10 will give you the best protection, as well as the best performance. It's just at a higher cost (only half your raw storage is usable). IOPS will be the same regardless of your chosen pooling method, since IOPS are all about the number of drives in an array. 10K SAS drives should net about 125 IOPS per drive (shooting low, but it can be as high as 150 each). Also, IOPS and random R/W speeds are pretty much the exact same thing.
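
To put rough numbers on that write penalty, here's a quick sketch using the classic rule-of-thumb penalties (2 back-end I/Os per random host write for RAID 10, 4 for RAID 5, 6 for RAID 6) and the same ~125 IOPS/drive ballpark:

Code:
# Effective random-write IOPS using the classic hardware-RAID write-penalty
# rule of thumb. All figures are rough approximations.
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}
IOPS_PER_DRIVE = 125                      # conservative figure for a 10k SAS drive

def effective_write_iops(level, drives):
    # Raw random IOPS of all drives divided by the back-end writes per host write.
    return drives * IOPS_PER_DRIVE / WRITE_PENALTY[level]

for level in ("RAID10", "RAID5", "RAID6"):
    print(f"24 drives as {level}: ~{effective_write_iops(level, 24):.0f} random write IOPS")
# 24 drives: RAID10 ~1500, RAID5 ~750, RAID6 ~500 (100% random-write workload)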
post #3 of 13
Quote:
Originally Posted by tycoonbob View Post

Well, I'm not a ZFS guy, so my opinions are based on hardware RAID. For maximum performance for iSCSI VM storage, you want to use RAID 10 or a similar setup. RAID 6 (or RAID-Z2) has a nasty write penalty which should be avoided for VM storage.
The second point I have, based on hardware RAID, is that for each sub-array in a nested RAID, you should have 1 hot spare. For example, if you have a RAID 50 with 9 drives (3 RAID 5 sets of 3 drives each, striped across each other), you should have a total of 3 hot spares. I don't apply this recommendation to RAID 10, though, since that wouldn't make sense. In a large array you want more than 1 hot spare.

RAID 10 will give you the best protection, as well as the best performance. It's just at a higher cost (only half your raw storage is usable). IOPS will be the same regardless of your chosen pooling method, since IOPS are all about the number of drives in an array. 10K SAS drives should net about 125 IOPS per drive (shooting low, but it can be as high as 150 each). Also, IOPS and random R/W speeds are pretty much the exact same thing.

RAID-Z2 does not suffer the same read-modify-write penalty that standard RAID 6 does, since ZFS does full-stripe, copy-on-write writes; that also avoids the RAID 5/6 write hole.

To the OP:

I would suggest a single zpool made up of multiple RAID-Z2 vdevs, with no more than 10 drives per vdev. RAID-Z2 would give you arguably better protection than RAID 10, given that RAID-Z2 can handle any two drives failing, whereas RAID 10 can only survive two drive failures if they are in separate mirrors.

A note about using multiple vdevs, however: more vdevs (generally) means higher read performance. Writes are a bit trickier. Your write bandwidth increases with more vdevs, but since a zpool stripes data across vdevs, the latency to write data across vdevs increases as more vdevs are added.
    
post #4 of 13
Quote:
Originally Posted by TurboTurtle View Post

A note about using multiple vdevs, however: more vdevs (generally) means higher read performance. Writes are a bit trickier. Your write bandwidth increases with more vdevs, but since a zpool stripes data across vdevs, the latency to write data across vdevs increases as more vdevs are added.

Are you sure about this? I would have thought there would be higher latency with raidz2, since the data has to be split across all the disks in the raidz and parity has to be calculated. I'd be interested to read the source of your info.
post #5 of 13
@tycoonbob

On a hardware RAID controller, isn't the write penalty for RAID 5 & 6 somewhat mitigated by the onboard cache? Doesn't the caching algorithm organise the writes so that they can be executed sequentially?
post #6 of 13
Quote:
Originally Posted by CaptainBlame View Post

Are you sure about this? I would have thought there would be higher latency with raidz2, since the data has to be split across all the disks in the raidz and parity has to be calculated. I'd be interested to read the source of your info.

It was a while back that I read a fairly detailed report on it; I'll see if I can turn it up.

In the meantime, I'm not sure I understand your query. Your statement about higher latency is what I said as well (for writes). Reads would see the same performance increase as with any other version of RAID: you don't have to calculate parity when reading data, so there is no added overhead. As with RAID 0 or any other striping technology, the more stripes, the faster your reads are.

Again, write bandwidth increases with more vdevs since you have more resources available to do the actual write operations, but, as you point out, parity still has to be calculated for every write, so write performance does not increase in the same fashion.
    
post #7 of 13
I am talking about writes.

My understanding is the exact opposite of what you have written: a single 10-disk raidz2 vdev is going to have higher bandwidth than a single 2-disk mirrored vdev. However, the IOPS of a raidz2 vdev (regardless of the number of disks) is roughly that of a single disk, as is the case with a mirrored vdev, and raidz has higher latency than a mirrored vdev due to splitting the data across the array and calculating parity.

I think adding multiple vdevs (of any kind) to a pool is going to increase the bandwidth and IOPS due to the striping. The latency is still going to be governed by the time it takes to write to a single vdev, in which case a mirror should be faster. If you want lower latency, use an SSD cache.
Edited by CaptainBlame - 5/27/13 at 5:07pm
post #8 of 13
Quote:
Originally Posted by parityboy View Post

@tycoonbob

On a hardware RAID controller, isn't the write penalty for RAID 5 & 6 somewhat mitigated by the onboard cache? Doesn't the caching algorithm organise the writes so that they can be executed sequentially?

This of course depends on the controller, but most modern controllers will do this. Typically, RAID 5 should be avoided, and RAID 6 should only be used for data that is read a lot but not written/modified too often. RAID 10 is the de facto standard for performance-oriented RAID arrays used in the enterprise (SANs and servers) for databases, VMs, etc.
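
To sketch the idea parityboy is describing (a toy model, not any particular controller's algorithm): writes are acknowledged out of the battery-backed RAM cache and later flushed to the disks sorted by LBA, so the drives see a mostly sequential stream.

Code:
# Toy model of a write-back cache that reorders queued writes by LBA before
# flushing, so random host writes hit the disks as near-sequential I/O.
# Illustrative only; real controllers also coalesce adjacent blocks and
# protect the cache with a battery or flash backup.
class WriteBackCache:
    def __init__(self):
        self.pending = {}                        # lba -> data, latest write wins

    def write(self, lba, data):
        self.pending[lba] = data                 # acknowledged to the host from RAM

    def flush(self):
        ordered = sorted(self.pending.items())   # near-sequential order for the disks
        self.pending.clear()
        return ordered

cache = WriteBackCache()
for lba in (900, 12, 505, 7, 640, 300, 44, 801):
    cache.write(lba, b"x")
print([lba for lba, _ in cache.flush()])         # [7, 12, 44, 300, 505, 640, 801, 900]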
post #9 of 13
@tycoonbob

Cheers for that. Here's another question: with more and more storage arrays using SSDs, apart from RAID controllers having to be fast enough to keep up with the IOPS capacity of the SSDs themselves, do they use different caching algorithms for handling SSDs? For example, using the cache to organise random writes into sequential writes works well for hard drives, but for SSDs this is redundant. Do they switch algorithms to deal with SSDs?
post #10 of 13
Quote:
Originally Posted by parityboy View Post

@tycoonbob

Cheers for that. Here's another question: with more and more storage arrays using SSDs, apart from RAID controllers having to be fast enough to keep up with the IOPS capacity of the SSDs themselves, do they use different caching algorithms for handling SSDs? For example, using the cache to organise random writes into sequential writes works well for hard drives, but for SSDs this is redundant. Do they switch algorithms to deal with SSDs?

While I do not know for sure, I wouldn't think they'd need to use a different algorithm. Writing to RAM is faster than writing to SSDs, without a doubt. I recall seeing a test where 32GB of DDR3-1600 RAM in a RAMdisk was pushing over 5 million IOPS, yes... 5,000,000 IOPS!

I have only worked with one SAN before that had SSDs (a Dell EqualLogic PS6000 with 16 50GB SSDs in a single RAID 50 volume, with 1 LUN on top of it for a 2-node Server 2012 Hyper-V cluster), but I do know that the controllers were smart enough to see that they were SSDs. That makes me think that, on the back end of the software/firmware, it could have done something different to manage the data on that array/group if it needed to.

A lot of LSI-based controllers have something called CacheCade, which uses an SSD as a cache in front of the array to increase IOPS. This SSD sits behind the RAM cache on the controller, though, and is used to store a copy of the most-accessed blocks of data. So it depends on what/how you are using that SSD in an array.
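
The general idea of that SSD layer, as a toy sketch (this is not LSI's actual algorithm, just the "keep the most-accessed blocks on the SSD" concept):

Code:
# Minimal sketch of an SSD read cache that promotes the most frequently
# accessed blocks, in the spirit of CacheCade-style features.
# Not LSI's actual algorithm; capacities and policies here are made up.
from collections import Counter

class HotBlockCache:
    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.hits = Counter()          # access count per block address
        self.cached = set()            # block addresses currently on the SSD

    def read(self, block):
        self.hits[block] += 1
        if block in self.cached:
            return "ssd"               # served from the SSD cache
        self._maybe_promote(block)
        return "hdd"                   # served from the spinning array

    def _maybe_promote(self, block):
        if len(self.cached) < self.capacity:
            self.cached.add(block)
            return
        # Evict the coldest cached block only if the new one is hotter.
        coldest = min(self.cached, key=lambda b: self.hits[b])
        if self.hits[block] > self.hits[coldest]:
            self.cached.discard(coldest)
            self.cached.add(block)

# Example: block 7 is hammered, so it ends up being served from the SSD.
cache = HotBlockCache(capacity_blocks=2)
print([cache.read(b) for b in (7, 3, 7, 9, 7, 7, 1, 7)])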

Also keep in mind that most controllers have no problem keeping up with the IOPS of an SSD. A typical consumer SSD (on the low end) pushes 60,000 IOPS; at 4KB per I/O that is roughly 234MB/s of data, and a SATA II interface can push 300MBps. Once you get to higher-performance SSDs, you want to make sure you have a SATA III interface, which is double SATA II at 600MBps, allowing a single drive to move as much as ~150,000 IOPS. So if you had 3 drives in a RAID 5 each pushing 150,000 IOPS, every drive can max out its IOPS and the array will have over 400,000 IOPS. Until we see SSDs breaking that 150,000 IOPS mark, SATA III will do just fine.

Now when you get into mini-SAS connectors, those are quad-channel ports: the SAS 2.0 interface can manage 4 concurrent 600MBps links, allowing four ~150,000 IOPS-capable drives on one connector while the interface passes through about 600,000 IOPS. That kind of performance is not heard of too often, at least not yet. The newest revision of SAS allows for 1,200MBps per lane (12Gbps SAS), and there are SSDs with that interface that can read up to 145,000 IOPS.

***My math uses approximate values.
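
Written out, that arithmetic looks like this (a quick sketch assuming 4KB per random I/O, which is what makes the 234MB figure line up; interface speeds are nominal rates, ignoring protocol overhead):

Code:
# The interface-limit arithmetic from the post above, assuming 4KB (4096-byte)
# random I/Os. Link speeds are nominal SATA/SAS rates, ignoring protocol overhead.
BLOCK = 4096                                  # bytes per I/O (assumption)

def iops_to_throughput(iops):
    # Throughput implied by an IOPS figure: (decimal MB/s, binary MiB/s).
    return iops * BLOCK / 1e6, iops * BLOCK / 2**20

def iops_ceiling(link_mb_per_s):
    # Max 4KB IOPS a link can carry at its nominal data rate.
    return int(link_mb_per_s * 1e6 / BLOCK)

print(iops_to_throughput(60_000))             # (~245.8 MB/s, ~234.4 MiB/s) -- the "234MB" figure
print(iops_ceiling(300))                      # ~73k  : SATA II at 300MBps
print(iops_ceiling(600))                      # ~146k : SATA III / one 6Gbps SAS lane at 600MBps
print(iops_ceiling(4 * 600))                  # ~585k : a 4-lane mini-SAS wide port
print(iops_ceiling(1200))                     # ~292k : one 12Gbps SAS lane at 1200MBps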
Edited by tycoonbob - 5/30/13 at 12:47pm