
How to build an auto-failover "cluster"? - Page 2

post #11 of 15
@OP

Not sure if you've been here, but Linux HA might be a useful resource. Combining this with a virtualisation technology such as KVM or VMware might be a way out. Which Linux distro do your machines run?

EDIT:

If you look at this link for Veritas VCS, you will see other names mentioned. Might be worth checking them out.
Edited by parityboy - 12/15/11 at 5:15pm
post #12 of 15
Quote:
Originally Posted by lloyd mcclendon View Post

Thanks for the links... now I've got something to start with. They are Linux machines, of course. I figured the Linux forum was enough, but I should have specified.
Quote:
just buy Microsoft Clustering Server.

Fixed. And ... eh, never mind. But if you know the overall story of how that product works, I'd love to hear it. Will also look into it. Thanks.

My bad, I just saw it on the front page and didn't even notice it was in the Linux forum. Stupid me.

I've used it for Exchange in the past. Basically, you spin up multiple identical servers with unique FQDNs and IP addresses and create a quasi-virtual server through MSCS that has its own unique FQDN and IP. MSCS controls which node in the cluster is active. If the active node fails, MSCS on one of the other servers will detect it and elevate another node to the active role.
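On the Linux side the same idea is usually done with a floating (virtual) IP rather than a virtual FQDN. As a hypothetical sketch only, not a recommendation: keepalived/VRRP is just one of several tools, and the interface name, priority and address below are invented.
Code:
# Two otherwise identical nodes share a floating IP; whichever node is alive
# and highest-priority holds it. The backup node runs the same config with
# "state BACKUP" and a lower priority. All values here are examples.
cat > /etc/keepalived/keepalived.conf <<'EOF'
vrrp_instance APPLIANCE_VIP {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150
    advert_int 1
    virtual_ipaddress {
        10.0.0.100/24
    }
}
EOF

service keepalived start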
post #13 of 15
Thread Starter 
These machines are purposed as lightweight appliances ... In the case where we have to ship a physical box, we install a fairly empty Linux host that has an isolated VM guest inside it, which actually serves as the appliance machine (qemu-kvm through libvirt). Typically we put two guests in each host: one for testing that is normally shut off, and one for production. All in all this has proven to be a solid setup and performance is fine. In the case where we have to ship a VM, we can just ship the images, after format conversion and any required changes to the guest. We haven't yet had to do that, but I'm going to be testing it soon.
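For the image-shipping case, the format conversion itself is usually just qemu-img. A minimal sketch, assuming a qcow2 source and a hypothetical VMware-style target; the file names and formats are made up:
Code:
# convert the libvirt/qemu-kvm disk image to a format the customer's
# hypervisor understands; names and formats here are examples only
qemu-img convert -p -f qcow2 -O vmdk appliance-prod.qcow2 appliance-prod.vmdk

# sanity-check the result before shipping it
qemu-img info appliance-prod.vmdk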

I guess really for this discussion, we can flat out disregard the physical host; be it ours or theirs, it doesn't really matter - it's all virtual machines.

Currently these are all Ubuntu, but as I'm now taking over the entire project I may be looking to change that, due to some instability issues. Ubuntu was chosen for a few reasons I don't disagree with, but in practice there are just too many bugs to be comfortable with, and the release cycle only seems to perpetuate the instability. I've been on and off with Ubuntu for a very long time. Early on it was great, the best, but at some point they put too many chefs in the kitchen and it's all been downhill.

I'm just not sure yet what I may go with. As difficult as it is for me to say this, Gentoo would be a terrible fit for this purpose. There's always an upside to it, but we can't deal with something that high-maintenance across this many machines. I'm really looking for something that is rock stable but still relatively current. Possibly Debian, but I can't imagine it is tremendously more stable than its Ubuntu stepchild. CentOS or Fedora maybe, but I don't know those that well. Good suggestions are welcome.

So that aside, I just have to figure out how I can build functionality into our monitoring server to trap a problem on the production guest (or its host), and fire something that can reroute everything to a redundant (possibly parallel) VM that has its own IP and hopefully is on a different physical host. Actually I think I can pull that off pretty easily... it's just the fact that machine2 has to pick up whatever machine1 was doing, at that exact instant in time, cleanly. There are so many aspects of that problem when looking at the details of the applications; I don't think it's feasible to go about patching everything to be "cluster aware". I may have to do that, though... But I feel like there should be a clean way to do it at a higher level, so the application has no idea anything is happening.
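A very rough sketch of what that detect-and-fail-over piece could look like, assuming libvirt on both hosts and a floating service IP. The host names, guest names, interface and addresses are all hypothetical, and the health check is deliberately dumb; in practice the monitoring server would drive this:
Code:
#!/bin/bash
# Crude failover sketch: if the production guest stops answering, start the
# standby guest on the other physical host and move the floating service IP
# onto it. Every name, address and interface here is an example, not real config.

PROD_IP=10.10.20.2              # production guest
STANDBY_IP=10.10.20.3           # standby guest (on the other physical host)
SERVICE_IP=10.10.20.1           # floating IP the clients actually use
STANDBY_HOST=kvmhost2           # physical host carrying the standby guest
STANDBY_GUEST=appliance-standby # libvirt domain name of the standby guest

while true; do
    if ! ping -c 3 -W 2 "$PROD_IP" > /dev/null; then
        echo "$(date) production guest unreachable, failing over" >> /var/log/failover.log

        # boot the standby guest on the second host via libvirt
        virsh -c "qemu+ssh://root@${STANDBY_HOST}/system" start "$STANDBY_GUEST"
        sleep 60    # give it time to boot (a real setup would poll, not sleep)

        # claim the floating IP on the standby guest and announce it with
        # gratuitous ARP so switches and clients learn the new location
        ssh "root@${STANDBY_IP}" "ip addr add ${SERVICE_IP}/24 dev eth0 && \
                                  arping -U -I eth0 -c 3 ${SERVICE_IP}"
        break       # leave state sync and alerting to the monitoring server / a human
    fi
    sleep 10
done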

But ultimately, I am struggling with how this picture really does not solve anything. Practically maybe, but theoretically the same problem is still right there.
Code:
         traffic
             |
       10.10.20.1
          |     |
10.10.20.2     10.10.20.3

What happens if .1 fails? That's really every bit as likely to happen as it would be for .2 or .3... And putting another layer in there above .1 obviously doesn't help. Now pretend I took the time to draw that picture, and you can see how this setup is just making the problem even WORSE. Instead of just one machine that will fail, and when it does I have to deal with it, I now have three. The only gain is that if .2 or .3 fails by itself, that's fine, but if .1 dies we're down... so why did this even get brought up as a solution? Is .1 a part of .2, and then I would kick .1 over to .3, so only two machines? Am I misunderstanding this?

Edited by lloyd mcclendon - 12/15/11 at 9:24pm
post #14 of 15
Quote:
Originally Posted by lloyd mcclendon View Post


Like I said before, VM teleportation will do most of what you're asking.
Also, Debian is a lot more stable than Ubuntu.
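In a qemu-kvm/libvirt setup, "teleportation" maps to live migration. A minimal sketch, assuming shared storage that both hosts can see; the host and guest names are hypothetical:
Code:
# live-migrate the running production guest from kvmhost1 to kvmhost2;
# the guest keeps its IP and its running state (briefly paused). Names are
# examples only, and the disk image must live on storage both hosts can reach.
virsh -c qemu+ssh://root@kvmhost1/system \
      migrate --live appliance-prod qemu+ssh://root@kvmhost2/system

# confirm it is now running on the destination host
virsh -c qemu+ssh://root@kvmhost2/system list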
post #15 of 15
@OP

From what you've posted so far it looks like you have two clustering problems to solve:

i) Bare metal failure. What happens if a network card dies, or a CPU fails? No amount of VMs will save you in this instance.

ii) VM failure. Same as above, plus the VM itself crashing or otherwise becoming unavailable.

So from my perspective, here's a rough outline.

1. Start with two boxes. These are your clustered SAN nodes, and will serve as datastores for the VM images. They are made accessible to the other cluster nodes via iSCSI or NFS, and are clustered using the Linux HA package and OpenFiler (a rough sketch of the HA piece follows after this list). Look here for a basic guide.

2. Now add at least two VM host nodes. These are clustered via Linux HA, and connect to the primary SAN node via iSCSI or NFS. If the primary SAN node dies, the secondary takes over.

3. Use a round-robin DNS scheme so that each node will get a share of the workload. This should allow you to avoid the single point of failure of a front-end node.
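As a hypothetical sketch of the Linux HA piece and the round-robin DNS idea (the modern packaging of Linux-HA is Pacemaker/Corosync; the resource names, device paths, IPs and zone entries below are invented for illustration, not taken from any guide):
Code:
# On the two SAN nodes: a floating IP that always points at whichever node
# currently exports the datastore, plus the export's backing filesystem,
# grouped so they fail over together. All names/paths/IPs are examples.
pcs resource create san_vip ocf:heartbeat:IPaddr2 \
    ip=10.10.30.1 cidr_netmask=24 op monitor interval=30s
pcs resource create san_fs ocf:heartbeat:Filesystem \
    device=/dev/drbd0 directory=/srv/vmstore fstype=ext4
pcs resource group add san_group san_vip san_fs

# Round-robin DNS for the front end: two A records for the same name, so
# clients spread themselves across the VM host nodes (zone file fragment).
cat >> /etc/bind/db.example.lan <<'EOF'
; hypothetical records - both VM hosts answer for "appliance"
appliance    IN  A   10.10.20.2
appliance    IN  A   10.10.20.3
EOF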

This setup will allow you to use VM teleportation to keep your guest VMs redundant. However, if an app server (like JBoss) crashes inside a running VM, or an app crashes inside the app server, it won't help. Having the SAN will give you a common datastore to store session information and such, but the app server and/or app will have to be configured to make use of this.

OS-wise, I would say either Ubuntu Server (which you've had issues with), Red Hat Enterprise Linux, or CentOS (a de-badged RHEL clone).