N+2 Failover for a Blade Design with Upstream Switches

A resilient and highly available blade design with VC and network uplinks is always desirable. Yesterday we were discussing two design scenarios. The first one uses a very simple network topology; it rescues us from a SPOF (Single Point of Failure) and at the same time gives us a performance boost using LACP.


Now we would like to stretch it a bit and create a more resilient and highly available scenario. Our second design scenario does exactly that. Here we get N+2 failover and at the same time keep the same performance boost (a 40 Gb pipe). Let us visualize this.

Here you can see that if my first VC goes down, my second VC takes over. Okay, this is as expected. Now let us make it more complex. Assume another device fails, in this case an upstream switch; what would you expect? Yes, we still have connectivity, because the other switch still forwards traffic down to the other VC module. So did we achieve N+2 failover? I think yes, we got it.
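To make the failure walk-through concrete, here is a minimal Python sketch. The diagram is not reproduced here, so the layout is an assumption: two VC modules, two upstream switches, and one 10 Gb uplink from each VC to each switch. The names `LINKS`, `has_path` and `surviving_pairs` are hypothetical and only serve the illustration.

```python
# Assumed layout for the second design: every VC module has one uplink
# to each upstream switch (cross-switch LACP).
LINKS = [("VC1", "SW1"), ("VC1", "SW2"), ("VC2", "SW1"), ("VC2", "SW2")]


def has_path(links, failed):
    """Return True if at least one uplink has both of its ends still alive."""
    return any(vc not in failed and sw not in failed for vc, sw in links)


def surviving_pairs(links):
    """Check every 'one VC plus one switch' double failure."""
    result = {}
    for vc in ("VC1", "VC2"):
        for sw in ("SW1", "SW2"):
            result[(vc, sw)] = has_path(links, {vc, sw})
    return result


for pair, ok in surviving_pairs(LINKS).items():
    print(pair, "connected" if ok else "isolated")
```

Under this assumed layout every VC plus switch combination leaves one uplink alive, which is the N+2 behaviour described above. The sketch only covers that case; losing both VC modules would still isolate the enclosure.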


One thing to consider here is that cross-switch LACP is achievable using either Cisco vPC/VSS or HP IRF technology. So if your switch does not support any of these, you have to fall back to the first design, which we will talk about now. Also, in this design scenario, if your switch does not support vPC/VSS or IRF then you will lose 20 Gb of bandwidth from the 40 Gb pipe. The reason is that you won't be running Active/Active in this scenario, and at any point in time two of the pipes will be in Standby. Note this point as a design constraint and present it to the customer.
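The bandwidth arithmetic can be sketched in a few lines of Python. The figure of four 10 Gb uplinks is an assumption inferred from the 40 Gb pipe and the two Standby pipes mentioned above; `usable_bandwidth` is a hypothetical helper, not part of any vendor tool.

```python
LINK_SPEED_GB = 10  # assumed speed of each uplink, in Gb
UPLINKS = 4         # assumed number of uplinks in the design


def usable_bandwidth(uplinks, standby):
    """Bandwidth carried by the uplinks that are actively forwarding."""
    return (uplinks - standby) * LINK_SPEED_GB


# Active/Active with cross-switch LACP (vPC/VSS or IRF): nothing in Standby.
active_active = usable_bandwidth(UPLINKS, standby=0)

# No cross-switch LACP: two of the four uplinks sit in Standby.
active_standby = usable_bandwidth(UPLINKS, standby=2)

print(active_active, active_standby)
```

With these assumed numbers the Active/Standby case carries half of what the Active/Active case does, which is the 20 Gb loss to record as a design constraint.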


Now let us look at the first design and visualize what would happen in the same failure scenario.

You see where I am going? Yes, this is going to give you only an N+1 failover scenario. If my first or second VC is down, we still have the link. But if we then lose an upstream switch as well, we would lose all connectivity. If you have to fall back to this design because you cannot use vPC/VSS or IRF, then flag it as a risk point in your design and present it to the customer. In this design scenario you will get the same 40 Gb pipe.
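The same kind of check can be run for the first design. Again the layout is an assumption, since the diagram is not reproduced here: each VC module bundles both of its uplinks to a single upstream switch, with no cross-switch LACP. All names in the sketch are hypothetical.

```python
# Assumed layout for the first design: VC1 bundles to SW1 only,
# VC2 bundles to SW2 only (two uplinks per bundle).
FIRST_DESIGN = [("VC1", "SW1"), ("VC1", "SW1"), ("VC2", "SW2"), ("VC2", "SW2")]


def has_path(links, failed):
    """Return True if at least one uplink has both of its ends still alive."""
    return any(vc not in failed and sw not in failed for vc, sw in links)


# Any single device failure (N+1).
single_ok = all(has_path(FIRST_DESIGN, {d}) for d in ("VC1", "VC2", "SW1", "SW2"))

# One VC plus one upstream switch failing together.
double = {
    (vc, sw): has_path(FIRST_DESIGN, {vc, sw})
    for vc in ("VC1", "VC2")
    for sw in ("SW1", "SW2")
}

print("survives any single failure:", single_ok)
for pair, ok in double.items():
    print(pair, "connected" if ok else "isolated")
```

Under this assumed layout every single failure is survivable, but a VC failure followed by the loss of the switch serving the remaining VC isolates the enclosure. That is the combination to call out as the risk point.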

The issue here is a threat and risk analysis, and a judgement call on what level of risk is acceptable.

If you are designing to support a solution whose highest concern is availability, then splitting each LACP bundle will give a more available topology, but at a cost.

If you are designing for maximum throughput, then you are accepting the risk that a VC or an uplink switch will fail, and judging that to be an acceptable risk for the solution.

About Prasenjit Sarkar

Prasenjit Sarkar is a Product Manager at Oracle for their Public Cloud with primary focus on Cloud Strategy, Oracle Openstack, PaaS, Cloud Native Applications and API Platform. His primary focus is driving Oracle’s Cloud Computing business with commercial and public sector customers; helping to shape and deliver on a strategy to build broad use of Oracle’s Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) offerings such as Compute, Storage, Java as a Service, and Database as a Service. He is also responsible for developing public/private cloud integration strategies, customer’s Cloud Computing architecture vision, future state architectures, and implementable architecture roadmaps in the context of the public, private, and hybrid cloud computing solutions Oracle can offer.