Design Considerations for Stretched Clusters – Your Thought ?

This is one of those topics that has been touched on and revisited many times by industry experts such as Duncan Epping and Scott Lowe. First, let me revisit the articles they have already written.

  1. vSphere 5.0 HA and Metro / Stretched Cluster Solutions
  2. Some questions about stretched clusters with regards to power outage
  3. Distance vMotion = Stretched Cluster ?

Once you have read these articles (if you have not already), you may ask what is new in mine. It is a valid question, and I did not want to reinvent the wheel, so I made a conscious choice about how to approach this topic.

In this article I am only going to cover the considerations, as the solutions are already available and well explained by our industry experts.

 

Stretched Cluster Considerations:

  • For compute, you need a hypervisor at both locations, with sufficient capacity to host the migrated or restarted (vSphere HA) virtual machines.
  • You should have mirrored storage. But hey, do you know what other storage considerations there are for stretched clusters?
  • You should have Layer 2 adjacency for the network, and you need to consider bandwidth and latency. Do you know how much latency is supported for long-distance vMotion?

 

Now let me show you a simple diagram of a stretched cluster.

stretch-cluster

 

For stretched clusters, you should have mirrored storage at both sites, and that storage must be completely synchronized before a VM can be migrated from one location to another. Whether the mirroring is synchronous or asynchronous is a separate business decision, and the SLA is the business driver for it.

 

If you are mirroring data synchronously, there should be no issue with the data being synchronized, assuming there is sufficient bandwidth and minimal latency. If you are mirroring asynchronously, any outstanding writes must be completed before the VM can be moved.

 

But hey, stop: this is not as easy as it may sound. It can be problematic for high I/O applications if the bandwidth cannot accommodate the amount of data being written (churn, aka change rate). Let us assume you have an Oracle VM and you try to move that VM to a remote node while it is still actively processing transactions. Consider what will happen in this condition.
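To put rough numbers on the churn problem, here is a minimal Python sketch. It is not tied to any replication product: it simply assumes a steady change rate and a fixed link speed (both in Mbit/s) and a backlog of unsent writes (in MB). The function name and the example figures are made up for illustration.

```python
from typing import Optional


def backlog_drain_seconds(change_rate_mbps: float,
                          link_mbps: float,
                          backlog_mb: float) -> Optional[float]:
    """Seconds needed to flush outstanding writes to the remote site.

    Returns None when the application writes at least as fast as the
    link can ship data, because the backlog then never drains.
    """
    # Bandwidth left over once ongoing writes are being replicated.
    headroom_mbps = link_mbps - change_rate_mbps
    if headroom_mbps <= 0:
        return None
    # Convert the backlog from megabytes to megabits before dividing.
    return backlog_mb * 8 / headroom_mbps


# Hypothetical figures: 400 Mbit/s of churn, a 1 Gbit/s link, 7500 MB pending.
print(backlog_drain_seconds(400, 1000, 7500))   # 100.0
# Churn above link speed: the VM can never be declared "in sync".
print(backlog_drain_seconds(1200, 1000, 100))   # None
```

The second call is the Oracle case in a nutshell: if the change rate exceeds what the link can carry, waiting longer does not help.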

 

Where the data is being mirrored is also a consideration. If data is written to two locations from the hypervisor, that can significantly complicate your storage and network configurations. If data is being mirrored at the storage system, you need to carefully consider:

  • How does that impact array performance? Or does it?
  • If there is an appliance that is mirroring the data, what protocols does it support?
  • What about the scalability of this solution?

 

In any of these situations, you also have to consider the following:

  • How does the mirroring solution handle a communication failure? Or is it going to handle it at all?
  • If it is synchronous, does the write fail or does the local system cache the information to write it at a later point of time?
  • If the above condition is true, then how much can be cached? Is there a potential budget constraint for storing the cached data?
  • Does asynchronous mode do anything to accelerate synchronization after a failure?
  • Does the cluster enter a split brain scenario if the two sites are disconnected for a period of time? (Read Duncan’s article)
  • Consider what happens if a hypervisor fails and VMs need to be restarted. (Read Duncan’s article)
  • Do you restart them locally or on a remote node? If you are using vSphere 5.x HA, do you let the hypervisor decide?
  • Can you set an affinity for one side or another?
  • Can you keep related services together, so that the app is not at one site and the database at the other? Let DRS affinity/anti-affinity rules play this role. (Read vSphere Clustering Deepdive)
  • What if there is not sufficient capacity to keep both at one site?
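The capacity question at the end of this list can be checked with very little code. The sketch below is an assumption-laden illustration, not a sizing tool: it looks only at memory, treats each site as one pool, and uses made-up VM names and figures.

```python
from typing import Dict, Iterable, List, Tuple


def sites_lacking_capacity(vms: Iterable[Tuple[str, str, int]],
                           site_capacity_gb: Dict[str, int]) -> List[str]:
    """Return the sites that could NOT host every VM on their own.

    vms is an iterable of (vm_name, home_site, memory_gb) tuples.
    """
    # Worst case for a stretched cluster: one site has to run everything.
    total_demand_gb = sum(memory_gb for _, _, memory_gb in vms)
    return sorted(site for site, capacity in site_capacity_gb.items()
                  if capacity < total_demand_gb)


# Hypothetical inventory: 224 GB of VM memory across two sites.
inventory = [("app01", "SiteA", 64), ("db01", "SiteA", 128), ("web01", "SiteB", 32)]
print(sites_lacking_capacity(inventory, {"SiteA": 256, "SiteB": 192}))  # ['SiteB']
```

If a site shows up in the result, a full site failover would leave some VMs without a home, so either the affinity rules or the capacity plan needs another look.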

 

 

VMworld Call for Paper – Public Voting is Open

Yesterday VMware opened up the public voting catalog for VMworld 2013 papers, and I am really excited to see some super rich technical papers.

I have submitted my bit there as well, and I am looking forward to seeing how you like this paper.

  • Did you know that BC/DR is one of the crucial aspects of a Public Cloud success story?
  • Do you want to know how to design an effective BC/DR solution for your Public/Hybrid Cloud?
  • Do you know how effective BaaS (Backup as a Service) can be in your public cloud?

If any of your answers is “NO”, then you should vote for this session to see how we at VMware R&D are working on this. Listen to those of us who have designed, architected, implemented and tested the BaaS solution.

Title: Architecting Backup as a Service for vCloud Hybrid Service

Session Number: 4985

Abstract: The goal of this design paper is to provide all the necessary information to assist the service providers with architecting a Backup as a Service solution in a VMware based Hybrid Cloud.

In a Hybrid Cloud solution, backup has to deal with scale, not just with the vApps in a vCloud. A vCloud workload backup solution has to manage the vApps in the cloud plus deleted vApps, orphaned vApps and so on. So, to allow customers to move to cloud scale, we need to provide automated application of selectable backup policies, which in turn should be standardized as well.
The Service Provider should also assign customers’ vApps according to SLA level, for example by attaching the vApps to a Gold, Silver or Bronze backup workflow pipeline. Since this is a backup solution for cloud workloads, it should support Public, Hybrid and Private Clouds, because customers buy cloud for flexibility and agility. The backup must also be fully cloud aware and flexible.

A Service Provider should consider resource metering too while architecting the backup solution, covering storage, IOPS, network, bandwidth etc. This allows the provider of the service to enforce backup resource allocation choices. A customer should be able to choose from a PAYG, Reservation or Allocation model.

So in a nutshell, this vCloud backup costs you something now, but if a catastrophic failure happens, it pays you back more than the premium you paid to the service provider.

Speaker: 

Russel Callen (Senior Cloud Architect)

Prasenjit Sarkar (Senior Member of Technical Staff)

 

I have also submitted a panel discussion session, which includes a demo of our tool (Bezoar) that does self-discovery and self-healing of our VMware vCloud Hybrid Service. You will see it in action if it gets selected.

Title: Bezoar: Self-healing vCloud Hybrid Services

Session Number: 5116

Abstract: In the era of public, private and hybrid clouds, it is a challenge to minimize the downtime. Our datacenters are no longer a collection of just physical resources connected together but a complex and intricate combination of virtual and physical infrastructure working in perfect harmony with multiple levels of abstraction. As beautiful as this may sound, this whole ecosystem hangs by a very thin thread where a small error threatens to bring down the complete cloud. Finding and fixing problems fast is the key.

Bezoar can automatically understand your cloud architecture, monitor its components, identify root cause, perform impact analysis and heal your cloud. It does not stop there; it is a self-learning system with intelligence to prevent the same problem from recurring in future.

To achieve this, our auto discovery algorithm will probe and learn relationships between various physical and virtual components of the cloud and map the dependencies between them. This is how Bezoar learns about your cloud. We store these complex relationships in our CMDB, to be used later for impact analysis and help cloud admins to visualize their complete infrastructure.

Our system will continuously monitor your cloud and its services to report health per tenant. As soon as one of the components or its service goes down, Bezoar alerts you and swings into action to rectify the problem. If it doesn’t know how to fix the problem it will ask for a solution that will be recorded into its RCA DB for self healing in future.

Speaker: 

Vineet Sinha (Staff Engineer)

Rishi Sharda (Senior Member of Technical Staff)

Prasenjit Sarkar (Senior Member of Technical Staff)

 

 

OS Cluster, Application Cluster or vSphere HA – Design Considerations

I know a lot of articles have already been written on this topic; however, I have not seen an apples-to-apples comparison of the factors that drive the design considerations a VMware Architect has to make.

In the old days we saw the classic data center model, where we had to deploy a clustering solution to provide high availability for critical SLA-driven services such as email, databases, ERP etc.

Today, however, we are rapidly moving to the Cloud era, where the hypervisor provides high availability features to some extent, if not quite as the traditional model does. Generally speaking, hypervisor HA provides a very similar amount of redundancy to what an OS cluster did in the classic data center. Now let me show you where we are heading before I take this discussion further. Look at the snapshot below of classic OS clustering in the old data centers.

Traditional

 

Now comes the Hypervisor based HA feature. Below is a typical representation of it.

HypervisorHA

 

Now comes the Application cluster over Hypervisor layer.

AppCluster

In the event of a service or server failure, the OS cluster would restart the service on another node in the cluster. The hypervisor model performs a similar function by restarting a failed VM on another hypervisor in the cluster. However, one capability that may not be equivalent in hypervisor HA is the ability to detect a service failure. The biggest question is:

 

Can your hypervisor detect that a service, such as Microsoft Exchange, has failed on a VM and trigger a restart of the VM?

 

The time to restart a VM on another node will vary based on a number of factors, such as the number of services, utilization of the hypervisor, etc. This amount of time may or may not be more than restarting just the services on another cluster node. If your hypervisor has a hot standby feature, where a synchronized copy of the VM is kept on another hypervisor, the failover time is almost instantaneous. However, there may be other restrictions that should be considered in this type of environment. Yes, you are right, I am talking about vSphere FT.

But hey, do you know that deploying an OS cluster within a hypervisor environment is generally a bit more complex, as there are certain disk configurations that are required for clusters? Since the hypervisor typically masks much of that configuration from the VM, special considerations must be taken into account.

Also, deploying services into a cluster (Application Cluster) usually requires additional IP addresses, DNS names, and other network components (such as firewall rules) to allow that service to float between the cluster nodes. The time to restart the services on another node should also be weighed against the time to restart the entire VM. However, the cluster will natively be able to detect a service failure and trigger a restart. You need to examine the maximum number of cluster nodes that the OS supports, and that the hypervisor supports as well. You may need to have multiple clusters configured for your services, which can complicate the environment even further. So operational complexity is an issue here.

Finally, be aware of how failback is configured, especially if you are using both hypervisor HA and OS Clusters. If the cluster is configured to fail the resource back immediately, then when the VM is restarted, the service will experience another outage to return to normal operating status.

For application clusters, or applications that are configured with redundant components, be sure that the redundant copies are stored on different hypervisors so that a server failure doesn’t impact the entire application. Depending on how the application is configured, you may or may not want to use hypervisor HA as well.
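Checking that redundant copies really sit on different hypervisors is easy to automate. The following Python sketch assumes you already have a VM-to-host placement map from your inventory; the VM names, host names and the function itself are hypothetical and do not use any vSphere API.

```python
from collections import defaultdict
from typing import Dict, List


def co_located_replicas(placement: Dict[str, str],
                        groups: Dict[str, List[str]]) -> Dict[str, Dict[str, List[str]]]:
    """Find application groups with more than one member on the same host.

    placement maps VM name -> hypervisor host.
    groups maps application name -> list of its redundant VMs.
    """
    violations: Dict[str, Dict[str, List[str]]] = {}
    for app, members in groups.items():
        by_host: Dict[str, List[str]] = defaultdict(list)
        for vm in members:
            by_host[placement[vm]].append(vm)
        # A host carrying two or more members is a single point of failure.
        shared = {host: vms for host, vms in by_host.items() if len(vms) > 1}
        if shared:
            violations[app] = shared
    return violations


placement = {"ex1": "esx01", "ex2": "esx01", "sql1": "esx01", "sql2": "esx02"}
groups = {"exchange": ["ex1", "ex2"], "sql": ["sql1", "sql2"]}
print(co_located_replicas(placement, groups))
# {'exchange': {'esx01': ['ex1', 'ex2']}}
```

In a real environment an anti-affinity rule would enforce this for you; a check like this is only useful as an audit of what is actually running where.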

If the application has an automated failback mechanism, you may not want to use hypervisor HA, since that could also cause service disruption during the failback process. If the failback can be controlled, then you may want to use Hypervisor HA or some other process to restart the VM so that if the second node fails, you do not lose the service entirely.

 

Load Based Teaming & IP Hash – Both Together Huh !!

Let me ask you this question.

 

How many of you have seen LBT and IP Hash working together in a single VMware vDS?

 

Interestingly enough, one of my friends was working on a project that had decided to use HP BladeSystem with the B22HP FEX adapters connected to N5Ks and then to N7Ks back in the core. vPC Plus to the host is a possibility, which offers the ability to leverage LACP with Route based on IP Hash on the two 10GE LOM host config. This particular configuration would be highly beneficial for the NFS storage (NetApp) that they use.

Now, in this design one interesting question arises: can we mix LACP + IP Hash with LBT or virtual port ID on different port groups on the vDS, with the same two LACP-enabled uplinks?

Even if you do not understand the network-level LACP behavior, the rationale is to use IP Hash for the NFS port group, virtual port ID for multi-NIC vMotion, and LBT for VMs. Look at the pictorial representation below.

LBT-IPHash

 

Well, to me, it is either LACP + IP Hash across the board, which benefits the NFS storage, or LBT/virtual port ID, which benefits VMs and multi-NIC vMotion.

Now the question is whether it will work or not.

Well, I have tested it in my lab environment. I have LACP running on my Arista switches and MLAG configured for my SuperMicro blades. All of my dvPortgroups were running Route based on IP Hash.

I then created a new port group using LBT and put a couple of VMs on it, and it worked perfectly without any issues. I put a VMkernel port (vmk) on it as well and could not find any issues there either.

Now you should note that the following teaming algorithms have no dependency on the physical switch configuration:

  1. Port ID hash
  2. MAC hash 
  3. Load based teaming
  4. Explicit failover

This is because we will always send the traffic from a particular source MAC address through the same physical uplink. This is a pretty important requirement from the physical network switch’s perspective.
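The difference between a per-port policy and IP Hash is easy to see in a toy model. The Python sketch below is a simplification and an assumption on my part: it models port ID teaming as "port ID modulo uplink count" and IP Hash as "XOR of source and destination address, modulo uplink count". The real ESXi implementation may differ in detail, and the addresses and function names are made up.

```python
import ipaddress


def uplink_by_port_id(port_id: int, uplink_count: int) -> int:
    """Per-port policy (simplified): a virtual port always maps to one uplink."""
    return port_id % uplink_count


def uplink_by_ip_hash(src_ip: str, dst_ip: str, uplink_count: int) -> int:
    """IP Hash (simplified): the uplink depends on both ends of the flow."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % uplink_count


# One VM (port 7, 192.168.1.10) talking to two different destinations.
print(uplink_by_port_id(7, 2))                                # 1
print(uplink_by_ip_hash("192.168.1.10", "192.168.1.20", 2))   # 0
print(uplink_by_ip_hash("192.168.1.10", "192.168.1.21", 2))   # 1
```

With the per-port policy the VM's MAC only ever appears on one uplink, so the physical switch needs no special configuration. With IP Hash the same MAC can show up on both uplinks, which is why the switch side has to treat them as one logical channel.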

Now, in the case of LBT, before moving any traffic we indicate to the physical switch, through a reverse ARP (RARP) request, that traffic from a particular MAC will be moved to another physical uplink.

I ran this idea past our network expert as well (Vyenkatesh Deshpande). According to him:

 

What you are seeing is not surprising. The question is do we want customers to do such type of mixed configuration. We don’t like it because it creates more confusion. Already, network people are struggling to understand that teaming configuration is not tied to physical NICs in virtual switch (VSS/VDS).  It is good to be consistent across the physical switch and virtual switch configuration.

 

Also note that with LACP on the upstream switch, the only supported solution is Route based on IP Hash.

 

If you have missed the two posts that talk a lot about IP Hash and LBT, then read the two articles below:

IP-Hash versus LBT

NFS and IP-HASH Load balancing

 

Note: One of my friends tested this again today and uncovered some interesting results.

So YES, you can configure a vDS as proposed, but what we found with LACP enabled and NOT using IP Hash for the VM port groups is that the VMs could not communicate until IP Hash was toggled on and then off for that VM PG. After a VM is powered off and then on again, it no longer has network connectivity until you toggle IP Hash again.

So it seems that the IP Hash policy must be in effect when new dvPorts are connected; once they are connected, you can change the load balancing policy and the VMs continue to work.

Well, it now seems that unless we find a way to shape this behavior, the idea is debunked.

 

Self Healing in Service Provider Cloud – Your Thought ?

It has been a really long time (close to a month) since I last blogged. A hectic and frantic schedule took its toll on me, and even though I wanted to talk about many things, I could not.

However, now that I am back, I am back with a couple of interesting ideas, and I would like to see how your thoughts line up around them.

The first one I want to talk about is Self Healing of your Cloud; needless to say, I am focusing on the Service Provider Cloud.

Now let me define what I mean by Self Healing of your SP Cloud.

By definition this will be a system which can determine that something is not operating correctly or not configured properly and make the appropriate corrections to restore it to its best condition.

In a nutshell, this module can be integrated into any automation tool to compare the “what should be” condition to the “what is” condition and apply automation to put things back.

Now you may ask how to implement it, or how to hook this module into an automation tool. There are many ways to do this; I am going to talk about three here.

One way to make it happen is an SNMP and API based solution: for example, either an API based module that proactively checks the health/parameters/SLA condition of the Cloud, or any standard monitoring solution. Let us look first at the solution that uses an API based proactive check module along with the Self Healing module.

Phase 1

Using API calls, SNMP, or another configuration gathering method, the status of a subsystem can be compared to what is considered normal as defined by the SLA for that subsystem. If it is not in compliance, the automation and orchestration components can be invoked to put things back the way they should be. This might involve redeploying a configuration file, reconfiguring a network interface, vacating a hypervisor host of all the VMs it contains and placing them on another host, or just about any other task that can be automated or orchestrated.
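The compare-and-correct idea can be sketched in a few lines of Python. This is only an illustration of the pattern: the desired and actual states are plain dictionaries, and the remediation callbacks stand in for whatever your automation or orchestration tool really does. All names and values are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple


def find_drift(desired: Dict[str, Any], actual: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (wanted, found)} for every setting out of compliance."""
    return {key: (wanted, actual.get(key))
            for key, wanted in desired.items()
            if actual.get(key) != wanted}


def heal(desired: Dict[str, Any],
         actual: Dict[str, Any],
         remediations: Dict[str, Callable[[Any], None]]) -> Tuple[List[str], List[str]]:
    """Run the matching remediation for each drifted setting.

    Returns (healed, unresolved); unresolved settings have no known fix
    and would be escalated to a human.
    """
    healed, unresolved = [], []
    for key, (wanted, _found) in find_drift(desired, actual).items():
        action = remediations.get(key)
        if action is None:
            unresolved.append(key)
            continue
        action(wanted)
        healed.append(key)
    return healed, unresolved


desired = {"ntp_server": "10.0.0.1", "mtu": 9000, "syslog": "10.0.0.5"}
actual = {"ntp_server": "10.0.0.1", "mtu": 1500, "syslog": None}
# Only the MTU has an automated fix in this example.
fixes = {"mtu": lambda value: actual.__setitem__("mtu", value)}
print(heal(desired, actual, fixes))   # (['mtu'], ['syslog'])
```

The unresolved list is the interesting part: it is where a real system would raise an alert or ask an operator for a fix it can reuse next time.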

For monitoring, you can use Hyperic, EMC Smarts or the CA Spectrum console, and the open source Puppet software package can be used to facilitate the self-healing tasks.

  • Hyperic is a monitoring tool for hosts and services, as is CA Spectrum.
  • Puppet is an automatic configuration management tool.
Used together, they can monitor and correct any configuration issue on any platform that is supported by both tools.

Now look at the monitoring based solution, where this module integrates with any standard monitoring service and reactively solves many issues (wherever possible).

Phase 2

Another solution could be to use a CMDB and API. Here we need to look at the DB for errors and change the configuration as part of automation and orchestration. To me this is a lengthy process to start with; however, I am open to comments.

Think about a situation where something goes wrong while your operations team is having a glass of beer, and in the background your self healing module is doing the work. How cool can that be :-)

I am all ears, looking for more ideas around this and your esteemed feedback.

 

SSL VPN-Plus using vShield Edge – How To

I know that for a long time I have kept saying this place is not meant for how-tos. No hard feelings toward anyone who writes how-tos, but my focus was not to write something like “click here and click there and you should be fine”.

However, today I am taking it back :) .

Well, it all started with a Use Case scenario and a requirement from a different team altogether.

Now let me show you my setup and the requirement that led me to write this post.

NestedLab
Now, if you look at the above design, we have many nested cloud setups which are being used for some testing. These are all nested VMs, which means they are all registered on top of virtual ESXi hosts. These virtual ESXi hosts run on top of physical hardware, which is really robust in our case, so that it can take the entire load.

Now, you may wonder how many IPs this entire setup needs. I can tell you it is a lot. To scale it as well, we can’t afford to run it on top of public IPs (corp IPs). So, to minimize the use of corp IPs and to allow for scalability, we have deployed a vSM and an Edge device.

As everyone knows today, we have 10 interfaces (internal and external) on an Edge device. For external we of course need one, and for our purpose we took just one interface and made it internal.

This interface is running a 192.168.0.0/16 static IP pool. This way we can have as many as 65,534 usable IP addresses. This is of course enough for our requirement.

Our external network is running on the 10.100.100.0/24 network. This can have as many as 254 usable IP addresses, and thus we had to segregate it using Edge. Let me also show you how we carved it out inside the virtual (nested) cloud.
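The usable address counts for the two networks can be verified with Python's standard ipaddress module. The helper name is made up; the arithmetic is just "all addresses minus the network and broadcast address", which holds for conventional IPv4 subnets like these two.

```python
import ipaddress


def usable_hosts(cidr: str) -> int:
    """Usable host addresses in a conventional IPv4 subnet."""
    network = ipaddress.ip_network(cidr)
    # Subtract the network address and the broadcast address.
    return max(network.num_addresses - 2, 0)


print(usable_hosts("192.168.0.0/16"))    # 65534
print(usable_hosts("10.100.100.0/24"))   # 254
```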

Tenants

 

Our nested cloud is actually running on top of another private network which is segregated with Org Edge GW.

Now comes the real requirement. We actually needed to have consumers access the vCD cell to satisfy their need.

Oops!! The immediate question that comes to mind is how many clouds we need, because we will need that many IPs, right?

You may say this is not a problem: we can easily create a DNAT rule in Edge and have the public IP mapped to the private vCD cell IP, right? Again the question: we have 254 IPs, and we need to bring our management stack up with these IPs as well. Can this solution work if we need to scale it out? The answer is no.

What is left in our hands is to create a tunnel to the Edge and let the users access everything they need; of course, we have the firewall to block unwanted traffic. Here I thought of bringing in SSL VPN-Plus.

If you are logged in to the SSL VPN through your Edge device, then you have access to all of your internal resources. Of course, you have the option to restrict access to particular ports, so why worry? Let me now show you how to begin, with a step-by-step approach.

1. Log in to your vSM portal, expand the Datacenters section, and choose the datacenter where you want to deploy the VPN through Edge.

2. Select Network Virtualization from the right hand side -> Select the Edge and Click on Actions -> Select Manage

Edge-DC

 

3. Once it opens up, go to the VPN Tab and it should show you the Dashboard.

Edge-VPNTab

 

4. The first thing you need to do is select Server Settings and click the Change button on the right-hand side.

5. At this stage, select the external interface where you want to enable the SSL VPN. This will become your VPN termination IP. The rest of the inputs can be left at their defaults.

Edge-VPNServer

 

6. Now move to the IP Pool item. Click the green “+” sign to add an IP pool. Addresses from this pool are allocated when a client connects to the VPN from outside. The moment a client connects to the VPN, it gets an IP address from this pool, which is assigned to its VPN network adapter.

Note: Here you should choose a subnet other than your internal subnet. In our case, our internal subnet is 192.168.0.0/16, so we have used 172.168.0.0/24. That means 254 clients can each get an IP address.

Edge-VPNIPPool

 

7. After this, move to the Private Networks section. Click the green “+” icon to add a private network. You need to add the networks this VPN should give access to, that is, the networks a client will be able to reach once logged in to the VPN. In this example we have given access to the 192.168.0.0/16 network. That means when a client logs in to the VPN, he/she will have access to the entire private cloud.

You also need to select how the traffic will be forwarded; by default all traffic is forwarded over the SSL VPN tunnel. If you choose to send traffic through the tunnel, then you should enable TCP Optimization to optimize the internet speed. Read the paragraph below from the vShield Admin guide.

Conventional full-access SSL VPNs tunnel sends TCP/IP data in a second TCP/IP stack for encryption over the internet. This results in application layer data being encapsulated twice in two separate TCP streams. When packet loss occurs (which happens even under optimal internet conditions), a performance degradation effect called TCP-over-TCP meltdown occurs. In essence, two TCP instruments are correcting a single packet of IP data, undermining network throughput and causing connection timeouts. TCP Optimization eliminates this TCP-over-TCP problem, ensuring optimal performance.

Type the port numbers that you want to open for the remote user to access the corporate internal servers/machines like 3389 for RDP, 20/21 for FTP, and 80 for http. If you want to give unrestricted access to the user, you can leave the Ports field blank.

Edge-VPNPrivateNW

 

8. Authentication is the next section. Here you have many options, such as AD, LDAP, RADIUS, RSA-ACE and LOCAL. For our purpose we have chosen LOCAL. This is the basic authentication method and requires very little administrative overhead. We can add local users in the Users section and have them authenticated locally.

You can select Password Policy where you have the option of choosing Password Length, Expiry, account lockout policy and so on.

 

Edge-VPNAuth

9. Now comes the Installation Package. Here we need to create an installation package of the SSL VPN-Plus client for the remote user. This is how users will get the VPN client software onto their machines. We support Windows, Mac and Linux.

Also you can select Installation Parameters here as well, like, create desktop icon, start client at logon, allow remember password and so on.

Click on the green “+” icon to add the client installation package. Within this window, select the green “+” icon to add the gateway. This is the same IP you chose when enabling outside VPN access; it is nothing but the external IP address of your Edge device.

ClientPkg1

10. If you selected LOCAL authentication in step 8, then you need to add some local users here. Select the Users section and add some local users.

11. After this point there is nothing else you must do; you may want to change some General Settings, such as preventing multiple logons with the same user, compression, logging and so on.

For this, select General Settings, click Change on the right side, and change the settings as per your requirement.

So, basically you are done with the configuration. Now go back to the Dashboard and, in the Service section, click Enable.

12. Now, on the client side, browse to the URL of the external IP of the Edge where you have enabled the SSL VPN (https://10.100.100.1/sslvpn-plus/)

13. Log in to the portal using the local user you created in step 10.

SSLClient

 

14. Here, click on the SSL Client; this is basically a link that downloads the client. Install the client and then open it. Click Login and it will ask for the username and password. Supply the credentials of the local user created in step 10, and you should now be logged in to the SSL VPN.

SSLLogin

SSLSuccess

Once you are logged in, you have access to all of the internal resources. You can use RDP, SSH or any other method to reach them.
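One thing worth checking before you hand out the client is the note from step 6: the VPN client pool must not overlap any network the Edge already routes. A small Python sketch using the standard ipaddress module can do that check; the function name is hypothetical, and the subnets are the ones used in this post.

```python
import ipaddress
from typing import Iterable, List


def pool_conflicts(pool_cidr: str, existing_cidrs: Iterable[str]) -> List[str]:
    """Return the existing networks that overlap the proposed VPN IP pool."""
    pool = ipaddress.ip_network(pool_cidr)
    return [cidr for cidr in existing_cidrs
            if pool.overlaps(ipaddress.ip_network(cidr))]


existing = ["192.168.0.0/16", "10.100.100.0/24"]
# The pool used in this post does not collide with either network.
print(pool_conflicts("172.168.0.0/24", existing))   # []
# A pool carved out of the internal range would collide.
print(pool_conflicts("192.168.5.0/24", existing))   # ['192.168.0.0/16']
```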

I hope this is useful to many of you. I did not find a single article on the net that walks through the step-by-step process, so I thought I would enlighten the enthusiasts who would like to implement a similar solution.

 

Traffic Flow of Destination NAT through Edge Gateway

Did you ever wonder how the traffic flows when you create a Destination NAT? If the answer is Yes then you should follow the rest and if no then you already know how it works :D

Destination NAT (DNAT) maps an unregistered IP address to a registered IP address from a group of registered IP addresses. Destination NAT also establishes a 1:1 mapping between unregistered and registered IP address, but the mapping could vary depending on the registered address available in the pool, at the time of communication.

The typical usage of this is to redirect incoming packets with a destination of a public address/port to a private IP address/port inside your network.

The internal network is usually a LAN (Local Area Network), commonly referred to as the stub domain. A stub domain is a LAN that uses IP addresses internally. Most of the network traffic in a stub domain is local, it doesn’t travel off the internal network. A stub domain can include both registered and unregistered IP addresses. Of course, any computers that use unregistered IP addresses must use Network Address Translation to communicate with the rest of the world.

So now let me show you an example design where we are mapping (DNAT) an external IP address to an internal IP address.

DNAT-Map

 

In this example we are mapping 10.144.36.101 to an internal VM IP which is 192.168.100.101. This is pretty straight forward and does not need any explanation. I will now show you the flow diagram and will explain how the packet flows.

DNAT-Flow

Now look at the above diagram. Let’s say we have a client in the external network who is trying to connect to the internal VM, which is inside the Org vDC network and has an internal IP address (192.168.100.101).

The client will first send an ARP request for the external address, which is 10.144.36.101. Your Edge device’s external interface owns that external IP address, so it listens for the request and replies with its external MAC address. Once this is done and the packet arrives, your Edge will query its database (Routing Table?), see that there is a 1:1 mapping to the internal IP, and send the packet through the internal interface to the appropriate VM.

So destination NAT changes the destination address in the IP header of this packet. It may also change the destination port in the TCP/UDP headers.
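The rewrite step itself is small enough to show as code. The Python sketch below models a packet as a simple dataclass and the 1:1 mapping as a dictionary; it illustrates the idea only and says nothing about how the Edge stores its rules internally. The client address 10.144.36.50 and all names are assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_port: int


# external IP -> (internal IP, new destination port or None to keep it)
DnatRules = Dict[str, Tuple[str, Optional[int]]]


def apply_dnat(packet: Packet, rules: DnatRules) -> Packet:
    """Rewrite the destination of a packet that matches a DNAT rule."""
    rule = rules.get(packet.dst_ip)
    if rule is None:
        return packet          # no mapping: forward unchanged
    internal_ip, new_port = rule
    return replace(
        packet,
        dst_ip=internal_ip,
        dst_port=packet.dst_port if new_port is None else new_port,
    )


rules = {"10.144.36.101": ("192.168.100.101", None)}
incoming = Packet(src_ip="10.144.36.50", dst_ip="10.144.36.101", dst_port=443)
print(apply_dnat(incoming, rules))
# Packet(src_ip='10.144.36.50', dst_ip='192.168.100.101', dst_port=443)
```

Note that the source address is left untouched: only the destination fields change, which is exactly what separates DNAT from source NAT.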

 

Improving Network Performance for Multicast Traffic – SplitRx mode is your choice

Last year I wrote about a scalable approach to multicast traffic, and now, with the introduction of VXLAN and other models, we are seeing a lot of multicast traffic.

Multicast is an efficient way of disseminating information and communicating over the network. A single sender can connect to multiple receivers and exchange information while conserving network bandwidth. Financial stock exchanges, multimedia content delivery networks, and commercial enterprises often use multicast as a communication mechanism. Multiple receivers can be enabled on a single ESXi host. Because the receivers are on the same host, the physical network does not have to transfer multiple copies of the same packet. Packet replication is carried out in the hypervisor instead.

But hey, how about processing this broadcast or multicast traffic within an environment? I mean, how about the CPU cycles? Are we going to saturate our CPU in this scenario?

Well, we have an answer to this and that is SplitRx mode.

SplitRx mode is an ESXi feature that uses multiple physical CPUs to process network packets received in a single network queue. This feature provides a scalable and efficient platform for multicast receivers. SplitRx mode typically improves throughput and CPU efficiency for multicast traffic workloads.

SplitRx mode is supported only on vmxnet3 network adapters. This feature is disabled by default. We recommend enabling SplitRx Mode in situations where multiple virtual machines share a single physical NIC and receive a lot of multicast or broadcast packets.

SplitRx mode is individually configured for each virtual NIC.

This feature can significantly improve network performance for certain workloads. These workloads include:

  • Multiple virtual machines on one ESXi host all receiving multicast traffic from the same source.
  • Traffic via the vNetwork Appliance (DVFilter) API between two virtual machines on the same ESXi host.

SplitRx mode will typically improve throughput and maximum packet rates for these workloads.

vSphere 5.1 automatically enables this feature for a VMXNET3 virtual network adapter (the only adapter type on which it is supported) when it detects that a single network queue on a physical NIC is both (a) heavily utilized and (b) servicing more than eight clients (that is, virtual machines or the vmknic) that have evenly distributed loads.

Now the question is how to enable or disable it, should you need to. Here is how you do it for the whole ESXi host:

  1. Open the vSphere Client
  2. Log in to the vCenter Server
  3. On the home screen, select Hosts and Clusters
  4. Select the ESXi host you wish to change
  5. Under the Configuration tab, in the Software pane, click Advanced Settings
  6. Click the Net section in the left-hand tree
  7. Find NetSplitRxMode
  8. Click the value to be changed and configure it as you wish

NetSplitRxMode = "0"
This value disables SplitRx mode for the ESXi host.

NetSplitRxMode = "1"
This value (the default) enables SplitRx mode for the ESXi host.

The change will take effect immediately and does not require the ESXi host to be restarted.

SplitRxGUI

The SplitRx mode feature can also be configured individually for each virtual NIC using the ethernetX.emuRxMode variable in each virtual machine’s .vmx file (where X is replaced with the network adapter’s ID).
The possible values for this variable are:

ethernetX.emuRxMode = "0"
This value disables SplitRx mode for ethernetX.

ethernetX.emuRxMode = "1"
This value enables SplitRx mode for ethernetX.
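
If you have many virtual machines to touch, editing each .vmx file by hand gets tedious. Below is a rough Python sketch that sets the per-NIC variable in the text of a .vmx file. The function name `set_emu_rx_mode` is my own, and the sketch assumes the usual `key = "value"` one-entry-per-line layout with Unix line endings. It only transforms a string, so reading and writing the file (with the VM powered off) is left to you.

```python
import re


def set_emu_rx_mode(vmx_text: str, nic_index: int, enabled: bool) -> str:
    """Return vmx_text with ethernet<nic_index>.emuRxMode set to "1" or "0".

    An existing entry is rewritten in place; otherwise a new line is appended.
    """
    key = f"ethernet{nic_index}.emuRxMode"
    new_line = f'{key} = "{1 if enabled else 0}"'
    # Match the whole line holding the key, tolerating leading spaces/tabs.
    pattern = re.compile(
        rf"^[ \t]*{re.escape(key)}[ \t]*=.*$",
        re.MULTILINE | re.IGNORECASE,
    )
    if pattern.search(vmx_text):
        return pattern.sub(new_line, vmx_text)
    # Key not present: append it, making sure it starts on its own line.
    separator = "" if vmx_text.endswith("\n") or not vmx_text else "\n"
    return f"{vmx_text}{separator}{new_line}\n"


sample = 'ethernet0.virtualDev = "vmxnet3"\nethernet0.emuRxMode = "0"\n'
updated = set_emu_rx_mode(sample, nic_index=0, enabled=True)
```

Since SplitRx mode is supported only on vmxnet3 adapters, it makes sense to check the adapter's `virtualDev` entry before applying this to a NIC.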

 

So, if you want to change this value on an individual VM through the vSphere Client, follow these steps:

  1. Select the virtual machine you wish to change, and then click Edit virtual machine settings.
  2. Under the Options tab, select General, and then click Configuration Parameters.
  3. Look for ethernetX.emuRxMode (where X is the number of the desired NIC). If the variable isn’t present, click Add Row and enter it as a new variable.
  4. Click on the value to be changed and configure it as you wish.

Note: The change will not take effect until the virtual machine has been restarted.

 

Sub Allocate IP Pools – Further segregating External Network IP Pool in vCloud Director

vCloud 5.1 brought us many new features, and some are very useful. If you look at the use cases for these features, you will absolutely love them.

One such feature is Sub Allocate IP Pool. When you create an External Network in vCloud, you specify an IP Pool that Organization Networks can use to get connectivity to the external world.

Now let us imagine a situation where you have a /24 network entirely allocated to an External Network pool. Your requirement is to use a maximum of 10 of those IP addresses on each Org Network, and you have 20 such Orgs to set up.

Earlier, you did not have any control over the IP allocation to each Organization. Out of your 254 IP addresses, you could not decide which ones went to which Org Network, right? It was a first-come, first-served scenario: the first Org would take the first 10 IP addresses, then the second Org the next 10, and so on.

Would you really like that? Think about operational simplicity: an Ops guy cannot know which IP address belongs to which Org until it is allocated, and has no way to segregate the addresses.

Now, with the new vCloud release, we have the power to segregate the IP pool into further segments, which is called Sub Allocate IP Pool. When you create a new Gateway in your Cloud for your Org Network, you have the option to choose Sub Allocation and specify your own range.

Let me help you to visualize this. Look at the picture below:

IP Pool

 

Now in this example, I have a flat /24 network (10.0.0.0/24) with 254 external IP addresses. I want to segregate the IP allocation for each Organization, so I have chosen Sub Allocation of the IP Pool. I have allocated the first 100 IP addresses to Organization X and another 100 IP addresses to Organization Y. This way, even when Org X uses 10 of its IP addresses, they will not overlap with the Org Y IP Pool.
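
The carving itself is simple arithmetic, and a small Python sketch with the standard `ipaddress` module shows it. This is only an illustration of the idea, not anything vCloud Director runs: the `sub_allocate` function is my own, and it assumes every usable host address in the /24 is available to the pool, which in a real setup is not quite true because the gateway and other reserved addresses take some.

```python
from ipaddress import IPv4Address, IPv4Network
from typing import Dict, Tuple


def sub_allocate(network: str,
                 sizes: Dict[str, int]) -> Dict[str, Tuple[IPv4Address, IPv4Address]]:
    """Carve consecutive, non-overlapping ranges out of a network's host addresses.

    Returns a (first, last) address pair for each organization.
    """
    hosts = list(IPv4Network(network).hosts())  # 254 addresses for a /24
    pools = {}
    cursor = 0
    for org, size in sizes.items():
        if cursor + size > len(hosts):
            raise ValueError(f"not enough addresses left for {org}")
        pools[org] = (hosts[cursor], hosts[cursor + size - 1])
        cursor += size  # the next org starts right after this range
    return pools


pools = sub_allocate("10.0.0.0/24", {"Org X": 100, "Org Y": 100})
for org, (first, last) in pools.items():
    print(f"{org}: {first} - {last}")
```

Under these assumptions Org X gets 10.0.0.1 to 10.0.0.100 and Org Y gets 10.0.0.101 to 10.0.0.200, which leaves 54 addresses unassigned for later use.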

An Org Admin can then use this Sub Pool to easily manage their DNAT and SNAT mappings. An example output is shown below. It does not match the IP addressing scheme above, but that’s OK for now, I guess :)

IP Pool Example

 

Deploying Exchange 2010 – Do you need FAST VP and FAST Cache ?

In my last post I talked about situations where it is not advisable to use EMC FAST VP and FAST Cache. In this post I will talk about an example where you should not employ either of them.

Microsoft Exchange 2010 was designed to use large, slow drives, and minimizes access to the physical disks. As a result, FAST Cache is only useful if very high levels of performance are required. Jetstress, used in testing during Exchange deployment, has poor data locality, so FAST Cache is not likely to provide any deterministic performance improvement with Exchange 2010.

Background Database Maintenance, which is a regular part of Exchange implementations, causes pollution of FAST VP statistics collection and ranking. Homogeneous Pools, or Traditional LUNs, will not exhibit this effect, and are recommended. The use of “Highest Tier Available” data placement for Exchange data may reduce the effect of Background Database Maintenance on FAST VP LUNs.

The use of Thin Pool LUNs should be avoided with Exchange 2010. If Thick Pool LUNs are used, users should be aware that Jetstress causes data to be allocated to LUNs unevenly, causing initial LUN performance to be poorer than that for other LUNs. This could cause Jetstress to report a failure.

To work around this, EMC engineering has developed a utility called “SOAPTool” which forces even distribution of the data. If using Thick Pool LUNs, use the SOAPTool utility to ensure optimal performance. Alternatively, Traditional (RAID Group) LUNs may be used instead.