Academic Research – Controlling Host Q-Depth based on Storage Fan-Out Ratio & Hosts Throughput Aggregation

Fan-out ratio is something a vSphere admin generally does not think about. So let's take a look at what the fan-out ratio is.

The fan-out ratio is the number of hosts connected to a port on a SAN array. When you try to determine how many hosts you should connect to a particular storage port, there are three things to consider: port queue depth, IOPS and throughput.

However, when a vSphere admin connects to the back-end storage, attaches a LUN and creates a datastore, they rarely look at the fan-out ratio or whether it is oversubscribed. They just add the LUN and create a datastore out of it.

An oversubscribed fan-out ratio can be really dangerous in environments with many hosts, many VMs, many LUNs per host and so on. The chance of performance degradation is high there.

So the need here is to determine the current fan-out ratio and show the facts to the vSphere admin, so that they can optimize their virtual infrastructure. If they know what the current fan-out ratio is and how it is going to bite them tomorrow, they can take precautions: not adding multiple hosts to the same storage port, not spinning up more and more VMs, not attaching many LUNs to a few servers and so on.

 

Invention and how it can solve the problem

Let us recap the fan-out ratio. The fan-out ratio is the relationship in quantity between a single port on a storage device and the number of servers that are attached to it.

It is important to know the fan-out ratio in a virtualized environment when doing SAN design for performance optimization, so that each server gets optimal access to storage resources. When the fan-out ratio is high and the storage array becomes overloaded, application performance will be affected negatively.

Key factors in deciding the optimum fan-out ratio are server host bus adapter (HBA) queue depth, storage device input/output operations per second (IOPS) and port throughput.

The solution is envisioned as a distributed system that sits in VMware vCenter Server. For scalability, it is further split into the modules described below.

Discovery Module

This module will have a plug-in interface to integrate with vCenter Server, where there will be an option to register the storage management station using its IP address and credentials.

Once the plug-in is registered with the storage back end, it can collect these numbers:

  1. Number of Storage Ports
  2. Speed of each Storage Port
  3. Q-Depth of the Storage Ports

Additionally, this module will fetch the Q-Depth configured for each host-side adapter and the speed at which that adapter negotiates.
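As a rough illustration of what the Discovery Module might hand over to the other modules, here is a minimal Python sketch of the inventory. The class names, field names and the sample values are all assumptions made for this example; the post does not define a schema, and the port-to-host mapping shown here is the table the analysis module is described as maintaining.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePort:
    """One front-end array port, as reported by the storage management station."""
    name: str
    speed_gbps: int
    queue_depth: int


@dataclass(frozen=True)
class HostAdapter:
    """One host-side HBA and the storage port it is zoned to (hypothetical fields)."""
    host: str
    hba: str
    negotiated_speed_gbps: int
    queue_depth: int
    storage_port: str  # name of the StoragePort this HBA is zoned to


def adapters_by_port(adapters):
    """Group the discovered HBAs by the storage port they are zoned to."""
    table = defaultdict(list)
    for adapter in adapters:
        table[adapter.storage_port].append(adapter)
    return dict(table)


# Illustrative inventory only; these values are made up.
ports = [StoragePort("SPA-0", 8, 1024), StoragePort("SPB-0", 8, 1024)]
adapters = [
    HostAdapter("esx01", "vmhba1", 8, 128, "SPA-0"),
    HostAdapter("esx02", "vmhba1", 8, 128, "SPA-0"),
    HostAdapter("esx03", "vmhba2", 4, 64, "SPB-0"),
]
mapping = adapters_by_port(adapters)
```

Keeping the records immutable (`frozen=True`) is just one design choice; it makes a discovery snapshot safe to pass between modules without anyone changing it by accident.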

 

Fan-Out Analysis & Alerting Module

This module will interface with the Discovery Module and fetch the information needed to analyze the current situation. Once it has the Q-Depth per port of the existing hosts, it will sum these values across all hosts.

It will also record which SP port each host is connected to and maintain that in a table for this calculation. The maximum number of hosts per array port is (queue depth of the array port) / (queue depth of the HBA port of each attached server).

Put differently, the figure to watch is the sum of the queue depths of all initiators zoned to a single array port. It should not exceed the queue depth of that port, or the maximum fan-out the storage vendor specifies.

**Many array vendors have their own defined limits and best practices, so with this value a vSphere admin can easily check compliance and, if needed, redistribute the hosts per array port.

For example, if each port on the array has a queue depth of 1024 and each host has a queue depth of 128 (example numbers), then you could have 1024 / 128 = 8 hosts per port.
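The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the check described here, not the actual module; the function names are made up for the example, and it assumes the simple rule that the summed host queue depths should not exceed the array port queue depth.

```python
def max_hosts_per_port(port_queue_depth, host_queue_depth):
    """Whole number of hosts with this queue depth that fit on one array port."""
    return port_queue_depth // host_queue_depth


def port_is_oversubscribed(port_queue_depth, host_queue_depths):
    """True when the summed initiator queue depths exceed the port queue depth."""
    return sum(host_queue_depths) > port_queue_depth


print(max_hosts_per_port(1024, 128))              # 8
print(port_is_oversubscribed(1024, [128] * 8))    # False: 8 * 128 == 1024
print(port_is_oversubscribed(1024, [128] * 9))    # True: 9 * 128 == 1152
```

A real check would compare against the vendor's published limit where one exists, rather than the raw port queue depth alone.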

**Redistributing the host port to array port mapping is beyond the scope of this paper, as that has to be done on the fabric side. This module will only generate alerts based on the current situation and the ratio.

 

Controlling Q-Depth Module

Once the fan-out ratio limit is exceeded, the port is oversubscribed. However, oversubscription is not as bad as it sounds. It needs to be analyzed against the current host load (throughput) before any action is taken. Oversubscription is not a problem unless the host is saturated in terms of IOPS and throughput.

Let us look at what throughput is. Most storage ports today have a maximum bandwidth of 4, 8 or 16 Gbps. The same non-saturation rule applies here as for queue depth and IOPS: add up the figures for each host on the port, and the total should not exceed the port's 4, 8 or 16 Gbps.
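A minimal Python sketch of this aggregation follows. The function name and the per-host throughput figures are assumptions for illustration; in practice the numbers would come from whatever performance statistics the environment exposes.

```python
def port_throughput_utilization(host_throughputs_gbps, port_speed_gbps):
    """Fraction of the port bandwidth used by the hosts zoned to that port."""
    return sum(host_throughputs_gbps) / port_speed_gbps


# Three hosts on an 8 Gbps port (made-up figures).
utilization = port_throughput_utilization([2.5, 3.0, 1.5], 8)
print(utilization)        # 0.875
print(utilization > 1.0)  # False: the port is not saturated

# Adding a fourth host pushing 2.0 Gbps takes the total to 9.0 Gbps.
print(port_throughput_utilization([2.5, 3.0, 1.5, 2.0], 8) > 1.0)  # True
```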

This module takes these parameters into account and sets the host port Q-Depth on the host side the moment it detects two things together: the aggregate throughput of all hosts exceeds the port bandwidth, and the fan-out ratio is oversubscribed. A few examples of the HBA driver parameters involved are below:

  • For Emulex HBA = tgt_queue_depth
  • For Qlogic HBA = ql2xmaxqdepth
  • For Brocade HBA = bfa_lun_queue_depth

The total workload at the current fan-out ratio will be calculated and passed back to the Alerting module as well, so the admin can see it and avoid overrunning either the queue depth or the throughput of the storage array port.

This module will analyze the current situation and, depending on what it finds, set the target port queue depth lower than what is configured on the most exhausted ESXi host. This ensures that the total workload of all servers does not overrun the total queue depth and throughput of the target storage array port. In a nutshell, it shows you the current situation and alerts you, but it does not stop there. It also goes ahead and redistributes the ESXi load by throttling down the Q-Depth per HBA per ESXi host.

Basically, this module helps the vSphere admin limit the queue depth on a per-target basis. The recommendation comes from limiting the number of outstanding commands on a target (storage array port) per ESXi host.
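To make the throttling decision concrete, here is a hedged Python sketch. It only computes a suggested queue depth per host and does not touch any HBA driver setting. The equal-share cap (port queue depth divided by the number of hosts) is an assumption chosen for simplicity; the post does not specify how the lower value is derived, and the function name and input layout are hypothetical.

```python
def throttled_queue_depths(port_queue_depth, port_speed_gbps, hosts):
    """Suggest a queue depth per host for one array port.

    hosts maps a host name to (configured_queue_depth, throughput_gbps).
    Queue depths are lowered only when the port is both oversubscribed
    on queue depth and saturated on throughput.
    """
    total_queue_depth = sum(qd for qd, _ in hosts.values())
    total_throughput = sum(tp for _, tp in hosts.values())
    oversubscribed = total_queue_depth > port_queue_depth
    saturated = total_throughput > port_speed_gbps
    if not (oversubscribed and saturated):
        # Oversubscription alone is tolerated; keep the configured values.
        return {host: qd for host, (qd, _) in hosts.items()}
    # Assumed policy: cap every host at an equal share of the port queue depth.
    cap = max(1, port_queue_depth // len(hosts))
    return {host: min(qd, cap) for host, (qd, _) in hosts.items()}


# Ten hosts at queue depth 128, each pushing 1.0 Gbps at an 8 Gbps port.
busy = {f"esx{i:02d}": (128, 1.0) for i in range(10)}
print(throttled_queue_depths(1024, 8, busy)["esx00"])  # 102

# Same hosts at 0.5 Gbps each: oversubscribed but not saturated, so unchanged.
idle = {f"esx{i:02d}": (128, 0.5) for i in range(10)}
print(throttled_queue_depths(1024, 8, idle)["esx00"])  # 128
```

With the cap of 102, the ten hosts add up to 1020 outstanding commands, which stays under the port queue depth of 1024. A weighted policy that throttles the busiest hosts harder would be an equally valid choice.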

 

About Prasenjit Sarkar

Prasenjit Sarkar is a Product Manager at Oracle for their Public Cloud with primary focus on Cloud Strategy, Oracle Openstack, PaaS, Cloud Native Applications and API Platform. His primary focus is driving Oracle’s Cloud Computing business with commercial and public sector customers; helping to shape and deliver on a strategy to build broad use of Oracle’s Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) offerings such as Compute, Storage, Java as a Service, and Database as a Service. He is also responsible for developing public/private cloud integration strategies, customer’s Cloud Computing architecture vision, future state architectures, and implementable architecture roadmaps in the context of the public, private, and hybrid cloud computing solutions Oracle can offer.
