Storage vMotion should validate VM’s IOps demand supply capability – Feature Request

I am not sure how many of you have thought about this, but whether SDRS moves a VM automatically or we migrate it manually from one Datastore to another, one thing is missing: validation that the target can supply the IOps the VM demands.

Profile Driven Storage, VASA and Storage vMotion today share a limitation: they cannot check the current IOps demand-supply relationship.

What today's implementation lacks is a way to use VASA integration to analyze the IOps availability, demand and other behavior of the storage array, and to update the storage categorization over time as that behavior changes.

Right now VASA simply reports, and we build our storage profiles on what the storage reports. So when we do a Storage vMotion within or outside a datastore cluster, we don't check whether moving the VM to another Datastore will degrade its performance because the new Datastore cannot supply enough IOps.

So today there is no guarantee that a VM moved to another Datastore will keep performing the way it does right now.

The obvious question is how this could be done. There is a way, and below I describe it as a high-level implementation.

We can solve this problem with three modules, one of which is an enhancement to the existing API (VASA).

VASA enhancement
*************************

Change the VASA logic to retrieve the IOps supply from the storage backend (Datastore) and report it as a capability.
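As a minimal sketch of what such an enhanced VASA report might carry, assuming a provider that can read the sustainable IOps of each backing datastore: the `IopsCapability` record and the `report_capabilities` helper are hypothetical names for illustration, not part of the VASA API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IopsCapability:
    """Hypothetical per-datastore capability an enhanced VASA provider could report.

    These fields do not exist in VASA today; they only illustrate the proposal.
    """
    datastore: str
    supply_iops: int  # IOps the backing storage can sustain (hypothetical)

    def __post_init__(self) -> None:
        if self.supply_iops < 0:
            raise ValueError("supply_iops must be non-negative")


def report_capabilities(backend_iops: dict[str, int]) -> list[IopsCapability]:
    """Turn raw backend IOps figures into one capability record per datastore."""
    return [IopsCapability(name, iops) for name, iops in sorted(backend_iops.items())]


caps = report_capabilities({"DS02": 3000, "DS01": 5000})
# caps[0] is IopsCapability(datastore='DS01', supply_iops=5000)
```

A frozen record keeps each reported capability immutable, so a change in supply shows up as a new report rather than a silent edit.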

IOps Demand-Supply Module
************************************

Once VASA exposes IOps availability, this module can fetch it through API calls and continuously monitor the backend to see how much is being consumed and how much is left on each datastore.

This module knows each Datastore's IOps capacity from what VASA reports. It monitors the Datastores to see how many VMs are powered on and how much IOps each of them consumes, and sums that consumption to work out how much is left.
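The bookkeeping described for the Demand-Supply module can be sketched as follows, assuming each datastore's IOps supply is already known from VASA and that per-VM IOps samples arrive from monitoring. `DemandSupplyTracker` and its methods are hypothetical illustrations, not an existing vSphere API.

```python
from collections import defaultdict


class DemandSupplyTracker:
    """Hypothetical sketch of the proposed IOps Demand-Supply module.

    Supply per datastore comes from VASA (assumed); demand is the sum of the
    latest IOps samples of the powered-on VMs placed on that datastore.
    """

    def __init__(self, supply_iops: dict[str, int]) -> None:
        self.supply = dict(supply_iops)
        # datastore -> {vm name -> latest consumed IOps}
        self.consumption: dict[str, dict[str, int]] = defaultdict(dict)

    def record(self, datastore: str, vm: str, iops: int) -> None:
        """Store (or overwrite) the latest IOps sample of a powered-on VM."""
        self.consumption[datastore][vm] = iops

    def power_off(self, datastore: str, vm: str) -> None:
        """A powered-off VM no longer consumes IOps."""
        self.consumption[datastore].pop(vm, None)

    def consumed(self, datastore: str) -> int:
        return sum(self.consumption[datastore].values())

    def remaining(self, datastore: str) -> int:
        """Supply minus demand; a negative value means the datastore is oversubscribed."""
        return self.supply[datastore] - self.consumed(datastore)


tracker = DemandSupplyTracker({"DS01": 5000})
tracker.record("DS01", "vm-a", 1200)
tracker.record("DS01", "vm-b", 800)
# tracker.remaining("DS01") is 3000
```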

IOps Compliance Module
*******************************

As the Demand-Supply module learns what is being used and what is left, the compliance module comes into play: during any Storage vMotion operation, it checks whether the VM will get the same amount of IOps on the target Datastore.

This module's job is to look at the VM's IOps requirement trend over the last hour and check whether that much IOps is available in the target Datastore or Datastore Cluster. If no Datastore is compliant, it generates a warning telling the user: “moving this VM to the target datastore or datastore cluster may degrade your VM’s performance”.
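A sketch of that compliance check, assuming the "last hour trend" is taken as the average of the VM's IOps samples over the past 3600 seconds (a peak or percentile would be a more conservative choice) and that spare IOps per candidate datastore come from the Demand-Supply module. `last_hour_demand`, `check_compliance` and the `(unix_time, iops)` sample format are hypothetical.

```python
from statistics import mean

WARNING = ("moving this VM to the target datastore or datastore cluster "
           "may degrade your VM's performance")


def last_hour_demand(samples: list[tuple[float, int]], now: float) -> float:
    """Average IOps over the last hour; samples are (unix_time, iops) pairs."""
    recent = [iops for ts, iops in samples if now - 3600 <= ts <= now]
    return mean(recent) if recent else 0.0


def check_compliance(samples: list[tuple[float, int]], now: float,
                     remaining_by_target: dict[str, int]):
    """Return (compliant datastores, warning or None) for a proposed move.

    remaining_by_target maps each candidate datastore (one entry for a single
    target, several for a datastore cluster) to its spare IOps.
    """
    demand = last_hour_demand(samples, now)
    compliant = sorted(ds for ds, left in remaining_by_target.items() if left >= demand)
    return compliant, (None if compliant else WARNING)


samples = [(-100.0, 5000), (1800.0, 900), (3600.0, 1100)]  # first sample is too old
result = check_compliance(samples, 3600.0, {"DS01": 1500, "DS02": 800})
```

With a one-hour demand of 1000 IOps, only `DS01` qualifies; if it were removed from the candidates, the function would return the warning instead.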

During automated SDRS operations, it will also send a trap back to the SDRS movement engine indicating that moving the VM to another datastore because of a space issue may degrade its performance. SDRS will then weigh the cost of the movement against space and IOps requirements and move the VM to the datastore with the most IOps available.
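The placement choice described for SDRS can be sketched as a filter-then-rank step. This assumes hypothetical `Candidate` records holding free space and spare IOps, and `pick_target` is an illustrative name; the cost-of-movement weighting is left out to keep the example small, so it only shows "enough space and IOps, then most IOps available".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical view of a candidate datastore."""
    name: str
    free_gb: float
    remaining_iops: int


def pick_target(vm_size_gb: float, vm_demand_iops: float,
                candidates: list[Candidate]) -> Optional[str]:
    """Pick the datastore with the most spare IOps that also has room for the VM.

    Returns None when no candidate satisfies both space and IOps, which is the
    case where the compliance module would raise its warning instead.
    """
    fits = [c for c in candidates
            if c.free_gb >= vm_size_gb and c.remaining_iops >= vm_demand_iops]
    if not fits:
        return None
    # Most IOps available first; break ties on free space.
    best = max(fits, key=lambda c: (c.remaining_iops, c.free_gb))
    return best.name


cands = [Candidate("DS01", 500, 4000),
         Candidate("DS02", 50, 9000),   # most IOps, but too little space
         Candidate("DS03", 300, 2500)]
target = pick_target(100, 2000, cands)
```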

Once the movement is done, it will send a trap back to the Demand-Supply module to refresh its DB and get the latest demand-supply situation.

That is not the end of the compliance module's job. It will also ask the Demand-Supply module to continuously monitor for any datastore with more demand than supply. The moment another Datastore is added, some Datastore frees up, or a Datastore's IOps usage trends down, it will automatically move the first VM that started starving for IOps to the datastore where IOps are available.
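The "first starving VM moves first" rule suggests a FIFO queue. A minimal sketch, assuming the Demand-Supply module reports spare IOps per datastore whenever capacity changes; `Rebalancer`, `mark_starving` and `on_capacity_change` are hypothetical names, and for simplicity the sketch does not credit back the IOps a moved VM frees on its source datastore.

```python
from collections import deque


class Rebalancer:
    """Hypothetical sketch: move starving VMs in the order they started starving."""

    def __init__(self) -> None:
        # (vm name, demanded IOps) in the order starvation was detected
        self.starving: deque[tuple[str, int]] = deque()

    def mark_starving(self, vm: str, demand_iops: int) -> None:
        """Queue a VM once; repeated reports keep its original position."""
        if all(v != vm for v, _ in self.starving):
            self.starving.append((vm, demand_iops))

    def on_capacity_change(self, remaining_by_ds: dict[str, int]) -> list[tuple[str, str]]:
        """Called when a datastore is added, frees up, or its usage trends down.

        Returns the (vm, target datastore) moves to perform.
        """
        moves = []
        remaining = dict(remaining_by_ds)
        while self.starving:
            vm, demand = self.starving[0]
            target = max(remaining, key=remaining.get, default=None)
            if target is None or remaining[target] < demand:
                break  # preserve FIFO order: the oldest starving VM waits
            self.starving.popleft()
            remaining[target] -= demand
            moves.append((vm, target))
        return moves


r = Rebalancer()
r.mark_starving("vm-a", 1500)
r.mark_starving("vm-b", 400)
r.mark_starving("vm-a", 1500)  # duplicate report is ignored
moves = r.on_capacity_change({"DS01": 1000, "DS02": 2000})
```

Here `vm-a` goes to `DS02`, which then has 500 IOps left, so `vm-b` goes to `DS01`.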

One thought on “Storage vMotion should validate VM’s IOps demand supply capability – Feature Request”

  1. I generally really like your feature request. But it is worth mentioning that this feature is not usable on modern storage systems leveraging disk pooling, and sometimes even automated storage tiering (aka AST), because svMotion moves based on performance recommendations should be avoided there: they introduce another IO workload on the storage backend without any real benefit, since the same spindles are used. BTW I really like disk pooling and I’m not a big fan of AST, but that’s another topic.

    However your feature should be very helpful during svMotions between different storage systems and/or different disk pools inside modern storage systems.

    The most important and interesting part is the implementation of such a feature. Your implementation proposal is IMHO over-architected, very complex, storage vendor dependent and non-deterministic. The number of IOPS is very difficult to predict because it is a function of several variables like block size, read/write ratio, randomness, queue management, etc. The number of IOPS can be used as an estimated storage quantity parameter, but we would need a storage quality parameter, which is IO response time. So if we choose response time as the metric indicating whether the target datastore is able to handle our workload, why not leverage the already existing SIOC (Storage IO Control) normalized latency? The advantages of this approach are storage independence and reuse of an already existing, unique VMware vSphere feature (SIOC). And I believe the implementation would be significantly simpler.

    What do you think about it? Do you see any issues with such an implementation?

    David.
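The latency-based alternative suggested in this comment could be sketched as follows, assuming per-datastore normalized latency samples (in ms) are already available from SIOC statistics; how they are collected is not shown. `latency_ok`, `choose_by_latency`, the threshold value and the 0.8 headroom factor are hypothetical illustrations, not SIOC settings.

```python
from statistics import mean
from typing import Optional


def latency_ok(latency_ms_samples: list[float], threshold_ms: float,
               headroom: float = 0.8) -> bool:
    """True if the datastore's recent normalized latency leaves headroom.

    The target passes if its average latency stays below headroom * threshold_ms.
    """
    if not latency_ms_samples:
        return False  # no data: be conservative and reject the target
    return mean(latency_ms_samples) < headroom * threshold_ms


def choose_by_latency(latencies: dict[str, list[float]],
                      threshold_ms: float) -> Optional[str]:
    """Among datastores that pass, pick the one with the lowest average latency."""
    ok = {ds: mean(s) for ds, s in latencies.items() if latency_ok(s, threshold_ms)}
    return min(ok, key=ok.get) if ok else None


best = choose_by_latency({"DS01": [20, 22], "DS02": [8, 10], "DS03": [28, 30]}, 30)
```

Response time already reflects block size, read/write mix and queueing on the backend, which is why this check needs no per-vendor IOps model.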
