US Patent – Assignment of Applications in a Virtual Machine Environment based on Data Access Pattern

Assigning applications to a VM and then mapping that VM to a particular datastore is always troublesome, because there are so many things to consider, not just on the physical storage side but from the VM and virtualization perspective as a whole. This is another innovation (US Patent) that I worked on two years back, and it has now been published. Take a look and see how we place VMs based on the application’s data access pattern.

This is the second US Patent (Application Pending) that has now been published, and I am sharing it with you all.

Problem Statement

The biggest challenge we face today is architecting for applications that have different data access patterns. There are two such patterns, namely Random I/O and Sequential I/O.

If you put them together in a single datastore, knowingly or unknowingly, then your performance will take a significant hit. But the problem is: how do you determine whether an application issues Random I/O or Sequential I/O?

Here are a few reasons why mixing Random I/O and Sequential I/O can be a nightmare for vSphere admins.

Every time you need to access a block on a disk drive, the disk actuator arm has to move the head to the correct track (the seek time), then the disk platter has to rotate to locate the correct sector (the rotational latency). This mechanical action takes time.

Obviously the amount of time depends on where the head was previously located and how fortunate you are with the location of the sector on the platter: if it’s directly under the head you do not need to wait, but if it just passed the head you have to wait for a complete revolution.
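
To put rough numbers on that penalty, here is a back-of-the-envelope calculation in Python. The 4 ms average seek time and the 7200 RPM spindle speed are assumed example values for a typical spinning disk, not figures from the patent.

```python
# Rough per-request service time for a random read on a spinning disk.
# The 4 ms average seek and 7200 RPM spindle speed are assumed example values.
avg_seek_ms = 4.0
rpm = 7200
full_revolution_ms = 60_000 / rpm                     # ~8.33 ms per revolution
avg_rotational_latency_ms = full_revolution_ms / 2    # on average, half a turn

per_random_io_ms = avg_seek_ms + avg_rotational_latency_ms
print(f"~{per_random_io_ms:.1f} ms per random I/O")          # -> ~8.2 ms
print(f"~{1000 / per_random_io_ms:.0f} random IOPS per spindle")  # -> ~122 IOPS
```

In other words, a single spindle serving purely random requests tops out at only a hundred or so operations per second, which is why the mechanical overhead matters so much.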

What about the next block? Well, if that next block is somewhere else on the disk, you will need to incur the same penalties of seek time and rotational latency. We call this type of operation a random I/O.

But if the next block happened to be located directly after the previous one on the same track, the disk head would encounter it immediately afterwards, incurring no wait time (i.e. no latency). This, of course, is a sequential I/O.
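
To make the distinction concrete, here is a minimal sketch, not taken from the patent text, of how one might label a trace of block requests as predominantly sequential or random by checking whether each request starts where the previous one ended. The trace format and the 70% threshold are assumptions made purely for illustration.

```python
def classify_access_pattern(requests, seq_threshold=0.7):
    """Classify an I/O trace as 'sequential' or 'random'.

    requests: list of (start_block, block_count) tuples, in arrival order.
    A request counts as sequential if it begins exactly where the previous
    request ended; the trace is labelled 'sequential' when the fraction of
    such requests meets or exceeds seq_threshold.
    """
    if len(requests) < 2:
        return "sequential"  # too little data to call it random

    sequential_hits = 0
    prev_start, prev_count = requests[0]
    for start, count in requests[1:]:
        if start == prev_start + prev_count:
            sequential_hits += 1
        prev_start, prev_count = start, count

    ratio = sequential_hits / (len(requests) - 1)
    return "sequential" if ratio >= seq_threshold else "random"


# Contiguous blocks -> sequential; scattered blocks -> random.
print(classify_access_pattern([(0, 1), (1, 1), (2, 1), (3, 1)]))        # sequential
print(classify_access_pattern([(0, 1), (500, 1), (42, 1), (9000, 1)]))  # random
```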

Now, to make it clearer just how bad Random I/O can be, let me show the relationship between IOPS and throughput:

Throughput = IOPS x I/O size

It’s time to start thinking about that I/O size now. If we read or write a single random block in one second, then the number of IOPS is 1 and the I/O size is also 1 (I’m using a unit of “blocks” to keep things simple). The throughput can therefore be calculated as (1 x 1) = 1 block/second.

Alternatively, if we wanted to read or write eight contiguous blocks from disk as a single sequential operation, then this again would only result in the number of IOPS being 1, but this time the I/O size is 8. The throughput is therefore calculated as (1 x 8) = 8 blocks/second.

Hopefully you can see from this example the great benefit of sequential I/O on disk systems: it allows increased throughput. Every time you increase the I/O size you get a corresponding increase in throughput, while the IOPS figure remains resolutely fixed.
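
As a quick sanity check on the arithmetic above, the same calculation can be expressed in a few lines of Python, keeping the deliberately simple unit of blocks per second:

```python
def throughput(iops, io_size_blocks):
    """Throughput = IOPS x I/O size (in blocks per second)."""
    return iops * io_size_blocks

# One random single-block operation per second:
print(throughput(1, 1))  # 1 block/second

# One sequential operation per second covering 8 contiguous blocks:
print(throughput(1, 8))  # 8 blocks/second
```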

So, in a nutshell, putting Random I/O applications (VMs generating Random I/O) on the same datastore as Sequential I/O applications can be disastrous.

Solution Abstract

Techniques for assigning applications to datastores in a virtual machine environment are disclosed. In an embodiment, applications exhibiting different I/O data access patterns are assigned to datastores by collecting data related to the input-output operations performed by the applications, analyzing the collected data to identify corresponding data access patterns, and assigning applications to datastores based on the identified data access patterns. In this way, applications can be segregated by data access pattern onto separate datastores. For example, random I/O apps and sequential I/O apps can be assigned to different datastores. Additionally, if random I/O apps are found to be co-mingled with sequential I/O apps on the same datastore, then data associated with the applications can be migrated as necessary to achieve segregation. In an embodiment, random I/O apps and sequential I/O apps are segregated onto datastores that rotate independently of each other.
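
The published claims are the authoritative description, but as a rough illustration of the workflow the abstract outlines (classify each application by access pattern, assign it to a matching datastore, and migrate anything found co-mingled), a sketch might look like this. The function names, datastore labels, and example VMs are hypothetical and not taken from the patent.

```python
def assign_to_datastores(vm_patterns, datastores):
    """Assign each VM/application to a datastore based on its access pattern.

    vm_patterns: dict mapping VM name -> 'sequential' or 'random'
                 (e.g. produced by the classify_access_pattern sketch above).
    datastores:  dict with a datastore name for each pattern,
                 e.g. {'sequential': 'DS-SEQ-01', 'random': 'DS-RND-01'}.
    """
    return {vm: datastores[pattern] for vm, pattern in vm_patterns.items()}


def migration_plan(current_placement, desired_placement):
    """List VMs whose data would need to move so that random and
    sequential workloads no longer share a datastore."""
    return [(vm, current_placement.get(vm), target)
            for vm, target in desired_placement.items()
            if current_placement.get(vm) != target]


# Example: two VMs currently co-mingled on the same datastore.
current = {"db-vm": "DS-01", "backup-vm": "DS-01"}
desired = assign_to_datastores(
    {"db-vm": "random", "backup-vm": "sequential"},
    {"sequential": "DS-SEQ-01", "random": "DS-RND-01"},
)
print(migration_plan(current, desired))
```

In a real vSphere environment, the migration step would correspond to something like Storage vMotion moving a VM’s disks onto the chosen datastore.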

About Prasenjit Sarkar

Prasenjit Sarkar is a Product Manager at Oracle for their Public Cloud, with a primary focus on Cloud Strategy, Oracle OpenStack, PaaS, Cloud Native Applications, and API Platform. He focuses on driving Oracle’s Cloud Computing business with commercial and public sector customers, helping to shape and deliver on a strategy to build broad use of Oracle’s Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) offerings such as Compute, Storage, Java as a Service, and Database as a Service. He is also responsible for developing public/private cloud integration strategies, customers’ Cloud Computing architecture vision, future state architectures, and implementable architecture roadmaps in the context of the public, private, and hybrid cloud computing solutions Oracle can offer.