What most Service Providers are missing when providing Backup As A Service

As we inch toward a Hybrid Cloud model, or even a fully Public Cloud model, we are approaching the point where a solid BCDR plan is essential. However, most Backup solutions built today do not address the main pain point of a Service Provider: the SLA. Before I move forward, let me go over the basics of the SLA model.

 

Recovery point objective

An RPO is the targeted maximum amount of time that can be tolerated between successive copies of your data, and therefore the most data you can afford to lose. Some enterprises may require an RPO of zero, meaning they cannot lose any data. This requires continuous, synchronous replication. Other enterprises may be able to tolerate data gaps of seconds, minutes, hours or even days if they have to revert to a secondary site.
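To make the RPO definition concrete, here is a minimal sketch that checks a copy schedule against an RPO. It assumes the worst-case data loss is the largest gap between consecutive copies; the function names and the timestamps are hypothetical and only there for illustration.

```python
from datetime import datetime, timedelta

def worst_case_data_loss(copy_times):
    """Largest gap between consecutive copies (hypothetical helper)."""
    ordered = sorted(copy_times)
    gaps = (later - earlier for earlier, later in zip(ordered, ordered[1:]))
    return max(gaps, default=timedelta(0))

def meets_rpo(copy_times, rpo):
    """True if no gap between copies exceeds the RPO."""
    return worst_case_data_loss(copy_times) <= rpo

# Illustrative copy times, not taken from any real environment.
copies = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 4, 0),
    datetime(2024, 1, 1, 10, 0),
]
print(worst_case_data_loss(copies))           # 6:00:00
print(meets_rpo(copies, timedelta(hours=4)))  # False
```

With these sample times the longest gap is six hours, so a four-hour RPO would be missed even though most copies are only four hours apart.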

 

Recovery time objective

An RTO is the target interval between when an application outage occurs and when the application must be back up and running. This includes the time it takes to detect the failure, prepare the backup site, initialize the failed application and perform any network configuration to reroute requests to the backup site. The lower the RTO, the shorter the time between disaster and recovery.
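As a rough sketch of the arithmetic, the recovery time can be treated as the sum of the phases listed above and compared against the RTO. The phase names mirror the paragraph; the durations and function names are hypothetical.

```python
from datetime import timedelta

def estimated_recovery_time(detect, prepare_site, init_app, reroute_network):
    """Sum of the recovery phases, assuming they run one after another."""
    return detect + prepare_site + init_app + reroute_network

def meets_rto(phases, rto):
    """True if the summed phases fit inside the RTO."""
    return estimated_recovery_time(**phases) <= rto

# Illustrative durations only.
phases = {
    "detect": timedelta(minutes=5),
    "prepare_site": timedelta(minutes=20),
    "init_app": timedelta(minutes=10),
    "reroute_network": timedelta(minutes=5),
}
print(estimated_recovery_time(**phases))       # 0:40:00
print(meets_rto(phases, timedelta(hours=1)))   # True
```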

 

If you look at most Backup Vendors, you will see that during backup every VM gets an equal proportion of physical resources (CPU, memory and network). So there is no way for a Service Provider to give more resources to a customer’s critical VM and finish its backup quickly, as the SLA requires. Yet in many customer environments, especially Service Provider environments, the SLA is the foremost concern. So what you see today is as follows.

 

What you see today

 

[Figure: what you see today]

 

But what you would like to see is as follows:

 

What we would like to see

[Figure: future state]

 

In an SP environment, customers want to preserve their data, but not all of their VMs are precious enough to need a strict backup SLA. That means some VMs can wait to be backed up. However, customers want their VIP VMs backed up faster than their non-VIP VMs. The problem is that there is no way for an SP to segregate VMs into different classes and prioritize the VIP VMs over the normal ones.

 

Now I am presenting a solution to this issue: one that can throttle the backup threads based on SLA level.

 

A Solution from 20K Feet: Class-based Backup Profiles for Thread Throttling

 

[Figure: backup class]

 

Using a Backup Service Class we can define the priority of some VMs. This priority will be driven by SLA. This solution will provide a VM Backup Class Profile to the customer through VMware vCenter Server.

 

These profiles will hold the class information. Customers just need to attach a VM to a Backup Class Profile. The Backup Controller will read the class and throttle that VM’s backup thread accordingly, so that a VIP VM with a Gold Class profile gets more CPU, memory and network resources and is backed up quickly.
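One way to picture the throttling step is a weighted split of the Backup Controller's resources across VMs. The sketch below assumes each class maps to a numeric weight and each VM's backup thread gets a share proportional to that weight; the weights, the function name and the VM names are all hypothetical, not part of any vendor's API.

```python
# Hypothetical class weights; real ratios would be derived from the SLA.
CLASS_WEIGHTS = {"GOLD": 4, "SILVER": 2, "BRONZE": 1}

def resource_shares(vm_profiles):
    """Return each VM's fraction of the backup controller's resources,
    proportional to the weight of its Backup Class Profile."""
    weights = {vm: CLASS_WEIGHTS[cls] for vm, cls in vm_profiles.items()}
    total = sum(weights.values())
    return {vm: weight / total for vm, weight in weights.items()}

# Illustrative VM names and profile attachments.
profiles = {"vip-db": "GOLD", "web": "SILVER", "test": "BRONZE", "archive": "BRONZE"}
shares = resource_shares(profiles)
print(shares)  # {'vip-db': 0.5, 'web': 0.25, 'test': 0.125, 'archive': 0.125}
```

Here the Gold VM ends up with half of the controller's capacity while the two Bronze VMs share a quarter between them, which is the kind of skew an SLA-driven profile is meant to produce.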

 

[Figure: thread throttle]

 

Today no one can solve this problem, because no SLA-driven, class-based backup profile concept is available. The most others can do is give more resources to the backup controller so that it backs up all of their VMs. They cannot assign more resources to a particular VM, either manually or automatically.

 

A typical workflow of the solution is as follows.

[Figure: backup thread throttling workflow]

 

Future work

We would also like to shed some light on future work on this model. It is something a Backup Vendor should also do, keeping SPs in mind.

 

Master-Slave Tagging

Along with a priority tag of GOLD/SILVER/BRONZE, VMs will also have a secondary tag to specify a Master-Slave relationship. This Master-Slave tag will help the backup scheduler back up the VMs marked as Master first and then the related Slave VMs.

 

Scenario: Multiple VMs need to be backed up, among them three VMs named DB-Master, DB-SLAVE1 and DB-SLAVE2. All three are GOLD class, i.e. they have the highest priority. When backing up, however, the scheduler will back up DB-Master first and then the Slaves. This gives the Slave VMs time to finish any data replication still in progress, which helps capture the Master and Slave backups in the same state.
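The ordering in this scenario can be sketched as a sort on the two tags. This assumes a hypothetical `group` tag that links a Master to its Slaves; the `VM` type, the rank tables and the extra `web` VM are illustrative only.

```python
from dataclasses import dataclass

CLASS_RANK = {"GOLD": 0, "SILVER": 1, "BRONZE": 2}
ROLE_RANK = {"MASTER": 0, "SLAVE": 1}

@dataclass(frozen=True)
class VM:
    name: str
    backup_class: str
    group: str = ""        # hypothetical tag linking a Master to its Slaves
    role: str = "MASTER"   # standalone VMs are treated as their own Master

def backup_order(vms):
    """Order by class first, then keep each group together with its Master ahead."""
    return sorted(
        vms,
        key=lambda vm: (CLASS_RANK[vm.backup_class], vm.group, ROLE_RANK[vm.role]),
    )

queue = [
    VM("DB-SLAVE1", "GOLD", group="db", role="SLAVE"),
    VM("web", "SILVER"),
    VM("DB-Master", "GOLD", group="db", role="MASTER"),
    VM("DB-SLAVE2", "GOLD", group="db", role="SLAVE"),
]
print([vm.name for vm in backup_order(queue)])
# ['DB-Master', 'DB-SLAVE1', 'DB-SLAVE2', 'web']
```

Because Python's sort is stable, Slaves of the same group keep the order in which they joined the queue.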

 

Smaller Payload gets processed first

This covers the case where two VMs have the same backup priority but different expected times to completion. In such a case the scheduler will give more resources to the VM whose expected time to completion is shorter, so as to reduce the backup queue.
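A simple way to express "more resources to the shorter job" is to weight each VM by the inverse of its expected time to completion. This is only one possible policy, assumed here for illustration; the function name and the minute values are hypothetical.

```python
def shares_by_expected_time(expected_minutes):
    """Split resources among equal-priority VMs, favouring shorter jobs.

    Each VM is weighted by 1 / expected minutes, so a job expected to
    finish three times sooner gets three times the share.
    """
    inverse = {vm: 1.0 / minutes for vm, minutes in expected_minutes.items()}
    total = sum(inverse.values())
    return {vm: weight / total for vm, weight in inverse.items()}

# Illustrative estimates for two VMs of the same priority.
estimates = {"VM-A": 10, "VM-B": 30}
result = shares_by_expected_time(estimates)
print({vm: round(share, 2) for vm, share in result.items()})
# {'VM-A': 0.75, 'VM-B': 0.25}
```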

 

Scenario: VM-A and VM-B have HIGH backup priority, and their backup jobs started at 11:00 AM. At 11:30 a new VM-C (also HIGH priority) joins the queue. In this case, before adding VM-C to the run queue, the scheduler will check whether VM-A and VM-B are nearing completion, to make sure the SLA for the VM-A and VM-B backup jobs stays intact.
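The admission check in this scenario could look something like the sketch below. It assumes running jobs share resources equally, so going from n jobs to n + 1 stretches each job's remaining time by (n + 1) / n; that model, the `RunningJob` type, the remaining-time estimates and the 12:00 deadlines are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class RunningJob:
    vm: str
    remaining: timedelta   # estimated time left at the current resource share
    deadline: datetime     # SLA deadline for this backup job

def can_admit(running, now):
    """True if adding one more equal-priority job keeps every running job
    inside its SLA, under the equal-sharing assumption described above."""
    n = len(running)
    if n == 0:
        return True
    slowdown = (n + 1) / n
    return all(now + job.remaining * slowdown <= job.deadline for job in running)

now = datetime(2024, 1, 1, 11, 30)
deadline = datetime(2024, 1, 1, 12, 0)
running = [
    RunningJob("VM-A", timedelta(minutes=10), deadline),
    RunningJob("VM-B", timedelta(minutes=20), deadline),
]
print(can_admit(running, now))  # True
```

If VM-B instead had 25 minutes left, the stretched estimate would pass 12:00 and VM-C would have to wait in the queue.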

 

About Prasenjit Sarkar

Prasenjit Sarkar is a Product Manager at Oracle for their Public Cloud with primary focus on Cloud Strategy, Oracle Openstack, PaaS, Cloud Native Applications and API Platform. His primary focus is driving Oracle’s Cloud Computing business with commercial and public sector customers; helping to shape and deliver on a strategy to build broad use of Oracle’s Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) offerings such as Compute, Storage, Java as a Service, and Database as a Service. He is also responsible for developing public/private cloud integration strategies, customer’s Cloud Computing architecture vision, future state architectures, and implementable architecture roadmaps in the context of the public, private, and hybrid cloud computing solutions Oracle can offer.
