vCOPs Backup & Disaster Recovery

Three days ago I wrote an article on vCOPs sizing, and my friend Paul Meehan then asked me about it on Twitter.


As a follow-up, I see this question being asked now:

What are the operational considerations for disaster recovery for vCOPs? How would this work, and what would you need to do to enable it?

So this post answers those questions and should help everyone who has used vCOPs for vSphere infrastructure monitoring and submitted that design for their VCDX defense.

Before we move on to the backup and DR options for vCOPs, let me show you the vApp architecture of vCOPs. I am sure you have seen it a million times by now, but it is worth showing again to refresh the concept.

[Figure: vCOPs vApp architecture]

Basically, the vCOPs vApp is nothing more than two VMs, the UI VM and the Analytics VM, combined in a customized vApp. That means we can use vSphere HA to protect these two VMs; vSphere HA is supported for both the vApp and its virtual machines. If a host running one of these virtual machines goes down, the virtual machine is restarted on another host. However, if one of the virtual machines (for example, the UI VM) is moved to another host, the inter-virtual machine communication might break. Yes, you heard it right: it might break, because the two VMs talk to each other over an OpenVPN connection. Fortunately, it's pretty easy to fix: power off the vApp and start it again.

The second question that comes up once we talk about HA is FT. Can we use vSphere FT to protect this vApp? The simple answer is “No”. vSphere FT is not available for virtual machines that use Virtual Symmetric Multiprocessing (vSMP), whether multiprocessor or multicore. So when the vCenter Operations Manager virtual machines use more than one vCPU, it's not possible to use FT to protect them and make them highly available.
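The FT rule above boils down to a vCPU check. Here is a minimal sketch of it in Python, assuming the single-vCPU restriction described in this post; the VmSpec type and the vCPU counts are made up for illustration and do not come from any VMware API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class VmSpec:
    """Hypothetical description of a VM; not a VMware API object."""
    name: str
    vcpus: int


def ft_eligible(vm: VmSpec) -> bool:
    # Per the restriction described above, FT cannot protect a vSMP VM,
    # so only a single-vCPU VM qualifies.
    return vm.vcpus == 1


def vapp_ft_eligible(vms: Iterable[VmSpec]) -> bool:
    # The vApp is only protected if every VM in it qualifies.
    return all(ft_eligible(vm) for vm in vms)


# Illustrative vCPU counts only; check your own deployment's sizing.
vcops_vapp = [VmSpec("UI VM", 2), VmSpec("Analytics VM", 2)]
```

Calling vapp_ft_eligible(vcops_vapp) with these assumed sizes returns False, which is the “No” from the paragraph above.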

The next question is VMware vCenter Site Recovery Manager (SRM). We know that SRM is VM-centric and not vApp-aware. However, you can add the vCenter Operations Manager virtual machines to the list of protected virtual machines in a recovery plan. The inter-virtual machine communication (OpenVPN) is likely to break when operating in your disaster recovery site; you can repair it using the auto-repair function of the administrator UI. Because vCOPs is not SRM-aware, shadow virtual machines are going to be listed, and you need to use a collector user to restrict visibility. You will also experience an alert storm during failover.

The last method to consider is the traditional VM backup mechanism. As of now, VADP does not understand the vApp construct. You need to back up each virtual machine individually (the UI virtual machine and the Analytics virtual machine); then, at the time of recovery, restore the UI virtual machine and the Analytics virtual machine, and finally repair the vApp.
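The ordering matters here, so it can help to write the runbook down. The sketch below only builds the list of steps described above as plain strings; it does not call any backup product or VADP interface, and the function names are hypothetical.

```python
# The two VMs that make up the vCOPs vApp, as described in this post.
VAPP_VMS = ("UI VM", "Analytics VM")


def backup_steps(vms=VAPP_VMS):
    # VADP is not vApp-aware, so each VM is backed up on its own.
    return ["back up " + vm for vm in vms]


def restore_steps(vms=VAPP_VMS):
    # Restore every VM first; the vApp repair is always the last step.
    steps = ["restore " + vm for vm in vms]
    steps.append("repair vApp")
    return steps


if __name__ == "__main__":
    for step in backup_steps() + restore_steps():
        print(step)
```

The point of keeping the repair step inside restore_steps is that it cannot be forgotten or run before both VMs are back.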

If you want to avoid a broken vApp when restoring from a backup, deploy a fresh installation of the vCenter Operations Manager vApp with the same major and minor release version as the vApp that was backed up, and then perform a side-by-side upgrade from the backed-up vApp.
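Before deploying the fresh vApp, it is worth confirming that the versions really match. A minimal sketch of that check follows, assuming version strings in dotted "major.minor.patch" form; the version numbers in the usage note are examples only, and the helper names are hypothetical.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    # Keep only the first two dotted components, e.g. "5.7.1" -> (5, 7).
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def same_release(backed_up: str, fresh: str) -> bool:
    # The fresh deployment must match the backup on major and minor version;
    # anything after the second component is ignored in this sketch.
    return major_minor(backed_up) == major_minor(fresh)
```

For example, same_release("5.7.1", "5.7.0") is True, while same_release("5.7.1", "5.8.0") is False, so the second pair would not be a valid target for the side-by-side upgrade.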

Note: VMware does not support or recommend application and file-based backup.


About Prasenjit Sarkar

Prasenjit Sarkar is a Product Manager at Oracle for their Public Cloud with primary focus on Cloud Strategy, Oracle Openstack, PaaS, Cloud Native Applications and API Platform. His primary focus is driving Oracle’s Cloud Computing business with commercial and public sector customers; helping to shape and deliver on a strategy to build broad use of Oracle’s Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) offerings such as Compute, Storage, Java as a Service, and Database as a Service. He is also responsible for developing public/private cloud integration strategies, customer’s Cloud Computing architecture vision, future state architectures, and implementable architecture roadmaps in the context of the public, private, and hybrid cloud computing solutions Oracle can offer.