vCD Elastic Allocation Pool FUD is back

 

Problem Statement

With an Allocation Pool Org VDC configured with a 100% memory guarantee, we still cannot use the entire memory reservation allocated to it. This is due to the “by design” VM memory overhead in vSphere: every powered-on VM needs some overhead memory on top of its configured memory. That means that even if you reserve 100% of the memory for your Allocation Pool, most of the time you still cannot power on the last VM.
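The effect can be shown with simple arithmetic. Below is a minimal Python model. It assumes that each powered-on VM must be backed, out of the pool's reservation, by its configured memory plus a fixed per-VM overhead. The 90 MB overhead figure and the `vms_that_power_on` helper are hypothetical, because the real overhead varies with the VM and host configuration.

```python
def vms_that_power_on(pool_reservation_mb: int, vm_memory_mb: int,
                      overhead_mb: int, requested_vms: int) -> int:
    """Count how many identical VMs fit before the pool reservation runs out.

    Simplified model (assumption): each VM consumes its configured memory
    plus a per-VM overhead from the pool's fixed memory reservation.
    """
    used_mb = 0
    powered_on = 0
    for _ in range(requested_vms):
        needed_mb = vm_memory_mb + overhead_mb  # configured memory + overhead
        if used_mb + needed_mb > pool_reservation_mb:
            break  # this VM would fail to power on
        used_mb += needed_mb
        powered_on += 1
    return powered_on


# 10 GB purchased, ten 1 GB VMs, hypothetical 90 MB overhead per VM
print(vms_that_power_on(10 * 1024, 1024, 90, 10))
```

With zero overhead, all ten 1 GB VMs fit into 10 GB. With any non-zero overhead, the tenth VM no longer fits.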

 

Architectural Decision

Because VM memory overhead exists “by design”, we cannot use the entire allocated memory. We solve this by enabling Elastic Allocation Pool at the vCloud system level and then setting a lower vCPU speed (260 MHz) on the VDC. This allows VMs to use the entire allocated memory (100% guarantee) in the Org VDC.

 

Justification

VM memory overhead depends on many moving targets, such as the CPU model of the ESXi host the VM runs on, whether 3D support is enabled for the MKS (mouse, keyboard, screen), etc. So you can never count on using the entire allocated memory.

By selecting the Elastic VDC, we override this behavior while still not allowing more VMs to power on than the customer is entitled to. The Elastic VDC also lets us set a custom vCPU speed, and lowering the vCPU speed allows you to deploy more vCPUs without being penalized. Without this flag, you cannot overcommit vCPUs, which is really bad.

260 MHz is the lowest vCPU speed we can set, so it has been chosen to allow system administrators to overcommit vCPUs in an Allocation Pool VDC.
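Here is a rough way to see why a lower vCPU speed helps. Take a simplified model in which every vCPU is charged the configured vCPU speed against the VDC's CPU allocation. The number of vCPUs you can deploy is then the allocation divided by the vCPU speed. The sketch assumes a hypothetical 20 GHz allocation, and `max_vcpus` is an illustrative helper, not a vCD API.

```python
def max_vcpus(cpu_allocation_mhz: int, vcpu_speed_mhz: int) -> int:
    """Simplified model (assumption): each vCPU consumes vcpu_speed_mhz
    of the VDC's CPU allocation, so the vCPU count is the integer quotient."""
    if vcpu_speed_mhz <= 0:
        raise ValueError("vCPU speed must be positive")
    return cpu_allocation_mhz // vcpu_speed_mhz


# Hypothetical 20 GHz allocation, default-like 1 GHz vs. the minimum 260 MHz
for speed_mhz in (1000, 260):
    print(f"{speed_mhz} MHz -> {max_vcpus(20_000, speed_mhz)} vCPUs")
```

In this model, dropping the vCPU speed from 1 GHz to 260 MHz raises the ceiling from 20 to 76 vCPUs for the same allocation.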

 

Implication

The caveat is that no VM gets any memory reservation. An Allocation Pool Org VDC does not allow an Org Admin to set per-VM resource reservations (unlike Reservation Pool), so any VM with elasticity on will have no reservation at all. This is a real concern for the customer's high I/O VMs (such as database or mail servers).

You can easily override the resource reservation in vSphere, but that is not the intent. Hence, I would flag this as a RISK, as it will hamper the performance of the customer's VMs.

One could argue that we are still reserving 100% of the memory, so the spawned VMs share that memory and cannot oversubscribe it, because the limit is still what the customer has bought. Even then, if there is memory contention among those VMs, I have no option to prefer the resource-hungry ones. In a nutshell, all of the VMs get equal shares.

[Screenshot: Elastic-Shares]

Equal shares distribute the resources in a resource pool (RP) equally, so there is no guarantee that a hungry VM can get more resources on demand.
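The point about shares can be illustrated with a simplified proportional-share model in Python. Under contention, capacity is divided in proportion to each VM's shares and capped at what the VM actually demands, and any leftover is redistributed to the VMs that still want more. This is a textbook approximation, not ESXi's actual memory scheduler, which also accounts for reservations, limits, idle memory and reclamation. The VM names, demands and share values are hypothetical.

```python
def share_based_allocation(capacity_mb: float, demands_mb: dict[str, float],
                           shares: dict[str, int]) -> dict[str, float]:
    """Split capacity in proportion to shares, capping each VM at its demand
    and handing any leftover to the VMs that still want more."""
    alloc = {vm: 0.0 for vm in demands_mb}
    active = {vm for vm, demand in demands_mb.items() if demand > 0}
    remaining = capacity_mb
    while active and remaining > 1e-9:
        total_shares = sum(shares[vm] for vm in active)
        satisfied = set()
        handed_out = 0.0
        for vm in active:
            # Offers are based on the capacity left at the start of this round
            offer = remaining * shares[vm] / total_shares
            grant = min(offer, demands_mb[vm] - alloc[vm])
            alloc[vm] += grant
            handed_out += grant
            if demands_mb[vm] - alloc[vm] <= 1e-9:
                satisfied.add(vm)
        remaining -= handed_out
        if not satisfied:
            break  # every active VM took its full offer, so capacity is used up
        active -= satisfied
    return alloc


demands = {"db": 2500, "web1": 2000, "web2": 2000}  # hypothetical demands (MB)
print(share_based_allocation(3000, demands, {"db": 1000, "web1": 1000, "web2": 1000}))
print(share_based_allocation(3000, demands, {"db": 2000, "web1": 1000, "web2": 1000}))
```

With equal shares, the database VM gets exactly what each web server gets (1000 MB), even though it wants 2500 MB. It would get a larger portion (1500 MB) only with more shares, and this setup gives us no option to assign them.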

 

Alternative

This issue is by design and well documented in this vCD allocation whitepaper.

One option to work around this is to over-allocate resources to the customer but reserve only the amount they purchased.

Historically, VM overhead ranges from 5% or less up to 20%. Most configurations have an overhead of less than 5%. If you assume that, you could over-allocate resources by 5% but reserve only ~95%. The effect would be that the customer could consume up to the amount of vRAM they purchased. If they created VMs with low overhead (high vRAM allocations, few vCPUs), they could possibly consume more than they “purchased”. For a 20 GHz/20 GB purchase, we would set the allocation to 21 GHz/21 GB but set the reservation to 95%.
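The over-allocation arithmetic above can be checked with a few lines of Python. This sketch assumes a single overhead margin applied uniformly to the purchased amount. `over_allocate` is a hypothetical helper.

```python
def over_allocate(purchased: float, overhead_margin: float) -> tuple[float, float]:
    """Return (allocation, reservation_fraction) such that
    allocation * reservation_fraction equals the purchased amount."""
    allocation = purchased * (1 + overhead_margin)
    reservation_fraction = purchased / allocation
    return allocation, reservation_fraction


# 20 GHz / 20 GB purchased, assuming a 5% overhead margin
alloc, frac = over_allocate(20, 0.05)
print(f"allocation={alloc:.1f}, reservation={frac:.1%}")
```

For the 20 GHz/20 GB example, this gives an allocation of 21 and an exact reservation of about 95.2%. Rounding the reservation down to 95% reserves 19.95, slightly less than the purchased 20.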

 

Why Does it Happen

Virtualization of memory resources has some associated overhead. ESXi virtual machines can incur two kinds of memory overhead:

  1. The additional time to access memory within a virtual machine.
  2. The extra space needed by the ESXi host for its own code and data structures, beyond the memory allocated to each virtual machine.

 

ESXi memory virtualization adds little time overhead to memory accesses. Because the processor’s paging hardware uses page tables (shadow page tables for software-based approach or nested page tables for hardware-assisted approach) directly, most memory accesses in the virtual machine can execute without address translation overhead.

 

The memory space overhead has two components:

  1. A fixed, system-wide overhead for the VMkernel.
  2. Additional overhead for each virtual machine.

 

Overhead memory includes space reserved for the virtual machine frame buffer and various virtualization data structures, such as shadow page tables. Overhead memory depends on the number of virtual CPUs and the configured memory for the guest operating system.

 

Now let me show you how it works by example.

Before creating an Org VDC, I changed the Allocation Pool setting to Elastic at the system level.

[Screenshot: Elastic]

 

Then I created an Allocation Pool VDC with a 50% CPU guarantee and a 100% memory guarantee, with the vCPU speed set to 260 MHz.

 

[Screenshot: Elastic-Allocation-Pool]

 

Next, I created 10 VMs with 1 GB of memory each. Under the Allocation Pool model, vCD should automatically set each VM's memory reservation to its configured memory (1 GB here), but because of elasticity it did not. Each VM's memory reservation became 0 MB.
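The difference in per-VM reservations can be summarized in a small Python sketch. It assumes that in non-elastic mode the per-VM reservation is the configured memory multiplied by the memory guarantee (at 100% this is the full 1 GB expected above), and that in elastic mode it is 0 MB as observed. `vm_memory_reservation_mb` is a hypothetical helper, not part of any vCD API.

```python
def vm_memory_reservation_mb(configured_mb: int, memory_guarantee: float,
                             elastic: bool) -> int:
    """Per-VM memory reservation in an Allocation Pool VDC (simplified model)."""
    if elastic:
        return 0  # elastic: no per-VM reservation, as observed in this test
    # non-elastic (assumption): configured memory x memory guarantee
    return round(configured_mb * memory_guarantee)


vms = [1024] * 10  # ten 1 GB VMs, as in the example
for elastic in (False, True):
    total = sum(vm_memory_reservation_mb(mb, 1.0, elastic) for mb in vms)
    print(f"elastic={elastic}: total per-VM reservations = {total} MB")
```

In elastic mode, none of the VMs is protected by its own reservation. This is the caveat discussed under Implication.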

[Screenshot: Elastic-Reservation]

 

Also note that it has set an expandable reservation and an unlimited memory limit.

[Screenshot: Elastic-Reservation-ResPool]

However, this expandable reservation still does not allow us to power on any further VM, regardless of its memory size. Yes, I even tried a VM with 1 MB of memory, and that also failed with “Not enough memory resource available”.

 

What should be done:

  1. Before creating VDCs, enable “Make Allocation Pool Org VDCs Elastic” on the vCD side (this is a system-level setting).
  2. While creating VDCs, set the vCPU speed to 0.26 GHz (260 MHz).

 

These two things should resolve this issue.

 

PS: This applies only to VMware vCD 5.1.2, where VMware changed the Allocation Pool model and introduced this vCPU speed setting with elasticity on.

If you want to read more about this change, refer to Tomas Fojta’s blog here.

 

About Prasenjit Sarkar

Prasenjit Sarkar is a Product Manager at Oracle for their Public Cloud with primary focus on Cloud Strategy, Cloud Native Applications and API Platform. His primary focus is driving Oracle’s Cloud Computing business with commercial and public sector customers; helping to shape and deliver on a strategy to build broad use of Oracle’s Infrastructure as a Service (IaaS) offerings such as Compute, Storage, Network & Database as a Service. He is also responsible for developing public/private cloud integration strategies, customer’s Cloud Computing architecture vision, future state architectures, and implementable architecture roadmaps in the context of the public, private, and hybrid cloud computing solutions Oracle can offer.
