Whether to use Ephemeral or not

I am starting this discussion to see how we all view the challenge below. It is generating a bit of heat right now, and different minds have different opinions. I am just trying to facilitate the discussion and put forward my own views. Below is a complete summary:

 

Problem Summary

There have been a lot of discussions previously on the use of vSS versus vDS, so I will not repeat them here (for more information read Duncan Epping’s Blog). In my opinion we should go with a pure vDS and not a Hybrid model, to take advantage of a control layer that is segregated from vCenter. A few days back we came across a nice article about The Secret of Ephemeral Port Groups. Counter discussions have also started: do we really change our infrastructure while vCenter is down? Yes, I know our first priority is to get vCenter up and running, because until then we lose a lot of functionality (including vMotion, DRS etc.). But in order to recover from a minor catastrophe, it may be necessary to manually register a VM and get it online on the network, which we cannot do while vCenter is down.

 

OK, so the conclusion is that, just to get through an emergency, we can simply create a few Ephemeral Port Groups as backups for accessing the most critical VLANs.

 

But this comes at a cost: it exposes your infrastructure to a potential Risk. If you configure ephemeral port binding, your network is less secure. Anybody who gains host access can create a rogue virtual machine, place it on the network and launch a DoS attack. So even if you only want to survive a minor catastrophe and create just a few Ephemeral Port Groups, you end up accepting a known potential Risk.

 

Mitigation of Risk

Everything in this world comes at a premium, and even this email thread comes at a premium :) (my time, laptop power, network bandwidth and mailbox space). So I tried to plan this in a way that mitigates the Risk, accepting that it brings its own constraints: manpower efficiency and cost. Here is what I am proposing.

First, let’s assume that we have sufficient knowledge and IT budget. My proposal is to deploy the Cisco Nexus 1000v vDS. This moves the whole control layer out of vCenter and places it on the VSM (Virtual Supervisor Module) VM. So even if vCenter is down, we can still control the Cisco Nexus 1000v vDS through the VSM.

 

Having said that, this approach has some limitations too. It still needs the Port Group to be pre-provisioned, which means we need to create the Ephemeral Port Group while vCenter is up and running. If vCenter is down, then even if you create a Port-Profile (aka Port Group) on the VSM, it cannot push the configuration to the vCenter database, and so you will not see this new Port Group on the ESX Server. That means we are back to square one.
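Whether the VSM currently has a working connection to vCenter can be checked from the VSM itself. A minimal sketch, assuming the standard Nexus 1000v command set (the output is omitted here because it depends on the environment and release):

```text
N1KV-Primary# show svs connections
```

If the connection is reported as disconnected, that matches the situation described above, where a newly created Port-Profile would not reach the ESX hosts.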

 

Then I thought about the Miscellaneous option in the standard vDS, which lets us set Block All Ports to YES. I wondered whether we could do something along the same lines on the Cisco Nexus 1000v. From vCenter you cannot manually change the settings of either a Port Group or the vDS itself on a Cisco Nexus 1000v, so this has to be done from the VSM command line. I ran some tests, and here is the result.

 

I created two access lists on the VSM: one (no_icmp) denies ICMP, so ICMP Echo (ping) traffic is blocked, and one (yes_icmp) permits it. I also pre-provisioned one Ephemeral Port Group. The commands are as below:

 

N1KV-Primary(config)# port-profile type vethernet Backup

N1KV-Primary(config-port-prof)# vmware port-group

N1KV-Primary(config-port-prof)# port-binding ephemeral

N1KV-Primary(config-port-prof)# switchport mode access

N1KV-Primary(config-port-prof)# switchport access vlan 1

N1KV-Primary(config-port-prof)# no shut

N1KV-Primary(config-port-prof)# state enabled
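
Before relying on this backup Port Group, it may be worth confirming on the VSM that the profile exists with the intended settings. A sketch, assuming the standard Nexus 1000v show command (output omitted, as its layout varies by release):

```text
N1KV-Primary# show port-profile name Backup
```

The things to look for would be the ephemeral port binding, access VLAN 1 and the enabled state configured above.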

 

The above commands create a Port Group of type Ephemeral and push it to vCenter. Next I blocked ICMP on this Port Group, and only in the out direction: the VM remains reachable, but a ping from the VM to any IP address fails. The IP access lists were created as below:

 

N1KV-Primary(config)# ip access-list no_icmp

N1KV-Primary(config-acl)# deny icmp any any

N1KV-Primary(config-acl)# permit ip any any

N1KV-Primary(config-acl)# exit

N1KV-Primary(config)# ip access-list yes_icmp

N1KV-Primary(config-acl)# permit icmp any any

N1KV-Primary(config-acl)# permit ip any any

N1KV-Primary(config-acl)# exit

 

The commands below apply the no_icmp access list to the pre-provisioned Ephemeral Port Group called Backup, which blocks ICMP traffic on it.

 

N1KV-Primary(config)# port-profile type vethernet Backup

N1KV-Primary(config-port-prof)# ip port access-group no_icmp out
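
To double-check that the access list is really attached to the Port Group, the configuration can be read back from the VSM. This is only a sketch, assuming the usual NX-OS show commands are available on this Nexus 1000v release; output is omitted:

```text
N1KV-Primary# show running-config port-profile Backup
N1KV-Primary# show ip access-lists no_icmp
```

The first command should list the ip port access-group line under the Backup profile, and the second should show the deny and permit entries of no_icmp.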

 

Next we powered off vCenter, so that we lost control over the network. I then connected directly to an ESX host using the vSphere Client, created a fresh new VM and connected its NIC to this new Ephemeral Port Group. It looks as below:

 

Look at the highlighted section: only once you power the VM on will it be given a Port Number from the vCenter side. But that does not matter for now. We now have connectivity from this VM, except that we can’t ping any IP address because we have placed an Access List on this Port Group.

 

But the point we want to prove is that we can override this at the port level. The Cisco Nexus 1000v is an awesome network appliance, and NX-OS is the heart of it. From the VSM we can see which Virtual Interface this new VM has been connected to internally. The command and its output are as below:

 

N1KV-Primary(config)# show interface virtual

 

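--------------------------------------------------------------------------------

Port        Adapter        Owner                    Mod Host

--------------------------------------------------------------------------------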

Veth1       vswif0         VMware Service Console   3   X.X.X.X

Veth2       vswif0         VMware Service Console   5   X.X.X.X

Veth3       vswif0         VMware Service Console   4   X.X.X.X

Veth4       vswif0         VMware Service Console   6   X.X.X.X

Veth5       Net Adapter 1  N1KV-Secondary           5   X.X.X.X

Veth6       Net Adapter 2  N1KV-Secondary           5   X.X.X.X

Veth7       Net Adapter 3  N1KV-Secondary           5   X.X.X.X

Veth8       Net Adapter 1  VC-Jit                   6   X.X.X.X

Veth9                      VMware-vCenter-Server-Ap 3   X.X.X.X

Veth10      Net Adapter 1  N1KV-Primary             4   X.X.X.X

Veth11      Net Adapter 2  N1KV-Primary             4   X.X.X.X

Veth12      Net Adapter 3  N1KV-Primary             4   X.X.X.X

Veth13      vmk0           VMware VMkernel          4   X.X.X.X

Veth14      vmk0           VMware VMkernel          6   X.X.X.X

Veth15      vmk0           VMware VMkernel          5   X.X.X.X

Veth16      vmk0           VMware VMkernel          3   X.X.X.X

Veth17      vmk1           VMware VMkernel          4   X.X.X.X

Veth18      vmk1           VMware VMkernel          5   X.X.X.X

Veth19      vmk1           VMware VMkernel          3   X.X.X.X

Veth20      vmk1           VMware VMkernel          6   X.X.X.X

Veth22      Net Adapter 1  XP-SP2-32Bit Rehearsal   5   X.X.X.X

 

Now look at the last line of the output (highlighted in the original): the new VM, XP-SP2-32Bit Rehearsal, has been connected to vEthernet port 22. My intention is to place the opposite access rule (yes_icmp) on this port, which overrides the Port Group level security and allows the VM to ping outside IP addresses. Below are the commands which do that.

 

N1KV-Primary(config)# interface vethernet 22

N1KV-Primary(config-if)# ip port access-group yes_icmp out
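
Once the emergency is over, the per-port override should probably be removed again so that the port is governed only by the Port Group access list. A sketch of how that could look, assuming the usual NX-OS `no` form of the command; this rollback step was not part of the test described here:

```text
N1KV-Primary# show running-config interface vethernet 22
N1KV-Primary# configure terminal
N1KV-Primary(config)# interface vethernet 22
N1KV-Primary(config-if)# no ip port access-group yes_icmp out
```

The show command is there to confirm which access list is currently set on the port before it is removed.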

 

Now my VM started pinging the outside IP address. So what is the moral of this testing? It shows that we can still pre-provision a few Ephemeral Port Groups, so that we do not lose the ability to provision a few VMs during a minor catastrophe, while an access list that blocks traffic at the Port Group level keeps those Port Groups from becoming a medium for DoS or similar attacks. But as I said, this comes at a premium: cost, IT staff availability and the network team’s competency. And remember, I am not ignoring the fact that this should be used only in extreme situations; our focus should still be on getting vCenter up and running.

 

I would like to hear your thoughts on this. Feedback on this testing or on my reasoning would be very helpful for our future direction.

About Prasenjit Sarkar

Prasenjit Sarkar is a Product Manager at Oracle for their Public Cloud with primary focus on Cloud Strategy, Cloud Native Applications and API Platform. His primary focus is driving Oracle’s Cloud Computing business with commercial and public sector customers; helping to shape and deliver on a strategy to build broad use of Oracle’s Infrastructure as a Service (IaaS) offerings such as Compute, Storage, Network & Database as a Service. He is also responsible for developing public/private cloud integration strategies, customer’s Cloud Computing architecture vision, future state architectures, and implementable architecture roadmaps in the context of the public, private, and hybrid cloud computing solutions Oracle can offer.
