Designing a custom Storage Performance Dashboard using vCOPs

For the last couple of days, I have been fully involved in creating custom dashboards for my environment to help me understand and troubleshoot our EMC VNX storage in depth. We do have a default dashboard, called Datastore Performance. However, it does not give you the flexibility to get what you are looking for in a nice way. I took this opportunity to design a custom dashboard that fits my purpose.

What I wanted was a heat map dashboard with a health tree, mashup charts and a metric graph (rolling view), which the default storage performance dashboard does not offer. Before I show you how to create this dashboard, I need to tell you which widgets I am using and what each one is for.

Heat Map

In most cases, you can select only from internally generated attributes that describe the general operation of the resources, such as health or the active anomaly count. When you select a single resource kind, you can select a metric for that resource kind.

The Heat Map widget has a General mode and an Instance mode:

General mode:

The widget displays a colored rectangle for each selected resource. The size of the rectangle indicates the value of one selected attribute. The color of the rectangle indicates the value of another selected attribute.

Instance mode:

Each rectangle represents a single instance of the selected metric for a resource. A resource can have multiple instances of the same metric. The rectangles are all the same size. The color of the rectangles varies based on the instance value. You can use instance mode only if you select a single resource kind.

In either mode, you can group the rectangles according to tag type and select the color range to use. By default, green indicates a low value and red indicates the high end of the value range.
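To make the color range concrete, here is a minimal sketch of how a value could be mapped onto a green-to-red scale. This is only an illustration, not how vCOPs computes colors internally; the function name `heat_color` and the linear interpolation are my own assumptions. The default range of 0 to 20 matches the values typed into the color picker later in this post.

```python
def heat_color(value, low=0.0, high=20.0):
    """Map a metric value to an (R, G, B) tuple on a green-to-red scale.

    Hypothetical helper for illustration only: low maps to pure green,
    high maps to pure red, and values in between are interpolated linearly.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    # Clamp so values outside the configured range take the end colors.
    fraction = min(max((value - low) / (high - low), 0.0), 1.0)
    red = round(255 * fraction)
    green = round(255 * (1.0 - fraction))
    return (red, green, 0)
```

A value at or below the minimum comes out fully green, and anything at or above the maximum comes out fully red, which is why picking a sensible maximum for your environment matters.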

When you point to the rectangle for a resource, the widget displays the resource’s name, group-by values, and the current values of the two tracked attributes. You can click Show Sparkline in the pop-up window to review a small sparkline of the tracked metric by the heat map color.

HeatMap

Metric Graph (Rolling View)

This widget is similar to the Metric Graph widget. Additional metrics sit in a slider at the bottom of the widget and can be selected directly. The administrator can configure an interval at which the widget rotates through all the configured graphs.

The Metric Graph (Rolling View) widget displays a full chart for one selected metric at a time. Miniature graphs for the other selected metrics appear at the bottom of the widget. You can click a miniature graph to examine the full graph for that metric, or set the widget to rotate through all selected metrics at an interval that you define. The key in the graph indicates the maximum and minimum points on the line chart.
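The rotation behaviour can be sketched in a few lines. This is a rough model under my own assumptions, not vCOPs code: `rolling_view` and `extremes` are hypothetical names, and the metric names are just the three this dashboard cares about.

```python
from itertools import cycle, islice


def rolling_view(metrics, steps):
    """Return which metric is shown full-size at each of `steps` intervals."""
    # cycle() restarts from the first metric after the last one is shown.
    return list(islice(cycle(metrics), steps))


def extremes(samples):
    """Return (minimum, maximum) of a metric's samples, like the graph key."""
    return min(samples), max(samples)


# Hypothetical metric selection for the widget.
SELECTED = ["Read IOps", "Write IOps", "Latency"]
```

With three selected metrics, the fourth interval shows the first metric again, and so on.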

Metric Graph

Metric Sparklines

The Metric Sparklines widget displays simple graphs that contain the values of selected metrics over time and provides a quick view of the trends in KPIs. You can select any metrics from an object and combine them in a single view. You can also select super metrics.

If all of the metrics in the widget are for a resource that another widget provides, the resource name appears at the top right of the widget.

SparkLines

Mashup Chart 

This chart displays different aspects of the behavior of a selected resource. The Mashup Charts widget contains the following charts:
  • Health chart for the resource. This chart can include each alert for the specified time period. Click an alert to review more information, or double-click an alert to open the Alert Summary page.
  • Anomaly count graph for the resource. This graph is similar to the anomaly graph that the cross-silo analysis feature generates. It displays the number of anomalies for the resource and its children at the indicated time. For an application, it also displays the count for each tier in a stacked chart. A red line marks the noise threshold for the resource. An anomaly count higher than this threshold indicates a 90 percent probability of a problem and triggers an early warning alert.
  • KPI metric graphs. These are metric graphs for any or all of the KPIs for resources listed as a root cause resource. For an application, this chart displays the application and tiers that contain root causes. You can select the KPI to include by selecting Chart Controls > KPIs on the widget toolbar. The shaded areas on a graph indicate that the KPI violated its threshold during that time period. Click the top left of the shaded area to examine details about the anomaly.
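The noise threshold idea from the anomaly count graph can be sketched as a simple comparison. This is only an illustration under my own assumptions: `early_warnings` is a hypothetical name, the sample counts are made up, and how vCOPs derives the threshold itself is not modelled here.

```python
def early_warnings(anomaly_counts, noise_threshold):
    """Return the indices of samples whose anomaly count exceeds the threshold.

    Hypothetical helper: each index stands for one point in time on the
    anomaly count graph, and the threshold is the red line on that graph.
    """
    return [
        index
        for index, count in enumerate(anomaly_counts)
        if count > noise_threshold
    ]


# Made-up anomaly counts for five consecutive collection intervals.
SAMPLE_COUNTS = [2, 5, 9, 4, 11]
```

With a threshold of 8, only the third and fifth samples would cross the red line.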

MashupCharts

Health Tree

This widget displays the section of your resource hierarchy around the selected resource. The Health Tree widget displays all of the parent containers that hold the resource. When you select a container resource, the widget displays all of the child resources that the container holds.

Unless you are in Pan or Zoom the View mode, you can point to a resource to display its name and current health. You can double-click a resource to shift the display to show its parents and children.

If you configure a filter tag for the Health Tree widget, only the parent and child resources that match the tag appear in the widget. You can double-click a resource to turn off the filter and display all of its parents and children.
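The parent and child lookup with an optional filter tag can be sketched as follows. Everything here is hypothetical and for illustration only: the resource names, the tags, the `neighbourhood` function and the edge list are my own assumptions, not a vCOPs API.

```python
def neighbourhood(selected, edges, tags=None, filter_tag=None):
    """Return (parents, children) of the selected resource.

    edges is a list of (parent, child) name pairs. When filter_tag is set,
    only related resources carrying that tag are returned.
    """
    tags = tags or {}

    def keep(name):
        return filter_tag is None or filter_tag in tags.get(name, set())

    parents = [p for p, c in edges if c == selected and keep(p)]
    children = [c for p, c in edges if p == selected and keep(c)]
    return parents, children


# Made-up hierarchy: a host holds a datastore, which holds two VMs.
EDGES = [
    ("Cluster-A", "esx01"),
    ("esx01", "Datastore-01"),
    ("Datastore-01", "vm-web"),
    ("Datastore-01", "vm-db"),
]
TAGS = {"esx01": {"production"}, "vm-web": {"production"}, "vm-db": {"test"}}
```

Calling `neighbourhood("Datastore-01", EDGES)` lists the host and both VMs, while passing `tags=TAGS, filter_tag="production"` drops the test VM, which mirrors what the filter tag does in the widget.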

Health-Tree

Now that you are aware of all the types of widgets I have used to create this dashboard, let me show you the use case for it and how to create it.

Most of the time, our vSphere admins lack some crucial storage performance metrics: read IOps, write IOps and latency. Many vSphere features, such as Storage IO Control and Storage DRS, work along these lines. If we can group the datastores and show a heat map around these three metrics, that would be really nice.

To get all datastores grouped into a single view, I used the Heat Map widget. To show the relationship between a VM and a datastore interactively, I used the Health Tree widget. We were also interested in a deep dive into the metrics, so I used the Metric Graph with the Mashup Charts widget. This dashboard is scoped to three things: Read IOps, Write IOps and Latency on each datastore.

Let's get back to the dashboard creation now.

  • Log in to the vCOPs custom dashboard
  • On the Home Screen, click the small plus tab to the right of the last tab
  • Click the Create Dashboard Using Widgets button in the upper-left corner of the left pane.
  • Select the Heat Map widget from the left pane and drag it to the right pane
  • Select the Health Tree widget from the left pane and drag it to the right pane    
  • Select the Metric Graph widget from the left pane and drag it to the right pane
  • Select the Mashup Charts widget from the left pane and drag it to the right pane
  • Type a name in the Tab Name text box.
  • Click OK. The final screen should look like this.

Dashboard

  • Click the Edit Widget toolbar button on the Heat Map.
  • Provide a title for the widget. The interaction steps later refer to it as Storage Heat Map.
  • Select Datastore from the Resource Kinds drop-down menu.
  • Select Datacenter from the Group By drop-down menu.
  • Select Cluster Compute Resource from the Then By drop-down menu.
  • Select Datastore > Disk Command Latency in the Color By pane.
  • In the color picker, type 0 for the minimum value and 20 for the maximum value. I expect latency in my environment to stay at or below roughly 20, so I chose 20 as the maximum.

HeatMapWidget
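The Group By and Then By settings above amount to a two-level grouping of datastores. Here is a minimal sketch of that idea; the datastore records, field names and the `group_datastores` function are hypothetical and only stand in for what the widget does with real inventory data.

```python
from collections import defaultdict


def group_datastores(datastores):
    """Group datastore names by datacenter, then by cluster."""
    grouped = defaultdict(lambda: defaultdict(list))
    for ds in datastores:
        grouped[ds["datacenter"]][ds["cluster"]].append(ds["name"])
    # Convert to plain dicts so the result is easy to print and compare.
    return {dc: dict(clusters) for dc, clusters in grouped.items()}


# Made-up inventory for illustration.
DATASTORES = [
    {"name": "ds-01", "datacenter": "DC1", "cluster": "Prod"},
    {"name": "ds-02", "datacenter": "DC1", "cluster": "Prod"},
    {"name": "ds-03", "datacenter": "DC1", "cluster": "Test"},
    {"name": "ds-04", "datacenter": "DC2", "cluster": "Prod"},
]
```

Each inner list corresponds to one block of rectangles in the heat map, so a hot cluster stands out as a group rather than as scattered datastores.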

  • Click the Capture new Configuration button next to the Configuration drop-down menu.
  • Type Storage Total Latency in the configuration box and click OK.
  • Click the Update selected configuration button.
  • Select Datastore > Max observed Reads per second.
  • Delete the values configured in the color picker.
  • Click the Capture new Configuration button next to the Configuration drop-down menu.
  • Type Max Read IOps in the configuration box and click the Update selected configuration button.
  • Select Datastore > Max observed Writes per second.
  • Click the Capture new Configuration button next to the Configuration drop-down menu.
  • Type Max Write IOps in the configuration box and click OK.
  • Click the Update selected configuration button.
  • Click OK.
  • Click Interactions from the dashboard toolbar.
  • Select Storage Heat Map as the providing widget for the Health Tree widget.
  • Select Storage Heat Map as the providing widget for the Metric Graph widget from the lower drop-down menu.
  • Select Health Tree as the providing widget for the Mashup Charts widget.

WidgetInteractions

  • Click OK to close the interactions window.
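The interactions configured above form a small provider map between widgets. The sketch below only models which widget feeds which, using the widget names from the steps; the `receivers_of` and `downstream_of` functions are my own hypothetical helpers, and the sketch does not claim anything about how vCOPs forwards selections internally.

```python
# Receiving widget -> providing widget, as set in the Interactions window.
PROVIDERS = {
    "Health Tree": "Storage Heat Map",
    "Metric Graph": "Storage Heat Map",
    "Mashup Charts": "Health Tree",
}


def receivers_of(widget, providers=PROVIDERS):
    """Return the widgets that take their resource directly from `widget`."""
    return sorted(w for w, p in providers.items() if p == widget)


def downstream_of(source, providers=PROVIDERS):
    """Return every widget fed by `source`, directly or through another widget."""
    order, queue = [], [source]
    while queue:
        current = queue.pop(0)
        for widget in receivers_of(current, providers):
            if widget not in order:
                order.append(widget)
                queue.append(widget)
    return order
```

Read this way, clicking a datastore in the heat map drives the Health Tree and the Metric Graph, and picking a resource in the Health Tree then drives the Mashup Charts.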

Final screen after all of the steps are done should look like this.

StorageDashboard

I am sharing the dashboard that I have made with you. Just download this file and import it into your Custom Dashboard.

In the next post I will talk about some Super Metrics and creating an application-centric dashboard. Till then, happy dashboarding :-)

4 thoughts on “Designing a custom Storage Performance Dashboard using vCOPs”

  1. Hi Prasenjit. Great post as always! Thanks for sharing the vCOPS dashboard with the community. I’ll definitely test your dashboard, but as far as I know vCOPS doesn’t know the SIOC normalized latency metrics. I’ve read somewhere that SIOC metrics are available with vCOPS + Hyperic, but I have never tested it. I’m sure I don’t need to explain to you the difference between datastore latencies and SIOC normalized latencies. I would like to share with you that SIOC normalized latency is the only metric that helps me with storage performance analysis, because it analyzes only IOs with a “normal IO size”. I believe the “normal IO size” is between 2k and 16k, but I cannot find more details. Anyway, the important thing is that SIOC normalized latency is very close to the storage array front-end port latency, and that is what you need for performance troubleshooting or performance planning.

    Do you know something more about SIOC metrics and vCOPS?
