Automating Azure Instrumentation and Monitoring

One of the core data types that Azure Monitor works with is metrics – numerical pieces of data that represent the state of an Azure resource or of an application component at a specific point in time. Azure publishes built-in metrics for almost all Azure services, and these metrics are available for querying interactively as well as for use within alerts and other systems. In addition to the Azure-published metrics, we can also publish our own custom metrics. In this post we’ll discuss how to do this using both Azure Monitor’s recently announced support for custom metrics, and with Application Insights’ custom metrics features. We’ll start by looking at what metrics are and how they work.

This post is part of a series:

Part 1 provides an introduction to the series by describing why we should instrument our systems, outlines some of the major tools that Azure provides such as Azure Monitor, and argues why we should be adopting an ‘infrastructure as code’ mindset for our instrumentation and monitoring components.
Part 2 describes Azure Application Insights, including its proactive detection and alert features. It also outlines a pattern for deploying instrumentation components based on the requirements we might typically have for different environments, from short-lived development and test environments through to production.
Part 3 (this post) discusses how to publish custom metrics, both through Application Insights and to Azure Monitor. Custom metrics let us enrich the data that is available to our instrumentation components.
Part 4 covers the basics of alerts and metric alerts. Azure Monitor’s powerful alerting system is a big topic, and in this part we’ll discuss how it works overall, as well as how to get alerts for built-in and custom metrics.
Part 5 (coming soon) covers log alerts and resource health alerts, two other major types of alerts that Azure Monitor provides. Log alerts let us alert on information coming into Application Insights logs and Log Analytics workspaces, while resource health alerts us when Azure itself is having an issue that may result in downtime or degraded performance.
Part 6 (coming soon) describes dashboards. The Azure Portal has a great dashboard UI, and our instrumentation data can be made available as charts. Dashboards are also possible to automate, and I’ll show a few tips and tricks I’ve learned when doing this.
Part 7 (coming soon) covers availability tests, which let us proactively monitor our web applications for potential outages. We’ll discuss deploying and automating both single-step (ping) and multi-step availability tests.
Part 8 (coming soon) describes autoscale. While this isn’t exactly instrumentation in and of itself, autoscale is built on much of the same data used to drive alerts and dashboards, and autoscale rules can be automated as well.

Finally, part 9 (coming soon) covers exporting data to other systems. Azure Monitor metrics and log data can be automatically exported, as can Application Insights data, and the export rules can be exported and used from automation scripts.

What Are Metrics?

Metrics are pieces of numerical data. Each metric has both a value and a unit. Here are some example metrics:

Metrics can be captured either in their raw or aggregated form. An aggregated metric is a way of simplifying the metric across a given period of time. For example, consider a system that processes messages from a queue. We could count the number of messages processed by the system in two ways: we could adjust our count every time a message is processed, or we could check the number of messages on the queue every minute, and batch these into five-minute blocks. The latter is one example of an aggregated metric.

Because metrics are numerical in nature, they can be visualised in different ways. For example, a line chart might show the value of a metric over time.

Azure Monitor also supports adding dimensions to metrics. Dimensions are extra pieces of data that help to add context to a metric. For example, the Azure Monitor metric for the number of messages in a Service Bus namespace has the entity (queue or topic) name as a dimension. Queries and visualisations against this metric can then filter down to specific topics, can visualise each topic separately, or can roll up all topics/queues and show the total number of messages across the whole Service Bus namespace.

Azure Monitor Metrics

Azure Monitor currently has two metric systems.

Classic metrics were traditionally published by most Azure services. When we use the Azure Portal, the Metrics (Classic) page displays these metrics.
Near real-time metrics are the newer type of metrics, and Azure is moving to use this across all services. As their name suggests, these metrics get updated more frequently than classic metrics – where classic metrics might not appear for several minutes, near real-time metrics typically are available for querying within 1-2 minutes of being published, and sometimes much quicker than that. Additionally, near real-time metrics are the only metric type that supports dimensions; classic metrics do not. Custom metrics need to be published as near real-time metrics.

Over time, all Azure services will move to the near real-time metrics system. In the meantime, you can check whether a given Azure service is publishing to the classic or newer metric system by checking this page. In this post we’ll only be dealing with the newer (near real-time) metric system.

Custom Metrics

Almost all Azure services publish their own metrics in some form, although the usefulness and quality varies depending on the specific service. Core Azure services tend to have excellent support for metrics. Publishing of built-in metrics happens automatically and without any interaction on our part. The metrics are available for interactive querying through the portal and API, and for alerts and all of the other purposes we discussed in part 1 of this series.

There are some situations where the built-in metrics aren’t enough for our purposes. This commonly happens within our own applications. For example, if our application has components that process messages from a queue then it can be helpful to know how many messages are being processed per minute, how long each message takes to process, and how many messages are currently on the queues. These metrics can help us to understand the health of our system, to provision new workers to help to process messages more quickly, or to understand bottlenecks that our developers might need to investigate.

There are two ways that we can publish custom metrics into Azure Monitor.

Azure Monitor custom metrics, currently a preview service, provides an API for us to send metrics into Azure Monitor. We submit our metrics to an Azure resource, and the metrics are saved alongside the built-in metrics for that resource.
Application Insights also provides custom metrics. Our applications can create and publish metrics into Application Insights, and they are accessible by using Application Insights’ UI, and through the other places that we work with metrics. Although the core offering of publishing custom metrics into Application Insights is generally available, some specific features are in preview.

How do we choose which approach to use? Broadly speaking I’d generally suggest using Azure Monitor’s custom metrics API for publishing resource or infrastructure-level metrics – i.e. enriching the data that Azure itself publishes about a resource – and I’d suggest using Application Insights for publishing application-level metrics – i.e. metrics about our own application code.

Here’s a concrete example, again related to queue processing. If we have an application that processes queue messages, we’ll typically want instrumentation to understand how these queues and processors are behaving. If we’re using Service Bus queues or topics then we get a lot of instrumentation about our queues, including the number of messages that are currently on the queue. But if we’re using Azure Storage queues, we’re out of luck. Azure Storage queues don’t have the same metrics, and we don’t get the queue lengths from within Azure Monitor. This is an ideal use case for Azure Monitor’s custom metrics.

We may also want to understand how long it’s taking us to process each message – from the time it was submitted to the queue to the time it completed processing. This is often an important metric to ensure that our data is up-to-date and that users are having the best experience possible. Ultimately this comes down to how long our application is taking to perform its logic, and so this is an application-level concern and not an infrastructure-level concern. We’ll use Application Insights for this custom metric.

Let’s look at how we can write code to publish each of these metrics.

Publishing Custom Resource Metrics

In order to publish a custom resource metric we need to do the following:

Decide whether we will add dimensions.
Decide whether we will aggregate our metric’s value.
Authenticate and obtain an access token.
Send the metric data to the Azure Monitor metrics API.

Let’s look at each of these in turn, in the context of an example Azure Function app that we’ll use to send our custom metrics.

Adding Dimensions

As described above, dimensions let us add extra data to our metrics so that we can group and compare them. We can submit metrics to Azure Monitor with or without dimensions. If we want to include dimensions, we need to include two extra properties – dimNames specifies the names of the dimensions we want to add to the metric, and dimValues specifies the values of those dimensions. The order of the dimension names and values must match so that Azure Monitor can relate the value to its dimension name.

Aggregating Metrics

Metrics are typically queried in an aggregated form – for example, counting or averaging the values of metrics to get a picture of how things are going overall. When submitting custom metrics we can also choose to send our metric values in an aggregated form if we want. The main reasons we’d do this are:

To save cost. Azure Monitor custom metrics aren’t cheap when you use them at scale, and so pre-aggregating within our application means we don’t need to incur quite as high a cost since we aren’t sending as much raw data to Azure Monitor to ingest.
To reduce a very high volume of metrics. If we have a large number of metrics to report on, it will likely be much faster for us to send the aggregated metric to Azure Monitor rather than sending each individual metric.

However, it’s up to us – we can choose to send individual values if we want.

If we send aggregated metrics then we need to construct a JSON object to represent the metric as follows:

For example, let’s imagine we have recorded the following queue lengths (all times in UTC):

We might send the following pre-aggregated metrics in a single payload:

Azure Monitor would then be able to display the aggregated metrics for us when we query.

If we chose not to send the metrics in an aggregated form, we’d send the metrics across individual messages; here’s an example of the fourth message:

Security for Communicating with Azure Monitor

We need to obtain an access token when we want to communicate with Azure Monitor. When we use Azure Functions, we can make use of managed identities to simplify this process a lot. I won’t cover all the details of managed identities here, but the example ARM template for this post includes the creation and use of a managed identity. Once the function is created, it can use the following code to obtain a token that is valid for communication with Azure Monitor:

The second part of this process is authorising the function’s identity to write metrics to resources. This is done by using the standard Azure role-based access control system. The function’s identity needs to be granted the Monitoring Metrics Publisher role, which has been defined with the well-known role definition ID 3913510d-42f4-4e42-8a64-420c390055eb.

Sending Custom Metrics to Azure Monitor

Now we have our metric object and our access token ready, we can submit the metric object to Azure Monitor. The actual submission is fairly easy – we just perform a POST to a URL. However, the URL we submit to will be different depending on the resource’s location and resource ID, so we dynamically construct the URL as follows:

We might deploy into the West US 2 region, so an example URL might look like this: https://westus2.monitoring.azure.com/subscriptions/377f4fe1-340d-4c5d-86ae-ad795b5ac17d/resourceGroups/MyCustomMetrics/providers/Microsoft.Storage/storageAccounts/mystorageaccount/metrics

Currently Azure Monitor only supports a subset of Azure regions for custom metrics, but this list is likely to grow as the feature moves out of preview.

Here is the full C# Azure Function we use to send our custom metrics:

Testing our Function

I’ve provided an ARM template that you can deploy to test this:

Make sure to deploy this into a region that supports custom metrics, like West US 2.

Once you’ve deployed it, you can create some queues in the storage account (use the storage account that begins with q and not the one that begins with fn). Add some messages to the queues, and then run the function or wait for it to run automatically every five minutes.

Then you can check the metrics for the storage queue, making sure to change the metric namespace to queue processing:

You should see something like the following:

As of the time of writing (January 2019), there is a bug where Azure Storage custom metrics don’t display dimensions. This will hopefully be fixed soon.

Publishing Custom Application Metrics

Application Insights also allows for the publishing of custom metrics using its own SDK and APIs. These metrics can be queried through Azure Monitor in the same way as resource-level metrics. The process by which metrics are published into Application Insights is quite different to how Azure Monitor custom metrics are published, though.

The Microsoft documentation on Application Insights custom metrics is quite comprehensive, so rather than restate it here I will simply link to the important parts. I will focus on the C# SDK in this post.

To publish a custom metric to Application Insights you need an instance of the TelemetryClient class. In an Azure Functions app you can set the APPINSIGHTS_INSTRUMENTATIONKEY application setting – for example, within an ARM template – and then create an instance of TelemetryClient. The TelemetryClient will find the setting and will automatically configure itself to send telemetry to the correct place.

Once you have an instance of TelemetryClient, you can use the GetMetric().TrackValue() method to log a new metric value, which is then pre-aggregated and sent to Application Insights after a short delay. Dimensions can also be set using the same method. There are a number of overloads of this method that can be used to submit custom dimensions, too.

Note that as some features are in preview, they don’t work consistently yet – for example, at time of writing custom namespaces aren’t honoured correctly, but this should hopefully be resolved soon.

If you want to send raw metrics rather than pre-aggregated metrics, the legacy TrackMetric() method can be used, but Microsoft discourages its use and are deprecating it.

Here is some example Azure Function code that writes a random value to the My Test Metric metric:

And a full ARM template that deploys this is:

Summary

Custom metrics allow us to enrich our telemetry data with numerical values that can be aggregated and analysed, both manually through portal dashboards and APIs, and automatically using a variety of Azure Monitor features. We can publish custom metrics against any Azure resource by using the new custom metrics APIs, and we can also write application-level metrics to Application Insights.

In the next part of this series we will start to look at alerts, and will specifically look at metric alerts – one way to have Azure Monitor process the data for both built-in and custom metrics and alert us when things go awry.