The Best Kubernetes Monitoring Tools

In this article, you’ll learn about the best Kubernetes performance monitoring tools currently on the market. Although there are many application performance monitoring solutions out there, this article covers the best options in terms of key features, functionality, ease of setup, and community support.

You’ve been trusted with the responsibility of managing your organization’s production Kubernetes clusters. Not only must the clusters themselves remain available, but all of the applications hosted on them must also remain intact, functional, and readily available. If the platform falls over, the containerized applications will likely fail as well. This is why it’s so important to understand how your Kubernetes clusters are going to be used and how your applications are deployed on the platform.

The question now is, how do you know (with certainty) that the cluster is operational? Is the cluster healthy and ready to host applications? How do you ensure that the cluster has enough capacity to support all of its applications? How will you know if one of your hosted containerized applications fails? As you can see, it’s incredibly important to proactively manage your Kubernetes clusters, which is the point of Kubernetes monitoring.

However, monitoring Kubernetes can be difficult. Kubernetes is a complex, distributed system; clusters can (and most likely will) run across multiple nodes, sometimes in hybrid cloud scenarios. When something goes wrong, you may have to dig through the logs of several systems and services to fully investigate the issue. That makes it imperative to integrate a good monitoring stack with your Kubernetes clusters. It gives you the visibility you need to handle any issues that arise and, in some cases, to prevent them from ever happening at all.

Prometheus and Grafana

One of the industry’s favorite open-source Kubernetes monitoring applications is Prometheus. It was originally created by the developers at SoundCloud and is mainly written in Go. Prometheus has a number of features that make it a standout:

It uses a powerful multidimensional data model to store all of its metrics data.
It has its own functional query language called PromQL that lets users query time series data in real time.
It uses a pull model over HTTP to collect time series metrics data.
It offers a component called the Pushgateway that accepts pushed time series metrics data in cases where the data cannot be scraped or pulled.
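To make the multidimensional data model a little more concrete, here is a minimal Python sketch. It is not Prometheus code: the `Sample` class, the helper functions, and the sample data are all hypothetical, and the two helpers only loosely mimic what a PromQL label selector and a `sum by (...)` aggregation express.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One point in a time series: a metric name, a label set, and a value."""
    name: str
    labels: tuple  # sorted (key, value) pairs, which keeps the sample hashable
    value: float


def make_sample(name, value, **labels):
    """Build a Sample from keyword labels (hypothetical helper)."""
    return Sample(name, tuple(sorted(labels.items())), float(value))


def select(samples, name, **matchers):
    """Return samples with this name whose labels equal every matcher.

    Loosely what the PromQL selector name{key="value", ...} expresses.
    """
    wanted = set(matchers.items())
    return [s for s in samples if s.name == name and wanted <= set(s.labels)]


def sum_by(samples, label):
    """Sum values grouped by one label, loosely like sum by (label) (...)."""
    totals = {}
    for s in samples:
        key = dict(s.labels).get(label, "")
        totals[key] = totals.get(key, 0.0) + s.value
    return totals


# Made-up data: one metric name, several label combinations (dimensions).
SAMPLES = [
    make_sample("http_requests_total", 120, method="GET", code="200", pod="web-1"),
    make_sample("http_requests_total", 7, method="GET", code="500", pod="web-1"),
    make_sample("http_requests_total", 95, method="GET", code="200", pod="web-2"),
    make_sample("http_requests_total", 30, method="POST", code="200", pod="web-2"),
]

if __name__ == "__main__":
    ok = select(SAMPLES, "http_requests_total", code="200")
    print(sum_by(ok, "pod"))  # {'web-1': 120.0, 'web-2': 125.0}
```

The point of the sketch is that a single metric name fans out into many series, one per label combination, and queries slice or aggregate along those labels.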

It’s a great option if collecting time series data is a requirement for your organization, and it’s well suited for monitoring nodes alongside service-oriented architectures. Prometheus doesn’t require a proprietary agent on your server fleet; it scrapes metrics over HTTP from instrumented applications and from exporters. It also leverages a component called Alertmanager, which manages alerts and sends notifications via email, on-call systems, and group collaboration tools like Slack.

In terms of extensibility, Prometheus has client libraries that let you instrument your own applications in some of your favorite programming languages. It’s often integrated with Grafana to provide a data visualization layer for observability. It also supports third-party exporters that let users export data from third-party systems (databases, for example) and convert it into Prometheus metrics.

Prometheus can be installed using the precompiled macOS, Linux, and Windows binaries. You can also build and install the components of Prometheus from source, though one of the most commonly used installation options is to run Prometheus on Docker. For more information about Prometheus, check out their documentation page.
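As a rough illustration of the exporter idea and the pull model, the following Python sketch serves a gauge at /metrics in the Prometheus text exposition format, using only the standard library. The metric name `demo_queue_depth`, the `read_queue_depths` function, and port 9200 are assumptions made up for this example; a real exporter would more likely be built on one of the official client libraries.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def escape_label_value(value):
    """Escape backslashes, double quotes, and newlines in a label value."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(queue_depths):
    """Render a {queue_name: depth} mapping in the text exposition format."""
    lines = [
        "# HELP demo_queue_depth Number of jobs waiting in each queue.",
        "# TYPE demo_queue_depth gauge",
    ]
    for queue, depth in sorted(queue_depths.items()):
        lines.append(f'demo_queue_depth{{queue="{escape_label_value(queue)}"}} {depth}')
    return "\n".join(lines) + "\n"


def read_queue_depths():
    """Hypothetical stand-in for querying the third-party system."""
    return {"emails": 4, "thumbnails": 17}


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers scrapes on /metrics; every other path returns 404."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(read_queue_depths()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9200):
    """Block forever, answering scrapes on the given (arbitrary) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


if __name__ == "__main__":
    # Print what a scrape would return; call serve() to expose it over HTTP.
    print(render_metrics(read_queue_depths()), end="")
```

Calling `serve()` and adding the host and port as a scrape target would let Prometheus pull these values on its own schedule, which is the essence of the pull model described earlier.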

ContainIQ

Another great option for monitoring your Kubernetes clusters with ease is ContainIQ. ContainIQ offers a fully featured monitoring solution that is focused on monitoring Kubernetes out of the box. Their goal is to ensure that the installation of the product is quick and that your monitoring solution is ready to go right away, complete with prebuilt dashboards that quickly integrate with your Kubernetes clusters. ContainIQ has four features that make it a great monitoring option:

a prebuilt service latency dashboard for Kubernetes
a prebuilt events dashboard for Kubernetes
a prebuilt pod and node metrics dashboard for Kubernetes
an integration with Fluentd for log collection and visualization

ContainIQ helps keep administrators and site reliability engineers focused on their core competency instead of investing valuable time and effort into a monitoring solution. You can install the product using Helm charts or a provided .yaml file. ContainIQ uses multiple agents to collect data: a single-replica Deployment that collects metrics directly from the Kubernetes API, and DaemonSets that use eBPF to collect logs and latency data from all pods on a given node. For more information on ContainIQ, visit their documentation page.

Weave Scope

Weave Scope, developed by Weaveworks, is considered both a monitoring and visualization tool for Docker containers and Kubernetes clusters and has a comprehensive list of features, including the following:

the ability to build topologies of your applications and infrastructure
views that let you filter your visualizations to reflect the various Kubernetes components
a graphical overview mode and a more detailed table mode for displaying metrics on Kubernetes components
a powerful search tool to quickly pinpoint Kubernetes resources
support for managing containers from the Scope browser itself
a plug-in API used for creating custom metrics

Weave Scope can be installed as a stand-alone solution, or you can use Weave Cloud, their hosted solution. It can be installed with Docker in single-node or cluster scenarios or using Docker Compose. It’s also supported across a number of container orchestration platforms like OpenShift, Amazon ECS, minimesos, and Mesosphere DC/OS. For more information on Weave Scope, make sure you check out their documentation page.

New Relic and Pixie

New Relic offers a full-stack analysis platform called New Relic One. With New Relic One, you have an entire list of monitoring and analysis capabilities, including the following:

application performance monitoring
code debugging
Kubernetes monitoring with Pixie
machine learning monitoring
log management
error tracking
infrastructure monitoring
network monitoring
browser monitoring

It supports a host of technologies as part of its monitoring platform, and its acquisition of Pixie is particularly important. Pixie is a Kubernetes-native observability tool that collects data without requiring you to install any integrations or language agents. It’s supported on several Kubernetes platforms, like Amazon EKS, Google Kubernetes Engine (GKE), Microsoft Azure Kubernetes Service (AKS), Red Hat OpenShift, and Pivotal Container Service.

Pixie can automatically collect metrics, events, logs, and traces from your Kubernetes clusters using the extended Berkeley Packet Filter (eBPF). eBPF is a Linux kernel technology that runs sandboxed programs inside the kernel, where they can trace both kernel events and user-space processes. Pixie uses it to enable observability at the kernel layer, which gives users deeper visibility into their Kubernetes clusters.

New Relic’s Pixie integration also provides a number of monitoring instrumentations that make up its full suite of features. Users can install any or all of the instrumentations as they see fit. These include the following:

Kubernetes infrastructure for system-level metrics
Kubernetes events for cluster-related events
Prometheus metrics to support data coming from Prometheus endpoints
Kubernetes logs for control-plane data
application performance for code-level metrics
network performance monitoring for DNS, network mapping, flow graphs, etc.

For each of the instrumentations provided, there are two installation options: a guided install and a manual setup. The guided install offers a faster implementation, while the manual option, which uses Helm, offers a more customized installation process. To learn more about Pixie and the rest of the monitoring features that the New Relic platform has to offer, see their Kubernetes monitoring guide.

Datadog

Datadog collects data using what’s called the Datadog Agent. Installed on each node in your cluster, the Datadog Agent collects metrics, traces, and logs from that node, giving you a way to monitor the overall health of your nodes. In addition, Datadog has a tool called the Datadog Cluster Agent that provides further benefits, like monitoring Kubernetes at a slightly higher abstraction layer: the cluster level.

Deploying the Cluster Agent is fairly straightforward; the install uses a manifest that creates a Kubernetes Deployment and Service for the Cluster Agent. Once your Cluster Agent and node-based Agents are deployed, metrics from your cluster will begin to stream to Datadog. Datadog also includes a handy built-in dashboard for visualizing your Kubernetes metrics.

Datadog has a very healthy developer community working on a number of product enhancements, such as community-contributed API and DogStatsD client libraries. In fact, Datadog has a very active GitHub account, and they also hold regularly scheduled office hours for their open-source contributors. For a list of their open-source projects, check out Datadog’s GitHub profile. For more information about monitoring Kubernetes using Datadog, check out the Monitoring in the Kubernetes Era article on their website.
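To give a feel for what a DogStatsD client library does under the hood, here is a minimal Python sketch that builds a metric datagram and sends it over UDP. It assumes an Agent with DogStatsD listening on 127.0.0.1:8125 (the default port); the metric names and tags are made up, and the function names are hypothetical rather than taken from any official client library.

```python
import socket


def format_datagram(name, value, metric_type, tags=None, sample_rate=None):
    """Build one DogStatsD datagram: name:value|type[|@rate][|#tag1,tag2]."""
    parts = [f"{name}:{value}", metric_type]
    if sample_rate is not None:
        parts.append(f"@{sample_rate}")
    if tags:
        parts.append("#" + ",".join(tags))
    return "|".join(parts)


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send to the (assumed) local DogStatsD listener."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # "c" marks a counter; the tags are arbitrary key:value strings.
    msg = format_datagram("checkout.requests", 1, "c", tags=["env:prod", "pod:web-1"])
    print(msg)  # checkout.requests:1|c|#env:prod,pod:web-1
    # send_metric(msg)  # uncomment when an Agent is listening locally
```

Because the transport is UDP, the application never blocks waiting for the Agent, which is one reason this style of client library is cheap to add to application code.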

Dynatrace

Dynatrace offers world-class observability tools and states that its solution is the only one that offers full-stack observability without any significant changes to your cluster. It offers what the company characterizes as advanced observability through its monitoring solution, with features like:

cluster resource utilization
pod and workload overviews
monitoring from the application layer down to the host infrastructure

To set up Kubernetes monitoring with Dynatrace, you’ll need to install and configure the Dynatrace Operator. It can be installed in two ways: automatically or manually. Once you’ve set it up and created the custom Dynatrace resource, the operator automatically deploys a number of additional resources based on the configuration options that you choose. These options include classicFullStack, hostMonitoring, and applicationMonitoring.

Dynatrace has an active support community with over 45,000 members. The community can help you find answers to questions, and Dynatrace also uses it to educate users and to post news and product updates. Dynatrace also has a well-established GitHub page featuring a number of its open-source projects. For more information about monitoring Kubernetes using Dynatrace, have a look at their Kubernetes monitoring documentation.

Final Thoughts

In this article, you’ve learned about the industry’s best options for monitoring your Kubernetes clusters. Each of these products has distinct features that make it unique, but all of them perform data collection, support data visualization, and provide a means to be alerted when issues arise. There are other monitoring tools out there, but what’s most important is choosing the monitoring solution that will best aid you in managing and maintaining your containerized infrastructure.

More and more applications are containerized and designed using microservice architectures. This ultimately means that the use of your container orchestration platform(s) will only grow. Monitoring and observability are two of the disciplines behind understanding the behaviors of your containerized applications, the orchestration platforms themselves, and even the physical nodes that host these solutions. You’ve got a choice to make; make it count!