Your guide to Kubernetes: Important considerations for monitoring system health

Tim Dorr
CTO and Founder

As we continue to explore what it takes to get started with Kubernetes, next up on the list is monitoring.

Good monitoring is essential for any application. It’s your window into what’s going on in the system at any point in time and the only way to ensure everything is running properly for your end users.

While you’re likely well-versed in what monitoring typically entails, it’s important to understand how monitoring plays out in Kubernetes, as there are certain considerations you need to take into account.

4 important areas for monitoring in Kubernetes

Once you have your application set up in Kubernetes and you’re ready to go live, it’s important to make sure you have the proper monitoring in place to keep tabs on system health and performance. As you do so, be sure to consider the following areas of interest that are specific to Kubernetes:

1) Monitoring the health of the servers on which you run Kubernetes

First, you need to monitor the health of the actual servers on which you run Kubernetes, including CPU time, memory usage, network traffic and disk usage. This monitoring sits squarely in the purview of DevOps, since it tracks lower-level metrics that are typically visible only to operations teams. Importantly, server monitoring should be your first line of defense for detecting that something is wrong.
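As a rough illustration of that first line of defense, the Python sketch below compares a snapshot of node-level readings against alert thresholds. The `HostSnapshot` type, the threshold values and the function names are hypothetical; in practice an agent on each server collects these numbers continuously. Only disk usage is read live here (via the standard library’s `shutil.disk_usage`), and network traffic is left out to keep the example short.

```python
import shutil
from dataclasses import dataclass


@dataclass
class HostSnapshot:
    """One reading of node-level metrics, each a fraction from 0.0 to 1.0."""
    cpu_utilization: float
    memory_utilization: float
    disk_utilization: float


# Hypothetical alert thresholds; tune these for your own servers.
THRESHOLDS = {
    "cpu_utilization": 0.85,
    "memory_utilization": 0.90,
    "disk_utilization": 0.80,
}


def disk_utilization(path: str = "/") -> float:
    """Return the fraction of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def check_host(snapshot: HostSnapshot, thresholds: dict = THRESHOLDS) -> list[str]:
    """Return a human-readable alert for every metric above its threshold."""
    alerts = []
    for metric, limit in thresholds.items():
        value = getattr(snapshot, metric)
        if value > limit:
            alerts.append(f"{metric} at {value:.0%} exceeds {limit:.0%}")
    return alerts


if __name__ == "__main__":
    # CPU and memory values are made up for the example; disk is measured.
    snapshot = HostSnapshot(cpu_utilization=0.42, memory_utilization=0.95,
                            disk_utilization=disk_utilization("/"))
    for alert in check_host(snapshot):
        print(alert)
```

Keeping the thresholds in data rather than code makes it easy to tighten them for, say, database nodes without touching the checking logic.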

2) Monitoring Kubernetes itself

Next, it’s important to keep close tabs on Kubernetes itself. Specifically, you should monitor the health of your clusters, Nodes, Deployments and Pods as well as the amount of resources they use. This monitoring still sits in the realm of your operations team, but your developers may start to have some involvement when it comes to things like how much memory a piece of the application in any given Pod might use.
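To make the Pod-level memory question concrete, here is a minimal Python sketch that flags Pods whose current memory usage is close to their configured limit. It assumes you already have usage figures (for example, the memory column from `kubectl top pods`, which depends on Metrics Server being installed) and the `resources.limits.memory` values from your Pod specs. The quantity parser is simplified: it handles plain byte counts plus the common binary (`Ki`, `Mi`, `Gi`, `Ti`) and decimal (`k`, `M`, `G`, `T`) suffixes, not every form Kubernetes accepts. Pod names and the 90% ratio are hypothetical.

```python
import re

# Binary (powers of two) and decimal suffixes used in Kubernetes memory quantities.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}
_QUANTITY = re.compile(r"^([0-9]+(?:\.[0-9]+)?)([A-Za-z]*)$")


def parse_memory(quantity: str) -> float:
    """Convert a memory quantity such as '512Mi' or '1G' to bytes (simplified)."""
    match = _QUANTITY.match(quantity.strip())
    if not match:
        raise ValueError(f"unrecognized quantity: {quantity!r}")
    number, suffix = match.groups()
    if suffix and suffix not in _SUFFIXES:
        raise ValueError(f"unsupported suffix: {suffix!r}")
    return float(number) * _SUFFIXES.get(suffix, 1)


def pods_near_limit(usage: dict[str, str], limits: dict[str, str],
                    ratio: float = 0.9) -> list[str]:
    """Return Pods whose memory usage is at or above `ratio` of their limit."""
    flagged = []
    for pod, used in usage.items():
        limit = limits.get(pod)
        if limit is None:
            continue  # no limit set, so there is nothing to compare against
        if parse_memory(used) >= ratio * parse_memory(limit):
            flagged.append(pod)
    return sorted(flagged)


if __name__ == "__main__":
    usage = {"api-7d9f": "470Mi", "worker-5c2b": "100Mi"}  # hypothetical Pods
    limits = {"api-7d9f": "512Mi", "worker-5c2b": "1Gi"}
    print(pods_near_limit(usage, limits))  # ['api-7d9f']
```

A Pod that sits near its memory limit is at risk of being OOM-killed, which is exactly the kind of detail developers and operations teams end up discussing together.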

A common companion to Kubernetes monitoring is the Horizontal Pod Autoscaler, a built-in mechanism that scales your workloads inside Kubernetes up or down based on a designated metric, such as CPU utilization. The rule might be as simple as “if this one system is scaled up, this other system needs to scale up to match it.” The Horizontal Pod Autoscaler can also pull in external metrics, letting it scale on higher-level signals, such as where network traffic is coming from, that are more granular than what you can typically get at the server level.
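The core calculation the Horizontal Pod Autoscaler performs is documented by Kubernetes as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with no scaling when the ratio is within a tolerance of 1.0 (0.1 by default). The Python below is a simplified model of that rule, clamped to minimum and maximum replica counts; it deliberately ignores the real controller’s stabilization windows, missing-metric handling and multi-metric logic. In a cluster you would configure this declaratively through a HorizontalPodAutoscaler object rather than write code.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Simplified model of the Horizontal Pod Autoscaler's replica calculation."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: leave it alone
    desired = math.ceil(current_replicas * ratio)
    # Never go outside the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 90% CPU against a 60% target gives a ratio of 1.5, so the model asks for `ceil(4 * 1.5) = 6` replicas, while a reading of 63% stays within the tolerance and keeps 4.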

3) Monitoring your application

Once you’re keeping tabs on your servers and Kubernetes, it’s time to look closely at your application to make sure the code your developers write runs properly. There are a number of ways to accomplish this, some of them relatively low-touch. For instance, you can install a library that adds monitoring to the application automatically, or you can go through your application code and add specific collection points for important metrics. The collected data can then be sent to an external service or to a monitoring system running inside your Kubernetes cluster.
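Adding a collection point by hand often looks like wrapping a request handler so that every call records a count and a duration. The Python sketch below shows the idea with a decorator; the in-memory dictionaries stand in for a real metrics client, and the `get_invoice` handler and metric names are hypothetical.

```python
import functools
import time
from collections import defaultdict

# In-memory stand-ins for a metrics client; a real setup would export these
# values to a metrics backend instead of keeping them in dictionaries.
REQUEST_COUNT: dict[str, int] = defaultdict(int)
REQUEST_SECONDS: dict[str, list[float]] = defaultdict(list)


def instrumented(name: str):
    """Decorator that records a call count and duration for the wrapped handler."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record even when the handler raises, so failures are counted too.
                REQUEST_COUNT[name] += 1
                REQUEST_SECONDS[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


@instrumented("get_invoice")
def get_invoice(invoice_id: int) -> dict:
    # Hypothetical handler standing in for real application code.
    return {"id": invoice_id, "status": "paid"}
```

Libraries that add monitoring automatically generally apply this same wrapping pattern to framework handlers, database drivers and HTTP clients for you, which is why they can be so low-touch.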

Generally, this level of monitoring requires involvement from both your development team and your operations team. On the developer side, it’s helpful to track metrics like requests per second to particular APIs, number of database calls and time to complete a request. Developers can then use these metrics to identify problematic code or pinpoint high-traffic areas of the application that might require more focus on performance and security. Meanwhile, operations teams can use this data to track the overall health of the system and keep tabs on the specifics of application workloads. For example, these higher-level application metrics can help determine which databases might need to run on a separate service or be scaled up based on system traffic.
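Turning raw request records into the metrics above is mostly simple arithmetic. This Python sketch summarizes one time window per endpoint: requests per second, 95th-percentile latency (using the nearest-rank method) and average database calls per request. The `RequestRecord` shape is a hypothetical assumption, and the records are assumed to be already filtered to the window.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RequestRecord:
    endpoint: str
    duration: float  # seconds taken to complete the request
    db_calls: int    # database queries issued while handling it


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records: list[RequestRecord],
              window_seconds: float) -> dict[str, dict[str, float]]:
    """Per-endpoint throughput, tail latency and database load for one window."""
    by_endpoint: dict[str, list[RequestRecord]] = defaultdict(list)
    for record in records:
        by_endpoint[record.endpoint].append(record)
    summary = {}
    for endpoint, recs in by_endpoint.items():
        summary[endpoint] = {
            "requests_per_second": len(recs) / window_seconds,
            "p95_seconds": percentile([r.duration for r in recs], 95),
            "avg_db_calls": sum(r.db_calls for r in recs) / len(recs),
        }
    return summary
```

Tail latency (p95) is used here instead of an average because a handful of very slow requests, often the ones making dozens of database calls, can hide behind a healthy-looking mean.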

4) Monitoring through distributed tracing

Finally, distributed tracing helps you understand how all of the services in your application work together. A typical Kubernetes setup includes multiple services that interact with each other in various ways, and it’s important to keep tabs on the path that a user’s request takes through these systems. Distributed tracing offers that single view across multiple services by collecting data from each point of activity and stitching those data points together into a single trace, so you can follow the entire request as it moves from service to service. Importantly, distributed tracing can follow a request across different languages and frameworks, so it doesn’t matter if the services involved switch from JavaScript to Ruby to Go to Python, as long as they all use the same tracing format. Ultimately, this lets you watch exactly how those services work together and understand where something might go wrong.
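One widely used shared format is the W3C Trace Context `traceparent` header (supported by OpenTelemetry, among others), which has the shape `00-<32 hex trace ID>-<16 hex parent span ID>-<2 hex flags>`. Each service keeps the trace ID it receives and mints a new span ID for its own work, which is what lets a tracing backend stitch the hops together regardless of language. The Python sketch below is simplified: it only handles version `00` and skips the `tracestate` header and some of the specification’s validation rules.

```python
import re
import secrets

_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace: random 16-byte trace ID and 8-byte span ID."""
    trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def child_traceparent(incoming: str) -> str:
    """Continue an incoming trace: keep the trace ID, mint a new span ID for this hop."""
    match = _TRACEPARENT.match(incoming)
    if not match:
        raise ValueError(f"malformed traceparent: {incoming!r}")
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

A frontend might call `new_traceparent()` when a user request arrives and send the result as an HTTP header; each downstream service then calls `child_traceparent()` on the header it received before calling the next service.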

This type of cross-service monitoring is important because service failures rarely happen in isolation; they occur as part of the larger system. It’s easy to test different pieces of your application in isolation, but in the real world those services talk to one another, and you need to know how those communications work, especially when it comes to resolving subtle performance issues. For example, if there’s a slowdown in the billing experience, it might not be obvious what’s causing it just by looking at the billing service. Looking at everything together, however, might reveal that billing is slow because of the connection to a third-party payment processor. Distributed tracing directly connects those disparate pieces, making it an extremely helpful form of monitoring for getting to the bottom of subtle performance or error-related issues.
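The billing example can be sketched in a few lines of Python. Given the spans of one trace, the code computes each service’s self time (its own span duration minus the time spent waiting on child spans) and reports the service with the most. The `Span` shape, IDs, service names and timings are all hypothetical, and the calculation assumes child spans of the same parent do not overlap; concurrent calls would need interval merging.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    service: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def self_time_by_service(spans: list[Span]) -> dict[str, float]:
    """Time each service spent in its own work, excluding time waiting on child spans."""
    child_time: dict[str, float] = {}
    for span in spans:
        if span.parent_id is not None:
            child_time[span.parent_id] = child_time.get(span.parent_id, 0.0) + span.duration_ms
    totals: dict[str, float] = {}
    for span in spans:
        own = span.duration_ms - child_time.get(span.span_id, 0.0)
        totals[span.service] = totals.get(span.service, 0.0) + own
    return totals


def slowest_service(spans: list[Span]) -> str:
    totals = self_time_by_service(spans)
    return max(totals, key=totals.get)


if __name__ == "__main__":
    trace = [
        Span("a", None, "billing", 0, 1200),
        Span("b", "a", "billing-db", 50, 150),
        Span("c", "a", "payment-processor", 200, 1150),  # outbound third-party call
    ]
    print(slowest_service(trace))  # payment-processor
```

Looking only at the billing span, the request takes 1,200 ms; the trace shows that 950 ms of it is spent waiting on the payment processor.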

Top tools for monitoring in Kubernetes

As you set up these types of monitoring in Kubernetes, you’ll need a few external solutions along the way to help with everything from metrics collection to data visualization. There are plenty of options available, but here are the top tools to help kick off your evaluation:

Metrics collection

Kubernetes has a small, built-in metrics collector, but it isn’t built to scale or meant to work as a database. As a result, most teams add a separate solution to power deeper metrics collection. Both of the most popular solutions offer a time series database, meaning they store timestamped data points so you can see how your metrics change over time.
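To show what a time series database adds, here is a toy, in-memory Python version: samples are appended in time order, a range query returns the samples in a window, and `counter_rate` turns an ever-increasing counter (such as total requests served) into a per-second rate, treating any drop as a counter reset after a restart. This is only loosely similar in spirit to what real time series databases do; they add labels, compression, retention and more careful handling of window edges. The metric values are hypothetical.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class TimeSeries:
    """Append-only series of (timestamp, value) samples kept in time order."""
    timestamps: list[float] = field(default_factory=list)
    values: list[float] = field(default_factory=list)

    def append(self, ts: float, value: float) -> None:
        if self.timestamps and ts <= self.timestamps[-1]:
            raise ValueError("samples must be appended in increasing time order")
        self.timestamps.append(ts)
        self.values.append(value)

    def between(self, start: float, end: float) -> list[tuple[float, float]]:
        """Samples with start <= timestamp <= end, found by binary search."""
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))


def counter_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a monotonic counter, treating any drop as a reset."""
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so `cur` is the new growth.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed
```

For a counter sampled every 15 seconds as 100, 130, 160, 10 and 40, the increases are 30, 30, 10 (after the reset) and 30, so the rate over the 60-second window is 100 / 60, about 1.67 requests per second.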

Visualization

Once you have metrics in place, you need a visualization layer that will allow you to build queries against that data and view the information over time in graphical form. When it comes to this layer, one tool typically proves most popular.
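When a graph covers a long time range, a visualization layer usually can’t plot every raw sample, so it groups samples into fixed-width intervals (the graph’s resolution) and aggregates each one. The Python sketch below illustrates that downsampling step; the function name, the default mean aggregation and the sample data are assumptions for illustration, not the behavior of any particular tool.

```python
import statistics
from collections import defaultdict


def downsample(samples: list[tuple[float, float]], step: float,
               agg=statistics.fmean) -> list[tuple[float, float]]:
    """Group (timestamp, value) samples into buckets of width `step` and aggregate each."""
    buckets: dict[float, list[float]] = defaultdict(list)
    for ts, value in samples:
        bucket_start = (ts // step) * step  # align each sample to its bucket's start
        buckets[bucket_start].append(value)
    return [(start, agg(values)) for start, values in sorted(buckets.items())]
```

With samples `[(0, 1), (10, 3), (30, 5), (50, 7), (65, 2)]` and a 60-second step, the default mean yields `[(0, 4.0), (60, 2.0)]`; passing `agg=max` instead keeps short spikes visible, which is often the better choice for latency graphs.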

Distributed tracing

Distributed tracing tracks application performance across different services, including databases and third-party services. There are two tools in particular to which you should pay attention for distributed tracing.

Finally, there are a couple of paid services that offer a broader set of capabilities all in one package. If you want to go this route, there are two solutions worth looking into closely.

Getting started with monitoring in Kubernetes

Monitoring has always been essential to maintaining a successful application, and this continues to be the case as you work with Kubernetes. While the importance of monitoring hasn’t changed, there are certain unique approaches and sets of tools you need to consider as you think through monitoring in a Kubernetes environment. The areas outlined here should give your team what it needs to know to get started with monitoring in Kubernetes.

Interested in learning more? Contact Spaceship today to discover how we can help give your operations and delivery workflows a boost.
