Your guide to Kubernetes: Important considerations for monitoring system health
As we continue to explore what it takes to get started with Kubernetes, next up on the list is monitoring.
Good monitoring is essential for any application. It’s your window into what’s going on in the system at any point in time and the only way to ensure everything is running properly for your end users.
While you’re likely well-versed in what monitoring typically entails, it’s important to understand how monitoring plays out in Kubernetes, as there are certain considerations you need to take into account.
4 important areas for monitoring in Kubernetes
Once you have your application set up in Kubernetes and you’re ready to go live, it’s important to make sure you have the proper monitoring in place to keep tabs on system health and performance. As you do so, be sure to consider the following areas of interest that are specific to Kubernetes:
1) Monitoring the health of the servers on which you run Kubernetes
First, you need to monitor the health of the actual servers on which you run Kubernetes, including CPU time, memory usage, network traffic and disk usage. This monitoring sits entirely in the purview of DevOps, as it typically tracks lower-level metrics that are available exclusively to operations teams. Importantly, server monitoring should be your first line of defense: it’s often the earliest indicator that something is wrong.
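To make the idea concrete, here is a minimal standard-library sketch of the kind of host-level numbers this layer tracks. In practice a node agent (such as Prometheus’s node_exporter) collects and ships these continuously; the function name here is illustrative.

```python
import os
import shutil

def server_health_snapshot(path="/"):
    """Collect a few basic host-level health numbers.

    Standard library only; a real setup would export these to a
    metrics collector rather than reading them ad hoc.
    """
    load1, load5, load15 = os.getloadavg()   # CPU load averages (Unix only)
    disk = shutil.disk_usage(path)           # disk usage for one mount point
    return {
        "load_1m": load1,
        "load_5m": load5,
        "load_15m": load15,
        "disk_used_pct": 100 * disk.used / disk.total,
    }

print(server_health_snapshot())
```

Memory and network counters come from similar OS-level sources, which is exactly why this layer belongs to operations: the data lives on the host, not in the application.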
2) Monitoring Kubernetes itself
Next, it’s important to keep close tabs on Kubernetes itself. Specifically, you should monitor the health of your clusters, Nodes, Deployments and Pods as well as the amount of resources they use. This monitoring still sits in the realm of your operations team, but your developers may start to have some involvement when it comes to things like how much memory a piece of the application in any given Pod might use.
One common way to put this monitoring to work is to set up a Horizontal Pod Autoscaler, a built-in mechanism that scales your workloads inside of Kubernetes up or down based on a designated metric. This might be as simple as something that says “if this one system is scaled up, this other system needs to scale up to match it.” The Horizontal Pod Autoscaler can also consume custom and external metrics, tying scaling to higher-level signals like where network traffic is coming from, which gives you more granularity than you can typically get at the server level.
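At its core, the Horizontal Pod Autoscaler applies a simple documented rule: scale the replica count by the ratio of the observed metric to the target, rounded up and clamped to the configured bounds. The sketch below shows that rule in isolation (it ignores refinements like stabilization windows and tolerance thresholds):

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """HPA core rule: desired = ceil(current * currentMetric / targetMetric),
    clamped to [min_replicas, max_replicas]."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(desired, max_replicas))

# Example: 4 Pods averaging 180m CPU against a 100m target scale up to 8.
print(desired_replicas(4, current_metric=180, target_metric=100))  # → 8
```

The same rule works for any metric the autoscaler can read, which is what makes custom and external metrics useful: you decide what “load” means for your workload.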
3) Monitoring your application
Once you have tabs on your servers and Kubernetes, it’s time to look closely at your application to make sure the code your developers write runs properly. There are a number of ways to accomplish this monitoring, many of which are relatively low touch. For instance, you can install an additional library that adds monitoring to the application or go through your full application code to add in specific collection points for important metrics. And this monitoring can go through external services or happen inside a Kubernetes cluster.
Generally, this level of monitoring requires involvement from both your development team and your operations team. On the developer side, it’s helpful to track metrics like requests per second to particular APIs, number of database calls to the server and time to complete a request. Developers can then use these metrics to do things like identifying problematic code or pinpointing areas of the application with high traffic that might require more of a focus on performance and security. Meanwhile, operations teams can use this data to track the overall health of the system and keep tabs on specific details related to application workloads. Specifically, these higher level application metrics can help determine which databases might need to run on a separate service or get scaled up based on system traffic.
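As a toy illustration of the developer-side metrics mentioned above, here is a hypothetical in-process collector that tracks request counts and latency per endpoint. A real application would use an established client library (for example, a Prometheus client) rather than hand-rolling this:

```python
from collections import defaultdict

class RequestMetrics:
    """Toy in-process metrics collector (illustrative only)."""

    def __init__(self):
        self.request_counts = defaultdict(int)    # requests per endpoint
        self.latencies = defaultdict(list)        # observed durations (s)

    def observe(self, endpoint, duration_s):
        """Record one completed request against an endpoint."""
        self.request_counts[endpoint] += 1
        self.latencies[endpoint].append(duration_s)

    def avg_latency(self, endpoint):
        """Average time to complete a request for an endpoint."""
        samples = self.latencies[endpoint]
        return sum(samples) / len(samples) if samples else 0.0

metrics = RequestMetrics()
metrics.observe("/api/orders", 0.120)
metrics.observe("/api/orders", 0.080)
print(metrics.request_counts["/api/orders"])  # → 2
```

Even this small amount of structure is enough to answer the questions in the paragraph above: which endpoints are hot, and which are slow.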
4) Monitoring through distributed tracing
Distributed tracing follows a single request as it moves through the different services in your system. This type of cross-service monitoring is important because service failures usually don’t come in isolation; rather, they occur as part of the larger system. It’s easy to test different pieces of your application in isolation, but in the real world those services talk to one another, and you need to know how those communications work, especially when it comes to resolving more subtle performance issues. For example, if there’s a slowdown in the billing experience, it might not be obvious just by looking at the billing service what’s causing that slowdown. But looking at everything together might reveal that billing is slow because of the connection to a third-party payment processor. Distributed tracing directly connects those disparate pieces, making it an extremely helpful form of monitoring for getting to the bottom of subtle performance or error-related issues.
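The mechanism that makes this possible is context propagation: every service stamps its work with a shared trace ID and passes that context downstream (usually in HTTP headers). This hypothetical sketch shows the core idea; real systems follow standards like W3C Trace Context and report spans to a backend such as Jaeger or Zipkin:

```python
import uuid

def start_trace():
    """Create a new trace context at the edge of the system."""
    return {"trace_id": uuid.uuid4().hex, "parent_span_id": None}

def child_span(ctx, service_name, spans):
    """Record a span for one service and return the context to
    propagate to the next service downstream."""
    span_id = uuid.uuid4().hex[:16]
    spans.append({
        "trace_id": ctx["trace_id"],
        "span_id": span_id,
        "parent": ctx["parent_span_id"],
        "service": service_name,
    })
    return {"trace_id": ctx["trace_id"], "parent_span_id": span_id}

# A request flows billing -> payment processor; both spans share one trace ID,
# so a slow payment call shows up inside the billing request's timeline.
spans = []
ctx = start_trace()
ctx = child_span(ctx, "billing", spans)
ctx = child_span(ctx, "payment-processor", spans)
print(spans[0]["trace_id"] == spans[1]["trace_id"])  # → True
```

Because every span carries the same trace ID, a tracing backend can reassemble the full request path and show exactly where the time went.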
Top tools for monitoring in Kubernetes
As you set up these types of monitoring in Kubernetes, you’ll need a few external solutions along the way to help with everything from metrics collection to data visualization. There are plenty of options available, but here are the top tools to help kick off your evaluation:
Kubernetes has a small, built-in metrics collector, but it’s not scalable or meant to work as a database. As a result, most teams add a separate solution to power deeper metrics collection. Both of the most popular solutions offer a time series database, meaning they allow you to track data and see how it changes over time.
- Prometheus: Prometheus is one of the most popular metrics collectors. It is a cloud native solution that can scale up for extremely large workloads but remains very easy to use. In addition to collecting metrics, Prometheus also does alerting and can be tied back into Kubernetes to power things like Deployment and Node auto-scaling. Many Kubernetes applications have built-in support to report their metrics directly to Prometheus.
- InfluxDB: InfluxDB is also a fairly popular tool. It has its own ecosystem, working natively with tools for real-time data processing and visualization while InfluxDB handles the data collection.
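Part of Prometheus’s popularity comes from how simple its scrape model is: applications expose their metrics as plain text on an HTTP endpoint (conventionally `/metrics`), and the Prometheus server pulls them on a schedule. The sketch below renders metrics in that plain-text exposition format; the metric name and labels are illustrative:

```python
def render_prometheus_metrics(metrics):
    """Render metrics in Prometheus's plain-text exposition format.

    `metrics` maps a metric name to a (labels, value) pair; each line
    comes out as: name{label="value",...} value
    """
    lines = []
    for name, (labels, value) in metrics.items():
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

sample = {
    "http_requests_total": ({"method": "GET", "path": "/api/orders"}, 1027),
}
print(render_prometheus_metrics(sample))
# → http_requests_total{method="GET",path="/api/orders"} 1027
```

This is why so many Kubernetes applications can report to Prometheus out of the box: supporting it only requires serving a text page in this format.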
Once you have metrics in place, you need a visualization layer that will allow you to build queries against that data and view the information over time in graphical form. When it comes to this layer, one tool typically proves most popular.
- Grafana: Grafana is an open source project from a commercial company that has built-in support for Prometheus (though it can also connect to other data sources like InfluxDB). It offers an easy-to-use interface out of the box and scales well for large workloads.
Distributed tracing tracks application performance across different services, including databases and third-party services. There are two tools in particular to which you should pay attention for distributed tracing.
- Jaeger: Jaeger is a popular open source project from the CNCF. Although it only supports a handful of languages/frameworks, they are all popular among developers.
- Zipkin: Zipkin is an older distributed tracing system that supports a wide array of languages/frameworks.
Finally, there are a couple of paid services that offer a broader set of capabilities all in one package. If you want to go this route, there are two solutions worth looking into closely.
- Datadog: Datadog is a paid service for all kinds of monitoring that does everything from low-level server monitoring to cross-application distributed tracing. It is part of a full DevOps suite and while it can be expensive, that cost is offset by not having to set up and maintain your own monitoring infrastructure.
- New Relic: New Relic is more of a legacy solution that is particularly well known because it has deep integrations into popular development languages and frameworks. It does offer server monitoring, but the biggest value of New Relic comes from its high level application performance monitoring. New Relic is relatively easy to set up and start getting information about how your code is behaving and what’s going on within your Kubernetes clusters, which is not always the case with other systems.
Getting started with monitoring in Kubernetes
Monitoring has always been essential to maintaining a successful application, and this continues to be the case as you work with Kubernetes. While the importance of monitoring hasn’t changed, there are certain unique approaches and sets of tools you need to consider as you think through monitoring in a Kubernetes environment. The areas outlined here should help ensure your team has everything you need to know to get started with monitoring in Kubernetes.
Interested in learning more? Contact Spaceship today to discover how we can help give your operations and delivery workflows a boost.