In the first post of our Amazon EKS series, we went deep into what EKS is and how organizations that run Kubernetes can benefit from it. In this second installment, we’ll learn why it’s essential for organizations to monitor EKS logs, along with how to do it.
To recap, Amazon Elastic Kubernetes Service (Amazon EKS), originally launched as Amazon Elastic Container Service for Kubernetes, is a managed service that lets organizations run Kubernetes on AWS without having to maintain their own K8s control plane, similar to Google’s GKE and Microsoft’s Azure Kubernetes Service (AKS).
Amazon EKS provisions clusters of worker nodes using an Amazon Machine Image (AMI) and an AWS CloudFormation template. Because Amazon provisions, scales, and manages the Kubernetes control plane for your organization across several availability zones (AZs), IT teams are relieved of the operational burden of running the control plane themselves.
Read more: What is Amazon EKS?
Logging and Monitoring Amazon EKS using CloudWatch
CloudWatch is the built-in monitoring platform for AWS. Organizations use CloudWatch to collect and track metrics on AWS cloud resources and the apps that run on AWS. Many organizations rely on CloudWatch to collect and monitor log data, set alerts and alarms, and configure workflows to automatically audit and react to changes in AWS environments. In EKS, CloudWatch is one of the simplest ways to aggregate and expose metrics for EKS clusters.
In 2019, Amazon announced that Amazon EKS can send log data from the K8s control plane to Amazon CloudWatch Logs. This makes it easier for teams to monitor changes to their systems and to see how their EKS clusters are performing. EKS offers several log types, each corresponding to a specific component of the provisioned Kubernetes control plane. You can read more about Kubernetes Components here.
What are EKS Logs?
Logs are invaluable to IT teams because they are crucial for monitoring the health of data and systems. Logging is especially helpful for validating the health of your Kubernetes clusters. It also makes it easier for teams to generate alerts so that operational events are reviewed regularly or whenever something unusual happens.
With the control plane logging feature, teams select the log types they need, and the log data is sent as log streams to a log group for the corresponding EKS cluster in CloudWatch.
The following log types are supported:
- API server logs (api) - Your EKS cluster’s API server is the Kubernetes component that exposes the K8s API. The API server logs contain information on API requests made to your EKS cluster.
- Audit logs (audit) - Audit logs provide a record of the individual users, admins, and system components that have interacted with your EKS cluster via the API.
- Authenticator (authenticator) - Authenticator logs are unique to EKS. They contain information on authentication requests made to your cluster, which EKS handles for Kubernetes Role-Based Access Control (RBAC) using IAM credentials.
- Controller manager (controllerManager) - The controller manager log type contains data on the core control loops that are shipped with Kubernetes.
- Scheduler (scheduler) - The scheduler log type records when and where Kubernetes pods are run in your cluster.
Collecting Logs from EKS Clusters
As mentioned, Amazon announced in 2019 that log data can be sent from the K8s control plane straight to Amazon CloudWatch Logs. Before this update, it wasn’t possible to access control plane log data to monitor activity and audit changes in EKS clusters.
You use the Amazon EKS control plane logging feature by enabling the log types you need for each of your EKS clusters; the feature is disabled by default. Log types are enabled on a per-cluster basis, and you can do so in the AWS Management Console, through the AWS CLI, or through the EKS API. Enabling a log type means that its log data is sent as log streams to CloudWatch Logs in the same AWS account and stored there. Note that control plane logging sends logs with a log verbosity level of 2.
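Because logging is off by default and configured per cluster, it can help to look at a cluster's current settings before changing them. The sketch below wraps the aws eks describe-cluster command in a small shell function. The function name show_cluster_logging is hypothetical, and the default region is only an assumption that matches the examples in this post.

```shell
# Print the control plane logging configuration of one EKS cluster.
# show_cluster_logging is an illustrative helper name, not an AWS command.
show_cluster_logging() (
  cluster_name="$1"
  region="${2:-us-west-2}"   # assumed default, same region as the examples here
  aws eks describe-cluster \
    --region "$region" \
    --name "$cluster_name" \
    --query 'cluster.logging.clusterLogging' \
    --output json
)

# Usage (requires AWS credentials):
#   show_cluster_logging prod us-west-2
```

The output lists the log types together with an enabled flag, so you can see at a glance what would change before you update the configuration.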
How to Enable Control Plane Logging in the AWS Console
- Go to the Amazon EKS console on AWS.
- Choose the cluster you want to enable logging on. Click on the name to display cluster information.
- In the Logging section, choose Update.
- Enable or disable the log types accordingly. Note that logging for all log types is disabled by default.
- Click on Update.
How to Enable Control Plane Logging via the AWS CLI
Control plane logging is available in AWS CLI version 1.16.139 or higher.
Step 1: Check the AWS CLI version.
Check the installed AWS CLI version with the command aws --version.
If your AWS CLI version is lower, follow Installing the AWS Command Line Interface in the AWS Command Line Interface User Guide to update to a newer version.
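If you want to script this check, a minimal sketch could look like the following. The helper names installed_cli_version and version_at_least are hypothetical, and the parsing assumes that aws --version prints a string starting with aws-cli/ followed by the version number.

```shell
# Extract "1.16.139" from output such as "aws-cli/1.16.139 Python/3.7.3 ...".
# Some CLI builds print the version to stderr, so both streams are read.
installed_cli_version() {
  aws --version 2>&1 | sed -n 's|^aws-cli/\([0-9][0-9.]*\).*|\1|p'
}

# Succeed (exit status 0) if version $1 is greater than or equal to version $2.
# Both arguments are dotted numeric versions with up to three parts.
version_at_least() {
  awk -v have="$1" -v want="$2" 'BEGIN {
    split(have, h, "."); split(want, w, ".")
    for (i = 1; i <= 3; i++) {
      if (h[i] + 0 > w[i] + 0) exit 0
      if (h[i] + 0 < w[i] + 0) exit 1
    }
    exit 0
  }'
}

# Usage:
#   if version_at_least "$(installed_cli_version)" 1.16.139; then
#     echo "AWS CLI supports control plane logging"
#   else
#     echo "Please update the AWS CLI"
#   fi
```

Comparing the parts numerically avoids the trap of comparing version strings as text, where 1.9 would sort after 1.16.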
Step 2: Update control plane log export configuration.
Once you’ve verified that your AWS CLI version supports the control plane logging feature, use this command to update the cluster’s control plane log export configuration, substituting your own region, cluster name, and desired log types.
aws eks --region us-west-2 update-cluster-config --name prod \
    --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'
The output will look like this:
{
    "update": {
        "id": "883405c8-65c6-4758-8cee-2a7c1340a6d9",
        "status": "InProgress",
        "type": "LoggingUpdate",
        "params": [
            {
                "type": "ClusterLogging",
                "value": "{\"clusterLogging\":[{\"types\":[\"api\",\"audit\",\"authenticator\",\"controllerManager\",\"scheduler\"],\"enabled\":true}]}"
            }
        ],
        "createdAt": 1553271814.684,
        "errors": []
    }
}
Note the update ID as you will need it to check the status of the configuration update.
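The JSON passed to --logging is easy to mistype by hand, especially if you only want some of the log types. As a sketch, a small helper can build that value from a list of type names. The function name logging_config is hypothetical; the accepted names are the five log types listed earlier in this post.

```shell
# Build the JSON value for --logging from a list of log types.
# Usage: logging_config <true|false> <type>...
# logging_config is an illustrative helper name, not part of the AWS CLI.
logging_config() (
  enabled="$1"
  shift
  types=""
  for t in "$@"; do
    case "$t" in
      api|audit|authenticator|controllerManager|scheduler) ;;
      *) echo "unknown log type: $t" >&2; exit 1 ;;
    esac
    # Append the quoted type name, comma-separated after the first one.
    types="${types:+$types,}\"$t\""
  done
  printf '{"clusterLogging":[{"types":[%s],"enabled":%s}]}\n' "$types" "$enabled"
)

# Usage (enables only the API server and audit logs):
#   aws eks --region us-west-2 update-cluster-config --name prod \
#     --logging "$(logging_config true api audit)"
```

Rejecting unknown names locally means a typo such as controllermanager fails before any request is sent.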
Step 3: Check the status of the log export configuration update.
Using the cluster name and the update ID returned, use the following command to check the status of the update.
aws eks --region us-west-2 describe-update --name prod --update-id 883405c8-65c6-4758-8cee-2a7c1340a6d9
The output will look like this:
{
    "update": {
        "id": "883405c8-65c6-4758-8cee-2a7c1340a6d9",
        "status": "Successful",
        "type": "LoggingUpdate",
        "params": [
            {
                "type": "ClusterLogging",
                "value": "{\"clusterLogging\":[{\"types\":[\"api\",\"audit\",\"authenticator\",\"controllerManager\",\"scheduler\"],\"enabled\":true}]}"
            }
        ],
        "createdAt": 1553271814.684,
        "errors": []
    }
}
The update has been completed when the status says Successful.
Note that the command above enables every available log type, so all of them will be sent to CloudWatch Logs. These logs are charged according to standard Amazon CloudWatch Logs pricing.
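Rather than re-running describe-update by hand, you could poll it until the update leaves the InProgress state. The following is a sketch only: the function name wait_for_update is hypothetical, and the 10-second interval and 30-attempt limit are arbitrary assumptions.

```shell
# Poll an EKS update until it is no longer in progress.
# wait_for_update is an illustrative helper name; the polling interval
# and attempt limit below are arbitrary choices.
wait_for_update() (
  cluster_name="$1"
  update_id="$2"
  region="${3:-us-west-2}"
  attempts=0
  while [ "$attempts" -lt 30 ]; do
    status="$(aws eks describe-update \
      --region "$region" \
      --name "$cluster_name" \
      --update-id "$update_id" \
      --query 'update.status' \
      --output text)"
    case "$status" in
      Successful) echo "update $update_id succeeded"; exit 0 ;;
      Failed|Cancelled) echo "update $update_id ended as $status" >&2; exit 1 ;;
    esac
    attempts=$((attempts + 1))
    sleep 10
  done
  echo "timed out waiting for update $update_id" >&2
  exit 1
)

# Usage:
#   wait_for_update prod 883405c8-65c6-4758-8cee-2a7c1340a6d9 us-west-2
```

The --query option extracts only the status field, so the loop compares a single word instead of parsing the full JSON response.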
How to Monitor Cluster Control Plane Logs on CloudWatch
Now that you’ve enabled control plane logging, it’s time to learn how to view the logs on the CloudWatch console. Here are the steps to do just that:
- Open the CloudWatch console for EKS clusters in AWS via this link: https://console.aws.amazon.com/cloudwatch/home#logs:prefix=/aws/eks
- Choose the log group of the EKS cluster you want to view control plane logs for. The log group name contains the cluster name, in the form /aws/eks/cluster-name/cluster.
- Choose the specific log stream to view. The stream names vary by log type:
  - Kubernetes API server component logs (api) - kube-apiserver-clustername
  - Audit (audit) - kube-apiserver-audit-clustername
  - Authenticator (authenticator) - authenticator-clustername
  - Controller manager (controllerManager) - kube-controller-manager-clustername
  - Scheduler (scheduler) - kube-scheduler-clustername
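The same logs can also be read from the command line with aws logs filter-log-events. The sketch below maps each log type to the stream name prefix from the list above and searches the cluster's log group. The helper names stream_prefix_for and search_control_plane_logs are hypothetical, the log group name assumes the /aws/eks/cluster-name/cluster form, and the 20-item limit is an arbitrary choice.

```shell
# Map an EKS log type to the log stream name prefix listed above.
stream_prefix_for() {
  case "$1" in
    api)               echo "kube-apiserver-" ;;   # note: also matches audit streams
    audit)             echo "kube-apiserver-audit-" ;;
    authenticator)     echo "authenticator-" ;;
    controllerManager) echo "kube-controller-manager-" ;;
    scheduler)         echo "kube-scheduler-" ;;
    *) echo "unknown log type: $1" >&2; return 1 ;;
  esac
}

# Print up to 20 log events of one log type for a cluster.
# An optional third argument is passed as a CloudWatch Logs filter pattern.
search_control_plane_logs() (
  cluster_name="$1"
  log_type="$2"
  filter_pattern="${3:-}"
  prefix="$(stream_prefix_for "$log_type")" || exit 1
  set -- logs filter-log-events \
    --log-group-name "/aws/eks/${cluster_name}/cluster" \
    --log-stream-name-prefix "$prefix" \
    --max-items 20
  if [ -n "$filter_pattern" ]; then
    set -- "$@" --filter-pattern "$filter_pattern"
  fi
  aws "$@"
)

# Usage (the search term is only an example):
#   search_control_plane_logs prod authenticator AccessDenied
```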
Monitoring Logs via the AWS API
Aside from the AWS console and CLI, AWS provides SDKs that third-party tools can use to interact with AWS APIs. These APIs make it possible for organizations and third-party providers to develop custom solutions. You can read more here.
Logs and events provide the granular details that organizations can break down and investigate to see a record of what’s happening in their clusters, but it’s also essential to monitor metrics, which give teams a higher-level view for regular monitoring.
Monitoring EKS Metrics
In non-managed Kubernetes, it’s critical to monitor master nodes because they determine the life of a K8s cluster. Since AWS manages the master nodes and the control plane in EKS, admins have no direct access to granular metrics on the master nodes. Kubernetes has a built-in setup for detecting outages: the kubelet collects data on the state of pods, nodes, and containers, and these metrics can be accessed via the Metrics Server. Because the Metrics Server only captures the short-term state of K8s resources, there’s still a need to capture metrics as a time series for easier analysis and visualization.
Because EKS is part of the larger AWS platform, monitoring of EC2 instances is already set up, with the data stored in CloudWatch. There is also Container Insights, an AWS service that provides metrics about pods, nodes, and containers, which are also stored in CloudWatch. However, the cost of custom metrics on CloudWatch, at $0.30 per metric per month, can be a challenge for high-volume systems.
A way around this is the more widely taken road of using Prometheus to monitor EKS clusters; the metrics it collects can be pushed to CloudWatch or sent to third-party monitoring tools like Sumo Logic.
kube-state-metrics
Used in managed Kubernetes environments like EKS, kube-state-metrics generates metrics on the status of objects, including containers, nodes, and pods by listening to the Kubernetes API server. Kube-state-metrics can be deployed using Helm. There are other ways to access cluster information, but kube-state-metrics makes it easier for teams to view metrics using monitoring services like Sumo Logic.
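As a rough sketch of the Helm route, the commands below install kube-state-metrics from the prometheus-community chart repository using Helm 3 syntax. The choice of that chart repository, the release name, and the default namespace are assumptions rather than something this post prescribes, and the function name install_kube_state_metrics is hypothetical.

```shell
# Install kube-state-metrics with Helm 3.
# Assumptions: the prometheus-community chart repository, a release named
# kube-state-metrics, and kube-system as the default namespace.
install_kube_state_metrics() (
  namespace="${1:-kube-system}"
  helm repo add prometheus-community https://prometheus-community.github.io/helm-charts &&
    helm repo update &&
    helm install kube-state-metrics prometheus-community/kube-state-metrics \
      --namespace "$namespace"
)

# Usage (kubectl must already point at your EKS cluster):
#   install_kube_state_metrics kube-system
```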
We will go over how to monitor EKS metrics using Sumo Logic in the final part of this series. For now, let’s go over our suggested metrics to collect via kube-state-metrics. For a complete list of all metrics and their descriptions, read the documentation here.
| Resource | Name in kube-state-metrics | Description |
| --- | --- | --- |
| Daemonsets | kube_daemonset_status_current_number_scheduled | Number of nodes correctly running at least one daemon pod |
| | kube_daemonset_status_desired_number_scheduled | Number of nodes specified for running the daemon pod |
| | kube_daemonset_status_number_misscheduled | Number of nodes running a daemon pod where they are not supposed to |
| | kube_daemonset_status_number_unavailable | Number of nodes that should be running the daemon pod but have none running and available |
| | kube_daemonset_metadata_generation | Sequence number of a specific generation of the desired state |
| Deployments | kube_deployment_metadata_generation | Sequence number of a specific generation of the desired state |
| | kube_deployment_spec_paused | Info on whether the deployment is paused and not to be processed by the deployment controller |
| | kube_deployment_spec_replicas | Number of desired pods for a deployment |
| | kube_deployment_spec_strategy_rollingupdate_max_unavailable | Max number of unavailable replicas in a rolling deployment update |
| | kube_deployment_status_observed_generation | The generation observed by the deployment controller |
| | kube_deployment_status_replicas_available | Number of available replicas per deployment |
| | kube_deployment_status_replicas_unavailable | Number of unavailable replicas per deployment |
| Nodes | kube_node_info | Information about a cluster node |
| | kube_node_spec_unschedulable | Info on whether new pods can be scheduled on a node |
| | kube_node_status_allocatable | Info on allocatable resources of a node |
| | kube_node_status_capacity | Info on a node's resource capacity |
| | kube_node_status_condition | Info on the condition of a cluster node |
| Pods | kube_pod_container_info | Info about a container in a pod |
| | kube_pod_container_resource_requests | Amount of resources requested by a container |
| | kube_pod_container_resource_limits | Limit on the amount of resources a container can use |
| | kube_pod_container_status_ready | Info on whether the container readiness check succeeded |
| | kube_pod_container_status_terminated_reason | Explains why the container is currently in a terminated state |
| | kube_pod_container_status_waiting_reason | Explains why the container is currently in a waiting state |
| | kube_pod_status_phase | The current phase of a pod |
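To illustrate how a few of these metrics might be used, the sketch below sends instant queries to the Prometheus HTTP API with curl. It assumes a Prometheus server that already scrapes kube-state-metrics and is reachable at PROM_URL (for example through a port-forward); that variable, the function name prom_query, and the example thresholds are all hypothetical.

```shell
# Run an instant query against the Prometheus HTTP API.
# PROM_URL is an assumption: it must point at a Prometheus server that
# scrapes kube-state-metrics. The localhost default is only a placeholder.
prom_query() {
  curl --silent --get \
    --data-urlencode "query=$1" \
    "${PROM_URL:-http://localhost:9090}/api/v1/query"
}

# Example checks built from metrics in the table above.
unavailable_replicas_query='kube_deployment_status_replicas_unavailable > 0'
pending_pods_query='kube_pod_status_phase{phase="Pending"} == 1'
cordoned_nodes_query='kube_node_spec_unschedulable == 1'

# Usage:
#   prom_query "$unavailable_replicas_query"
#   prom_query "$pending_pods_query"
```

Each query returns only the series that match the condition, so an empty result means no deployment, pod, or node is currently in the state being checked.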
How EKS is Monitored with Sumo Logic
Kube-state-metrics makes it possible for teams to aggregate this information and view metrics for different components in Sumo Logic, so they can make better decisions based on correlations across logs, metrics, and events for all of their EKS control plane components. In the final post of this series, we’ll go deeper into monitoring and explain how to set up Sumo Logic to monitor EKS. We’ll also expand on the features and benefits of using Sumo Logic to get the most insight out of your Kubernetes clusters.