IT systems are constantly growing in complexity. This makes it increasingly difficult to understand how a certain infrastructure-related issue would impact your application, end users and your business overall. The opposite problem is also challenging to address: “I know that my users are not able to use my application, but which part of my infrastructure is causing this misbehavior?”
Not being able to efficiently map infrastructure behavior to end user experience may result in longer problem resolution times, chaotic troubleshooting, extended downtime and, consequently, lost revenue. Learn how to avoid these problems with best practices that help you resolve user-impacting issues efficiently.
Understand the dependencies between your applications and infrastructure
For small systems, it might be possible to maintain a centralized database that documents all of the relationships between components. However, this becomes harder as your system grows, adds new microservices and relies on more ephemeral components. At that point, maintaining a detailed, low-level dependency database is impossible without automation.
To address this challenge in Sumo Logic, we’ve introduced the concept of an entity, which allows the Sumo Logic platform to understand your data (logs, metrics and traces) and map it to a given part of your system. An entity can be an infrastructure component (such as a physical server, Kubernetes node or cloud instance) or a logical concept (such as a service or an application).
The Sumo Logic platform currently recognizes entities for:
Applications and services
Application components
To get the most out of this automation, we recommend collecting logs, metrics and traces. However, even a single type of data can give you a better understanding of your system.
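To make the idea of an entity more concrete, the following Python sketch shows one way telemetry could be indexed by entity. It is a hypothetical illustration, not Sumo Logic’s implementation: the `Telemetry` record type, the `ENTITY_KEYS` mapping and the sample records are assumptions, and the attribute names borrow from OpenTelemetry resource conventions (`service.name`, `k8s.node.name`, `host.name`). Because each record is indexed under every entity it references, a single trace can link a service to the node it runs on.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical mapping from a resource attribute to an entity type. The
# attribute names follow OpenTelemetry resource conventions; this is an
# assumption for illustration, not Sumo Logic's internal schema.
ENTITY_KEYS = {
    "service.name": "service",
    "k8s.node.name": "kubernetes_node",
    "host.name": "host",
}


@dataclass
class Telemetry:
    kind: str         # "log", "metric" or "trace"
    attributes: dict  # resource attributes attached at collection time
    body: str


def group_by_entity(records):
    """Index each record under every (entity_type, name) pair it references."""
    index = defaultdict(list)
    for record in records:
        for key, entity_type in ENTITY_KEYS.items():
            value = record.attributes.get(key)
            if value is not None:
                index[(entity_type, value)].append(record)
    return index


# Hypothetical sample data: one trace, one metric and one log line.
RECORDS = [
    Telemetry("trace", {"service.name": "login-service", "k8s.node.name": "node-1"},
              "POST /login took 2300 ms"),
    Telemetry("metric", {"k8s.node.name": "node-1"}, "cpu_usage=97"),
    Telemetry("log", {"service.name": "login-service"}, "timeout calling user-db"),
]

if __name__ == "__main__":
    for (entity_type, name), recs in group_by_entity(RECORDS).items():
        print(entity_type, name, [r.kind for r in recs])
```

With only the log lines, you would still get a `login-service` entity; adding traces and metrics is what connects it to `node-1`.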
Establish a procedure to pivot from symptoms to causes quickly
Understanding the dependencies and relationships between your components is important, but it may not be sufficient if you don’t know how to switch context between different parts of your infrastructure. To be fully prepared to investigate and resolve problems with your application, you should be able to answer the following questions:
Which system components should you investigate if a given application is impacted?
Which metrics or data points are important?
How can you identify potential problems?
The Sumo Logic platform answers these questions and lets you quickly switch context, not only between application and infrastructure but also between different types of data. For example, if a monitor alerts you about a high error rate for an application, you can pivot to investigating the log lines for a server running this application.
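Answering the first question amounts to walking a dependency graph from the impacted application toward the components it relies on. The Python sketch below does this with a breadth-first search, so the nearest dependencies come first. The graph, the entity names and `candidates_to_investigate` are hypothetical; in practice the relationships would be derived from your telemetry rather than written by hand.

```python
from collections import deque

# Hypothetical dependency graph: each entity maps to the entities it calls
# or runs on. All names are illustrative only.
DEPENDENCIES = {
    "app:web-shop": ["service:login-service", "service:cart-service"],
    "service:login-service": ["pod:login-7f9c", "service:user-db"],
    "service:cart-service": ["pod:cart-5d2a"],
    "service:user-db": ["host:db-01"],
    "pod:login-7f9c": ["node:node-1"],
    "pod:cart-5d2a": ["node:node-2"],
}


def candidates_to_investigate(impacted, graph):
    """Return every entity that `impacted` depends on, nearest first."""
    seen = {impacted}   # guards against revisiting entities in cyclic graphs
    order = []
    queue = deque([impacted])
    while queue:
        current = queue.popleft()
        for dep in graph.get(current, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


if __name__ == "__main__":
    print(candidates_to_investigate("service:login-service", DEPENDENCIES))
```

For `service:login-service`, this lists the pod and the database service before the node and host beneath them, and it never touches the unrelated cart components.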
The example below shows an alert configured to fire on high latency for “login-service”. The latency spike is clearly visible on the chart, but the reason behind it may be hard to determine. With Sumo Logic, you can use preconfigured dashboards to get an overview of various parts of your infrastructure and pivot to the right context with just a few clicks:
Prepare in advance the data you might need to resolve problems
The final step in preparing to respond to a production issue is ensuring that you have the right telemetry data available, along with a way to review it quickly. Know where to find your data, how to identify outliers and how to drill further into the issue. Figuring this out in advance can save you precious time when troubleshooting an outage.
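One simple way to flag outliers is to compare each data point with a trailing baseline and report values that sit more than a few standard deviations away (a z-score test). The Python sketch below assumes evenly spaced samples; the window size, the threshold of 3 and the CPU values are illustrative assumptions, and this is a generic technique rather than the detection method Sumo Logic uses.

```python
from statistics import mean, stdev


def find_outliers(values, window=5, threshold=3.0):
    """Return indices whose value differs from the mean of the preceding
    `window` values by more than `threshold` standard deviations."""
    outliers = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)  # sample standard deviation; needs window >= 2
        if sigma == 0:
            # Perfectly flat baseline: treat any change as an outlier.
            if values[i] != mu:
                outliers.append(i)
        elif abs(values[i] - mu) / sigma > threshold:
            outliers.append(i)
    return outliers


# Hypothetical CPU usage samples (percent) for one node, one per minute.
CPU_USAGE = [21, 23, 22, 24, 22, 23, 21, 95, 22, 23]

if __name__ == "__main__":
    print(find_outliers(CPU_USAGE))
```

Once a spike enters the baseline window it inflates the standard deviation, so the points right after it are not reported; a median-based baseline is a common alternative if that behavior is undesirable.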
Once you ingest your application and infrastructure data into Sumo Logic, the information is immediately available to you in the right context. Your logs, metrics and traces are categorized, processed and displayed in out-of-the-box dashboards, which speeds up the troubleshooting process.
In the Kubernetes dashboard below, we can see that CPU usage spiked on the corresponding node, likely causing the high latency on the application side. This demonstrates how complex problems involving several parts of the system can be resolved quickly with the right tools and preparation.
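The reasoning in this example, matching a latency alert on a service to a CPU spike on the node underneath it, can be sketched as a time-window filter over anomalies on the service’s dependencies. The Python sketch below is a hypothetical illustration: the anomaly list, timestamps, dependency names and the 15-minute window are made-up assumptions, and `correlated_anomalies` is not a Sumo Logic API.

```python
from datetime import datetime, timedelta

# Hypothetical anomalies already detected on infrastructure entities.
ANOMALIES = [
    ("node:node-1", datetime(2024, 1, 1, 10, 2), "CPU usage 97%"),
    ("node:node-2", datetime(2024, 1, 1, 10, 4), "CPU usage 91%"),
    ("host:db-01", datetime(2024, 1, 1, 10, 40), "disk latency high"),
]

# Hypothetical latency alert on login-service and the entities it depends on.
ALERT_TIME = datetime(2024, 1, 1, 10, 5)
LOGIN_DEPS = ["pod:login-7f9c", "service:user-db", "node:node-1", "host:db-01"]


def correlated_anomalies(alert_time, dependencies, anomalies,
                         window=timedelta(minutes=15)):
    """Keep anomalies on the alerting service's dependencies that occurred
    within `window` of the alert, in either direction."""
    deps = set(dependencies)
    return [
        (entity, ts, message)
        for entity, ts, message in anomalies
        if entity in deps and abs(ts - alert_time) <= window
    ]


if __name__ == "__main__":
    for entity, ts, message in correlated_anomalies(ALERT_TIME, LOGIN_DEPS, ANOMALIES):
        print(entity, ts.isoformat(), message)
```

Here only the CPU spike on `node:node-1` survives: the `node:node-2` spike happens at the right time but on a node the service does not use, and the `host:db-01` event falls outside the window.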
Learn more about how Sumo Logic can help you better understand your system and improve your troubleshooting workflows:
Complete visibility for DevSecOps
Reduce downtime and move from reactive to proactive monitoring.