Managing logs is a significant part of an SRE's daily grind. Scattered within heaps of log data are invaluable insights: small bits of information that can unveil underlying issues and patterns critical for system monitoring and troubleshooting. However, in an era where the volume of logs is astronomical, how do you discern the relevant from the irrelevant?
Sumo Logic's array of log analytics features comes to the rescue, wielding the might of artificial intelligence. Read on to learn how Sumo Logic's AI capabilities make log management more intelligible, efficient, and actionable.
Machine intelligence in system troubleshooting
In artificial intelligence, one of the foundational tasks is discerning patterns and filtering out irrelevant information via "clustering". Coincidentally, one of the quintessential challenges for DevOps practitioners is filtering out the noise for system troubleshooting. Among heaps of logs showcasing expected behavior and performance, you need to isolate logs relevant to troubleshooting and security insights. These logs often harbor information pertinent to errors, load issues, and latency anomalies.
AI-powered log analytics from Sumo Logic lets you compare sets of structured logs based on events of significance. Be it JSON, CSV, key-value, or other structured formats, Sumo Logic provides a smart tool to streamline anomaly detection.
Conditional analysis for events
Artificial intelligence models help you identify an event of interest through conditional statements. In the case of log analytics, the event condition is essentially akin to stating “I want to see logs where X happened”. To define these conditions, DevOps practitioners learn a structured query language tailored to the tool. The criteria then act as a filter set by the user, focusing only on logs that meet that specific condition.
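To make the idea of an event condition concrete, here is a minimal Python sketch of the same filtering step. It is not Sumo Logic's query language. The log lines, the field names (`status`, `path`, `latency_ms`), and the helper functions are all invented for illustration.

```python
import json

# Hypothetical structured log lines (JSON), invented for this example.
RAW_LOGS = [
    '{"timestamp": "2024-01-01T10:00:00Z", "status": 200, "path": "/home", "latency_ms": 35}',
    '{"timestamp": "2024-01-01T10:00:01Z", "status": 500, "path": "/checkout", "latency_ms": 910}',
    '{"timestamp": "2024-01-01T10:00:02Z", "status": 200, "path": "/cart", "latency_ms": 48}',
    '{"timestamp": "2024-01-01T10:00:03Z", "status": 503, "path": "/checkout", "latency_ms": 1200}',
]


def parse_logs(lines):
    """Parse each JSON log line into a dictionary."""
    return [json.loads(line) for line in lines]


def is_server_error(entry):
    """Event condition: 'I want to see logs where a 5xx response happened'."""
    return entry.get("status", 0) >= 500


def filter_logs(entries, condition):
    """Keep only the entries that satisfy the event condition."""
    return [entry for entry in entries if condition(entry)]


events = filter_logs(parse_logs(RAW_LOGS), is_server_error)
for entry in events:
    print(entry["timestamp"], entry["path"], entry["status"])
```

The condition is an ordinary predicate, so swapping in a different question (for example, latency above a threshold) only means passing a different function.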
Moreover, structured logs often incorporate elements like timestamps. This type of structured log data forms the basis for metrics. Utilizing tools like Sumo Logic’s time compare operator, you can search for scenarios where a particular error took place within a certain time period, multiple time periods, or an aggregate over multiple time periods in the past.
You can use these structured logs to evaluate the performance metrics of a website, such as the latency or the number of exceptions, before and after a deployment. You can also track malicious activity by comparing recent security events against prior averages. These “Event Conditions” essentially set a guideline that you can apply across a vast array of unstructured logs.
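The comparison against earlier time periods can be sketched in a few lines of Python. This is only an illustration of the idea and makes no claim about how Sumo Logic's time compare operator is implemented. The function names and the error timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def count_in_window(timestamps, start, end):
    """Count events with start <= timestamp < end."""
    return sum(1 for t in timestamps if start <= t < end)


def compare_to_history(timestamps, window_end, window, periods):
    """Return (current_count, baseline_average).

    The baseline is the mean event count over `periods` earlier
    windows, each the same length as the current window.
    """
    current = count_in_window(timestamps, window_end - window, window_end)
    past_counts = []
    for i in range(1, periods + 1):
        end = window_end - i * window
        past_counts.append(count_in_window(timestamps, end - window, end))
    return current, sum(past_counts) / periods


# Hypothetical error timestamps, for example before and after a deployment.
now = datetime(2024, 1, 1, 12, 0)
error_times = (
    [now - timedelta(minutes=m) for m in (5, 10, 20, 30, 40, 50)]
    + [now - timedelta(hours=1, minutes=m) for m in (10, 30)]
    + [now - timedelta(hours=2, minutes=m) for m in (15, 25, 45, 55)]
    + [now - timedelta(hours=3, minutes=m) for m in (20, 50, 59)]
)

current, baseline = compare_to_history(error_times, now, timedelta(hours=1), periods=3)
print(f"errors in the last hour: {current}, average of previous 3 hours: {baseline}")
```

A current count well above the baseline is the kind of signal worth investigating further.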
Within the Sumo Logic platform, you can also leverage LogReduce, which uses clustering to help security engineers drill into particular messages or data to see where attacks may have occurred. LogCompare can then help you find out when a specific attack happened. Combining these tools, you can unlock advanced log analytics to pinpoint the entities or properties associated with the events you are curious about.
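As a rough intuition for clustering log messages, the Python sketch below groups messages that share the same shape once variable parts are masked. This is a deliberately simple stand-in and should not be read as the algorithm LogReduce uses. The messages and the `signature` helper are invented for illustration.

```python
import re
from collections import Counter

# Matches numeric tokens such as ids, durations, and dotted values like IP addresses.
_VARIABLE = re.compile(r"\b\d+(?:\.\d+)*\b")


def signature(message):
    """Replace numeric tokens with a wildcard so similar messages collapse together."""
    return _VARIABLE.sub("*", message)


# Hypothetical log messages, invented for this example.
messages = [
    "Failed login for user 1042 from 10.0.0.5",
    "Failed login for user 77 from 192.168.1.9",
    "Connection timeout after 30 seconds",
    "Failed login for user 1042 from 10.0.0.7",
    "Connection timeout after 45 seconds",
    "Disk usage at 91 percent",
]

clusters = Counter(signature(m) for m in messages)
for pattern, count in clusters.most_common():
    print(count, pattern)
```

Six raw lines reduce to three patterns, which is the kind of reduction that makes a large log volume reviewable by a person.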
Pattern recognition and differential analysis
Machine intelligence systems often require contrasting data sets for better pattern recognition. With Sumo Logic's LogExplain, you can specify a second condition to contrast against the event-of-interest condition. If you don't provide one, LogExplain generates a comparison data set based on your event condition.
LogExplain will then process the data against the specified conditions, creating two data sets:
- Control data set - representing “normal” operations.
- Event-of-interest data set - portraying anomalies or significant events.
Once a condition is set, LogExplain parses the logs, matching each entry against the defined conditions. This AIOps capability automates the processing, so logs are sifted in real time. To explain the event, LogExplain captures frequent joint-column entries (like key-value pairs) that occur notably more often in the event-of-interest data set than in the control set.
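The comparison described above can be approximated with a short Python sketch. It is a simplified illustration under stated assumptions, not LogExplain's actual implementation. It counts single and paired key-value entries, then ranks them by how much more often they appear in the event-of-interest set than in the control set. The logs, field names, and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def itemsets(entry, fields, max_size=2):
    """Yield single and joint key-value combinations found in one log entry."""
    pairs = sorted((f, entry[f]) for f in fields if f in entry)
    for size in range(1, max_size + 1):
        yield from combinations(pairs, size)


def explain(logs, event_condition, fields, max_size=2):
    """Rank key-value combinations by (share in event set) - (share in control set)."""
    event = [e for e in logs if event_condition(e)]
    control = [e for e in logs if not event_condition(e)]
    event_counts = Counter(s for e in event for s in itemsets(e, fields, max_size))
    control_counts = Counter(s for e in control for s in itemsets(e, fields, max_size))

    results = []
    for itemset, n in event_counts.items():
        event_share = n / len(event)
        control_share = control_counts[itemset] / len(control) if control else 0.0
        results.append((itemset, event_share, control_share))
    results.sort(key=lambda r: r[1] - r[2], reverse=True)
    return results


# Hypothetical structured logs, invented for this example.
logs = [
    {"user": "alice", "region": "us-east-1", "error": "AccessDenied"},
    {"user": "alice", "region": "us-east-1", "error": "AccessDenied"},
    {"user": "alice", "region": "us-west-2", "error": "AccessDenied"},
    {"user": "bob", "region": "us-west-2", "error": "AccessDenied"},
    {"user": "bob", "region": "us-east-1", "error": None},
    {"user": "bob", "region": "us-west-2", "error": None},
    {"user": "carol", "region": "us-east-1", "error": None},
    {"user": "alice", "region": "us-east-1", "error": None},
]

ranked = explain(logs, lambda e: e["error"] == "AccessDenied", ["user", "region"])
top_itemset, event_share, control_share = ranked[0]
print(top_itemset, event_share, control_share)
```

Because no separate control condition is passed here, the control set is simply every log that does not match the event condition.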
Practical use cases for AIOps
Let's put theory into action with a few examples:
- AWS CloudTrail: Practitioners can use LogExplain to assess which users, IP addresses, AWS regions, and S3 event names most correlate with the S3 Access Denied error. This is discerned by comparing the prevalence of these entities in AWS CloudTrail logs that contain the S3 Access Denied errors against those that don't. These insights make troubleshooting a breeze, providing a clear direction for further investigation.
- Kubernetes: Imagine you come across a cluster of logs with the reason "FailedScheduling," indicating Kubernetes can't find suitable nodes for requested pods. With LogExplain, you can delve into which specific pods are affected and why they can't find a node.
Closing thoughts
The future of system reliability and troubleshooting hinges on AI-powered tools like Sumo Logic. In an increasingly complex IT landscape, SREs need tools that don’t just simplify their tasks but also empower them to be proactive and efficient. The intelligent capabilities for log analytics are steering SREs in the right direction, transforming the tedious task of log management into a more streamlined and precise science.
Learn more about the Sumo Logic Log Analytics solution.