Together forever and never to part
Together forever we two
And don't you know
I would move heaven and earth
To be together forever with you
― Rick Astley
There are lots of things in this world that, when put together, prove to be more than the sum of their parts. For example, take a favorite dessert of nearby Napa Valley - vanilla ice cream, olive oil and sea salt. The first time you hear it, you might think -- “You are going to pour oil on my ice cream, and then top it off with some salt? Yuck! Thanks for ruining dinner.” Or take the simple but delicious peanut butter and jelly sandwich. Many Europeans think it's disgusting (these are the same cultures that find snails and Marmite delectable, so consider the source). Yet it is now so American that I have heard on good authority it’s being considered as a requirement for U.S. citizenship (along with eating apple pie and listening to Taylor Swift, of course).
So, before you start thinking this is a food blog, let’s get to the point. In the world of IT we have something similar: logs and metrics. Simply put, logs are usually records of something that happened at a point in time (e.g. application errors, warnings, and other application events), and metrics are typically measurements of something over time (e.g. the response time of the home page, server CPU, server memory). In the past, those types of data were often treated very differently. We at Sumo Logic believe that’s a problem, and we have worked to solve it. In this blog, I want to dig a little deeper into why logs and metrics go together like peanut butter and jelly.

Building the Tesla for Logs & Metrics

Once a software company builds a product and starts selling it, it’s very hard to change the skeleton of the thing - the architecture. Switching analogies for a second, consider that software is similar to cars. When automotive designers are faced with a new problem or scenario, the easiest approach is to tinker with the existing design. This works if you’re adding new tires to an SUV for off-roading. It does not work if you are designing a fully electric car. Tesla completely reworked everything - the transmission, the engine, even the sales model.

In the case of systems dealing with machine data, the tinkering approach has definitely not been successful. For example, some log analytics companies, having built engines to make sense of messy, unstructured logs, have also tweaked their mechanics to handle highly structured time series data (i.e. metrics). The end result is a lot of hoop jumping to get their engines to perform the way users expect. On the other hand, companies that have chosen to build metrics analysis engines have tried their hand at logs - or at least they say they have. In reality, however, the “logs” passing through those metrics engines are so massaged, with the messiness wrung mercilessly out of them, that you end up with something more akin to metrics data. This approach may improve engine performance, but it breaks down when it comes to accuracy -- all for the sake of expediency.

We dealt with this same conundrum at Sumo Logic. We decided to take a big risk and actually evolve our engine design, resulting in a next-generation architecture. We built a purpose-built, high-performance engine for time series data alongside, and integrated with, our high-performance log analytics engine. We didn’t forget the lessons of the past in building a highly scalable, multi-tenant application -- we just didn’t take shortcuts. The end result is a platform that treats both types of data natively, giving each, in my view, the respect it deserves.

It’s all About the Analytics

The secret of a great peanut butter and jelly sandwich is not just bringing unique ingredients together, but creating it with the entire sandwich experience in mind. The same applied to us when we unified logs and metrics: what’s the point of it all? We aren’t a data warehouse in the sky. People come to Sumo Logic to make sense of their data -- to do analytics. And the core of analytics is the type of questions asked of the data. By its nature, some data answers certain types of questions better than others. In the case of logs, users often find themselves looking for the proverbial needle in the haystack, or, to be more accurate, hundreds of needles sprinkled across a field of haystacks.
So, log analytics has to excel at “x-raying” those haystacks to get to the offending needles very quickly and, even better, at detecting patterns in the needles - basically troubleshooting and root cause analysis. Counting the needles, the haystacks, or even the hay straw itself is of secondary concern. With metrics, the goal is often to look for behavior or trends over time. Users rely on time-series metrics to understand how the things they measure change over time, and whether that change indicates something deeper beneath the surface. Fitness devices (like the Apple Watch) are a perfect example here. I can track my heart rate, running speed, distance, and so on over time. This helps me determine whether getting out of bed to run outside was remotely worth it. If my stats improve over time, then yes. If not, then I’m sleeping in!

At Sumo Logic, we knew we couldn’t afford to ignore the different ways that people use their data. And, simply put, we couldn’t treat metrics like a half-brother of logs. So, we focused on tools that help uncover system and application behavior, not just draw nice-looking graphs. Users can overlay related data to find anomalies and patterns. And we strove to make the system as real-time as possible, shaving off microseconds wherever we could so the data is as relevant and timely as possible.

Listen to the Users

A platform that optimizes for the data and the analytics, but ignores the user, will end up being stiff and hard to use. In building this new functionality at Sumo Logic, we’ve strived to understand our users. That meant investing a lot of time and energy in talking to them, listening to their feedback -- and then being willing to change. At the end of the day, that is why we embarked on this journey in the first place. Our customers intuitively understood that having multiple tools for logs and metrics was slowing them down. They have transitioned, or are transitioning, to a world of small, collaborative teams with cross-functional skill sets that own segments of their applications. The implication is that the majority of our users aren’t log experts, server experts, or network specialists, but software developers and DevOps teams that support modern applications. They know what they need to measure and analyze because they write the code and support it in the wild. What they are not interested in is learning the deep intricacies of machine data or advanced analytics.

So, after listening to our customers, we embarked on a journey to empower them to explore their data without learning complex search languages or getting a Ph.D. in machine learning. We want our users to be able to lose themselves in the analysis of their data without context-switching from one platform to another, and without diverting time away from their tasks. When it is 2 a.m. and the application is down, learning the theory of statistical analysis is definitely not fun.

So, I’ll wrap up with this: we are committed to providing the best possible experience for our users -- and that has meant questioning many of our own assumptions about how machine data analytics works. While it might have been easy to recognize that peanut butter and jelly belong together, it takes dedicated hard work and perseverance to get the combination exactly right. We didn’t do it to prove a point. We did it because our customers, in one way or another, asked us to.
They have taken immense risks on their own to ride the cutting edge of innovation in their fields, and they deserve a machine data analytics tool that is right there in the thick of it with them.

But we are strong, each in our purpose, and we are all more strong together.
― Van Helsing in Bram Stoker’s Dracula