Honeycomb.io: Modern Observability for Developers and Operations

Last month at DevWeek 2022 I got to talk to a lot of cool vendors and learn what they do and, more specifically, what problems they solve and for whom. One such company was Honeycomb.io, which delivers modern observability. I began chatting with Phillip Carter, and he was gracious enough to give me a visual tour of what Honeycomb.io provides to both developers and operations and what it enables them to do.

Transcript

Honeycomb.io - Phillip Carter

[00:00:00] Barton: All right, coming to you live from devweek/cloud here in Austin, Texas. I'm here with Phillip Carter. Phillip, how are you?

[00:00:07] Phillip Carter: Hi, I'm doing just fine. My name's Phillip. I work for Honeycomb. We are a modern observability tool, and so observability, what does that mean? It can kind of mean different things to different people. So, if you're coming from like an operations background, you probably care about things like service health, making sure that things are actually up, like service A, that is supposed to be talking to service B, is actually talking to service B, talking to the right version of service B.

[00:00:37] You wanna make sure. Oh, like, is something gonna go down? How do I know? Like if we gotta go fix something, how do I know what to go and fix and why? And so like observability is kind of a fancy word, but it's like a modern evolution of monitoring and APM and some of that stuff that you might be more familiar with.

[00:00:52] Barton: And so you work with both app developers and operations. How does that work?

[00:00:58] Phillip Carter: Yeah, we do. So the way that it generally works: you generate data in production about your service and how it's running, and then it gets sent over to Honeycomb, and then you can analyze it and do interesting things with it.

[00:01:10] So the idea of you generating data, that can kind of depend on who you are. So if you're an application developer, you might load an OpenTelemetry SDK, where OpenTelemetry generates what's called telemetry data for production services. It's a CNCF project. That's the second-largest CNCF project out there right now, actually.

[00:01:29] And so you can generate that using, like, an SDK and some APIs, and you're good to go as a developer. But if you're in operations and you're like, okay, I just wanna see that everything is doing what I think it's doing, I don't wanna have to dive into the source code yet, I don't want to do a deep investigation, work, planning, all that kind of stuff, there are options. So if you're a Java developer, or if you're using Python or…

[00:01:52] Barton: You mean like one of these languages here?

[00:01:53] Phillip Carter: Like one of those languages right there, yeah. You can then install an agent that runs alongside your application, and it'll do things like track incoming HTTP requests, outgoing requests. It'll detect if you're using certain libraries and add what's called instrumentation for that. And so it's a very easy way to get started, kind of depending on where you're coming from. And when you have that and you want to send it to a tool like Honeycomb, now it's like, all right, what do you do with it?

[00:02:19] So let me just quickly show a few things here. [Starts speaking to what's on the screen] So this page, there's a few things that we got here. You can see there's this spike in latency here. There's things called total spans. You can go to status code, you can get a little bit of a dashboard view about what's going on here. I won't go into the core analysis. Instead, I wanna show something that's gonna be likely relevant, especially to people who are owning these services and wanna make sure that they're healthy and say,

[00:02:43] Barton: When you say owning, you mean on the back end?

[00:02:46] Phillip Carter: Yeah. On the back end. So for example, there's this tool called SLOs, or service level objectives. And so this is how you define what the health of your service is. And you can then track, is it actually healthy? And then you can establish what's called a budget for when certain things are going wrong. So for example, we have something on latency now. It's okay if there's a little bit of latency here and there, but if there's a ton of latency all the time, or if we've had a lot of latency over time and we're gonna reach a threshold where it's like, okay, we should probably go in and fix something.

[00:03:20] How do you know? Right. How can you see over time if you're gonna… So we've established a budget of what our acceptable latency is. And if we're going to go to that budget, we have like 27 point something percent of our budget left over. Once this reaches a certain threshold, we're gonna start getting notifications that say, hey, you have X amount of time left over before you exhaust your budget, you should plan this work instead of dealing with like a fire drill immediately about what's going on.
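
To make the budget idea concrete, here is a minimal Python sketch of how a latency error budget could be computed. It is not Honeycomb's implementation: the `LatencySLO` class, the threshold and target values, and the 25% notification cutoff are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LatencySLO:
    """Hypothetical SLO: a target fraction of requests must finish under a threshold."""
    threshold_ms: float  # requests slower than this count against the budget
    target: float        # e.g. 0.99 means 99% of requests must be fast enough


def budget_remaining(slo: LatencySLO, latencies_ms: Sequence[float]) -> float:
    """Return the fraction of the error budget still unspent (negative if overspent)."""
    total = len(latencies_ms)
    if total == 0:
        return 1.0  # no traffic, nothing spent
    allowed_slow = (1.0 - slo.target) * total
    actual_slow = sum(1 for ms in latencies_ms if ms > slo.threshold_ms)
    if allowed_slow == 0:
        # A 100% target leaves no budget at all.
        return 1.0 if actual_slow == 0 else float("-inf")
    return 1.0 - actual_slow / allowed_slow


def should_notify(remaining: float, alert_below: float = 0.25) -> bool:
    """Assumed policy: warn once less than `alert_below` of the budget is left."""
    return remaining < alert_below
```

With a 90% target over 100 requests, 10 slow requests are allowed; if 8 were slow, roughly 20% of the budget is left and the assumed policy would send a notification so the work can be planned before the budget runs out.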

[00:03:50] And so then there's analysis tools where you can, well, look at what's called the different dimensions of the data. And so this is tracking latency over time, and you can actually see that if I hover over here, there's this thing called /cart/checkout, or the name here, cart checkout.

[00:04:08] There's probably this service, the checkout service, that seems to be the one causing a lot of latency problems. So now I just did a quick glance and I'm able to get an understanding of what I should probably go and investigate. And if I want to go there, I can do this whole group by field thing.

[00:04:27] And so what this will do is this will actually show me a whole bunch of other stuff. I can start looking at a heat map of some of the different services. I can go into this query builder right here, and I can group by a lot of different fields that are in the data, or if I'm just exploring things I'm not looking at an alert or something like that, I just wanna look at the same thing and I want to go here.

[00:04:52] I can actually get a similar kind of view with a thing called BubbleUp that will allow me to sort of generate the same thing. And, you know, I can see similarly cart checkout is showing up here. This is showing a latency spike. And so that's the thing that is relevant here. I can then click in, and this is where application developers come in.
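
The grouping step described above can be sketched in a few lines of Python. This is only an illustration of grouping events by a field and comparing latency per group, not how Honeycomb's query engine works; the event shape and the field names `route` and `duration_ms` are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def p95(values: List[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return ordered[rank - 1]


def group_latency_by(
    events: Iterable[Mapping[str, object]],
    field: str,
    latency_field: str = "duration_ms",  # assumed field name
) -> Dict[object, float]:
    """Group events by `field` and return the p95 latency of each group."""
    groups: Dict[object, List[float]] = defaultdict(list)
    for event in events:
        groups[event.get(field, "(missing)")].append(float(event[latency_field]))
    return {key: p95(durations) for key, durations in groups.items()}


def slowest_group(events: Iterable[Mapping[str, object]], field: str) -> object:
    """Return the value of `field` whose group has the highest p95 latency."""
    by_group = group_latency_by(events, field)
    return max(by_group, key=by_group.get)
```

Calling `slowest_group(events, "route")` on events where one route is consistently slow would surface that route, which mirrors the way the checkout route stands out in the demo.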

[00:05:12] Clicking in here will take me to a trace. And this is generated from the actual code of the service itself. And then I can see, okay, there's a lot of time being spent inside of this block called D get discounts. And I see it's calling the database a whole lot. It's probably a bug in my code here, but I was able to very quickly identify that something is going on. Okay, we're hitting the database a whole lot. Let me go and investigate. I can go and talk to one of my developers and be like, hey, when was the last time you changed talking to the database? Or maybe I could just go and fix it myself. But I started from not knowing anything, and then within a minute I was able to identify, okay, there's something I can go and take a look at now. And that's kinda what Honeycomb is all about.

[00:05:53] Barton: Very cool. And so just to end with, what's coming next for Honeycomb? What's the area you wanna get into?
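
As a rough illustration of what reading that trace amounts to, the Python sketch below counts the child spans under a named parent span. The `Span` shape and the span names `get_discounts` and `db.query` are hypothetical; real trace data, such as what OpenTelemetry emits, carries more fields than this.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """Simplified, hypothetical span: one timed unit of work inside a trace."""
    span_id: str
    parent_id: Optional[str]  # None for the root span
    name: str
    duration_ms: float


def child_call_counts(spans: List[Span], parent_name: str) -> Dict[str, int]:
    """Count direct child spans, by name, under every span called `parent_name`."""
    parent_ids = {s.span_id for s in spans if s.name == parent_name}
    return dict(Counter(s.name for s in spans if s.parent_id in parent_ids))


def child_time_ms(spans: List[Span], parent_name: str) -> float:
    """Total duration of the direct children of spans called `parent_name`."""
    parent_ids = {s.span_id for s in spans if s.name == parent_name}
    return sum(s.duration_ms for s in spans if s.parent_id in parent_ids)
```

A high count of database spans under a single parent is the kind of pattern that shows up visually in the trace view as one block calling the database over and over.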

[00:05:58] Phillip Carter: Ooh, there's a couple of cool features coming next that I can't talk about yet, but in October we're gonna have an announcement for, I would say at a high level, everything that you saw here is.

[00:06:09] We like to say that we're the fastest observability tool. We want it to be the fastest, just straight up. So everything performance related, everything just making what exists,

Barton: The need for speed.

Phillip Carter: The need for speed. More improvements in OpenTelemetry. So we're major contributors to the OpenTelemetry project.

[00:06:24] So ways that you can generate that data. We wanna make it as easy as possible: more agents that you can install on your backend to get basic visibility, better APIs for developers to work with, so that it's as easy as possible for them to add those SDKs and, you know, add the tracing and then get on with their lives and continue to add value to our business. And we've got some product announcements coming at KubeCon, and so I would keep your eyes peeled for that.

[00:06:50] Barton: All right. We'll see you in Detroit for the announcements. Phillip Carter, thank you.
