The Road to Observability from System Down to Code

When Datadog shows you something has happened with your software at a system level, Ozcode takes you on an observability journey down to code level.

Datadog is an industry-leading observability platform that brings a wide variety of observability data into one integrated view. From details captured in your processed logs, Datadog lets you switch to traces to see how the corresponding user request was executed. If your software throws an error, Datadog displays the full stack trace and then lets you use faceted search to drill down into the corresponding traces and logs to determine the cause of the issue. Datadog continuously monitors your production environment and provides system-level alerts such as traffic spikes, elevated latency, or looming bottlenecks to help you troubleshoot issues and keep things running smoothly.

The system-level data Datadog provides goes a long way toward determining the root cause of errors, but in many cases, observability at the level of logs, metrics, and traces does not provide enough information to understand what really went wrong with your application. Think of it like a car. If you see the temperature gauge rising, you might guess that you need to top up the radiator fluid. But are you sure that’s really why your engine is overheating? Is there a leak in your cooling system, or is the engine overheating because you’re losing oil? To find out, you have to pop the hood with Ozcode.

Popping the hood on your production environment

When Datadog shows you something has happened with your software, Ozcode pops the hood and takes you on an observability journey from system level down to code level. Datadog can provide a great starting point, showing you anomalies in metrics and even the stack trace of exceptions. From there, you go to Ozcode.

To investigate anomalies surfaced by Datadog, Ozcode lets you add dynamic logging using tracepoints. You can add these log entries on the fly to your live running code without having to deploy a new build through your CI/CD pipeline. Using dynamic logs to reveal the values of local variables, method parameters, and return values anywhere in your code goes a long way toward exposing the root cause of an incident.
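
To make the idea concrete, here is a minimal sketch of what "a log statement you can switch on at runtime" looks like. It is not how Ozcode works internally (a production debugger instruments the running process and needs no decorator in your source); it only imitates the behavior in plain Python. The names `traceable`, `ACTIVE_TRACEPOINTS`, `CAPTURED`, and `fill_order` are all hypothetical.

```python
import functools
import inspect
import json

# Hypothetical in-memory registry of function names whose tracepoint is switched on.
ACTIVE_TRACEPOINTS = set()
# Captured entries are kept here for inspection; a real tool would ship them to a log backend.
CAPTURED = []


def traceable(func):
    """Log parameters and return value of func, but only while its tracepoint is active."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        if func.__name__ in ACTIVE_TRACEPOINTS:
            # Map positional and keyword arguments back to parameter names.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            entry = {
                "tracepoint": func.__name__,
                "params": dict(bound.arguments),
                "return": result,
            }
            CAPTURED.append(entry)
            print(json.dumps(entry, default=str))
        return result

    return wrapper


@traceable
def fill_order(order_id, quantity, in_stock=5):
    # Illustrative business rule: an order can be filled only if stock covers it.
    return quantity <= in_stock


fill_order("A-1", 2)                   # tracepoint inactive: nothing is logged
ACTIVE_TRACEPOINTS.add("fill_order")   # switched on while the program keeps running
fill_order("A-2", 9)                   # this call is captured with its parameters
```

The point of the sketch is the toggle: the second call is captured without any change to `fill_order` itself and without restarting the program.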

Ozcode also pops the hood on exceptions. Ozcode autonomously captures any exception that your application throws along with full, time-travel debug information so you can step through the error execution flow with full visibility into your production data at every step of the way. This is what we call code-level observability.

When the impossible happens

Let’s see how this integration might work with an eCommerce nightmare.

Black Friday or some other purchasing frenzy is just around the corner. All systems are GO. Everything has been tested, retested, and reinforced.

And then, the impossible happens. Customers can’t complete checkout.

Everybody’s face-palming, and phones and pagers are going off everywhere in IT/Ops.

The first place your DevOps engineers go to is your observability platform. Datadog to the rescue.

A quick look at the service map shows which service is throwing errors.

Datadog Service Map
Image source: Datadog

Let’s drill down into the App Analytics screen for that service and investigate the errors.

Image source: Datadog
While the HTTP request to “checkout” returns a 200 OK, you see many errors and can even see the exception that is thrown. But what now? Now it’s time for developers to dig down into the code, and the collaborative features of both Datadog and Ozcode help break the silos between IT/Ops and developers to get them working together.


Time to pop the hood

Ozcode steps in when you need to start working with code. Setting up Ozcode to work with Datadog is easy – just install the Ozcode extension from the Datadog marketplace and get the Ozcode agent installed on your servers. Once you’re set up, Ozcode will show you all the exceptions you saw in Datadog, and now you can time-travel debug them with full visibility into the error execution flow on your live production environment.

But that’s not always enough. We also saw that even when the request returns a 200 OK, customers can’t seem to check out. Let’s dig a little deeper.

Observability hits code level

Going back to your Datadog dashboard, you discover that some critical requests are showing unusually high latency.

Datadog showing latency
Image source: Datadog

Let’s set a tracepoint (a.k.a. a dynamic log) in the method that tries to fill orders.

Set Tracepoint
Now, as customers continue trying to check out, you’ll start collecting tracepoint hits. This time, you’ll have the source code in front of you and will be able to view all the local variables in scope for each tracepoint. With the new integration, the Ozcode app is embedded right inside the Datadog platform, so you never have to leave.
Ozcode tracepoints in Datadog
Need even more data? No problem. You can keep adding tracepoints without worrying about performance until you have all the data you need. No need to rebuild and redeploy.


Let’s examine one of those dynamic log entries inside Datadog’s Log View.

The Log View correlates the Ozcode dynamic log entry to the trace of the request that generated it. Analyzing this visual representation of the internal workings of our application shows us exactly where the application is spending time and why checkout is taking too long.
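
Conceptually, this correlation is a join between log entries and trace spans on a shared trace identifier. The sketch below illustrates that join with made-up data. The field names (`trace_id`, `duration_ms`, `locals`) and the sample values are assumptions for illustration, not the actual schema used by Datadog or Ozcode.

```python
from collections import defaultdict

# Hypothetical trace spans, one per checkout request.
spans = [
    {"trace_id": "t1", "operation": "checkout", "duration_ms": 4200},
    {"trace_id": "t2", "operation": "checkout", "duration_ms": 180},
]

# Hypothetical dynamic log entries, each tagged with the trace that produced it.
dynamic_logs = [
    {"trace_id": "t1", "tracepoint": "fill_order", "locals": {"retries": 7}},
    {"trace_id": "t2", "tracepoint": "fill_order", "locals": {"retries": 0}},
]


def correlate(spans, logs):
    """Attach to each span the log entries that share its trace_id."""
    logs_by_trace = defaultdict(list)
    for entry in logs:
        logs_by_trace[entry["trace_id"]].append(entry)
    return [{**span, "logs": logs_by_trace.get(span["trace_id"], [])} for span in spans]


# Look at the slowest request and the variables captured while it ran.
slowest = max(correlate(spans, dynamic_logs), key=lambda span: span["duration_ms"])
print(slowest["trace_id"], slowest["logs"][0]["locals"])
```

In this invented data set, the slow request is the one whose captured `retries` value is high, which is the kind of link between latency and application state that the correlated view is meant to expose.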

Having discovered the problematic variable, you may now want to monitor it for a while to make sure a fix you implement is working correctly.

Let’s go back up the observability path to Datadog.

Since Ozcode pipes dynamic log output back to Datadog, you can use Live Tail and watch how your variables change in real time. In fact, you can use all of the platform’s analytics capabilities for your new live log entries.

Datadog Live Tail
Image source: Datadog

Using dynamic logging to pipe variables back into Datadog opens up a world of opportunity. You can watch how anything changes in real time on a new chart you define for your dashboard. To return to the car analogy, you’ve added gauges that measure your radiator fluid and oil level in real time, with no rebuild or redeploy.
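
Charting a variable from log entries amounts to bucketing the captured values by time and aggregating each bucket. Here is a rough sketch of that step, assuming each entry carries a `timestamp` in epoch seconds and a `locals` dictionary. Both field names and the function `to_time_series` are hypothetical, and an observability platform would do this aggregation for you.

```python
from collections import defaultdict
from statistics import mean


def to_time_series(entries, variable, bucket_seconds=60):
    """Average a captured variable per time bucket, returning (bucket_start, average) pairs."""
    buckets = defaultdict(list)
    for entry in entries:
        # Round the timestamp down to the start of its bucket.
        bucket_start = entry["timestamp"] - entry["timestamp"] % bucket_seconds
        buckets[bucket_start].append(entry["locals"][variable])
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Made-up tracepoint hits: the "retries" variable climbs in the second minute.
entries = [
    {"timestamp": 0, "locals": {"retries": 1}},
    {"timestamp": 30, "locals": {"retries": 3}},
    {"timestamp": 60, "locals": {"retries": 10}},
    {"timestamp": 90, "locals": {"retries": 20}},
]

series = to_time_series(entries, "retries")
print(series)
```

Each pair in `series` is one point on the chart, so a variable that starts drifting shows up as a rising line rather than as a wall of individual log entries.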

From system to code and back

Observability is critical to keeping systems running smoothly and fixing them when they don’t. Our journey into observability started at the system level when Datadog’s Service Map showed that one of our services was throwing errors. A look at the App Analytics screen revealed what the error was and even gave us the stack trace. To understand the root cause of the error, we first used Ozcode to time-travel debug it and then drilled down by adding tracepoints on the fly. These tracepoints generated dynamic logs, which we fed back into Datadog, and we even created ad-hoc metrics and visuals to monitor suspicious variables. As soon as a variable went off the scale, we could examine in great detail the live application state that caused it, which took us directly to the root cause of the error.

When you’re thinking about observability, you need to think about the full round trip: from the system, down to the code, and back.

Idan Shatz
