I've been debugging with logs for as long as I can remember. We all have. Logging is an easy way to extract data from Production and inspect it. However, logs can be ineffective for many reasons; for one, you never seem to have the log you need. Dynamic logs present a new paradigm for debugging Production systems and a way to finally make logs an effective debugging tool.
This post is based on a webinar I recently gave. To view the full webinar, scroll down to the end of the post.
Production debugging with logs – for and against
There are several advantages to Production debugging with logs:
- Logs can show a sequence of events.
- You can easily filter to errors (provided those errors were indeed logged).
- Logs are very reliable. They’re unaffected by crashes or even computer restarts.
- Logs can report usage telemetry.
- They can be aggregated from many different sources (machines, files, databases).
- Logs can help you detect issues that can only be understood when you have a lot of data.
That said, debugging with logs also presents several challenges:
- It can be very hard to find the relevant logs. You have to sift through mountains of data, and finding “just the right log entry” can be the proverbial needle in a haystack.
- In practice, you never have enough logs.
- Microservices make debugging with logs even more difficult. Extracting all the relevant logs requires extra work, like adding correlation IDs (see the sketch below).
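To give a feel for that extra work, here's a minimal sketch of correlation-ID plumbing as ASP.NET Core middleware using Serilog's LogContext. Serilog is an assumption (the structured-logging examples later in this post use its message-template syntax), and the middleware class and header name are hypothetical:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Serilog.Context;

// Hypothetical middleware: stamps every log entry written during a request
// with a correlation ID so entries can be stitched back together across
// services. Every service in the call chain needs something like this, and
// the logger must be configured with .Enrich.FromLogContext().
public class CorrelationIdMiddleware
{
    private readonly RequestDelegate _next;

    public CorrelationIdMiddleware(RequestDelegate next) => _next = next;

    public async Task InvokeAsync(HttpContext context)
    {
        // Reuse the caller's ID if one was passed in; otherwise mint a new one.
        var correlationId = context.Request.Headers["X-Correlation-ID"].FirstOrDefault()
                            ?? Guid.NewGuid().ToString();

        // Every log written inside this scope carries a CorrelationId property.
        using (LogContext.PushProperty("CorrelationId", correlationId))
        {
            await _next(context);
        }
    }
}
```

Multiply that by every service in the chain, plus forwarding the header on outgoing calls, and you can see why this counts as extra work.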
Consequently, debugging with logs usually entails several rounds of: analyze, add more logs, rebuild, redeploy, reproduce the error (which itself can be next to impossible), and start again.
Debugging with logs gets easier if they’re structured
When structured logs came along, they made logging both much simpler and more useful. Where once you had to write something like this:
var requestInfo = new { Url = "https://myurl.com/data", Payload = 12 };
_log.Information("Request Info: url=" + requestInfo.Url + ", Payload=" + requestInfo.Payload);
Now you can just write:
var requestInfo = new { Url = "https://myurl.com/data", Payload = 12 };
_log.Information("Request Info is {@RequestInfo}", requestInfo);
This is a simple example with only two data items to log, but imagine requestInfo is a big, complex object with many data items.
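For instance, here's a rough sketch of destructuring a larger object with the same one-liner; the extra fields are invented purely to show nesting:

```csharp
var requestInfo = new
{
    Url = "https://myurl.com/data",
    Payload = 12,
    // Hypothetical extra fields, just to show how nested data comes along for free.
    Method = "POST",
    Headers = new { Accept = "application/json", UserAgent = "my-client/1.0" },
    RetryCount = 3
};

// The @ destructuring operator captures the whole object graph as structured data:
// no string concatenation, and every field becomes individually queryable.
_log.Information("Request Info is {@RequestInfo}", requestInfo);
```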
Once logs are more structured, you can do a lot more with them:
- Search and filter with rich rules.
- Collect and visualize numeric data.
And the possibilities continue to grow if you are piping your logs into a hosted ELK stack such as logz.io or another enterprise log analysis tool.
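For example, once Payload is a first-class field rather than text buried in a message, you can filter on it numerically. A Kibana-style KQL query over the fields from the example above might look like this (the field paths and index mapping are assumptions):

```
RequestInfo.Payload > 10 and RequestInfo.Url : "https://myurl.com/data"
```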
Finding needles by debugging with logs the Ozcode way
One of the big challenges with logs is that there are so many of them, and they can be all over the place: in different files, different databases, and even different machines. To make sense of your logs, you need to collect the relevant ones into one coherent sequence so you can understand what went wrong (remember that needle in a haystack I mentioned before?). Well, that's exactly what Ozcode does. Ozcode gathers all the log entries along the execution flow of an error. For example, the screenshot below shows all the logs relevant to the current HTTP request.
We see just the relevant logs with no effort. No need to extract them from your ELK stack, filter them by some correlation ID, or manage span IDs or trace IDs … you get the picture. With Ozcode, it's all just there.
Dynamic logging with tracepoints: the right logs, in the right place, at the right time
Logical errors are among the most difficult to diagnose and fix. They don’t throw exceptions, so there’s nothing to point you at where you should start looking. You can try to reproduce them locally; however, that’s a game of hit-and-miss. You could attach a debugger to your Production environment, but that’s usually a complicated process, and you can’t really stop your Production server with breakpoints.
So, we’re back to logs.
But the problem with log-based debugging today is that Production errors are completely unpredictable. You never know what is going to go wrong; if you did, you would have written your code differently in the first place. Therefore, you also can't know what you need to log, and you can't log everything all the time. Log files would quickly accumulate to petabytes, and imagine the sea of information you'd have to swim through to find the right entries anyway. So, it becomes the tedious, iterative process described above.
What you need is the right logs, in the right place, at the right time.
This is exactly what Ozcode gives you today with tracepoints. Here’s how it works.
1. Create a new Tracepoint session.
2. Select the class you want to inspect.
3. Ozcode will display the decompiled source code for that class, taken directly from your live Production environment.
4. Select the locations where you want to place tracepoints.
5. At each location, specify a structured log that can include any number of parameters (see the example below).
6. Hit Start Collecting at the top of the screen.
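As a concrete example, the log you type into a tracepoint at step 5 might look like this; I'm assuming the same message-template style as the structured-logging examples above, and the field names are invented:

```
Processing order {OrderId} for customer {CustomerId}: total={Total}, items={ItemCount}
```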
Now you can invoke the application flow you want to inspect, and as the code runs through each tracepoint, it will output the log you specified to the console.
A tracepoint is not just a dynamic log entry. It’s a full snapshot of the code execution. You can click on any tracepoint hit and inspect the local variables at that point in time, just like time-travel debugging.
Ozcode tracepoints enhance your capabilities for debugging with logs in two ways. First, you can add logs at runtime without having to push code to a repo, redeploy the app, and reproduce the problem. Second, you can capture snapshots at any place in your code, kind of like what you get with regular breakpoints, except that Ozcode doesn't stop the Production server while debugging. This completely transforms your Production debugging experience and empowers you to solve in minutes what would have taken hours or days without dynamic logging and tracepoints from Ozcode.