Why I Hate Bugs, and What I’ve Decided to Do About Them

I started OzCode nine years ago for one simple reason: I hate debugging.
Earlier in my career, I worked for a medical device company, where any bug in our software could directly affect people’s health. That raised the stakes, and the stress and frustration, of every debugging session.

That experience, and others like it, taught me to master exception handling, setting breakpoints, and deciphering call stacks. Yet I was always amazed at how hard it was to track failures down and fix them, especially once the code was deployed to a production (or even staging) environment. So I started OzCode to turn debugging into a straightforward process instead of a guessing game.

You Don’t Watch Movies Backwards

There are plenty of error monitoring tools on the market, such as Raygun, Stackify, and Microsoft Application Insights. Sure, they give you a nice dashboard that lists all the different exceptions and gives you an idea of where each error occurred. The problem is that looking at error reports is like seeing the last frame of a murder mystery movie, when everyone is already dead. These tools hint at how you got to the point of failure without telling you the whole story.

One frame just isn’t enough (Dial M for Murder, 1954)

Moreover, as much as I love the technological advances behind distributed apps, microservices, serverless technologies, and everything else that enables a great customer experience, debugging in today’s cloud-native development cycle is harder than ever. With the shift to the cloud, new code often runs minutes after it is written, usually in a Docker container somewhere in the cloud, with little or no visibility into how it executes. That makes errors even tougher to track down.

We also face the challenge of distributed code: most tech stacks are composed of numerous microservices or serverless functions, which makes tracing difficult, if not impossible. Think about it: when a user clicks a button in their web browser, it sends a request to a REST API, which promptly calls another service, which in turn calls yet another service. Good luck tracing an error through that chain of completely isolated pieces of software.

The Bane of Log Files

So your error monitoring tool or APM sends you an alert (or angry customers call to yell at you), and you know you have a bug. Now you have to reproduce the issue and isolate what’s unique about that particular faulty scenario.

The usual approach to production debugging is to dig through the log files of each service running in the cloud, since logs are often the only way to gain visibility into a cloud service. But, pardon my language, log files suck! They are usually scattered across different file system locations, machines, and dashboards. To continue the murder mystery metaphor: log files give you isolated frames from the crime movie. You catch a flash of a knife or a gun, but you still have to work out which weapon was used, who the killer is, what the motive was, and how it all ends.

Reading through log files is time-consuming and inefficient. You don’t know exactly what the problem is, so you can’t immediately search for a particular symptom. And of course, setting a breakpoint and attaching a debugger is out of the question, because pausing execution at a breakpoint would halt the live service for real users.

You also need to figure out which version of the software is live. Then you have to wade through massive amounts of logging data to find the log lines that correlate to the problem. Think you’ve got it? The next step is to form a hypothesis and then try to validate it.

Usually, you don’t have enough information to understand the root cause of the bug, so your first guess doesn’t pan out. It’s back to adding more log lines and trying again. That means changing your code, redeploying it to the production environment, and waiting for the error to recur. You’ll likely go through several iterations before you fully understand and resolve the issue.
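To make the log-correlation step described in this section a little more concrete, here is a minimal Python sketch of what “finding the log lines that correlate to the problem” can look like once you have collected logs from several services. It assumes, hypothetically, that every line has the form `<ISO-8601 timestamp> <request id> <message>` and that all services tag their lines with a shared request ID; real log formats vary widely. The `correlate` function, service names, and sample lines are illustrative only and are not part of OzCode or any logging library.

```python
from datetime import datetime


def correlate(log_sources, request_id):
    """Collect every log line tagged with request_id across services,
    ordered by timestamp so the request's path reads top to bottom."""
    matches = []
    for service, lines in log_sources.items():
        for line in lines:
            # Hypothetical line format: "<ISO-8601 timestamp> <request id> <message>"
            timestamp, rid, message = line.split(" ", 2)
            if rid == request_id:
                matches.append((datetime.fromisoformat(timestamp), service, message))
    # Tuples sort by timestamp first, so lines from different services interleave in time order
    matches.sort()
    return [f"{ts.isoformat()} [{service}] {message}" for ts, service, message in matches]


# Illustrative logs from two hypothetical services
logs = {
    "web-api": [
        "2024-01-01T10:00:00 req-42 received POST /orders",
        "2024-01-01T10:00:01 req-7 received GET /health",
    ],
    "billing": [
        "2024-01-01T10:00:02 req-42 charge failed: card declined",
    ],
}

for entry in correlate(logs, "req-42"):
    print(entry)
```

Even in this idealized case, you only see the frames someone thought to log in advance. If the line that explains the failure isn’t there, you are back to adding log statements and redeploying.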

Spend Your Time Producing, Not Reproducing

Now that I’ve named the main reasons I hate debugging, here’s what I decided to do about it: build OzCode and put an end to the pain of reproduction. With OzCode, it’s like watching the director’s cut. Our lightweight agent technology gives you the full version of the “crime movie,” letting developers use time travel debugging to step through the elusive chain of causality that led up to the moment of failure.

OzCode lets you see every frame, from the first to the last, eliminating guesswork and minimizing the time it takes to remove errors from your production environment. The agent isn’t limited to production, either: you can also run it during staging and QA to silently monitor applications for errors.

When an error occurs, OzCode captures a time travel recording of the particular flow of execution that led to the exception, using contextual logging to give you the full picture (the whole film, in our murder mystery metaphor). The results of this trace are delivered through a browser-based, IDE-like debugging experience. Instead of slaving through never-ending log files, developers can finally debug the actual failure without having to reproduce it!

And because so many teams today are geographically distributed, OzCode’s debugging agent works across locations, helping remote teams collaborate more efficiently. Debugging becomes a web-based, collaborative experience: you start a session simply by sending a link, so teams can solve issues faster than ever before.

From a CTO’s or R&D manager’s point of view, OzCode saves money. Fixing bugs in production can be far more expensive than fixing them during development or testing, because production bugs can cause downtime and customer churn. When issues appear in a live product, speedy triage is essential, and every hour cut from the investigation reduces that cost.

Who Can Use OzCode? (Not just horror mystery buffs!)

The OzCode Production debugger has something for everyone in the company who is involved in solving production bugs, from SREs to QA teams. It is, of course, ideal for developers using .NET Framework or .NET Core (on Windows or Linux), including those building ASP.NET web apps, Windows Services, and traditional desktop applications.

OzCode takes the pain out of your debugging process by giving you context: the full story of your bug.

So if you’re tired of using old methods to solve increasingly difficult problems, you’d make a perfect OzCode beta user. Sign up for a beta trial, rid yourself of debugging headaches, and spend more time producing code that works.