It’s never a question of if your software will contain bugs, but rather when and how they will be detected. That’s why we run a barrage of tests on our software before deploying it to Production. These tests do detect bugs, and we reject builds to fix those bugs, but with premium websites going down, company stocks collapsing, and the occasional rocket falling out of the sky, it’s clear that bugs do get through to Production.
Now, we know that the later you detect a bug in the software development lifecycle (SDLC), the more expensive it is to fix. That’s why, when a bug is detected in Production, fixing it becomes paramount to virtually everybody in the company. Unfortunately, bugs in Production are also the most difficult ones to solve. Compared to earlier phases in the SDLC, which are very controlled environments, you generally have much less information about things like the exact scenario, the data involved, and the sequence of events.
Borrowing a concept from one of our earlier blog posts, fixing a Production bug is kind of like solving a murder mystery. To discover “whodunnit” (i.e., determine the root cause), you need to follow a trail of clues. What, then, are the clues that make up the perfect bug report, so developers can fix Production bugs quickly and for good? The real answer to this question is in three words at the end of this post, but if you want the details, read on.
Reproducing the error
A set of instructions on how to reproduce the bug is probably the most critical element of a perfect bug report. The developer needs to reproduce the bug in order to examine the internal state of the software and understand what caused it. After all, how can you expect a developer to fix a bug without seeing what’s happening under the hood? This is also a common source of friction between developers and QA engineers. QA provides the developer with a set of instructions, the developer follows the instructions, and … nada. And so we come to, “It works on my machine.”
Ozcode takes this challenge head-on and, instead of solving the problem, pre-empts it altogether. With Ozcode Production Debugger, QA just shares a link with the developer. Upon clicking the link, the developer is taken to the corresponding Debugging Screen, where he has access to all the information about the error, recorded exactly as it occurred in its live runtime environment. No need to reproduce the error, and “It works on my machine” is gone forever.
Now that we have reproducing the error out of the way, we need to understand what happened. This is where log files come in. Log files contain a wealth of information. In fact, developers log so much information that it can occupy terabytes on your server. When it comes down to it, extracting just the right information needed to understand the problem at hand becomes like finding the proverbial needle in a haystack. Not only that, log files can be distributed between different modules and microservices, so they all need to be collected and analyzed together. How does one put together all these pieces of the puzzle? Well, the perfect bug report would do that for you.
In fact, that’s what Ozcode Production Debugger does. When an exception is thrown, all the relevant log entries are gathered from the different modules and microservices that participated in the execution flow of the error and assembled in one convenient view for the developer to examine.
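To make the idea of gathering log entries from several services into one view concrete, here is a minimal sketch in Python. It is not how Ozcode Production Debugger works internally; it assumes that every service tags its log entries with a shared correlation ID, and the field names (`timestamp`, `correlation_id`, `message`), the service names, and the sample entries are all hypothetical.

```python
from datetime import datetime

# Hypothetical log entries, keyed by the service that wrote them.
SAMPLE_LOGS = {
    "checkout": [
        {"timestamp": "2021-03-01T10:00:00.120", "correlation_id": "req-42",
         "message": "Order received"},
        {"timestamp": "2021-03-01T10:00:00.480", "correlation_id": "req-42",
         "message": "Payment failed: card declined"},
        {"timestamp": "2021-03-01T10:00:00.150", "correlation_id": "req-43",
         "message": "Order received"},
    ],
    "payments": [
        {"timestamp": "2021-03-01T10:00:00.310", "correlation_id": "req-42",
         "message": "Charging card"},
    ],
    "inventory": [
        {"timestamp": "2021-03-01T10:00:00.200", "correlation_id": "req-42",
         "message": "Reserved 1 item"},
    ],
}


def collect_entries(logs_by_service, correlation_id):
    """Gather one request's entries from every service, ordered by time."""
    merged = []
    for service, entries in logs_by_service.items():
        for entry in entries:
            if entry["correlation_id"] == correlation_id:
                # Remember which service wrote the entry.
                merged.append({**entry, "service": service})
    merged.sort(key=lambda e: datetime.fromisoformat(e["timestamp"]))
    return merged


for entry in collect_entries(SAMPLE_LOGS, "req-42"):
    print(entry["timestamp"], entry["service"], entry["message"])
```

The sketch only works if the services agree on a correlation ID and have reasonably synchronized clocks, which is part of why assembling this view by hand is so tedious.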
Code execution flow
Log files do provide a lot of information, but each log entry is a very specific piece of data at a very specific moment in time. To continue our murder mystery analogy, you could consider log entries as particular frames of a movie. To get the full picture, the developer has to follow the trail (the call stack) from the dead body lying on the floor (the exception that was thrown) back to the perpetrator’s first steps (the point of entry) on the crime scene (the module or service) and examine the clues (local variables, environment variables, method return values) at each step of the way. Fortunately, Ozcode Production Debugger is the sleuth that uncovers all this information.
The developer can click through each step of the call stack and travel through time to see every line of code that was executed and how the values of all the relevant variables changed with time.
But in the age of distributed computing, our crime may pass through several different locations (microservices), each with its own execution flow, call stack, variables and all. Not to worry: our Sherlock Holmes manages that by providing a timeline of events across the microservices that participated in the crime, showing all network requests and database queries.
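The single-process part of that trail, walking from the exception back to the point of entry and reading the local variables at each step, can be sketched with Python’s standard `traceback` module. This is only an illustration of the concept, not Ozcode’s mechanism (which targets .NET and also records how values change over time); the functions `parse_quantity`, `handle_order`, and `snapshot_failure` are hypothetical.

```python
import traceback


def capture_frames(exc):
    """Return one record per stack frame, from the raising frame back to the entry point."""
    records = []
    for frame, lineno in traceback.walk_tb(exc.__traceback__):
        records.append({
            "function": frame.f_code.co_name,
            "line": lineno,
            # repr() keeps a printable snapshot of each local variable.
            "locals": {name: repr(value) for name, value in frame.f_locals.items()},
        })
    # walk_tb yields the outermost frame first; reverse so the "body" comes first.
    records.reverse()
    return records


def parse_quantity(raw):
    return int(raw)  # raises ValueError for non-numeric input


def handle_order(order):
    quantity = parse_quantity(order["quantity"])
    return quantity * order["unit_price"]


def snapshot_failure():
    """Trigger a failure and return the captured trail of frames."""
    try:
        handle_order({"quantity": "three", "unit_price": 5})
    except ValueError as exc:
        return capture_frames(exc)
    return []


for record in snapshot_failure():
    print(record["function"], record["line"], record["locals"])
```

Note that this captures only the values at the moment the exception was raised, a single frame of the movie, rather than the full history of every variable.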
Urgency and priority
So many bugs, so little time. Which ones should the developer address first? While setting priority for a fix might not really be QA’s job, you do want to give the engineer leading the sprint the tools to decide how she should prioritize the various bugs that are detected. One criterion that could be useful is how frequently an exception is thrown. Something that happens a lot could mean a bad user experience, but that’s not really enough. Even if an exception is thrown a lot, if you’re only getting it in spikes at infrequent intervals, it could be an edge case that your users only rarely encounter. So the last time an exception was thrown is also an important criterion to consider.
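One way to combine the two criteria, how often an exception is thrown and when it was last thrown, is a score that discounts frequency by age. The sketch below is an assumption for illustration only: the exponential decay, the 24-hour half-life, and the `ExceptionStats` type are not taken from Ozcode or from any standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ExceptionStats:
    name: str
    count: int           # times thrown during the observation window
    last_seen: datetime  # when it was last thrown


def priority_score(stats, now, half_life_hours=24.0):
    """Frequency weighted by recency: the weight halves every half_life_hours."""
    age_hours = (now - stats.last_seen).total_seconds() / 3600
    return stats.count * 0.5 ** (age_hours / half_life_hours)


def rank(all_stats, now):
    """Order exceptions from highest to lowest priority score."""
    return sorted(all_stats, key=lambda s: priority_score(s, now), reverse=True)


now = datetime(2021, 3, 8, 12, 0)
stats = [
    # Thrown often, but only in a spike a week ago.
    ExceptionStats("NullReferenceException", 500, now - timedelta(days=7)),
    # Thrown less often, but still happening an hour ago.
    ExceptionStats("TimeoutException", 40, now - timedelta(hours=1)),
]
for s in rank(stats, now):
    print(s.name, round(priority_score(s, now), 2))
```

With these made-up numbers, the week-old spike ranks below the less frequent but still active exception. A real team would tune the half-life, or use a different weighting entirely, to match how its users are affected.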
Elementary, my dear Watson…
The three words for a perfect bug report
If you’ve ever done QA and found yourself sweating to assemble the right log files, describe the steps to reproduce an error, specify the environment where the error occurred, the build number, and on and on, you might appreciate how much easier it would be if all you needed to do to provide a perfect bug report was send a (here are those three words I promised at the beginning of the post) single shareable link. That’s all it takes with Ozcode Production Debugger. The bug just appears in your dashboard. You select the bug, copy the link from your browser address bar, and send it to your developer. Ozcode “Sherlock Holmes” Production Debugger has done the rest.