As developers, when we’re debugging software, we first have to understand what that piece of code does. Only then can we try to figure out what went wrong (i.e., what’s the bug) and come up with a fix. Over the years, I’ve met many developers, and some are better at debugging than others. When interviewing developers, I routinely give them technical exercises to assess their .NET debugging skills. I’ve found that developers who are effective at debugging typically write better code. The features they develop are more robust and complete. So, it shouldn’t surprise anyone that improving your debugging skills will make you a better developer and even help you further your career. But can anyone improve their debugging skills?
Can you improve your .NET Debugging skills?
Some people may think that being good at debugging is a gift; you either have it, or you don’t. In a great book I recently read (see below), the author spelled it out nicely:
“Debugging is more science than art. It can and should be methodical, and as such, can be taught.”
I love that. Debugging can be taught, and it should be a part of every software development course. We can all improve our debugging skills. The question is, “How?”
The 5 habits of software developers for highly effective .NET debugging
To improve your .NET debugging skills, you need to think like a craftsman. You need to master the basics to perfection until they come naturally, almost effortlessly. However, forming good habits is hard. It requires time, practice, and diligence. As the saying goes, “There’s only one way to do a good job, and that is, to do a good job.” So, here are 5 habits you can develop that will make you a master of debugging:
- Reproduce consistently
- Understand what needs to be fixed
- Create failing tests to verify the potential fix
- Continue debugging until all the tests pass
- Verify that the bug has been fixed in Production
If you follow this workflow, you will be able to capture and understand a bug and resolve it quickly, effectively, and completely. You may sometimes be tempted to skip a step. Don’t do it. Each step is designed to keep you focused and optimize your work so that the following steps will be effective. Skipping a step can lead you in the wrong direction, eventually forcing you to start over and repeat all the steps from the beginning.
Let’s dive in.
Reproduce consistently
The first thing about a bug that should concern you is how to reproduce it. There are four steps to this habit. At the end of it, you should have two sets of instructions: a sure-fire way to reproduce the bug and the shortest way to reproduce it.
Find any initial steps to reproduce
You want to find the steps needed to make the bug occur. This is critical. If you can’t reproduce the bug, how can you fix it? Even if the error seems obvious and simple, you’d be surprised how these things can come back and haunt you from Production. Adding or changing code that doesn’t fix the bug may even introduce new bugs. I know, I’ve been there.
Ensure your method is robust
Once you’ve found the steps to reproduce the bug, you want to ensure it’s consistent. You should approach this like a QA engineer and execute your steps under different scenarios. If the bug does not reproduce every time, you’ll have to re-examine your diagnosis of the problem and tweak the steps so that the bug reproduces every time you go through them. This is crucial to understanding what is broken so you can really resolve the issue.
Find the shortest method
Once you can reproduce the bug consistently, you want to find the shortest way to do so. See if you can reduce the number of steps. For example, if a bug occurs when 9 items are added to an order on an eCommerce site, are all 9 items really required to reproduce the bug? Perhaps changing the order in which the items are added can reproduce the bug after the 3rd item. Having a quick way to reproduce the bug will help both you as the developer and QA to reproduce the bug and validate your fix.
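If your reproduction steps can be automated, shortening them can be automated too. Here is a minimal sketch of that idea, written in Python for brevity (the same shape works in C#). It assumes you can express "does the bug still reproduce with these steps?" as a function; `minimize_steps`, `order_bug_reproduces`, and the item names are all hypothetical and only mirror the 9-item order example above.

```python
from typing import Callable, List, Sequence


def minimize_steps(steps: Sequence[str],
                   reproduces: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop one step at a time, keeping a removal only if the bug still reproduces."""
    current = list(steps)
    if not reproduces(current):
        raise ValueError("the full script does not reproduce the bug")
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if reproduces(candidate):
                # This step was not needed; continue from the shorter script.
                current = candidate
                changed = True
                break
    return current


# Hypothetical bug: the order breaks as soon as it contains both of these items.
def order_bug_reproduces(items: List[str]) -> bool:
    return "gift card" in items and "discounted book" in items


full_script = ["pen", "gift card", "mug", "lamp", "discounted book",
               "poster", "cable", "socks", "notebook"]
shortest = minimize_steps(full_script, order_bug_reproduces)
print(shortest)  # ['gift card', 'discounted book']
```

A greedy pass like this only removes steps; it will not discover that reordering them reproduces the bug sooner, so treat its result as a starting point rather than a guaranteed minimum.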
Write down your steps as a repeatable script
I can’t stress this enough. You’ve got to write things down. It’s your way of communicating this critical step of reproducing the bug with the rest of your team. It can be in a work item, or a bug report, whatever works in your team. But if it’s not written down, it doesn’t exist. You may even forget the steps if you haven’t written them down.
So, to complete this step, you should have two scripts written down. The first provides a robust, sure-fire way to reproduce the bug. You (or QA) will use this script later to ensure the bug has really been fixed. The second is the shortest way to reproduce the bug. You’ll use this as you investigate the bug and develop your fix. You’ll probably have to go through these steps several times before you come up with your solution, so having a short-cut to reproduce can be really useful (as long as you have the robust, sure-fire script as your final validation of the fix).
Understand what really needs to be fixed
Before you dive into fixing a bug, you need to pause and really think about what needs to be fixed and how your fix may impact the entire system. I’ll give you an example.
Say you’re debugging a backend transaction that occasionally fails because it can’t write a log entry to a disk after completing an operation. You could fix this by adding a try/catch block around the code that writes to the disk. The operation would be complete. Failing to write the log entry to the disk may throw an exception, which you would catch, and your program would continue executing successfully. Great. Mark that one as DONE! On to the next issue.
Not so fast!
What if logging transactions to a disk is a contractual obligation your company has undertaken? Your customer needs those logs for whatever reason, and you are legally bound to provide them. So, you can’t just decide to “move on” if a transaction fails to write to disk.
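To make the difference concrete, here is a small sketch in Python (the structure maps directly to a C# try/catch). It assumes one possible policy, chosen purely for illustration: if the log entry can’t be written, the operation is undone and the failure is reported instead of swallowed. Whether that is the right policy is exactly the question you need to answer with your team. All names here (`complete_transaction`, `AuditLogError`, the `ledger` list) are hypothetical.

```python
class AuditLogError(Exception):
    """Raised when a required log entry could not be written."""


def complete_transaction(apply_operation, write_log_entry, undo_operation):
    """Run the operation, then log it; a failed log write fails the transaction."""
    result = apply_operation()
    try:
        write_log_entry(result)
    except OSError as exc:
        # Swallowing the error here would hide the symptom,
        # but it would also silently lose a log entry we are obliged to keep.
        undo_operation(result)
        raise AuditLogError("transaction rolled back: log entry not written") from exc
    return result


# A tiny in-memory stand-in for the real system.
ledger = []


def apply_operation():
    ledger.append("tx-1")
    return "tx-1"


def failing_log_writer(entry):
    raise OSError("disk full")  # simulate the disk write failing


def undo_operation(entry):
    ledger.remove(entry)


try:
    complete_transaction(apply_operation, failing_log_writer, undo_operation)
    outcome = "completed"
except AuditLogError:
    outcome = "rolled back"

print(outcome, ledger)  # rolled back []
```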
Sometimes, you may not even be sure how the system should behave. Say a menu item in a WPF application doesn’t work for a specific customer. Should you continue to display the menu item in a disabled state or remove it altogether?
Every change to your code can affect other parts of the system. This is the “blast radius” of your fix. If your fix touches several modules, what are the side-effects to those modules and your system as a whole? Your attention to these details can be key here. You might consider bringing other team members into the picture, such as QA engineers, product managers, business analysts, UX designers, or anyone else who might be affected by your changes. This kind of consideration and collaboration will only serve to gain you the respect of your team members and others across the organization.
Once you have a good understanding of the problem domain and the expected behavior of the system, you can move on to the next step – writing a test.
Create tests that fail without your fix
You may ask why this bug you’re working on wasn’t detected sooner. While there may be different reasons, one thing is for sure. Nobody ever tested the specific scenario that caused the bug to surface. So, before you change any code, use that short script you created to reproduce the bug as a starting point and write a test that fails because of the error you’re debugging. This may be anything from a (suite of) unit test(s) to a more encompassing end-to-end test if necessary. Whatever the case, you should strive to narrow your focus to a very specific code execution flow so you don’t have to reboot your entire system to run the test.
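As an illustration, here is what such a test might look like. This is a sketch in Python’s `unittest`; an xUnit or NUnit test in C# would follow the same pattern. The bug itself is invented for the example: a hypothetical `order_total` function that silently drops the last item of an order. The buggy and fixed versions sit side by side only so you can see the same test fail before the fix and pass after it.

```python
import unittest


def order_total_buggy(prices):
    # Hypothetical bug: the off-by-one slice silently drops the last item.
    return sum(prices[:len(prices) - 1])


def order_total_fixed(prices):
    return sum(prices)


def make_regression_test(order_total):
    """Build the same test case against whichever implementation is under test."""
    class OrderTotalRegressionTest(unittest.TestCase):
        def test_three_item_order_includes_every_item(self):
            # Mirrors the shortest reproduction script: three items are enough.
            self.assertEqual(order_total([10, 20, 30]), 60)
    return OrderTotalRegressionTest


def passes(test_case):
    """Run the test case and report whether every test in it succeeded."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()


print(passes(make_regression_test(order_total_buggy)))  # False: the test catches the bug
print(passes(make_regression_test(order_total_fixed)))  # True: the fix makes it pass
```

In a real project you would have only one `order_total` and one test; the test would go red first, and turn green once your fix is in.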
Continue debugging until all tests pass
Now that you have a test suite, you can start modifying the code until all the tests pass. You get a lot more out of these tests than you think.
- Your tests are a coded spec of your system
By going through your tests, anyone can understand how your system should work, and the tests are a way to verify that the spec was implemented correctly.
- You know when you’re done
Once all your tests pass, you have fixed the bug. There is no guessing or estimating how complete the task is. Note, however, that creating enough unit tests to cover all edge cases of the code you’re debugging is a bit of an art. You may find yourself adding more tests until you are confident that a feature is complete and thoroughly tested.
- You know that your fix doesn’t break anything else
If all bug fixes are verified with a suite of tests, then if your fix breaks something else, those tests will fail. This is where these habits start to compound as you accumulate more and more value from your tests over time. The tests you write today will protect your fix from a new bug that turns up tomorrow.
The 6th habit for effective .NET Debugging
Following those five habits of high-performance debugging forms the basis of good software development. Once you have mastered them, you can take the next step to multiply your effectiveness. To achieve the next level of mastery, you need to find the tools that will enable you to work faster:
A unit test runner: most IDEs have this capability either built-in or through extensions. Personally, I love solutions that automatically run unit tests every time you save a file. If you just broke something, you’ll get instant feedback.
CI/CD: Automated build and deploy processes are pillars of a successful DevOps adoption. If you’re not doing it yet, this is a good time to start. And if you are, make sure your CI server runs your tests for every build. If any of your tests fail, you don’t want to push that build to Production.
APM/Error monitors: Reproducing Production errors can be very difficult. An APM or Error Monitor can point you in the right direction with a stack trace or contextual data that guides you when writing that robust script you need to reproduce an error. It certainly gives you more to go on than just a user’s description of what went wrong.
Logs and analytics: Log files contain tons of information. The right log analysis tools will let you slice and dice your logs to get actionable insights out of them. Once you can piece together what happened around your bug, you can start working on your robust script to reproduce it.
Highly effective .NET debugging in Production
To most developers, debugging is strongly identified with putting breakpoints in your code and then using F10 and F11 to step through lines of code using an IDE such as Visual Studio while inspecting the call stack, local variables, threads, logs, events, etc. We all do that. However, you can’t do that in Production. You can’t put a breakpoint in live code and freeze an application that’s serving your customers. So, reproducing a bug in Production always involves guesswork.
Let’s remove the guesswork.
About 2000 words ago, I mentioned the first habit of highly effective debugging that you should master – consistently reproducing the bug. If you think about this methodically, there are a few things that can help you discover where your program logic breaks.
- Complete code execution flow of the bug
- Database queries and network requests that were sent
- The state of the software, from static variables to local variables of the functions
- If the system threw exceptions, which exceptions, where and how we got there
- Relevant log entries
That’s exactly what Ozcode Production Debugger provides. Ozcode’s autonomous exception capture records all the relevant information leading up to an exception. You can then step through your code with full time-travel debug information,…
…and even add tracepoints to create dynamic logs providing code-level visibility to locals and variables anywhere in your application – without having to rebuild a new version and redeploy your application…
…basically taking all the guesswork out of .NET debugging on Production systems.