Modern IDEs provide very sophisticated tools for .NET debugging. We take it for granted that we can hit F5 to start a debug session, and F10/F11 to step over or into lines of code, fully expecting the IDE to display all variables in the vicinity. The truth is that amazing technology lies behind these capabilities, and in Development, we have everything at our fingertips. Not so for Production.
In Production, our ability to debug is hampered in many ways.
- Reproducing the error: It’s usually anywhere between hard and impossible to reproduce a Production error.
- Log files: We rely heavily on logs, but log files don’t cut it. You never have the right log entry where you want it, so you have to guess the solution, add more logs, build, deploy, rinse, repeat until you get what you want. And even then, logs are widely dispersed among different files, so it’s hard to find the relevant ones and put them together.
- Breakpoints: Putting breakpoints in your production code is problematic. It causes your server to stop executing, which means anyone trying to access your application would not get a response.
- Source code version: Matching your symbols and source code version to Production is not trivial. You’re usually several builds ahead on your development machine, not to mention that in Production, the code is optimized, so even if you could reproduce the right source code version, you might not be able to stop on breakpoints or see many of the local variables.
- Microservices and serverless: With these technologies, not only do you have to grapple with the above, but there are added complications. With microservices, you have to debug across several processes, and in serverless architectures the offending code or scenario that you’re trying to debug is running one moment and gone the next.
- Personally Identifiable Information: Production systems often contain sensitive information such as credit card numbers, social security numbers, and a host of other personal information that is heavily guarded by strict privacy laws. You’re supposed to debug Production systems without seeing any of this data.
- Access to Production systems: There are different scenarios in which you don’t even have access to your Production systems. Consider, for example, that your customer reports a bug, but it happens on their desktop or on-premises infrastructure to which you have no access.
Still, Production bugs come in many shapes and forms, and being able to debug them is critical to your business. Take the classic example of the shopping cart functionality in an eCommerce website: if it breaks, customers can’t complete their purchases. Things can get much more subtle than that, and even small UI glitches can send your customers over to your competitors’ websites.
And yet, all is not lost. Debugging in Production is possible.
This blog is based on a webinar I recently gave where I showed some great .NET debugging tools and techniques you can use on your Production systems.
Scroll down to the end to view the full webinar recording.
The types of bugs you can encounter
There are several types of bugs that hit you in Production. Here are some of them:
- Failed Requests
Your client app sends a request to the server, but then something goes wrong, and you don’t get a response. Imagine a customer wanting to check if you have an item in stock and not getting a response.
- Crashes
Your whole IIS server or desktop application just disappears. We’ve all seen this – one moment it’s there, then, suddenly, it’s gone.
- Logical bugs
These may not throw an exception, but something won’t add up in the back-end logic. A value will be miscalculated, or the wrong action (or no action) will be performed. This can be manifested in any number of ways.
- Hangs
In the case of desktop applications, it’s very clear when something hangs. Your application won’t respond. Nothing moves. Your window is frozen. It’s frustrating but clear. But there are more subtle cases. For example, in an ASP.NET application server, if a request hangs instead of returning a response, the client will eventually time out and continue according to its handling of that situation. However, the server hang will cause more and more request threads to get stuck. You start seeing requests that, on the face of it, function perfectly, but the server is so flooded with hung threads that it can’t respond or responds slowly. Seemingly, a performance issue, but rooted somewhere completely different from where it is manifested. Eventually, you might restart the server, which will release all those hung threads, but the problem will only rear its ugly head again sometime soon – one of those annoying issues that are really hard to reproduce.
- Memory issues
There is a variety of memory issues you may encounter. If your application keeps consuming memory, it will eventually crash with an OutOfMemoryException, but it can get much more subtle than that. If objects that should have been released stay referenced, the garbage collector can’t reclaim them. They may keep executing code and cause wrong behavior, while the garbage collector works overtime and degrades performance.
- Performance issues
Hangs and memory issues often manifest as poor performance, but so will slow network or database requests or faulty caching functionality.
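To make the memory issue described above more concrete, here is a minimal sketch of one common way an object can stay in memory and keep executing code: a subscription to a static event that is never removed. The names PriceFeed and PriceWidget are hypothetical and are not part of the reference application.

```csharp
using System;

// Hypothetical publisher. A static event lives as long as the process,
// and so does every object subscribed to it.
public static class PriceFeed
{
    public static event Action<decimal> PriceChanged;

    public static void Publish(decimal price) => PriceChanged?.Invoke(price);

    // Number of handlers currently attached to the event.
    public static int SubscriberCount =>
        PriceChanged?.GetInvocationList().Length ?? 0;
}

// Hypothetical subscriber. Unless Dispose is called, the static event keeps
// a reference to this object, so the garbage collector can never reclaim it.
public sealed class PriceWidget : IDisposable
{
    // Counts the events handled by all widgets, including leaked ones.
    public static int UpdatesHandled;

    public PriceWidget() => PriceFeed.PriceChanged += OnPriceChanged;

    private void OnPriceChanged(decimal price) => UpdatesHandled++;

    // Unsubscribing removes the reference held by the static event.
    public void Dispose() => PriceFeed.PriceChanged -= OnPriceChanged;
}
```

If you create a PriceWidget, drop every reference to it, and then call PriceFeed.Publish, the handler still runs because the event itself keeps the widget alive. Wrapping the widget in a using block (or calling Dispose) removes the subscription and lets the object be collected.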
.NET Production Debugging tools of the trade
While .NET Debugging is not as straightforward in Production as it is in Development, there are many tools at your disposal.
In this post, I’ll go over the following:
- Windows Event Viewer
- Dump files
- Dedicated Debuggers (Visual Studio’s Debugger, dnSpy, WinDbg)
- Production Debugger (Ozcode)
Wait, there’s more.
There are, of course, more tools you can use, but to keep the length of this post sane, I won’t be covering:
- Log analyzers
- Performance Counters
- ETW Events
- Memory & Performance Profilers (dotTrace, PerfView, ANTS)
- Application Performance Monitoring (APM) tools (Application Insights)
- Error monitoring tools (Raygun, Application Insights)
In going through these tools, I’ll be using my reference ASP.NET Core 3.1 application, a simple price calculator that takes an input price, and calculates the output price which may be in a different currency and include a coupon code and/or an additional discount. Not much more than your standard “Hello World” with a simple UI.
.NET Debugging with Windows Event Viewer
Event Viewer shows a log of different system and application messages such as errors, warnings, and information. It comes built into the Windows OS and has been around since the release of Windows NT in 1993, so it’s readily available and easy to use. Let’s see how Windows Event Viewer can be used to get to the bottom of an application crash or a failed request.
When trying to convert $200 from “USD” to “EUR,” a request fails with a 500 Internal Server Error.
You don’t want that happening in your shopping cart now, do you?
Let’s Remote Desktop to the machine on which my server is running. ASP.NET logs all failed requests, so I can pop up Windows Event Viewer to see what happened. Under Windows Logs | Application, I can find the corresponding error log. The beauty of this is that it doesn’t require any setup. It’s just built into Windows, and I can easily open it and access a lot of information.
We can see that the failed request path (basically, the request’s URL) was “/Price/Calculator/Calculate”, and below that, the stack trace shows that the value “eur” was not found. That already tells me a lot. There’s a mismatch between the currencies my UI allows and the ones my server expects. But where is that defined in my server?
So, here’s a neat trick.
I can just copy the call stack, and if I have JetBrains Rider or Visual Studio with ReSharper open on the corresponding project, it gets magically pasted into the Stacktrace window in my IDE. Then, with a few clicks up the call stack, it’s easy to find that the offending “EUR” value in my UI should have been “EURO”, and that’s what caused the exception.
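As an illustration of how this kind of mismatch produces a "value not found" error, here is a minimal sketch. It assumes the server maps the incoming currency code to an enum with Enum.Parse; the post doesn't show the actual server code, so the Currency enum and the CurrencyParser class are hypothetical.

```csharp
using System;

// Hypothetical enum: the server spells the currency "EURO", not "EUR".
public enum Currency { USD, EURO, GBP }

public static class CurrencyParser
{
    // Throws ArgumentException when the text matches no enum member,
    // which surfaces as a failed request if nothing catches it.
    public static Currency ParseStrict(string code) =>
        (Currency)Enum.Parse(typeof(Currency), code);

    // Returns false instead of throwing, so the caller can reject the input
    // with a validation error. IsDefined guards against numeric strings
    // such as "42", which Enum.TryParse would otherwise accept.
    public static bool TryParse(string code, out Currency currency) =>
        Enum.TryParse(code, true, out currency)
        && Enum.IsDefined(typeof(Currency), currency);
}
```

With these definitions, CurrencyParser.ParseStrict("eur") throws an ArgumentException, while CurrencyParser.TryParse("eur", out var currency) simply returns false. Whether validating at the boundary like this is the right fix depends on the application; in the post, the fix was to align the value sent by the UI with the one the server expects.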
The things you can do with dump files
procdump64.exe -ma -e [your-process-name]
- -ma: get the full memory dump needed to debug .NET processes
- -e: monitor for unhandled exceptions (which is what has happened if your application crashed)
Now reproduce the crash, and ProcDump will create the dump file, which you can find in its installation folder. There are different ways to analyze dump files. You could use WinDbg again, but that’s kind of complicated for several reasons I won’t go into right now. I recommend copying the dump file to your development machine and analyzing it in Visual Studio, which is open on the same project that created the executable.
When you open a dump file in Visual Studio, at first, you just get some general information such as the process name, its architecture, loaded modules, etc.
Under the Actions menu, you have different options for debugging.
Select Debug with Managed Only, and …
Not only do you get the exception details, but also the exact line of code that threw the exception. You can view local variables in scope, see the different threads, and even travel up the call stack and see the local variables for each call. This is a great debugging experience similar to what you get in Development.
But here’s the catch.
To get this great Dev-like debugging experience, you need to have the same source code that created your production executable loaded into Visual Studio. In practice, by the time you’re debugging an exception in Production, your codebase is going to be several builds ahead. To recreate the exact build, you need to go back to the exact Git version used, with all the same build parameters, etc. If a single character is off, this method won’t work. But there’s another problem. Production code is usually optimized, and this can really degrade your debugging experience. The JIT compiler inlines methods and optimizes away local variables, so you can’t map them to the source code in your Visual Studio project.
No locals. Bummer!
But the JIT compiler also changes code, so you may not even see the line of execution that threw the exception. (Sigh!)
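If you’re not sure whether the binaries you’re looking at were built with optimizations, one possible check is to inspect the assembly’s DebuggableAttribute. The sketch below is an illustration only: the BuildInfo class is a hypothetical helper, and the result reflects how the assembly was compiled, not whether a debugger or environment setting later changed the JIT behavior.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

// Hypothetical helper, not part of the reference application.
public static class BuildInfo
{
    // Returns true when the assembly does not ask the JIT to disable
    // optimizations. Debug builds normally carry a DebuggableAttribute with
    // IsJITOptimizerDisabled set; Release builds normally don't.
    public static bool IsJitOptimized(Assembly assembly)
    {
        var attribute = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attribute == null || !attribute.IsJITOptimizerDisabled;
    }
}
```

For example, BuildInfo.IsJitOptimized(typeof(Startup).Assembly), where Startup stands in for any type from your own application, tells you whether to expect missing locals and shifted execution lines when you open a dump from that build.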
Not your usual .NET Debugger
All other .NET debugging tools rolled into one
The last tool I’d like to show you in this post is Ozcode Production Debugger. It provides the capabilities of the other tools, but doesn’t require any correlation to your source code, and doesn’t interfere with the running of your application in production.
After connecting Ozcode’s agent to your running application, exceptions will just magically appear on the Ozcode dashboard. The screenshot below shows the two exceptions I described earlier in this post (an SSL exception, and the ArgumentException with the illegal “eur” value that caused a crash). All you need to do is click Debug to debug them.
Once you drill down into an exception, you get a lot of information:
- The line of code that threw the exception (i.e., where the application crashed)
- All the methods of the call stack, which you can travel back through
- The local variables at every stage, all the way up the call stack
So far, we have all the information you can get from dnSpy or from a dump file – but without really having to strain yourself to get it.
But it gets better.
You can also see the latest thousand log entries and even focus on the ones generated by the exception context.
And when the culprit is an HTTP request, you can see the incoming request with its headers and body, as well as the outgoing request with its query params and all.
Pros and cons side by side
So, let’s put all of this together and compare these four fine tools. The real beauty is that you can pick and choose the best tool for the situation, and even use them together. None of them interfere with each other. The table below shows how each of these tools deals with failed requests and crashes:
Shows exception information
Stops server activity
Shows local variables
When code isn’t optimized
When code isn’t optimized
Shows source code
When matching symbols and source code
Shows execution line
When matching symbols and source code + code can’t be optimized
When code isn’t optimized
Shows latest logs
Shows HTTP requests and database queries