Famed computer scientist Donald Knuth once wrote that premature optimization is the root of all evil. I dare say that the problems the famed algorithmist faced are slightly more advanced than those most of us run into on a daily basis. Nonetheless, the rule has been repeated and passed down from programmer to programmer over the years. The crux of the idea is that until you know there is a problem with your code, the time spent optimizing it could be better spent building new features, fixing bugs (in the unlikely event you have some in your code) or refactoring.
All that being said, there are certainly times when improving performance pays for itself. Faster-loading websites, quicker-running algorithms and lower resource utilization all afford advantages. User conversion rates have frequently been cited as being strongly tied to page load speeds, and in a cloud-based world the cost of running code can be measured in actual money thanks to consumption billing.
Fortunately, if you’re developing on the .NET platform there are some really fantastic tools to find the problems which deserve your attention. We’re going to look at a couple of them in this article.
First – we need a problem that needs some optimization.
Every year a fine group of people sets up Advent of Code, a Christmas advent calendar of challenging computer programming problems. Solving them has kept me up until 2am a number of times. As the problems grow in difficulty it becomes obvious that some data structures are better suited than others for solving them. Let’s look at one in particular: Day 9 from 2018.
I’d encourage you to try solving the problem yourself, but for the time being I can summarize: the problem involves jumping around an ever-growing data structure according to a set of rules. When we get to the end of the data structure we loop back to the beginning.
Let us first look at a naïve implementation which uses an array-backed List to hold the numbers.
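The original listing isn’t reproduced here, so below is a sketch of what such a List-backed solution might look like. The rules come from the puzzle statement; the type and method names are my own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class NaiveMarbleGame
{
    // The marbles live in a flat List<int>. Indices wrap around so
    // the list behaves like a circle.
    public static long HighScore(int players, int lastMarble)
    {
        var circle = new List<int> { 0 };
        var scores = new long[players];
        int current = 0; // index of the current marble

        for (int marble = 1; marble <= lastMarble; marble++)
        {
            if (marble % 23 == 0)
            {
                // Every 23rd marble scores: keep it, plus the marble
                // seven positions counter-clockwise.
                current = (current - 7 + circle.Count) % circle.Count;
                scores[(marble - 1) % players] += marble + circle[current];
                circle.RemoveAt(current); // O(n): shifts everything after it
                current %= circle.Count;
            }
            else
            {
                // Insert between the marbles one and two positions clockwise.
                current = (current + 1) % circle.Count + 1;
                circle.Insert(current, marble); // O(n): shifts everything after it
            }
        }
        return scores.Max();
    }

    static void Main()
    {
        // The puzzle's worked example: 9 players, last marble worth
        // 25 points, gives a high score of 32.
        Console.WriteLine(HighScore(9, 25));
    }
}
```

Note that every `Insert` and `RemoveAt` lands somewhere in the middle of the list, which will matter shortly.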
The run time for this solution is pretty long: a couple of seconds. In the second part of the puzzle we have to multiply the input by a factor of 100. This giant increase in problem size rapidly exhausted my patience and left me wanting to know what the deal was.
The first tool I chose to use was simply the memory and CPU profilers written into Visual Studio. These typically show up in the debugging view.
Right away we can see that the CPU isn’t doing a whole lot in this application because it is more of a memory-bound process.
We can get further into the memory usage by opening up a session in the Performance Profiler. This tooling is available from the Analyze menu and will run the application with a bunch of instrumentation turned on. I ran mine using the performance wizard.
Then I selected .NET memory allocation.
With this enabled I was able to look at some charting and see what looks to be the problem with this application.
Inserting into the collection is a very busy operation. In fact, it accounts for 99.8% of the memory allocations. That’s a really good road map to point us toward the actual problem. This memory problem is both easier and harder than a typical one: we have just a single operation consuming almost all of the memory allocations. It is easier because we know exactly where to focus our effort, but we’re sort of at the end of the road on knowing exactly what the problem is, other than that it lies in the list insertion.
Let’s try leaning on another tool: dotTrace. This is a tool from JetBrains, whom you might know as the makers of the famous ReSharper tool. It works in a similar way to the Visual Studio profiler.
Looking at the output from the profiling run we can see much of the same information that we had in the VS Profiler.
There we see a slightly different breakdown of performance. The insert operation is still very expensive, and now we’re seeing a little bit of time being spent removing items from the list. One of the interesting things you can drill into with the Performance Profiler is the large object heap. On this we can see that we’re allocating a massive 8MB block of memory. I know this doesn’t seem like a lot, but this snapshot was taken pretty early in the run. That so little memory was allocated is telling in and of itself – it means that a lot of time was spent moving data around inside the memory.
Ideally, we want to avoid the large object heap because it has some implications on the performance of the garbage collector. Now we need to understand why that’s happening and how to avoid it. The key observation is that inserts are taking quite a long time.
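As an aside, the large object heap is easy to see for yourself. Objects of 85,000 bytes or more are allocated there, and the GC reports them as generation 2 from the moment of allocation. A quick sketch, assuming the default GC settings:

```csharp
using System;

class LohDemo
{
    static void Main()
    {
        // Arrays below the 85,000-byte threshold go on the regular
        // (small object) heap and start life in generation 0.
        var small = new byte[80_000];
        // Arrays at or above the threshold go on the large object
        // heap, which is collected only with gen 2 and, by default,
        // never compacted.
        var large = new byte[90_000];

        Console.WriteLine(GC.GetGeneration(small)); // typically 0
        Console.WriteLine(GC.GetGeneration(large)); // 2: the LOH
    }
}
```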
The trouble with lists
Because we’re using a List under the covers, every insert into a random location actually shifts the entire portion of the array above that index. Similarly, when you remove an item from a List, the entire array on which the List is built is shifted around. This is highly inefficient because it causes a lot of memory rewrites. That isn’t very impactful when working at a small scale, but when every iteration moves an average of 4MB of memory up or down by one sizeof(int) in an array, performance plummets.
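To see that shifting cost in isolation, here is a small sketch comparing repeated front-insertions into a List&lt;int&gt; against a LinkedList&lt;int&gt; (the timings you see will vary by machine, but the gap is dramatic):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

class InsertCost
{
    // Build a List by repeatedly inserting at the front. Each Insert(0, …)
    // copies every existing element up one slot: O(n) per call, O(n²) total.
    static List<int> FillList(int n)
    {
        var list = new List<int>();
        for (int i = 0; i < n; i++) list.Insert(0, i);
        return list;
    }

    // Do the same with a LinkedList. Each AddFirst just rewires a couple
    // of node pointers: O(1) per call.
    static LinkedList<int> FillLinked(int n)
    {
        var linked = new LinkedList<int>();
        for (int i = 0; i < n; i++) linked.AddFirst(i);
        return linked;
    }

    static void Main()
    {
        const int n = 200_000;
        var sw = Stopwatch.StartNew();
        FillList(n);
        Console.WriteLine($"List<int>.Insert(0, x):   {sw.ElapsedMilliseconds} ms");
        sw.Restart();
        FillLinked(n);
        Console.WriteLine($"LinkedList<int>.AddFirst: {sw.ElapsedMilliseconds} ms");
    }
}
```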
As it turns out, both inserting into and removing from an array are O(n) operations. Perhaps there is a data structure which is better suited to this sort of scenario. Our access patterns always involve moving locally in the data: we move up to the next item or back to the previous one, and if we encounter the end of the structure we loop back to the beginning. This sounds an awful lot like a problem which should be solved using a circular doubly linked list.
There is a very handy LinkedList&lt;T&gt; implementation in the .NET framework. As it turns out, this implementation is a doubly linked list, so converting it to a circular list involves simply checking whether Next is null and, if it is, returning the head. Moving to the next node in the list is O(1), as are deleting and adding. If we update the code to use a linked list we find a very significant change to our memory allocation.
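A sketch of that conversion might look like the following. The Clockwise and CounterClockwise helper names are my own, but the null checks are exactly the ones described above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Circular
{
    // Treat the doubly linked list as a circle: walking off either
    // end wraps around to the other side.
    public static LinkedListNode<int> Clockwise(this LinkedListNode<int> node) =>
        node.Next ?? node.List.First;

    public static LinkedListNode<int> CounterClockwise(this LinkedListNode<int> node) =>
        node.Previous ?? node.List.Last;
}

class LinkedMarbleGame
{
    public static long HighScore(int players, int lastMarble)
    {
        var circle = new LinkedList<int>();
        var current = circle.AddFirst(0);
        var scores = new long[players];

        for (int marble = 1; marble <= lastMarble; marble++)
        {
            if (marble % 23 == 0)
            {
                // Walk seven marbles counter-clockwise and remove it:
                // every step is O(1), nothing shifts.
                for (int i = 0; i < 7; i++) current = current.CounterClockwise();
                scores[(marble - 1) % players] += marble + current.Value;
                var next = current.Clockwise();
                circle.Remove(current);
                current = next;
            }
            else
            {
                // Insert after the marble one position clockwise: O(1).
                current = circle.AddAfter(current.Clockwise(), marble);
            }
        }
        return scores.Max();
    }

    static void Main() => Console.WriteLine(HighScore(9, 25)); // 32, per the puzzle example
}
```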
Vastly more memory was allocated in this version – up to 312MB – yet the process ran far more quickly. The 312MB of allocation is enough to solve the entire problem, whereas the 8MB was from when I gave up and stopped the profiling after about 60 seconds. In reality, we used about the same memory as we would have by letting the first version finish, so this indicates that we’re making much more efficient use of the memory.
What we’ve seen so far is that both of these profiling tools have given hints as to what the problem is, but neither of them has been conclusive in pointing at the culprit. This is something that OzCode can do. In the array version I added a breakpoint about 10,000 iterations into the main loop. Once I hit that breakpoint I opened up the enhanced collection debugging view that OzCode supplies.
Scrolling down to the bottom of the list, I added a tracepoint on one of the final entries in the list.
The tracepoint allows us to quickly see the value at that index as it changes. Running the application for a few more seconds it becomes apparent that this memory address is being constantly rewritten because the array is shifting.
The profiling tools gave us a clue as to the problem but it wasn’t until we used OzCode that we gained the insight to see exactly what the problem was and how to fix it. This is just one of the many features of OzCode which can help you be more efficient in your development.
We’ve looked at a couple of debugging tools which helped us zero in on the problem. Once we understood it, it was a case of digging into the implementation of List under the covers and fixing the problem. In this scenario a simple optimization offered a huge improvement to the runtime and memory usage of the application. Tooling will get you most of the way there but, fortunately, we still need to use our brains to come up with the actual fix.