Where is Live Debugging Going in 2021 and Beyond?

Industry pioneers answer the most burning questions on debugging in live production and pre-production environments

Introduction

Over the years, the software industry has undergone many paradigm shifts that sprouted new technologies. A move to distributed architectures like microservices and serverless brought us products like Docker, Kubernetes, AWS Lambda, and Azure Functions. Performance and error monitoring tools evolved to APMs, which have now become full-blown observability platforms. Today we are in the midst of a revolution in debugging live systems.

Debugging in development is pretty much the same as when I was a developer over 20 years ago. Reproduce the error, place breakpoints in your code, and hit F5, F10, and F11 (or some other key combination) to start the debugger and step through your code to understand what went wrong. Not so for production.

Debugging in production has always been vastly different. For one, developers don’t usually have access to their production environments, so they have to try and reproduce errors in their development IDEs, a near-impossible feat in the age of distributed cloud computing. Even if developers were given access to production, they wouldn’t be able to put breakpoints in the code to examine the application state.

All that is changing.

Over the last few years, a new industry segment has emerged that focuses on changing how we debug live applications in production environments to make the process as straightforward as in development. Several pioneering companies have entered this new field aiming to accelerate the resolution of errors in production (which still occur and always will) without degrading service. Because the field is new, the industry hasn’t yet settled on a name, so you may have heard about:

  • Continuous debugging
  • Live debugging
  • Modern debugging
  • Remote debugging
  • Code-level observability
  • Autonomous debugging
  • Software understandability

Earlier this month, Adam LaGreca from 10K Media hosted a roundtable with some leading players in this emerging field:

Senior Solutions Architect

Co-founder and CTO

Co-founder and CEO

Business Unit Owner

The rest of this article summarizes the sentiments of these trailblazers in the art of debugging across the SDLC in their responses to Adam’s questions. For the complete recording, scroll down to the end of the article.

Why has live debugging in production become so difficult?

Software has changed in all respects: how it’s developed, how it’s deployed, and even what we expect from software developers. Monoliths have been replaced by microservices and serverless. Waterfall has been replaced by agile, which then sprouted DevOps. The whole infrastructure is different. A LEMP stack won’t cut it anymore. You now have multiple instances of your software running in different locations, managed by a load balancer at an insane scale.

For the modern software developer, it’s no longer enough to just deal with the code… The core problem is containing the overall program in your head. You have to figure out all the moving parts at runtime in production to understand better what is going on.

Tom Granot, Lightrun

While we keep doing bigger and greater things with these new software architectures, one of the consequences of this increasing complexity is that it’s extremely difficult for a programmer to understand where in the chain of causality things break down and why. The typical debugging paradigm is several cycles of adding more logs, going through a lengthy CI/CD cycle, and shipping out a patch for debugging.

Currently, [the industry] uses very laborious slow steps, which ultimately inhibit developer productivity.

Omer Raviv, Ozcode

But developers don’t encounter these issues only in production. Today’s pre-production environments are also so complex that developers cannot recreate them on their local machines. Between large Kubernetes clusters, external dependencies, third-party libraries, and APIs, replicating an environment to reproduce a bug is very cumbersome, and fixing bugs is a long and frustrating process for engineers.

Engineers are the main cost center for the enterprise. They should be spending more time building rather than fixing.

Shahar Fogel, Rookout

And yet, today, we live in an environment of high expectations. Customers demand performance and new functionality delivered frequently; applications must run smoothly 24/7, and maintenance downtime is not acceptable. A slip on any of these parameters immediately affects the business. These high demands place a heavy burden on developers’ shoulders. But when it comes to debugging, the tooling has not kept pace and doesn’t address all these new challenges that developers face. There’s a faulty feedback loop that doesn’t provide developers with enough information.

Development is no longer enough. It’s a game of “you build it, you run it, and it had better be secure.”

Berkay Mollamustafaoğlu, Thundra

To compound the problems developers face, software is much more diverse than it was 10 to 15 years ago. Teams choose the best language and platform to develop any particular service, so developers need a much broader base of knowledge and capabilities than before.

As time goes by, applications are not properly maintained. Technical debt creeps in, and teams change until finally, nobody really understands how the application works. Enterprises are hesitant, even scared, to touch legacy code, and it can be daunting to chase down bugs.

Developers are forced to fly blind. While DevOps and SREs have production data at their fingertips, development teams rarely get that data.

David Thacker, NerdVision

The challenges of debugging “the old way”

The first challenge a developer faces when fixing a bug is trying to reproduce it. Sometimes, even the most detailed QA reports don’t provide enough data, and QA engineers find themselves arm-wrestling with developers over the proverbial “It works on my machine.” This kind of development/QA friction, even for pre-production environments, wastes valuable development resources and ultimately results in lower quality code. It’s even worse for bugs in production.

The increased complexity of modern production environments makes it nearly impossible to reproduce production issues in a developer’s local environment. Cloud computing poses many obstacles, and between the growing trend to use feature flags and specialized configuration for different customer environments, it’s often not even feasible to recreate the environment where a production error occurred.

The “Aha” moment for me was that the problem must be debugged exactly where it occurs. To try and move it somewhere else is very time-consuming.

Berkay Mollamustafaoğlu, Thundra

The most common production debugging tool in current use is static logging. There’s this built-in, traditional reliance on logs. While logs can provide first-responders with many insights, they’re not a productive tool for fixing production errors. As customer demands push companies to ship versions to production at an ever-increasing pace, there’s a corresponding demand for developer velocity to increase. But to fix bugs quickly, developers need feedback. In development, they get initial feedback from their compilers, then from static code analysis tools, then from their CI/CD pipeline. But when it comes to debugging in production with static logs, the feedback becomes prohibitively painful. Developers never have the logs they need in place and have to go through multiple cycles of log-only builds to get them.

If you know what data you’ll need to solve a bug ahead of time, you’ll fix the bug beforehand. That’s the paradox of static logging.

Omer Raviv, Ozcode

Modern debugging tames logs

Static logging has been commoditized by companies that are making it cheaper, faster, and more robust. But this mindset of “log everything and analyze later” incentivizes engineers to write reams of logs without investing enough thought into what they’re logging. The result is very noisy output, most of which is never viewed or analyzed anyway. When a problem arises, engineers immediately race to their logging tools but are swamped by the over-abundance of logs that don’t provide the right data.

They know approximately where the error is, but the quantity of logs is huge, and the quality is low.

Shahar Fogel, Rookout

Logging needs to be more ergonomic for developers. The tools should feel familiar like the “GitHubs” of the world, so developers feel comfortable with them. Developers need easy access to the information they need without having to endlessly scroll through a browser window.

Ergonomics is everything to developers.

Tom Granot, Lightrun

Modern debugging tools take a different approach and treat logs dynamically. They empower developers to add logs to live code, when and where they need them, simply and securely. This enables them to capture the data they need without the noise of reams of useless static logs. These tools close the gap between errors in production and the code that caused them, providing developers with the data they need to fix those errors.

Developers are now looking at dynamic logs to capture the data they need without the noise.

David Thacker, NerdVision

The magic behind dynamic logging

The magic behind modern debugging tools can be broadly categorized as dynamic instrumentation. This is a technology borrowed from the cybersecurity space that changes applications at runtime. By manipulating bytecode, modern debugging tools change live code to output logs anywhere in the application that the developer wants to inspect. Essentially, at the click of a button, developers can get the data they need without adding code, without stopping the application or affecting its performance, and without needing anyone else in the organization.
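
To make the idea concrete, here is a minimal sketch of adding and removing a log at runtime in Python. It is not how any of the vendors in this article implement their agents: it swaps a method for a logging wrapper (monkey patching) rather than rewriting bytecode, and the names OrderService, add_dynamic_log, and captured are hypothetical, chosen only for illustration.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dynamic-log")


class OrderService:
    """Hypothetical application code that is already running."""

    def total(self, prices, discount):
        return sum(prices) * (1 - discount)


def add_dynamic_log(owner, method_name, sink):
    """Wrap owner.method_name at runtime so every call is recorded.

    Returns a function that removes the instrumentation again.
    """
    original = getattr(owner, method_name)

    @functools.wraps(original)
    def instrumented(*args, **kwargs):
        result = original(*args, **kwargs)
        # args[0] is the instance (self), so it is left out of the entry.
        entry = {"method": method_name, "args": args[1:],
                 "kwargs": kwargs, "result": result}
        sink.append(entry)
        log.info("dynamic log: %s", entry)
        return result

    setattr(owner, method_name, instrumented)

    def remove():
        setattr(owner, method_name, original)

    return remove


service = OrderService()      # the instance exists before any instrumentation
captured = []
remove = add_dynamic_log(OrderService, "total", captured)
service.total([10, 20], 0.5)  # this call is recorded
remove()
service.total([1], 0)         # this call is not recorded
```

The application code is never edited or restarted: the log is attached to the running class and detached again when it is no longer needed. A production agent would also have to deal with concurrency, overhead limits, and security, which this sketch ignores.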

You can do things in production without customers noticing that anything’s happening in your application.

Shahar Fogel, Rookout

The front end of these tools is as important as the back end. They use UX patterns and paradigms that developers are familiar with, either integrating directly with popular IDEs or presenting themselves in an IDE-like user interface. The act of adding a dynamic log entry is virtually identical to adding a breakpoint, which developers are so familiar and comfortable with. Indeed, the different terms used to name this feature include non-breaking breakpoints, tracepoints, snapshots, data points, and more.

Developers don’t have to enter an entirely different world when they need to debug an application in production or pre-production environments.

Berkay Mollamustafaoğlu, Thundra

Modern debugging, observability, and understandability

The concept of observability has been widely embraced, with several companies becoming well-entrenched in the industry. The premise of modern debugging tools is also around observability, but at the code level. Traditional observability tools will show you the state of your machines and maybe even your applications. They may identify a spike or a crash and pinpoint it to a server, cluster, or application instance. But these parameters don’t provide enough data to debug production issues.

Observability tools look at the higher level. They don’t touch on the line-by-line code-level details that underlie the running application.

Tom Granot, Lightrun

You could think of it as the contextual data an engineer wants to see in a JIRA ticket in order to make decisions and solve problems based on line-by-line data such as local variables, method parameters and return values, and stack traces. This data helps the engineer understand exactly what caused that spike or crash, get to the root cause, and fix it quickly.
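
As a rough illustration of what such a code-level snapshot might contain, the sketch below uses Python’s standard sys.settrace hook to record a function’s parameters, local variables, return value, and call stack without pausing execution. The function apply_discount and the helper capture_snapshots are hypothetical, and commercial tools use their own mechanisms and formats rather than this one.

```python
import sys
import traceback


def apply_discount(price, percent):
    """Hypothetical function under investigation."""
    factor = 1 - percent / 100
    discounted = price * factor
    return round(discounted, 2)


def capture_snapshots(func, *args, **kwargs):
    """Run func once and snapshot its state just before it returns."""
    snapshots = []
    target = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not target:
            return None  # do not trace any other function
        if event == "return":
            snapshots.append({
                "function": frame.f_code.co_name,
                "line": frame.f_lineno,
                "locals": dict(frame.f_locals),  # parameters and local variables
                "return_value": arg,
                "stack": [f.name for f in traceback.extract_stack(frame)],
            })
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore the previous trace hook
    return result, snapshots


result, snapshots = capture_snapshots(apply_discount, 200, 25)
print(snapshots[0]["locals"], snapshots[0]["return_value"])
```

The function runs to completion and returns its normal result; the snapshot is a side record, which is the essence of a non-breaking breakpoint. Note that sys.settrace adds noticeable overhead, so this is a teaching sketch rather than something to run in production.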

60-70% of debugging is just understanding what went wrong. Fixing it is then the easy part.

Shahar Fogel, Rookout

The difference between traditional observability and code-level observability mirrors the difference between how IT/Ops handle incidents compared to developers. IT/Ops and SREs may be the first responders to alerts on traditional observability platforms, but when they can’t fix an issue and conclude the problem is deeper than a cluster or a machine, there’s the proverbial “throwing over the wall.” However, developers can’t act on those logs, metrics, and traces that the SREs are throwing at them. Developers need debuggers so they can delve deeper. So, there’s this chasm between IT/Ops and developers. As the integration between traditional observability and code-level observability improves, this chasm will be bridged, and collaboration between the teams will improve.

But the collaboration extends beyond Dev and Ops. There’s also Dev/QA and even Dev/Dev collaboration. As the different teams intensify their collaborative efforts, development organizations will become more powerful because developers don’t work alone. There’s this fallacy that debugging is a solitary effort: a developer sitting in front of a bright screen in a dark room trying to figure out complex interactions between the moving parts of an application. In reality, putting IT/Ops, Dev, and QA in the same context will help the developers assemble the different pieces of data to solve the issues they’re debugging.

Our space will eventually create a world in which debugging an issue in production will be done with the same level of knowledge sharing that we have on a GitHub pull request.

Omer Raviv, Ozcode

Hurdles to overcome

In spite of the clear benefits this technology brings to software, companies in this space encounter a lot of resistance.

Old-school mindset

In general, people resist change, and that’s no different when it comes to introducing new technology to software organizations. Developers are so used to having to reproduce errors in their local environments and then add logs to try and understand what caused them. It’s the first course of action when tackling production issues. Even if agents do get installed in production environments, developers have this knee-jerk reaction of, “We don’t have access to production.” However, as the complexities of the cloud make this approach less and less feasible, awareness of the alternatives will grow, and adoption will increase.

As developers discover this approach as an alternative to drowning in metrics and logs, market adoption will increase.

Berkay Mollamustafaoğlu, Thundra

Integration with observability platforms

Observability platforms were in a similar situation about ten years ago, but with widespread adoption, they have become the first line of defense for production issues. Modern debugging tools aren’t trying to replace observability platforms (and anyway, no enterprise is going to abandon observability). These are complementary technologies, and modern debuggers will have to play nicely with observability platforms. There are different ways to make these two classes of tools work together. Some approaches focus on keeping the products separate, communicating via APIs; others go for full-fledged integrations. Either way, working together with observability platforms with a smooth and intuitive workflow is a must.

Maturity and security

Because this is an emerging category, companies are very concerned with both the maturity and security of modern debugging tools. They all require installing agents in the customer’s production environment, and fundamentally they are modifying the customer’s code.

We’re asking customers to put our agent in production and immediately hear from security officers and Ops. “Whoa, breakpoints? Production?” We all go through these processes but need to give them the confidence that we won’t break anything.

Shahar Fogel, Rookout

The pioneers in this market are well aware of and understand the security concerns, so they build security into their products as a primary feature from the ground up. Modern debuggers offer highly configurable redaction capabilities for personal data. They store and transfer data in compliance with the strictest security requirements, offer fine-grained access control, and audit access to sensitive data.
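
As an illustration of what redaction can look like in principle, the sketch below masks sensitive fields and email addresses in captured data before it would leave the process. The key list, the email pattern, and the redact function are assumptions made for this example; they do not describe any specific product’s configuration.

```python
import re

# Hypothetical policy: field names treated as sensitive, plus an email pattern.
SENSITIVE_KEYS = {"password", "token", "ssn", "credit_card"}
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
MASK = "[REDACTED]"


def redact(value, key=None):
    """Return a copy of value with sensitive fields and email addresses masked."""
    if key is not None and str(key).lower() in SENSITIVE_KEYS:
        return MASK
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(redact(v) for v in value)
    if isinstance(value, str):
        return EMAIL_PATTERN.sub(MASK, value)
    return value


snapshot = {
    "user": "Contact alice@example.com",
    "password": "hunter2",
    "items": [{"token": "abc", "qty": 2}],
}
safe = redact(snapshot)
print(safe)
```

The original snapshot is left untouched and only the masked copy would be stored or transmitted. A real policy would be configurable per customer and would cover far more data types than this sketch does.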

If people install our agents on their production servers and something goes wrong, we don’t have a business.

Omer Raviv, Ozcode

Resistance is futile

Trust in the maturity and stability of modern debuggers will come in time, and it will be sooner than most people realize. Things are moving fast, and the software industry is moving faster, especially since the COVID-19 pandemic revved up companies’ digital transformations. Cloud computing is quickly becoming the industry standard, and in 2020 it already accounted for 83% of workloads. Unless companies want to waste a lot of time debugging production issues, modern debugging tools will replace log-based debugging to become the industry standard in their own right. Look at what happened with front-end development. Web clients became the core of interaction between users and your application, so from JavaScript, we moved to frameworks like jQuery and then transitioned to Angular and React. The front-end tooling evolved to meet the need for multiple teams collaborating to add functionality at an exponentially increasing pace. The same is happening with live debugging. As the industry reconciles its need for better, stronger, and faster tools, live debugging in production will progress as front-end development did.

[Modern debuggers] will become the industry standard. As this technology becomes commoditized, the cost of not having it will be greater than the cost of an enterprise-wide license.

Omer Raviv, Ozcode

The need is there. The technology is evolving. Resistance is futile.

Watch the Recorded Roundtable
