The embrace of DevOps has brought significant efficiency gains for organizations that have been willing to make the necessary investments in culture, process, and tooling. In teams that have successfully adopted DevOps practices, team members from Development and QA through to Operations work together in a smooth and predictable rhythm to roll out new capabilities that bring value to customers. This “rhythm of DevOps” is one of the key factors behind the benefits DevOps brings to organizations, and production debugging fits right in with that rhythm.
DevOps has a much broader scope than just Dev and Ops, and for DevOps to work, different teams and processes need to be in rhythm:
- Design teams must work in rhythm with development teams working by the principles of Agile development
- Development teams must work in rhythm with QA testing their code
- QA must work in rhythm with Operations in charge of deploying releases
- Design, Development, QA, and Operations must all work in rhythm with customer demands and the needs of the business
To keep customer value, and therefore the business, moving forward, all these components of DevOps must work together in harmony, synchronized to the same rhythm. If one component fails, the resulting bottleneck has a ripple effect on the whole process. For example, Development will get backed up if Operations is not keeping pace with deployments. At Ozcode, we believe rapid and effective debugging is a critical extension of the DevOps value stream. If bugs are not quickly identified, triaged, and fixed, the rhythm of DevOps and the harmony between the teams will be broken. Let’s examine some of the pillars of DevOps to understand why effective production debugging is needed to maintain the rhythm and achieve DevOps excellence.
Automation and autonomous exception capture
“Automate everything” is one of the driving principles behind DevOps. It permeates the pipeline, from the developer’s workspace through to monitoring applications and systems in Production. However, while automation has accelerated the DevOps pipeline, it has also added enormous pressure at every stage. Shipping more code faster also means more bugs in QA, Staging, and Production, and more risk of kinks in the rhythm of DevOps. Add microservices and serverless architectures, and the potential for debugging nightmares grows. For example, how do you reproduce a bug that manifests across several microservices, or a bug in serverless code that is running one moment and gone the next?
The answer is to automate catching those bugs as they happen. This is what Ozcode Production Debugger does with autonomous exception capture. Instead of having to accurately recreate a set of production microservices in a Dev environment (good luck with that), or recreate the exact environment in which a serverless function executed (even better luck with that), the Ozcode agent records the bug exactly as it happened in the runtime environment, complete with the decompiled code execution flow, variable and function return values, log files, call stack, event trace, network requests, database queries, and more.
Not only does QA save time, because the tester no longer has to gather all that information for a bug report, but the developer also gets everything needed to triage the error and really understand what happened, with no guesswork or sifting through endless logs.
Collaboration in the context of an error
Collaboration is one of the key cultural aspects of DevOps, bridging the Development/Operations chasm to bring together members of different teams across the DevOps pipeline. To keep the DevOps rhythm going, developers and QA need to understand Operations’ requirements and vice versa. Real-time feedback enables effective communication and helps teams make changes to resolve errors quickly.
Ozcode Production Debugger promotes collaboration between teams across the DevOps pipeline. Sharing a link puts all relevant team members into the same interactive debugging context. From there, they can communicate in real time through the collaboration panel, an extension to the debugging context, no matter where they are physically located.
The quality gates of your CI/CD pipeline
CI/CD has done wonders to shorten release cycles. Integration errors are now detected quickly, and thanks to short feedback loops back to the right developer, they are also fixed quickly. However, shorter release cycles put pressure on QA to test more builds at each stage as they move up the CI/CD pipeline. Before a build gets promoted to the next stage, it must pass a set of quality gates, such as unit tests, regression tests, performance tests, and more, with each organization applying its own policies. Ozcode Production Debugger can serve as a quality gate that dramatically improves the quality of builds as they move up the pipeline to Production.
Ozcode Production Debugger maintains a tally of each exception thrown by a build and the number of times it was thrown. These simple numbers work as quality gates in two ways. First, as new builds are released, you can check that an error that was supposedly fixed has not regressed. Second, you can gauge the severity of errors by the number of times they recur and set limits, so that builds with frequent errors do not get promoted. We will soon be releasing an API that will allow you to integrate Ozcode Production Debugger with your CI/CD tool to enable fully automated quality gates based on exceptions detected by the Ozcode agent.
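The two checks described above can be sketched in a few lines of Python. This is only an illustration of the gating logic, not the Ozcode API, which has not been released at the time of writing. The function `evaluate_quality_gate`, the `GateResult` type, the exception names, and the limit of 100 are all hypothetical, and the tally is assumed to be available as a plain mapping of exception name to count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    passed: bool
    reasons: list


def evaluate_quality_gate(tally, previously_fixed, max_occurrences):
    """Decide whether a build may be promoted, given its exception tally.

    tally: mapping of exception name -> number of times the build threw it
    previously_fixed: names of exceptions that earlier builds supposedly fixed
    max_occurrences: highest count tolerated for any single exception
    (All three inputs are assumptions made for this sketch.)
    """
    reasons = []
    for name, count in sorted(tally.items()):
        # Check 1: a supposedly fixed error has reappeared (regression).
        if name in previously_fixed and count > 0:
            reasons.append(f"regression: {name} thrown {count} time(s)")
        # Check 2: the error recurs more often than the limit allows.
        if count > max_occurrences:
            reasons.append(
                f"too frequent: {name} thrown {count} time(s), limit {max_occurrences}"
            )
    return GateResult(passed=not reasons, reasons=reasons)


# Hypothetical tally for one build; the exception names are illustrative only.
build_tally = {"NullReferenceException": 3, "TimeoutException": 120}
result = evaluate_quality_gate(
    build_tally,
    previously_fixed={"NullReferenceException"},
    max_occurrences=100,
)
print(result.passed)
for reason in result.reasons:
    print(reason)
```

In a CI/CD job, a failed result would typically translate into a non-zero exit code so that the build is not promoted. Where the limit is set is a policy decision for each organization.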
Continuous monitoring for errors
DevOps does not end with the successful deployment of a build to Production. Once deployed, an application needs to be closely monitored. The wide variety of Application Performance Monitoring (APM) products on the market provides insights into many performance KPIs for production systems. While these tools claim to help in diagnosing production errors, they fall far short of providing the deep observability into faulty code that a root cause analysis requires. Ozcode Production Debugger delivers continuous monitoring for errors where the APMs fall short.
With a lightweight agent that has no perceptible impact on the systems it monitors, Ozcode Production Debugger adds error monitoring alongside performance monitoring, letting you set quality KPIs to complement your performance KPIs. With a high-level view of exceptions over time, Ozcode gives you a picture of system health with regard to errors and provides instant alerts when new errors occur in Production, enabling a short MTTR.
Continuous debugging: the DevOps metronome
Bugs happen in production and pre-production environments. Wherever they occur in the DevOps pipeline, rapid resolution is critical to keep the DevOps rhythm going. An unresolved error at any step of the way can slow everything down, delay releases, and hamper productivity. Ozcode Production Debugger introduces the concept of continuous debugging: because it applies to QA, Staging, and Production environments, it enables debugging throughout the DevOps pipeline. In each environment, it continuously detects errors and points to the exact location in the running code where they occur, reducing debugging time by up to 80% to enable rapid recovery and keep the DevOps pipeline moving forward. Though bugs will happen in QA, Staging, and Production, with Ozcode Production Debugger they needn’t slow down the rhythm of DevOps.