What Every CTO Needs to Know About Live Debugging

Live debugging supports one of a CTO's most important roles: using technology to generate value for the company and drive the business.


Over the past decade, companies across industries and around the world have had to undergo several transformations.

In 2011, software started eating the world.

By 2013, every company became a technology company.

By 2019, every company became a software company.

And by 2020, every company became a DevOps company.

To stay competitive, CTOs had to navigate their companies through these transformations successfully, highlighting one of a CTO's most important roles: to use technology to generate value for the company and achieve its business goals.

So, let me show you how live debugging serves that role. The TL;DR you need to know is:

  • You need it because bugs in production are consuming valuable developer resources and costing you money.
  • It's efficient and effective in helping you make your production software more robust and enabling you to recover more quickly from production errors.
  • It works with modern software architectures.
  • It's secure and compliant.
  • It integrates with observability platforms.

But before we go into all the reasons why, let’s look briefly at what a live debugger is.

What is a live debugger?

A live debugger is a tool or platform you use to resolve errors in your live production and pre-production environments like QA and staging. The production use case is naturally the most acute one, where resolving errors can become an emergency. However, applying live debugging in QA and staging will also accelerate your developer velocity and make your production software more robust while doing wonders for your DevOps KPIs.

The idea of debugging in production is not new. Over the years, tools like event viewers, log files, dump files, Application Performance Monitors (APMs), and others have helped developers resolve errors in production. However, none of these tools are ideally suited for the job. They are either intrusive to your production environment, requiring downtime and dramatically impacting performance, or they’re not very effective in determining the root cause. They don’t provide enough data and may require multiple CI/CD cycles to deploy new builds specifically designated for debugging.

Modern live debuggers modify the code in your live environments non-intrusively using byte-code instrumentation to generate the data developers need in two ways: recording the complete error execution flow of an exception along with all the debug data, and adding dynamic log entries on the fly, without having to rebuild the application. While APMs and Observability platforms have been using byte code instrumentation for the last ten years to generate various metrics displayed in beautiful graphs and charts, applying it to generate debug data is relatively new.
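To make the idea concrete, here is a minimal, purely conceptual sketch of what that instrumentation adds around your code. Real live debuggers rewrite byte code (IL) at runtime and do this automatically; the tiny recorder below is a made-up stand-in, not Ozcode's actual API.

using System;
using System.Collections.Generic;

public static class CaptureSketch
{
    static readonly List<string> Recording = new List<string>();

    static void Record(string name, object value) => Recording.Add($"{name} = {value}");

    // The method as the developer wrote it, with the conceptually injected capture calls shown inline.
    public static decimal UnitPrice(decimal total, int quantity)
    {
        Record(nameof(total), total);       // injected: record parameter values
        Record(nameof(quantity), quantity);
        try
        {
            return total / quantity;        // throws DivideByZeroException when quantity == 0
        }
        catch (Exception ex)
        {
            Record("exception", ex.GetType().Name); // injected: capture state at the moment of failure
            throw;                                  // original behavior is unchanged
        }
    }

    public static void Main()
    {
        try { UnitPrice(100m, 0); } catch { /* ignored for the demo */ }
        Recording.ForEach(Console.WriteLine);       // the recorded "error execution flow"
    }
}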

Why you need a live debugger

Amazon CTO Werner Vogels famously said, “Everything fails all the time.” But this is not new. In fact, it’s as old as the original Murphy’s law, “Anything that can go wrong, will go wrong.” No matter how many safeguards you have in place, you will have errors in production. It happens to the best of us. From UI glitches…

[Image: UI glitches in TripAdvisor, United Airlines, and Amazon. Source: Applitools.com]

…to crashing company stocks and knocking spaceships out of orbit. A quick look at downdetector.com will show you that at any time, household names like Comcast, YouTube, Instagram, AT&T, Verizon, Microsoft, and others experience outages that Gartner estimates can cost companies thousands of dollars per minute.


Live debugging is efficient and effective

Debugging in production is hard. Production environments are usually far too complex to recreate, and reproducing production errors for debugging can be impossible. You can’t put breakpoints in production. Even with modern, sophisticated log analyzers and distributed tracing, log files don’t usually contain enough data to determine the root cause of an error. In most cases, you don’t have access to the production environments running your applications where the errors occurred.

A live debugger overcomes all of these hurdles. Capturing exceptions along with the complete error execution flow removes the need to reproduce production errors. The full application state along the error execution path is available for a developer to step through very much as she would do in development. This is what we call time-travel debugging. Dynamic logging with tracepoints (a.k.a. non-breaking breakpoints) provides a similar degree of observability into the code for logical errors that don’t generate exceptions. The developer can simply add log entries to investigate the application state anywhere in the code without having to rebuild and redeploy the application.

Between autonomous exception capture and dynamic logging, a live debugger offers the developer a fast path to the root cause of an error, slashing debugging time by up to 80%.

A live debugger overcomes the challenges of modern software architectures

Modern software architectures present special challenges when debugging live systems. It’s almost impossible to follow the complex code execution path of errors as they traverse a multitude of redundantly deployed, ephemeral microservices, with intermittent database requests, networking, and messaging, all while generating terabytes of log entries.

Fortunately, autonomous exception capture tracks execution flow across microservices, so you can follow the path of an error from one microservice to another and examine the application state at each step as the error unfolds. Similarly, for logical errors, you can place dynamic log entries anywhere in your code and trace the execution of an action across all the microservices in your application.

A live debugger is secure and compliant

A developer needs access to production data to understand the nature of an error. However, a variety of privacy regulations place restrictions on the production data you are allowed to expose. A live debugger finds the optimal balance between these two competing demands with highly configurable PII redaction capabilities and enhanced data controls. Data can be redacted according to regular expressions, identifier names, or whole classes and namespaces, providing granular control over what is exposed to the developer. Data configured for redaction is masked before it ever leaves the production environment, and as a backup, data is redacted again at the front end to cover possible changes in redaction configuration. Moreover, the live debugger admin has complete control over data retention policies and can explore an exhaustive audit trail of data access.
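As a rough illustration of redaction-by-rule (a generic sketch only, not Ozcode's actual configuration format or redaction engine), masking by identifier name and by value pattern could look like this:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RedactionSketch
{
    // Hypothetical rules: identifier names to mask outright, plus a value pattern (emails).
    static readonly HashSet<string> RedactedNames =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "creditCardNumber", "password" };
    static readonly Regex EmailPattern = new Regex(@"[\w.+-]+@[\w-]+\.[\w.]+");

    public static string Redact(string identifierName, string value)
    {
        if (RedactedNames.Contains(identifierName)) return "****"; // redact by identifier name
        return EmailPattern.Replace(value, "****");                // redact by value pattern
    }

    public static void Main()
    {
        // Masking happens before the capture ever leaves the production environment.
        Console.WriteLine(Redact("creditCardNumber", "4111 1111 1111 1111"));   // ****
        Console.WriteLine(Redact("comment", "contact me at jane@example.com")); // contact me at ****
    }
}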

[Image: Ozcode system architecture with PII redaction]


A live debugger complements and integrates with observability platforms

You may wonder why you need a live debugger if you already have a modern observability platform (such as New Relic, Datadog, Logz.io, Dynatrace, etc.) closely monitoring your systems. These sophisticated platforms provide many capabilities like log analysis, performance monitoring, error monitoring, and more, displaying a host of system metrics you can follow. While these capabilities are indispensable for the daily maintenance and monitoring of modern software systems, none of these platforms provide the code-level observability needed to do an effective root cause analysis of production errors. At best, they will register an exception and show you the relevant stack trace, but only a live debugger will let you drill down into the complete error execution flow and analyze the error, step by step, to determine its root cause. To be clear, a live debugger cannot replace your observability platform; rather, it is complementary and provides data at the next level of detail needed to resolve production errors. Observability platforms will point you in the right direction, but you need a live debugger to drill down and determine the root cause of errors.

[Image: Ozcode Live Debugger complements APMs]

The business of technology

As a CTO, you make strategic decisions about your company’s technology stack. Out of the countless tools available, you have to choose those that will make the most impact in driving your business forward. I may be biased, but it seems clear that a live debugger is a no-brainer. It’s necessary infrastructure that’s as important as your observability platform, if not more. Not only will it make your products more robust by preventing faulty deployments from reaching production, but it will also drastically cut the time to resolve those errors that do slip through the barricades of your DevOps pipeline. Whether it’s reducing the time your developers spend on debugging in production (liberating them to add new features and value to your products) or reducing the number of customers impacted by an error, the ROI is huge and will immediately be reflected in your company’s bottom line.


5 DevOps KPIs Live Debugging Can Improve

A live debugger can improve DevOps KPIs. Learn what code-level observability can do for MTBF, MTTD, and MTTR.


Twelve years after Patrick Debois coined the term “DevOps,” it’s clear that DevOps is here to stay. However, not all DevOps adoptions are equal. The 2019 Accelerate State of DevOps Report showed that companies on the most successful end of the spectrum could deploy changes on demand within an hour with a failure rate of less than 15%. However, when DevOps adoption falters, changes can take months to deploy, and about half of them fail. To put yourself on the right end of that spectrum, you must keep track of your performance against industry-standard DevOps KPIs. Thoughts of improving your DevOps adoption instinctively take you to testing, automation, collaboration, and other pillars of DevOps. I’m here to tell you that adopting a Live Debugger is increasingly becoming a trend among DevOps engineers, who find that this tool has a real positive impact on DevOps KPIs.

What is live debugging?

Live systems have errors that are often severe enough to cause an outage. In some cases, the errors are at a system level, and the different observability platforms available on the market provide enough information to determine their root cause and fix them. However, in many cases, the errors are at code level.

Bugs in production


To fix these bugs, you need the four pillars of live debugging so you can closely examine your production code and data along the error execution flow. And assuming a bug has not crashed your application (which doesn’t necessarily reduce its severity), you need to debug it without interrupting your customers’ experience in any way. Let’s see how live debugging in production and pre-production environments can do wonders for your DevOps KPIs.


The live debugging connection to DevOps KPIs

There are many KPIs you could be monitoring to assess your DevOps adoption, and using a live debugger can dramatically improve several of them.

[Image: DevOps KPIs that can be improved with a live debugger]

Change Failure Rate

This KPI is one of four key metrics identified by Google’s DevOps Research and Assessment (DORA) team that indicates how a software team is performing. It measures the percentage of deployments that cause a failure in production and is an indication of the product’s stability.

\(\text{Change Failure Rate} = \frac{\text{Deployments causing a failure in production}}{\text{Total deployments}} \times 100\%\)
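For example (the numbers here are made up for illustration), if 4 of 50 deployments in a given month caused a failure in production:

\(\text{Change Failure Rate} = \frac{4}{50} \times 100\% = 8\%\)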

In the 2019 Accelerate State of DevOps Report, DORA found that Elite DevOps teams had a seven times lower change failure rate than low performing teams (i.e., deployments are only 1/7th as likely to fail).

Source: 2019 Accelerate State of DevOps Report

How does a live debugger help reduce change failure rate?

By shifting debugging left to pre-production. Here’s what you can do.

Install the Ozcode agent alongside your application on your staging environment. Your application will, most likely, throw exceptions. Any of those exceptions could cause a deployment in production to fail and increase your change failure rate. Ozcode will catch all of those exceptions autonomously and provide you with the full time-travel debugging information to fix those errors ON STAGING.

Fewer bugs on staging means fewer bugs in production and a lower change failure rate.

Defect Escape Rate

This term is quite self-explanatory: it measures the defects that “escape” your pre-production systems and get into production. The calculation is quite simple:

\(\text{Defect Escape Rate} = \frac{\text{Bugs found in production}}{\text{Bugs found in production} + \text{Bugs found in pre-production}}\)
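To make that concrete (again, with illustrative numbers), if 5 bugs were found in production and 45 were caught in QA and staging over the same period:

\(\text{Defect Escape Rate} = \frac{5}{5 + 45} = 0.1 = 10\%\)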

If your Defect Escape Rate is too high, you should re-evaluate your deployment frequency. You might find that you’re rushing through QA and staging to meet release deadlines. The consequence is more buggy code (ergo, a less stable application) in production, which can cause anything from a loss of reputation to a direct loss of revenue. As in the case of Change Failure Rate, using Ozcode on your staging environment and even on QA can reduce the number of bugs that escape and make it to production, hence lowering the Defect Escape Rate.

Mean Time to Detection

How long will a defect exist in production before you detect it? You want this number to be as low as possible because the earlier you detect a defect, the earlier you can fix it, so your customers will be less likely to experience it.

There are two primary factors affecting MTTD:

  1. When an incident occurs
  2. How long it takes you to detect it.

You have no control over when an incident occurs. An incident is unexpected; otherwise, you would have already implemented a fix to prevent it. Once you detect an incident, you can review your log files or monitoring systems to timestamp when it first occurred. With that data, it’s a simple calculation of detection time minus start time. MTTD is an average over any time interval you choose. Consider this example of an organization detecting three incidents:

| Start time | Detection time | Elapsed time (min) |
|------------|----------------|--------------------|
| 4:26 pm    | 5:02 pm        | 36                 |
| 3:05 pm    | 8:51 pm        | 346                |
| 10:15 am   | 10:17 am       | 2                  |

For the time interval of this sample, MTTD = (36+346+2)/3 = 128 min.

Now, there are different ways you can assess your MTTD, for example, by removing outlier values or segmenting by incident severity, but that’s a topic for another post.

The time taken to detect an incident depends on whether it’s caused by a logical bug or a software error. You may only detect a logical bug once a user (or preferably, one of your own QA staff who is testing in production) reports an issue. Typically, MTTD for this kind of bug will be longer. On the other hand, a software error usually throws an exception, and in these cases, Ozcode Live Debugger can dramatically reduce MTTD.

As soon as your application throws an exception, Ozcode captures the error execution flow and displays the exception on the dashboard.

While you can get similar detection capabilities from modern APMs, with Ozcode’s Live Debugger, you can just click one of those exceptions to debug it directly. Essentially, MTTD for exceptions should evaluate to Zero, and from here, you’re in a race to reduce MTTR, which I discuss below.

[Image: the Ozcode Production Debugger dashboard]

Mean Time Between Failures (MTBF)

MTBF is another indicator of your software’s quality. It stands to reason that the more robust your software, the less likely it is to fail and the more available it will be to your customers. Here too, the calculation is quite simple:

\(MTBF = \frac{\text{Total operating time}}{\text{Number of failures}}\)

For example, if a system failed 4 times in 24 hours, and the total outage time was 2 hours, then for that 24-hour period:

MTBF = (24-2)/4 = 5.5 hours

MTBF goes hand-in-hand with Defect Escape Rate and Change Failure Rate in that improving those KPIs is likely to have a positive effect on MTBF. Using Ozcode to reduce the number of bugs in your pre-production environments will help deploy more robust releases and thus improve (i.e., increase) MTBF, but Ozcode can also have a direct effect on MTBF. By reducing your system downtime (i.e., the recovery time – see MTTR in the next section), Ozcode directly increases total operating time, and therefore, your MTBF.

TIP:
If your system uptime is approaching your MTBF, start taking extra care with new deployments and closely monitor your operations.

Mean Time to Recovery (MTTR)

MTTR measures how quickly you get your service running again after it crashes. This is probably one of the best-known DevOps KPIs because it relates to managing an emergency situation. Business stakeholders are watching the clock, pagers are beeping in Operations, and developers are getting phone calls in the middle of the night. Basically, anyone who cares is burning the midnight oil to get your systems up and running again. Here’s the calculation:

\(MTTR = \frac{\text{Total downtime due to failures}}{\text{Number of failures}}\)

Using our MTBF example again, MTTR for that 24-hour period is 2 hours/4 failures = 0.5 hours.

MTTR is an indication of how quickly you can respond to an outage and fix it. The quicker you can debug an issue that crashed your system, the lower your MTTR will be, and this is where Ozcode can help. Ozcode can reduce the time to debug an issue by up to 80% because:

  • There’s no need to reproduce the issue. Ozcode records the complete error execution flow directly on your production environment.
  • The time-travel debug information that Ozcode captures provides the developer who has to debug the issue with all the production data they need.

Providing the developer with this kind of code-level observability into the production system’s error state and allowing them to step through the error execution flow is another form of “shift-left debugging,” and it dramatically reduces the time from failure to recovery.

Bridging Dev and Ops with live debugging

DevOps engineers have recognized the value of having a live debugger in their enterprise tool stack. While they are the first responders to production incidents, they understand the need to bridge the gap to developers who must fix the bugs that cause those incidents. To understand exactly what went wrong, developers need access to production data so they can step through the execution flow of an error exactly how it happened with full visibility into production data at code level. With that data at their fingertips, developers can improve the quality of code in production without sacrificing release velocity, thereby improving DevOps KPIs.


The Road to Observability from System Down to Code

When Datadog shows you something has happened with your software at a system level, Ozcode takes you on an observability journey down to code level.


Datadog is an industry-leading observability platform that brings a wide variety of observability data into one integrated view. From details captured in your processed logs, Datadog lets you switch to traces to see how the corresponding user request was executed. In case of an error in your software, Datadog displays the full stack trace and then lets you use faceted search to drill down into the corresponding traces and logs to determine the cause of the issue. Datadog continuously monitors your production environment and provides system-level alerts such as traffic spikes, elevated latency, or looming bottlenecks to help you troubleshoot issues and keep things running smoothly.

The system-level data Datadog provides goes a long way to determining the root cause of errors, but in many cases, observability at the level of logs, metrics, and traces does not provide enough information to understand what really went wrong with your application. Think of it like a car. If you see the temperature gauge rising, you might guess that you need to top up the radiator fluid. But are you sure that’s really why your engine is overheating? Is there a leak in your cooling system, or is the engine overheating because you’re losing oil? To find out, you have to pop the hood with Ozcode.

Popping the hood on your production environment

When Datadog shows you something has happened with your software, Ozcode pops the hood and takes you on an observability journey from system level down to code level. Datadog can provide a great starting point, showing you anomalies in metrics and even the stack trace of exceptions. From there, you go to Ozcode.

To investigate anomalies surfaced by Datadog, Ozcode lets you add dynamic logging using tracepoints. You can add these log entries on the fly to your live running code without having to deploy a new build through your CI/CD pipeline. Using dynamic logs to reveal the value of locals, variables, method parameters, and return values anywhere in your code goes a long way to exposing the root cause of an incident.
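For contrast, here is the conventional alternative to a tracepoint. Surfacing these values with ordinary logging means editing code, rebuilding, and redeploying; a tracepoint captures the same state on the fly. The class, method, and parameter names below are hypothetical, and Microsoft.Extensions.Logging stands in for whatever logging framework you use.

using Microsoft.Extensions.Logging;

public class OrderService
{
    private readonly ILogger<OrderService> _logger;

    public OrderService(ILogger<OrderService> logger) => _logger = logger;

    public bool TryFillOrder(int orderId, int requestedQuantity, int stockLevel)
    {
        // The kind of diagnostic line you would normally have to add, rebuild, and redeploy:
        _logger.LogInformation(
            "TryFillOrder: orderId={OrderId}, requestedQuantity={Requested}, stockLevel={Stock}",
            orderId, requestedQuantity, stockLevel);

        return requestedQuantity <= stockLevel;
    }
}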

Ozcode also pops the hood on exceptions. Ozcode autonomously captures any exception that your application throws along with full, time-travel debug information so you can step through the error execution flow with full visibility into your production data at every step of the way. This is what we call code-level observability.

When the impossible happens

Let’s see how this integration might work with an eCommerce nightmare.

Black Friday or some other purchasing frenzy is just around the corner. All systems are GO. Everything has been tested, retested, and reinforced.

And then, the impossible happens. Customers can’t complete checkout.

Everybody’s face-palming, and phones and pagers are going off everywhere in IT/Ops.

The first place your DevOps engineers go to is your observability platform. Datadog to the rescue.

A quick look at the service map shows which service is throwing errors.

[Image: Datadog service map. Source: Datadog]

Let’s drill down into the App Analytics screen for that service and investigate the errors.

[Image: errors in the App Analytics screen. Source: Datadog]
While the HTTP request to “checkout” returns a 200 OK, you see many errors and can even see the exception that is thrown. But what now? Now it’s time for developers to dig down into the code, and the collaborative features of both Datadog and Ozcode help break the silos between IT/Ops and developers to get them working together.


Time to pop the hood

Ozcode steps in when you need to start working with code. Setting up Ozcode to work with Datadog is easy – just install the Ozcode extension from the Datadog marketplace and get the Ozcode agent installed on your servers. Once you’re set up, Ozcode will show you all the exceptions you saw in Datadog, and now you can time-travel debug them with full visibility into the error execution flow on your live production environment.

But that’s not always enough. We also saw that even in cases where the request returned a 200 OK, customers can’t seem to check out. Let’s dig a little deeper.

Observability hits code level

Going back to your Datadog dashboard, you discover that some critical requests are showing unusually high latency.

[Image: Datadog showing elevated latency. Source: Datadog]

Let’s set a tracepoint (a.k.a. dynamic log) in the method that tries to fill orders.

[Image: setting a tracepoint in Ozcode]

Now, as customers continue trying to check out, you’ll start collecting tracepoint hits; only now, you’ll have source code and will be able to view all locals and variables in scope for each tracepoint. With the new integration, the Ozcode app is embedded right inside the Datadog platform, so you never have to leave.

[Image: Ozcode tracepoints inside Datadog]
Need even more data? No problem. You can keep adding tracepoints without worrying about performance until you have all the data you need. No need to rebuild and redeploy.


Let’s examine one of those dynamic log entries inside Datadog’s Log View.

The Log View correlates the Ozcode dynamic log entry to the trace of the request that generated it. Analyzing this visual representation of the internal workings of our application shows us exactly where the application is spending time and why checkout is taking too long.

Having discovered the problematic variable, you may now want to monitor it for a while to make sure a fix you implement is working correctly.

Let’s go back up the observability path to Datadog.

Since Ozcode pipes dynamic log output back to Datadog, you can use Live Tail and watch how your variables change in real time. In fact, you can use all of the platform’s analytics capabilities for your new live log entries.

[Image: Datadog Live Tail. Source: Datadog]

Using dynamic logging to pipe variables back into Datadog opens up a world of opportunity. You can watch how anything changes in real-time on a new chart you define for your dashboard. Taking the car analogy, you’ve added gauges to measure your radiator fluid and oil level in real-time with no effort.

From system to code and back

Observability is critical to keep systems running smoothly and fix them when they don’t. Our journey into observability started at the system level when Datadog’s Service Map showed that one of our services was throwing errors. A look at the Analytics Panel revealed what the error was and even gave us the stack trace. To understand the root cause of the error, we first used Ozcode to time-travel debug an error and then drilled down by adding tracepoints on the fly. These tracepoints generated dynamic logs, which we fed back into Datadog, and even created ad-hoc metrics and visuals to monitor suspicious variables. As soon as a variable went off the scale somewhere, we could examine the live application state that caused it in great detail to take us directly to the root cause of the error.

When you’re thinking about observability, you need to think about the full round trip; from system, down to the code, and back.


Frictionless On-Premises Incident Resolution. Don’t Rub Your Customers the Wrong Way

On-premises deployments create hurdles when something goes wrong, making incident resolution in production a painstaking game of trial and error. But there’s hope in sight.


Cloud computing is all the rage. Yes, the simplicity, agility, and scalability of the cloud are the driving forces of the digital transformation many companies are undergoing. Struggling with a remote workforce in the aftermath of COVID-19 only pushed this trend, with Gartner estimating that public cloud spending will reach over $360 Billion by 2022. But this does not mean that on-premises workloads are going away any time soon. Many applications are natively on-premises, and there is even a swing back with many services being repatriated from the cloud back to on-premises infrastructure. It seems like there’s a 12-lane highway between cloud and on-prem with workloads moving in both directions. So, on-prem is here to stay. The problem is that on-premises deployments create hurdles when something goes wrong. The difficulty of accessing your customer’s infrastructure makes incident resolution in production a painstaking game of trial and error. But there’s hope in sight.

The reasons for going on-prem

Here are some of the reasons companies remain on-prem:

Security: There’s an ongoing debate about the security of the public cloud compared to an on-premises private cloud. Many are opting for on-prem, especially in sensitive industries like finance, military, and health care.

Regulatory compliance: While the public cloud sports many certifications, not all clouds can satisfy all industries. The ultimate responsibility for data privacy and governance remains with you, and the cloud cannot always accommodate you with enough availability regions or account for human error.

Cost: The pay-per-use model with zero CapEx makes the public cloud appealing, but many companies that move in that direction quickly realize that simply lifting and shifting workloads to the cloud does not deliver the cost benefits the cloud promises, and they find themselves with a cloud hangover.

Edge computing: The number of connected devices we use is exploding, from smart cars to business analytics to automated factories. With more and more data being created by devices, there is a growing need to analyze that data at on-prem data centers near the compute edge.


Resolving incidents on-premises usually means a lot of customer friction

With so many companies keeping or moving their workloads on-prem, it’s likely that at least some of your customers will run your software at their on-prem data centers.

And then your support team gets that call.

Something’s not working right with your software, and your customer wants an urgent fix.

If your customer is willing to give you access to the servers on which your software is running, you can go about your investigation, but that’s not usually the case, especially in sensitive industries. So, you ask your customer to send you logs since you don’t have much else to go on. You try to figure out what went wrong and add more logs to validate your theory, but now you have to reproduce the error with the new logs. You send your customer a hotfix and ask them to deploy it to their production environment. This process causes a lot of friction with your customer. It requires a great deal of their time and interaction, not to mention unplanned deployments to production, which may take days to happen. Worse, you rarely get it right the first time and will have to go through several iterations like this with your customer. More time, more aggravation. By the time you really figure out the problem, you’ve lost quite a bit of trust, and your customer’s upcoming license renewal may be very shaky.

[Image: traditional log-based on-prem incident resolution]


The frictionless approach to on-premises live incident resolution

Ozcode supports on-premises installations. That means you can deploy the Ozcode agent alongside your customer’s software and install the Ozcode server at your customer’s site. If your customer’s site is truly air-gapped, you’ll have to log on to your favorite travel site, book your ticket and a hotel, and get on a plane. If you’re lucky, you might be able to drive there. Without the ability to create a connection to the world outside the customer’s network, there’s no other option. That’s why Ozcode is adding a “technician mode” to its Live Debugger: anyone who has access to the on-site Ozcode server will be able to export exception captures and tracepoint sessions with a single click, so your engineers can import them into your local Ozcode installation and time-travel debug from the comfort of their own desks.

[Image: frictionless on-prem incident resolution]

This approach to resolving production incidents in your software at on-premises sites will go much more smoothly with your customer. There are no repeated hotfixes to deploy just to get more logs and no downtime. Your engineers don’t have to reproduce the issue; they can just play back the autonomous exception capture to analyze and time-travel debug it. And if they need more logs, no problem. They can use tracepoints to add dynamic logs wherever they need to in the code – no need to redeploy. The bottom line is that you’ll solve that gnarly bug much more quickly without rubbing your customer the wrong way.


Supercharging Web Apps by Testing and Debugging in Production

Two ways your web application can break are UI bugs and deep-down logical bugs in the server. You can detect and resolve both types with Selenium test automation and debugging in production.


This post is co-authored by Himanshu Sheth, Senior Manager, Technical Content Marketing at LambdaTest.

“Move fast and break things,” goes the famous saying by Mark Zuckerberg. But developers know that there’s a delicate balance between your release velocity and how robust your application is going to be. When bugs slip through Staging and get to Production, they start affecting your customers. When that happens (and it certainly does), it’s going to get everybody’s attention and become your top priority. Two ways your web application can break are UI bugs and deep-down logical bugs in the server.

In this post, I’ll show how you can detect and resolve both these types of Production bugs, hopefully before your customers notice them. First, I’ll show how to use Selenium test automation using LambdaTest Cloud Grid to run a web app simultaneously on multiple browsers to catch UI glitches.

With a cloud-based Selenium Grid, you can catch UI issues way ahead of time by testing the features across a range of browser and platform combinations. The fierce battle of quality vs. time can be won by testing on a cloud-based Selenium Grid!

Then I’ll show how Ozcode Live Debugger’s time-travel debugging digs deep into your live server code to help you debug exceptions and logical errors in Production. My reference application is this mock eCommerce site where you can purchase all sorts of goodies related to .NET.

Selenium test automation is a necessity, not a luxury

One of the biggest challenges faced by web developers is keeping the UI uniform across different browser, device, and platform combinations. Cross-browser compatibility issues can create a huge bottleneck in the user experience, especially if you have not tested the UI on the browsers and platforms that are widely used by your target audience. You do not want to irk your customers with misplaced buttons, overlapping text, and other usability issues that would drive them away from your website (or web application). However, it is impossible to cover the entire gamut of browsers and operating systems since the list can be endless. Did you know that despite the dominance of Chrome and Firefox, Internet Explorer is still relevant, even today? Free up your IT team from the burden of constantly maintaining an in-house Selenium Grid that demands effort and yields diminishing returns. Instead, prioritize the browser and OS combinations on which you intend to perform testing, and kick-start testing on a reliable, scalable cloud-based Selenium Grid by LambdaTest.

How to get started with Automated Browser Testing using a Selenium Grid On-Cloud

If you’re not already familiar with Selenium and how it works, I would recommend reading the “What is Selenium” guide by LambdaTest.

If you’ve worked with Selenium before and prefer it for automating browser interactions, you should give LambdaTest a spin. It helps to overcome existing infrastructure issues with your automation testing script.

To run your existing Selenium test script over LambdaTest Grid, you will need to change the Hub URL from your local machine to LambdaTest cloud. You can do that by declaring your username and access key in the remote Hub URL to successfully authenticate your access to the LambdaTest cloud servers. Your LambdaTest username & access key can be obtained from the Profile Page.

Here is a brief set of steps to perform automation testing on LambdaTest:

You can monitor the status of the tests run on LambdaTest Grid by navigating to the Automation Dashboard.

Now that you have set up your account on LambdaTest, it’s time to port your existing, working test implementation to LambdaTest. Suppose you have used the NUnit framework in Selenium C# for writing the automation tests. The change mainly involves the method implemented under the [SetUp] annotation. This is where you have to instantiate the browser on which the test needs to be performed.

Here is the code snippet which showcases the instantiation of the Chrome browser on a local Selenium Grid:

				
using System;
using NUnit.Framework;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using System.Reflection;
using System.Threading;
using System.Collections.Generic;
using System.Web;

namespace NUnitTest
{
    public class NUnitTest
    {
        String test_url = "test_url";
        public IWebDriver driver;

        [SetUp]
        public void start_Browser()
        {
            /* Local Selenium WebDriver: instantiate Chrome and open the page under test */
            driver = new ChromeDriver();
            driver.Url = test_url;
            driver.Manage().Window.Maximize();
        }
        /* Tests follow here */
    }
}

As seen above, the start_Browser() method instantiates the Chrome browser, after which the URL under test is set. The test(s) would be implemented in method(s) under the [Test] annotation.

Before running the tests, generate the desired browser capabilities using the LambdaTest Capabilities Generator. As shown below, select the appropriate browser, browser version, and platform on which you intend to perform the test:

So, how do we port this implementation such that the existing tests run on cloud-based Selenium Grid from LambdaTest? Well, the changes are only involved in the method implemented under the [SetUp] annotation. Instead of a local Selenium WebDriver, we use the Remote WebDriver that passes the test request to the LambdaTest Hub [@hub.lambdatest.com/wd/hub].

				
using System;
using NUnit.Framework;
using OpenQA.Selenium;
using OpenQA.Selenium.Remote;
using System.IO;
using System.Reflection;
using System.Threading;
using System.Collections.Generic;
using System.Web;

namespace NUnitTest
{
    public class NUnitTest
    {
        String test_url = "test_url";
        public IWebDriver driver;

        /* LambdaTest Credentials and Grid URL */
        String username = "user-name";
        String accesskey = "access-key";
        String gridURL = "@hub.lambdatest.com/wd/hub";

        [SetUp]
        public void start_Browser()
        {
            /* Describe the browser, version, and platform to run on the LambdaTest cloud grid */
            DesiredCapabilities capabilities = new DesiredCapabilities();

            capabilities.SetCapability("user", username);
            capabilities.SetCapability("accessKey", accesskey);
            capabilities.SetCapability("build", "[C#] Demo of LambdaTest Grid");
            capabilities.SetCapability("name", "[C#] Demo of LambdaTest Grid");
            capabilities.SetCapability("platform", "Windows 10");
            capabilities.SetCapability("browserName", "Chrome");
            capabilities.SetCapability("version", "latest");

            /* Remote Selenium WebDriver pointing at the LambdaTest hub */
            driver = new RemoteWebDriver(new Uri("https://" + username + ":" + accesskey + gridURL), capabilities, TimeSpan.FromSeconds(600));
            driver.Url = test_url;
            driver.Manage().Window.Maximize();
        }
        /* Tests follow here */
    }
}

With this, you are all set to run your tests on the LambdaTest Selenium Grid. On execution, you can visit the Automation Dashboard to keep a watch on the status of the tests.

Have a look at how your website (or web app) can render differently on different browsers (and browser versions):

Shown below is a cross-browser test performed on IE 8 (running on Windows 7). Not only is the rendering messed up, but the “Next” button (which is in SVG format) is also displayed incorrectly.

Compare this with a working test that is run on Chrome 89 + Windows 10 combination. There are no issues whatsoever in the rendering of the web page.

The key takeaway is that cross-browser testing at scale should feature in your automation testing checklist. With this, your customers would be greeted with an ever-lasting product experience that works like a charm on browsers and devices that they love to use!

Online Selenium Grid such as LambdaTest has made it super-easy for us to ensure a cross-browser compatible experience without having to worry much about the infrastructure limitations that curtail browser and test coverage. LambdaTest offers much-needed scalability and reliability so that cross-browser tests can be performed at scale!

Debugging logic in Production with Time-Travel Fidelity

Let’s now look at that other type of bug which I mentioned earlier – a logical bug. Our mock eCommerce site offers a Buy 2 Get 1 FREE deal with some bugs built in. When I chose 2 of those nifty sweatshirts, the site automatically gave me a third one. Well, they’re cool, but not that cool, so I decided to forget the whole thing. But when I updated the quantity to 0, the site threw an exception.

Watch this.

Ozcode automatically catches the exception and displays it in the dashboard. We can see it’s an ArgumentOutOfRangeException.

To debug the exception, I click the Debug button.

Ozcode shows where the exception was thrown in the code, and you immediately understand why. There’s an OutOfRange Guard clause, and the value of Input is -1.
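The guard clause in question looks something like the sketch below (a simplified illustration; the names in the actual demo code may differ):

using System;

public class CartItem
{
    public int Quantity { get; private set; }

    public void SetQuantity(int input)
    {
        // Guard clause: a negative quantity makes no sense, so reject it outright
        if (input < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(input), input, "Quantity cannot be negative.");
        }
        Quantity = input;
    }
}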

Now let’s climb up the call stack a bit and look at method AdjustQuantity where we implemented the Buy2 Get 1 Free deal.

First off, from the red/green color coding, we see exactly which parts of this method were executed in this error flow. The first “if” statement handles the Buy 2 Get 1 Free.

				
if (newQuantity == 2)
{
    newQuantity = 3;
}

But that handles the case when a customer modifies the number of items from 1 to 2.

In this case, I’ve changed my mind and updated the quantity back to 0, so the second “if” statement is executed (as we can easily see because it’s green).

				
if (currentQuantity > 2 && newQuantity < 2)
{
    newQuantity--;
}

But no one considered the case where newQuantity is 0: the decrement takes it to -1, which trips the guard clause, and we get our exception.
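One possible fix (a sketch only, not the actual repository code) is to make sure the free-item adjustment can never push the quantity below zero:

if (currentQuantity > 2 && newQuantity < 2 && newQuantity > 0)
{
    newQuantity--;
}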

Now, there are any number of APMs or error monitoring tools that will show you that ArgumentOutOfRangeException with the invalid input of -1. None of those will show you the code across the whole call stack and the values of all locals, variables, and method parameters and return values that show you exactly HOW you got to that invalid input. It’s only once you have that data that the fix for this bug becomes trivial.

Now, you may be thinking, “this was a simple example; real life is more complicated.” You may be right, but even for an example like this, you may have found yourself guessing at the solution, adding log entries to validate it, and rebuilding to test. This kind of observability into the code along the whole execution flow of the error is what makes it easy (or at least much easier) to fix any bug, whether it’s in a monolithic application, a redundant microservice, or a serverless function that runs for a microsecond and is then gone – let’s see you reproduce that. With Ozcode, there’s no need to reproduce it. It’s all recorded for you.

Testing and debugging in production, better together

Testing and debugging are two inseparable facets of delivering robust, working software. For Dev/QA collaboration to be frictionless, developers need as much information about errors as possible, and it’s up to QA to provide it. A while ago, I maintained that a perfect bug report could be provided as a single shareable link, and that’s true for server-side logic bugs. If we now consider UX, we need a bit more data, and that’s what LambdaTest provides to complete the picture. LambdaTest can simultaneously test your UI across a large matrix of OSs and browser versions. If one of those combinations generates an exception, data about the exact scenario, configurations, and versions can be provided to Ozcode Production Debugger, where you can take the analysis down to code level. Being able to debug errors connected to specific OS/browser combinations at a code level will drastically cut down the time it takes to understand where the problem is and fix your code. This is truly end-to-end error resolution.

Finding the Bug in the Haystack: Hunting down Exceptions in Production

As companies move fast and break things, they then have to fix all those things they have broken. With machine learning you can find the bugs that matter, and with time-travel debugging you can then fix them.


This post is co-published by Logz.io and is co-authored by Omer Raviv, Co-founder & CTO @ Ozcode, and Dotan Horovits, Product Evangelist @ Logz.io.

Software companies are in constant pursuit to optimize their delivery flow and increase release velocity. But as they get better at CI/CD in the spirit of “move fast and break things,” they are also being forced to have a very sobering conversation about “how do we fix all those things we’ve been breaking so fast?”

As a result, today’s cloud-native world is fraught with production errors and in dire need of observability.

Climbing the ELK Stack Everest

The depth and breadth of production errors in today’s cloud-native world are apparent from the vast number of exceptions that these applications generate. And how do companies address the issue?

Logs, logs, and more logs.

Modern applications generate mountains of logs, and those logs are generously peppered with exceptions. The sheer magnitude of exceptions makes it extremely difficult to weed out just the right ones. Which exceptions are new? Which are just noise? Which contain important information, such as an error in a newly deployed feature or a customer that’s having a terrible experience and is about to churn?

Let machine learning find the needle in a haystack of errors in Kibana with Logz.io

Let’s take a look at a real-world scenario. If you’ve ever worked at an eCommerce company, this will sound familiar.

The end of November rolls around.

Your friends and family are giddy about all the neat things they’re going to buy.

You are somewhere between stressed and having a full-blown panic attack. It’s your company’s biggest day of the year for sales. Your infrastructure and code had better be up for the task.

Black Friday hits, your website traffic is peaking, and the nightmare begins.

Despite all of your best efforts and meticulous testing, your “buy 2 get 1 free” coupon code simply DOES NOT WORK.

What now?

Let’s look at some logs.

I already mentioned that your logs are likely to contain loads of exceptions. How are you going to pick out the ones related to your coupon code? The open-source ELK Stack is popular for ingesting those mountains of logs and slicing and dicing them in Kibana Discover to understand the scenario at hand. Each log entry can contain structured data, so you can filter on a specific field or piece of contextual data. Logs can also be enriched with additional contextual data you can filter on, such as a user’s email, the browser type, etc.
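For example, structured, enriched log entries of the kind you can filter on might be produced like this (an illustrative sketch; Serilog stands in for whatever structured-logging library you use, and the property and field names are hypothetical):

using Serilog;

public static class CheckoutLogging
{
    // Assumes Log.Logger has been configured elsewhere, e.g. with a sink that ships to your ELK Stack.
    public static void LogCouponFailure(string couponCode, string userEmail, string browser, int orderId)
    {
        // Each named property becomes a filterable field in Kibana (CouponCode, UserEmail, Browser, OrderId).
        Log.ForContext("UserEmail", userEmail)
           .ForContext("Browser", browser)
           .Error("Coupon {CouponCode} failed to apply for order {OrderId}", couponCode, orderId);
    }
}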

In our Black Friday nightmare scenario, you might filter on the particular services that are flaking out, the relevant time frame, and on your coupon code field:

A typical investigation in Kibana Discover involves an iterative process of filtering and querying to narrow down the search context, which can be tedious and time-consuming when having so many outstanding exceptions in the environment.

Logz.io offers a Log Management service based on the ELK Stack that saves you the hassle of managing the open source yourself at scale. But it does much more than that. Logz.io’s Exceptions tab within Kibana Discover does a fantastic job doing what no human can – looking through the hundreds of thousands of log lines that contain exceptions and using machine learning smarts (Logz.io’s Insights Engine) to group them into a concise, aggregated view, which can be filtered in all the same useful ways we apply filters in Kibana Discover.

In our Black Friday incident, even after filtering, we’re faced with more than half a million log hits. However, Logz.io’s Exceptions tab in Kibana flags only 17 clustered exceptions in this search context. Let’s take a closer look at these errors:

In the Exceptions tab, we immediately spot a new exception – ArgumentOutOfRangeException – that started firing intensively during the incident time window. In a real-world, cloud-native system, this would filter out the noise and let you home in on the right exceptions.

You now know where to start your final assault. But where do you go from here?

Ozcode – see the code behind the logs

The logs are the telemetry of our software’s black box, recording what the system tells us it is doing. Now that we’ve used Logz.io’s Insights Engine to find out which exception we should focus on, we’d like to open up the black box and get a code-level understanding of that exception. This is where Ozcode’s exception capture comes in. Ozcode Live Debugger’s exception capture includes all the data we need: you can time travel to see line-by-line code execution up to the point where your application threw the exception, viewing local variables, method parameters and return values, network requests, database queries, and more.

The ArgumentOutOfRangeException exception and call stack we saw in Logz.io’s Kibana don’t provide enough data for us to understand what happened. However, by simply jumping over to the Ozcode dashboard and filtering for the specific exception type and time range, we can delve deeper…

The Ozcode recording gives us a visual look at the code execution that led to the bug – every expression that was false is highlighted in red, every expression that was true is highlighted in green, and every variable and method call shows its exact value. We can see we had a simple calculation error in our “Buy 2 Get 1 Free” sale, which made us think the customer wanted to buy a negative number of items.

Now that we understand what happened, the fix is easy! There’s no need to try to reproduce the issue on a local dev machine to solve the mystery.

Zoom in fast and fix things

The ELK stack, and Kibana in particular, gives us tremendously powerful tools to investigate logs. Using Logz.io’s machine learning-based insights, we can surface the relevant exceptions and related logs inside Kibana out of the endless noise and millions of logs that modern cloud-based systems generate. The Ozcode Live Debugger enhances this experience even further by giving us code-level observability and time travel recording to quickly understand the root cause behind each exception. You can combine that with additional telemetry such as metrics and traces to increase your system’s observability and enhance your troubleshooting capabilities.


An Observability Platform for Developers

Observability platforms play a vital role in an enterprise’s tool stack, but they fall short of providing developers with the actionable data they need to resolve Production errors.


Observability platforms play a vital role in an enterprise’s tool stack, providing DevOps/SREs and Production Support staff with a system-level view of their applications’ and infrastructure’s health and performance levels. By alerting the right DevOps and engineering staff to performance bottlenecks and live-site incidents, observability platforms help keep the company’s systems running smoothly to maintain the ever-important business continuity. However, the observability that these platforms provide is primarily at a system level. When a software error surfaces in Production, the people tasked with fixing it are developers. To resolve Production errors, developers need actionable data that enables them to reproduce and debug those errors, and that’s where the current state-of-the-art observability platforms fall short. They barely scrape the surface of the code-level observability that developers need. The performance metrics and stack traces that DevOps/SREs work with are ineffective and frustrating for developers who have to fix an urgent Production issue and only serve to create friction where collaboration is needed.

Ozcode Live Debugger introduces both a paradigm shift and a cultural shift in the realm of resolving Production incidents. By providing developers with the code-level observability they need, Ozcode turns Production Debugging into a monitoring discipline in which developers are empowered to actively participate as part of their day-to-day responsibilities. The rest of this post describes how Ozcode provides developers with the code-level observability they need to do an effective root-cause analysis of Production errors leading to a rapid resolution. It’s important to note that Ozcode Live Debugger and traditional observability platforms are not mutually exclusive but rather complement each other and are equally vital components of an enterprise tool stack.

Taking observability beyond the system and down to code-level

APMs such as New Relic, Dynatrace, AppDynamics, DataDog, and others have developed in recent years into full-fledged observability platforms. They include an enriched set of features to include capabilities like log analysis, real user monitoring, synthetic testing, error monitoring, and more. However, none of these platforms enable code-level observability, which is the crucial missing piece that allows developers and DevOps/SREs to collaborate and quickly resolve issues and prevent faulty deployments from reaching Production.

Ozcode Live Debugger presents a new paradigm for troubleshooting Production errors by turning Production Debugging into a monitoring discipline through the following key capabilities.

Autonomous exception capture replaces reproducing an error

Ozcode Live Debugger uses an agent that runs next to and monitors your application. When your application throws an exception, the Ozcode agent adds byte-code instrumentation to the code along the entire execution path from the initial interaction that triggered the exception to the line of code that threw it. Now, observability platforms also operate by adding instrumentation to your applications and infrastructure, but here’s the difference. Ozcode captures and records code-level Production data that is specific to the complete error execution flow that caused the exception. No observability platform provides that level of data. The Ozcode agent then transmits this debug data to the Ozcode server, where the relevant developer can analyze it independently of the live Production system.

Reproducing a Production error can be extremely challenging for several reasons:

  • Matching the scale and structure of Production in a parallel environment is usually not feasible (if it’s possible at all)
  • Reproducing the exact user scenario that caused the error may be impossible
  • Matching the right source code to the current Production binary may be impossible
  • The ephemeral nature of microservices and serverless functions makes reproduction even more difficult for Production systems built on those technologies

By capturing the code execution flow of an error, Ozcode removes the need to reproduce an error. You debug the actual Production code where the error manifested in a completely non-intrusive way.

Time-travel debugging: the “Development experience” on Production code

During development, developers are used to debugging errors by stepping through their code with full visibility into their application’s runtime data. Doing the same in Production ranges from difficult to impossible, and observability platforms don’t even try to address this issue.

One way you might consider is to attach a remote debugger. There are tools on the market that let you connect to a live application for debugging. However, setting them up can be very complex, and company policies often forbid such connections for reasons of security.

And then, even if you were able to set up a remote connection to your Production systems, stepping through the code would require stopping the application flow with breakpoints. Again, company policies usually forbid this activity as it stops the application flow not only for the debugging developer but also for customers.

Ozcode Live Debugger delivers the “Development experience” on Production with time-travel debugging.

The detailed debug data stored in Ozcode exception captures provides the same code-level observability that developers are used to getting in their development environments. They can step back and forth through the complete error execution flow across the whole call stack with full visibility into the following (a rough sketch of what such a capture might contain appears after this list):

  • Local variables
  • Method parameters and return values
  • Network requests
  • Database queries
  • Relevant log entries
  • Event trace across microservices
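
To make that concrete, here is a rough, hypothetical sketch of the data one captured step might carry. This is our own illustration, not Ozcode’s actual capture format:

using System.Collections.Generic;

// Hypothetical shape of a single step in a captured error execution flow (illustrative only).
public record CapturedFrame(
  string Method,                                    // fully qualified method name at this step
  IReadOnlyDictionary<string, object> Locals,       // local variables and their values
  IReadOnlyDictionary<string, object> Parameters,   // method parameters and the return value
  IReadOnlyList<string> NetworkRequests,            // outgoing HTTP requests observed here
  IReadOnlyList<string> DatabaseQueries,            // database queries executed here
  IReadOnlyList<string> LogEntries);                // log entries emitted during this step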

Moreover, Ozcode makes it easier for developers to understand what happened in the error execution flow through various visual aids. For example, red/green coloring for conditional statements, greyed-out text for code that is not executed, annotations providing the values of variables and method parameters within the body of the code, and more.

Using dynamic logging with tracepoints to make log-based debugging effective

Observability platforms may offer advanced log analysis as part of their enriched feature set. While the traditional way of troubleshooting with logs is better than nothing, it is NOT an effective way to debug Production issues for the following reasons (the code sketch after this list illustrates the problem):

  • Insufficiency – it’s impossible to predict exactly where an error will occur, so it’s equally impossible to ensure that you have diagnostic log entries in the right places in your code.
  • Distribution – logs from the multiple components of today’s complex software systems may be distributed between various sources, including files and databases. Piecing together the right logs to understand an error is extremely difficult.
  • Process – since there are never enough log entries present to debug an error, debugging-by-logs is a tedious, iterative process that goes something like this: analyze existing logs → formulate a theory for the root cause of the error → add logs to test the theory → rebuild → redeploy → reproduce the error. Usually, several iterations of this process are required before the root cause is determined so a fix can be put in place.
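
To see why “insufficiency” bites, consider a hypothetical service (our own example, not taken from a real codebase) where the one log entry written in advance doesn’t include the value you actually need:

using Microsoft.Extensions.Logging;

// Hypothetical payment service, for illustration only.
public class PaymentProcessor 
{
  private readonly ILogger<PaymentProcessor> _logger;

  public PaymentProcessor(ILogger<PaymentProcessor> logger) => _logger = logger;

  public decimal Charge(decimal amount, string currency) 
  {
    // The only log entry written in advance. When a conversion bug surfaces in Production,
    // it tells you that Charge was called, but not which exchange rate was used.
    _logger.LogInformation("Charging {Amount} {Currency}", amount, currency);

    var rate = GetExchangeRate(currency);

    // Suspect line, but 'rate' is never logged. With traditional logging you must add a log entry
    // here, rebuild, redeploy, and wait for the error to recur. A tracepoint placed on this line
    // outputs 'rate' on the fly, with no redeployment.
    return amount * rate;
  }

  private decimal GetExchangeRate(string currency) => currency == "USD" ? 1m : 0.85m; // placeholder rate
}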

Ozcode makes debugging-by-logs highly effective by addressing each of these issues:

  • Dynamic logging with tracepoints solves “insufficiency” – with Ozcode, there’s no need to predict where an error will occur. You can add tracepoints anywhere in the code and set up structured dynamic logs to output any data item for analysis or examine the application state every time the code passes through a tracepoint.
  • Log aggregation solves “distribution” – Ozcode assembles the log entries relevant to the error execution flow into one place, so there’s no need to dig through multiple sources to extract the relevant log entries.
  • Autonomous exception capture with time-travel debugging solves “process” – the process using Ozcode is completely different. Autonomous exception capture gives you the complete error execution flow, so there is no need to reproduce an error. Time-travel debugging lets you step through the error execution flow with code-level observability. Using tracepoints and dynamic logs, you can add and remove log output and examine application state anywhere in your code at will without having to rebuild and redeploy.

Observability platforms and Ozcode, better together

Ultimately, observability platforms may alert you to an error in your application and even point you to where in the code you might start looking; however, from there, it’s a guessing game. These tools do not provide the code-level observability into the error execution flow needed to do an effective root cause analysis. Metrics and dashboards cannot replace time-travel debug information required to fix Production errors.

Nevertheless, it’s important to emphasize again that Ozcode Live Debugger and observability platforms are complementary tools, and both are vital components of an enterprise tool stack. Observability platforms monitor your systems for performance metrics and resource usage to ensure a good level of business continuity during normal operation. However, Production errors are inevitable, and when they occur, they disrupt business continuity. That’s when Ozcode Live Debugger jumps in to restore it.

In other words…

Observability platforms maintain business continuity during normal operations. Ozcode Production Debugger restores business continuity when errors occur.

Ozcode Live Debugger

  • Autonomous exception capture – never have to reproduce a bug again
  • Time-travel debugging – code-level data at your fingertips
  • Tracepoints – add dynamic logs and view data anywhere in your code without having to redeploy

The post An Observability Platform for Developers appeared first on Ozcode.

]]>
Ozcode Live Collaborative Debugger for FREE so Everyone Can Use It: Dev, QA, DevOps, SRE https://oz-code.com/blog/general/ozcode-live-collaborative-debugger-for-free-so-everyone-can-use-it-dev-qa-devops-sre Sun, 07 Mar 2021 15:38:25 +0000 https://oz-code.com/?p=17717 To effectively resolve incidents in Production and pre-Production environments, Developers, QA, DevOps and SREs need to collaborate with code-level visibility into the data on those environments. That's why we launched Ozcode's collaborative Production Debugger.

The post Ozcode Live Collaborative Debugger for FREE so Everyone Can Use It: Dev, QA, DevOps, SRE appeared first on Ozcode.

]]>

Ozcode has been solving developer debugging pains for years. The tremendous adoption of our Visual Studio Extension showed just how badly developers need data to resolve bugs. But when we saw that debugging pains don’t go away when a build is deployed, we understood that solving them must extend beyond the developer’s IDE. We launched Ozcode’s collaborative Live Debugger a year ago with exactly that in mind. Here’s what happened.

A thousand people from small and large enterprises across industries and geographies have joined our early access program and started using the Live Debugger. Together, we are growing as a community of software professionals from various disciplines who are discovering that there’s more to debugging than initially meets the eye. We knew developers would come on board since they gain the most from digging into Production environments and seeing why things go wrong. Finally, extracting data from Production was easy: no more endless cycles of code changes and redeployments just to add log entries.

But we were also betting on others, and indeed they came. We found QA testers using the Live Debugger to provide (what we think of as) perfect bug reports. With Ozcode, gone are the days of “It works on my machine.”

And then, DevOps engineers started to get very interested. We believe this will be a DevOps trend in 2021 because nowadays, when we meet with customers, DevOps engineers are taking a front seat in the discussions. DevOps engineers have seen how collaborating with developers on resolving Production incidents using the Production Debugger does wonders for DevOps KPIs like MTTR. Now, it’s not surprising that companies adopting the Production Debugger first try it out on test environments before installing it on Production. But we also found that many companies continue to use the Production Debugger on pre-Production environments in QA and Staging. This kind of “shift-left Production Debugging” also improves DevOps KPIs with fewer bugs making it through that final deployment to Production (think Defect Escape Rate).

With a thousand development, QA, and DevOps engineers running the Live Debugger on Production and pre-Production environments, the flow of feedback has been tremendous. Everything from UI to low-level algorithms has changed based on feedback we have received. We strongly believe that it’s this power of the community that helps us refine our platform to build a great product.

As a token of thanks to our growing community, we are announcing the Always FREE Edition of Ozcode Production Debugger for teams of up to 5 users. We are convinced that debugging live QA, Staging, and Production systems should be available to everyone in the field and believe that this free offering will make debugging collaborative and much simpler for thousands more software professionals in the coming year. Our “Always Free” edition allows teams of up to 5 users to run 100 agents and debug a million events every month. This is an offering that can bring real value to an organization even before widespread adoption on its Production and pre-Production environments. Here are all the details of Ozcode Always Free.

If you’re ready to get on board, sign up here.

The post Ozcode Live Collaborative Debugger for FREE so Everyone Can Use It: Dev, QA, DevOps, SRE appeared first on Ozcode.

]]>
3 Reasons to Enable Debugging in Production https://oz-code.com/blog/production-debugging/3-reasons-to-enable-debugging-in-production Sun, 28 Feb 2021 20:25:40 +0000 https://oz-code.com/?p=17644 Software systems are incredibly complex and are full of defects. Sure, they’ll work most of the time, but every system has its day. Here are 3 reasons why debugging in Production should be as accessible and easy as possible.

The post 3 Reasons to Enable Debugging in Production appeared first on Ozcode.

]]>

Marc Andreessen’s famous “Software is eating the world” was an “Aha moment” for many people. In his 2011 article, he showed how software has become the key component to delivering value in every industry. Software continues to grow exponentially in every aspect – development, testing, and deployment practices, tools, languages, architectures, networking, and databases are all moving forward at full speed. But with all this growth, one thing is certain. Nothing’s perfect. All these advances come with a cost. Software systems are incredibly complex and full of defects. Sure, they’ll work most of the time, but every system has its day. Some of the more famous examples include spaceships crashing on Mars and stocks taking a tumble. But you don’t have to go that far. Services that form the basic fabric of our lives go down all the time, and this downtime can cost companies up to $5,600 per minute. So, here are 3 reasons why debugging in Production should be as accessible and easy as possible.

Reduce developer burnout

What developers want to do most is develop awesome code. They understand that part of that is debugging their code, but that’s OK. While they’re developing, debugging code is just another part of their day as they add lines of code, and then step through it when something goes wrong. Debugging in Production is different. By the time a bug is detected in Production, you don’t know which developer is responsible for the bug. Much of the time, you’re not even sure where the bug really is. Just because code throws an exception somewhere doesn’t mean that’s where the bug is. So the developer tasked with fixing the bug has to stop developing awesome code and start the painful journey of trying to reproduce the bug, guessing the solution, adding code and logs, rebuilding, redeploying…and it doesn’t usually work the first time. Modern software architectures like microservices and serverless only make things more difficult. The problematic piece of code may not even be running once the developer digs in and tries to solve the issue. This arduous, time-consuming process is a major contributor to developer burnout.

Now imagine that these developers had access to tools that cut their Production debugging time by 80%. Even if you have DevOps engineers and IT managers gate-keeping Production systems, debugging Production errors on pre-Production environments like Staging or even QA can dramatically shorten the debug cycle and remove developer frustrations trying to fix Production errors.

Deliver more value

Over $10K of a developer’s yearly salary goes on debugging in Production. Part of the problem is that companies don’t have the right tools available, so developers waste a lot of time that costs companies a lot of money. Some studies show that developers can spend up to 25% of their time debugging in Production. If companies could get great Production Debugging tools for free, their developers would spend a lot less time on debugging and a lot more time developing the next great feature. So, while the company reduces the time developers waste on debugging (and developer burnout in the process), it can also deliver more value.

We all win

We saw that enabling debugging in Production helps reduce developer burnout and enables companies to deliver more value. That means all those services we depend on won’t go down so much, and they’ll provide us with more and better features, all at a lower cost. Developers win, businesses win, and in fact, we all win.

Ozcode Live Debugger

  • Fix bugs 5x faster
  • Debug microservices and serverless code
  • On-premises or in the cloud

The post 3 Reasons to Enable Debugging in Production appeared first on Ozcode.

]]>
Ozcode C# Riddle #1: What’s Your Type? https://oz-code.com/blog/net-c-tips/ozcode-c-riddle-1-whats-your-type Thu, 28 Jan 2021 16:13:42 +0000 https://oz-code.com/?p=17185 This is the first in our series of Ozcode C# riddles. In these riddles, we explore some of the deep, dark corners of the language to help you hone your problem-solving skills. Along the way, you’ll meet some C# debugging tools, and also develop some best practices for handling errors in C#. C# is a …

Ozcode C# Riddle #1: What’s Your Type? Read More »

The post Ozcode C# Riddle #1: What’s Your Type? appeared first on Ozcode.

]]>

This is the first in our series of Ozcode C# riddles. In these riddles, we explore some of the deep, dark corners of the language to help you hone your problem-solving skills. Along the way, you’ll meet some C# debugging tools, and also develop some best practices for handling errors in C#.

C# is a strongly typed language … for the most part. It uses static typing to enforce type safety at compile time, but since the introduction of the dynamic keyword in C#4, it also supports dynamic typing, where type safety is only enforced at runtime. When it comes to collections, things can get a bit tricky. Working with collections can involve anything from iterating through a simple list of items to searching through a large and complex object graph. One way to traverse collections is to use an enumerator, and that’s at the core of today’s riddle.

Consider a collection of integers 1, 2, and 3. You can declare it as an array:

var arr = new[] { 1, 2, 3 };

You can also declare it using a generic List type (List<T>):

var list = new List<int> { 1, 2, 3 };

Which declaration you use can have a huge impact on how your code runs.

The Riddle

What is the output of the code snippet below? (Answer below)

To solve this riddle, you need to know about:

  • collections, especially the process of iterating through them
  • the difference between value types and reference types

using System;
using System.Collections;
using System.Collections.Generic;

public class Program 
{
  public static void Main() 
  {
    var arr = new[] { 1, 2, 3 };
    var list = new List<int> { 1, 2, 3 };

    var arrEnumerator = arr.GetEnumerator();
    var listEnumerator = list.GetEnumerator();

    MoveNext(arrEnumerator);
    MoveNext(arrEnumerator);
    MoveNext(arrEnumerator);

    Console.WriteLine();

    MoveNext(listEnumerator);
    MoveNext(listEnumerator);
    MoveNext(listEnumerator);
  }

  private static void MoveNext(IEnumerator enumerator) 
  {
    if (enumerator.MoveNext()) 
    {
      Console.WriteLine(enumerator.Current);
    }
  }

  private static void MoveNext(IEnumerator<int> enumerator) 
  {
    if (enumerator.MoveNext()) 
    {
      Console.WriteLine(enumerator.Current);
    }
  }
}

The Solution

So, your knee-jerk response would be that the output is:

1
2
3

1
2
3

But that’s not the case.

If you don’t believe me, copy the snippet into your favorite IDE and run it.

So, what’s going on? On the face of it, in both cases, you’re just iterating through a set of integers. That may be true but iterating through an array is not the same as iterating through a generic List. All collections in C# have a GetEnumerator() method which returns an enumerator object that lets you iterate through the items of the collection. However, not all enumerators have the same type.
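
You can check this yourself with a small standalone snippet (the array enumerator’s internal type name varies by runtime, but it is always a reference type):

using System;
using System.Collections.Generic;

public class EnumeratorTypeCheck 
{
  public static void Main() 
  {
    var arr = new[] { 1, 2, 3 };
    var list = new List<int> { 1, 2, 3 };

    // The array enumerator is an internal class, i.e. a reference type.
    Console.WriteLine(arr.GetEnumerator().GetType().IsValueType);   // False

    // List<int>.Enumerator is a struct, i.e. a value type.
    Console.WriteLine(list.GetEnumerator().GetType().IsValueType);  // True
  }
}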

The behavior of the array is very intuitive. Invoking MoveNext(arrEnumerator) three times advances the array enumerator through all three elements, printing 1, 2, and 3, and leaving Current pointing at the last element. So why doesn’t it work like that for the listEnumerator?

Well, the enumerator for a generic list is a struct:

public struct List<T>.Enumerator : System.Collections.Generic.IEnumerator<T>

And a struct is a value type.

Now, here’s the catch. In case you didn’t notice, we’re not iterating our two enumerators directly by calling their MoveNext methods; we’re doing it indirectly through our own MoveNext overloads. Our overload receives the list enumerator as the IEnumerator<int> interface, and converting a value type to an interface boxes it: the struct is copied into a new object, and the method operates on that copy rather than on the original. So each time we apply MoveNext to the List enumerator (within our own MoveNext method), we’re advancing a fresh boxed copy, and the original enumerator stays at its starting point, before the first element of the list.
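
Here’s a minimal standalone example of the same effect (our own demo, not part of the riddle): a mutable struct passed through an interface parameter gets boxed, so the method mutates the box and the caller’s copy never changes.

using System;

public interface ICounter 
{
  void Increment();
  int Value { get; }
}

public struct Counter : ICounter 
{
  public int Value { get; private set; }
  public void Increment() => Value++;
}

public class BoxingDemo 
{
  public static void Main() 
  {
    var counter = new Counter();

    Bump(counter);  // counter is boxed; the boxed copy is incremented and then discarded
    Bump(counter);  // a brand new box is created and incremented again

    Console.WriteLine(counter.Value);  // 0: the original struct was never modified
  }

  // Accepting the struct via an interface forces boxing, just like our MoveNext(IEnumerator<int>) overload.
  private static void Bump(ICounter c) => c.Increment();
}

Swap the parameter type from ICounter to Counter and you still get a copy (plain by-value copying, no boxing involved); either way, the original is untouched.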

So, the output of this program is: (drumroll)


1
2
3

1
1
1

(Ba dah boom)

Don’t believe me? Check here.

Why doesn’t this happen for the array?

Since the array enumerator is a Reference type, when we pass it to our MoveNext method, the method does not create a copy, but rather operates directly on the object. Consequently, the array enumerator iterates through the integers to the last value which is returned with Current.

Why worry about enumerator types?

It’s rare to need to use a collection’s enumerator directly. In my 10 years of experience programming in C#, I haven’t had to do that more than once or twice. A more common (and easier) way to iterate through the elements of a collection (and a C# coding best practice) is to use a foreach loop, which handles the enumerator for you, so you don’t have to worry about it.


using System;
using System.Collections.Generic;

public class Program 
{
  public static void Main() 
  {
    var arr = new[] { 1, 2, 3 };
    var list = new List<int> { 1, 2, 3 };

    foreach(var i in arr) 
    {
      Console.WriteLine(i);
    }

    Console.WriteLine();

    foreach(var i in list) 
    {
      Console.WriteLine(i);
    }
  }
}
And now the result is as expected:

1
2
3

1
2
3
You really wanna check?

Main Takeaways

Your main takeaways from this riddle are:

  • Don’t use enumerators unless you really have to. Use a foreach loop instead.
  • When you pass a parameter to a method, make sure you know if it’s a Value type or a Reference type.

The post Ozcode C# Riddle #1: What’s Your Type? appeared first on Ozcode.

]]>