Observability is not Enough for Production Debugging. You Need Radical Observability
https://oz-code.com/blog/production-debugging/production-debugging-need-radical-observability
Wed, 27 May 2020

There are many tools on the market that offer observability. They do a good job for maintenance and troubleshooting, but they are not good enough for production debugging.

There are many articles floating around in cyberspace that quote the Wikipedia definition of observability. If we apply this term, taken from control systems, to the realm of software, observability refers to our ability to detect behavior we are not happy with in a production system and track down its cause. Naturally, observability is important in QA and staging environments too, but it becomes a critical business need when bad things start happening in Production. The need to understand what’s happening under the hood is not new, and over the years, industry giants like Google and Facebook established what are now known as the three pillars of observability: logs, metrics, and traces.

Not quite pillars

Observability software such as Application Performance Monitors, error monitors, and log analyzers has become part of the standard tool stack that most enterprises use to obtain the pillars of observability they need to watch over their production systems. But while the vendors of these tools tout their problem-solving capabilities, when something goes wrong in Production, SREs and DevOps practitioners often find themselves scrambling for a solution. There’s no lack of reports of websites going down, and there are voices challenging the effectiveness of observability tools that have become industry standards. The discussion can get very technical, going into cardinality and sampling rates for metrics, storage costs of log files, and metadata attached to traces. The validity of these arguments begins to erode the previously perceived solidity of those three pillars. Now, nobody is saying that there’s no value in logs, metrics, and traces, but rather than considering them as pillars, we should, perhaps, consider them as merely supporting observability.

What is Radical Observability for Debugging?

Drawing on that Wikipedia definition for observability, let’s take a crack at defining software observability for debugging.  

Observability for debugging is a measure of how well the internal error states of a software system can be inferred from its external outputs.

In plain English: how well can you determine the root cause of an error from the way it manifests itself?

Since an error is something the developers of the system did not anticipate (otherwise, of course, they would have written the code to avoid it in the first place), it is often manifested as an unhandled exception. Many of the tools that support observability claim they can help determine root cause. One may argue that they do help in solving certain problems, but production debugging is a different matter. None of these tools really enables the code-level analysis needed to determine the root cause of production bugs. They may do what they do very well, but to debug production systems, you need to go a bit deeper. So, what are the pillars of software observability for debugging?

  • Time-travel debug information
  • Relevant logs
  • Error execution flow

Time-travel debug information

While aggregative metrics may be useful in identifying performance bottlenecks, or problems with scalability, they don’t necessarily point you to a problem in the code that caused an error under very special circumstances. In fact, aggregative metrics may skip over a particular problematic scenario if the sampling rate is not high enough. This is inadequate for code level debugging. What you need is the time-travel debug information that you get when stepping through the code, line after line, with visibility into the values of every variable, method parameter and return value across the whole call stack. It’s this code-level visibility provided through time-travel that enables an effective root cause analysis, so you really understand what went wrong.
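To make that concrete, here’s a contrived C# sketch (the names and values are invented for illustration) where a snapshot taken at the throw site would show only the symptom:

using System;

class TimeTravelSketch
{
    static void Main()
    {
        var discount = LoadDiscount();        // root cause: a faulty upstream source returns 1.2 (a 120% discount)
        var price = 100m * (1 - discount);    // price silently becomes -20; no exception yet
        Charge(price);                        // the exception only surfaces here, far from the bad value
    }

    static decimal LoadDiscount() => 1.2m;    // stands in for a bad database row or import

    static void Charge(decimal amount)
    {
        if (amount < 0)
            throw new ArgumentOutOfRangeException(nameof(amount), amount, "Charge amount cannot be negative");
        Console.WriteLine($"Charged {amount:C}");
    }
}

A snapshot at the throw site shows that amount is -20, but only stepping back through the execution reveals that discount was already 1.2 the moment it was loaded – the root cause lives two steps earlier.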

Relevant logs

… with an emphasis on the word “relevant.” Applications typically generate volumes and volumes of logs. Even if you have an APM or log analyzer to aggregate them and create colorful reports, sifting the relevant log entries from the clutter can be extremely challenging. In the context of software observability for debugging, an effective production debugger will do all the sifting for you and only aggregate those log entries relevant to the error. By itself, this is not usually enough for a root cause analysis, because the nature of errors is that you never know where they are going to occur, so usually, you have to add logs, redeploy, and reproduce the error to get more insights. Nevertheless, getting the logs relevant to the error takes you leaps and bounds towards a resolution.

Error execution flow

You might compare this to traces that you get on APMs, but it’s much, much more. One of the issues with tracing is the overhead it generates during collection. That could be mitigated by sampling specific traces, but then, you might miss the trace relevant to the error you’re investigating. Getting the exact trace relevant to that error is like using sampling to reduce overhead but knowing exactly which traces to sample. The error execution flow transcends microservices, serverless code, event queues, and any other fork in the road that your code may encounter. You are able to step through the code, line by line, from the interaction trigger, through the root cause of the error, to the exact place that threw the exception. Combine that with the time-travel debug information and relevant logs, and you have yourself a recipe for success (rapid resolution of the bug).

Why radical?

I call the software observability provided by time-travel debug information, together with relevant logs and error execution flow “radical” because it gives you visibility to unprecedented depths in your code. At any step of the error execution flow, you can drill down to view any variable, across the whole call stack, through microservices, etc. There is no deeper observability available today, and that’s why it is one of the pillars of production debugging. Only this depth of visibility into your code can point out the most esoteric of error states that could easily be missed by other tools. This is what you need to fix bugs, not colorful charts.


Time-travel fidelity
View the true state of your code anywhere in the execution flow of an error.

Relevant logs
Don’t sink under a sea of log files. Only analyze log entries relevant to the error.

Error execution flow
Trace an error from an exception back to the root cause with full visibility into your code.

Ozcode Production Debugger - Start for FREE

 

You Need a Sentry to Catch Exceptions in Production
https://oz-code.com/blog/production-debugging/need-sentry-catch-exceptions-production
Thu, 21 May 2020

Unhandled exceptions in Production indicate something's wrong with your application. What's the best way to catch all exceptions and handle them for a quick resolution?

Exceptions in Production mean something is wrong. It may be a situation you anticipated, like reading a value into a field of your code from a database, only to discover the value is invalid. In these cases, you can catch the exception and handle it gracefully.

If you didn’t anticipate the situation, then in the best case, you got lucky, and the error is not manifested to your customers. Still, these errors can recur, build up, and eventually, bite you in the butt. In the worst case, the s**t can really hit the fan, and all sorts of bad things can happen; your website could go down, medical equipment may malfunction, stocks can tank…or worse (and these things quickly become bash festivals in the media).

Either way, you need to handle these situations, especially those you didn’t anticipate. The only way to make sure nothing gets through is to have a sentry that never sleeps, needs no supervision, and constantly watches your application, immediately pouncing on any exception that is thrown and letting you know about it so that you can contain the situation before your customers notice. At Ozcode, this is what we call Autonomous Exception Capture, and it’s one of the pillars of Production Debugging.
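For the anticipated case, graceful handling is straightforward. Here’s a minimal C# sketch (the field and fallback are invented for illustration):

using System;

class CustomerLoader
{
    // An anticipated failure mode: the database row may hold a malformed date.
    public DateTime ParseSignupDate(string rawValue)
    {
        if (DateTime.TryParse(rawValue, out var signupDate))
            return signupDate;

        // Handle it gracefully: log the bad value and fall back instead of crashing the request.
        Console.Error.WriteLine($"Invalid signup date '{rawValue}'; falling back to default.");
        return DateTime.MinValue;
    }
}

It’s the unanticipated cases – the ones with no such guard – that surface as unhandled exceptions, and those are what the sentry is for.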

How does Ozcode catch exceptions autonomously?

Ozcode Production Debugger uses an agent that runs alongside your application and monitors it. This is your sentry. When your application throws an exception for the first time, the Ozcode agent adds some lightweight instrumentation to the binary code along the full execution path, from the initial interaction that triggered the exception to the line of code that threw it. At this point, you can’t debug the exception yet; you can only see it in the Ozcode dashboard.

Autonomous Exception Capture - Catch Exceptions Phase 1 - Ozcode

The next time your application throws the same exception from the same place in the code is when the magic happens. This time, your sentry is ready to record the dirty deed in minute detail. The instrumentation added the first time around records:

  • the line of code that threw the exception
  • the code execution flow across the complete call stack
  • local variables, method parameters, and return values
  • the relevant log entries leading up to the exception
  • network requests
  • database queries
  • HTTP requests (both the incoming request that led to the exception and any outgoing requests to external services)

Once the agent has captured all these details, it can remove the instrumentation. Not only does the instrumentation have no significant impact on your application while it’s in place; once the exception repeats and is captured, the instrumentation is gone altogether, so there’s nothing to worry about there. But your sentry never sleeps and continues to monitor and count the exceptions, so you get an idea of how frequently they occur, which can be an indication of their severity.

Autonomous Exception Capture - Ozcode

Developers’ dream-come-true

As developers, we know that what you really want to do is develop awesome code. Debugging is a chore, especially when it’s a production bug that blindsides you 2 minutes before you’re ready to head home for the weekend. But let’s think about this for a second. Autonomous exception capture removes many developer nightmares connected to debugging.

You don’t have to recreate the production environment and the exact scenario that triggered the exception in order to reproduce the bug. It’s all there, recorded for you in the exception capture. All the information you need is assembled in a familiar IDE-like environment. You don’t need to piece together a bunch of disparate log files. You can trace the code execution flow from the moment your application threw the exception back to the original interaction without having to mess with microservices, serverless code, and other complicating factors. Basically, you click a link and start stepping back through the code with full visibility into your application’s state at every step of the way until you see exactly what went wrong.

Debug Screen
Autonomous Exception Capture - Ozcode

A developer’s best friend is his QA engineer

If you’re a QA engineer, finding bugs may be your “thing,” but reporting them? That’s when it becomes a chore. Don’t worry; your sentry has done the work, making it easy for you to create the perfect bug report in a click.

You don’t have to work hard to find the bugs; the Ozcode agent notifies you directly in the dashboard when the application throws an exception. You don’t have to investigate the exact build number or environment in which the bug manifested because your developer friend doesn’t have to recreate the build or the environment. You don’t even need to describe the steps to reproduce the bug (you might not even have a clear picture of how to do that). You don’t have to scratch your head about which log files to attach or where to dig them up from. All you do is share a link with your developer who can now immerse themselves into the runtime context of the bug and start analyzing what went wrong. Best of all, you’ll never hear, “But it works on my machine.“ Developer is happy, you’re happy, Developer and QA are BFFs.

No exceptions shall pass

There’s really no better way to monitor your application for errors. An autonomous sentry (aka agent) that constantly watches for exceptions, catches them, and records what happened without eating up any significant resources is the best way to go. It serves the needs of developers, the needs of QA, and, most importantly, the needs of your business to fix production errors before your customers notice them.


Ozcode Production Debugger

Autonomous exception capture
Don’t chase after exceptions. They automatically appear in your dashboard.

Radical observability
Get insights into your code down to the deepest levels for an easy root cause analysis.

Time-travel fidelity
View the true state of your code anywhere in the execution flow of an error.

Autonomous Exception Capture - Ozcode

The 4 Pillars of Production Debugging for an Always-on World
https://oz-code.com/blog/production-debugging/pillars-effective-production-debugging
Thu, 14 May 2020

Debugging production systems is a mission-critical process. Many of the tools that claim to debug in production fall short of the requirements. What are the pillars of an effective production debugger?

Debugging production systems is a critical business process. Modern civilization completely depends on software. The most fundamental infrastructure that makes up the fabric of our lives runs on it, from electrical power to clean drinking water to systems that monitor the very air we breathe. All the software managing these basic necessities of life must run at all times – 24/7/365. Even for “non-critical” software such as commercial websites, defects can cause severe damage in lost sales, lost customers, and lost reputation. Some estimates put the cost of downtime at $5,600 per minute, so to prevent production bugs, the process of creating software includes exhaustive testing.

But for all the safeguards you may put in place, production software is still defective. It’s not a matter of IF, but rather WHEN you will discover a defect. Some defects are small and can be fixed behind the scenes without anyone noticing, but others are big enough to crash company stocks and knock spaceships out of orbit. So I’ll say it again: debugging production systems is critical. Once you discover a bug in Production, you need to fix it before it impacts your business, and if your customers are already feeling it, the urgency is even greater.

In this post, I will touch on some of the tools and methods currently used in production debugging and why they are insufficient. Then, I will describe the fundamental pillars that a true and effective production debugger stands on.

Why is production debugging so hard?

Code that is still in development is under the complete control of the developer. The environment is known; the scenario is clear; the developer can put breakpoints anywhere in the code to stop and examine its state and make inline changes to see how they affect the outcome. Debugging production systems is quite different.

  • The developer usually can’t recreate the exact environment or scale in which the error occurred.
  • Reproducing the error can range from difficult to nearly impossible.
  • Often, an error is manifested at one location in the code, but the root cause is somewhere completely different. It can even be in a different module, a specific instance of a microservice, or even in serverless code that triggers an error and then disappears.
  • The information relevant to the error is distributed among a set of multiple disparate sources such as log files, the call stack, event queues, database queries, local variables, and more.
  • The developer can’t usually just put breakpoints in the code and step through the error scenario since that would stop service to the end-users (assuming the nightmare of a downed system had not already materialized).

An effective production debugger needs to overcome these challenges.

Production debugging wannabes

There are several product categories within the production debugging neighborhood. While they all do something to help debug production systems, none are sufficient to effectively determine root cause to enable a complete and final fix.

Debugging dumpsters

This is one of the oldest methods used to debug production systems. Get a memory dump of the system when an error occurs and try to decipher that. The problem with memory dumps is that at best, they are cryptic and hard to decipher, and at worst, they provide a snapshot of the computer’s memory at one point in the flow of execution, while the root cause of the error may be somewhere completely different.

Log analyzers

These tools have also been around for a while now. While they are great at helping to make sense of log files, their usefulness in production debugging is limited. To expect a developer to understand what happened after a first look at the logs presupposes that he knew beforehand where the error was going to occur. And of course, errors are accidents; we never know when and where they will happen. So usually, the developer has to guess what really happened, add log entries to verify that guess, and try to reproduce the error over several such iterations. This can be a long and arduous process that can result in only a partial fix of the error.

Application Performance Monitors and Exception Monitors

Application Performance Monitors (APMs) have been around for about the last ten years. They do a great job of identifying bottlenecks in resource usage that impact application performance (especially around networking and databases), as their name indicates. But they do more. They provide alerts, and some can even home in on exceptions and show you the call stack when one is thrown. But that’s pretty much the depth of information you’ll get, and it’s a snapshot of a very specific moment in time. It certainly helps but is still not the best solution to find the root cause of a problem. Exception monitors are a subset of APMs in that they can provide information about exceptions that occur. They fall short of effective production debugging because, like APMs, they only provide a snapshot of your application at the time of the exception and don’t enable code-level debugging.

The pillars of production debugging

As I see it, the enormous potential cost of production bugs leads to the primary goal of an effective production debugger, which has three parts to it:

Fix the bug…at its root…as quickly as possible.

Let’s look at those three parts.

Fix the bug: Well, kind of obvious. However the bug was manifested, you don’t want that to happen anymore.

at its root: A bug can manifest itself in different ways at different times. Fixing just one manifestation of the bug is like taking a pill to alleviate a recurring ailment rather than curing the underlying cause. You’ll find yourself grappling with the same bug time after time as it manifests itself in different ways. You need to be able to track any manifestation of the bug back in the code execution flow to find its root cause. Once you address the root cause of a bug, you’ll know it’s truly and finally fixed.

as quickly as possible: As I’ve already mentioned, time is of the essence with production debugging. Either the bug you’ve detected hasn’t impacted your customers yet, so you want to fix it before it does, or worse, it’s already impacting your customers, and every minute is costing you dearly. An effective production debugger meets this goal by aggregating the functionality of those “wannabes” I mentioned and then adding some capabilities that none of them have.

Ozcode Production Debugger - Start for FREE

Autonomous exception capture

You can’t fix what you don’t know about. A production debugger should be a sentry that operates independently and is constantly on guard to catch any exception your software throws and notify you about it immediately. This is your starting point; you need to know there’s something to debug, whether it’s already impacting your users or not. Just from the alert, you can already gain some insights. The number of times an exception occurs can provide some indication of the severity of a bug and its user impact. But that’s not enough. Knowing about an exception doesn’t help to debug it. The production debugger needs to capture the whole context in which the exception happened, including the environment and relevant code, so the developer doesn’t have to work hard to try and reproduce the error. Capturing the error along with all of its associated information (log files, call stack, event timelines, network requests, database queries, etc.) is effectively a perfect bug report encompassing everything the developer needs to fix the bug.

Observability into your code

To really determine the root cause of an error, you need to be able to trace and view your application’s state from where the error manifested itself in the form of an exception, for every line of code that was executed, back through the complete chain of events to the original trigger. At the most basic level, that means local variables, method parameters, and return values across the whole call stack.

But that’s only half the story. In today’s world of distributed computing with asynchronous event queues, microservices, and serverless code, the trigger of the error may happen in a different thread/module/service than where the corresponding exception is thrown. So the immediate call stack of the service that manifests the error is not enough. You need to be able to trace the sequence of events back across the different microservices, network requests, database queries, etc., that participated in the error from trigger to manifestation. With this kind of radical observability, you should be able to see the exact place and time in your code where something went wrong.

And then, there are the log files. A typical application generates gigabytes, if not terabytes, of data that is stored on your servers. While this is a wealth of information, it also presents challenges. In today’s typical applications, the logs are as distributed as the application is. The different modules and microservices generate separate log files that need to be pieced together, like a jigsaw puzzle, to get an idea of what happened in the code execution flow of the error in question. And, no matter how hard your developers try, they can never anticipate where an error will occur, so the log entries you start off with are never enough. You’ll always need to do an initial analysis, guess where the real problem is, and add more logs to verify your guess. Now rebuild, redeploy, and look at the logs again. If you were right, great, but often, you’ll need several such iterations.

While log analyzers can do much to help you understand the content of your log files, they don’t do enough to point you to the relevant log entries and do a root cause analysis.

The role of an effective production debugger is to extract the logs relevant to the code execution flow of an error. Once you have only the relevant logs aggregated into one view as you step through the code, you’re well on your way to the root cause.

Displaying relevant logs - Ozcode

Time-travel fidelity

A snapshot of your code showing the values of variables at the time an exception was thrown is helpful but is not a complete picture of what happened along the way. It’s a bit like finding a relic in a cave and then trying to figure out how it got there. A variable may be invalid at one point in time, but out of context by the time the exception is thrown. To really understand what led to that exception, you need to be able to track back from the exception, through every line of code that was executed (in any module or service that was involved), and view the value of all the variables, and method parameters and return values at every step of the way. It’s like being able to visualize that relic, see the cavemen pick it up and walk backward out of the cave, watch them migrate backward to a different land, unpack a few belongings from their animal skin and then sit down at their fireplace. That’s what I call debugging with true time-travel fidelity.

Time travel debugging - Ozcode

DevOps integration

The widespread adoption of DevOps has brought great benefits to software development and significantly reduced development cycle times. You want your production debugger to fit into the rhythm of your DevOps processes and help maintain those gains.

Three DevOps pillars are automation, collaboration, and CI/CD. Autonomous exception capture sits comfortably with the notion of automation. Once your production debugger is installed, it automatically catches exceptions and notifies you immediately. But what about collaboration? An effective production debugger should promote collaboration between team members, helping them work together towards fixing a bug. That kind of collaboration involves focused communication that easily points team members to something significant in the debugging process. And finally, your production debugger should fit into your CI/CD process. Since your production debugger knows how many times an exception is thrown, exception counts can serve as a quality gate through which builds must pass before being promoted from one level of your DevOps pipeline to the next.
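As an illustration only – the mechanics below are entirely hypothetical and not a description of any product’s API – a quality-gate step in a pipeline might look like this in C#:

using System;

class ExceptionGate
{
    // Hypothetical CI gate step: fail the pipeline if the candidate build threw
    // new exceptions during its soak period. How the count is obtained
    // (an API call, a report, etc.) is left out of this sketch.
    static int Main(string[] args)
    {
        int newExceptionCount = int.Parse(args[0]);
        const int threshold = 0;

        if (newExceptionCount > threshold)
        {
            Console.Error.WriteLine($"Gate failed: {newExceptionCount} new exceptions (threshold {threshold}).");
            return 1; // a non-zero exit code fails the pipeline step
        }

        Console.WriteLine("Gate passed: no new exceptions.");
        return 0;
    }
}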

Pillars are what your production debugging will stand on

The production debugging neighborhood is slowly being populated with a variety of tools. But, while debugging production takes up a significant portion of developer time, most debugging tools do not provide all four pillars of an effective production debugger. These tools do have a place in the debugging neighborhood and can play nicely together, providing a wealth of useful information. Still, if you want to fix bugs at their roots as quickly as possible, there’s no substitute for autonomous exception capture, observability, time-travel fidelity, and DevOps integration.


Ozcode Production Debugger

Autonomous exception capture
Don’t chase after exceptions. They automatically appear in your dashboard.

Radical observability
Get insights into your code down to the deepest levels.

Time-travel fidelity
View the true state of your code anywhere in the execution flow of an error.

Ozcode Production Debugger - Start for FREE

What Makes a Perfect Bug Report?
https://oz-code.com/blog/production-debugging/what-makes-perfect-bug-report
Thu, 07 May 2020

A perfect bug report makes it easy for developers to fix bugs and reduces developer/QA friction, but how do you assemble all of the information a developer needs? Instructions to reproduce, logs, code execution flow, network requests, database queries and more. The answer is in three words.

It’s never a question of if your software will contain bugs, but rather when and how they will be detected. That’s why we run a barrage of tests on our software before deploying it to Production. These tests indeed detect bugs, and we reject builds to fix those bugs, but with premium websites going down, company stocks collapsing, and the occasional rocket falling out of the sky, it’s clear that bugs do get through to Production.

Now, we know that the later you detect a bug in the software development lifecycle (SDLC), the more expensive it is to fix. That’s why, when a bug is detected in Production, fixing it becomes paramount to virtually everybody in the company. Unfortunately, bugs in Production are also the most difficult ones to solve. Compared to earlier phases in the SDLC, which are very controlled environments, you generally have much less information about things like the exact scenario, the data involved, the sequence of events, and more.

Borrowing a concept from one of our earlier blog posts, fixing a Production bug is kind of like solving a murder mystery. To discover “whodunnit” (i.e., determine root cause), you need to follow a trail of clues. What, then, are the clues that make up the perfect bug report so developers can fix production bugs quickly and finally? The real answer to this question is in three words at the end of this post, but if you want the details, read on.

Reproducing the error

A set of instructions on how to reproduce the bug is probably the most critical element of a perfect bug report. The developer needs this to examine the internal state of the software in order to understand what caused the bug. I mean, how can you expect a developer to fix a bug if he can’t see what’s happening under the hood? As such, this is also a source of friction between developers and QA engineers. QA provides the developer with a set of instructions, the developer follows the instructions, and … nada. And so, we come to, “It works on my machine.”

Ozcode takes on this challenge head-on and, instead of solving the problem, pre-empts it, removing the problem altogether. With Ozcode Production Debugger, QA just shares a link with the developer. Upon clicking the link, the developer is taken to the corresponding debug screen, where he has access to all the information about the error, recorded exactly as it occurred in its live runtime environment. No need to reproduce the error, and “It works on my machine” is gone forever.

Single shareable link in browser - Ozcode
Find the Needle - Ozcode

Log files

Now that we have reproducing the error out of the way, we need to understand what happened. This is where log files come in. Log files contain a wealth of information. In fact, developers log so much information that it can occupy terabytes on your servers. When it comes down to it, extracting just the right information needed to understand the problem at hand becomes like finding the proverbial needle in a haystack. Not only that, log files can be distributed between different modules and microservices, so all of these need to be collected and analyzed together. How does one piece together all these pieces of the puzzle? Well, the perfect bug report would do that for you.

In fact, that’s what Ozcode Production Debugger does. When an exception is thrown, all the relevant log entries are gathered from the different modules and microservices that participated in the execution flow of the error and assembled in one convenient view for the developer to examine.

Debug error logs - Ozcode
Slash debugging time by 80% - Ozcode

Code execution flow

Log files do provide a lot of information, but each log entry is a very specific piece of data at a very specific moment in time. To continue our murder mystery analogy, you could consider log entries as particular frames of a movie. To get the full picture, the developer has to follow the trail (the call stack) from the dead body lying on the floor (the exception that was thrown) back to the perpetrator’s first steps (the point of entry) on the crime scene (the module or service) and examine the clues (local variables, environment variables, method return values) at each step of the way. Fortunately, Ozcode Production Debugger is the sleuth that uncovers all this information.

The developer can click through each step of the call stack and travel through time to see every line of code that was executed and how the values of all the relevant variables changed with time.

Code execution flow - Ozcode

But in the age of distributed computing, our crime may pass through several different locations (microservices), each with its own execution flow, call stack, variables, and all. Not to worry, our Sherlock Holmes manages that by providing a timeline of events across the microservices that participated in the crime, showing all network requests and database queries.

Production Debugger Timeline - Ozcode

Urgency and priority

So many bugs, so little time. Which ones should the developer address first? While setting priority for a fix might not really be QA’s job, you do want to give the engineer leading the sprint the tools to decide how she should prioritize the various bugs that are detected. One criterion that could be useful is how frequently an exception is thrown. Something that happens a lot could mean a bad user experience, but that’s not really enough. Even if an exception is thrown a lot, if you’re only getting it in spikes at infrequent intervals, it could be an edge case that your users only rarely encounter. So, the last time an exception was thrown is also an important criterion to consider.
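As an illustration (this formula is invented for the example, not a standard), the two criteria could be combined into a single triage score:

using System;

static class ExceptionTriage
{
    // Illustrative only: more occurrences raise the score; staleness lowers it.
    public static double PriorityScore(int occurrencesLastWeek, TimeSpan sinceLastSeen) =>
        Math.Log10(occurrencesLastWeek + 1) / (1 + sinceLastSeen.TotalDays);
}

With this weighting, an exception thrown 1,000 times but last seen a month ago scores lower than one thrown 50 times in the past hour.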

Elementary, my dear Watson…

Production Debugger priority - Ozcode

The three words for a perfect bug report

If you’ve ever done QA and found yourself sweating to assemble the right log files, describe the steps to reproduce an error, specify the environment where the error occurred, the build number, and on and on, you might appreciate how much easier it would be if all you needed to do to provide a perfect bug report was to send a (here are those three words I promised at the beginning of the post) single shareable link. That’s all it takes with Ozcode Production Debugger. The bug just appears in your dashboard. You select the bug, copy the link from your browser address bar, and send it to your developer. Ozcode “Sherlock Holmes” Production Debugger has done the rest.


Ozcode Production Debugger

Autonomous exception capture
Don’t chase after exceptions. They automatically appear in your dashboard.

Perfect bug reports
All the data you need is automatically assembled into a single shareable link.

Guarantee reproducibility
Dispel any question of reproducibility. Keep QA and Dev aligned on the data and version.

Get radical observability - Ozcode

Remote Debugging for Azure Functions Can Be a Breeze
https://oz-code.com/blog/azure/remote-debugging-azure-functions-breeze
Thu, 30 Apr 2020

Serverless architectures are gaining traction in the software industry, and it wouldn’t be surprising to see them rise on a similar curve that we have seen with microservices. But the ephemeral nature of these short-lived units of execution makes it very difficult to debug them. How do you debug code that only throws an exception under very special circumstances, and then disappears?

Debugging Azure Functions can be a huge challenge if you don’t have the right tools. This post is based on a webinar I presented where I first introduce Azure Functions and then show how Ozcode Production Debugger overcomes the challenges of debugging them. To view the full webinar, scroll down to the end of the post.

Beyond microservices

Serverless computing has been around for about ten years and has taken the benefits of scalability, robustness, and decoupled functionality introduced by microservices to the next level. It first became widely available when Amazon introduced AWS Lambda in 2014, followed by Google Cloud Functions and Azure Functions in 2016.

If the shoe fits…

One of the main benefits of this technology is that it is very scalable and can closely match the allocation of resources to usage, so you don’t need to anticipate and provision for usage spikes. The platform automatically scales on-demand, effectively to infinity, and then scales back down when usage subsides. The diagram below compares how resources are allocated in traditional cloud architectures compared to serverless.

Diagram: Resource usage in traditional architectures vs. serverless architectures

With a billing model that is based entirely on execution, serverless takes the concept of “Pay Per Use” to the extreme and can make it very cheap to run applications and services on the public cloud. Take, for example, the HaveIBeenPwned website, owned by Microsoft Regional Director and MVP Troy Hunt, which handles 20 million requests per month but only costs about $30 per month to run.

But, while serverless computing is very cost-effective for the right type of application – one that utilizes relatively simple, short-lived functionality – it does impose some limitations:

  • There’s a limit to how long serverless functions can run.
  • They can be costly if you continuously fire them up and run them to their limits.
  • They’re not well suited for complex business functions.

Introducing Azure Functions

Azure Functions is the serverless compute service on Microsoft Azure. Its pricing is tied to the amount of memory your application uses and the amount of time it executes (which translates to units of gigabyte-seconds of execution – GB-s), so clearly, you should optimize your Azure Functions-based application for those factors (which is a good practice anyway). For example, .NET Core 3.1 is much faster than previous versions, so it would be a good candidate on which to develop your serverless application. The first 400,000 GB-s and 1 million executions per month are free. After that, it’s $0.000016/GB-s and $0.20 per million executions, so it’s easy to see how websites showing activity in discrete, short-lived surges can be very cheap to run.
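As a rough worked example (the workload numbers are invented): a function that uses 0.5 GB and runs for 1 second per invocation, handling 3 million invocations a month, consumes 1,500,000 GB-s. Subtracting the 400,000 GB-s free grant leaves 1,100,000 GB-s, which costs 1,100,000 × $0.000016 = $17.60; the 2 million billable executions beyond the free million add $0.40, for a total of about $18 per month.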

There are three runtime versions of Azure Functions currently available, with the latest offering support for C# and F# on .NET Core 3.1, JavaScript with Node 10 and 12, Java 8, PowerShell Core 6, Python 3.6 – 3.8 and TypeScript.

Triggers and Bindings

Azure Functions is an event-driven platform and supports a variety of triggers through which you can fire up functions. For example, an HTTP trigger lets you invoke a function through an HTTP endpoint, just like calling a REST API. Or, suppose you wanted to resize every image uploaded to your application to optimize blob storage. For that, you could use a blob trigger. Bindings let you connect a variety of different resources to a function. There are both input bindings and output bindings, through which your functions can receive and send data. For example, for your function to send a Twilio message, you don’t have to create a new Twilio client and tie it in with credentials. You can just set up a Twilio output binding, which lets you call into Twilio very easily and send out a message. You can learn more about Azure Functions triggers and bindings in the Microsoft documentation.
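As a minimal sketch of the trigger syntax (the function and container names are ours, and it assumes the in-process .NET model with the Storage extension installed), a blob-triggered function that fires whenever a file lands in an “uploads” container looks something like this:

using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class OnImageUploaded
{
    [FunctionName("OnImageUploaded")]
    public static void Run(
        [BlobTrigger("uploads/{name}")] Stream imageBlob,  // fires for every new blob in 'uploads'
        string name,                                       // '{name}' from the blob path is bound automatically
        ILogger log)
    {
        log.LogInformation($"New upload: {name} ({imageBlob.Length} bytes) - resize it here");
    }
}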

Web security for Azure Functions

Azure Functions offers three levels of security for integration with web APIs. The first level is for anonymous functions and is like a public API, which means anyone can use them without the need for authentication. Then there’s Function-level authentication, which involves keys that pass back and forth between a caller and a function. At the highest level of security, Azure Functions lets you set up authentication via a serverless endpoint against Azure Active Directory (AAD) using JSON Web Tokens (JWT). For example, if your front end is written in React, AAD can manage authentication and return a JWT, which is then passed along with your call to Azure Functions. Your function can then decode and validate the JWT against AAD.

Durable Functions

Durable functions are a layer on top of Azure Functions that let you build complex workflows for reliable execution by managing state, retries, and delegation. Some of the typical application patterns which can be implemented using durable functions are:

  • Chaining – executing a sequence of functions in a specific order. For example, you could split up a function into a chain of calls and scale them independently, so even if one instance of a link in the chain goes down, it will not break the whole function. A minimal code sketch of this pattern appears after this list.
  • Retries – retrying a request after a transient failure. This pattern is very common in messaging systems. If a message delivery fails, you retry after say, 20 s, then after 40 s, etc. until delivery finally succeeds or times out.
  • Timeout – there are many uses for timeouts. Two of the more common ones are either as reminders in business processes (e.g., if a user fills their eShopping cart but doesn’t check out) or to retry a 3rd-party service call if the service does not respond.
  • Fan out – lets you run tasks in parallel. For example, if an API returns a list of objects, you could run the same manipulation on them simultaneously.
  • External interaction – in this pattern, the application pauses execution in one of its states, waits for input, and then resumes execution. For example, a deployment might require the approval of a manager before being completed. You might think this kind of implementation does not sit well with the fact that Azure Functions are billed by execution time, but when using durable functions, you are not billed for the time that a function sleeps while it is waiting for input.
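Here’s a minimal sketch of the chaining pattern mentioned above (the function and activity names are invented, and it assumes the Durable Functions 2.x extension is installed):

using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class VinWorkflow
{
    // Orchestrator for the chaining pattern: each activity is a separate function,
    // and the framework checkpoints state between the awaits, so a crashed step
    // can be retried without restarting the whole chain.
    [FunctionName("VinWorkflow")]
    public static async Task<int> RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        var vin = context.GetInput<string>();
        var decoded = await context.CallActivityAsync<string>("DecodeVin", vin);
        var volume = await context.CallActivityAsync<int>("ComputeVolume", decoded);
        return volume;
    }
}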

Remote debugging for an Azure web application

Let’s see an example of deploying Azure Functions and then debugging them using Ozcode Production Debugger. A Vehicle Identification Number (VIN) is a 17-character sequence that identifies a vehicle by year, make, model, and a few other attributes. In this example, we’ll use Azure Functions to run a VIN through a few steps to determine how large the corresponding vehicle is. This is a good application of Azure Functions since it is quick to run and does not involve many potentially time-consuming dependencies like networks or databases.

Creating and deploying an application to Azure Functions

We’ll start by creating a new Azure Function in Visual Studio. Since this is the latest version of Visual Studio, we’ll create a Functions version 3 application. 

Create Azure Functions Project - Ozcode

Visual Studio is kind enough to prompt you for what sort of function you’d like to template out. Pick HttpTrigger and set the authorization level to Anonymous.

Create HTTP Trigger Function - Ozcode

You should now have an empty Functions project. We’re going to use two APIs from the National Highway Traffic Safety Administration (NHTSA) for our function: the first to decode the VIN, and the second to get the dimensions of the corresponding vehicle. They are both documented here: https://vpic.nhtsa.dot.gov/api/.

You can grab the source files for the objects to deserialize from our repo (https://github.com/stimms/FunctionDebugging/blob/master/FunctionDebugging/DecodedVin.cs and https://github.com/stimms/FunctionDebugging/blob/master/FunctionDebugging/DecodedVehicleSpec.cs), or you can use the little-known Paste as Class function in Visual Studio. 

Next, clean out the code in your HTTP trigger and replace it with the following (shown here as a complete file, with the required using directives, for clarity):

using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

public static class DecodeVinFunction
{
    // Spec names used by the NHTSA Canadian vehicle specifications API
    const string OVERALL_LENGTH = "OL";
    const string OVERALL_WIDTH = "OW";
    const string OVERALL_HEIGHT = "OH";

    // Reuse a single HttpClient across invocations to avoid socket exhaustion
    static HttpClient client = new HttpClient();

    [FunctionName("DecodeVIN")]
    public static async Task<IActionResult> Decode(
        [HttpTrigger(AuthorizationLevel.Function, "get", Route = "decode/{vin}")] HttpRequest req, String vin,
        ILogger log)
    {
        log.LogInformation(vin);

        // Step 1: decode the VIN into year, make, and model
        string vinDecodeUrl = $"https://vpic.nhtsa.dot.gov/api/vehicles/decodevinvaluesextended/{vin}?format=json";
        var result = await client.GetStringAsync(vinDecodeUrl);
        var envelope = System.Text.Json.JsonSerializer.Deserialize<DecodedVinEnvelope>(result);
        var decodedVin = envelope.Results.FirstOrDefault();

        // Step 2: look up the vehicle's dimensions from the decoded attributes
        string vehicleSpecUrl = $"https://vpic.nhtsa.dot.gov/api/vehicles/GetCanadianVehicleSpecifications/?Year={decodedVin.ModelYear}&Make={decodedVin.Make}&Model={decodedVin.Model}&units=&format=json";
        var vehicleSize = await client.GetStringAsync(vehicleSpecUrl);
        var specEnvelope = System.Text.Json.JsonSerializer.Deserialize<DecodedVehicleSpecEnvelope>(vehicleSize);
        var length = Int32.Parse(specEnvelope.Results.First().Specs.Single(x => x.Name == OVERALL_LENGTH).Value);
        var width = Int32.Parse(specEnvelope.Results.First().Specs.Single(x => x.Name == OVERALL_WIDTH).Value);
        var height = Int32.Parse(specEnvelope.Results.First().Specs.Single(x => x.Name == OVERALL_HEIGHT).Value);

        // Return the bounding-box volume of the vehicle
        return new OkObjectResult(length * width * height);
    }
}

You should now be able to hit F5 and try the function out locally by passing in a VIN. Try this one for an Acura: JH4KA3261JC024072. Perfect!
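With the Functions host’s default local settings, and given the route defined in the code above, that means browsing to something like this (the port may differ on your machine):

    http://localhost:7071/api/decode/JH4KA3261JC024072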

Now let’s deploy the function to Azure. For expediency, we’ll publish with a right-click, but this technique should not be used for real applications. Select an Azure App Service plan.

Select Service Plan - Ozcode

Currently, Ozcode Production Debugger doesn’t work against Consumption Plans, but that is something we’re exploring. Create a new App Service Plan in your subscription and deploy the app to it. You will need an Azure Functions Premium plan for this capability.

New App Service

Remotely debugging the app on Azure with Ozcode Production Debugger

I showed you how to install Ozcode Production Debugger on Azure in a previous post. Let’s do that now.

If we now invoke our function with the Acura VIN from above, it should work fine. Go ahead and try it. But now try it with a different VIN, like this Tesla: 5YJRE11B081000394. You’ll see that you get back a 500 error, which is not too helpful. Let’s debug that on Azure with Ozcode Production Debugger. You’ll see what the issue is right away:

There are no results for this particular VIN. An easy fix is to catch the exception and return a helpful error message or insert a default value.
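As one possible sketch (not the only fix), the spec lookup in the Decode function shown earlier could be wrapped like this:

int length;
try
{
    length = Int32.Parse(specEnvelope.Results.First().Specs.Single(x => x.Name == OVERALL_LENGTH).Value);
}
catch (InvalidOperationException)
{
    // First()/Single() throw when the NHTSA response has no matching spec entries,
    // so return a helpful message instead of an unhandled 500.
    return new BadRequestObjectResult($"No size data available for VIN {vin}.");
}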

Debugging the ephemeral on Azure

There are many ways to deploy an application to Azure. You could put your app on a Virtual Machine, or even a cluster of redundant Virtual Machines. You could run it in App Services or Azure Batch. You could use Service Fabric, a Kubernetes cluster, or even deploy it right to an Azure Container Instance. Each of these options has its advantages and disadvantages, which need to be considered and balanced before choosing the right deployment model.

In this post, I introduced Azure Functions as another way to deploy applications and provided some of the criteria that would make Azure Functions the right choice. It would be a shame if the benefits of serverless architectures were negated because you were constantly chasing elusive bugs. But that doesn’t have to be the case. The inherently ephemeral nature of serverless in general, and Azure Functions in particular, does not have to be an impediment to effective debugging. Ozcode Production Debugger takes all that pain away.

Continuous Delivery with Feature Flags (Toggles) is More Difficult Than It Seems
https://oz-code.com/blog/general/continuous-delivery-feature-flags-more-difficult-than-it-seems
Wed, 22 Apr 2020

In the days of short release cycles, feature flags are a way to manage developing long features. But they also introduce a set of challenges with testing. Find out what you can do to overcome those challenges.

This post is a republication of a post on my blog, Michael’s Coding Spot.

Back in the distant past, in a simpler time, version releases were very different from today. Each product version release was a huge ceremony. First came a very long period of planning that ended with a specification document. Then came the development of that spec, then another long period of manual testing, and, when all bugs were finally fixed, the deployment. This process took anywhere from two months to a year to finish.

These days, the software world is very different. We’re deploying versions every sprint, which lasts about one to three weeks. Some companies, like Facebook, deploy to production every few hours, without manual testing at all. How is this magic possible? How did we move from yearly releases to weekly? This was achieved, in part, with continuous delivery, which acts as a quality gate to every code change. For each addition to the code, a build machine runs a bunch of automated tests that make sure the application works well. If the code doesn’t compile, or if one of the tests fails, the code addition isn’t approved. This way, we can deploy with confidence, without worrying (too much) that some bug broke the application.

Facebook actually does a bit more than that. They use canary deployments. They first deploy to a small percentage of users, then to a bigger percentage, and when they’re sure everything’s just fine, they deploy to everybody else.

But what if you’re working on something that lasts much more than one sprint? Maybe three sprints or 10. Are you going to work on a separate branch, ending with a huge merge? Are you going to run automated tests on that branch? This matter is not that simple, as I recently experienced.

Developing Long Features

Developing long features adds a few challenges in a continuous delivery environment. Let’s say you develop this feature in your own branch. This means that once in a while, you’ll need to back-merge from the master/trunk/develop branch to keep the branches close. But if your feature involves system-wide design changes, this is going to prove difficult. As both you and other team members continue to add more code, the distance from your branch to master is going to keep growing, and you’ll spend more and more time fixing conflicts and restoring things to working order.

Another way to go is to commit your feature’s code to the master branch before the feature is functional. That’s where Feature Flags enter the scene. A feature flag (or Feature Toggle) is just a configuration setting that controls whether your feature is enabled at runtime. It’s going to be off on master and on in your own branch. With a feature flag, you can merge your code to master while still turning it off at runtime for the user. Sounds great, but this actually presents a whole new set of problems. For one thing, creating a well-working feature flag isn’t always easy. If your feature is connected to a lot of pieces, that’s actually going to be pretty hard. But even if you’re able to do that, there’s the matter of tests.
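In its simplest form, a feature flag is just a runtime check around the new code path. Here’s a hand-rolled C# sketch (real projects would typically read the flag from configuration or a flag service; the flag and type names are invented):

using System.Collections.Generic;

public static class FeatureFlags
{
    // In production this would come from configuration; hard-coded here to keep the sketch self-contained.
    private static readonly Dictionary<string, bool> Flags = new Dictionary<string, bool>
    {
        ["NewCheckoutFlow"] = false   // off for users on master, flipped on in the feature branch
    };

    public static bool IsEnabled(string name) =>
        Flags.TryGetValue(name, out var enabled) && enabled;
}

// At the call site, the new code is merged into master but stays dormant until the flag flips
// (RedesignedCheckout and LegacyCheckout are hypothetical types):
//     ICheckout checkout = FeatureFlags.IsEnabled("NewCheckoutFlow")
//         ? new RedesignedCheckout()
//         : new LegacyCheckout();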

Feature Flags and Tests

The reason we’re able to deploy with confidence so frequently is a good suite of tests. Those tests make sure that all of the existing features still work. While developing that new feature, you’ll probably want to run that suite of tests to make sure you didn’t break anything – not to mention running a bunch of new tests added for the new feature. So here’s the big question: when running those tests on the build server, is the feature flag on or off? Since the feature is disabled for the end-user, we have to run the tests with the feature flag off to make sure the application works well in production. But if the tests run with the feature flag off, then other members of the team can easily break your new feature’s functionality without even realizing it. This might not seem like a big problem, but if you’re developing an infrastructure change that’s going to affect the whole system, you’ll be fixing broken tests all day instead of moving forward with that feature.

One solution is to run the tests twice: once with the feature flag on and once with it off. That seems reasonable, except what happens when you have several of these long features in development? You’ll have to run all of your tests in every permutation of flag states – with n independent flags, that’s 2^n runs. This can be both long-running and hard to maintain. These features are still in development, they’re likely to break easily, and that’s going to halt progress for the entire team. After all, for each change in code, if even one permutation has a single failing test, you have to fix it before moving forward.
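Here’s a minimal sketch of the run-it-twice approach using xUnit (FeatureFlags.Override is a hypothetical test helper returning an IDisposable, and the checkout types are invented):

using Xunit;

public class CheckoutTests
{
    // The same test body runs once per flag state, so both permutations must stay green.
    [Theory]
    [InlineData(true)]
    [InlineData(false)]
    public void Checkout_completes_in_both_flag_states(bool newFlowEnabled)
    {
        using (FeatureFlags.Override("NewCheckoutFlow", newFlowEnabled))  // hypothetical helper
        {
            var result = new CheckoutService().Complete(TestCarts.Simple());
            Assert.True(result.Succeeded);
        }
    }
}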

So What to Do?

Unfortunately, there aren’t any easy solutions here. One way to go is to avoid long branches. Break your task into many small ones that can be integrated into the system without feature flags. But that’s probably a luxury you don’t have, or you wouldn’t need feature flags in the first place.

Another solution is to avoid running tests with the feature flag on. This means the developer owning that feature pays the price for the bugs that the rest of the team introduces. Not a terrible solution up to a certain point.

Maybe the solution to these kinds of situations is not technological at all. You might solve these matters with good old-fashioned coordination: make sure to avoid big refactoring tasks while a feature flag is in play, do design reviews, and just talk to your fellow team members.

Production Debugging and the Rhythm of DevOps
https://oz-code.com/blog/production-debugging/production-debugging-rhythm-devops
Tue, 14 Apr 2020

DevOps has brought significant benefits to organizations that adopted it correctly. Production Debugging fits right into the pillars of DevOps and is a necessity needed to achieve DevOps excellence.

The embrace of DevOps has brought significant efficiency gains for organizations that have been willing to make the necessary investments in culture, process, and tooling. In teams that have successfully adopted DevOps practices, the different team members, from Development and QA, through to Operations work together in a smooth and predictable rhythm to roll out new capabilities that bring value to customers. This “rhythm of DevOps“ is one of the key factors responsible for the benefits DevOps brings to organizations, and production debugging fits right in with that rhythm.

DevOps has a much broader scope than just Dev and Ops, and for DevOps to work, different teams and processes need to be in rhythm:

  • Design teams must work in rhythm with development teams working by the principles of Agile development
  • Development teams must work in rhythm with QA testing their code
  • QA must work in rhythm with Operations in charge of deploying releases
  • Design, Development, QA, and Operations must all work in rhythm with customer demands and the needs of the business – everything working together in harmony.

To keep customer value, and therefore the business, moving forward, all these different components of DevOps must be synchronized to the same rhythm. If one component fails, the resulting bottleneck has a ripple effect on the whole process. For example, Development will get backed up if Operations is not keeping pace with deployments. At Ozcode, we believe rapid and effective debugging is a critical extension of the DevOps value stream. If bugs are not quickly identified, triaged, and fixed, the rhythm of DevOps and the harmony between the teams will be broken. Let's examine some of the pillars of DevOps to understand why effective production debugging is needed to maintain the rhythm and achieve DevOps excellence.

Automation and autonomous exception capture

“Automate everything” is one of the driving principles behind DevOps. It permeates the pipeline from the developer's workspace through to monitoring applications and systems in Production. However, while automation has accelerated the DevOps pipeline, it has also added enormous pressure at every stage. More code, faster, also means more bugs in QA, Staging, and Production – more risk of kinks in the rhythm of DevOps. Add microservices and serverless architectures, and the potential for debugging nightmares becomes scary. For example, how do you reproduce a bug that manifests across several microservices, or a bug in serverless code that is running one moment and gone the next?

The answer is to automate catching those bugs in real-time as they happen. This is what Ozcode Production Debugger does with autonomous exception capture. Instead of having to accurately recreate a set of production microservices in a Dev environment (good luck with that), or recreate the exact environment in which a serverless function executed (even better luck with that), the Ozcode agent records the bug exactly as it happened in the runtime environment – with the decompiled code execution flow, variable and function return values, log files, call stack, event trace, network requests, database queries and more.

Ozcode Production Debugger Full Runtime Environment

Now, not only does QA save time because the tester doesn't have to work hard to gather all that information for a bug report, but the developer also gets everything needed to triage the error and really understand what happened – no guesswork or sifting through endless logs.

Collaboration in the context of an error

Collaboration is one of the key cultural aspects of DevOps, bringing together members of different teams across the DevOps pipeline. To keep the DevOps rhythm going, developers and QA need to understand Operations’ requirements and vice-versa. Real-time feedback enables effective communication and helps teams make changes to resolve errors quickly.

Ozcode Production Debugger promotes collaboration between teams across the DevOps pipeline. Teams are put into the same interactive debugging context by sharing a link among all relevant team members. Through this link, they can collaborate in real time in the collaboration panel – an extension to the debugging context through which team members can communicate in real-time – no matter where they are physically located.

Ozcode Production Debugger Collaboration Panel

The quality gates of your CI/CD pipeline

CI/CD has done wonders to shorten release cycles. Integration errors are now detected quickly, and thanks to short feedback loops back to the right developer, they are also fixed quickly. But shorter release cycles put pressure on QA to test more builds at each stage as they move up the CI/CD pipeline. Before a build gets promoted to the next stage, it must pass a set of quality gates – unit tests, regression tests, performance tests, and more – with each organization setting its own policies. Ozcode Production Debugger is a quality gate that can dramatically improve the quality of builds as they move up the pipeline to production.

The Ozcode Production Debugger maintains a tally of each exception thrown by a build and the number of times it was thrown. These simple numbers serve as quality gates in two ways. First, you can ensure there is no regression of an error that was supposedly fixed as new builds are released. Second, you can determine the severity of errors by how often they recur and set limits, so that builds with frequent errors do not get promoted. We will soon be releasing an API that will allow you to integrate Ozcode Production Debugger with your CI/CD tool to enable fully automated quality gates based on exceptions detected by the Ozcode agent.
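To make that concrete, here's a rough sketch of what such an automated gate could look like once the API is available. Everything here is hypothetical: the endpoint URL, the JSON shape, and the threshold are all invented, since the API has not yet shipped:

using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

class ExceptionQualityGate
{
    static async Task<int> Main()
    {
        const int maxRecurrences = 50; // hypothetical policy threshold

        using var http = new HttpClient();
        // Hypothetical endpoint returning per-build exception tallies.
        var json = await http.GetStringAsync("https://api.example.com/builds/1.2.3/exceptions");

        using var doc = JsonDocument.Parse(json);
        foreach (var ex in doc.RootElement.EnumerateArray())
        {
            var count = ex.GetProperty("count").GetInt32();
            if (count > maxRecurrences)
            {
                Console.Error.WriteLine($"Gate failed: {ex.GetProperty("type")} thrown {count} times");
                return 1; // a non-zero exit code blocks promotion in the CI tool
            }
        }
        return 0;
    }
}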

Ozcode Production Debugger Dashboard

Continuous monitoring for errors

DevOps does not end with the successful deployment of a build to production. Once deployed, an application needs to be closely monitored. The wide variety of Application Performance Monitoring (APM) products on the market provide insights into a range of performance KPIs for production systems. While these tools claim to help in diagnosing production errors, they fall far short of providing radical observability into the faulty code necessary for a root cause analysis of the error. Ozcode Production Debugger delivers continuous monitoring for errors where the APMs fall short.

With a lightweight agent that has no perceptible impact on the systems it monitors, Ozcode Production Debugger complements performance monitoring with error monitoring, letting you set quality KPIs alongside those performance KPIs. With a high-level view of exceptions over time, Ozcode gives you a picture of system health with regard to errors and provides instant alerts when new errors occur in production, enabling a short MTTR.

Continuous debugging: the DevOps metronome

Bugs happen in production and pre-production environments. Wherever they occur in the DevOps pipeline, rapid resolution is critical to keep the DevOps rhythm going. An unresolved error at any step of the way can slow everything down, delay releases, and hamper productivity. Ozcode Production Debugger introduces the concept of continuous debugging to the DevOps pipeline: it applies to QA, Staging, and Production environments, enabling debugging continuously throughout the pipeline. In each environment, it continuously detects errors and points to the exact location in the running code where they occur, reducing debugging time by up to 80% to enable rapid recovery and keep the DevOps pipeline moving forward. Though bugs will happen in QA, Staging, and Production, with Ozcode Production Debugger, they needn't slow down the rhythm of DevOps.

The post Production Debugging and the Rhythm of DevOps appeared first on Ozcode.

]]>
https://oz-code.com/blog/production-debugging/production-debugging-rhythm-devops/feed 2
#WFH vs. ~#WFH in Coronavirus Days https://oz-code.com/blog/general/wfh-vs-not-wfh-coronavirus-days https://oz-code.com/blog/general/wfh-vs-not-wfh-coronavirus-days#comments Thu, 02 Apr 2020 07:41:48 +0000 https://oz-code.com/?p=10710 During the coronavirus pandemic, some have to come into work while most of us are #WFH. What adjustments should companies make for both of these situations?

The post #WFH vs. ~#WFH in Coronavirus Days appeared first on Ozcode.

]]>
During this coronavirus pandemic, those who are fortunate enough to still be working can be divided into two groups: those who are working at home, and those essential workers who still have to come into the workplace. Companies now need to put in place a new and different set of priorities to ensure that both groups can continue to get their work done, while staying safe and healthy.

The new operational priorities for ~#WFH

For those still coming into work, the challenge of beating rush-hour traffic has been replaced with new challenges in the workplace, and companies must do everything in their power to help overcome these challenges.

Social distancing in the company of co-workers

Keeping our distance from people is more difficult than it seems. Man is a social animal, and we're all used to getting up close to those around us (to a greater or lesser extent depending on our respective cultures and concepts of personal space), especially if they're colleagues or even work-buddies. So companies must take steps to help keep us all 2m away from each other. Here are a few things they can do:

  • Minimize the use of open spaces. Take full advantage of offices that have been vacated by those who are #WFH.
  • If you do have to use an open space, at the very least, don’t occupy more than every other cubicle.
  • Put up signs around the office to remind people – it’s very easy to just approach someone for a quick discussion without thinking about it.
  • Stagger people’s coffee and lunch breaks to ensure they don’t congregate in the kitchen/canteen/coffee area. Similarly, if your opening and closing hours are strict, you should relax that requirement so that not everybody is arriving or leaving at the same time.
  • Depending on the size of your restrooms, don't let more than one or two people in at a time. Make sure everyone is aware of alternative restrooms in case the one nearest to them is occupied, and if a line forms outside a restroom, make sure people keep the 2m rule between them.

Hygiene at all costs

It seems hand sanitizer is the new liquid gold according to Google Trends.

Google Trends for Hand Sanitizer - Ozcode

And when all of this started, the supermarket and pharmacy shelves were depleted before you could rub your hands together. But by now, everyone has restocked, and hand sanitizer is in plentiful supply. Here are some measures companies should take to maximize hygiene at the office.

  • Companies should place a bottle of hand sanitizer at the office entrance and all employees should be strongly encouraged to disinfect their hands upon arriving at work. Throughout the course of the day, everyone should be encouraged to wash their hands regularly with soap and water.
  • Every workspace should be furnished with disinfectant that employees can use to wipe down their work surfaces.
  • People will sneeze, people will cough. These human reactions to dust, particles, and other irritants are inevitable. Strongly promote coughing and sneezing into one’s inner elbow or sleeve and keep all workspaces well-ventilated.
  • Those who can tolerate it for the whole day should wear disposable gloves and a face mask. Ideally, this would be a requirement, however, I can tell you from experience, some of those facemasks really restrict your breathing, and I personally wouldn’t be able to work with one the whole day.

Making #WFH as productive as possible

Here at Ozcode, we’re all #WFH. We have all the tools to debug code and write code that helps you debug code while at home. That doesn’t mean we’re not essential workers. Like everyone else who is #WFH, it’s just not essential that we come into the office.

Working from Home on Zoom - Ozcode

Some will be more equipped than others for this new working arrangement, and companies should do everything they can to give #WFHomers all the tools and facilities they need to get their jobs done.

  • People working on desktop computers should either be furnished with laptops instead, or be allowed to take their desktops home, screen and all.
  • Make sure everybody has access to video conferencing. While nothing beats F2F, these applications are the next best thing, and can be instrumental in keeping everybody synch'd and psyched under sub-optimal working conditions.
  • Take measures to alleviate bottlenecks on your network. With much of your workforce at home, you may experience performance issues. Where possible, move workloads away from on-premises servers to the public cloud. Many companies were already in the process of doing this as part of their digital transformation when coronavirus hit, so this would be a good time to accelerate cloud migrations and remove some strain from the company network.
  • If necessary, increase the bandwidth to your premises, and in any case, monitor your network to identify any bottlenecks before they impact performance.

Another interesting Google trend is that searches for heroes soared in 2019. The inspiring video in the link shows many ways in which people became heroes, whether locally or worldwide. Coronavirus has brought to light many new heroes. People are applauding medical staff all around the world. They are obvious heroes today, but in fact, all those still coming in to work are heroes of a sort. They are part of the critical group on whose shoulders the survival of the company rests, the "doctors and nurses" keeping the business healthy. But those working at home should be applauded as heroes too. They, too, are working under less than optimal conditions, making the best of a difficult situation. This is a new normal that we all have to get used to, at least for a while, but as long as everybody appreciates and applauds everybody else, tolerance, motivation, and hopefully productivity will remain high.

The post #WFH vs. ~#WFH in Coronavirus Days appeared first on Ozcode.

]]>
https://oz-code.com/blog/general/wfh-vs-not-wfh-coronavirus-days/feed 1
When Production Debugging for .NET Meets Infrastructure as Code https://oz-code.com/blog/production-debugging/production-debugging-dotnet-meets-infrastructure-as-code https://oz-code.com/blog/production-debugging/production-debugging-dotnet-meets-infrastructure-as-code#comments Sun, 22 Mar 2020 19:52:33 +0000 https://oz-code.com/?p=10561 Infrastructure as code has brought great advances to DevOps. This post is based on a webinar in which I presented how to deploy Ozcode Production Debugger to Azure using Pulumi to reduce C# debugging time. The webinar recording is provided at the end of this post.

The post When Production Debugging for .NET Meets Infrastructure as Code appeared first on Ozcode.

]]>
Historically, deploying applications to production was a somewhat haphazard, undefined process. It looked something like this:

  • Build your application and compress it into a single file on a disk
  • Bring the disk to your Operations team along with a 20-page instruction manual describing in minute detail how to deploy your application, including database setup, configuration variables, and a host of other parameters.
  • Operations would schedule the installation into their list of tasks, and about a week later, you would get an email from them that the application is installed.
  • Hooray!!! Let’s get started.
  • Not so fast.
  • Invariably, there were errors in the installation. The environment might not be configured correctly, or there was a missing module, or some other critical detail had been missed. Eventually, you weren't even surprised when this happened on the first try, because Operations had to follow a long, tedious, and highly detailed list of instructions for a manual installation. Clearly, the whole process was very error-prone.
  • So, back to Operations for another try, more waiting…eventually, it would work.

The dawn of one-click provisioning

Infrastructure as Code (IaC) came along to make deployment of applications much more robust by using descriptions of the required infrastructure and configuration in highly structured file formats. Once the infrastructure was defined in files, those specifications could be version controlled, and provisioning became a repeatable, one-click process. Need to deploy infrastructure for your application on multiple environments for development, QA, and Staging? No problem. Deployments based on IaC were much faster while reducing both costs and the risk of error.

Pioneers like Chef, Puppet, and Ansible were game-changers in this domain, and as cloud computing started taking over the industry, all the major cloud providers offered corresponding templating tools like Cloud Deployment Manager for GCP, ARM for Azure, and CloudFormation for AWS. Now, infrastructure templates are great, but the problem is that each vendor uses a different templating language. Anything you write as an ARM template will have to be rewritten if you want to port it to a CloudFormation template for deployment on AWS. Terraform took it a step further with scripting templates that abstracted away the specific provider so you could use the same script on different clouds.

While these tools make provisioning a much easier task, they all use descriptive languages such as XML, JSON, YAML, and HCL, which places some limitations on what you can express. For example, try creating some infrastructure in a loop using Terraform. It's possible, but it ain't pretty.

Next-gen IaC with Pulumi

Pulumi takes a different approach to IaC. Instead of using declarative scripts, Pulumi defines infrastructure with “real” code, currently supporting JavaScript, Python, .NET, and Go. Specifying infrastructure with coding languages is much more flexible than using scripts, expanding your capabilities to support complex deployment scenarios. For example, creating infrastructure in a loop now becomes a trivial exercise. But it gets better. Effectively being in the context of a running program means you can make external calls as you’re provisioning infrastructure. For example, you can call a web service to generate a strong password as you’re spinning up a database. Or how about calling a web service that calculates the cost of running your infrastructure on each of the leading cloud service providers, and then taking the most cost-effective option.
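To make that concrete, here is a minimal sketch of the loop idea using the Pulumi Azure provider for .NET; the resource names and the count of three are invented for illustration:

using System.Threading.Tasks;
using Pulumi;
using Pulumi.Azure.Core;
using Pulumi.Azure.Storage;

class Program
{
    static Task<int> Main() => Deployment.RunAsync(() =>
    {
        var resourceGroup = new ResourceGroup("demo-rg");

        // A plain C# loop stamps out three storage accounts, something
        // that takes real contortions in a declarative template.
        for (var i = 0; i < 3; i++)
        {
            new Account($"storage{i}", new AccountArgs
            {
                ResourceGroupName = resourceGroup.Name,
                AccountTier = "Standard",
                AccountReplicationType = "LRS",
            });
        }
    });
}

Each iteration registers a distinct resource with Pulumi's engine, so pulumi up plans and diffs all three just as if they had been written out by hand.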

Pulumi supports orchestration of any type of infrastructure, including serverless, Kubernetes, and containers, and currently works with the three major cloud providers, AWS, Azure, and GCP.

Time to get hands-on

Let’s see Pulumi in action. In this example, we’ll set up an environment on Azure with a web server and a SQL Server database, and then deploy Ozcode Production Debugger to that environment. We’ll define all of that using C# in Pulumi.

  1. Download Pulumi
    On Windows, the installer is just a Chocolatey command.
Install Pulumi
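For reference, per the Pulumi install docs, the Chocolatey command is:

choco install pulumi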

If you want, there are alternatives, including a snippet of PowerShell code: https://www.pulumi.com/docs/get-started/install/

Another way to install Pulumi

2. Make sure you have the latest .NET Core SDK installed. You can download it from: https://dotnet.microsoft.com/download

Download .NET Core SDK

3. Install the latest Azure CLI, which you can download from: https://docs.microsoft.com/en-us/cli/azure/install-azure-cli-windows?view=azure-cli-latest

Install Azure CLI on Windows

4. Open a command prompt and create a new directory. In that directory, we’re going to start by running:

az login

This will prompt you to log into your Azure account.

5. Run Pulumi to create the project structure:

pulumi new azure-csharp

6. Copy the Program.cs file contents from: https://github.com/stimms/pulumideploy/blob/master/Program.cs

7. If you have your own ASP.NET Core project you want to deploy, update the content directory in your project to point at this location:
https://github.com/stimms/pulumideploy/blob/master/Program.cs#L56

If you don’t, you can use our example from here: https://github.com/stimms/pulumiwebapp, but you still need to update the path on line 56 of Program.cs to point at it.
Go ahead and build the WebApp.

8. Get your license key and update the code on line 89 to reference it.
https://github.com/stimms/pulumideploy/blob/master/Program.cs#L89

9. Run

pulumi up

This will stand up your new stack in Azure. You should be able to go to the site URL, which is printed at the end of the pulumi up output, and see the site running and generating primes (if you used our example app).

Install Ozcode Production Debugger

10. In the Azure portal, we want to set up the Production Debugger. Go to the WebApp and search for extensions:

Search for Ozcode Production Debugger extension on Azure

11. Select the Ozcode Production Debugger and install it:

Install Ozcode Production Debugger

12. Restart the Web App

13. Now we want to cause an exception to be thrown. In our App, you can simply append the following to the URL:
?countTo=30 

14. Sit back and watch the errors roll in. The first time Ozcode Production Debugger detects an exception, it instruments the relevant code. From then on, you will see all the information needed to debug the code, including a complete stack trace, logs, variable values, network requests, and more.

IaC pioneers like Chef, Puppet, and Ansible brought the benefits of version control and repeatability to infrastructure orchestration. Terraform took it a step further, abstracting away the actual cloud provider you’re using. Next-gen tools like Pulumi are at the forefront of this domain, enabling all the capabilities of popular programming languages in common use today.

To view a recording of the full webinar on which this blog is based, click below:

The post When Production Debugging for .NET Meets Infrastructure as Code appeared first on Ozcode.

]]>
https://oz-code.com/blog/production-debugging/production-debugging-dotnet-meets-infrastructure-as-code/feed 3
Advanced Object Graph Search for Visual Studio https://oz-code.com/blog/visual-studio-extension/advanced-object-graph-search-visual-studio Tue, 17 Mar 2020 16:13:50 +0000 https://www.oz-code.com/?p=7760 We will see how to use Ozcode to search for objects in a very complex object graph.

The post Advanced Object Graph Search for Visual Studio appeared first on Ozcode.

]]>
Turns out that the search tools in Ozcode are available for more than just collections: we can also search in objects even when the object graph is very complex. Let’s see how that works.

Real-World Examples

I always like to give people real-world examples when exploring features in Ozcode so they can see how truly applicable the features are. With complex object graphs, I had a little bit of trouble coming up with an idea. The thing is, complex objects, large domain models, and the like are difficult to work with and are probably an anti-pattern. I didn't want to show a bunch of terrible code, so I had to search a little for a real-world complex object.

Enter Pulumi. Building complex infrastructure to host your applications used to involve writing vast deployment guides to hand over to some poor ops person who had to carefully follow the 12 pages of instructions. The rise of DevOps culture has highlighted what a bad practice that is and replaced it with repeatable infrastructure as code.

If you’ve ever had to build an ARM template on Azure or a CloudFormation template on AWS, then you’ll know that it is an inelegant solution to the infrastructure-as-code problem. These files are a bunch of JSON describing the resources on a cloud. If you want to be less tied to a particular cloud, you can use Terraform to create resources on a myriad of different clouds. However, out of the box, Terraform uses HCL, another easy-to-mess-up language.

Pulumi aims to change all of that. Pulumi replaces the YAML and JSON with actual programming languages, which allows for code completion, unit testing, and powerful parameterization. You can write your templates in TypeScript, JavaScript, or Python; C# (and, one would assume, other .NET languages) is in preview right now.

Let’s explore using Ozcode to improve examining the large objects in Pulumi.

Getting that Debugger Attached

The .NET version of Pulumi is a runner program called pulumi.exe coupled with the .NET Core code you write. Because your library is dynamically loaded by pulumi.exe, it can be a bit of a challenge to attach a debugger in the first place.

Fortunately, this world is just full of helpful people like Mikhail, who works at Pulumi.

A helpful tweet goes a long way

So I did just that.

using System.Threading;
using System.Threading.Tasks;

static Task<int> Main()
{
    // Spin until a debugger attaches; from the debugger, set breakNow
    // to false to step out of the loop and continue.
    var breakNow = true; // a variable (not a const) tricks the compiler and avoids an unreachable code warning
    while (breakNow)
    {
        Thread.Sleep(1000);
    }
...

The executable produced by my build is called "infra.exe", so I opened the Attach to Process dialog and searched for that process.

Attaching to this did indeed get me into the debugger and let me manually skip out of the loop.
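A variant of the same trick, using standard .NET rather than what I did here, exits the loop automatically once the debugger attaches:

using System.Diagnostics;
using System.Threading;

// Spin until Visual Studio attaches, then break straight into the
// debugger; no manual variable editing required.
while (!Debugger.IsAttached)
{
    Thread.Sleep(1000);
}
Debugger.Break();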

A Large Object

There are some good examples of pretty complex deployment scenarios on the Pulumi GitHub, and I chose a slight variation of this one for setting up an App Service. We have a pretty sizable object in the configuration for the AppService:

AppServiceArgs appServiceArgs = new AppServiceArgs
{
    ResourceGroupName = resourceGroup.Name,
    AppServicePlanId = appServicePlan.Id,
    HttpsOnly = true,
    Tags =
    {
        { "Owner", "Jane" },
        { "EmergencyContact", "Frank" },
        { "CostCenter", "Marketing" }
    },
    AppSettings =
    {
        { "WEBSITE_RUN_FROM_PACKAGE", codeBlobUrl },
        { "UserName", "bob" },
        { "StorageAccount", container.StorageAccountName },
        { "RemoteURL", "https://google.com" },
        { "MaxUsers", 12.ToString() },
        { "MinUsers", 4.ToString() }
    },
    ConnectionStrings =
    {
        new AppServiceConnectionStringsArgs
        {
            Name = "db",
            Type = "SQLAzure",
            Value = Output.Tuple<string, string, string>(sqlServer.Name, database.Name, password).Apply(t =>
            {
                (string server, string database, string pwd) = t;
                return $"Server= tcp:{server}.database.windows.net;initial catalog={database};userID={username};password={pwd};Min Pool Size=0;Max Pool Size=30;Persist Security Info=true;";
            }),
        },
        new AppServiceConnectionStringsArgs
        {
            Name = "db",
            Type = "SQLAzure",
            Value = Output.Tuple<string, string, string>(sqlServer2.Name, database2.Name, password).Apply(t =>
            {
                (string server, string database, string pwd) = t;
                return $"Server= tcp:{server}.database.windows.net;initial catalog={database};userID={username};password={pwd};Min Pool Size=0;Max Pool Size=30;Persist Security Info=true;";
            }),
        },
    },
};

If we debug that, we can see how searching inside a large object really shines. In this case, we want to find who owns a resource, which is indicated by a tag. There could be dozens of tags on a resource, so hunting through a buried collection will take time.

Ozcode can search in objects, even several layers down to rapidly find exactly what we’re looking for: Jane owns this resource.

Searching inside a complex object

Just like in collections, we can search deeper in the hierarchy by clicking on the chevrons.

Bonus

Now that you're through learning how to search in objects, let me give you another cool tip. I don't know if you've ever tried right-clicking on an object in the QuickWatch window, but there's a world of cool stuff in there.

Quick watching showing search in objects

My favorite is "Show All Instances of" because it immediately finds any other instances of an object. This is great when you need to hop about in a series of similar data structures.

The post Advanced Object Graph Search for Visual Studio appeared first on Ozcode.

]]>