Logging has become a standard practice for gaining observability into software systems when investigating the issues that inevitably turn up. Traditionally, logs are written as an integral part of the source code; we refer to these as “static logs.” The observability that static logging provides comes at a cost: the more observability you want, the more logs you have to generate, store, and manage. This can significantly degrade the system’s performance and incur high running costs, so companies must continuously tune their static logging to balance observability against performance and cost. Nevertheless, companies are not going to stop using static logs any time soon. Among their many uses, they are frequently the first line of defense when resolving production issues and still the best way to maintain historical records of system usage over long periods of time.
The paradox and pitfalls of static logs
When adding static log messages to code, developers have three questions to answer:
- Severity: What should the severity level of the message be?
- Location: Where exactly in the code should the message go?
- Information: Exactly what information should the message provide?
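All three decisions surface in even the simplest static log statement. A minimal Python sketch (the function and field names here are hypothetical, chosen only to illustrate the point):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("orders")

def process_order(order_id, items):
    # Severity: is an empty order a WARNING or an ERROR?
    if not items:
        logger.warning("Order %s has no items", order_id)
        return False
    # Location: should this go before or after the critical section?
    # Information: is the order id enough, or do we need the full item list?
    logger.info("Processing order %s with %d items", order_id, len(items))
    return True

process_order("A1", ["widget", "gadget"])
```

Each comment marks a judgment call the developer must make up front, long before the log is ever needed.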
While the answers to all three questions can dramatically affect the observability/cost tradeoff, the most significant is what information to capture. Capture too little, and you don’t get the observability you need to resolve issues. Capture too much, and the consequences in cost and performance can be unacceptable. This is the paradox of static logs:
If you know what to log, you’ve already solved the bug.
Consequently, an incident resolution process that relies on static logs requires adding new logs to gain the observability needed for root cause analysis. However, adding logs carries the burden of going through lengthy CI/CD cycles, deploying a new build to production, and then reproducing the error, a process that can take anywhere from hours to days, and often requires several iterations. Even then, there are several pitfalls developers must avoid:
- Data privacy: Logs cannot contain sensitive data that must remain hidden from developers.
- Logging data structures: Logging structures like recursive trees or large lists can significantly impact memory and occupy large volumes of storage.
- Side effects: The act of logging certain fields can change the state or flow of a program.
- Log freshness: As software evolves, log entries can become stale and irrelevant, or worse, misleading.
- Exceptions: Logging when an exception is thrown can be tricky because application state can change between throwing the exception and catching it.
- Conditional logging: Adding logic just to decide whether or not to log a message carries the risk of changing your program’s flow.
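The side-effect pitfall is easy to trip over in practice. This hypothetical Python sketch shows a debug log that silently mutates program state just by evaluating a lazy property:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("cache")

class LazyCache:
    def __init__(self):
        self._value = None

    @property
    def value(self):
        # Side effect: merely reading the property populates the cache.
        if self._value is None:
            self._value = "computed"
        return self._value

cache = LazyCache()
# This debug line changes application state: after it runs, the cache
# is populated even though no business code ever touched it.
logger.debug("cache value: %s", cache.value)
```

A program that behaves differently with debug logging enabled is exactly the kind of bug that is hardest to reproduce.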
Dynamic logging is complementary to static logging and takes a different approach. Instead of adding lines of source code that create log messages, it uses bytecode manipulation to instrument your live code and add a log message when and where it’s needed at runtime. The log message has access to, and can report on, any data in scope at its location, and it is removed once data collection is no longer needed. Ozcode uses an agent installed alongside your application to enable dynamic logging by letting you set tracepoints (aka non-breaking breakpoints) anywhere in your live running code. During a tracepoint session, as the application runs through a tracepoint, it generates the specified log message and provides access to the complete application state at that point. With the whole application state available at each tracepoint, there is no longer any question about what data to log, which effectively solves the paradox of static logs.
Essentially, dynamic logging gives us maximum observability, at code level, on-demand.
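Ozcode’s agent does this at the bytecode level inside the .NET runtime. As a rough, language-agnostic illustration of the underlying idea (not Ozcode’s actual mechanism), Python’s tracing hook can attach a “tracepoint” to a line of running code without touching its source, capture every local in scope, and then be removed:

```python
import sys

captured = []  # what our makeshift "tracepoint" collects

def business_logic(x, y):
    total = x + y           # suppose we want a tracepoint on the next line
    result = total * 2
    return result

# Line number of 'result = total * 2' relative to the def line
TARGET_LINE = business_logic.__code__.co_firstlineno + 2

def tracer(frame, event, arg):
    if event == "call":
        return tracer       # keep tracing inside this call
    if event == "line" and frame.f_lineno == TARGET_LINE:
        # The full local state is in scope: no need to decide
        # up front which variables to log.
        captured.append(dict(frame.f_locals))
    return tracer

sys.settrace(tracer)
business_logic(3, 4)
sys.settrace(None)          # "remove" the tracepoint when done

print(captured)             # captured locals include total == 7
```

The key property mirrors the text above: the tracepoint sees everything in scope at that line, and it disappears when collection ends, with no code change and no redeploy.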
Dynamic logging with Ozcode also avoids the pitfalls of static logging:
- PII redaction keeps data private
Ozcode’s live debugger can be configured to redact PII so sensitive information never leaves the production environment and is masked in the dynamic log output.
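To make the idea of redaction concrete, here is a minimal sketch of masking sensitive values before a log record is emitted. It uses Python’s standard logging filters and a hypothetical email pattern; it is an illustration of the concept, not how Ozcode implements it:

```python
import logging
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

class RedactPII(logging.Filter):
    """Mask email addresses before the record reaches any handler."""
    def filter(self, record):
        record.msg = EMAIL_RE.sub("<redacted>", str(record.msg))
        return True  # keep the record, now masked

logger = logging.getLogger("payments")
handler = logging.StreamHandler()
handler.addFilter(RedactPII())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Refund issued to alice@example.com")
```

Because the filter runs before any handler, the raw value never reaches the log output, which is the same guarantee redaction must provide in a production debugger.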
- Resource usage is capped for complex data structures
The Ozcode agent caps resource usage, so dynamic logging with Ozcode places no noticeable strain on CPU or memory.
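The general technique of bounding capture cost can be sketched with Python’s standard reprlib, which truncates large structures to a fixed budget. This is an analogy for the capping idea, not the agent’s implementation:

```python
import reprlib

# Cap how much of a large structure reaches the log output,
# regardless of how big the underlying object is.
limited = reprlib.Repr()
limited.maxlist = 5      # show at most 5 list elements
limited.maxstring = 40   # truncate long strings

big = list(range(1_000_000))
summary = limited.repr(big)
print(summary)           # '[0, 1, 2, 3, 4, ...]'
```

The million-element list costs the same to log as a five-element one, which is the property that keeps capture overhead predictable.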
- No side-effects
The Ozcode agent avoids properties that can mutate the object, thus ensuring no side effects when capturing an object’s state.
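The distinction between stored state and computed properties is easy to demonstrate. In this hypothetical Python sketch, snapshotting an object via its stored fields avoids running a getter that would mutate it:

```python
class Connection:
    def __init__(self):
        self._opened = False

    @property
    def status(self):
        self._opened = True   # a property getter with a side effect
        return "open"

conn = Connection()
# Capture stored fields only; no property getters are evaluated,
# so the object's state is unchanged by the act of capturing it.
snapshot = dict(vars(conn))
print(snapshot)               # {'_opened': False}
```

Reading `conn.status` instead would have flipped `_opened` to True, so the capture itself would have altered the very state it was recording.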
- Dynamic logs are always fresh
Dynamic logs are defined by the developer on-demand in the live running code, so they are always fresh and up to date.
- Capturing exceptions is a core competence for Ozcode
Ozcode’s live debugger captures full time-travel debug information for all exceptions, so there is no concern about losing any information related to exceptions in dynamic logging.
- Conditional logs can be updated at any time
Since dynamic logs do not incur any side effects, they pose no risk for conditional logs and can be updated as needed on the fly.
In addition, dynamic logs incur far less overhead than static logs in terms of the amount of code generated and CPU usage. Moreover, even when logging is disabled, static logs can still carry a performance cost, while dynamic logs have none.
Static logging still plays a critical role in modern computing systems. It is the first line of defense when resolving production errors and remains the best way to maintain and monitor activity over long periods of time. Still, the paradox of static logging limits your ability to resolve errors in live systems, and these limits can be overcome by dynamic logging. By providing code-level observability on-demand, dynamic logging enables quick incident response, dramatically reducing MTTR while helping you deliver better software, faster and at a lower cost.