The Case for Structured, Contextual Logs

By: Morten Jensen, Sep 23 2019

Introduction

If correctly composed, logs can be an extremely useful resource to tap into for use cases such as the following:

Business and operational dashboards can today be built to aggregate and chart metrics derived from logs in near-real time. These dashboards can be tailored not only to IT but also to the wider organisation.

Alarms can be raised on log-derived metrics to alert people, triggered either by thresholds or by anomaly detection.

Logs and metrics power feedback loops that let organisations react quickly.

In other words, proper logs and supporting services are a business-critical necessity in any organisation today. They need to be treated as such: an organisation geared towards consuming actionable metrics gains a competitive advantage through quick response.

Structure & Correlation

To support the wide variety of log use cases, logs should meet certain criteria:

Some log entry fields will always be present. For instance, in a public-facing web service, the HTTP request method and path should always be present in order to identify the call type and operation. The same goes for the timestamp, the message and a request id that links related log entries.

Other fields are added dynamically. For instance, a web service's HTTP response status is only known towards the end of processing the request.
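As an illustrative sketch of how a shared context map can accumulate both kinds of fields (the field values are taken from the example later in this article; the `log` helper is hypothetical, not a specific library's API):

```python
import json

# Shared context map for one request; always-present fields are set up front.
context = {
    "request-id": "12345678-7da1-49e3-8a7b-207a12a740b6",
    "request-method": "POST",
    "request-path": "/calendar/event",
}

def log(message, **fields):
    # Emit one JSON log entry per line, merging the request's shared context.
    entry = {"level": "INFO", "message": message, "contextMap": {**context, **fields}}
    line = json.dumps(entry, sort_keys=True)
    print(line)
    return line

log("handleRequest: start")
# ... request is processed ...
context["response-status"] = "200"  # dynamic field, known only near the end
log("handleRequest: end")
```

Every entry carries the full context known at the time it was emitted, which is what makes later correlation possible.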

Traditional logs have consisted of single lines of text whose fields were separated by spaces, tabs, commas or other arbitrary delimiters such as brackets. All such formats suffer from the same challenge: a parser effectively has to be written or configured before the logs can be ingested into a store where they can be searched, filtered and aggregated. Furthermore, the formatting is often inconsistent across services, which makes correlating logs between services hard.

JSON has evolved into an excellent format to emit logs in. JSON logs can be searched, filtered and aggregated in ways that support all of the above-mentioned use cases.
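Because each entry is a self-describing JSON document, off-the-shelf JSON parsing is all that is needed to filter and aggregate. A minimal sketch (the two log lines are illustrative):

```python
import json
from collections import Counter

# Two example JSON log lines, as a service might emit them.
log_lines = [
    '{"message": "handleRequest: end", "contextMap": {"request-path": "/calendar/event", "response-status": "200"}}',
    '{"message": "handleRequest: end", "contextMap": {"request-path": "/calendar/event", "response-status": "500"}}',
]

# No bespoke parser needed: every line is already structured data.
entries = [json.loads(line) for line in log_lines]

# Filter: only failed requests.
errors = [e for e in entries if e["contextMap"]["response-status"] != "200"]

# Aggregate: count responses by status code.
by_status = Counter(e["contextMap"]["response-status"] for e in entries)
print(len(errors), dict(by_status))
```

The same filter-and-aggregate shape is what log stores run at scale to feed dashboards and alarms.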

Example

Here is a simplified example showing two log entries for the same request to a hypothetical AWS Lambda calendar web service:

# Request log start entry
{
    "thread": "main",
    "level": "INFO",
    "loggerName": "com.example.lambda.Handler",
    "message": "handleRequest: start",
    "contextMap": {
        "client-id": "calendar-android-app-v1.23",
        "user-id": "87654321-1234-5678-90ab-1234567890ab",
        "request-id": "12345678-7da1-49e3-8a7b-207a12a740b6",
        "request-method": "POST",
        "request-path": "/calendar/event",
        "start-free-memory": "52084216",
        "start-max-memory": "883884032",
        "start-time": "2019-09-23 06:50:16.461 +0000",
        "start-total-memory": "62849024",
        "viewer-country": "IE"
    }
}

# Request log end entry
{
    "thread": "main",
    "level": "INFO",
    "loggerName": "com.example.lambda.Handler",
    "message": "handleRequest: end",
    "contextMap": {
        "client-id": "calendar-android-app-v1.23",
        "user-id": "87654321-1234-5678-90ab-1234567890ab",
        "duration": "4645",
        "end-free-memory": "40031544",
        "end-max-memory": "883884032",
        "end-time": "2019-09-23 06:50:21.106 +0000",
        "end-total-memory": "62980096",
        "request-id": "12345678-7da1-49e3-8a7b-207a12a740b6",
        "request-method": "POST",
        "request-path": "/calendar/event",
        "response-status": "200",
        "start-free-memory": "52084216",
        "start-max-memory": "883884032",
        "start-time": "2019-09-23 06:50:16.461 +0000",
        "start-total-memory": "62849024",
        "viewer-country": "IE"
    }
}

From the two log entries we can derive any number of insights - for example:

The request took 4645 ms, which can be read from the duration field or derived from the start-time and end-time timestamps.

Free memory dropped from 52084216 to 40031544 bytes, indicating that roughly 11.5 MB was consumed while handling the request.

The response-status of 200 shows the request succeeded, while client-id, user-id and viewer-country identify who called and from where.

The shared request-id ties the two entries together, so the start and end of the same request can be correlated even in a busy log stream.

Furthermore, the log entries satisfy practically all of the established criteria: the always-present fields (timestamp, message, request id, request method and path) appear in both entries, while dynamically added fields such as response-status and duration appear once they are known.
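As a sketch of the kind of insight the two entries support, the request duration and the memory consumed can be derived directly from the end entry's context map, using only the Python standard library (field values copied from the example above):

```python
from datetime import datetime

# Relevant fields from the "handleRequest: end" entry's contextMap.
ctx = {
    "start-time": "2019-09-23 06:50:16.461 +0000",
    "end-time": "2019-09-23 06:50:21.106 +0000",
    "start-free-memory": "52084216",
    "end-free-memory": "40031544",
    "response-status": "200",
}

fmt = "%Y-%m-%d %H:%M:%S.%f %z"
duration_ms = (datetime.strptime(ctx["end-time"], fmt)
               - datetime.strptime(ctx["start-time"], fmt)).total_seconds() * 1000
memory_used = int(ctx["start-free-memory"]) - int(ctx["end-free-memory"])

print(int(duration_ms))  # matches the entry's "duration" field of 4645
print(memory_used)       # bytes of free memory consumed during the request
```

In a log store, the same arithmetic runs as a query across millions of entries rather than one.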

Tools

Achieving contextual and structured logs like this is possible and indeed reasonably simple to implement in programming languages like Java and Python. A couple of examples are Log4j 2 for Java, whose ThreadContext and JSON layout produce output of the shape shown above, and libraries such as structlog or python-json-logger for Python.

Additional work to standardise log emission and formats across code bases and services, for example through the Decorator or Adapter patterns, is definitely warranted.
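One way to standardise emission is a custom formatter that renders every record in the shape of the example above. This sketch uses Python's standard logging module; the JsonFormatter class and the "context" attribute name are illustrative assumptions, not an existing library's API:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line with a contextMap."""
    def format(self, record):
        entry = {
            "thread": record.threadName,
            "level": record.levelname,
            "loggerName": record.name,
            "message": record.getMessage(),
            # Per-call context supplied via `extra={"context": {...}}`.
            "contextMap": getattr(record, "context", {}),
        }
        return json.dumps(entry, sort_keys=True)

logger = logging.getLogger("com.example.lambda.Handler")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("handleRequest: start",
            extra={"context": {"request-method": "POST",
                               "request-path": "/calendar/event"}})
```

Centralising the format in one formatter (or adapter) means individual log calls never have to agree on delimiters or field order.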

Getting started

Code bases today can contain hundreds or thousands of different log calls. Many code bases use older log frameworks that do not natively emit JSON - for instance Log4j 1.x.

Furthermore, many log calls may not emit particularly relevant information; often, entries have simply been left in from a past debugging session.

It can therefore be daunting to get started refactoring all of the logs.

Our recommendation is therefore to start small and refactor incrementally: pick the most valuable services and request paths first, rather than attempting to rewrite every log call at once.

Personally Identifiable Information

Some considerations are warranted if storing personally identifiable information in logs, including regulatory obligations (such as the GDPR), retention and deletion policies, who may access the logs, and whether identifiers should be masked or pseudonymised before they are written.
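For example, identifiers can be replaced with salted hashes before the entry is written, so requests from the same user can still be correlated without storing the raw identifier. This is an illustrative sketch; the choice of fields and the pseudonymise helper are assumptions, not a prescribed approach:

```python
import hashlib

PII_FIELDS = {"user-id"}  # fields treated as personally identifiable (illustrative)

def pseudonymise(context, salt="example-salt"):
    """Replace PII values with truncated salted hashes; leave other fields untouched."""
    masked = dict(context)
    for key in PII_FIELDS & context.keys():
        digest = hashlib.sha256((salt + context[key]).encode("utf-8")).hexdigest()
        masked[key] = digest[:16]  # stable per user, so correlation still works
    return masked

ctx = {
    "user-id": "87654321-1234-5678-90ab-1234567890ab",
    "request-method": "POST",
}
print(pseudonymise(ctx))
```

Note that a secret, rotated salt is needed in practice; a fixed salt still allows dictionary attacks against known identifiers.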

Conclusion

Structured, contextual logs solve a number of business, operational and development challenges, which means that giving logs the attention they deserve will help organisations immensely.

There are several tools to handle storage, usage and the lifecycle of logs. In particular, AWS provides a comprehensive, cloud-native set of services that satisfies most organisational log requirements.

Finally, a common complaint in the past has been that JSON is unreadable to humans. However, managed services like the Amazon CloudWatch Logs and CloudWatch Logs Insights consoles have gone a long way towards making such logs easier to read. While reading JSON logs takes some getting used to, it should be much less of a concern today.