Repeatable execution by Mark Seemann
What to log, and how to log it.
When I visit software organisations to help them make their code more maintainable, I often see code like this:
public ILog Log { get; } public ActionResult Post(ReservationDto dto) { Log.Debug($"Entering {nameof(Post)} method..."); if (!DateTime.TryParse(dto.Date, out var _)) { Log.Warning("Invalid reservation date."); return BadRequest($"Invalid date: {dto.Date}."); } Log.Debug("Mapping DTO to Domain Model."); Reservation reservation = Mapper.Map(dto); if (reservation.Date < DateTime.Now) { Log.Warning("Invalid reservation date."); return BadRequest($"Invalid date: {reservation.Date}."); } Log.Debug("Reading existing reservations from database."); var reservations = Repository.ReadReservations(reservation.Date); bool accepted = maîtreD.CanAccept(reservations, reservation); if (!accepted) { Log.Warning("Not enough capacity"); return StatusCode( StatusCodes.Status500InternalServerError, "Couldn't accept."); } Log.Info("Adding reservation to database."); Repository.Create(reservation); Log.Debug($"Leaving {nameof(Post)} method..."); return Ok(); }
Logging like this annoys me. It adds avoidable noise to the code, making it harder to read, and thus, more difficult to maintain.
Ideal #
The above code ought to look like this:
public ActionResult Post(ReservationDto dto) { if (!DateTime.TryParse(dto.Date, out var _)) return BadRequest($"Invalid date: {dto.Date}."); Reservation reservation = Mapper.Map(dto); if (reservation.Date < Clock.GetCurrentDateTime()) return BadRequest($"Invalid date: {reservation.Date}."); var reservations = Repository.ReadReservations(reservation.Date); bool accepted = maîtreD.CanAccept(reservations, reservation); if (!accepted) { return StatusCode( StatusCodes.Status500InternalServerError, "Couldn't accept."); } Repository.Create(reservation); return Ok(); }
This is more readable. The logging statements are gone from the code, thereby amplifying the essential behaviour of the Post
method. The noise is gone.
Wait a minute! you might say, You can't just remove logging! Logging is important.
Yes, I agree, and I didn't. This code still logs. It logs just what you need to log. No more, no less.
Types of logging #
Before we talk about the technical details, I think it's important to establish some vocabulary and context. In this article, I use the term logging broadly to describe any sort of action of recording what happened while software executed. There's more than one reason an application might have to do that:
- Instrumentation. You may log to support your own work. The first code listing in this article is a typical example of this style of logging. If you've ever had the responsibility of having to support an application that runs in production, you know that you need insight into what happens. When people report strange behaviours, you need those logs to support troubleshooting.
- Telemetry. You may log to support other people's work. You can write status updates, warnings, and errors to support operations. You can record Key Performance Indicators (KPIs) to support 'the business'.
- Auditing. You may log because you're legally obliged to do so.
- Metering. You may log who does what so that you can bill users based on consumption.
Particularly when it comes to instrumentation, I often see examples of 'overlogging'. When logging is done to support future troubleshooting, you can't predict what you're going to need, so it's better to log too much data than too little.
It'd be even better to log only what you need. Not too little, not too much, but just the right amount of logging. Obviously, we should call this Goldilogs.
Repeatability #
How do you know what to log? How do you know that you've logged everything that you'll need, when you don't know your future needs?
The key is repeatability. Just like you should be able to reproduce builds and repeat deployments, you should also be able to reproduce execution.
If you can replay what happened when a problem manifested itself, you can troubleshoot it. You need to log just enough data to enable you to repeat execution. How do you identify that data?
Consider a line of code like this:
int z = x + y;
Would you log that?
It might make sense to log what x
and y
are, particularly if these values are run-time values (e.g. entered by a user, the result of a web service call, etc.):
Log.Debug($"Adding {x} and {y}."); int z = x + y;
Would you ever log the result, though?
Log.Debug($"Adding {x} and {y}."); int z = x + y; Log.Debug($"Result of addition: {z}");
There's no reason to log the result of the calculation. Addition is a pure function; it's deterministic. If you know the inputs, you can always repeat the calculation to get the output. Two plus two is always four.
The more your code is composed from pure functions, the less you need to log.
Log only impure actions #
In principle, all code bases interleave pure functions with impure actions. In most procedural or object-oriented code, no attempt is made of separating the two:
I've here illustrated impure actions with red and pure functions with green. Imagine that this is a conceptual block of code, with execution flowing from top to bottom. When you write normal procedural or object-oriented code, most of the code will have some sort of local side effect in the form of a state change, a more system-wide side effect, or be non-deterministic. Occasionally, arithmetic calculation or similar will form small pure islands.
While you don't need to log the output of those pure functions, it hardly makes a difference, since most of the code is impure. It would be a busy log, in any case.
Once you shift towards functional-first programming, your code may begin to look like this instead:
You may still have some code that occasionally executes impure actions, but largely, most of the code is pure. If you know the inputs to all the pure code, you can reproduce that part of the code. This means that you only need to log the non-deterministic parts: the impure actions. Particularly, you need to log the outputs from the impure actions, because these impure output values become the inputs to the next pure block of code.
This style of architecture is what you'll often get with a well-designed F# code base, but you can also replicate it in C# or another object-oriented programming language. I'd also draw a diagram like this to illustrate how Haskell code works if you model interactions with free monads.
This is the most generally applicable example, so articles in this short series show a Haskell code base with free monads, as well as a C# code base.
In reality, you can often get away with an impureim sandwich:
This architecture makes things simpler, including logging. You only need to log the inital and the concluding impure actions. The rest, you can always recompute.
I could have implemented the comprehensive example code shown in the next articles as impureim sandwiches, but I chose to use free monads in the Haskell example, and Dependency Injection in the C# example. I did this in order to offer examples from which you can extrapolate a more complex architecture for your production code.
Examples #
I've produced two equivalent example code bases to show how to log just enough data. The first is in Haskell because it's the best way to be sure that pure and impure code is properly separated.
Both example applications have the same externally visible behaviour. They showcase a focused vertical slice of a restaurant reservation system. The only feature they support is the creation of a reservation.
Clients make reservations by making an HTTP POST request to the reservation system:
POST /reservations HTTP/1.1 Content-Type: application/json { "id": "84cef648-1e5f-467a-9d13-1b81db7f6df3", "date": "2021-12-21 19:00:00", "email": "[email protected]", "name": "Mark Seemann", "quantity": 4 }
This is an attempt to make a reservation for four people at December 21, 2021 at seven in the evening. Both code bases support this HTTP API.
If the web service accepts the reservation, it'll write the reservation as a record in a SQL Server database. The table is defined as:
CREATE TABLE [dbo].[Reservations] ( [Id] INT NOT NULL IDENTITY, [Guid] UNIQUEIDENTIFIER NOT NULL UNIQUE, [Date] DATETIME2 NOT NULL, [Name] NVARCHAR (50) NOT NULL, [Email] NVARCHAR (50) NOT NULL, [Quantity] INT NOT NULL PRIMARY KEY CLUSTERED ([Id] ASC)
Both implementations of the service can run on the same database.
The examples follow in separate articles:
Readers not comfortable with Haskell are welcome to skip directly to the C# article.Log metadata #
In this article series, I focus on run-time data. The point is that there's a formal method to identify what to log: Log the inputs to and outputs from impure actions.
I don't focus on metadata, but apart from run-time data, each log entry should be accompanied by metadata. As a minimum, each entry should come with information about the time it was observed, but here's a list of metadata to consider:
- Date and time of the log entry. Make sure to include the time zone, or alternatively, log exclusively in UTC.
- The version of the software that produced the entry. This is particularly important if you deploy new versions of the software several times a day.
- The user account or security context in which the application runs.
- The machine ID, if you consolidate server farm logs in one place.
- Correlation IDs, if present.
Conclusion #
You only need to log what happens in impure actions. In a normal imperative or object-oriented code base, this is almost a useless selection criterion, because most of what happens is impure. Thus, you need to log almost everything.
There's many benefits to be had from moving towards a functional architecture. One of them is that it simplifies logging. Even a functional-first approach, as is often seen in idiomatic F# code bases, can simplify your logging efforts. The good news is that you can adopt a similar architecture in object-oriented code. You don't even have to compromise the design.
I've worked on big C# code bases where we logged all the impure actions. It was typically less than a dozen impure actions per HTTP request. When there was a problem in production, I could usually reproduce what caused it based on the logs.
You don't have to overlog to be able to troubleshoot your production code. Log the data that matters, and only that. Log the impure inputs and outputs.
Comments
I like the simplicity of "log the impure inputs and outputs", and logging to ensure repeatability. But consider a common workflow: Load a (DDD) aggregate from DB, call pure domain logic it, and store the result.
The aggregate may be very complex, e.g. an order with not just many properties itself, but also many sub-entities (line items, etc.) and value objects. In order to get repeatability, you need to log the entire aggregate that was loaded. This is hard/verbose (don't get too tempted by F#'s nice stringification of records and unions – you'll want to avoid logging sensitive information), and you'll still end up with huge multi-line dumps in your log (say, 10 lines for the order and 10 lines per line item = 100 lines for an order with 9 line items, for a single operation / log statement).
Intuitively to me, that seems like a silly amount of logging, but I can't see any alternative if you want to get full repeatability in order to debug problems. Thoughts?
If your application caches sensitive information, then it would be reasonble to only cache an encrypted version. That way, logging the information is not a security issue and neither is holding that infomration in memory (where malware could read it via memory-scraping).
Not all logging is meant to be directly consumed by humans. Structued logging is makes it easy for computers to consume logging events. For an event sourcing architecture (such as the Elm architecture), one can record all events and then create tooling to allow playback. I hope Elmish.WPF gets something like this if/when some of the Fable debugging tools are ported to F#.
Christer, thank you for writing. There's two different concerns, as far as I can see. I'd like to start with the size issue.
The size of software data is unintuitive to the human brain. A data structure that looks overwhelmingly vast to us is nothing but a few hundred kilobytes in raw data. I often have that discussion regarding API design, but the same arguments could apply to logging. How much would 100 lines of structured JSON entail?
Let's assume some that JSON properties are numbers (prices, quantities, IDs, etcetera.) while others could be unbounded strings. Let's assume that the numbers all take up four bytes, while Unicode strings are each 100 bytes on average. The average byte size of a 'line' could be around 50 bytes, depending on the proportion of numbers to strings. 100 lines, then, would be 5,000 bytes, or around 5 kB.
Even if data structures were a few orders of magnitude larger, that in itself wouldn't keep me up at night.
Of course, that's not the whole story. The volume of data is also important. If you log hundred such entries every second, it obviously adds up. It could be prohibitive.
That scenario doesn't fit my experience, but for the sake of argument, let's assume that that's the case. What's the alternative to logging the impure operations?
You can decide to log less, but that has to be an explicit architectural decision, because if you do that, there's going to be behaviour that you can't reproduce. Logging all impure operations is the minimum amount of logging that'll guarantee that you can fully reproduce all behaviour. You may decide that there's behaviour that you don't need to be able to reconstruct, but I think that this ought to be an explicit decision.
There may be a better alternative. It also addresses the issue regarding sensitive data.
Write pure functions that take as little input as possible, and produce as little output as possible. In the realm of security design there's the concept of datensparsamkeit (data frugality). Only deal with the data you really need.
Does that pure function really need to take as input an entire aggregate, or would a smaller projection do? If the latter, implement the impure operation so that it returns the projection. That's a smaller data set, so less to log.
The same goes for sensitive input. Perhaps instead of a CPR number the function only needs an age value. If so, then you only need to log the age.
These are deliberate design decision you must take. You don't get such solutions for free, but you can if you will.
Thank you for the reply. It makes sense that this should be a deliberate design decision.
I'm working full-time with F# myself, and would be very interested in seeing how you think this could be solved in F#. In this series, you are demonstrating solutions for Haskell (free monads) and C# (dependency injection), but as you have alluded to previously on your blog, neither of those are idiomatic solutions in F# (free monads are cumbersome without higher-kinded types, and DI primarily fits an object-oriented architecture).
I realize you may not choose to write another blog post in this series tackling F# (though it would nicely "fill the gap" between Haskell and C#, and you're definitely the one to write it!), but even a few keywords/pointers would be helpful.
Christer, thank you for writing. I am, indeed, not planning to add another article to this small series. Not that writing the article itself would be too much trouble, but in order to stay on par with the two other articles, I'd have to develop the corresponding F# code base. That currently doesn't look like it'd be the best use of my time.
In F# you can use partial application for dependency injection. I hope that nothing I wrote gave the impression that this isn't idiomatic F#. What I've demonstrated is that it isn't functional, but since F# is a multiparadigmatic language, that's still fine.
The C# example in this article series shows what's essentially an impureim sandwich, so it shouldn't be too hard to translate that architecture to F#. It's already quite functional.