RFC: Client Side State Encryption #874

Closed
@StephanHCB

Summary

This feature adds the option to encrypt local state files, remote state, and plan files. Encryption is off-by-default.
Partial encryption, when enabled, only encrypts values marked as sensitive to protect credentials contained in the
state. Full encryption, when enabled, protects against any information disclosure from leaked state or plans.

Problem Statement

OpenTofu state and plans contain lots of sensitive information.

The most obvious examples are credentials such as primary access keys to storage, but even ignoring any credentials,
state often includes a full map of your network, including every VM, Kubernetes cluster, database, etc.
That is a treasure trove for an attacker who wishes to orient themselves in your private network.

Unlike runtime information processed by OpenTofu, which only lives in memory and is discarded when the run ends,
state and plans are persisted. In large installations, state is not (just) stored in local files because multiple
users need access to it. Remote state backend options include simple storage (such as storage accounts or various
databases), which does not "understand" the state, but there are also extended backends
that do wish to gain information from state. The persistent nature and (often) cloud storage of state increase
the risk of it falling into the wrong hands.

Large corporations and financial institutions have compliance requirements for storage of sensitive information.
One frequent requirement is encryption at rest using a customer managed key. This is exactly what this feature
provides, and if you use it intelligently, even the cloud provider storing your state will not have access to the
encryption key at all.

User-facing description

OpenTofu masks sensitive values in its printed output, but those very same sensitive values are written to state:

example snippet from a statefile for an Azure storage account with the primary access key (of course not a real one)

Pay particular attention to the line listing the primary access key. The storage account listed here doesn't exist,
but if it did, the primary access key would give an attacker full access to all the data on the storage account.

Getting Started

Note: the exact format of the configuration is likely to change as we test out the implementation and figure out
the precise details. So don't rely too much on exact field names or format of the contents at this point in time.

With the feature this RFC is about, you could simply set an environment variable before running OpenTofu:

export TF_STATE_ENCRYPTION='{"backend":{"method":{"name":"full"},"key_provider":{"name":"passphrase","config":{"passphrase":"foobarbaz"}}}}'

For readability let's spell out the value of the environment variable even though you wouldn't normally set it like this:

export TF_STATE_ENCRYPTION='''{
  "backend": {
    "method": {
      "name": "full"
    },
    "key_provider": {
      "name": "passphrase",
      "config": {
        "passphrase": "foobarbaz"
      }
    }
  }
}'''

And suddenly, your remote state looks like this:

{
    "encryption": {
        "version": 1,
        "method": {
            "name": "full",
            "config": {}
        }
    },
    "payload": "e93e3e7ad3434055251f695865a13c11744b97e54cb7dee8f8fb40d1fb096b728f2a00606e7109f0720aacb15008b410cf2f92dd7989c2ff10b9712b6ef7d69ecdad1dccd2f1bddd127f0f0d87c79c3c062e03c2297614e2effa2fb1f4072d86df0dda4fc061"
}

This is the same state as before, only fully encrypted with AES256 using a key derived from the passphrase you provided.

Actually, most of the settings shown in the environment variable have sensible defaults, so this also works:

export TF_STATE_ENCRYPTION='''{
  "backend": {
    "key_provider": {
      "config": {
        "passphrase": "foobarbaz"
      }
    }
  }
}'''
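
As a rough illustration of how the environment variable could be parsed and the defaults filled in, here is a minimal Go sketch. The type and function names (TargetConfig, ParseEncryptionEnv) are hypothetical and not part of the PoC. The default method "full" is stated in this RFC; the default key provider "passphrase" is only inferred from the shortened example above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// KeyProviderConfig mirrors the draft "key_provider" object (hypothetical type).
type KeyProviderConfig struct {
	Name   string            `json:"name"`
	Config map[string]string `json:"config"`
}

// MethodConfig mirrors the draft "method" object (hypothetical type).
type MethodConfig struct {
	Name string `json:"name"`
}

// TargetConfig is the configuration for one top-level key such as "backend".
type TargetConfig struct {
	Method      MethodConfig      `json:"method"`
	KeyProvider KeyProviderConfig `json:"key_provider"`
}

// ParseEncryptionEnv parses the value of TF_STATE_ENCRYPTION and applies defaults.
// An empty value means that encryption is not configured.
func ParseEncryptionEnv(value string) (map[string]TargetConfig, error) {
	if value == "" {
		return nil, nil
	}
	var targets map[string]TargetConfig
	if err := json.Unmarshal([]byte(value), &targets); err != nil {
		return nil, fmt.Errorf("invalid TF_STATE_ENCRYPTION: %w", err)
	}
	for name, t := range targets {
		if t.Method.Name == "" {
			t.Method.Name = "full" // default method according to this RFC
		}
		if t.KeyProvider.Name == "" {
			t.KeyProvider.Name = "passphrase" // assumed default, inferred from the short example
		}
		targets[name] = t
	}
	return targets, nil
}
```

A caller would typically pass in os.Getenv("TF_STATE_ENCRYPTION") and treat a nil result as "encryption off".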

You can also specify the 32-byte key directly instead of providing a passphrase:

export TF_STATE_ENCRYPTION='''{
  "backend": {
    "method": {
      "name": "full"
    },
    "key_provider": {
      "name": "direct",
      "config": {
        "key": "a0a1a2a3a4a5a6a7a8a9b0b1b2b3b4b5b6b7b8b9c0c1c2c3c4c5c6c7c8c9d0d1"
      }
    }
  }
}'''
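
To make the "full" method with a direct key more concrete, here is a minimal Go sketch of what encryption and decryption could look like. The RFC only says AES256; using GCM mode, prepending the nonce to the ciphertext, and the names EncryptFull and DecryptFull are assumptions made for illustration, not the PoC's actual implementation.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"errors"
)

// envelope mirrors the encrypted state format shown in this RFC.
type envelope struct {
	Encryption struct {
		Version int `json:"version"`
		Method  struct {
			Name   string         `json:"name"`
			Config map[string]any `json:"config"`
		} `json:"method"`
	} `json:"encryption"`
	Payload string `json:"payload"`
}

// newGCM builds an AES-256-GCM cipher from a hex-encoded 32-byte key.
func newGCM(hexKey string) (cipher.AEAD, error) {
	key, err := hex.DecodeString(hexKey)
	if err != nil {
		return nil, err
	}
	if len(key) != 32 {
		return nil, errors.New("key must be 32 bytes (64 hex characters)")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// EncryptFull wraps the whole state in a JSON envelope with a hex payload.
func EncryptFull(state []byte, hexKey string) ([]byte, error) {
	gcm, err := newGCM(hexKey)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Assumed layout: nonce, then ciphertext, then authentication tag.
	sealed := gcm.Seal(nonce, nonce, state, nil)
	var env envelope
	env.Encryption.Version = 1
	env.Encryption.Method.Name = "full"
	env.Encryption.Method.Config = map[string]any{}
	env.Payload = hex.EncodeToString(sealed)
	return json.Marshal(env)
}

// DecryptFull reverses EncryptFull and fails if the key or the payload is wrong.
func DecryptFull(doc []byte, hexKey string) ([]byte, error) {
	var env envelope
	if err := json.Unmarshal(doc, &env); err != nil {
		return nil, err
	}
	if env.Encryption.Method.Name != "full" {
		return nil, errors.New("state was not encrypted with method \"full\"")
	}
	gcm, err := newGCM(hexKey)
	if err != nil {
		return nil, err
	}
	sealed, err := hex.DecodeString(env.Payload)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("payload too short")
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}
```

An authenticated mode such as GCM has the side effect that decryption with the wrong key, or of a truncated payload, fails with an error instead of producing garbage.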

Whether you use a passphrase or directly provide the key, it comes from an environment variable. Even if your state
is stored in another storage account, no one outside your organisation would have the encryption key.
Your users that run OpenTofu will need it, though.

Better yet, the key can also come from AWS KMS; all you'd need to change for that is the environment variable value:

export TF_STATE_ENCRYPTION='''{
  "backend": {
    "method": {
      "name": "full"
    },
    "key_provider": {
      "name": "awskms",
      "config": {
        "region": "us-east-1",
        "key_id": "alias/terraform"
      }
    }
  }
}'''

Or retrieve your encryption key from an Azure Key Vault, or GCP Key Mgmt, or Vault. Of course, if you retrieve
the key from the cloud provider your state storage is located at, they have both the state and the key now, so
maybe don't use the same cloud provider if you worry about attacks from their side (or from government actors).

Using external key retrieval options allows you to place the equivalent configuration in the
remote state configuration, so the configuration is checked in with your code, and still be
properly secure, because now the configuration does not need to include the actual encryption key.

Instead of full state encryption, you can have just the sensitive values encrypted in the state:

export TF_STATE_ENCRYPTION=TODO example

This will make your state look almost exactly like the original unencrypted state, so you can still easily doctor it if
you need to, except that the primary access key is now encrypted, and that the encryption section is present.

{
    "encryption": {
        "version": 1,
        "methods": {
            "encrypt/SOPS/xyz": {
               ...
            }
        }
    },
    TODO
}

Once Your State Is Encrypted

State encryption is completely transparent. All OpenTofu commands work exactly the same, even tofu state push and
tofu state pull work as expected. The latter downloads the state, and prints it in decrypted form, which is useful
if you ever run into the need to manually doctor your state. Lately, that need has become much rarer than it
used to be.

Since the configuration can be set in environment variables, wrappers like Terragrunt work just fine. As do typical
CI systems for OpenTofu such as Atlantis.

Note: We will need to test whether it is possible to use multiple different encryption keys with terragrunt. It may
be that within the same tree, you must stick to one key. We know from experience that terragrunt run-all works
in that scenario.

If your CI system is more involved and insists on reading your state contents, you can't use full state encryption.
You may still be able to use partial state encryption, configuring it to only encrypt the sensitive values. This will still
prevent exposing your passwords to both the CI system and the state storage, greatly frustrating any threat actors
trying to get into your infrastructure through those attack vectors.

If you want to rotate state encryption keys, or even switch state encryption methods, there is a second
environment variable called TF_STATE_DECRYPTION_FALLBACK. This one is tried for decryption if the primary
configuration in TF_STATE_ENCRYPTION fails to decrypt your state successfully. Encryption, unlike decryption, always
uses only the primary configuration, so you can use this to rotate your key on the next write operation.

Unencrypted state is recognized and automatically bypasses the decryption step. That's what happens during initial
encryption, or if for some other reason your state happens to be currently unencrypted.

If you set TF_STATE_DECRYPTION_FALLBACK but not TF_STATE_ENCRYPTION, the next apply will decrypt your state.
This is your way out of this feature, if you, say, need to downgrade to a version of OpenTofu that doesn't yet support
state encryption.

Note: Depending on the situation, you may need to make some change, e.g. to a null_resource, to force a state write.

Advanced Configuration

Instead of using an environment variable, you can also add equivalent configuration to the remote state configuration in your code:

terraform {
  // ...
  backend "azurerm" {
    // ...
  }

  state_encryption {
    # configuration for remote state encryption
    backend {
      key_provider {
        name = "awskms"
        config = {
          region = "us-east-1"
          key_id = "alias/terraform"
        }
      }

      # Default, so no need to spell this out
      # method {
      #   name = "full"
      # }

      # Force tofu to fail when encryption not correctly configured (e.g. forgot to set environment variable)
      required = true
    }

    # configuration for local statefile encryption
    statefile {
      key_provider {
        // ...
      }
      method {
        // ...
      }
      # Force tofu to fail when encryption not correctly configured (e.g. forgot to set environment variable)
      # required = true
    }
  }

  state_decryption_fallback {  
    // ...
  }
}

Of course, if used with the passphrase or direct key_provider, that would mean
you are checking in your encryption key with your code, so that is probably not what you want to do.

There are ways around this. First of all, configuration is merged between

  • configuration in the code under terraform { state_encryption { backend {} } }
  • configuration from the environment variable under the top-level key backend

in that order, so you could leave out the key parameter in the code and only provide that part of the configuration
via environment variable.
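
The merge described above could be sketched in Go as a recursive merge of two JSON objects. This is only an illustration: the function names are hypothetical, and it assumes that where both sources set the same leaf value, the environment variable wins (the RFC only states the order in which the sources are merged).

```go
package main

import "encoding/json"

// mergeMaps returns a copy of base with override merged in.
// Nested objects are merged key by key; any other value from override replaces the base value.
func mergeMaps(base, override map[string]any) map[string]any {
	out := make(map[string]any, len(base)+len(override))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range override {
		baseChild, baseIsMap := out[k].(map[string]any)
		overrideChild, overrideIsMap := v.(map[string]any)
		if baseIsMap && overrideIsMap {
			out[k] = mergeMaps(baseChild, overrideChild)
		} else {
			out[k] = v
		}
	}
	return out
}

// MergeConfig merges the configuration from code (first) with the matching
// part of the environment variable (second). Either input may be empty.
func MergeConfig(codeJSON, envJSON []byte) (map[string]any, error) {
	var code, env map[string]any
	if len(codeJSON) > 0 {
		if err := json.Unmarshal(codeJSON, &code); err != nil {
			return nil, err
		}
	}
	if len(envJSON) > 0 {
		if err := json.Unmarshal(envJSON, &env); err != nil {
			return nil, err
		}
	}
	return mergeMaps(code, env), nil
}
```

With this approach the checked-in code can name the key provider while the secret itself only ever appears in the environment.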

Or you could use one of the key derivation methods described above to get the key from Vault, Azure Key Vault,
GCP Key Mgmt, or AWS Key Mgmt. In that case, the configuration will not contain the encryption key, so that
would be safe to check in.

Mixing encryption keys and the terraform_remote_state data source

In principle, every terraform state can have its own key.

It is however not recommended to be too fine-grained here for reasons of practicality. In practice,
only codebases that have unrelated state will be easy to work on with separate keys. Remember that
any user who even just needs to run tofu init -upgrade will need read and write access to state anyway. You are
just making it harder for your users if you make things complicated.

Note: Some CI and automation tools may impose harder limits on this. Most notably, I suspect that terragrunt
run-all will not be happy with different keys in the same tree (but I haven't tried this).

Sometimes, one module depends on the state of another module. That is why the terraform_remote_state
data source will be expanded to also allow its own encryption configuration.

data "terraform_remote_state" "foo" {
  backend = "azurerm" # or "local"
  
  config = {
    // ...
  }

  state_encryption {
    // ...
  }
  state_decryption_fallback {
    // ...
  } 
}

Again, this configuration can also be specified via environment variable:

export TF_STATE_ENCRYPTION='''{
  "local": {
    "method": {...},
    "key_provider": {...}
  },
  "plan": {
    "method": {...},
    "key_provider": {...}
  },
  "backend": {
    "method": {...},
    "key_provider": {...}
  },
  "terraform_remote_state.foo": {
    "method": {...},
    "key_provider": {...}
  }
}'''

In this example, we are also encrypting local state and plan files. Note that many CI systems
may not like this.

Technical Description

Isolated Implementation

This feature can be built in a way that almost completely isolates it from existing code.

This isolation allows testing the feature almost completely using unit tests. The only parts that cannot
be tested in isolation are

  • obtaining the values of the environment variables (a one-liner each)
  • interacting with external systems for key derivation (Vault, Key Vault, GCP Key Mgmt, AWS Key Mgmt)

The backend configuration is handled in internal/backend.

The remote state data source configuration is handled in internal/builtin/providers/tf.

OpenTofu has one place each where state files, remote state, or plan files are written. Right before
writing the byte slice we can insert the encryption step.

OpenTofu has one place each where state files, remote state, or plan files are read. Right after
reading the byte slice we can insert the decryption step.

Note that the configuration steps and the encryption steps are performed at very different times during
tofu runs.

Note also that each remote state data source can have its own encryption configuration, and that this configuration
comes from two sources

  • the relevant part of the environment variables
  • the code that defines the data source

Rather than managing instances of the encryption flow externally the current plan is to have a cache for
these instances, so that each place that needs them can acquire them. We shall see if this is really the best
approach.

Top Level Flow

Note: this description ignores caching

For encryption, the top level flow is

  • parse and validate the primary configuration
    • if failed, log and abort (configuration error)
  • if primary configuration unset, return state unchanged
  • try encryption with primary configuration
    • if successful, return result
  • log encryption failure and abort

For decryption, the top level flow is a bit more involved due to the fallback method

  • parse and validate both primary and fallback configuration
    • if either fails, log and abort (configuration error)
  • if neither configuration set, return state unchanged
  • if state is not encrypted
    • if primary configuration is set, so unencrypted state is unexpected, log a warning
    • return state unchanged
  • try decryption with primary configuration, if any,
    • if successful, return result
  • try decryption with fallback configuration, if any,
    • if successful, return result
  • log decryption failure and abort
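
The two flows above can be sketched in Go roughly as follows. The Config type with its two function fields is a hypothetical stand-in for an already parsed and validated configuration, so the "parse and validate" steps are assumed to have happened before these functions are called; nil means "not set".

```go
package main

import (
	"encoding/json"
	"errors"
	"log"
)

// Config stands in for one parsed and validated encryption configuration (hypothetical).
type Config struct {
	Encrypt func(state []byte) ([]byte, error)
	Decrypt func(state []byte) ([]byte, error)
}

var (
	ErrEncryptionFailed = errors.New("state encryption failed")
	ErrDecryptionFailed = errors.New("state decryption failed")
)

// isEncrypted reports whether the document carries a top-level "encryption" key.
func isEncrypted(state []byte) bool {
	var probe map[string]json.RawMessage
	if err := json.Unmarshal(state, &probe); err != nil {
		return false
	}
	_, ok := probe["encryption"]
	return ok
}

// EncryptState only ever uses the primary configuration.
func EncryptState(state []byte, primary *Config) ([]byte, error) {
	if primary == nil {
		return state, nil // encryption not configured
	}
	out, err := primary.Encrypt(state)
	if err != nil {
		log.Printf("state encryption failed: %v", err)
		return nil, ErrEncryptionFailed
	}
	return out, nil
}

// DecryptState tries the primary configuration first, then the fallback.
func DecryptState(state []byte, primary, fallback *Config) ([]byte, error) {
	if primary == nil && fallback == nil {
		return state, nil
	}
	if !isEncrypted(state) {
		if primary != nil {
			log.Print("warning: encryption is configured but state is unencrypted")
		}
		return state, nil
	}
	for _, cfg := range []*Config{primary, fallback} {
		if cfg == nil {
			continue
		}
		if out, err := cfg.Decrypt(state); err == nil {
			return out, nil
		}
	}
	log.Print("state decryption failed with all configured methods")
	return nil, ErrDecryptionFailed
}
```

Key rotation then falls out naturally: the old key goes into the fallback, the new key into the primary configuration, and the next write re-encrypts with the new key.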

Pluggable Encryption/Key Derivation Methods

Encryption/Decryption methods can be made pluggable, so they register themselves with the top level flow.

Each pluggable method can be tested in isolation.

Pluggable methods need to implement an interface like this one (draft):

// Method is the interface that must be implemented for a state encryption method.
//
// Note that the encrypted payload must still be valid json, because some remote state backends
// expect valid json.
type Method interface {
	// Decrypt the state or plan.
	//
	// payload is a json document passed in as a []byte.
	//
	// if you do not return an error, you must ensure you return a json document as a []byte.
	Decrypt(payload []byte, configuration Config) ([]byte, error)

	// Encrypt the plaintext state or plan.
	//
	// payload is a json document passed in as a []byte.
	//
	// if you do not return an error, you must ensure you return a json document as
	// a []byte, because some remote state storage backends rely on this.
	Encrypt(payload []byte, configuration Config) ([]byte, error)
}

Pluggable key derivation methods need to implement another, similar interface (draft):

// KeyProvider is the interface that must be implemented for a key provider.
type KeyProvider interface {
	// ObtainKey obtains an encryption key.
	//
	// if you do not return an error, you must ensure you return a valid encryption configuration
	// (possibly altered from the one you were given).
	ObtainKey(payload []byte, configuration Config) (Config, error)
}

Note: the exact details of the interfaces will likely change as we implement the feature.
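
One way the self-registration of pluggable methods could work is a small registry keyed by method name. The following Go sketch is an assumption about a possible design, not the PoC's code; RegisterMethod, LookupMethod, the map-based Config type, and noopMethod are all hypothetical, and only the Method interface is taken from the draft above.

```go
package main

import (
	"fmt"
	"sync"
)

// Config is a placeholder for the draft configuration type (hypothetical shape).
type Config map[string]any

// Method is the draft interface from this RFC.
type Method interface {
	Decrypt(payload []byte, configuration Config) ([]byte, error)
	Encrypt(payload []byte, configuration Config) ([]byte, error)
}

var (
	methodsMu sync.RWMutex
	methods   = map[string]Method{}
)

// RegisterMethod makes a method available to the top level flow under the given name.
func RegisterMethod(name string, m Method) error {
	methodsMu.Lock()
	defer methodsMu.Unlock()
	if _, exists := methods[name]; exists {
		return fmt.Errorf("encryption method %q is already registered", name)
	}
	methods[name] = m
	return nil
}

// LookupMethod returns the method registered under name, if any.
func LookupMethod(name string) (Method, error) {
	methodsMu.RLock()
	defer methodsMu.RUnlock()
	m, ok := methods[name]
	if !ok {
		return nil, fmt.Errorf("unknown encryption method %q", name)
	}
	return m, nil
}

// noopMethod passes state through unchanged; it exists only to show registration.
type noopMethod struct{}

func (noopMethod) Decrypt(payload []byte, _ Config) ([]byte, error) { return payload, nil }
func (noopMethod) Encrypt(payload []byte, _ Config) ([]byte, error) { return payload, nil }
```

Each method package could call RegisterMethod from its init function, which keeps the top level flow unaware of the concrete implementations and lets every method be unit tested on its own.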

Introduced Dependencies

For interaction with Vault, Key Vault, GCP Key Mgmt, AWS Key Mgmt the plan is to use
https://gocloud.dev/howto/secrets/. This library is already in use in Pulumi, where it is used to
derive session keys for AES encryption in much the same way we'd use it here. It is Apache 2.0 licensed.

A proof of concept with SOPS for partial encryption led us to conclude
that this library is currently not suitable for our use case,
though this may change in the future; there are several open issues for enhancements.

Format of Encrypted State

Unencrypted state written by a current version of OpenTofu is a JSON document with top level fields as defined in
struct stateV4.

What should encrypted state look like?

There are some general requirements:

  • State must be a valid json document.
    For example, the Azure remote state backend sets content type "application/json" when storing state blobs, and some of the other remote backends also expect state to be json.
  • When decrypting, we must be able to recognize whether the json document has actually been encrypted using the method we're trying to use to decrypt it.
    It might be unencrypted (during initial encryption), it might be encrypted with a different method (while switching encryption methods), etc. Just passing through wrongly decrypted state must be avoided especially in the case of partially encrypted state, so we must be able to detect this.
  • When decrypting, we must be able to recognize whether state was successfully decrypted, for example by checking a hash.
    This is again necessary to avoid working on wrongly decrypted state.

Proposal: All encryption methods add a top-level object called encryption as follows:

{
  "encryption": {
       "version": 1,
       "method": {
            "name": "full",
            "config": {}
       }
   }
  ... (the rest of the state)
}

Hashing will be left to the method implementations. A method implementation is expected to fail if decryption produced an invalid hash. This will lead to the implementation falling through to the fallback decryption method, thus correctly dealing with these lifecycle events: key rotation / switching encryption methods / decryption.

If the "encryption" key is not present, the top level state decryption logic can recognize that it has been handed unencrypted state for decryption, can print a warning, and can just pass it through. There is no need to even enter any of the individual state encryption methods. Lifecycle event: initial encryption

Partial Encryption Methods

A partial encryption method encrypts some of the leaf values in the state document, but leaves its structure intact and does not alter any json keys. The idea is that software - or humans - will still be able to understand the state, possibly with some sensitive values masked by encryption.

Thus we get an extra requirement for this case:

  • The existing json keys and structure should not be altered by partial encryption methods.
    This ensures that a tolerant json reader can just read the state, and essentially ignore that some leaf values may be encrypted.

Proposal: Aside from the "encryption" top-level key, just keep the usual state fields "version", ..., "outputs", "resources".
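
To illustrate the idea of leaving structure and keys intact, here is a minimal Go sketch that only replaces the value of outputs flagged as sensitive. It is deliberately limited to outputs, because that is the only place where the sensitive flag was reliably found so far. The SealFunc callback, the function name, and the method name "partial" are hypothetical placeholders.

```go
package main

import (
	"encoding/json"
	"errors"
)

// SealFunc encrypts one JSON value and returns an opaque string (hypothetical callback).
type SealFunc func(plaintext []byte) (string, error)

// EncryptSensitiveOutputs replaces the "value" of every output that has
// "sensitive": true with a sealed string. All keys and all other values are kept.
func EncryptSensitiveOutputs(state []byte, seal SealFunc) ([]byte, error) {
	var doc map[string]json.RawMessage
	if err := json.Unmarshal(state, &doc); err != nil {
		return nil, err
	}
	if doc == nil {
		return nil, errors.New("state must be a JSON object")
	}
	var outputs map[string]map[string]json.RawMessage
	if raw, ok := doc["outputs"]; ok {
		if err := json.Unmarshal(raw, &outputs); err != nil {
			return nil, err
		}
	}
	for _, out := range outputs {
		var sensitive bool
		if raw, ok := out["sensitive"]; ok {
			if err := json.Unmarshal(raw, &sensitive); err != nil {
				return nil, err
			}
		}
		value, hasValue := out["value"]
		if !sensitive || !hasValue {
			continue
		}
		sealed, err := seal(value)
		if err != nil {
			return nil, err
		}
		quoted, err := json.Marshal(sealed) // store the sealed value as a JSON string
		if err != nil {
			return nil, err
		}
		out["value"] = quoted
	}
	if outputs != nil {
		raw, err := json.Marshal(outputs)
		if err != nil {
			return nil, err
		}
		doc["outputs"] = raw
	}
	// Mark the document as encrypted; "partial" is a placeholder method name.
	doc["encryption"] = json.RawMessage(`{"version":1,"method":{"name":"partial","config":{}}}`)
	return json.Marshal(doc)
}
```

A tolerant JSON reader can still walk the result, since only the sealed leaf values differ from the original state.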

Full Encryption Methods

A full encryption method encrypts the entirety of the state document, obscuring its structure along with all the values in it. The idea is that no information can be gleaned from the encrypted state.

Proposal: Aside from the "encryption" top-level key, ensure that the usual state fields "version", ..., "outputs", "resources" are NOT present. Instead, store the encrypted payload in a field like "payload" in some textual representation suitable for json (hex, base64, ...).

Key Derivation Methods

A key derivation method is used before the specified encryption method. Its purpose is not to change the state, but rather to derive an encryption key.

Proposal:

Add an object "encryption" / "key_provider" only if the key provider needs to store information in the state.

Format of encrypted plan files

Note that plan files have a different format - they are zip files, so partial encryption does not make a lot of sense for them, but other than that many similar considerations apply.

Proposal: Encryption wraps them in a json structure. See the section about Full Encryption Methods for state.

Note: We will need to ensure that partial encryption methods are not available for plan files, and this means configuration for plan file encryption should probably be separated from the configuration for state encryption so they can be configured independently.

Interactions with other features

I am not currently aware of any interactions with other OpenTofu features.

There are CI systems out there that read and interpret state or plan files. These will not work together with
full state encryption. Partial encryption of only sensitive values may yield better results.

Edge Cases

The User-facing description covers the known edge cases regarding lifecycle events, which are all covered
by the Top Level Flows (see above).

Encrypted state is marked as such, and contains information about the encryption method used to produce it.
In order to avoid working on invalid state, OpenTofu will fail rather than work with state it cannot
decrypt.

If a key derivation method is in use that relies on an external system, there is a chance that writing state
will fail at the end of an apply run. This is not fundamentally different from writing remote state,
which also relies on external systems, but this introduces extra failure modes, which should be handled
in the implementation of those methods that talk to external systems.

If encrypted state is truncated, the hash value will not be correct, so decryption will fail completely.
This can make doctoring such state harder. The recommendation in that case is to fall back on the
last correct version of the state from backup.

Rationale and alternatives

This feature was requested many years ago, but was never added, even though a PR was open for a long time.
Some users expressed dismay at this situation years ago.

Where I work, our CISO asked if we could find a way to encrypt state. I agree with their assessment that having lots of
credentials in the state is a security issue. That's why I wrote the original code a few years ago, which we have now
been using in production for several years without any problems.

This proposal can be implemented such that nothing changes by default, but if enabled, state can be selectively
or completely protected by encryption.

There are certainly other feasible solutions, such as redesigning OpenTofu to not store credentials in its state, but
I am not aware of a solution that is as minimally invasive and at the same time as effective as this proposal.

Downsides

One concern that has been raised is that rather than introducing encryption, we should "do it right" and
redesign OpenTofu to not store credentials in state, or even not to have state at all. The concern is that
implementing encryption for state will reduce the impetus to really fix what is considered a fundamental
design flaw in OpenTofu.

Another concern is an increase in complexity that goes along with encryption. My answer to this is that
the encryption is off-by-default, so unless one actively enables it, nothing changes for the end user.

We need to take care to explain the feature well in the documentation. We need to ensure users are made aware
the feature is fundamentally limited to symmetric encryption (because every OpenTofu run will need to be able to
read and write state), and that state encryption is an additional security feature, and not meant to replace
e.g. role-based access control or firewall rules. Also, users need to be aware that encrypting state does
not really reduce the amount of knowledge of your resources that your cloud provider has, because they can
just ask their API, after all everything is running on their hardware. The only mitigation I can offer is
good documentation that clearly mentions these caveats.

We will also need to clearly explain that encrypted state means the users must ensure backup of the key or
passphrase, or else they may lose all their state. It may seem obvious, but hindsight always is.

While my original implementation did not add any dependencies, this proposal plans to rely on libraries, some with a
rather large dependency tree. Introducing a list of additional dependencies incurs a maintenance cost. If one of the
dependencies becomes unmaintained or otherwise unavailable, users may be forced to switch encryption and key management
methods. Also, build times and binary size will increase.

Unresolved Questions

  • I have tried a few providers, and when looking at the state, the sensitive_fields array was always empty. Only the
    outputs had the sensitive boolean flag set in state. This may adversely affect partial encryption of just the
    sensitive fields. This will need further investigation before partial encryption of only sensitive fields
    can be implemented.

Related Issues

#297 Proposal: State Encryption

(This RFC was developed and discussed in comments on that proposal while it matured.)

opentofu/roadmap#19

#801 Secrets should not be stored in state

#1030 Implementation issue with subtasks (the first version will only feature full state encryption)

Proof of Concept

See this comment for the latest version of the PoC, including instructions for use.

Labels

accepted (This issue has been accepted for implementation.), rfc
