Experimental support for using an OCI repository as an alternative provider installation method #2170

Draft
wants to merge 16 commits into
base: main
from


apparentlymart
Contributor

For now this is just a prototype of part of what we're discussing over in #2163.

It might eventually turn into a real implementation, but for now the goal is to just get the basic behavior in place so that we can try it out against some real OCI registry implementations, and we can get some experience with the new workflow for building and publishing providers into an OCI registry before we finalize the design in the RFC.


For now the new feature is guarded by the "experiments" mechanism, so it's only available when OpenTofu is built with experiments enabled. This is to reinforce that nothing here is final yet. It remains to be seen whether we'll merge with it still guarded in this way or if we'll wait until the design is finalized enough to enable it in release builds.

You can enable experiments when building the tofu executable, like this:

go install -ldflags="-X 'main.experimentsAllowed=yes'" ./cmd/tofu
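The Go linker's `-X` flag can only set package-level string variables, so a build-time switch like `main.experimentsAllowed` is presumably declared as a string that stays empty in ordinary builds. The following is a minimal sketch under that assumption; the `experimentsEnabled` helper is hypothetical and not taken from the OpenTofu codebase.

```go
package main

import "fmt"

// experimentsAllowed can be overridden at link time, for example with:
//
//	go build -ldflags="-X 'main.experimentsAllowed=yes'" .
//
// -X only works on string variables, so this is a string that defaults to ""
// (experiments disabled) when no flag is given.
var experimentsAllowed string

// experimentsEnabled is a hypothetical helper reporting whether this binary
// was built with experiments opted in.
func experimentsEnabled() bool {
	return experimentsAllowed == "yes"
}

func main() {
	if experimentsEnabled() {
		fmt.Println("experimental features are available in this build")
	} else {
		fmt.Println("experiments are disabled in this build")
	}
}
```

Because the default is the zero value, an ordinary `go build` without the flag can never accidentally expose experimental behavior.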

apparentlymart self-assigned this Nov 15, 2024
apparentlymart force-pushed the f-providersource-oci-mirror branch 11 times, most recently from c08e220 to ba2fce1 on November 20, 2024 00:33
@apparentlymart
Contributor Author

Just a note to self for later:

Upstream in Go there's a recently-accepted proposal to add a safer filesystem abstraction called os.Root, which is designed to prevent operations from "escaping" a particular root directory, whether by direct relative paths or by symlinks: golang/go#67002.

This likely won't be available soon enough for us to use it for our first round here, but we can try to design our extraction code to be compatible with the shape of that API so that we can adopt it later, instead of maintaining our own cross-platform "safe file I/O" abstraction forever.

This extends our existing concept of "experimental features" to the CLI
Configuration language, so that we can (in future commits) have
configuration features that are not yet fully implemented or not yet ready
to be constrained by our compatibility promises.

As with all other uses of "experiments" in this codebase, these are
available only if explicitly activated at build time and so experimental
CLI config features would not be available in official release builds.

Signed-off-by: Martin Atkins <[email protected]>
apparentlymart force-pushed the f-providersource-oci-mirror branch 4 times, most recently from 0e21c3c to b8a81df on November 21, 2024 00:26
@apparentlymart
Contributor Author

apparentlymart commented Nov 21, 2024

This now has some basic real implementations of both the provider source and the "package location" type.

As things currently stand there are two quirks:

  1. I initially wrote it to expect GetImageMetadata to return an object address using the specific digest that the metadata was derived from and wrote the test with that in mind, but then when I tested it for real later I found that it actually just echoes back the address it was given, including a tag-based reference.

    I did it the way I did it because we typically want to return the most specific location possible so that when we describe the location in UI output it leaves little doubt about what exactly OpenTofu was fetching, and the digest of the image manifest seemed like the best way to achieve that within the OCI distribution model. However, we could perhaps relent if it's too difficult to return that information.

    (Unlike all of the other PackageLocation implementations, this one directly contains the digests of all of the layers we intend to retrieve, so I don't think there's a security benefit to locking in the digest of the image manifest here: we're going to retrieve the selected layers whatever happens. But mentioning the specific digest in our messaging means that the reader can't be confused if someone later changes the tag to point at a different image manifest.)

  2. If I temporarily bypass the check described above to see how the rest of it behaves, I get an interesting error during the actual installation step (in PackageOCIObject.InstallProviderPackage):

    failed to open GZIP stream for /tmp/sha256_29bcc7505719a7b2fd72d514d516258766f527ff13e84d345ecae6e1fc699765.tar.gz (gzip: invalid header)
    

    If I actually inspect that file I find that it contains JSON, rather than a gzipped tar archive:

    {
      "_type": "https://in-toto.io/Statement/v0.1",
      "predicateType": "https://slsa.dev/provenance/v0.2",
      "subject": [
        {
          "name": "pkg:docker/ghcr.io/apparentlymart/opentofu-provider-assume@0.1.0?platform=darwin%2Farm64",
          "digest": {
            "sha256": "1eade246136cc142a85dff3e1a2d06153e994c4acd3d979ac40d29861cf41a1c"
          }
        }
      ],
      "predicate": {
        "builder": {
          "id": ""
        },
        "buildType": "https://mobyproject.org/buildkit@v1",
        "materials": [
          {
            "uri": "pkg:docker/alpine@3.14?platform=linux%2Famd64",
            "digest": {
              "sha256": "0f2d5c38dd7a4f4f733e688e3a6733cb5ab1ac6e3cb4603a5dd564e5bfb80eed"
            }
          }
        ],
        "invocation": {
          "configSource": {
            "entryPoint": "Dockerfile"
          },
          "parameters": {
            "frontend": "dockerfile.v0",
            "locals": [
              {
                "name": "context"
              },
              {
                "name": "dockerfile"
              }
            ]
          },
          "environment": {
            "platform": "linux/amd64"
          }
        },
        "metadata": {
          "buildInvocationID": "ruyzqann80ztk29z08fcakiyh",
          "buildStartedOn": "2024-11-20T14:59:14.921334022-08:00",
          "buildFinishedOn": "2024-11-20T14:59:15.642217555-08:00",
          "completeness": {
            "parameters": true,
            "environment": true,
            "materials": false
          },
          "reproducible": false,
          "https://mobyproject.org/buildkit@v1#metadata": {}
        }
      }
    }

    I assume I must've built a differently-shaped package than the OCI Distribution client code is expecting, but I'm not familiar enough with the fine details of these protocols and manifest conventions to guess what I did wrong.

    In order to produce a multi-platform image I used docker buildx like this:

    docker buildx build \
        --platform linux/amd64,darwin/amd64,windows/amd64,windows/arm64,linux/arm64,darwin/arm64 \
        --tag ghcr.io/apparentlymart/opentofu-provider-assume:0.1.0 \
        .

    ...with the following Dockerfile:

    FROM --platform=$BUILDPLATFORM alpine:3.14 AS build
    ARG TARGETPLATFORM
    ARG BUILDPLATFORM
    ARG TARGETOS
    ARG TARGETARCH
    RUN wget -O package.zip https://github.com/apparentlymart/terraform-provider-assume/releases/download/v0.1.0/terraform-provider-assume_0.1.0_${TARGETOS}_${TARGETARCH}.zip
    # README.md excluded because in some of my packages it's apparently a broken symlink included accidentally :(
    RUN unzip package.zip -x README.md -d package
    FROM scratch
    LABEL org.opentofu.package-type="provider_binary"
    COPY --from=build /package/* /

    My OCI repository is currently public, although I will probably delete it and recreate it once we're done working here since I'm currently using it only for testing.

I've run out of time for today, but I'll probably continue poking at this tomorrow.


I'm currently testing using the following root module:

terraform {
  required_providers {
    assume = {
      source = "ghcr.io.opentofu-oci.example.com/apparentlymart/assume"
    }
  }
}

...and the following CLI Configuration to opt in to the ability to install a provider directly from an OCI repository without having to set up an oci_mirror (this opt-in is temporary just to avoid this half-finished and probably-vulnerable code getting exercised by accident):

provider_installation {
  direct {
    oci_registry_experiment = true
  }
}

@abstractionfactory
Contributor

@apparentlymart thanks, I'll try to debug why this is happening and I'll also add some content-type checks, maybe we need to ignore layers that are not tarballs.
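A content-type check like the one suggested here could treat only layers with a gzipped-tar media type as package content and skip everything else, such as a JSON attestation document. This is a minimal sketch: `layerDescriptor` is a hypothetical simplification of an OCI content descriptor, and the exact set of accepted media types is an assumption (the two listed are the standard OCI and Docker gzipped layer types).

```go
package main

import "fmt"

// layerDescriptor is a hypothetical, simplified stand-in for an entry in an
// image manifest's "layers" array.
type layerDescriptor struct {
	MediaType string
	Digest    string
	Size      int64
}

// tarballLayerMediaTypes lists the layer media types this sketch treats as
// gzipped tar archives: the OCI image-spec type and its Docker equivalent.
var tarballLayerMediaTypes = map[string]bool{
	"application/vnd.oci.image.layer.v1.tar+gzip":       true,
	"application/vnd.docker.image.rootfs.diff.tar.gzip": true,
}

// tarballLayers keeps only the layers that can be extracted as gzipped tar
// archives, preserving their original order.
func tarballLayers(layers []layerDescriptor) []layerDescriptor {
	var result []layerDescriptor
	for _, l := range layers {
		if tarballLayerMediaTypes[l.MediaType] {
			result = append(result, l)
		}
	}
	return result
}

func main() {
	layers := []layerDescriptor{
		{MediaType: "application/vnd.oci.image.layer.v1.tar+gzip", Digest: "sha256:aaaa"},
		{MediaType: "application/vnd.in-toto+json", Digest: "sha256:bbbb"},
	}
	for _, l := range tarballLayers(layers) {
		fmt.Println("would extract", l.Digest)
	}
}
```

Silently skipping unknown layers is one design choice; rejecting a manifest that contains any unexpected layer type would be a stricter alternative.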

@abstractionfactory
Contributor

@apparentlymart found the bug, libregistry had a foreach-variable-reference issue when selecting the multi-arch manifest. Try running this:

go get github.com/opentofu/libregistry@oci
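For context, a "foreach-variable-reference" bug in Go typically means taking the address of a range loop variable. Before Go 1.22 that variable was shared across iterations, so a saved pointer ends up referring to whatever the last iteration stored. The sketch below illustrates the general pattern only; it is not libregistry's code, and `platformManifest` and `selectManifest` are hypothetical names.

```go
package main

import "fmt"

// platformManifest is a hypothetical, simplified entry from an OCI image
// index (the multi-platform manifest that a tag points to).
type platformManifest struct {
	OS           string
	Architecture string
	Digest       string
}

// selectManifest returns a pointer to the index entry for the requested
// platform.
//
// The buggy shape looks like this:
//
//	var selected *platformManifest
//	for _, m := range manifests {
//		if m.OS == goos && m.Architecture == goarch {
//			selected = &m // before Go 1.22, every iteration reuses one m
//		}
//	}
//
// With pre-1.22 loop semantics, selected then points at the last element of
// the slice. Taking the address of manifests[i] avoids that entirely.
func selectManifest(manifests []platformManifest, goos, goarch string) (*platformManifest, error) {
	for i := range manifests {
		if manifests[i].OS == goos && manifests[i].Architecture == goarch {
			return &manifests[i], nil
		}
	}
	return nil, fmt.Errorf("no manifest for platform %s/%s", goos, goarch)
}

func main() {
	index := []platformManifest{
		{OS: "linux", Architecture: "amd64", Digest: "sha256:1111"},
		{OS: "darwin", Architecture: "arm64", Digest: "sha256:2222"},
	}
	m, err := selectManifest(index, "darwin", "arm64")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("selected", m.Digest)
}
```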

@apparentlymart
Contributor Author

With the updated libregistry, along with some other tweaks for things I'd missed in the install implementation, I was able to successfully install my "assume" provider from my OCI repository:

Initializing provider plugins...
- Finding latest version of ghcr.io.opentofu-oci.example.com/apparentlymart/assume...
- Installing ghcr.io.opentofu-oci.example.com/apparentlymart/assume v0.1.0...
- Installed ghcr.io.opentofu-oci.example.com/apparentlymart/assume v0.1.0 (unauthenticated)

OpenTofu has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that OpenTofu can guarantee to make the same selections by default when
you run "tofu init" in the future.

╷
│ Warning: Incomplete lock file information for providers
│ 
│ Due to your customized provider installation methods, OpenTofu was forced to calculate lock file checksums locally for the following providers:
│   - ghcr.io.opentofu-oci.example.com/apparentlymart/assume
│ 
│ The current .terraform.lock.hcl file only includes checksums for linux_amd64, so OpenTofu running on another platform will fail to install these providers.
│ 
│ To calculate additional checksums for another platform, run:
│   tofu providers lock -platform=linux_amd64
│ (where linux_amd64 is the platform to generate)
╵

OpenTofu has been successfully initialized!

You may now begin working with OpenTofu. Try running "tofu plan" to see
any changes that are required for your infrastructure. All OpenTofu commands
should now work.

If you ever set or change modules or backend configuration for OpenTofu,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Since we don't currently have any means to distribute the provider developer's signed checksums, this falls into the same warning as the other alternative installation methods: the dependency lock file ends up incomplete and needs to be explicitly completed using the tofu providers lock command.

Unfortunately, that command doesn't currently work here, because it intentionally bypasses the CLI-configuration-level provider installation settings (by default it assumes you want to pull the checksums from the origin registry). If we decide to move forward with this design, though, I don't expect it to be too difficult to update that command to use the new getproviders.DirectSource instead of getproviders.RegistrySource, at which point it'll start supporting the same OCI registry selection tricks that the oci_registry_experiment = true option currently enables as a temporary opt-in.

(We still have the thing about returning the digest-based reference instead of the tag-based reference from the metadata request, but we're going to deal with that part soon with some further updates to libregistry.)

This is a new provider installation method which is similar to
network_mirror but uses a set of conventions for fetching a provider from
an OCI distribution repository, instead of a wholly OpenTofu-specific
protocol.

This commit only introduces the CLI configuration block and the decoding
and validation of its contents, and only for experimental builds. If
someone tries to use this in an experimental build then it will return
an error saying that the new method isn't supported yet. Full support for
it will follow in later commits.

Signed-off-by: Martin Atkins <[email protected]>
This is a stub for a future provider installation method that will use a
user-provided template to translate a provider source address into an
OCI distribution repository address and then use the OCI distribution
registry protocol to obtain package metadata and, eventually, the packages
themselves.

Signed-off-by: Martin Atkins <[email protected]>
Previously we treated PackageLocation only as pure data, describing a
location from which something could be retrieved. Unexported functions in
package providercache then treated PackageLocation values as a closed
union dispatched using a type switch.

That strategy has been okay for our relatively-generic package locations
so far, but in a future commit we intend to add a new kind of package
location referring to a manifest in an OCI distribution repository, and
installing from _that_ will require a nontrivial amount of
OCI-distribution-specific logic that will be more tightly coupled to the
getproviders.Source that will return such locations, and so we're switching
to a model where _the location object itself_ is responsible for knowing
how to install a provider package from its location, as a method of
PackageLocation.

The implementation of installing from each location type now moves from
package providercache to package getproviders, which is arguably a better
thematic home for that functionality anyway.

For now these remain tested only indirectly through the tests in package
providercache, since we didn't previously have any independent tests for
these unexported functions. We might want to add more tightly-scoped unit
tests for these to package getproviders in future, but for now this is not
materially worse than it was before.

Signed-off-by: Martin Atkins <[email protected]>
This package location type will be used only by our new "OCI mirror"
provider installation method, to allow it to describe installation of
a container image with a specific digest from a specific repository on
a specific registry.

As with all of the "OCI mirror" functionality for now, this is starting
just as a stub with implementation to follow later once we have an OCI
distribution client implementation available in this codebase.

Signed-off-by: Martin Atkins <[email protected]>
We'll be mapping from provider source address to OCI distribution
repository address using a user-provided template, so we need to make sure
that the result is actually a valid address after evaluating it.

One potential cause of an invalid result is from trying to install a
provider whose namespace or type name contains non-ASCII characters, since
OpenTofu allows any character that would be valid in an "internationalized
domain name" while OCI distribution repository names allow only lowercase
ASCII letters, digits, and a few separator characters.
Therefore we have a special error message for that case, but also a more
general error message for situations where the literal parts of the
template are somehow wrong.

It's not ideal to only be able to map a subset of valid provider addresses
onto OCI repository addresses, but since non-ASCII characters in provider
source addresses seem pretty rare in practice we'll accept this for now
and see if we get feedback about it. If someone does need to use such a
provider from an OCI mirror in the meantime, a workaround would be to
use an oci_mirror installation method with the exact provider address in
the "include" argument, and then use a totally-literal template that maps
to an address that only includes ASCII characters.

Signed-off-by: Martin Atkins <[email protected]>
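The post-template validation described above might look something like this sketch. The regular expression is the repository name grammar from the OCI distribution specification; the function name and error wording are hypothetical, and the non-ASCII check runs first so that case gets its own message.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// repositoryNamePattern is the repository name grammar from the OCI
// distribution specification: lowercase alphanumeric path components
// separated by "/", with ".", "_", "__", or runs of "-" inside a component.
var repositoryNamePattern = regexp.MustCompile(
	`^[a-z0-9]+((\.|_|__|-+)[a-z0-9]+)*(/[a-z0-9]+((\.|_|__|-+)[a-z0-9]+)*)*$`,
)

// validateRepositoryName checks the result of evaluating an address
// template, distinguishing non-ASCII input from other template problems.
func validateRepositoryName(name string) error {
	for _, r := range name {
		if r >= utf8.RuneSelf {
			return fmt.Errorf("repository name %q contains non-ASCII characters, which OCI distribution does not allow", name)
		}
	}
	if !repositoryNamePattern.MatchString(name) {
		return fmt.Errorf("template produced %q, which is not a valid OCI repository name", name)
	}
	return nil
}

func main() {
	for _, name := range []string{"apparentlymart/opentofu-provider-assume", "example/ässume"} {
		fmt.Println(name, "->", validateRepositoryName(name))
	}
}
```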
This is a light implementation of the "Level 1" profile of URI Templates as
described in RFC 6570.

For automatic installation of providers from OCI registries we need to be
able to generate an OCI distribution repository address based on a provider
source address, since different OCI distribution registry implementations
have different restrictions on which name shapes they will allow.

For the "oci_mirror" provider installation method we used HCL templates
for this, which made sense in that context because installation methods
are specified in the HCL-based CLI configuration.

However, for automatic installation using an OCI registry as the provider
registry for a particular hostname the mapping from provider source string
to OCI repository address will live in the host's service discovery
document, which is retrieved over the network and so if we used HCL there
then we'd be exposing the HCL parser to arbitrary input retrieved over
the network.

We ought to use something less heavy and more standardized for a wire
protocol, and RFC 6570's URI template syntax provides a plausible
compromise of something that is standardized outside of OpenTofu and is
relatively minimal in what it provides, particularly at level 1.

Technically an OCI distribution repository address (at least, the way we're
defining them) is _not_ actually a URI, but it's similar enough in shape
to a URI that we can use this spec as a starting point, and also set us
up to use the URI template syntax for similar needs in the service
discovery protocol in future, beyond the current OCI-specific goals.

Signed-off-by: Martin Atkins <[email protected]>
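Level 1 of RFC 6570 only has simple `{name}` expressions: each is replaced by the variable's value with every byte outside the RFC 3986 "unreserved" set percent-encoded, and undefined variables expand to the empty string. A minimal sketch of that expansion follows; the function names and template variable names are hypothetical, and this sketch does not validate the literal text between expressions.

```go
package main

import (
	"fmt"
	"strings"
)

// isUnreserved reports whether b is in RFC 3986's "unreserved" set, which
// simple string expansion leaves unencoded.
func isUnreserved(b byte) bool {
	return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') ||
		(b >= '0' && b <= '9') || b == '-' || b == '.' || b == '_' || b == '~'
}

// pctEncode percent-encodes every byte of the UTF-8 value that is outside
// the unreserved set, using uppercase hex digits.
func pctEncode(value string) string {
	var sb strings.Builder
	for i := 0; i < len(value); i++ {
		b := value[i]
		if isUnreserved(b) {
			sb.WriteByte(b)
		} else {
			fmt.Fprintf(&sb, "%%%02X", b)
		}
	}
	return sb.String()
}

// isVarname accepts ASCII letters, digits, "_" and "."; it does not accept
// pct-encoded variable names or level 2+ operators such as "+" and "#".
func isVarname(name string) bool {
	if name == "" {
		return false
	}
	for i := 0; i < len(name); i++ {
		b := name[i]
		if !((b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') || (b >= '0' && b <= '9') || b == '_' || b == '.') {
			return false
		}
	}
	return true
}

// expandLevel1 expands each {name} in template using vars.
func expandLevel1(template string, vars map[string]string) (string, error) {
	var sb strings.Builder
	for template != "" {
		open := strings.IndexByte(template, '{')
		if open < 0 {
			sb.WriteString(template)
			break
		}
		sb.WriteString(template[:open])
		rest := template[open+1:]
		end := strings.IndexByte(rest, '}')
		if end < 0 {
			return "", fmt.Errorf("unterminated expression in template")
		}
		name := rest[:end]
		if !isVarname(name) {
			return "", fmt.Errorf("unsupported expression {%s}: only level 1 {name} expressions are allowed", name)
		}
		sb.WriteString(pctEncode(vars[name]))
		template = rest[end+1:]
	}
	return sb.String(), nil
}

func main() {
	out, err := expandLevel1("{namespace}/opentofu-provider-{type}",
		map[string]string{"namespace": "apparentlymart", "type": "assume"})
	fmt.Println(out, err)
}
```

A side effect relevant to the previous commit: a non-ASCII value expands to `%XX` sequences, and `%` is not valid in an OCI repository name, so such results would still be caught by validation after expansion.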

In future commits this experiment will enable support for an alternative
provider registry protocol defined in terms of the OCI distribution
specification, instead of OpenTofu's own provider registry protocol.

This commit only includes some temporary syntax for opting in to the
experiment, which for now will just cause an immediate error if enabled.
The actual implementation of this new setting is still to come.

Signed-off-by: Martin Atkins <[email protected]>
This turned out to make more sense architecturally to live in the svchost
module where the service discovery protocol is implemented, since a
service discovery document is where the URI templates would appear.

Signed-off-by: Martin Atkins <[email protected]>
In experimental builds this allows using the CLI configuration to opt in
to a different treatment of the "direct" installation method where
we support both "providers.v1" and "oci-providers.v1" services and, for
the latter, treat the service discovery string as a template resolving to
an OCI repository address whose images are to be used as provider packages.

For now this uses a forked version of the upstream "svchost" module, with
the URI template and OCI registry address support added. If we decide to
move forward with this then we'll need to decide whether to make this fork
permanent (presumably moving it into the opentofu organization) or to
find some other approach to handle it outside of the svchost module.

Signed-off-by: Martin Atkins <[email protected]>
Previously we were using a string consisting of a hostname and a name
concatenated with a slash, but the OCI distribution spec doesn't define
any such address syntax -- that's something we've essentially invented
for OpenTofu, albeit with strong inspiration from Docker -- so we'll
use a struct type to represent these addresses internally and then deal
with parsing the slash-concatenated form at the same time as we're doing
all of the other parsing, for consistency.

This means that it's now the responsibility of whatever is dealing with
the address template to also deal with the splitting of the hostname and
the repository name.

Signed-off-by: Martin Atkins <[email protected]>
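Representing the address as a struct and splitting the Docker-inspired "hostname/name" form in one place could look like this sketch. The type and function names are hypothetical; everything after the first slash is treated as the repository name, and unlike Docker this sketch does not infer a default registry when the hostname is missing.

```go
package main

import (
	"fmt"
	"strings"
)

// ociRepositoryAddress is a hypothetical struct representation of an OCI
// distribution repository address, keeping the registry hostname and the
// repository name separate instead of as one "host/name" string.
type ociRepositoryAddress struct {
	Hostname string
	Name     string
}

func (a ociRepositoryAddress) String() string {
	return a.Hostname + "/" + a.Name
}

// parseOCIRepositoryAddress splits s at the first slash. Any further
// slashes belong to the repository name.
func parseOCIRepositoryAddress(s string) (ociRepositoryAddress, error) {
	host, name, ok := strings.Cut(s, "/")
	if !ok || host == "" || name == "" {
		return ociRepositoryAddress{}, fmt.Errorf("invalid OCI repository address %q: must be hostname/name", s)
	}
	return ociRepositoryAddress{Hostname: host, Name: name}, nil
}

func main() {
	addr, err := parseOCIRepositoryAddress("ghcr.io/apparentlymart/opentofu-provider-assume")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("registry=%s repository=%s\n", addr.Hostname, addr.Name)
}
```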
In addition to the OCI mirror and the ability to use service discovery for
a fully-custom mapping from OpenTofu provider address to OCI repository
address on a particular hostname, this introduces a special hostname suffix
that can be used after any existing OCI registry hostname to achieve an
opinionated default mapping to that registry's namespace of OCI
repositories.

We don't actually have a real domain to use for this yet, so for now we're
using opentofu-oci.example.com as a placeholder. If we decide to move
forward with this approach then we'd switch this to using a domain we
actually own.

Signed-off-by: Martin Atkins <[email protected]>
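As an illustration of the suffix idea, the sketch below strips the placeholder suffix to recover the OCI registry hostname. The repository naming convention it uses, "NAMESPACE/opentofu-provider-TYPE", is an assumption inferred from the test repository used elsewhere in this thread (`ghcr.io/apparentlymart/opentofu-provider-assume` for `ghcr.io.opentofu-oci.example.com/apparentlymart/assume`), not taken from the implementation; the function name is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// ociHostnameSuffix is the placeholder suffix; a real implementation would
// use a domain the project actually owns.
const ociHostnameSuffix = ".opentofu-oci.example.com"

// defaultOCIRepository maps a provider source address whose hostname ends in
// the special suffix onto an OCI registry hostname and repository name.
// ASSUMPTION: the "opentofu-provider-<type>" naming is inferred from the
// example in this thread.
func defaultOCIRepository(providerHost, namespace, typeName string) (string, string, error) {
	registry, ok := strings.CutSuffix(providerHost, ociHostnameSuffix)
	if !ok || registry == "" {
		return "", "", fmt.Errorf("hostname %q does not use the %s suffix", providerHost, ociHostnameSuffix)
	}
	return registry, namespace + "/opentofu-provider-" + typeName, nil
}

func main() {
	reg, repo, err := defaultOCIRepository("ghcr.io.opentofu-oci.example.com", "apparentlymart", "assume")
	fmt.Println(reg, repo, err)
}
```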
apparentlymart force-pushed the f-providersource-oci-mirror branch 2 times, most recently from a38298f to d61e0f2 on November 21, 2024 19:27
@apparentlymart
Contributor Author

apparentlymart commented Nov 21, 2024

We actually got the tag vs. digest thing sorted today after all, so that's now in and this is basically working for a first installation from nothing.

The verification of packages against checksums already recorded in the dependency lock file on subsequent installation doesn't seem to be working quite right yet: it's reporting a checksum failure even though the checksum actually matches. I'll figure out what's causing that next.

Edit: I fixed the checksum verification problem, so this now supports reinstallation of a provider version previously recorded in the lock file. With that done, I think this is now minimally functional, and so I'm going to start researching what our options might be for actually authenticating the packages (to achieve something similar to the GPG-based authentication OpenTofu does for providers installed using its own registry protocol, but in a more container-ecosystem-idiomatic way).

This is an initial implementation of OCIMirrorSource in terms of
libregistry's OCI client code. It has the minimum required to select a
suitable image manifest from a multi-platform manifest and return it as
a PackageOCIObject location.

It does not yet support package authentication at all, and it also
currently fails because libregistry's client is returning a tag-based
image manifest address, rather than the digest for the specific image it
returned. We'll deal with both of those concerns in future commits.

Co-authored-by: AbstractionFactory <[email protected]>
Signed-off-by: Martin Atkins <[email protected]>
This now uses the included OCI distribution client to pull the object
by iterating over all of the directory entries in all of the layers that
were included in the manifest.

This is a minimal initial implementation for experimenting with, but it
has numerous TODOs and FIXMEs to attend to before we could use this in
a non-experimental capacity.

Co-authored-by: AbstractionFactory <[email protected]>
Signed-off-by: Martin Atkins <[email protected]>
@apparentlymart
Contributor Author

apparentlymart commented Nov 22, 2024

Some early notes on package authentication. Nothing below is a decision; I'm just writing down what I've learned so far for later reference.

The prevailing conventions for container authentication are those implemented by Cosign. Here's what I've understood so far about the shape of that:

  • A container image is signed by signing its top-level manifest, which for our purposes is the manifest that the selected tag directly refers to, and is the multi-platform wrapping manifest that in turn points to a separate manifest for each supported platform. The top-level manifest's digest is derived from the digests of all of the platform-specific manifests, which are in turn derived from the digests of the layer blobs, so this effectively indirectly signs the entire artifact.

  • An image signature is conventionally represented by a tag in the same repository whose name is the digest of the top-level manifest (with the colon replaced by a dash) followed by the suffix ".sig". In the repository I've been using for testing, an example is sha256-e53413b272c8a766ef39960a49af0eee2341fa9e520013fc229bab007412291a.sig.

  • The main thing that OpenTofu achieves with provider package signatures is using the checksum of one downloaded package to indirectly trust other packages that were signed along with it: if OpenTofu is installing a package for linux_amd64 and that package matches one of the checksums that the provider developer has signed then OpenTofu assumes that all of the other signed checksums (for other platform packages) are similarly trustable, and will record them in the dependency lock file. (The assumption here is that the operator will somehow verify the specific package they've just installed, and will check the dependency lock file into version control only if they conclude that the new package was trustworthy. In the case where the checksums were all signed by the provider developer, we assume that deciding to trust the package you installed implies also deciding to trust the packages for other platforms signed along with it.)

    Without this extra assurance OpenTofu conservatively assumes that it must only record the checksum it calculated itself, which is what leads to the "Warning: Incomplete lock file information for providers" warning and typically to the need to explicitly run tofu providers lock to produce a lock file that will work for coworkers who run OpenTofu on different platforms.

    I think the most direct analog to that design within the Cosign conventions would be to introduce a new OpenTofu checksum scheme that directly captures the digest of a platform-specific image manifest, and then for a signed image we'd be able to capture all of the platform-specific image manifests into the dependency lock file, but for an unsigned image we'd only capture the digest of the one we actually installed.

    (In both cases we'd also capture the installation-method-agnostic h1: checksum for the package that was installed, for similar reasons to why OpenTofu typically saves both zh: and h1: checksums for providers installed via its own registry protocol: a container image digest can only be used to verify installation from a container registry, and cannot be used to verify a local cached copy of the provider or a package distributed via a filesystem or network mirror.)

  • A secondary thing that OpenTofu does with provider package signatures today is to mention the identity of the signer in the tofu init output, currently always presented as an ASCII-armored GPG key ID string.

    Signer identity is considerably more complicated for cosign, though for well-argued reasons. The most typical way to use cosign is to ask an identity provider to assert that the requester controls an identity owned by that provider, such as a GitHub username.

    (Other usage patterns are possible, such as signing using locally-issued keys, but I'm going to focus on the identity-provider-based model for now since that seems the most appropriate for public distribution of packages. In future we might wish to also allow operators to configure OpenTofu to trust other authorities when interacting with specific OCI registries/repositories for internal use only, but I'll dig into that some more another time.)

    Being able to present the signer as a human-legible identifier such as a GitHub username seems clearly superior to just returning an opaque GPG key ID, so I think this difference is a benefit rather than a problem. However, OpenTofu's internal model for "package authentication" is currently not flexible enough to accommodate other kinds of identity, and so we will likely need to rework that API to support other identity types. That API is all internal though, so there's no significant blocker to changing it once we've developed a deeper understanding of how we ought to generalize it.
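The Cosign tag naming convention described in the notes above is mechanical enough to sketch directly: replace the colon in the manifest digest with a dash and append ".sig". The function name is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// cosignSignatureTag derives the conventional Cosign signature tag for a
// top-level manifest digest such as "sha256:<hex>".
func cosignSignatureTag(manifestDigest string) (string, error) {
	alg, encoded, ok := strings.Cut(manifestDigest, ":")
	if !ok || alg == "" || encoded == "" {
		return "", fmt.Errorf("invalid digest %q", manifestDigest)
	}
	return alg + "-" + encoded + ".sig", nil
}

func main() {
	tag, err := cosignSignatureTag("sha256:e53413b272c8a766ef39960a49af0eee2341fa9e520013fc229bab007412291a")
	fmt.Println(tag, err)
}
```

With the digest from the test repository above, this yields the `sha256-e534….sig` tag mentioned in the notes.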

As a starting point I'm going to experiment with this new checksum type for capturing image manifest digests directly into the dependency lock file, purely as an additional verification method. I need to study Cosign's details more before getting stuck into the actual signature verification part, and I expect we'll need to grow the libregistry OCI client API a little more to expose the additional operations needed to find and fetch the signature and other cosign metadata for a multi-platform image manifest.
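The lock-file policy sketched in the notes (record every platform's manifest digest when the image is signed, only the installed one otherwise, plus the location-agnostic h1: hash in both cases) could be expressed like this. The encoding of the new hash as "ch:" followed by the OCI digest string is an assumption for illustration, and `lockHashes` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

// lockHashes returns the checksums to record in the dependency lock file
// after installing installedPlatform from an OCI repository.
// ASSUMPTION: a "ch:" hash is written as "ch:" plus the OCI digest string.
func lockHashes(signed bool, installedPlatform string, platformDigests map[string]string, localH1 string) ([]string, error) {
	installed, ok := platformDigests[installedPlatform]
	if !ok {
		return nil, fmt.Errorf("no manifest for platform %s", installedPlatform)
	}
	// The h1: hash is always recorded so that cached or mirrored copies of
	// the same package can still be verified later.
	hashes := []string{localH1}
	if signed {
		// A valid signature over the top-level manifest vouches for every
		// platform-specific manifest it covers.
		for _, d := range platformDigests {
			hashes = append(hashes, "ch:"+d)
		}
	} else {
		// Without a signature, only trust what was actually installed.
		hashes = append(hashes, "ch:"+installed)
	}
	sort.Strings(hashes) // map iteration order is random; keep output stable
	return hashes, nil
}

func main() {
	digests := map[string]string{"linux_amd64": "sha256:aaaa", "darwin_arm64": "sha256:bbbb"}
	hashes, err := lockHashes(true, "linux_amd64", digests, "h1:placeholder")
	fmt.Println(hashes, err)
}
```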

@abstractionfactory
Contributor

Cross-linking the cosign issue: #307

apparentlymart force-pushed the f-providersource-oci-mirror branch 3 times, most recently from fd0d2c0 to 84e0151 on November 22, 2024 23:28
@apparentlymart
Contributor Author

apparentlymart commented Nov 23, 2024

I encountered an interesting new design challenge while working through implementing the new hash scheme today:

Previously we had one hash scheme that supported everything (the h1: scheme) and then a weird extra legacy hash scheme that only works for not-yet-unpacked zip files (the zh: scheme). This means that we could compensate for some locations not supporting zh: by falling back on h1:, at the expense of some gaps when installing from unusual locations like an unpacked filesystem mirror where you tend to need to use tofu providers lock to force the calculation of extra h1: hashes to go along with the zh: ones.

The new scheme I've added today, which for now is called ch: for "container hash", is similar to zh: in that it only works for one location type: OCI repository manifests. However, it has a new problem: we cannot calculate a h1: hash from an OCI repository manifest location, and we can't calculate a ch: hash from anything except an OCI repository manifest location, and so there's no overlapping hash scheme we can use as a bridge for "upgrading" from location-type-specific to general hashing.

This has a few different interesting implications, but the most troublesome one for right now is that the current package authentication/verification model doesn't differentiate between pre-installation and post-installation authentication/verification: it expects to be able to perform all checks before installation (e.g. based on the .zip file we've not extracted yet) and then a subset of checks after installation (just h1: against the local directory) to make sure that what was installed still matches what we'd intended to install.

Therefore I expect to need to make some tweaks to the authentication model so that we can more cleanly separate the pre-install and post-install checks. For OCI manifest locations in particular we can only check ch: checksums before installation, and we can only check h1: checksums after installation, and the h1: checksums become relevant only when reinstalling a package we've seen before whose checksums have already been recorded in the dependency lock file by a previous tofu init run.

It's getting near the end of my work week now so I don't really have time to get into solving that properly today, but I'll think about it some more next week.

An OCI registry talks about the content of objects using its own special
digest scheme, and we can't derive a "h1:" checksum directly from those,
so instead we'll follow a similar approach as with the legacy "zh:" scheme
we use to work around some comparable limitations of the OpenTofu provider
registry protocol where it only knows how to talk about whole-zip-file
checksums, and not checksums of their contents.

As of this commit nothing is actually using this scheme, but
OCIMirrorSource and PackageOCIObject will make more use of it in future
commits to allow us to pre-populate hashes for all platforms whenever a
container image is validly signed.

Signed-off-by: Martin Atkins <[email protected]>
Our PackageAuthentication model was designed for a world where at least
some authentication tasks need to wait until after we've fetched a package
from a remote location, but for OCIMirrorSource we have the advantage that
the repository is content-addressable and so we can check ahead of time
whether the platform-specific manifest is signed and then we only need to
make sure during installation that the manifest and layers actually match
their digests.

Therefore this introduces NewPrecheckedAuthentication to allow for this
special situation where all of the authentication happens inside a
Source.PackageMeta call, rather than some being delayed until the package
has been fetched. This PackageAuthentication implementation just returns
exactly what it was given at instantiation, and so
OCIMirrorSource.PackageMeta can for now report just that the checksum was
verified because fetching the package implicitly verifies its checksum.

In future commits we'll also integrate Cosign signature checks into
OCIMirrorSource.PackageMeta, and if that succeeds we'll be able to report
that the package is to be treated as signed as long as its checksum is
valid during installation.

Signed-off-by: Martin Atkins <[email protected]>
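A minimal sketch of the "prechecked" idea, using simplified stand-in types rather than the real getproviders `PackageAuthentication` interface (whose method set and result type differ). The names `precheckedAuthentication`, `authResult`, and `acceptableHashes` here are hypothetical. The point is only that the implementation does no work at install time and echoes back whatever `PackageMeta` decided.

```go
package main

import "fmt"

// HYPOTHETICAL stand-ins for getproviders types; the real interface and
// result type have different shapes.

type authResult string

const (
	verifiedChecksum authResult = "verified checksum"
	// signedResult would be used once PackageMeta performs signature checks.
	signedResult authResult = "signed"
)

type packageAuthentication interface {
	// authenticate is called after the package has been installed.
	authenticate(installedDir string) (authResult, error)
	// acceptableHashes reports hashes to record in the dependency lock file.
	acceptableHashes() []string
}

// precheckedAuthentication does no work of its own: all checks already
// happened in PackageMeta, so it reports exactly what it was given.
type precheckedAuthentication struct {
	result authResult
	hashes []string
}

func newPrecheckedAuthentication(result authResult, hashes []string) packageAuthentication {
	return precheckedAuthentication{result: result, hashes: hashes}
}

func (p precheckedAuthentication) authenticate(string) (authResult, error) {
	return p.result, nil
}

func (p precheckedAuthentication) acceptableHashes() []string {
	return p.hashes
}

func main() {
	// For now the only result reported is "verified checksum", because
	// fetching by digest implicitly verifies the content.
	auth := newPrecheckedAuthentication(verifiedChecksum, []string{"example-hash"})
	result, err := auth.authenticate("/path/to/installed/package")
	fmt.Println(result, err, auth.acceptableHashes())
}
```

Because the installer only ever calls `authenticate` after a successful fetch, returning a fixed result is safe as long as the fetch itself refuses content that does not match its digest.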
@apparentlymart
Contributor Author

apparentlymart commented Nov 26, 2024

After some further consideration today I realized that the main thing that's special about installing from an OCI repository is that the repository is content-addressable and so installing from the "location" returned by OCIMirrorSource.PackageMeta also implicitly verifies that the digests of the manifest and any layers are correct, so it's actually okay that we cannot verify the ch:-schemed checksum against the local directory after installation.

Therefore I've added a new special getproviders.PackageAuthentication implementation to handle situations like this where the Source.PackageMeta implementation is able to pre-authenticate the package whose location it's returning, and so we just need to get a fixed authentication result and acceptable hashes into the PackageMeta object to use once the installation step has succeeded.

Since we don't yet have enough libregistry functionality to implement the Cosign signature checks, the implementation currently just always reports "checksum verified" as the authentication result, as a temporary placeholder. In future commits we can make OCIMirrorSource.PackageMeta perform the Cosign signature check and indicate in its result that the package is signed, which would then allow it to be treated just like a signed package from the main OpenTofu provider registry protocol, including recording a ch: hash in the dependency lock file for each platform-specific manifest covered by the signature.

Tomorrow I'm going to investigate exactly what additional information we'd need to determine whether a particular manifest has a signature and, if so, to fetch the information required to verify that signature.
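One piece of that information is where a signature would live. Assuming Cosign's tag-based storage convention (the one `cosign triangulate` reports), the signature for a manifest is stored in the same repository under a tag derived from the manifest digest, so a client can probe for that tag and treat its absence as "unsigned". This is a sketch under that assumption only: newer Cosign releases may also use the OCI referrers mechanism, and `cosignSignatureTag` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// cosignSignatureTag derives the tag under which Cosign's tag-based
// convention stores a signature for a manifest digest, for example
// "sha256:abcd..." becomes "sha256-abcd....sig".
// ASSUMPTION: only the tag-based layout is handled; referrers are not.
func cosignSignatureTag(manifestDigest string) (string, error) {
	algo, hexPart, ok := strings.Cut(manifestDigest, ":")
	if !ok || algo == "" || hexPart == "" {
		return "", fmt.Errorf("malformed digest %q", manifestDigest)
	}
	return algo + "-" + hexPart + ".sig", nil
}

func main() {
	tag, err := cosignSignatureTag("sha256:0123abcd")
	fmt.Println(tag, err) // sha256-0123abcd.sig <nil>
}
```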


At RFC time we'll also need to ponder whether the environment variable OPENTOFU_ENFORCE_GPG_VALIDATION should also cause OCIMirrorSource.PackageMeta to return an error if it encounters an unsigned manifest.

That environment variable was introduced to make OpenTofu fail if any package from the main OpenTofu registry lacks a valid signature, and so I expect that anyone who has set it would prefer similar treatment for installation from OCI repositories too, but the environment variable name is a bit of a misnomer for that case since we won't actually be using GPG for OCI object signing.

For the sake of this experimental implementation I'm going to ignore that question and just make it return signingSkipped when there doesn't seem to be any signing metadata available for a particular package, since that's a plausible place to start. Despite that, it won't be effective for an attacker to delete the signature metadata for a previously-signed manifest, because we'll use the manifest digests from the dependency lock file to ensure that the manifests all still match what we had previously verified as signed.
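The reasoning in the last paragraph can be sketched as a small decision function: the lock-file comparison runs first and is independent of whether signature metadata was found, so removing a signature cannot smuggle in a different manifest. The names below (`checkManifest`, `signingStatus`, and a local `signingSkipped` constant) are hypothetical stand-ins rather than the real getproviders identifiers, and the sketch uses the `slices` package from Go 1.21.

```go
package main

import (
	"fmt"
	"slices"
)

// HYPOTHETICAL stand-ins for the real authentication result values.
type signingStatus string

const (
	signingVerified signingStatus = "signed"
	signingSkipped  signingStatus = "signing skipped"
)

// checkManifest decides how to treat a platform-specific manifest given its
// "ch:" hash, whether a signature was already verified for it, and any "ch:"
// hashes already recorded in the dependency lock file.
func checkManifest(chHash string, sigVerified bool, locked []string) (signingStatus, error) {
	// If the lock file has entries, the manifest must match one of them,
	// regardless of what signature metadata is (or isn't) present.
	if len(locked) > 0 && !slices.Contains(locked, chHash) {
		return "", fmt.Errorf("manifest %s does not match the dependency lock file", chHash)
	}
	if sigVerified {
		return signingVerified, nil
	}
	// No signing metadata: accepted, but reported as unsigned.
	return signingSkipped, nil
}

func main() {
	locked := []string{"ch:example-manifest"}
	// Signature metadata deleted, manifest unchanged: still accepted.
	fmt.Println(checkManifest("ch:example-manifest", false, locked))
	// A substituted manifest is rejected whatever its signature status.
	fmt.Println(checkManifest("ch:other-manifest", false, locked))
}
```

Honoring `OPENTOFU_ENFORCE_GPG_VALIDATION` would amount to turning the final `signingSkipped` branch into an error, which is the open question deferred to the RFC.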

@abstractionfactory
Contributor

abstractionfactory commented Nov 26, 2024

@apparentlymart there is an older signature verification mechanism that isn't as complex as Cosign (which won't be available in libregistry until the outstanding stability and support issues in its Go library are resolved). I never managed to get it to work on previous projects (mainly due to a lack of tooling at the time), but Docker Content Trust may be worth looking at.

@apparentlymart
Contributor Author

My intention with Cosign for the moment was just to understand how their verification method works as a protocol, rather than as a specific Go library implementation, since we might be able to reimplement just that part either directly in OpenTofu or in libregistry, without depending on the reference implementation at all.

Of course, whether that is viable will depend on how complicated the protocol is and how many different variations it supports. If a separate verification-only implementation turned out to be too complex or too hard to maintain over time, I'd consider alternatives instead.
