Add an ability to install providers directly from GitHub #2109

Closed
kvendingoldo opened this issue Oct 28, 2024 · 21 comments
Labels
enhancement (New feature or request), pending-decision (This issue has not been accepted for implementation nor rejected. It's still open to discussion.)

Comments

@kvendingoldo

kvendingoldo commented Oct 28, 2024

The problem in your OpenTofu project

The current provider registry model, inherited from Terraform and designed by HashiCorp, centralizes control over provider distribution. While this model was effective initially, the registry has now become a potential point of restriction for some users due to political, security, and accessibility concerns.

This dependency limits certain users from accessing or using providers freely, as they may not be able to—or may not wish to—access the official registry due to restrictions, governance, or compliance issues.

Proposal

Make OpenTofu decentralized.

Introduce functionality to install providers directly from GitHub repositories, bypassing the centralized registry. This would enable users to specify a GitHub repository as a provider source, allowing for more flexible and decentralized provider management. The solution should include the following capabilities:

  • Direct GitHub installation: Allow users to specify a GitHub repository (in URL or user/repo format) to install providers without relying on the central registry.
  • Private repository access: Include support for private repositories via GitHub token-based authentication.
  • Enhanced error handling: Provide clear error messages for invalid repositories, network issues, or permission problems.
  • User documentation: Create documentation with usage examples and authentication guidance.
  • Provider documentation: Create documentation outlining the steps that provider authors must take for OpenTofu to be able to use their providers.

The GitHub repository should follow a specific pattern, which we can describe in this ticket or in the documentation. This includes:

  • Releases format (binaries name, hash sums name, etc)
  • Documentation format

Example of usage:

jq = {
  source = "github.com/dotemacs/jq"
  version = "~> 1.0"
}

After the implementation of this feature, the official registry will be just one additional source from which providers and modules can be taken. Essentially, each GitHub provider repository would have to support the provider by itself (binaries, hash sums, documentation, examples, etc.). As a good starting point, we can create an example provider with CI/CD automation.

Why this feature is mandatory

This feature is crucial for equitable access to provider resources and is aligned with OpenTofu’s commitment to openness and flexibility. Enforcing a registry-only installation method imposes unnecessary restrictions, particularly when modules already enjoy GitHub-based installations. By enabling GitHub installations for providers, we remove centralized control over provider access, allowing OpenTofu to better serve users across varied political and security landscapes.

Open questions

  1. Would it be better to allow specifying a URL to the binary plus a URL to the hash sums instead of GitHub? For example, GitHub can also be blocked in some places.
  2. Terraform supports bundles. I think this question should be moved to another issue, but it's a really important mechanism for production installations that also solves the problem of inaccessible or blocked resources such as the registry, GitHub, etc. An OpenTofu bundle should probably support downloading from multiple sources, such as GitHub, the registry, or a private registry. I also think the bundle could be moved to a separate project.

Misc

Terraform now supports provider rewrites, allowing you to download the binary and configure a rewrite—essentially just syntactic sugar to simplify these manual steps.

References

kvendingoldo added the enhancement and pending-decision labels Oct 28, 2024
@abstractionfactory
Contributor

Hello and thank you for this issue! The core team regularly reviews new issues and discusses them, but this can take a little time. Please bear with us while we get to your issue. If you're interested, the contribution guide has a section about the decision-making process.

In the meantime, if anyone wants to express their support for this feature, please upvote this issue.

@apparentlymart
Contributor

apparentlymart commented Oct 28, 2024

None of the following comment is an official statement on behalf of the OpenTofu core team, and regardless of that I don't intend the following as an argument for or against this proposal and am intending only to share some context around it.


OpenTofu is already "decentralized" in the sense that it can install providers from any service that implements the provider registry protocol. The OpenTofu project runs a significant implementation of that protocol, but as far as the OpenTofu client software is concerned it speaks the same protocol as any other provider registry would.

In principle, GitHub already could implement this protocol to expose an OpenTofu-compatible provider registry on its own domain. This sort of thing was already done by some other similar products, such as gitlab.com offering a module registry (though not currently a provider registry, at the time of writing) which OpenTofu's module installer can interact with.

Viewed through the lens of what is already available, this proposal takes a slightly different shape: it asks the OpenTofu project to unilaterally add a special case for the github.com domain which causes it to be treated differently (using a custom implementation rather than the existing registry protocol) regardless of any OpenTofu service discovery document that GitHub might choose to publish in future, thereby preventing GitHub from offering OpenTofu provider registry services themselves if they choose to do so.

Of course we can only speculate about how likely it is for that to happen! It might be justified to do this in a similar way to how OpenTofu's module installer already has a special case for addresses starting with github.com/, but I'd note that the similar historical special case for gitlab.com already interferes with GitLab's attempt to run a module registry by causing OpenTofu's handling of gitlab.com/ addresses to be ambiguous. (Those intending to use gitlab.com's registry must get the syntax just right to ensure that it gets handled as a normal module registry address rather than as the gitlab.com special case, and will get confusing errors if they make a mistake.)

Therefore I think if we were to proceed with this we should at least consult with GitHub first to try to determine whether they are comfortable with us effectively taking control of provider installation on their domain. Perhaps they'd prefer to run such a service themselves (now or in the future) or perhaps they'd prefer us to use a different domain for the special case to make it clearer that the behavior is controlled by us rather than by GitHub themselves, and so any failures are to be reported to this project rather than to GitHub.


I think it's also worth noting that the existing registry.opentofu.org is effectively already an interface to GitHub.com run through a different hostname that is controlled by the OpenTofu project. It's unfortunate that as long as the OpenTofu project runs that adapter layer as an extra network layer on top of GitHub's API, rather than it being provided directly by GitHub somehow, anyone wanting to publish providers needs to navigate the legal constraints and policies of two US-incorporated entities (the OpenTofu project and GitHub) rather than just a single one (GitHub alone).

In that sense I suppose this proposal effectively calls for moving the existing registry functionality to be implemented client-side rather than server-side, so that all of the rules for extracting relevant package metadata from GitHub's API would be embedded in the OpenTofu client rather than in the existing registry's metadata generator and so only GitHub's own policies would control who can publish providers installable by that mechanism.

In particular that would mean that it would become more difficult to evolve that logic in future as GitHub's API changes or as OpenTofu's requirements for provider metadata evolve. It is debatable how likely it is that we will need to make such changes, but nonetheless that's a significant consideration of having this logic for integrating with a third-party system be embedded in the client rather than in a remote network service.


With all of that said... thanks for starting this discussion. It's definitely an interesting bag of tradeoffs (both technical and non-technical) to weigh and consider.

In the meantime, I'd like to remind all readers that OpenTofu is already designed to be decentralized and support any number of competing provider and module registry services. While I certainly wouldn't try to argue that running your own registry is easy enough to compete with special-cased client support in the OpenTofu executable, it is something you can already do today if you are unable to use the OpenTofu-project-run registry for any reason.

@ksemele

ksemele commented Oct 28, 2024

I think this is a great idea.
It reminds me of an idea from the FluxCD source-controller API.
In their case, you can use GitRepository, OCIRepository, or HelmRepository for the same source of charts...

So I think it would be great if we could use "list of registries" or "list of repos with provider code".

@orbitz

orbitz commented Oct 29, 2024

Would it be viable for the source attribute to be treated more like a URL with different schemes, so you could do source = "git+ssh://github.com/...."? Maybe there is degraded functionality in some schemes?

@kvendingoldo
Author

@orbitz why? It should be http / https, because it will communicate with GitHub Releases, not with a Git repository.

@orbitz

orbitz commented Oct 29, 2024

The point being to raise the provider source from the provider protocol and give a foundation for supporting other protocols. Whether it be http/https/git isn't that important. Although if it's HTTP I would expect source to point to the specific release and not the GitHub release page, avoiding GitHub-specific implementation details.

@apparentlymart
Contributor

apparentlymart commented Oct 29, 2024

These addresses are primarily unique identifiers and only secondarily network addresses. They are intentionally designed to be "symbolic addresses that happen to have a default installation method for usability" to allow for use-cases like the filesystem and network mirrors, which allow telling OpenTofu to install packages from somewhere other than the primary registry their source address seems to imply.
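
To make the "identifier first, network address second" distinction concrete, here is a minimal Python sketch. It is illustrative only and not OpenTofu's actual parser: the names `ProviderAddr`, `parse_source`, and `mirror_index_url` are hypothetical, and it assumes only two-part and three-part addresses, with `registry.opentofu.org` as the default hostname. The URL shape follows my understanding of the provider network mirror protocol, where the full source address becomes a path under the mirror's base URL.

```python
from typing import NamedTuple

DEFAULT_HOST = "registry.opentofu.org"  # assumed default hostname


class ProviderAddr(NamedTuple):
    """A provider source address treated purely as an identifier."""
    hostname: str
    namespace: str
    name: str


def parse_source(source: str) -> ProviderAddr:
    """Split "[hostname/]namespace/name" into parts (hypothetical helper).

    The real parser also normalizes and validates; this sketch does not.
    """
    parts = source.split("/")
    if len(parts) == 2:
        return ProviderAddr(DEFAULT_HOST, parts[0], parts[1])
    if len(parts) == 3:
        return ProviderAddr(parts[0], parts[1], parts[2])
    raise ValueError(f"unsupported provider source address: {source!r}")


def mirror_index_url(base_url: str, addr: ProviderAddr) -> str:
    """URL a network mirror would serve the version index from.

    The identifier is unchanged; only the place we fetch from differs.
    """
    base = base_url.rstrip("/")
    return f"{base}/{addr.hostname}/{addr.namespace}/{addr.name}/index.json"


addr = parse_source("hashicorp/aws")
print(addr)
print(mirror_index_url("https://127.0.0.1:8443/", addr))
```

The point of the sketch is that the same `ProviderAddr` can be recorded in the lock file and state regardless of whether the packages came from the origin registry or from a mirror.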

My initial thought about the idea of using URLs is that it would either require us to break the distinction between unique identifier vs. installation strategy, or to create a more confusing situation where we use standard URL syntax but yet use it in a non-standard way.

All of that is to say: we could potentially grow the provider address syntax to also support URLs, but that has some consequences outside the direct scope we're discussing here -- the provider installer -- since OpenTofu uses these addresses also to track provider dependencies between modules, and provider instance addresses in state snapshots between plan/apply rounds.

(I wrote more on this over in #618 recently, in case any readers would like to learn more about what I described above. That issue is discussing some other variations on "make it easier to use other registries", with similar tradeoffs to be made, though I think that issue is different in that it's talking about additional ways for an operator to manually opt out of the default treatment of source addresses, whereas this issue is talking about changing the default treatment of certain provider addresses or about adding new valid provider address syntax that gets treated differently by OpenTofu than the current syntax.)

@Yantrio
Member

Yantrio commented Oct 29, 2024

I am quite concerned about the scalability of this with the current mechanisms in place. Right now, fetching all the information that OpenTofu needs (manifests, checksums, GPG keys, etc.) takes multiple HTTP requests per provider version. As a result, it takes minutes just to list the versions of the hashicorp AWS provider.

Private registries exist for this exact reason: to provide, and maybe cache, this information instead of it being generated at runtime.

My personal recommendation would be to run a "public" private provider registry or to fork the OpenTofu registry, as it is all public; you just need to bring your own Cloudflare account.

@kvendingoldo
Author

@Yantrio In your case, that means it's impossible to use OpenTofu easily without a server side in the form of a registry. Fetching providers from GitHub should solve this, e.g. for users who don't want to serve a private registry. Think about a non-profit project without engineers or experience: they want the usability of HCL, but they don't want to find someone to maintain a private registry and don't have extra money for infrastructure. In my example, I'm talking about shelters or many other non-profit organizations that get grants from cloud providers but don't have extra money for salaries.

@kvendingoldo
Author

And by the way, Packer successfully uses GitHub and nobody complains about high load.

Example:

packer {
  required_plugins {
    docker = {
      source  = "github.com/hashicorp/docker"
      version = "~> 1"
    }
  }
}

@marcinwyszynski
Contributor

Just wondering if it were possible to run a private registry on localhost and just proxy to GitHub?

@kvendingoldo
Author

Just wondering if it were possible to run a private registry on localhost and just proxy to GitHub?

A perfect idea to speed up build and make the fetching de-centralized

@marcinwyszynski
Contributor

This would not require any changes to the OpenTofu core code, would it?

@apparentlymart
Contributor

apparentlymart commented Oct 29, 2024

With OpenTofu as it exists today (that is: without adding any new features) I think there are three main options:

  1. Run a provider mirror protocol server on localhost, and use the CLI configuration's Explicit Provider Installation Method Settings to tell OpenTofu that it should install all providers from the local mirror, regardless of which hostname is acting as the top-level namespace in their source addresses:

    provider_installation {
      network_mirror {
        url = "https://127.0.0.1:8443/"
      }
    }

    The existing tofu providers mirror command can populate a directory with files suitable for a minimal implementation of the provider mirror protocol containing the providers needed for a particular configuration, so a minimal implementation of this would be to start up a static web server exposing that directory.

    However, this mechanism was originally designed for use over an untrusted network and so it currently requires using an https URL and for the server to have a valid certificate that's trusted by the system, which is often hard to arrange for localhost. Perhaps we could consider loosening that rule either to allow the user to choose for themselves whether they need TLS (allowing http URLs in all cases) or, if we want to compromise against the existing design we could make an exception specifically for the hosts localhost, 127.0.0.1 and [::1] to permit http URLs. (That compromise would roughly match how web platform features typically carve a hole in TLS requirements for local development purposes.)

  2. Run a server on localhost which implements the provider registry protocol and then use source addresses like localhost/namespace/name (where localhost is therefore filling the slot where registry.opentofu.org might normally go) to specify that the provider should always come from a registry running on localhost.

    This is a somewhat-messier situation because it effectively means that every system where OpenTofu is running can have a different idea of what these localhost-based source addresses represent. The checksums of those packages would be recorded in the dependency lock file using the localhost-based source addresses, so using the same configuration on a different computer with different packages in its mirror would detect checksum errors during tofu init.

    It also suffers the same constraint as the previous point: the provider registry protocol begins with Remote Service Discovery which uses a Well-known URI pattern which always uses HTTPS. Therefore using this successfully would either require having a valid localhost certificate on the local registry or using the undocumented CLI configuration host block to force OpenTofu to skip the service discovery step and just use a hard-coded set of service endpoints:

    # The following CLI configuration block was intended for OpenTofu
    # development only and so isn't documented, but is available for
    # use nonetheless. If you _do_ choose to use it, keep in mind that
    # any URLs copied from another host's service discovery document
    # are subject to change at any time, because vendors providing
    # OpenTofu services use the service discovery system to abstract
    # away exactly where the services are provided.
    # Overriding it to refer only to URLs _you_ directly control is not
    # dangerous, though.
    host "localhost" {
      services = {
        "providers.v1" = "http://127.0.0.1:8080/providers/"
      }
    }

    Note that although the initial service discovery request is required to be over HTTPS due to the "well-known URI", there are no such restrictions on the per-service URLs in the resulting discovery document or in an overridden discovery document configured like I've shown above.

    However, if you're going to use non-default CLI configuration settings anyway then it's probably more straightforward to use the provider mirror protocol I described in the previous point. I would assume that the main advantage of running a separate registry, rather than just a mirror of another registry, is that OpenTofu will attempt to access it by default if a source address in a required_providers block includes the registry's hostname.

    (It's also technically possible to override registry.opentofu.org's own services using a host "registry.opentofu.org" block, and so you could use the CLI configuration to tell OpenTofu a different base URL for that host's providers.v1 service. If you do that then this new endpoint would be used for providers whose source has registry.opentofu.org, but you should then make sure that the packages you offer there have identical checksums to the one on the official registry to avoid recording mismatching checksums in the dependency lock file that might cause confusion later.)

  3. If the service would've been running on localhost anyway then perhaps it's simpler overall to just use a filesystem mirror, which avoids the need to run any network services and instructs OpenTofu to just search a specific local filesystem directory for plugins:

    provider_installation {
      filesystem_mirror {
        path = "/var/opentofu/providers"
      }
    }

    The specified directory must contain files and directories following one of two supported layouts, as documented in the relevant part of the CLI configuration docs.

I've described these existing options only to further the above discussion about what local-only solutions might be possible in today's OpenTofu. I don't intend any of it as an argument for or against adding new features to meet the use-case that this issue represents.
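
For the filesystem mirror option, a small Python sketch of where a package would need to live may help. It assumes the two layouts are the "packed" form (a zip file per version and platform) and the "unpacked" form (a directory per version and platform), as I understand the CLI configuration docs; the helper names `packed_path` and `unpacked_dir` are hypothetical, and the provider used is just the example that appears later in this thread.

```python
from pathlib import PurePosixPath


def packed_path(root, hostname, namespace, name, version, target):
    """Packed layout: one zip archive per version and platform."""
    archive = f"terraform-provider-{name}_{version}_{target}.zip"
    return PurePosixPath(root, hostname, namespace, name, archive)


def unpacked_dir(root, hostname, namespace, name, version, target):
    """Unpacked layout: a directory holding the extracted package."""
    return PurePosixPath(root, hostname, namespace, name, version, target)


root = "/var/opentofu/providers"  # matches the filesystem_mirror path above
print(packed_path(root, "registry.opentofu.org", "apparentlymart",
                  "assume", "0.1.0", "linux_amd64"))
print(unpacked_dir(root, "registry.opentofu.org", "apparentlymart",
                   "assume", "0.1.0", "linux_amd64"))
```

Note that the hostname of the original source address is part of the path, so the mirror still identifies packages by their full provider address.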

@orbitz

orbitz commented Oct 30, 2024

My initial thought about the idea of using URLs is that it would either require us to break the distinction between unique identifier vs. installation strategy, or to create a more confusing situation where we use standard URL syntax but yet use it in a non-standard way.

Yes, you're right. In reality I think I'm proposing using URIs or URNs (I don't know, all the naming gets confused), but using the URI notation which is pretty expressive and flexible and familiar to be able to hide a bunch of details. But maybe I don't know them well enough such that even what I'm proposing is a bit of an abuse. But I think the underlying ask is to be able to express: "I know exactly which provider I want, and from where, so just let me specify that".

@apparentlymart
Contributor

apparentlymart commented Oct 30, 2024

Hi @orbitz,

I think the main tension we need to think about is that in OpenTofu providers are a global idea: all modules must be able to agree on which providers they are using because provider configurations can be shared between modules.

The current design intentionally separates the identifier specified in source from the details of how that thing gets installed so that it's possible to use a mixture of third-party and first-party modules together in an environment where the origin registry of some of the needed providers is not directly accessible. That is, above all else, the main constraint that the current system was designed to meet.

Allowing the specification of an exact location directly in the source argument could be okay if all of the modules in your environment are first-party and all agree on a physical location to use.

If you later wanted to change that physical location, you'd need to update all of your modules at once to all agree on the new source location, but that's at least doable if you control all of them. However, you'd also need to make sure to tofu apply immediately after changing all of the source locations so that OpenTofu can create a new state snapshot using the new provider addresses, because some operations rely on the provider addresses tracked in the state rather than the ones in the configuration.

You would not be able to mix those first-party modules with any third-party modules because the third-party modules would probably not agree on which source addresses to use.


In an older blog post about similar challenges that arose due to module source addresses being a mix of physical location and unique identifier (written about this project's predecessor, but still potentially applicable to OpenTofu) I thought a little about a new separate file for specifying "dependency overrides", which I think could also be a candidate design for the problem we're discussing here.

I had imagined the "Dependency Override File" as something you could optionally include in version control alongside your root module, and therefore something you could distribute along with your main source code rather than configured out-of-band in the CLI configuration file.

In the example there I was mainly motivated by module registry addresses and so I didn't dwell very much on how it might work for provider addresses, but one specific way that could work is to allow the provider_pkg block to specify exactly where to find the packages to satisfy a particular source address:

provider_pkg "registry.opentofu.org/apparentlymart/assume" {
  version = "0.1.0"

  binary_package_urls = {
    linux_amd64   = "https://github.com/apparentlymart/terraform-provider-assume/releases/download/v0.1.0/terraform-provider-assume_0.1.0_linux_amd64.zip"
    linux_arm64   = "https://github.com/apparentlymart/terraform-provider-assume/releases/download/v0.1.0/terraform-provider-assume_0.1.0_linux_arm64.zip"
    darwin_arm64  = "https://github.com/apparentlymart/terraform-provider-assume/releases/download/v0.1.0/terraform-provider-assume_0.1.0_darwin_arm64.zip"
    windows_amd64 = "https://github.com/apparentlymart/terraform-provider-assume/releases/download/v0.1.0/terraform-provider-assume_0.1.0_windows_amd64.zip"
  }

  # Optional to allow OpenTofu to still verify the checksums on install and record
  # them in the dependency lock file.
  checksums_url = "https://github.com/apparentlymart/terraform-provider-assume/releases/download/v0.1.0/terraform-provider-assume_0.1.0_SHA256SUMS"

  # (is there some way to bring the gpg verification into here too? 🤷🏻‍♂️)
}

One way to think of this is that this file represents a sort of partial "registry" of provider and module packages that you can include directly in your source repository, so that the included packages can be installed directly from wherever they are and bypassing any network-based registry or other configured installation strategy.

It still retains the registry.opentofu.org/apparentlymart/assume address as an anchoring identifier to associate those packages with, and so I believe this is something that could be handled entirely by tofu init's provider installer without any significant impact to the rest of the system: the installer would "just" use the information from the matching provider_pkg block in place of what it would normally learn from the registry protocols.

I wrote it out with an exact version number and literal URLs above for simplicity's sake, but I expect instead folks would want to write out a rule for generating source URLs based on a version number and platform, so we could potentially reduce it to this:

provider_pkg "registry.opentofu.org/apparentlymart/assume" {
  # Optional version constraint limiting which versions of this provider
  # this override block applies to. If this were omitted then it would
  # apply to _all_ versions of this provider.
  version = ">= 1.0.0"

  binary_package_url = "https://github.com/apparentlymart/terraform-provider-assume/releases/download/v${pkg.version}/terraform-provider-assume_${pkg.version}_${pkg.platform}.zip"
  checksums_url      = "https://github.com/apparentlymart/terraform-provider-assume/releases/download/v${pkg.version}/terraform-provider-assume_${pkg.version}_SHA256SUMS"
  # ...
}

This simplified form does admittedly mean that a 404 Not Found from the generated URL could now represent various different problems -- "no such version", "valid version but unsupported platform", "the URL template you specified is just wrong", ... -- and so OpenTofu's error messages would probably be worse in this case than in the install-from-registry case, but it's probably okay to assume that anyone who is going to this much effort to manually specify package locations will have enough awareness of what they've set up to debug it if any errors are returned.

(This "rule for generating package metadata" approach does mean that there would not be any definitive list of all of the available versions of any package, which is a problem we'd need to solve since the existing installer relies on the ability to find the latest available version matching a version constraint. Perhaps the compromise would be that the version attribute is actually required after all, and must include both a lower and upper bound so that OpenTofu can assume that the upper bound is the newest available version. I'm not sure 🤷‍♂️ )
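
As a rough illustration of the templated form and the "upper bound as newest version" compromise, here is a Python sketch. Everything in it is hypothetical: `provider_pkg`, `${pkg.version}`, and `${pkg.platform}` are only ideas from this comment, and the helpers `expand` and `assumed_newest` are names invented for the example. It also assumes the constraint is written with a space after each operator.

```python
TEMPLATE = (
    "https://github.com/apparentlymart/terraform-provider-assume/releases/"
    "download/v${pkg.version}/terraform-provider-assume_"
    "${pkg.version}_${pkg.platform}.zip"
)


def expand(template: str, version: str, platform: str) -> str:
    """Substitute the two hypothetical placeholders into a URL template."""
    return (template
            .replace("${pkg.version}", version)
            .replace("${pkg.platform}", platform))


def assumed_newest(constraint: str) -> str:
    """Return the upper bound of a constraint like ">= 1.0.0, <= 1.4.2".

    With no version list to consult, the sketch simply treats the upper
    bound as the newest available version and insists both bounds exist.
    """
    lower = upper = None
    for clause in constraint.split(","):
        op, _, version = clause.strip().partition(" ")
        if op == ">=":
            lower = version.strip()
        elif op == "<=":
            upper = version.strip()
    if not lower or not upper:
        raise ValueError("constraint needs both a '>=' and a '<=' bound")
    return upper


newest = assumed_newest(">= 1.0.0, <= 1.4.2")
print(expand(TEMPLATE, newest, "linux_amd64"))
```

A 404 for the printed URL would still be ambiguous in exactly the ways described above, since nothing here can tell a missing version from a wrong template.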

I dunno if this idea is even on the right track, but I'm just trying to think aloud about different compromises we could make to give more options for plugin installation while limiting the changes only to the plugin installer itself, without significantly redesigning the rest of the system.

@apparentlymart
Contributor

apparentlymart commented Oct 31, 2024

I saw the mention of Packer's direct use of GitHub above, and so out of curiosity I investigated an older version of Packer that was released under MPL 2.0 license so we can consider how that might translate into OpenTofu assuming we did decide to go ahead with implementing a GitHub-specific installation strategy in the OpenTofu client.

The main interesting thing I found is the specific API URL patterns Packer uses for different request types.

Specifically,

  • To implement "list available versions" Packer uses List Matching References to enumerate all of the tags in the repository. It seems to just assume that any tag in the repository is a valid release.

    This particular GitHub API does not seem to limit the number of items in the response and thus require pagination arguments -- or at least, the docs don't mention it -- so I guess this can potentially scale to any number of tags without needing to perform additional requests: the response will just get gradually larger over time, just like the OpenTofu registry API for fetching all of the available versions.

  • To implement "get a binary package" Packer uses the releases download API with a client-side-constructed filename. If there is no available release artifact with the generated name associated with the tag then it fails.

    It's not clear to me how this avoids reporting an installation error in the window of time between the creation of a tag and its associated "Release" and artifacts being published, since it never actually interacts with the GitHub Releases API directly. 🤔

    Edit: Based on some bug reports I see in the Packer repository, it seems like Packer's behavior is to keep trying increasingly-older versions from the list of tags until one of them succeeds or it runs out of versions to try. Since a package should presumably only be missing from the newest release during the narrow window in its release process, Packer presumably just ignores the not-yet-completed release and installs the previous version instead.

  • Getting the official checksums for a plugin works similarly to getting a binary package, with a different locally-generated artifact filename.

One meta thing I noticed is that Packer defines an environment variable PACKER_GITHUB_API_TOKEN where it expects the user to provide a GitHub API token client-side so that they can benefit from the higher rate limits for authenticated requests. Therefore each Packer client effectively has a private rate limit, unlike OpenTofu's registry where a single client is making asynchronous batch requests on behalf of everyone who uses the registry.
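
The fallback behavior described above can be sketched in Python as follows. This is not Packer's code: the function names, the artifact filename pattern, and the rule that only `vX.Y.Z` tags count as releases are all assumptions made for the example, and the network call is replaced by an injected `fetch` callable so the sketch runs offline.

```python
import re
from typing import Callable, Iterable, Optional, Tuple

SEMVER_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def version_key(tag: str) -> Tuple[int, ...]:
    """Numeric sort key for a tag such as "v1.2.0"."""
    match = SEMVER_TAG.match(tag)
    return tuple(int(part) for part in match.groups())


def install_newest(
    tags: Iterable[str],
    fetch: Callable[[str, str], Optional[bytes]],
    name: str,
    platform: str,
) -> Optional[Tuple[str, bytes]]:
    """Try tags from newest to oldest until an artifact download succeeds.

    `fetch(tag, filename)` stands in for the release download request and
    returns None when the artifact does not exist (for example, a tag
    whose release has not been published yet).
    """
    candidates = [t for t in tags if SEMVER_TAG.match(t)]
    for tag in sorted(candidates, key=version_key, reverse=True):
        version = tag[1:]
        filename = f"{name}_{version}_{platform}.zip"  # hypothetical naming
        data = fetch(tag, filename)
        if data is not None:
            return version, data
    return None


# Fake "release store": v1.2.0 is tagged but has no artifacts yet.
artifacts = {("v1.1.0", "demo_1.1.0_linux_amd64.zip"): b"package-bytes"}
result = install_newest(
    ["v1.0.0", "v1.2.0", "v1.1.0", "nightly"],
    lambda tag, filename: artifacts.get((tag, filename)),
    "demo",
    "linux_amd64",
)
print(result)
```

In this toy run the newest tag has no artifact, so the previous version is installed instead, which mirrors the behavior inferred from the Packer bug reports.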

It's an interesting design!

@kvendingoldo
Author

OCI can also be a good idea (#1672), preparing a plan now.

@apparentlymart
Contributor

Indeed, supporting OCI as an alternative protocol for provider package installation seems like a nice compromise since it's a service that some organizations might already be running. Let's make sure to discuss that over in #1672 though, so that the folks who are already watching that issue can follow the discussion there.

Thanks!

@kvendingoldo
Author

supporting OCI as an alternative protocol for provider package installation seems like a nice compromise

And by the way, if OCI is implemented, people will be able to use GHCR as a source.

@abstractionfactory
Contributor

Hey folks, thank you for the lively discussion on this issue. We have discussed this, as well as the OCI issue, in the core team. Working on OCI support would enable a wide range of platforms for hosting providers, including in an air-gapped setup, while integrating GitHub Releases would only enable GitHub. In contrast, all provider authors on GitHub can use OCI (via ghcr.io) as indicated, so only supporting OCI does not detract from their ability to publish independently of the OpenTofu Registry.

Since we are fairly far into creating a robust OCI implementation and writing a spec for both providers and modules, I'm going to close this issue. Please keep tracking the OCI-related issues (#1672, primarily) for more details on the implementation.

abstractionfactory closed this as not planned Nov 25, 2024