Implement the stack concept #931

Open
lapwingcloud opened this issue Nov 28, 2023 · 30 comments
Labels
blocked Issues which are blocked by inbound dependencies pending-decision This issue has not been accepted for implementation nor rejected. It's still open to discussion. rfc

Comments

@lapwingcloud

OpenTofu Version

OpenTofu v1.6.0-alpha5
on linux_amd64

Use Cases

A stack is a way to split a monolithic infrastructure configuration into smaller subsets, where each subset of the configuration manages the corresponding subset of the infrastructure. This has the following benefits:

  • reduced blast radius
  • faster applies
  • lower API costs
  • easier collaboration (e.g. fewer merge conflicts)

Terraform has announced that they will be adding stacks to their official implementation: https://www.hashicorp.com/blog/terraform-stacks-explained

Attempted Solutions

This can already be solved with various custom implementations, e.g. we can simply use folders as stacks and have different variable files for different environments:

├── service-bar
│   ├── database.tf
│   ├── ec2.tf
│   ├── elb.tf
│   └── vars
│       ├── all.tfvars
│       ├── production.tfvars
│       └── staging.tfvars
└── service-foo
    ├── database.tf
    ├── ec2.tf
    └── elb.tf

This usually requires a custom wrapper around tofu or terraform to pass the right var files when running plan and apply.
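
A minimal sketch of such a wrapper, assuming the folder layout above and a convention (an assumption of this example, not something tofu enforces) that vars/all.tfvars is always loaded and vars/<env>.tfvars is loaded on top of it. The function name build_command is hypothetical; -chdir and -var-file are existing tofu/terraform CLI options.

```python
from pathlib import Path


def build_command(stack_dir, env, action="plan", binary="tofu"):
    """Return the argv list for running `action` on one stack folder.

    Assumed convention (hypothetical, mirrors the tree above):
    vars/all.tfvars applies everywhere, vars/<env>.tfvars overrides it.
    """
    stack = Path(stack_dir)
    cmd = [binary, f"-chdir={stack.as_posix()}", action]
    # Later -var-file flags take precedence, so the shared file goes first.
    for name in ("all", env):
        var_file = stack / "vars" / f"{name}.tfvars"
        if var_file.is_file():
            # With -chdir, relative paths resolve inside the stack folder.
            cmd.append(f"-var-file=vars/{name}.tfvars")
    return cmd
```

With both var files present, build_command("service-bar", "staging") yields the arguments for tofu -chdir=service-bar plan -var-file=vars/all.tfvars -var-file=vars/staging.tfvars, which can then be handed to subprocess.run. A folder without a vars directory, like service-foo above, simply gets no -var-file flags.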

This customization logic is also implemented in several third-party tools, including Terragrunt, Terramate, etc.

Proposal

All the third-party and custom implementations work great, but since Terraform is going to standardize this, I'd also like to see it standardized in tofu, so we can be more consistent and focus on one tool (tofu) for scaling configurations in large projects.

References

No response

@lapwingcloud lapwingcloud added enhancement New feature or request pending-decision This issue has not been accepted for implementation nor rejected. It's still open to discussion. labels Nov 28, 2023
@ghost ghost added rfc blocked-until-stable-release and removed enhancement New feature or request labels Nov 28, 2023
@ghost

ghost commented Nov 28, 2023

Hey @lapwingcloud thank you so much for this suggestion! We are currently working towards the 1.6 stable release. Any additional features will have to wait until that release is finished.

@gtirloni

Maybe this can wait until HashiCorp actually delivers the feature, so OpenTofu can learn what does and doesn't work with stacks and use that to inform any technical specifications.

@ghost ghost added blocked Issues which are blocked by inbound dependencies and removed blocked-until-stable-release labels Jan 29, 2024
@skyzyx

skyzyx commented Jul 9, 2024

This sounds very much like how Terragrunt can be used. I, for one, would love to see more of Terragrunt's functionality implemented natively in OpenTofu.

@muandane

Would love to see this in tofu instead of having to opt for a terraform/opentofu wrapper... Terraform is actually working on this, but it might only be for Terraform Cloud, using the rpcapi command.

@nikolay

nikolay commented Oct 25, 2024

Terraform Stacks is the main thing stopping me from fully embracing OpenTofu, as Stacks really solve a lot of real-world Terraform headaches!

@gabops

gabops commented Nov 11, 2024

Having this added to OpenTofu would be fantastic, not only because of the benefits in terms of functionality but also because, at least in my case, it would allow me to ditch Terragrunt entirely (a tool I never liked, TBH). However, after trying to understand how this works in Terraform, it seems that it's a feature completely coupled to their cloud solution (I didn't find any tutorial or "stack" subcommand to be used locally). If this is the case, it's a question of defining the workflow and not just of implementing the logic. Just to clarify, I guess the intention is to implement something like this? 🤔:

tofu stack plan
tofu stack plan -component foo
tofu stack deploy
...

Cheers!

@abstractionfactory
Contributor

Hey folks, please make sure to upvote the issue above if this is of interest to you.

@EdanBrooke

I am watching this issue closely as I, too, would like to simplify our IaC pipelines and preferably move away from Terragrunt.

We're still trying to choose between TF & Tofu. TF Stacks seems to be a HCP feature and we don't want to go down that route so I'm hopeful that Tofu can deliver in this regard.

Thank you OpenTofu maintainers for your work.

@muandane

> I am watching this issue closely as I, too, would like to simplify our IaC pipelines and preferably move away from Terragrunt.
>
> We're still trying to choose between TF & Tofu. TF Stacks seems to be a HCP feature and we don't want to go down that route so I'm hopeful that Tofu can deliver in this regard.
>
> Thank you OpenTofu maintainers for your work.

I forgot where I saw this, but Stacks will probably not be ported to Terraform outside of HCP, because the Stacks feature isn't run by Terraform itself but by the automation surrounding it in HCP. So the road to Tofu stacks might be longer than we think, since such automation would require a new mechanism to be added to the code base.

@abstractionfactory
Contributor

Nothing would prevent us from adding a stack concept to OpenTofu itself. Where I see the problem is OpenTofu's internal state. In my mind, a stack concept would involve applying one project, then applying a next project, and so on. This currently cannot be done inside a single OpenTofu instance because OpenTofu has a lot of internal state. However, if we were to get rid of the internal global state, nothing would stop us from doing something like this. As a side-effect, this may also make OpenTofu usable as a Go library with a bit of added work.

That being said, we are talking about refactoring 300k lines of code here. I would like to understand why Terragrunt is not suitable as a tool before jumping into any potential discussions about the implementation. As someone who never used Terragrunt, it seems to get the job done and simply merging that functionality into OpenTofu just for getting rid of an extra tool to use seems like a poor use of resources.

@gabops

gabops commented Nov 11, 2024

> Nothing would prevent us from adding a stack concept to OpenTofu itself. Where I see the problem is OpenTofu's internal state. In my mind, a stack concept would involve applying one project, then applying a next project, and so on. This currently cannot be done inside a single OpenTofu instance because OpenTofu has a lot of internal state. However, if we were to get rid of the internal global state, nothing would stop us from doing something like this. As a side-effect, this may also make OpenTofu usable as a Go library with a bit of added work.
>
> That being said, we are talking about refactoring 300k lines of code here. I would like to understand why Terragrunt is not suitable as a tool before jumping into any potential discussions about the implementation. As someone who never used Terragrunt, it seems to get the job done and simply merging that functionality into OpenTofu just for getting rid of an extra tool to use seems like a poor use of resources.

The three reasons that come to mind for preferring to get rid of Terragrunt are:

  1. It forces you to pollute your project with a bunch of terragrunt.hcl files spread across a directory tree, forcing you to adopt certain practices (e.g. project layout) to the point where one ends up managing a Terragrunt project rather than just a Terraform/OpenTofu project that uses a 3rd party runner.

  2. It's a 3rd party dependency, which is an extra moving part with its own complexity and shenanigans. It's also another element to master and to teach to anyone who joins the team.

  3. In big and complex projects, Terragrunt is a PITA to debug and quite slow.

Also, from what I've read in the documentation, I personally find Terraform's approach much better thought out and simpler (just a bunch of files in your project's root directory: components.tfstack.hcl, deployments.tfdeploy.hcl, etc.), although to be fair, I haven't seen a complex use case and, as someone mentioned earlier, this functionality seems to rely on HCP and not on the Terraform CLI itself.

@abstractionfactory
Contributor

I wonder if this would be something to take up with the Terragrunt folks first and at the very least discuss how some of the frustrations can be addressed. They are an open source project and support OpenTofu. Us reimplementing a large part of what they are doing really seems like a wasted opportunity.

@gabops

gabops commented Nov 12, 2024

> I wonder if this would be something to take up with the Terragrunt folks first and at the very least discuss how some of the frustrations can be addressed. They are an open source project and support OpenTofu. Us reimplementing a large part of what they are doing really seems like a wasted opportunity.

I guess that if implementing that functionality is that hard, we don't have any other choice, although it's impossible for the Terragrunt team to drastically change their implementation design.

Another option would be for the OpenTofu project to build its own tool (with casinos, and hookers!) that implements Terraform Stacks and call it via a subcommand. This introduces a new tool, which contradicts the second point I made earlier, but we would still get a simpler and more cohesive tool. I don't know. Just throwing out ideas.

@EdanBrooke

> > Nothing would prevent us from adding a stack concept to OpenTofu itself. Where I see the problem is OpenTofu's internal state. In my mind, a stack concept would involve applying one project, then applying a next project, and so on. This currently cannot be done inside a single OpenTofu instance because OpenTofu has a lot of internal state. However, if we were to get rid of the internal global state, nothing would stop us from doing something like this. As a side-effect, this may also make OpenTofu usable as a Go library with a bit of added work.
> >
> > That being said, we are talking about refactoring 300k lines of code here. I would like to understand why Terragrunt is not suitable as a tool before jumping into any potential discussions about the implementation. As someone who never used Terragrunt, it seems to get the job done and simply merging that functionality into OpenTofu just for getting rid of an extra tool to use seems like a poor use of resources.
>
> The three reasons that come to mind for preferring to get rid of Terragrunt are:
>
>   1. It forces you to pollute your project with a bunch of terragrunt.hcl files spread across a directory tree, forcing you to adopt certain practices (e.g. project layout) to the point where one ends up managing a Terragrunt project rather than just a Terraform/OpenTofu project that uses a 3rd party runner.
>   2. It's a 3rd party dependency, which is an extra moving part with its own complexity and shenanigans. It's also another element to master and to teach to anyone who joins the team.
>   3. In big and complex projects, Terragrunt is a PITA to debug and quite slow.
>
> Also, from what I've read in the documentation, I personally find Terraform's approach much better thought out and simpler (just a bunch of files in your project's root directory: components.tfstack.hcl, deployments.tfdeploy.hcl, etc.), although to be fair, I haven't seen a complex use case and, as someone mentioned earlier, this functionality seems to rely on HCP and not on the Terraform CLI itself.

Terragrunt v1.0 is on the horizon but I can't seem to find any projected release dates on their roadmap.
This promises "Terragrunt Stacks" which seems to address the terragrunt.hcl pollution you referred to.

Maybe the best way to approach this is to let OpenTofu do what OpenTofu does brilliantly and let Terragrunt help orchestrate the deployment of the OpenTofu modules for different projects or environments?

I've been conversing with the Terragrunt team over on this issue regarding their new Terragrunt Stacks proposal: gruntwork-io/terragrunt#3313 (comment)

@yhakbar
Contributor

yhakbar commented Nov 12, 2024

Hey folks!

Terragrunt maintainer here.

First, I want to say that I want to be mindful of taking up too much space in OpenTofu forums talking about Terragrunt. The OpenTofu team has quite a lot on their plates already, and worrying about how people feel about third party tools can take up a lot of headspace! We've recently been putting a lot of effort into making sure that there are better forums for the Terragrunt community to discuss and learn from each other, so if you want to discuss things that are explicitly Terragrunt, please feel free to visit one of these two and get a conversation going there:

We know that we could have been doing a better job at that historically, and we're trying to step up our efforts at fostering a healthy community.

Second, I want to make it explicit that Gruntwork, and particularly the maintainers of Terragrunt, are strong OpenTofu advocates and want it to succeed. We want it to be the best tool possible, and meet directly with OpenTofu maintainers to try to improve OpenTofu, even when features we are pushing for might make Terragrunt less necessary for certain users. A specific example of this is how we're coordinating with the OpenTofu team on #1483, which makes Terragrunt Provider Caching less necessary. We're not trying to push Terragrunt use because it fills in gaps that OpenTofu could sensibly provide on its own, but because it adds capabilities on top of OpenTofu that improve the experience of managing IaC at scale.

All that being said, I personally think OpenTofu taking on Stacks wouldn't be the best use of resources for the OpenTofu team. There's plenty for OpenTofu to address in the way OpenTofu works today that doesn't require building out an extra tool for its own implementation of Stacks, orchestrating OpenTofu executions across multiple pieces of state. Terragrunt users have been getting that for free for a long time.

The Terragrunt Stacks RFC is designed to address that drawback of terragrunt.hcl file sprawl, while keeping the tool backwards compatible for users that already have Terragrunt configurations they don't want to rewrite. It's not a new binary, or a dependency on some proprietary service or anything. It's just a way of representing existing Terragrunt configurations more succinctly using the same open source tool. I wrote a long walkthrough that breaks this down, and has a simple bash script to mock out the behavior so that you can get a feel for what Terragrunt Stacks will be like. We also have more planned for improving the ergonomics of Terragrunt Stacks and the overall experience of using Terragrunt to orchestrate OpenTofu, and are always looking for feedback.

If you don't mind a lot of terragrunt.hcl files, however, most of what I've heard regarding how Terraform Stacks work (I haven't tried it, though, and don't know anybody in real life that has, TBH) can be achieved using Terragrunt today. Gruntwork has customers successfully managing massive IaC estates using Terragrunt and OpenTofu without any notion of Stacks.

There's a lot planned for Terragrunt 1.0, and if Terragrunt users feel like it doesn't meet their needs today or as described in that blog post, we want to hear about it. The reason we aren't announcing a release date is that we want to give time to change things, as a result of community feedback, before we land on a stable interface for a 1.0 that we plan on supporting for a long time with no breaking changes. We're also going to be building things out in the open and gradually, so users won't suddenly manage their infrastructure differently all at once after a specific 1.0 release.

Hopefully this gives some insight as to how we're seeing this issue on the Terragrunt side!

@abstractionfactory
Contributor

Thanks for your input @yhakbar, I'm hoping we can avoid duplicating effort here.

@nikolay

nikolay commented Nov 12, 2024

@yhakbar Any rough idea when v1 would be out? "A lot planned" sounds like "many months" if not "many years" in the future.

@yhakbar
Contributor

yhakbar commented Nov 13, 2024

@nikolay No projected timeline right now. Definitely not "many years", though.

If you have any concerns preventing you from adopting Terragrunt today, you don't have to wait for 1.0! Share your feedback on Terragrunt forums mentioned above, and we'll take your feedback seriously.

@nikolay

nikolay commented Nov 13, 2024

@yhakbar So, over one year for sure. Thanks!

@abstractionfactory
Contributor

abstractionfactory commented Nov 13, 2024

@nikolay I have specifically asked @yhakbar to jump on this thread and help us out since I know very little about Terragrunt. He or his colleagues are under no obligation to provide us with support, so let's thank them for their valuable time by leading a constructive conversation. Criticism is ok, but snarky comments don't help foster a positive atmosphere. If you are up for it, I would like to ask you to remove your comment and I will do the same with mine.

@gabops

gabops commented Nov 13, 2024

In an attempt to keep this conversation fruitful, I have to say that, despite Terragrunt's aspirations and promises (which, don't get me wrong, I'm grateful for), I still think it is a good idea for OpenTofu to provide a native way of handling stacks (or at least a way of controlling the order and parallelization of "backend grouped resources", call it "stacks" or whatever). Relying on a 3rd party tool for such a common pattern in medium/large serious projects feels wrong. I think we have a great opportunity here to improve the usage of OpenTofu.
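
One way to picture "controlling the order and parallelization" is a dependency graph between stacks. Below is a minimal sketch using Python's standard-library graphlib; the stack names and the depends_on mapping are made-up examples, and this is not how OpenTofu, Terraform Stacks, or Terragrunt actually implement it.

```python
from graphlib import TopologicalSorter


def apply_batches(depends_on):
    """Group stacks into ordered batches.

    `depends_on` maps a stack name to the stacks that must be applied
    first. Stacks inside one batch do not depend on each other, so a
    runner could apply them in parallel.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical stacks: both services need the network, dns needs both services.
example = {
    "network": [],
    "service-foo": ["network"],
    "service-bar": ["network"],
    "dns": ["service-foo", "service-bar"],
}
print(apply_batches(example))
# [['network'], ['service-bar', 'service-foo'], ['dns']]
```

Each inner list is a batch whose members could run concurrently, and the batches themselves must run in order.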

@marziply

@gabops I completely agree. With all due respect to the Terragrunt team, I don't want to depend on third party tooling for a feature that I believe should definitely be native to OpenTofu.

@abstractionfactory
Contributor

Thank you everyone for the input, it is very valuable. As outlined above, I believe this is quite tricky to do in the OpenTofu codebase today as much of the data is globally scoped. Let's wait for input from the folks who know the codebase better, but my impression is that a bit of refactoring is in order before stacks can be attempted natively in OpenTofu.

@nikolay

nikolay commented Nov 14, 2024

@abstractionfactory Let's not go on a wild goose chase, please! I asked for a timeframe, and based on the answer, it seems that v1 is at least a year in the future, and many of us need a solution soon. Like @gabops and @marziply, I never want to depend on third-party tools (Terragrunt, Terramate, etc.), which also use an approach that I believe is a patchwork around HashiCorp's Terraform limitations and years of unwillingness to fix them. Many of the problems Terragrunt "solves" will soon be solved by OpenTofu, for example, dynamic providers, allowing variables and locals everywhere, etc. So, honestly, it's better to focus all efforts on solving the long-standing Terraform issues in OpenTofu rather than relying on third-party tools that provide patchworks. Those tools also don't seem to have the same dedicated resources and community to move fast and deliver something much better than Terraform Stacks. Terraform Stacks also seems to be a separate tool, which they purposely kept separate to lure customers into their Captain Cook's RUM-infused product and not solve the Terraform issues, so people are pressed to pay for Terraform Cloud. It could seem like a conspiracy theory, but it's more like Occam's Razor given the switch to BUSL and the acquisition by IBM.

@abstractionfactory
Contributor

abstractionfactory commented Nov 14, 2024

@nikolay thank you for the input, your perspective is very helpful. Could you elaborate on what you would be using stacks for? I'm interested in the specific problem you would like to resolve mainly because there are things we can do relatively quickly and there are things that are extremely hard to do.

Specifically, for example configuring providers from data sources may be possible with some work, while running multiple apply operations for different projects in parallel, or even sequentially from the same tofu binary is and will for the foreseeable future be impossible due to the accumulated technical debt in the codebase. In other words, the latter will, for the next 1-2 years, only be possible with an external tool, even if we were to maintain that tool ourselves. (Happy to discuss technical details on a separate discussion thread or in Slack.)

@DavidGamba

Implementing a thin wrapper around the existing terraform/tofu binary to implement the stacks feature is "fairly" easy but as mentioned before it requires multiple invocations of the command.

There are a few things that stand in the way of a "very" easy thin wrapper implementation, which maybe should be fixed first so that it becomes trivial.

  1. Running multiple instances of terraform init in parallel causes race conditions when writing to the plugin cache, so those init calls need to be run serially for each component in the stack. The benefit of the plugin cache is that if a provider is found there, it will not be re-downloaded, unlike with the terraform providers mirror command.

    Once you have your providers in the cache, I have had no issues running init in parallel. In CI I only rebuild that cache when I see changes to the lock files.

  2. Running two workspaces of the same component in parallel results in a race condition on the .terraform/terraform.tfstate file.
    When running stacks, I have to make sure to use a different TF_DATA_DIR for each workspace.

  3. Obviously, use the TF_WORKSPACE env var rather than doing a terraform workspace select.
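
The points above can be sketched roughly as follows. The .tfdata/<workspace> directory layout and the function names are hypothetical choices for this example; TF_DATA_DIR and TF_WORKSPACE are the environment variables documented for Terraform, and the sketch assumes tofu honors them the same way and that the workspaces already exist.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def workspace_env(workspace, base_env=None):
    """Environment for one workspace of a component.

    Each workspace gets its own TF_DATA_DIR so parallel runs do not
    share .terraform/terraform.tfstate, and TF_WORKSPACE stands in for
    `workspace select`.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TF_WORKSPACE"] = workspace
    env["TF_DATA_DIR"] = f".tfdata/{workspace}"  # hypothetical layout
    return env


def run_stack(component_dirs, workspaces, binary="tofu"):
    """Init serially, then plan every (component, workspace) pair in parallel."""
    # Serial init: avoids concurrent writes to a shared plugin cache.
    for directory in component_dirs:
        for ws in workspaces:
            subprocess.run([binary, "init", "-input=false"], cwd=directory,
                           env=workspace_env(ws), check=True)

    def plan(job):
        directory, ws = job
        result = subprocess.run([binary, "plan", "-input=false"],
                                cwd=directory, env=workspace_env(ws))
        return result.returncode

    jobs = [(d, ws) for d in component_dirs for ws in workspaces]
    with ThreadPoolExecutor() as pool:
        # Maps each (component, workspace) pair to its plan exit code.
        return dict(zip(jobs, pool.map(plan, jobs)))
```

Because the data directory differs per workspace, init has to run once per (component, workspace) pair here, which is the cost of keeping the parallel plan runs isolated from each other.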

@nikolay

nikolay commented Nov 14, 2024

@abstractionfactory I will try to distill and explain the challenges and how I worked around them with pure Terraform and HCP Terraform. At one point I was thinking that all this could be handled by a "workflow" provider, which runs Terraform as a provider within the provider. Crazy, right?

@abstractionfactory
Contributor

Hey folks, please take a look at #2221 as well as #2222 and cast your votes since these came up during a discussion and are closely related to this topic.

@nikolay

nikolay commented Nov 28, 2024

@abstractionfactory Both are great, but taking the Terragrunt approach in the OpenTofu Pipes is a bit dangerous unless there's a state lock for the inputs until the dependent process is done. Otherwise, you may get inconsistent results. I didn't know how dependencies in Terragrunt work, and I thought that it just generates a bunch of terraform_remote_state objects, but it does what the Pipes proposal does, and I got a little disappointed.

Anyway, this would work, too, but I would suggest adding something simple to OpenTofu - types and the ability to type outputs as well. IRL, outputs could produce really complex data structures, so, it was fine to have dynamic typing before Stacks/Pipes, but with that in mind, I think outputs should be typed, too, and types should be reused, but that could be accomplished by symlinking for starters. So, something like types.tofu could be used to type the outputs and the variables (inputs) of the consuming workspace via, initially, a symlink. Later, there could be an import/include feature.

One other reason for needing types is the defaults they carry. One big issue with Terraform Cloud (no, I won't call it HCP Terraform) is that if you have terraform_remote_state and that state doesn't exist, you get an error - even if you have defaults specified! This forces you to modify the code just to make interconnected things work! With types, you don't have to repeat all this, as the types will have optional() with defaults or just a default value overall. Types could also be nested in other types - right now, I often have to repeat the same type definition multiple times!

Also, with terraform_remote_state, you can also do for_each - with fixed inputs, this can't be done.

The refactor feature is great as it really requires skills to extract resources from state although recently, I started to use imports and state rm's.

@nikolay

nikolay commented Nov 28, 2024

@abstractionfactory I added a types proposal: #2230
