Question regarding lock mechanism #2169
Comments
Hello @BenjaminDecreusefond, thank you for the issue. I think this may be closely related to other locking-related issues, but I'll let people more knowledgeable than me in this area say more about that. In the meantime, I've queued up this issue for the core team to discuss, so please bear with us until we get to it.
Hi @BenjaminDecreusefond! Thanks for starting this discussion. From the perspective of OpenTofu CLI, the state locking mechanism already expects the locks to be separate for each workspace, because the locking API belongs to what we call a "state manager" and each one of those manages the state for only one workspace. The backend API includes a method which takes a workspace name and returns a state manager, and then OpenTofu calls the lock method on that state manager. Of course, that doesn't necessarily guarantee that the underlying state manager implementation will also treat them as separate: it's possible to implement this API in a way that acquires a broader lock so that a lock on any workspace effectively locks them all. You mentioned that you are using the remote backend with Terrakube. Is it possible that the cross-workspace locking is something that Terrakube is doing in its server, rather than something OpenTofu is doing client-side? I'm not familiar with Terrakube so I don't know if this is true, but I just want to try to understand exactly where this constraint is coming from so we can figure out what it would take to weaken it in the way you described.
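To make the shape of that relationship a bit more concrete, here is a minimal Go sketch. The names (`Backend`, `StateManager`, `LockInfo`, `planWorkspace`) are simplified stand-ins rather than the exact OpenTofu interfaces, but they illustrate the call pattern described above: workspace name → state manager → lock.

```go
package lockexample

import "fmt"

// LockInfo describes who is requesting a lock and why. The real
// OpenTofu type carries more fields; this is a simplified stand-in.
type LockInfo struct {
	Operation string
	Who       string
}

// StateManager is a simplified stand-in for OpenTofu's per-workspace
// state manager: locking happens at this level, so in the CLI's model
// each workspace has its own independent lock.
type StateManager interface {
	Lock(info *LockInfo) (lockID string, err error)
	Unlock(lockID string) error
}

// Backend is a simplified stand-in for the backend API: it hands out
// one state manager per workspace name.
type Backend interface {
	StateMgr(workspace string) (StateManager, error)
}

// planWorkspace shows the call pattern described above: ask the backend
// for the workspace's state manager, then lock through it.
func planWorkspace(b Backend, workspace string) error {
	mgr, err := b.StateMgr(workspace)
	if err != nil {
		return err
	}
	id, err := mgr.Lock(&LockInfo{Operation: "plan", Who: "example@host"})
	if err != nil {
		return fmt.Errorf("locking workspace %q: %w", workspace, err)
	}
	defer mgr.Unlock(id)
	// ... read the state snapshot and produce a plan here ...
	return nil
}
```

Whether a lock on one workspace also blocks another workspace then depends entirely on how a concrete implementation of this interface behaves server-side, which is the question raised above.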
Putting the question of lock granularity aside for the moment, I think there is another potential improvement we could consider: today's state locking model is a single advisory lock that only one client can hold at a time, regardless of what they are intending to do with the state. In principle we could rework the locking API to differentiate between read-only (shared) locks and writable (exclusive) locks. A suitably-advanced state storage implementation could then choose to allow multiple read-only locks to be held at the same time, but to treat writable locks as exclusive both to other write locks and to other read-only locks. We could then arrange for OpenTofu to acquire a writable lock only for the commands that will both read and write state during their work, which includes (but isn't necessarily limited to) the commands that apply changes.
In particular, I think it should be safe for read-only commands to hold only a shared lock. In the saved plan workflow, OpenTofu saves enough information in the plan file to detect if the state has changed since the plan was created, so there is no need to hold a writable lock across both the plan and apply phases if you are willing to tolerate an error when someone tries to apply a stale plan. I think we would still need to give concrete implementations the option of treating all locks as exclusive, because not all state storage implementations have a good way to implement a read-write lock, but if we include that information in our lock requests then each implementation can presumably decide whether or not to differentiate between the two lock types. If we decide that this is worth pursuing then we should probably include it in #2157. In particular, hopefully we'd design it into the new plugin-based state storage API from the outset so that we don't need to make a breaking protocol change.
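As a rough sketch of how a lock request carrying the desired mode might look — all names here (`LockMode`, `LockRequest`, `Locker`) are hypothetical, not an existing OpenTofu API:

```go
package lockexample

// LockMode distinguishes read-only (shared) requests from read-write
// (exclusive) requests. These names are illustrative only.
type LockMode int

const (
	LockShared    LockMode = iota // multiple readers may hold this at once
	LockExclusive                 // writers need sole access
)

// LockRequest is what a command would send to the state storage,
// alongside the usual metadata about who is asking and why.
type LockRequest struct {
	Mode      LockMode
	Operation string // e.g. "plan" or "apply"
}

// Locker is a hypothetical storage-side interface. An implementation
// without real read-write locking can ignore Mode and treat every
// request as exclusive, preserving today's behavior.
type Locker interface {
	Lock(req LockRequest) (lockID string, err error)
	Unlock(lockID string) error
}
```

An implementation backed by storage that cannot express shared locks could simply treat every request as exclusive, which keeps today's behavior as the conservative default.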
Yep! I think I totally agree with you on the second part. Being able to run read-only commands concurrently would be really helpful! However, I'm not sure I understand your explanation about the lock mechanism being something Terrakube sets on the lock file rather than Tofu. From my perspective, when you run a plan, it seems to me that if a new plan is being run it will be queued until the previous plan finishes? Maybe I misunderstood something? Thanks!
@Cipher-08 I've removed your post as it appears to be an LLM summary of past comments with no additional substance.
Hello! Thanks everyone for your answers! I think I managed to solve the issue with Terraform environment variables!
For us that is an unrecommended approach and we would like to avoid it as much as possible! :) Nonetheless I pursued my research and found out that we can use the TF_DATA_DIR environment variable.
Assuming that we set that environment variable to a separate data directory for each environment, we keep the same file structure and we can run plans for several environments at the same time. Since this issue seems to have initiated a lock mechanism refactor I will leave it open, but feel free to close it if you want! :) Thanks for your help!
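To make the workaround concrete, here is a minimal sketch. It assumes TF_DATA_DIR is the variable in play (it controls where the `.terraform` data directory lives), and the environment names and var-file paths are made up for illustration; `tofu init` also needs to have been run with the same TF_DATA_DIR for each environment beforehand.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"sync"
)

func main() {
	// Hypothetical environment names; in practice these would come from
	// the folders touched by the pull request.
	envs := []string{"dev", "staging", "prod"}

	var wg sync.WaitGroup
	for _, env := range envs {
		wg.Add(1)
		go func(env string) {
			defer wg.Done()
			// Give each environment its own data directory so the
			// .terraform/terraform.tfstate files (and their locks)
			// don't collide between parallel runs.
			cmd := exec.Command("tofu", "plan", "-var-file="+env+".tfvars")
			cmd.Env = append(os.Environ(), "TF_DATA_DIR=.terraform-"+env)
			cmd.Stdout = os.Stdout
			cmd.Stderr = os.Stderr
			if err := cmd.Run(); err != nil {
				fmt.Printf("plan for %s failed: %v\n", env, err)
			}
		}(env)
	}
	wg.Wait()
}
```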
Hi @BenjaminDecreusefond! I'm glad you found a working solution. If you'd be willing, I'd still like to understand more about what you are doing here, since the solution you've described doesn't match my current understanding of how the system works. 🤔 You mentioned that you are using Terrakube and so I'd been assuming you were doing something like what they describe in CLI-driven Workflow, with a backend configuration pointing at the Terrakube server. If that were true then the state locking would be delegated to the remote server, rather than enforced on your local system, so I assume that something special is happening in your case, and I'd like to understand what that is to make sure you're depending on an intended behavior of the system, rather than on a coincidence that might change in future if either OpenTofu or Terrakube's behavior changes, or on something that might make the behavior unsafe for you in practice. 😬 I found some code in Terrakube that seems to implement the workspace locking API, and it does seem like it treats locking as a per-workspace problem. Therefore I would not expect any operation on a specific workspace to block any operation on any other workspace. From your original description I assumed you were talking about different workspace operations blocking one another, but on re-read it occurs to me that you might've been asking about multiple operations in the same workspace. Is that true? If that is true then that would explain why the operations were blocking each other before, but it still doesn't explain why selecting a different data directory for each environment avoids it. Do you have any ideas about what incorrect assumption I might be making here? This is the first time I've learned about Terrakube so I may be totally misunderstanding how it works or how you are using it. 😖
Hi @apparentlymart! I'll try to be as clear as possible! :) Terrakube supports two approaches for workspaces. One is the CLI-driven workflow and the other is the VCS-driven workflow. In our case we are using the VCS-driven workflow, and the idea was that we wanted to recreate a TFE-like VCS system for the Terrakube environment. Then we had the idea to create a lambda triggered by a webhook. The lambda would go into modified directories and run a speculative plan on the concerned project. Regarding the tree structure, it is important to note that each folder holds a single configuration that we plan against several environments. The issue I had in the first place was that when I ran two plans at the same time in the same directory, the second one was blocked by the lock. I think there might be a confusion about the piece of code you found in Terrakube. Terrakube offers two ways to apply terraform, one through the CLI-driven workflow and one through the VCS-driven workflow. My guess is that with the VCS-driven workflow we are using, the locking code you found does not come into play. I tried to be as clear as possible! Benjamin
Thanks for that extra context, @BenjaminDecreusefond. I think that all explains why the API code I was looking at is not important, but it leaves one question unanswered: why did the two plan operations block each other in the first place? I have no answer to that question. I can't think of any reason why that should be true. I also still don't understand why using a separate data directory for each environment avoids it. Perhaps I should just leave this unexplained. 😀 I just worry a little that your solution isn't working the way you think it is and that you might get surprised later if something changes. 😬
Hi @apparentlymart! Looking back at the error, I think I know why it fixed the issue!
As we can see in the error, the lock is indeed on .terraform/terraform.tfstate, which explains why moving them to separate folders fixed the issue!
Ahh, okay! That does seem to explain it. What you've encountered here is not actually the normal state locking, but instead the code that interacts with the backend configuration cached in .terraform/terraform.tfstate. The use of "state" to refer to this concept is some technical debt resulting from it being derived from a very old version of the state snapshot format that OpenTofu no longer uses for any purpose other than this special file. It's always confusing that this code generates messages referring to this file as "state"; these messages are all just inherited from the older code that this was derived from. This issue seems like a good prompt to review these legacy codepaths and understand what locking patterns they are using, and whether it's actually necessary to take and hold locks here. And along with that, it's also a good reminder to clean up all of this legacy messaging about "states" that often causes confusion like this.
Yes, I do agree with you! Also I think it could be nice to remove the locking mechanism for read-only commands as you mentioned earlier! Regards!
Thanks for confirming, @BenjaminDecreusefond. I think then we probably need to decide what to do with this issue. 🤔 What you originally discussed here was, it turns out, related to some awkward legacy behavior in the code that manages the "backend configuration state" file (.terraform/terraform.tfstate). However, you found that you can use the TF_DATA_DIR environment variable to give each of your environments its own data directory, which avoids the problem. We also have the separate question of whether we want to switch from a pure mutex to a rwlock-style locking strategy for the real OpenTofu state, as stored through a remote backend. Since we already have #2157 working towards a vision for how the backend concept might evolve in future, I think I'm just going to leave myself a comment there for now to remind me to add something to that RFC encouraging a future author of a "plugin-based state storage" RFC to consider whether we ought to design a shared vs. exclusive lock representation into the plugin API. Does that all seem reasonable to you?
Hi again! Since you found a suitable workaround for your situation, and since I already captured the question of shared vs. exclusive locks in a comment over in #2157, I'm going to close this issue now. I think there is still a valid question about whether the filesystem-level locking of the .terraform/terraform.tfstate file is really necessary. Thanks for your patience while we worked out what the problem was here!
OpenTofu Version
The problem in your OpenTofu project
Hello!
I’d like to get your insights on the locking mechanism in tofu. At our company, we use a single Terraform configuration to manage multiple environments, selecting specific environments through different variable files. All init and plan operations are run within the same working directory. We’re using an open-source remote backend called Terrakube, similar to Terraform Enterprise (TFE) with concepts like workspaces and working directories.
Our goal is to create a TFE-like VCS integration solution by triggering a Lambda function on pull request events. This setup works well when we run init and plan sequentially for each environment in a folder. However, with a large number of environments, sequential execution becomes time-consuming. Ideally, we want to run several tofu plan operations in parallel within the same directory.
The main challenge here is that the state file’s lock mechanism prevents parallel runs, as only one operation can access the state at a time.
Could you suggest a way to refactor this setup to allow for parallel plan operations while respecting the state locking requirements?
Attempted Solutions
Tried to make a copy of the working directory for each env, but the remote workspace working directory is unable to use TF_CLI_ARGS_plan.
Proposal
My idea is that, for the remote backend, we change the names .terraform and .terraform.lock to, for instance, .terraform-<workspace-name> and .terraform.lock.<workspace-name>. This would allow managing and applying different state files at the same time. It's a proposal, I'm really not sure it is doable nor a good idea! Lemme know! :)
References
No response