Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add force-lock command #1124

Open
jjshanks opened this issue Jan 15, 2024 · 5 comments
Open

Add force-lock command #1124

jjshanks opened this issue Jan 15, 2024 · 5 comments
Labels
pending-decision This issue has not been accepted for implementation nor rejected. It's still open to discussion. question Further information is requested rfc

Comments

@jjshanks
Copy link
Contributor

jjshanks commented Jan 15, 2024

Summary

This proposal is to add a new force-lock command.

Problem Statement

There is no way to manually lock a state via the CLI, only unlock one.

User-facing description

Usage would be similar to force-unlock except it wouldn't take a lock id it would return one

> tofu force-lock
Lock ID is 0d66876d-8ac8-4724-9cda-91d5592a96da

Technical Description

  • New command force-lock which will be just like force-unlock except it won't take a lock id and will call Lock() instead of Unlock() on the state manager
  • For backends that don't allow locking (if any) or need the process to keep running to maintain the lock an error message will be printed.
  • Documentation

Rationale and alternatives

Teams can have multiple reasons to want to lock state such as deployment freezes and refactoring.

Currently the only way to lock state is either editing the metadata of the state file by hand or using a utility like https://github.com/minamijoyo/tflock

An implementation over a subset of backends meets a need from the community. Expanding to other backends can be delayed until users of those backends request it.

Downsides

  • Command won't work for all backends and could lead to confusion/questions from users

Unresolved Questions

If a state is already locked should an error be returned or just return the existing lock id?

Related Issues

No response

Proof of Concept

No response

@jjshanks jjshanks added pending-decision This issue has not been accepted for implementation nor rejected. It's still open to discussion. rfc labels Jan 15, 2024
@ghost
Copy link

ghost commented Jan 24, 2024

Hey @jjshanks thank you for this RFC. There are some concerns around the ability to hold a lock with OpenTofu exiting on backends, such as Consul as documented here. Could you expand your RFC and address how this would work with regards to specific backends?

@ghost ghost added the question Further information is requested label Jan 24, 2024
@jjshanks
Copy link
Contributor Author

Thanks for the link @janosdebugs. To summarize the other discussions the potential options are

  1. Implement for all backends, this would require some backends (local, consul, etc.) to continue to run and enforce the lock.
    • For cases where the process needs to keep running could add a flag to make it explicit
  2. Implement for some backends, this would require outputting some kind of error when unsupported backends are invoked

I don't have any experience using backends that aren't s3/dynamo so I'm not sure I'm the best person to say how other backends should behave. But my gut reaction would be to give an error for anything that needs to keep running for the lock, because users of those backends already don't use force-unlock.

I'll update the RFC to reflect the above for now until there is more feedback.

@apparentlymart
Copy link
Contributor

apparentlymart commented Oct 9, 2024

The current API for locking is the Lock method of statemgr.Locker.

Right now that doesn't have any way to distinguish between a lock that can safely be dropped automatically when the process exits vs. a lock that must survive beyond the end of the current process (or return an error if that's not possible).

It seems like the smallest change to the existing API would be to extend statemgr.LockInfo with a new field that serves as a hint to the implementation that it should prefer to fail if it can't promise to preserve the lock beyond the life of the process. The local and consul backends could then check for that hint and return an error if it's set, unless we can find some reasonable way for those backends to create a more persistent lock in future.

It's a awkward that the default behavior for an unmodified backend would be to disregard the new hint field and acquire a useless lock that will immediately be dropped, but thankfully since we have all of the backends in-tree right now we can make sure they all get updated to support the new hint field (if needed) as part of the same PR.


This is perhaps an example of something that would be a little harder if we waited until after #382, because we'd no longer be able to update all existing backends to support this new capability. (Though I think we could still do it, just that we'd need to take a different API design strategy such as having force-lock call an entirely new method that existing backends don't support at all, and then the default would be for the new command to not be supported until all backends have been updated, so there would likely always be backend implementations out there that could support force-lock but nonetheless haven't been updated to support it yet.)

@darpham
Copy link

darpham commented Oct 25, 2024

Cross-posting to share the knowledge once force-unlock is implemented:)

If it's of any use to folks stumbling across this, here's a bash/zsh compatible interactive script to more easily unlock state. Could be updated for running in CI/CD in an automated fashion with some pre-validation - YMMY and use with due caution.

tf_unlock() {
    local LOCK_ID
    local ERROR_OUTPUT
    local LOCK_DETAILS

    # Capture both stdout and stderr
    ERROR_OUTPUT=$(terraform plan -json 2>&1)

    # Check if there's a lock error
    if [[ $ERROR_OUTPUT == *"Error acquiring the state lock"* ]]; then
        LOCK_DETAILS=$(echo "$ERROR_OUTPUT" | jq -r '.diagnostic.detail // empty' 2>/dev/null)
        if [[ -z "$LOCK_DETAILS" ]]; then
            LOCK_DETAILS=$(echo "$ERROR_OUTPUT" | grep -A10 "Lock Info:")
        fi
        LOCK_ID=$(echo "$LOCK_DETAILS" | awk '/ID:/ {print $2; exit}')

        echo "State is locked. Lock details:"
        echo "$LOCK_DETAILS"
        echo

        echo -n "Do you want to unlock this state? Type 'yes' to confirm: "
        read response
        if [[ "$response" == "yes" ]]; then
            echo "Attempting to unlock..."
            if terraform force-unlock --force "${LOCK_ID}"; then
                echo "Terraform state has been successfully unlocked!"
            else
                echo "Failed to unlock the state. Please check the error message above." >&2
                return 1
            fi
        else
            echo "Unlock cancelled."
            return 0
        fi
    elif [[ $ERROR_OUTPUT == *"Error:"* ]]; then
        echo "Error occurred while checking Terraform state:" >&2
        echo "$ERROR_OUTPUT" >&2
        return 1
    else
        echo "State is not locked. No action needed."
    fi
}

Originally posted by @darpham in hashicorp/terraform#30277 (comment)

@RLRabinowitz
Copy link
Contributor

@klongmitre tagging you here, this is the issue we've discussed at KubeCon recently.

If you are interested in this feature, please make sure to upvote the original issue, as we rely on the upvotes to determine community demand

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pending-decision This issue has not been accepted for implementation nor rejected. It's still open to discussion. question Further information is requested rfc
Projects
None yet
Development

No branches or pull requests

4 participants