Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes in the hash algorithm #5161

Draft
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

ashwini-orchestral
Copy link
Contributor

For security purposes, the hash function for symmetric encryption is changed ie. from SHA1 to SHA256. Also, a migration script is added to update the database with new encrypted values.

@pull-request-size pull-request-size bot added the size/L PR that changes 100-499 lines. Requires some effort to review. label Feb 22, 2021
@arm4b arm4b added this to the 3.5.0 milestone Feb 22, 2021
@blag
Copy link
Contributor

blag commented Feb 22, 2021

SHA-1 is not used in encryption, it is used in HMAC, which is, to my knowledge, still currently considered secure.

https://crypto.stackexchange.com/questions/26510/why-is-hmac-sha1-still-considered-secure/26518#26518

We should walk, but not necessarily run, away from SHA1 as HMAC.

@Kami
Copy link
Member

Kami commented Feb 22, 2021

Thanks for the contribution.

As @blag pointed out, SHA1 is used for MAC and not for actual symmetric encryption.

In theory, I'm fine with changing it, but in case we do decide to change the MAC algorithm we should re-evaluate if the current implementation (both for the MAC and actual symmetric encryption) is indeed (still recommended) and a safe choice - and if there are more things we should change, we should change those as well (but of course since it's a security sensitive code, we need to be careful). And if we do it, maybe we should just use sha512 to be future proof - that code is not used in that many places so I don't think it should add tons of overhead.

We used to use keyczar which is not supported anymore so we migrated the same code and algorithms to cryptography. It's likely that there are safer and more robust defaults available these days.

As far as implementation and roll out goes - StackStorm is a distributed system which means we should approach all such changes in a specific manner to ensure a consistent roll out without issues (it basically means that the code needs to support both versions for a while aka until changes are rolled out to all the services and migration script finishes).

We don't have database object versioning in place and we also can't rely on a migration script by itself - the code needs to support both implementations - basically on read we need to support sha1 and sha256, but when the value is updated, we should write it out in a new format. This way we also don't need to add another database field.

And if we want to make sure all the values are indeed migrated in a reasonable time frame, we should instruct users to run a migration script or perhaps go with the read-repair approach (on the read, if it still uses a new format, re-write the object with a new format - probably the migration script is better and easier to do).

@arm4b
Copy link
Member

arm4b commented Feb 22, 2021

The comment about backward compatibility makes a lot of sense to me 👍
We could also add a deprecation warning to logs in the case when an old key format is still used.

@m4dcoder
Copy link
Contributor

I think the discussion should be focus on whether the sha1 function use for hmac poses security risk or tainted image of st2 on a vulnerability scan.

On the lesser issue on migration, I need some explanation why users cannot run a migration script during st2 upgrade. User just run the script to re-encrypt KVP in the datastore and move on. We've asked users to run migration script before in prior releases. Why is this an issue now?

@m4dcoder
Copy link
Contributor

There's better understanding on the issue here after some discussion. The data is still encrypted with AES-256 per docs at https://docs.stackstorm.com/datastore.html?highlight=encryption_key_path#securing-secrets-admin-only. The sha1 hash is used to generate the hmac hash to verify the encrypted text. So technically, this is low risk. However, per feedback, there could be collisions and change is still recommended because of the collision weaknesses.

@cognifloyd
Copy link
Member

We just reformatted the code with black. (Hooray!) And this PR got caught in the cross fire too. (Arrgh!)
Luckily, merging master into this PR should not have many conflicts. Cheers!

@amanda11 amanda11 modified the milestones: 3.5.0, 3.6.0 Jun 29, 2021
@CLAassistant
Copy link

CLAassistant commented Sep 6, 2021

CLA assistant check
All committers have signed the CLA.

Copy link
Member

@cognifloyd cognifloyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is ready for 3.6.0, as it doesn't look finished to me. I suspect we'll need to discuss it more before we can merge it as well.

tools/migrate_encrypted_value_to_db.py Outdated Show resolved Hide resolved
tools/migrate_encrypted_value_to_db.py Outdated Show resolved Hide resolved
Comment on lines +75 to +76
crypto_key = AESKey.generate()
new_encrypted_value = symmetric_encrypt(crypto_key, plaintext)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we generating a new crypto_key for every keyvalue entry?
I would imagine that we should generate the new key once, and save it to some path, and then use that key to reencrypt all the values.

Comment on lines 39 to +40
value = me.StringField()
new_encrypted_value = me.StringField()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there a new field for the value?
I see where new_encrypted_value gets written in the migration script, but I don't see anything else that consumes it (eg to copy it to the value field after successfully reencrypting all the values).

@cognifloyd
Copy link
Member

@ashwini-orchestral Can you:

  • add a changelog entry
  • could you address my other comments:
    • move crypto_key = AESKey.generate() somewhere else so it's not regenerated for every key, and save it. We don't want to regenerate the all the values with a new key only to discard the key (which would prevent decrypting the newly re-encrypted values).
    • drop new_encrypted_value from the model or explain why its needed.

@cognifloyd cognifloyd marked this pull request as draft April 1, 2022 19:08
@cognifloyd
Copy link
Member

I'm marking this as a draft until it the implementation is complete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
database migrations security size/L PR that changes 100-499 lines. Requires some effort to review. status:under discussion
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants