Skip to content

Conversation

@charettes
Copy link
Contributor

@charettes charettes commented Nov 18, 2020

One caveat is that they don't use the RDB format but pickle seemed like a good candidate to mimic what is expected to be an opaque and non-standard format.

https://redis.io/commands/dump

DUMP key

Serialize the value stored at key in a Redis-specific format and return it to the user. The returned value can be synthesized back into a Redis key using the RESTORE command.

The serialization format is opaque and non-standard, however it has a few semantic characteristics:

It contains a 64-bit checksum that is used to make sure errors will be detected. The RESTORE command makes sure to check the checksum before synthesizing a key using the serialized value.

Opted for sha1 here.

Values are encoded in the same format used by RDB.

This could unfortunately not be achieve as it would be impractical.

https://redis.io/commands/restore

RESTORE key ttl serialized-value

If ttl is 0 the key is created without any expire, otherwise the specified expire time (in milliseconds) is set.

@coveralls
Copy link

coveralls commented Nov 18, 2020

Coverage Status

Coverage decreased (-0.2%) to 95.961% when pulling d6e89d2 on charettes:dump-restore into e04fc6e on jamesls:master.

@bmerry
Copy link
Collaborator

bmerry commented Nov 18, 2020

Thanks. I'm going to have to think about this for a bit. In my use of fakeredis at work we have actually implemented DUMP with the redis format, but only for a few types (string, hash and zset). You can see it here. The rdbtools package contains all the code for decoding the redis dump format, as well as documentation of it. So it would be nice to implement a DUMP that uses the redis format, but it would still require a lot more work, and maybe it's worth getting something going in the meantime.

I also try to avoid pickle where possible just because of the security implications (loading a pickle can execute arbitrary code). That's probably not such an issue for a unit testing library, but maybe something like msgpack would work just as well?

@charettes
Copy link
Contributor Author

charettes commented Nov 18, 2020

So it would be nice to implement a DUMP that uses the redis format, but it would still require a lot more work, and maybe it's worth getting something going in the meantime.

That's my general feeling as well. It makes sense to use RDB if that's the actual data storage of your server but otherwise it feels like there's too much extra work involved for the sake of achieving this property

It also brings the question of which version of RDB should be used, what if it changes between Redis versions? Do do you now support both format? Require a version flag to be specified on client creation?

I also try to avoid pickle where possible just because of the security implications (loading a pickle can execute arbitrary code). That's probably not such an issue for a unit testing library, but maybe something like msgpack would work just as well?

I also think this isn't an issue for a library that advertise itself as having a single purpose: to write unittests. I guess other formats could work as well (e.g. base64 encoded json.dump) but pickle had the advantage of preserving the original Python type without requiring any form of massaging on serde (e.g. handling of byte strings, sets, dicts) and avoid another external dependency since it's part of the stdlib.

@bmerry
Copy link
Collaborator

bmerry commented Nov 18, 2020

I also think this isn't an issue for a library that advertise itself as having a single purpose: to write unittests.

Just because one builds a tool for a purpose doesn't mean people won't find other uses for it (e.g. #221) 😄 But I think it probably falls into the category of "good enough for now". I'm a little swamped at the moment, but I'll try to look at this in the next week - it does look like a nice small change.

@bmerry
Copy link
Collaborator

bmerry commented Nov 19, 2020

I've decided I'm happy with the pickle approach for now. The code you've written so far looks good, but there are a few gaps in the implementation, and a few changes I'd like:

  • In the README (where you've mentioned that pickle is used), can you add a warning about security e.g. "Warning: Do not use RESTORE with untrusted data, as a malicious pickle can execute arbitrary code."
  • There should be a test of RESTORE with a non-zero TTL. Without freezegun it's probably not practical to check that the TTL was restored precisely, but a test that some TTL was restored would be good, maybe along with a very rough bounds check to catch any units mixup between ms and seconds.
  • The RESTORE command returns an error if the target key already exists - that needs to be implemented and tested.
  • There are several optional parameters to RESTORE. I think at a minimum REPLACE should be supported, and ABSTIME should be easy enough to support too. The others aren't really useful with fakeredis, but if it's not too difficult it wouldn't hurt to accept but ignore them.

@charettes
Copy link
Contributor Author

@bmerry thanks for the quick feedback.

All of above points were addressed except for support for ABSTTL and noop IDLETIME and FREQ as redis-py doesn't support these options yet as I assume they are quite niche

https://github.com/andymccurdy/redis-py/blob/8c176cdc7f36e2cbcd3254768766035a7b7cd8b3/redis/client.py#L1814-L1822

I have a commit ready for ABSTTL though assuming redis-py catches up which I'll submit as another PR based on this one. The test it include fails with a TypeError though for the reasons explained above but I guess work could be reused if support for absttl=True ever lands upstream.

Copy link
Collaborator

@bmerry bmerry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work. Just a few more niggles I missed the first time.

else:
raise SimpleError(SYNTAX_ERROR_MSG)
if key and not replace:
raise redis.ResponseError(RESTORE_KEY_EXISTS)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these redis.ResponseErrors should all be SimpleErrors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, here's what you get using redis-py against an existing Redis server.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh looks like SimpleError are turned into ResponseError automatically.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. fakeredis also support aioredis, where SimpleError gets turned into an aioredis exception.

now_us %= 1000000
return [str(now_s).encode(), str(now_us).encode()]

# Generic commands
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this line, and move the new code to the "Key commands" section (because that's the category they're listed under in the redis command reference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts about adjusting the README to match the Redis command reference? That's where I got the order from.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what's happened is that the data that backs the command reference has two names for each section e.g. the URL for the "Keys" section is https://redis.io/commands#generic. The script that generates that section of the README is using the short names (hence "generic") but the comments in _server.py use the long names.


# Generic commands

@command((Key(),))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need the following to make this do the right thing if the key doesn't exist (which also needs a test):

Suggested change
@command((Key(),))
@command((Key(missing_return=None),))

One caveat is that they don't use the RDB format but pickle seemed like a good
candidate to mimic what is expected to be an opaque and non-standard format.
@charettes
Copy link
Contributor Author

@bmerry all comments should be addressed.

@bmerry
Copy link
Collaborator

bmerry commented Nov 23, 2020

Thanks for the work - all looks good now!

@bmerry bmerry merged commit 4cfb9d3 into jamesls:master Nov 23, 2020
@charettes charettes deleted the dump-restore branch November 23, 2020 14:44
@charettes
Copy link
Contributor Author

Thanks for the merge @bmerry. Any idea when a new release containing these changes will be released? We're currently using a monkeypatch of FakeServer we'd like to eventually get rid of.

@bmerry
Copy link
Collaborator

bmerry commented Nov 23, 2020

Any idea when a new release containing these changes will be released?

There's one PR in progress which I hope to get merged this week before a release. I just have to remember to make time in my day job. If you haven't seen anything by next Thursday then feel free to remind me again.

@charettes
Copy link
Contributor Author

Thanks for the release @bmerry, appreciate 🙇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants