What's the difference between `active_bytes` and `reserved_bytes`?

I need to demonstrate that gradient checkpointing actually reduces GPU memory usage during backpropagation. In the output I'm looking at, the two leftmost columns show `active_bytes` and `reserved_bytes`. In my test, `active_bytes` reads `3.83G`, while `reserved_bytes` reads `9.35G`. Why does PyTorch still reserve that much GPU memory?
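To make the two counters concrete, here is a toy model of a caching allocator in plain Python. It is a hypothetical, heavily simplified sketch, not PyTorch's real allocator (which also splits blocks, rounds sizes and tracks streams). The class `ToyCachingAllocator` and its methods are invented for illustration. The idea it shows is that PyTorch's CUDA caching allocator keeps freed blocks reserved for reuse instead of handing them back to the driver, so reserved memory can stay near its peak while active memory drops. In a real run, the corresponding counters are available from `torch.cuda.memory_stats()` under the keys `"active_bytes.all.current"` and `"reserved_bytes.all.current"`, and unused cached blocks can be released with `torch.cuda.empty_cache()`.

```python
class ToyCachingAllocator:
    """Hypothetical, simplified caching allocator (NOT PyTorch's implementation).

    No block splitting, no size rounding, no streams. It only illustrates
    why "reserved" bytes can stay well above "active" bytes.
    """

    def __init__(self):
        self.active_bytes = 0    # bytes currently held by live tensors
        self.reserved_bytes = 0  # bytes obtained from the device in total
        self._free_blocks = []   # freed blocks kept in the cache for reuse

    def malloc(self, size):
        # Best fit: reuse the smallest cached block that is large enough.
        candidates = [b for b in self._free_blocks if b >= size]
        if candidates:
            block = min(candidates)
            self._free_blocks.remove(block)
        else:
            # Nothing in the cache fits, so ask the device for a new block.
            block = size
            self.reserved_bytes += block
        self.active_bytes += block
        return block

    def free(self, block):
        # The block returns to the cache, not to the device,
        # so reserved_bytes is unchanged.
        self.active_bytes -= block
        self._free_blocks.append(block)

    def empty_cache(self):
        # Give every unused cached block back to the device.
        self.reserved_bytes -= sum(self._free_blocks)
        self._free_blocks.clear()


if __name__ == "__main__":
    GiB = 1024 ** 3
    alloc = ToyCachingAllocator()

    # "Forward pass": four large activations are alive at the same time.
    acts = [alloc.malloc(2 * GiB) for _ in range(4)]

    # "Backward pass": the activations are freed once they are consumed.
    for a in acts:
        alloc.free(a)

    # A later allocation is served from the cache; no new device memory.
    alloc.malloc(2 * GiB)
    print(f"active={alloc.active_bytes / GiB:.1f} GiB "
          f"reserved={alloc.reserved_bytes / GiB:.1f} GiB")
    # prints "active=2.0 GiB reserved=8.0 GiB"

    alloc.empty_cache()
    print(f"after empty_cache: reserved={alloc.reserved_bytes / GiB:.1f} GiB")
    # prints "after empty_cache: reserved=2.0 GiB"
```

In this model the gap between the two numbers is simply memory that was needed at the peak and is being held for reuse, which is why comparing peak values (for example `torch.cuda.max_memory_allocated()`) with and without checkpointing is usually a more direct measure than the current reserved figure.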