
PyTorch 1.4 cannot load a model saved by 1.7 #48915

@Light--

🐛 Bug

A model trained with PyTorch 1.7.0 (CUDA 11.0.221) cannot be loaded with PyTorch 1.4.0 (CUDA 10.0.130).

To Reproduce

Steps to reproduce the behavior:

  1. Train the model and save it with 1.7
  2. Load it with 1.4:
torch.load('/home/user1/model_best_b.pth.tar')
Traceback (most recent call last):
  File "/data/user1/pkgs/conda/envs/drc/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3417, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-10-13d633918c2f>", line 1, in <module>
    torch.load('/home/wangjunchu/pjs/fae/paper/ckpt/to_test/20201204175958/Arcface50_t4_bs50_bslr_0.001_fclr_0.01/model_best_bacc.pth.tar')
  File "/data/user1/pkgs/conda/envs/drc/lib/python3.7/site-packages/torch/serialization.py", line 527, in load
    with _open_zipfile_reader(f) as opened_zipfile:
  File "/data/user1/pkgs/conda/envs/drc/lib/python3.7/site-packages/torch/serialization.py", line 224, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at /pytorch/caffe2/serialize/inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old. (init at /pytorch/caffe2/serialize/inline_container.cc:132)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7f3deaa57193 in /data/user1/pkgs/conda/envs/drc/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::init() + 0x1f5b (0x7f3d447949eb in /data/user1/pkgs/conda/envs/drc/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #2: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::string const&) + 0x64 (0x7f3d44795c04 in /data/user1/pkgs/conda/envs/drc/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x6c6536 (0x7f3dcc2d4536 in /data/user1/pkgs/conda/envs/drc/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x295a74 (0x7f3dcbea3a74 in /data/user1/pkgs/conda/envs/drc/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: _PyMethodDef_RawFastCallDict + 0x24d (0x55ba98d39bfd in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #6: _PyCFunction_FastCallDict + 0x21 (0x55ba98d39d81 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #7: _PyObject_Call_Prepend + 0x63 (0x55ba98d37a73 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #8: PyObject_Call + 0x6e (0x55ba98d29fde in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #9: <unknown function> + 0xabddd (0x55ba98cadddd in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #10: _PyObject_FastCallKeywords + 0x128 (0x55ba98d7ff78 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #11: _PyEval_EvalFrameDefault + 0x5389 (0x55ba98dd2a39 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #12: _PyEval_EvalCodeWithName + 0x5da (0x55ba98d1766a in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #13: _PyFunction_FastCallDict + 0x1d5 (0x55ba98d184c5 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #14: _PyObject_Call_Prepend + 0x63 (0x55ba98d37a73 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #15: <unknown function> + 0x17d1ba (0x55ba98d7f1ba in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #16: _PyObject_FastCallKeywords + 0x128 (0x55ba98d7ff78 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #17: _PyEval_EvalFrameDefault + 0x4a96 (0x55ba98dd2146 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #18: _PyEval_EvalCodeWithName + 0x2f9 (0x55ba98d17389 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #19: _PyFunction_FastCallKeywords + 0x387 (0x55ba98d6b2b7 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #20: _PyEval_EvalFrameDefault + 0x4b39 (0x55ba98dd21e9 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #21: _PyEval_EvalCodeWithName + 0x2f9 (0x55ba98d17389 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #22: PyEval_EvalCodeEx + 0x44 (0x55ba98d182b4 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #23: PyEval_EvalCode + 0x1c (0x55ba98d182dc in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #24: <unknown function> + 0x1db30d (0x55ba98ddd30d in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #25: _PyMethodDef_RawFastCallKeywords + 0xe9 (0x55ba98d6b939 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #26: _PyCFunction_FastCallKeywords + 0x21 (0x55ba98d6bbd1 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #27: _PyEval_EvalFrameDefault + 0x47a4 (0x55ba98dd1e54 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #28: _PyGen_Send + 0x2a2 (0x55ba98d80f82 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #29: _PyEval_EvalFrameDefault + 0x1a76 (0x55ba98dcf126 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #30: _PyGen_Send + 0x2a2 (0x55ba98d80f82 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #31: _PyEval_EvalFrameDefault + 0x1a76 (0x55ba98dcf126 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #32: _PyGen_Send + 0x2a2 (0x55ba98d80f82 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #33: _PyMethodDef_RawFastCallKeywords + 0x8d (0x55ba98d6b8dd in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #34: _PyMethodDescr_FastCallKeywords + 0x4f (0x55ba98d7fdbf in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #35: _PyEval_EvalFrameDefault + 0x4c9d (0x55ba98dd234d in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #36: _PyFunction_FastCallKeywords + 0xfb (0x55ba98d6b02b in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #37: _PyEval_EvalFrameDefault + 0x416 (0x55ba98dcdac6 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #38: _PyFunction_FastCallKeywords + 0xfb (0x55ba98d6b02b in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #39: _PyEval_EvalFrameDefault + 0x690 (0x55ba98dcdd40 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #40: _PyEval_EvalCodeWithName + 0x2f9 (0x55ba98d17389 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #41: _PyFunction_FastCallKeywords + 0x387 (0x55ba98d6b2b7 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #42: _PyEval_EvalFrameDefault + 0x14d4 (0x55ba98dceb84 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #43: _PyFunction_FastCallKeywords + 0xfb (0x55ba98d6b02b in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #44: _PyEval_EvalFrameDefault + 0x690 (0x55ba98dcdd40 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #45: _PyFunction_FastCallKeywords + 0xfb (0x55ba98d6b02b in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #46: _PyEval_EvalFrameDefault + 0x690 (0x55ba98dcdd40 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #47: _PyEval_EvalCodeWithName + 0x2f9 (0x55ba98d17389 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #48: _PyFunction_FastCallKeywords + 0x325 (0x55ba98d6b255 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #49: _PyEval_EvalFrameDefault + 0x690 (0x55ba98dcdd40 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #50: _PyFunction_FastCallKeywords + 0xfb (0x55ba98d6b02b in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #51: _PyEval_EvalFrameDefault + 0x416 (0x55ba98dcdac6 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #52: _PyFunction_FastCallKeywords + 0xfb (0x55ba98d6b02b in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #53: _PyEval_EvalFrameDefault + 0x4b39 (0x55ba98dd21e9 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #54: _PyEval_EvalCodeWithName + 0x2f9 (0x55ba98d17389 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #55: PyEval_EvalCodeEx + 0x44 (0x55ba98d182b4 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #56: PyEval_EvalCode + 0x1c (0x55ba98d182dc in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #57: <unknown function> + 0x22c664 (0x55ba98e2e664 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #58: PyRun_FileExFlags + 0xa1 (0x55ba98e38a91 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #59: PyRun_SimpleFileExFlags + 0x1c3 (0x55ba98e38c83 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #60: <unknown function> + 0x237db5 (0x55ba98e39db5 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #61: _Py_UnixMain + 0x3c (0x55ba98e39edc in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #62: __libc_start_main + 0xf0 (0x7f3df6c6e830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #63: <unknown function> + 0x1db3e0 (0x55ba98ddd3e0 in /data/user1/pkgs/conda/envs/drc/bin/python)
  3. Load it with 1.7:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user1/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 594, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/home/user1/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 853, in _load
    result = unpickler.load()
  File "/home/user1/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 845, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/home/user1/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 834, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/home/user1/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "/home/user1/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 157, in _cuda_deserialize
    return obj.cuda(device)
  File "/home/user1/anaconda3/lib/python3.8/site-packages/torch/_utils.py", line 79, in _cuda
    return new_type(self.size()).copy_(self, non_blocking)
  File "/home/user1/anaconda3/lib/python3.8/site-packages/torch/cuda/__init__.py", line 462, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory
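
The two tracebacks above look like separate problems. The first reports that the file uses format version 3 while 1.4 can read at most version 2. The second is a CUDA out-of-memory error raised while the tensors are being restored to their original GPU. A minimal sketch of a possible workaround, to be run in the 1.7 environment: load with `map_location="cpu"` so nothing is placed on the GPU, then re-save with `_use_new_zipfile_serialization=False`, a `torch.save` keyword (available since 1.6) that writes the older non-zip format. The function name `resave_for_old_torch` and the paths in the comment are hypothetical, and this has not been tried on the checkpoint from this report.

```python
import torch


def resave_for_old_torch(src_path, dst_path):
    """Load a checkpoint on the CPU and write it back in the legacy format.

    Assumption: this runs under PyTorch >= 1.6, where torch.save accepts
    the _use_new_zipfile_serialization keyword.
    """
    # map_location="cpu" keeps every storage in host memory, so loading
    # does not depend on free GPU memory.
    checkpoint = torch.load(src_path, map_location="cpu")

    # Write the pre-1.6 (non-zip) format instead of the zip-based one.
    torch.save(checkpoint, dst_path, _use_new_zipfile_serialization=False)
    return checkpoint


# Hypothetical usage (placeholder paths):
# resave_for_old_torch("model_best_b.pth.tar", "model_best_b_legacy.pth.tar")
```

The re-saved file can then be tried with `torch.load(..., map_location="cpu")` in the 1.4 environment. Objects inside the checkpoint that pickle references to classes missing in 1.4 may still fail to load.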

Expected behavior

The checkpoint loads normally.

Environment

env of 1.7:

PyTorch version: 1.7.0
Is debug build: True
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: TITAN RTX
GPU 1: TITAN RTX

Nvidia driver version: 455.38
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn.so.8
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.16.2
[pip3] torch==1.4.0
[pip3] torchvision==0.5.0
[pip3] torchviz==0.0.1
[conda] blas                      1.0                         mkl    defaults
[conda] cudatoolkit               10.1.243             h6bb024c_0    http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] mkl                       2020.1                      217    defaults
[conda] mkl-service               2.3.0            py38he904b0f_0    defaults
[conda] mkl_fft                   1.1.0            py38h23d657b_0    defaults
[conda] mkl_random                1.1.1            py38h0573a6f_0    defaults
[conda] numpy                     1.18.5           py38ha1c710e_0    defaults
[conda] numpy-base                1.18.5           py38hde5b4d6_0    defaults
[conda] numpydoc                  1.1.0                      py_0    defaults
[conda] pytorch                   1.7.0           py3.8_cuda10.1.243_cudnn7.6.3_0    http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
[conda] torchaudio                0.7.0                      py38    http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
[conda] torchvision               0.8.1                py38_cu101    http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch

env of 1.4:

PyTorch version: 1.4.0
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 16.04.3 LTS (x86_64)
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Clang version: Could not collect
CMake version: Could not collect
Python version: 3.7 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: TITAN RTX
GPU 1: TITAN RTX
GPU 2: TITAN RTX
GPU 3: TITAN RTX

Nvidia driver version: 440.44
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.18.3
[pip3] torch==1.4.0
[pip3] torchvision==0.6.0
[conda] cudatoolkit               10.1.243             h6bb024c_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] numpy                     1.16.2                   pypi_0    pypi
[conda] torch                     1.4.0                    pypi_0    pypi
[conda] torchvision               0.5.0                    pypi_0    pypi
[conda] torchviz                  0.0.1                    pypi_0    pypi

Additional context

A model trained and loaded with 1.4 works fine.
This looks like a weird bug and I don't know the cause.
It's urgent, please help.
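
To check which file format a given checkpoint uses without importing torch at all, something like the following sketch may help. It relies on two assumptions: that zip-based checkpoints store their format version as a small text record whose name ends in `/version` (or is exactly `version`), and that checkpoints in the older format are not zip archives. The function name `checkpoint_format_version` is hypothetical.

```python
import zipfile


def checkpoint_format_version(path):
    """Return the format version stored in a zip-based checkpoint.

    Returns None when the file is not a zip archive (assumed to be the
    legacy format) or when no version record is found.
    """
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            # Assumption: the record is named "<archive name>/version".
            if name == "version" or name.endswith("/version"):
                return int(archive.read(name).decode("ascii").strip())
    return None
```

If this returns 3 for the file saved by 1.7, it matches the "version 3 ... maximum supported version for reading is 2" message in the 1.4 traceback.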
