🐛 Bug
A model trained and saved with PyTorch 1.7.0 (CUDA 11.0.221) cannot be loaded with PyTorch 1.4.0 (CUDA 10.0.130).
To Reproduce
Steps to reproduce the behavior:
- Train a model and save it with 1.7.
- Load it with 1.4:
torch.load('/home/user1/model_best_b.pth.tar')
Traceback (most recent call last):
File "/data/user1/pkgs/conda/envs/drc/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3417, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-10-13d633918c2f>", line 1, in <module>
torch.load('/home/wangjunchu/pjs/fae/paper/ckpt/to_test/20201204175958/Arcface50_t4_bs50_bslr_0.001_fclr_0.01/model_best_bacc.pth.tar')
File "/data/user1/pkgs/conda/envs/drc/lib/python3.7/site-packages/torch/serialization.py", line 527, in load
with _open_zipfile_reader(f) as opened_zipfile:
File "/data/user1/pkgs/conda/envs/drc/lib/python3.7/site-packages/torch/serialization.py", line 224, in __init__
super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at /pytorch/caffe2/serialize/inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old. (init at /pytorch/caffe2/serialize/inline_container.cc:132)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7f3deaa57193 in /data/user1/pkgs/conda/envs/drc/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::init() + 0x1f5b (0x7f3d447949eb in /data/user1/pkgs/conda/envs/drc/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #2: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::string const&) + 0x64 (0x7f3d44795c04 in /data/user1/pkgs/conda/envs/drc/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x6c6536 (0x7f3dcc2d4536 in /data/user1/pkgs/conda/envs/drc/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x295a74 (0x7f3dcbea3a74 in /data/user1/pkgs/conda/envs/drc/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: _PyMethodDef_RawFastCallDict + 0x24d (0x55ba98d39bfd in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #6: _PyCFunction_FastCallDict + 0x21 (0x55ba98d39d81 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #7: _PyObject_Call_Prepend + 0x63 (0x55ba98d37a73 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #8: PyObject_Call + 0x6e (0x55ba98d29fde in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #9: <unknown function> + 0xabddd (0x55ba98cadddd in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #10: _PyObject_FastCallKeywords + 0x128 (0x55ba98d7ff78 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #11: _PyEval_EvalFrameDefault + 0x5389 (0x55ba98dd2a39 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #12: _PyEval_EvalCodeWithName + 0x5da (0x55ba98d1766a in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #13: _PyFunction_FastCallDict + 0x1d5 (0x55ba98d184c5 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #14: _PyObject_Call_Prepend + 0x63 (0x55ba98d37a73 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #15: <unknown function> + 0x17d1ba (0x55ba98d7f1ba in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #16: _PyObject_FastCallKeywords + 0x128 (0x55ba98d7ff78 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #17: _PyEval_EvalFrameDefault + 0x4a96 (0x55ba98dd2146 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #18: _PyEval_EvalCodeWithName + 0x2f9 (0x55ba98d17389 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #19: _PyFunction_FastCallKeywords + 0x387 (0x55ba98d6b2b7 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #20: _PyEval_EvalFrameDefault + 0x4b39 (0x55ba98dd21e9 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #21: _PyEval_EvalCodeWithName + 0x2f9 (0x55ba98d17389 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #22: PyEval_EvalCodeEx + 0x44 (0x55ba98d182b4 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #23: PyEval_EvalCode + 0x1c (0x55ba98d182dc in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #24: <unknown function> + 0x1db30d (0x55ba98ddd30d in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #25: _PyMethodDef_RawFastCallKeywords + 0xe9 (0x55ba98d6b939 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #26: _PyCFunction_FastCallKeywords + 0x21 (0x55ba98d6bbd1 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #27: _PyEval_EvalFrameDefault + 0x47a4 (0x55ba98dd1e54 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #28: _PyGen_Send + 0x2a2 (0x55ba98d80f82 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #29: _PyEval_EvalFrameDefault + 0x1a76 (0x55ba98dcf126 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #30: _PyGen_Send + 0x2a2 (0x55ba98d80f82 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #31: _PyEval_EvalFrameDefault + 0x1a76 (0x55ba98dcf126 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #32: _PyGen_Send + 0x2a2 (0x55ba98d80f82 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #33: _PyMethodDef_RawFastCallKeywords + 0x8d (0x55ba98d6b8dd in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #34: _PyMethodDescr_FastCallKeywords + 0x4f (0x55ba98d7fdbf in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #35: _PyEval_EvalFrameDefault + 0x4c9d (0x55ba98dd234d in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #36: _PyFunction_FastCallKeywords + 0xfb (0x55ba98d6b02b in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #37: _PyEval_EvalFrameDefault + 0x416 (0x55ba98dcdac6 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #38: _PyFunction_FastCallKeywords + 0xfb (0x55ba98d6b02b in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #39: _PyEval_EvalFrameDefault + 0x690 (0x55ba98dcdd40 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #40: _PyEval_EvalCodeWithName + 0x2f9 (0x55ba98d17389 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #41: _PyFunction_FastCallKeywords + 0x387 (0x55ba98d6b2b7 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #42: _PyEval_EvalFrameDefault + 0x14d4 (0x55ba98dceb84 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #43: _PyFunction_FastCallKeywords + 0xfb (0x55ba98d6b02b in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #44: _PyEval_EvalFrameDefault + 0x690 (0x55ba98dcdd40 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #45: _PyFunction_FastCallKeywords + 0xfb (0x55ba98d6b02b in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #46: _PyEval_EvalFrameDefault + 0x690 (0x55ba98dcdd40 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #47: _PyEval_EvalCodeWithName + 0x2f9 (0x55ba98d17389 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #48: _PyFunction_FastCallKeywords + 0x325 (0x55ba98d6b255 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #49: _PyEval_EvalFrameDefault + 0x690 (0x55ba98dcdd40 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #50: _PyFunction_FastCallKeywords + 0xfb (0x55ba98d6b02b in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #51: _PyEval_EvalFrameDefault + 0x416 (0x55ba98dcdac6 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #52: _PyFunction_FastCallKeywords + 0xfb (0x55ba98d6b02b in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #53: _PyEval_EvalFrameDefault + 0x4b39 (0x55ba98dd21e9 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #54: _PyEval_EvalCodeWithName + 0x2f9 (0x55ba98d17389 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #55: PyEval_EvalCodeEx + 0x44 (0x55ba98d182b4 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #56: PyEval_EvalCode + 0x1c (0x55ba98d182dc in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #57: <unknown function> + 0x22c664 (0x55ba98e2e664 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #58: PyRun_FileExFlags + 0xa1 (0x55ba98e38a91 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #59: PyRun_SimpleFileExFlags + 0x1c3 (0x55ba98e38c83 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #60: <unknown function> + 0x237db5 (0x55ba98e39db5 in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #61: _Py_UnixMain + 0x3c (0x55ba98e39edc in /data/user1/pkgs/conda/envs/drc/bin/python)
frame #62: __libc_start_main + 0xf0 (0x7f3df6c6e830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #63: <unknown function> + 0x1db3e0 (0x55ba98ddd3e0 in /data/user1/pkgs/conda/envs/drc/bin/python)
- Load it with 1.7:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/user1/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 594, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/home/user1/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 853, in _load
result = unpickler.load()
File "/home/user1/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 845, in persistent_load
load_tensor(data_type, size, key, _maybe_decode_ascii(location))
File "/home/user1/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 834, in load_tensor
loaded_storages[key] = restore_location(storage, location)
File "/home/user1/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 175, in default_restore_location
result = fn(storage, location)
File "/home/user1/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 157, in _cuda_deserialize
return obj.cuda(device)
File "/home/user1/anaconda3/lib/python3.8/site-packages/torch/_utils.py", line 79, in _cuda
return new_type(self.size()).copy_(self, non_blocking)
File "/home/user1/anaconda3/lib/python3.8/site-packages/torch/cuda/__init__.py", line 462, in _lazy_new
return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory
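The 1.4 traceback goes through `_open_zipfile_reader`, so the checkpoint written by 1.7 is a zip archive, and the failed assertion reports file format version 3 against a maximum readable version of 2. As a sketch for checking this without importing torch, the helper below (the name `checkpoint_format_version` is hypothetical) opens the file with Python's standard `zipfile` module and reads the version record. It assumes the archive stores that number as text in a top-level `<prefix>/version` entry; this is an assumption about the on-disk layout, not a documented PyTorch API.

```python
import zipfile


def checkpoint_format_version(path):
    """Return the format version recorded in a zip-based PyTorch checkpoint.

    Returns None if the file is not a zip archive (for example, a checkpoint
    written in the older non-zip format) or if no version entry is found.
    Assumption: the version is stored as text in a "<prefix>/version" entry.
    """
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            # Only look at the top-level "<prefix>/version" entry and skip
            # any nested entry that also happens to end in "/version".
            if name.endswith("/version") and name.count("/") == 1:
                return int(archive.read(name).decode("ascii").strip())
    return None
```

Run on the checkpoint from this report, it would be expected to return 3, the version named in the 1.4 error message.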
Expected behavior
The checkpoint loads normally.
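A minimal sketch of a common workaround, assuming it is run in the 1.7 environment: load the checkpoint with `map_location="cpu"` so its tensors are not restored onto the GPU (the 1.7 traceback runs out of memory inside `_cuda_deserialize` while calling `obj.cuda(device)`), then write it back with `torch.save(..., _use_new_zipfile_serialization=False)`, which selects the older non-zip format instead of the zip format that 1.4 rejects as version 3. The function name `resave_for_legacy_torch` and any output path are hypothetical, and whether 1.4 can unpickle everything inside still depends on what the checkpoint contains.

```python
import torch


def resave_for_legacy_torch(src_path, dst_path):
    """Re-save a checkpoint so that an older PyTorch (e.g. 1.4) can read it.

    Intended to run under the newer PyTorch that wrote src_path.
    """
    # Restore every tensor into host memory; no GPU memory is allocated,
    # unlike the default, which moves tensors back to their saved CUDA device.
    checkpoint = torch.load(src_path, map_location="cpu")
    # Write the older non-zip serialization format instead of the zip format.
    torch.save(checkpoint, dst_path, _use_new_zipfile_serialization=False)
```

For example, call `resave_for_legacy_torch('/home/user1/model_best_b.pth.tar', '/home/user1/model_best_b_legacy.pth.tar')` under 1.7, then try `torch.load('/home/user1/model_best_b_legacy.pth.tar', map_location='cpu')` under 1.4.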
Environment
PyTorch 1.7 environment:
PyTorch version: 1.7.0
Is debug build: True
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: TITAN RTX
GPU 1: TITAN RTX
Nvidia driver version: 455.38
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn.so.8
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.16.2
[pip3] torch==1.4.0
[pip3] torchvision==0.5.0
[pip3] torchviz==0.0.1
[conda] blas 1.0 mkl defaults
[conda] cudatoolkit 10.1.243 h6bb024c_0 http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] mkl 2020.1 217 defaults
[conda] mkl-service 2.3.0 py38he904b0f_0 defaults
[conda] mkl_fft 1.1.0 py38h23d657b_0 defaults
[conda] mkl_random 1.1.1 py38h0573a6f_0 defaults
[conda] numpy 1.18.5 py38ha1c710e_0 defaults
[conda] numpy-base 1.18.5 py38hde5b4d6_0 defaults
[conda] numpydoc 1.1.0 py_0 defaults
[conda] pytorch 1.7.0 py3.8_cuda10.1.243_cudnn7.6.3_0 http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
[conda] torchaudio 0.7.0 py38 http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
[conda] torchvision 0.8.1 py38_cu101 http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
PyTorch 1.4 environment:
PyTorch version: 1.4.0
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 16.04.3 LTS (x86_64)
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Clang version: Could not collect
CMake version: Could not collect
Python version: 3.7 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: TITAN RTX
GPU 1: TITAN RTX
GPU 2: TITAN RTX
GPU 3: TITAN RTX
Nvidia driver version: 440.44
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.18.3
[pip3] torch==1.4.0
[pip3] torchvision==0.6.0
[conda] cudatoolkit 10.1.243 h6bb024c_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] numpy 1.16.2 pypi_0 pypi
[conda] torch 1.4.0 pypi_0 pypi
[conda] torchvision 0.5.0 pypi_0 pypi
[conda] torchviz 0.0.1 pypi_0 pypi
Additional context
A model trained and loaded with 1.4 works fine.
This is a weird bug and I don't know why it happens.
It's urgent, please help.