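※ Caution ※
The problem of the process dying with "GPU Hang" under sustained heavy load has not been solved, so as of now (2024/12/28) I do not recommend building a Ryzen 7 8700G PC for the purpose of running PyTorch/ROCm. The APU's integrated GPU is currently not officially supported by ROCm, so I think it is wiser to wait until it is supported and runs stably.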
主ãªè£ ç½®æ§æ
- ãã¼ã¹ããã: ASRock DeskMini X600
- CPU(APU): AMD Ryzen 7 8700G
- Memory: 64GB (16GB of which is allocated as VRAM)
- OS: Ubuntu 24.04.1 LTS
Goal
UEFI settings
Installing ROCm
Following ROCm's Quick start installation guide, install ROCm 6.2 to match the PyTorch build that will be installed later.
sudo apt update
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo apt install python3-setuptools python3-wheel libpython3.12
sudo usermod -a -G render,video $LOGNAME
wget https://repo.radeon.com/amdgpu-install/6.2.4/ubuntu/noble/amdgpu-install_6.2.60204-1_all.deb
sudo apt install ./amdgpu-install_6.2.60204-1_all.deb
sudo apt update
sudo apt install amdgpu-dkms rocm
sudo amdgpu-install
When this finishes, reboot once.
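The `usermod` line adds the user to the render and video groups, but supplementary groups only apply to new login sessions, which the reboot takes care of. Below is a minimal POSIX sh sketch for confirming that the current session actually has them; the helper name `missing_groups` is hypothetical and not part of ROCm.

```sh
#!/bin/sh
# Hypothetical helper: print every group name given as an argument that the
# current process does NOT belong to. Exits 0 only if none are missing.
missing_groups() (
  # `id -nG` lists the groups of this session; pad with spaces so that
  # "video" cannot accidentally match inside a longer group name.
  current=" $(id -nG) "
  status=0
  for g in "$@"; do
    case $current in
      *" $g "*) ;;                 # already a member
      *) echo "$g"; status=1 ;;    # report the missing group
    esac
  done
  exit "$status"
)
```

For example, `missing_groups render video` prints nothing when both groups are active in the session, and prints the missing names otherwise.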
Building a virtual environment for running stable-diffusion-webui
Set up the environment by following "Install and Run on AMD GPUs" from stable-diffusion-webui. A Python version that is too new can keep it from running, so match the version given in the official documentation. It specifies 3.10 this time, so I use 3.10.16.
Ubuntu 24.04.1 ships Python 3.12. Other software may need a different version, so I decided to use pyenv to switch between multiple Python versions. The library versions this project needs can also conflict with other software, so I decided to create a virtual environment with venv.
First, install venv.
sudo apt install python3-venv
次ã¯pyenvãpyenvã®ã¤ã³ã¹ãã¼ã«æ¹æ³ã«å¾ã£ã¦ã
curl https://pyenv.run | bash
Append the following lines to .bashrc.
export PATH="$HOME/.pyenv/bin:$PATH"
eval "$(pyenv init --path)"
eval "$(pyenv virtualenv-init -)"
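Re-running these setup steps can append the same lines to .bashrc more than once. Here is a minimal sketch of an idempotent append, assuming POSIX sh and the standard `grep -qxF` flags; `append_once` is a hypothetical helper, not something pyenv provides.

```sh
#!/bin/sh
# Hypothetical helper: append LINE to FILE only if FILE does not already
# contain it as a whole line (fixed-string match, no regex).
append_once() (
  file=$1
  line=$2
  touch "$file" || exit 1
  grep -qxF -- "$line" "$file" && exit 0          # already present
  # If the file does not end with a newline, add one first so the new
  # line is not glued onto the previous one.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$line" >> "$file"
)
```

Usage: `append_once ~/.bashrc 'eval "$(pyenv init --path)"'`. The single quotes keep `$(...)` from being expanded, so the line lands in .bashrc exactly as written.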
Restart bash once so the changes take effect, then clone stable-diffusion-webui.
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
Install Python 3.10 with pyenv and switch the Python version used in this directory.
pyenv install 3.10
pyenv local 3.10.16
Create the virtual environment. The official docs name the directory venv, but I prefer .venv.
python3.10 -m venv .venv
source .venv/bin/activate
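A quick check that the activated .venv really uses Python 3.10 (and not the system 3.12) can save a failed install later. This is a minimal sketch, assuming the interpreter reports itself as "Python X.Y.Z"; `python_is` is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: succeed if a "Python X.Y.Z" version string matches
# the wanted X.Y series (3.10 matches 3.10.16 but not 3.100.0 or 3.12.3).
python_is() (
  want=$1
  reported=$2
  ver=${reported#Python }                 # strip the "Python " prefix
  case $ver in
    "$want" | "$want".*) exit 0 ;;
    *) echo "expected Python $want, got: $reported" >&2; exit 1 ;;
  esac
)
```

Usage, with the venv active: `python_is 3.10 "$(python --version 2>&1)"`.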
On the PyTorch site's "Start Locally" page, selecting Stable / Linux / Pip / Python / ROCm 6.2 shows the command to run; follow it to install the ROCm build of PyTorch.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
Launching stable-diffusion-webui
ROCm does not currently support integrated GPUs officially, so we trick it into running. The Radeon 780M is gfx1103 (RDNA3 architecture), so we make it appear to be gfx1100.
export PYTORCH_ROCM_ARCH=gfx1100
export HSA_OVERRIDE_GFX_VERSION=11.0.0
./webui.sh --listen --skip-torch-cuda-test --precision full --no-half
After that, just open http://<IP address>:7860/ in a browser and generate images.
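The HSA_OVERRIDE_GFX_VERSION value used at launch mirrors the target name: LLVM's AMDGPU processor names have the form gfx<major><minor><stepping>, and the override takes the same numbers as major.minor.stepping, so gfx1100 becomes 11.0.0. Here is a small sketch of that conversion. It assumes the stepping is a single hex digit (as in gfx90a), and `gfx_to_hsa_version` is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: convert an LLVM AMDGPU target name such as gfx1103
# into the major.minor.stepping form used by HSA_OVERRIDE_GFX_VERSION.
# Assumption: the last two characters are minor and stepping, and the
# stepping is a single hex digit (e.g. gfx90a -> 9.0.10).
gfx_to_hsa_version() (
  name=${1#gfx}
  case $name in
    ???*) ;;
    *) echo "unexpected target: $1" >&2; exit 1 ;;
  esac
  major=${name%??}              # everything except the last two chars
  last2=${name#"$major"}        # minor + stepping
  minor=${last2%?}
  step=${last2#?}
  echo "$major.$minor.$((0x$step))"
)
```

For example, `gfx_to_hsa_version gfx1103` prints 11.0.3, the 780M's own value, which the override replaces with the 11.0.0 of gfx1100.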
Remaining issues
- The Radeon 780M itself should support half precision (FP16), yet it will not run unless "--precision full --no-half" is added
- The VRAM% shown by rocm-smi stays at 0%
- The final step of image generation sometimes takes a long time
- When run continuously, the process dies with "GPU Hang"
Notes on the "GPU Hang" problem
HW Exception by GPU node-1 (Agent handle: 0x601800a34760) reason :GPU Hang
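The process dies after printing a message like the one above. Loading the core file into gdb and checking bt only shows that the thread is being killed inside an exception handler. Suspecting the driver (a lower layer), I checked dmesg and found that the amdgpu driver emits the error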
amdgpu: MES failed to respond to msg=REMOVE_QUEUE
and a GPU reset is then issued.
This problem has also been reported on the Linux Kernel mailing list. It is still unresolved, and it seems all we can do is wait for a firmware fix.
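To check whether a crash matches this pattern, the kernel log can be filtered for the two messages. Here is a minimal sketch that reads log text on stdin. The exact wording of the reset message ("GPU reset") is an assumption, and `find_gpu_hang_lines` is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical filter: read kernel log text on stdin and print only the
# lines matching the MES timeout or a GPU reset. Exits 1 if none match.
# Assumption: the reset is logged with the words "GPU reset".
find_gpu_hang_lines() {
  grep -E 'MES failed to respond|GPU reset'
}
```

Usage: `sudo dmesg | find_gpu_hang_lines`, or `journalctl -k -b -1 | find_gpu_hang_lines` to inspect the previous boot (the latter needs a persistent journal).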