
[Bug] regression with sd1.5 model & specific LoRAs #1076

@rene-descartes2021

Git commit

$ git rev-parse HEAD
bfbb929

I verified that this specific commit introduces the crash/abort; the commit before it does not.

Operating System & Version

Android/Termux kernel 4.14.276-g6ef255005cea-ab9062920

GGML backends

CPU

Command-line arguments used

sd -W 512 -H 512 -p "<lora:SDXL:0.6> a pony" -m realcartoonRealistic_v17.safetensors --lora-model-dir ~/x/LoRAs/sd1.5/

Steps to reproduce

The crash seems to depend on the LoRA. The SDXL LoRA crashes; the PCM and TCD speedup LoRAs don't. Other LoRAs I tried are hit or miss.

Here is a link to the SDXL LoRA.
Here is a link to the model, an fp16 pruned sd1.5 checkpoint: version V17 from that page, the 1.99GB one.

Here is my compile script. I used OpenBLAS with GGML. Altogether, my adjustments seem to make things about 5% quicker in my case.

Also, for some reason -mcpu=native plus their add-ons didn't compile with an f16 extension (as seen with clang -mcpu=native+blah+no-blah --print-enabled-extensions), so I explicitly specified cortex-a75 via flags and patched their logic a bit. I was going to post a bug report and patch to GGML eventually.

EDIT: I couldn't reproduce this again, and I'm not sure how or why I first observed it. It looks OK now:

diff <(clang -mcpu=native+dotprod+noi8mm+nosve+nosme --print-enabled-extensions /dev/null) <(clang -mcpu=cortex-a75 --print-enabled-extensions /dev/null)

I don't know what I did wrong originally.

So the crash might be caused by one of my adjustments. I'll try a stock build without OpenBLAS or the other changes in a bit.

myflags="-mcpu=cortex-a75 -ffast-math -fno-finite-math-only"
git pull --ff-only #|| exit
git submodule update #|| exit
mkdir -p build || exit
cd build || exit
cmake .. -DCMAKE_BUILD_TYPE=Release -DGGML_OPENBLAS=ON -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS -DCMAKE_C_FLAGS="$myflags" -DCMAKE_CXX_FLAGS="$myflags" || exit
cmake --build . --config Release
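
Since the other LoRAs are hit or miss, it may help to run the same command once per LoRA file and note which ones abort. Below is a rough sketch of that, not something taken from the project. It only uses the flags from the command above and assumes that the LoRA name in the prompt is the file name without the extension (as with SDXL.safetensors in the log). The helper names `build_command` and `classify_loras` and the `sd_binary` argument are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_command(sd_binary, model, lora_dir, lora_name, strength=0.6):
    """Build the sd command line from this report for a single LoRA."""
    return [
        sd_binary,
        "-W", "512", "-H", "512",
        "-p", f"<lora:{lora_name}:{strength}> a pony",
        "-m", str(model),
        "--lora-model-dir", str(lora_dir),
    ]


def classify_loras(sd_binary, model, lora_dir):
    """Run sd once per *.safetensors file in lora_dir and record the exit status."""
    results = {}
    for lora in sorted(Path(lora_dir).glob("*.safetensors")):
        cmd = build_command(sd_binary, model, lora_dir, lora.stem)
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # Assumption: a failed GGML_ASSERT aborts the process, so any
        # non-zero return code is treated as a failure here.
        if proc.returncode == 0:
            results[lora.stem] = "ok"
        else:
            results[lora.stem] = f"failed ({proc.returncode})"
    return results
```

Note that every LoRA that does not abort runs a full generation, so this can take a while on a phone CPU.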

What you expected to happen

Generation works as it did before that commit.

What actually happened

Crash on failed ASSERT after loading LoRA:

[DEBUG] lora.hpp:93   - finished loaded lora
/data/data/com.termux/files/home/dev/llm/sd/stable-diffusion.cpp/ggml_extend.hpp:1389: GGML_ASSERT(tensor->type == GGML_TYPE_F32 || tensor->type == GGML_TYPE_F16 || tensor->type == GGML_TYPE_I32) failed
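
The assert only accepts F32, F16 and I32 tensors, so one thing worth checking is which dtypes each LoRA file declares. The sketch below assumes, without confirmation, that the offending tensor comes from the LoRA's .safetensors file. It relies only on the published safetensors layout: an 8-byte little-endian header length followed by a JSON header. `safetensors_dtypes` is a hypothetical helper name, not part of stable-diffusion.cpp.

```python
import json
import struct
from collections import Counter


def safetensors_dtypes(path):
    """Count the tensors of each dtype declared in a .safetensors header."""
    with open(path, "rb") as f:
        # The file starts with an 8-byte little-endian header length...
        (header_len,) = struct.unpack("<Q", f.read(8))
        # ...followed by that many bytes of UTF-8 JSON describing each tensor.
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return Counter(
        info["dtype"] for name, info in header.items() if name != "__metadata__"
    )
```

Comparing the result for a crashing LoRA (for example `safetensors_dtypes("SDXL.safetensors")`) with a working one such as the PCM or TCD LoRA might show whether a dtype other than F32 or F16 is involved. How the loader maps or converts those dtypes is not shown in this report.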

Logs / error messages / stack trace

Here is the full stdout and stderr of the above command with the '-v' parameter added:

System Info: 
    SSE3 = 0 |     AVX = 0 |     AVX2 = 0 |     AVX512 = 0 |     AVX512_VBMI = 0 |     AVX512_VNNI = 0 |     FMA = 0 |     NEON = 1 |     ARM_FMA = 1 |     F16C = 0 |     FP16_VA = 1 |     WASM_SIMD = 0 |     VSX = 0 | 
SDCliParams {
  mode: img_gen,
  output_path: "output.png",
  verbose: true,
  color: false,
  canny_preprocess: false,
  preview_method: none,
  preview_interval: 1,
  preview_path: "preview.png",
  preview_fps: 16,
  taesd_preview: false,
  preview_noisy: false
}
SDContextParams {
  n_threads: 4,
  model_path: "realcartoonRealistic_v17.safetensors",
  clip_l_path: "",
  clip_g_path: "",
  clip_vision_path: "",
  t5xxl_path: "",
  llm_path: "",
  llm_vision_path: "",
  diffusion_model_path: "",
  high_noise_diffusion_model_path: "",
  vae_path: "",
  taesd_path: "",
  esrgan_path: "",
  control_net_path: "",
  embedding_dir: "",
  wtype: NONE,
  tensor_type_rules: "",
  lora_model_dir: "/data/data/com.termux/files/home/x/LoRAs/sd1.5/",
  photo_maker_path: "",
  rng_type: cuda,
  sampler_rng_type: NONE,
  flow_shift: INF
  offload_params_to_cpu: false,
  control_net_cpu: false,
  clip_on_cpu: false,
  vae_on_cpu: false,
  diffusion_flash_attn: false,
  diffusion_conv_direct: false,
  vae_conv_direct: false,
  chroma_use_dit_mask: true,
  chroma_use_t5_mask: false,
  chroma_t5_mask_pad: 1,
  prediction: NONE,
  lora_apply_mode: auto,
  vae_tiling_params: { 0, 0, 0, 0.5, 0, 0 },
  force_sdxl_vae_conv_scale: false
}
SDGenerationParams {
  prompt: "<lora:SDXL:0.6> a pony",
  negative_prompt: "",
  clip_skip: -1,
  width: 512,
  height: 512,
  batch_count: 1,
  init_image_path: "",
  end_image_path: "",
  mask_image_path: "",
  control_image_path: "",
  ref_image_paths: [],
  control_video_path: "",
  auto_resize_ref_image: true,
  increase_ref_index: false,
  pm_id_images_dir: "",
  pm_id_embed_path: "",
  pm_style_strength: 20,
  skip_layers: [7, 8, 9],
  sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 20, eta: 0.00, shifted_timestep: 0),
  high_noise_skip_layers: [7, 8, 9],
  high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 20, eta: 0.00, shifted_timestep: 0),
  easycache_option: "",
  easycache: disabled (threshold=1.75162e-43, start=0, end=0),
  moe_boundary: 0.875,
  video_frames: 1,
  fps: 16,
  vace_strength: 1,
  strength: 0.75,
  control_strength: 0.9,
  seed: 42,
  upscale_repeats: 1,
}
[DEBUG] stable-diffusion.cpp:189  - Using CPU backend
[INFO ] stable-diffusion.cpp:227  - loading model from 'realcartoonRealistic_v17.safetensors'
[INFO ] model.cpp:373  - load realcartoonRealistic_v17.safetensors using safetensors format
[DEBUG] model.cpp:503  - init from 'realcartoonRealistic_v17.safetensors', prefix = ''
[INFO ] stable-diffusion.cpp:311  - Version: SD 1.x 
[INFO ] stable-diffusion.cpp:339  - Weight type stat:                      f16: 1130 
[INFO ] stable-diffusion.cpp:340  - Conditioner weight type stat:          f16: 196  
[INFO ] stable-diffusion.cpp:341  - Diffusion model weight type stat:      f16: 686  
[INFO ] stable-diffusion.cpp:342  - VAE weight type stat:                  f16: 248  
[DEBUG] stable-diffusion.cpp:344  - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171  - vocab size: 49408
[DEBUG] clip.hpp:182  - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1877 - clip params backend buffer size =  235.06 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1877 - unet params backend buffer size =  1640.25 MB(RAM) (686 tensors)
[DEBUG] ggml_extend.hpp:1877 - vae params backend buffer size =  94.47 MB(RAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:676  - loading weights
[DEBUG] model.cpp:1348 - using 4 threads for model loading
[DEBUG] model.cpp:1370 - loading tensors from realcartoonRealistic_v17.safetensors

  |>                                                 | 2/1130 - 2000.00it/s
  |=====>                                            | 124/1130 - 616.92it/s
  |============>                                     | 287/1130 - 715.71it/s
  |======================>                           | 499/1130 - 830.28it/s
  |======================>                           | 502/1130 - 625.94it/s
  |==========================>                       | 595/1130 - 593.81it/s
  |=============================>                    | 660/1130 - 549.08it/s
  |==============================>                   | 692/1130 - 493.58it/s
  |===============================>                  | 709/1130 - 442.57it/s
  |=================================>                | 748/1130 - 415.09it/s
  |=================================>                | 764/1130 - 381.62it/s
  |=====================================>            | 854/1130 - 387.83it/s
  |======================================>           | 866/1130 - 360.53it/s
  |=======================================>          | 884/1130 - 339.61it/s
  |========================================>         | 916/1130 - 326.79it/s
  |=========================================>        | 942/1130 - 313.69it/s
  |===========================================>      | 974/1130 - 304.09it/s
  |=============================================>    | 1028/1130 - 302.09it/s
  |==================================================| 1130/1130 - 313.54it/s
[INFO ] model.cpp:1577 - loading tensors completed, taking 3.61s (process: 0.00s, read: 3.57s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.00s)
[DEBUG] stable-diffusion.cpp:703  - finished loaded file
[INFO ] stable-diffusion.cpp:775  - total params memory size = 1969.78MB (VRAM 0.00MB, RAM 1969.78MB): text_encoders 235.06MB(RAM), diffusion_model 1640.25MB(RAM), vae 94.47MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:832  - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:3139 - generate_image 512x512
[INFO ] stable-diffusion.cpp:3170 - sampling using Euler A method
[INFO ] denoiser.hpp:364  - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3283 - TXT2IMG
[DEBUG] stable-diffusion.cpp:1134 - lora SDXL:0.60
[INFO ] stable-diffusion.cpp:969  - apply lora immediately
[INFO ] stable-diffusion.cpp:975  - attempting to apply 1 LoRAs
[INFO ] model.cpp:373  - load /data/data/com.termux/files/home/x/LoRAs/sd1.5/SDXL.safetensors using safetensors format
[DEBUG] model.cpp:503  - init from '/data/data/com.termux/files/home/x/LoRAs/sd1.5/SDXL.safetensors', prefix = 'lora.'
[INFO ] lora.hpp:40   - loading LoRA from '/data/data/com.termux/files/home/x/LoRAs/sd1.5/SDXL.safetensors'
[DEBUG] model.cpp:1348 - using 4 threads for model loading
[DEBUG] model.cpp:1370 - loading tensors from /data/data/com.termux/files/home/x/LoRAs/sd1.5/SDXL.safetensors

  |=======>                                          | 153/1050 - 38250.00it/s
  |==================================================| 1050/1050 - 5097.09it/s
[INFO ] model.cpp:1577 - loading tensors completed, taking 0.21s (process: 0.01s, read: 0.00s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.00s)
[DEBUG] ggml_extend.hpp:1877 - lora params backend buffer size =  172.55 MB(RAM) (1050 tensors)
[DEBUG] model.cpp:1348 - using 4 threads for model loading
[DEBUG] model.cpp:1370 - loading tensors from /data/data/com.termux/files/home/x/LoRAs/sd1.5/SDXL.safetensors

  |=======================>                          | 490/1050 - 2437.81it/s
  |=============================================>    | 965/1050 - 2406.48it/s
  |==================================================| 1050/1050 - 1732.67it/s
[INFO ] model.cpp:1577 - loading tensors completed, taking 0.61s (process: 0.01s, read: 0.42s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.00s)
[DEBUG] lora.hpp:93   - finished loaded lora
/data/data/com.termux/files/home/dev/llm/sd/stable-diffusion.cpp/ggml_extend.hpp:1389: GGML_ASSERT(tensor->type == GGML_TYPE_F32 || tensor->type == GGML_TYPE_F16 || tensor->type == GGML_TYPE_I32) failed
0: 0x55f10b7048 
1: 0x55f10b7004 
2: 0x55f10ca404 
3: 0x55f0f889bc 
4: 0x55f0f88138 
5: 0x55f0f8b6f4 
6: 0x55f0eefecc 
7: 0x55f0eefc88 
8: 0x55f0eef40c 
9: 0x55f0ebf130 
10: 0x55f0f83228 
11: 0x55f0ebedb8 
12: 0x55f0eaa444 
13: 0x55f0eaf158 
14: 0x55f0e23134 
15: 0x7d34cb00f8 __libc_init

Additional context / environment details

See above. Let me know if more details are needed.
