Description
Git commit
$ git rev-parse HEAD
bfbb929
I verified that this specific commit introduces the crash (abort). The commit before it does not.
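For anyone who wants to re-check this, here is a sketch of how the good/bad check could be automated with git bisect run. It assumes an already configured build directory, a binary at build/bin/sd, and the model and LoRA paths from this report. classify_log and bisect_step are hypothetical helper names, not part of stable-diffusion.cpp.

```shell
# Sketch of a `git bisect run` step for this crash (hypothetical helpers).
# Assumptions: build/ is already configured, the binary lands in build/bin/sd,
# and the model/LoRA paths below are the ones from this report.

# Classify a captured sd log: return 0 = good, 1 = bad (LoRA type assertion hit).
classify_log() {
  if grep -q 'GGML_ASSERT(tensor->type == GGML_TYPE_F32' "$1"; then
    return 1
  fi
  return 0
}

bisect_step() {
  # Exit status 125 tells `git bisect run` to skip commits that cannot be tested.
  git submodule update || return 125
  cmake --build build --config Release >/dev/null 2>&1 || return 125
  # GGML_ASSERT aborts (status 134 = 128 + SIGABRT), and `git bisect run`
  # stops bisecting on statuses above 127, so judge the commit by its log.
  ./build/bin/sd -W 512 -H 512 -p "<lora:SDXL:0.6> a pony" \
    -m realcartoonRealistic_v17.safetensors \
    --lora-model-dir ~/x/LoRAs/sd1.5/ >sd-bisect.log 2>&1
  classify_log sd-bisect.log
}
```

Saved as bisect.sh in the repo root, it could be driven with git bisect start HEAD <known-good-commit> followed by git bisect run bash -c 'source ./bisect.sh && bisect_step'.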
Operating System & Version
Android/Termux kernel 4.14.276-g6ef255005cea-ab9062920
GGML backends
CPU
Command-line arguments used
sd -W 512 -H 512 -p "<lora:SDXL:0.6> a pony" -m realcartoonRealistic_v17.safetensors --lora-model-dir ~/x/LoRAs/sd1.5/
Steps to reproduce
It seems to depend on the LoRA. The SDXL LoRA crashes; the PCM and TCD speedup LoRAs don't. Other LoRAs I tried are hit or miss.
Here is a link to the SDXL LoRA.
Here is a link to the model, an fp16 pruned SD 1.5 checkpoint: version V17 from that page, the 1.99 GB file.
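Since only some LoRAs trigger the type assertion, it may help to see which tensor dtypes each LoRA file actually stores. A minimal sketch, assuming GNU coreutils od/head/tail and a little-endian host (aarch64 Termux qualifies); safetensors_dtypes is a hypothetical helper name. It relies on the safetensors layout: an 8-byte little-endian header length followed by a JSON header with a "dtype" entry per tensor. The assertion checks the in-memory ggml tensor type, and stable-diffusion.cpp may convert types while loading, so the file dtypes are only a hint.

```shell
# Hypothetical helper: count tensor dtypes in a .safetensors file and flag
# any dtype other than F32/F16/I32 (the types the failing GGML_ASSERT accepts).
safetensors_dtypes() {
  local f=$1 len
  # First 8 bytes: JSON header length as uint64. od reads it in host byte
  # order, so this assumes a little-endian host.
  len=$(od -An -t u8 -N 8 "$f" | tr -d ' ')
  # The JSON header starts at byte 9; extract every "dtype" value from it.
  tail -c +9 "$f" | head -c "$len" \
    | grep -o '"dtype": *"[A-Z0-9_]*"' | cut -d'"' -f4 \
    | sort | uniq -c \
    | awk '{ flag = ($2 == "F32" || $2 == "F16" || $2 == "I32") ? "" : "  <- not F32/F16/I32"
             print $2, $1 flag }'
}
```

For example: safetensors_dtypes ~/x/LoRAs/sd1.5/SDXL.safetensors, then the same for one of the LoRAs that loads fine, to compare.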
Here is my compile script. I built GGML with OpenBLAS. Altogether, my adjustments seem to make it about 5% faster in my case.
Also, for some reason, -mcpu=native plus GGML's add-ons didn't compile with an f16 extension (as seen with clang -mcpu=native+blah+no-blah --print-enabled-extensions), so I explicitly specify cortex-a75 via flags and patched GGML's logic a bit. I was going to post a bug report and patch to GGML eventually.
EDIT: I couldn't reproduce this again and I'm not sure how or why I first observed it. It looks OK now:
diff <(clang -mcpu=native+dotprod+noi8mm+nosve+nosme --print-enabled-extensions /dev/null) <(clang -mcpu=cortex-a75 --print-enabled-extensions /dev/null)
I don't know what I did wrong originally.
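A variant of the diff check above that prints only the extensions that differ: lines in the first column are enabled only for the first -mcpu value, tab-indented lines only for the second. This is a sketch assuming clang 18 or newer (which provides --print-enabled-extensions) on an AArch64 host; cpu_ext_diff is a hypothetical name.

```shell
# Hypothetical helper: compare the extension lists clang reports for two
# -mcpu values. comm needs sorted input, so both lists are sorted first;
# -3 hides lines common to both targets.
cpu_ext_diff() {
  comm -3 \
    <(clang -mcpu="$1" --print-enabled-extensions /dev/null | sort) \
    <(clang -mcpu="$2" --print-enabled-extensions /dev/null | sort)
}
```

For example: cpu_ext_diff native+dotprod+noi8mm+nosve+nosme cortex-a75 prints nothing when both targets enable the same extensions.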
So it might be one of my adjustments. I'll try a stock build without OpenBLAS or the other changes shortly.
# Target the Cortex-A75 cores explicitly instead of -mcpu=native
myflags="-mcpu=cortex-a75 -ffast-math -fno-finite-math-only"
# Update the checkout and the ggml submodule
git pull --ff-only #|| exit
git submodule update #|| exit
# Out-of-tree Release build with OpenBLAS as the GGML BLAS backend
mkdir -p build || exit
cd build || exit
cmake .. -DCMAKE_BUILD_TYPE=Release -DGGML_OPENBLAS=ON -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS -DCMAKE_C_FLAGS="$myflags" -DCMAKE_CXX_FLAGS="$myflags" || exit
cmake --build . --config Release
What you expected to happen
It works as it did before this commit.
What actually happened
It crashes on a failed assertion right after loading the LoRA:
[DEBUG] lora.hpp:93 - finished loaded lora
/data/data/com.termux/files/home/dev/llm/sd/stable-diffusion.cpp/ggml_extend.hpp:1389: GGML_ASSERT(tensor->type == GGML_TYPE_F32 || tensor->type == GGML_TYPE_F16 || tensor->type == GGML_TYPE_I32) failed
Logs / error messages / stack trace
Here is the full stdout and stderr of the above command, run with the '-v' parameter:
System Info:
SSE3 = 0 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | VSX = 0 |
SDCliParams {
mode: img_gen,
output_path: "output.png",
verbose: true,
color: false,
canny_preprocess: false,
preview_method: none,
preview_interval: 1,
preview_path: "preview.png",
preview_fps: 16,
taesd_preview: false,
preview_noisy: false
}
SDContextParams {
n_threads: 4,
model_path: "realcartoonRealistic_v17.safetensors",
clip_l_path: "",
clip_g_path: "",
clip_vision_path: "",
t5xxl_path: "",
llm_path: "",
llm_vision_path: "",
diffusion_model_path: "",
high_noise_diffusion_model_path: "",
vae_path: "",
taesd_path: "",
esrgan_path: "",
control_net_path: "",
embedding_dir: "",
wtype: NONE,
tensor_type_rules: "",
lora_model_dir: "/data/data/com.termux/files/home/x/LoRAs/sd1.5/",
photo_maker_path: "",
rng_type: cuda,
sampler_rng_type: NONE,
flow_shift: INF
offload_params_to_cpu: false,
control_net_cpu: false,
clip_on_cpu: false,
vae_on_cpu: false,
diffusion_flash_attn: false,
diffusion_conv_direct: false,
vae_conv_direct: false,
chroma_use_dit_mask: true,
chroma_use_t5_mask: false,
chroma_t5_mask_pad: 1,
prediction: NONE,
lora_apply_mode: auto,
vae_tiling_params: { 0, 0, 0, 0.5, 0, 0 },
force_sdxl_vae_conv_scale: false
}
SDGenerationParams {
prompt: "<lora:SDXL:0.6> a pony",
negative_prompt: "",
clip_skip: -1,
width: 512,
height: 512,
batch_count: 1,
init_image_path: "",
end_image_path: "",
mask_image_path: "",
control_image_path: "",
ref_image_paths: [],
control_video_path: "",
auto_resize_ref_image: true,
increase_ref_index: false,
pm_id_images_dir: "",
pm_id_embed_path: "",
pm_style_strength: 20,
skip_layers: [7, 8, 9],
sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 20, eta: 0.00, shifted_timestep: 0),
high_noise_skip_layers: [7, 8, 9],
high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 20, eta: 0.00, shifted_timestep: 0),
easycache_option: "",
easycache: disabled (threshold=1.75162e-43, start=0, end=0),
moe_boundary: 0.875,
video_frames: 1,
fps: 16,
vace_strength: 1,
strength: 0.75,
control_strength: 0.9,
seed: 42,
upscale_repeats: 1,
}
[DEBUG] stable-diffusion.cpp:189 - Using CPU backend
[INFO ] stable-diffusion.cpp:227 - loading model from 'realcartoonRealistic_v17.safetensors'
[INFO ] model.cpp:373 - load realcartoonRealistic_v17.safetensors using safetensors format
[DEBUG] model.cpp:503 - init from 'realcartoonRealistic_v17.safetensors', prefix = ''
[INFO ] stable-diffusion.cpp:311 - Version: SD 1.x
[INFO ] stable-diffusion.cpp:339 - Weight type stat: f16: 1130
[INFO ] stable-diffusion.cpp:340 - Conditioner weight type stat: f16: 196
[INFO ] stable-diffusion.cpp:341 - Diffusion model weight type stat: f16: 686
[INFO ] stable-diffusion.cpp:342 - VAE weight type stat: f16: 248
[DEBUG] stable-diffusion.cpp:344 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1877 - clip params backend buffer size = 235.06 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1877 - unet params backend buffer size = 1640.25 MB(RAM) (686 tensors)
[DEBUG] ggml_extend.hpp:1877 - vae params backend buffer size = 94.47 MB(RAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:676 - loading weights
[DEBUG] model.cpp:1348 - using 4 threads for model loading
[DEBUG] model.cpp:1370 - loading tensors from realcartoonRealistic_v17.safetensors
|==================================================| 1130/1130 - 313.54it/s
[INFO ] model.cpp:1577 - loading tensors completed, taking 3.61s (process: 0.00s, read: 3.57s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.00s)
[DEBUG] stable-diffusion.cpp:703 - finished loaded file
[INFO ] stable-diffusion.cpp:775 - total params memory size = 1969.78MB (VRAM 0.00MB, RAM 1969.78MB): text_encoders 235.06MB(RAM), diffusion_model 1640.25MB(RAM), vae 94.47MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:832 - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:3139 - generate_image 512x512
[INFO ] stable-diffusion.cpp:3170 - sampling using Euler A method
[INFO ] denoiser.hpp:364 - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3283 - TXT2IMG
[DEBUG] stable-diffusion.cpp:1134 - lora SDXL:0.60
[INFO ] stable-diffusion.cpp:969 - apply lora immediately
[INFO ] stable-diffusion.cpp:975 - attempting to apply 1 LoRAs
[INFO ] model.cpp:373 - load /data/data/com.termux/files/home/x/LoRAs/sd1.5/SDXL.safetensors using safetensors format
[DEBUG] model.cpp:503 - init from '/data/data/com.termux/files/home/x/LoRAs/sd1.5/SDXL.safetensors', prefix = 'lora.'
[INFO ] lora.hpp:40 - loading LoRA from '/data/data/com.termux/files/home/x/LoRAs/sd1.5/SDXL.safetensors'
[DEBUG] model.cpp:1348 - using 4 threads for model loading
[DEBUG] model.cpp:1370 - loading tensors from /data/data/com.termux/files/home/x/LoRAs/sd1.5/SDXL.safetensors
|==================================================| 1050/1050 - 5097.09it/s
[INFO ] model.cpp:1577 - loading tensors completed, taking 0.21s (process: 0.01s, read: 0.00s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.00s)
[DEBUG] ggml_extend.hpp:1877 - lora params backend buffer size = 172.55 MB(RAM) (1050 tensors)
[DEBUG] model.cpp:1348 - using 4 threads for model loading
[DEBUG] model.cpp:1370 - loading tensors from /data/data/com.termux/files/home/x/LoRAs/sd1.5/SDXL.safetensors
|==================================================| 1050/1050 - 1732.67it/s
[INFO ] model.cpp:1577 - loading tensors completed, taking 0.61s (process: 0.01s, read: 0.42s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.00s)
[DEBUG] lora.hpp:93 - finished loaded lora
/data/data/com.termux/files/home/dev/llm/sd/stable-diffusion.cpp/ggml_extend.hpp:1389: GGML_ASSERT(tensor->type == GGML_TYPE_F32 || tensor->type == GGML_TYPE_F16 || tensor->type == GGML_TYPE_I32) failed
0: 0x55f10b7048
1: 0x55f10b7004
2: 0x55f10ca404
3: 0x55f0f889bc
4: 0x55f0f88138
5: 0x55f0f8b6f4
6: 0x55f0eefecc
7: 0x55f0eefc88
8: 0x55f0eef40c
9: 0x55f0ebf130
10: 0x55f0f83228
11: 0x55f0ebedb8
12: 0x55f0eaa444
13: 0x55f0eaf158
14: 0x55f0e23134
15: 0x7d34cb00f8 __libc_init
Additional context / environment details
See above. Let me know if you need more details.