
Segfault on model open and close #80

Open
shuttie opened this issue Sep 24, 2024 · 2 comments
Comments


shuttie commented Sep 24, 2024

On the latest 3.4.1 version I get a JVM crash with this code:

val params = new ModelParameters().setModelFilePath("qwen2-0_5b-instruct-q4_0.gguf")
val model = new LlamaModel(params)
model.close() // <-- crashes here

For code that actually does generation (like in the README), the close() call causes no crash; a sketch of that variant follows below. The issue does not depend on the model, but qwen2 is small enough to illustrate it.
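For comparison, the non-crashing variant looks roughly like this (a minimal sketch following the README's generation example; the InferenceParameters setup is abbreviated and may differ in detail):

val params = new ModelParameters().setModelFilePath("qwen2-0_5b-instruct-q4_0.gguf")
val model = new LlamaModel(params)
val infer = new InferenceParameters("Tell me a joke.")
model.generate(infer).forEach(out => print(out)) // any real generation before close
model.close() // <-- no crash when generation ran first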

JVM Crash log:

Stack: [0x00007f3cb1c01000,0x00007f3cb2401000],  sp=0x00007f3cb23ffa08,  free space=8186k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libc.so.6+0x15ed00]

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x00007f3c1cc00000

Native stacktrace:

jhsdb jstack --core core --exe /usr/lib/jvm/openjdk-17/bin/java
Attaching to core core from executable /usr/lib/jvm/openjdk-17/bin/java, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 17.0.12+7
Deadlock Detection:

No deadlocks found.

"main" #1 prio=5 tid=0x00007f4da402e130 nid=0x17f2 runnable [0x00007f4daa5fe000]
   java.lang.Thread.State: RUNNABLE
   JavaThread state: _thread_in_native
 - de.kherud.llama.LlamaModel.delete() @bci=0 (Interpreted frame)
 - de.kherud.llama.LlamaModel.close() @bci=1, line=115 (Interpreted frame)
 - ai.nixiesearch.util.LlamaCrash$.main(java.lang.String[]) @bci=43, line=15 (Interpreted frame)
 - ai.nixiesearch.util.LlamaCrash.main(java.lang.String[]) @bci=4 (Interpreted frame)

clhsdb pstack:

----------------- 6130 -----------------
"main" #1 prio=5 tid=0x00007f4da402e130 nid=0x17f2 runnable [0x00007f4daa5fe000]
   java.lang.Thread.State: RUNNABLE
   JavaThread state: _thread_in_native
0x00007f4d5fa3264f      std::_Rb_tree<std::pair<std::string, std::string>, std::pair<std::pair<std::string, std::string> const, int>, std::_Select1st<std::pair<std::pair<std::string, std::string> const, int> >, std::less<std::pair<std::string, std::string> >, std::allocator<std::pair<std::pair<std::string, std::string> const, int> > >::_M_erase(std::_Rb_tree_node<std::pair<std::pair<std::string, std::string> const, int> >*) [clone .isra.0] + 0x2f
Locked ownable synchronizers:
    - None

I will later build a -DLLAMA_DEBUG version of the native library and check the proper stack trace. But to me it looks like something is not fully initialized on start and then gets deleted on close.

hs_err_pid30598.log


shuttie commented Sep 24, 2024

And here's the debug build:

Current thread (0x00007f7124000bd0):  JavaThread "Thread-0" [_thread_in_native, id=16695, stack(0x00007f7250401000,0x00007f7250c01000)]

Stack: [0x00007f7250401000,0x00007f7250c01000],  sp=0x00007f7250bfeaa0,  free space=8182k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libggml.so+0x57c4e]  ggml_backend_buffer_clear+0x15
C  [libllama.so+0x262087]  llama_kv_cache_clear(llama_kv_cache&)+0xe4
C  [libllama.so+0x2c0fe8]  llama_kv_cache_clear+0x1e
C  [libjllama.so+0x2590a0]  server_context::kv_cache_clear()+0x1c
C  [libjllama.so+0x26378e]  server_context::update_slots()+0x75e
C  [libjllama.so+0x2bdd02]  void std::__invoke_impl<void, void (server_context::*&)(), server_context*&>(std::__invoke_memfun_deref, void (server_context::*&)(), server_context*&)+0x67
C  [libjllama.so+0x2b7fa5]  std::__invoke_result<void (server_context::*&)(), server_context*&>::type std::__invoke<void (server_context::*&)(), server_context*&>(void (server_context::*&)(), server_context*&)+0x37
C  [libjllama.so+0x2afc42]  void std::_Bind<void (server_context::*(server_context*))()>::__call<void, , 0ul>(std::tuple<>&&, std::_Index_tuple<0ul>)+0x48
C  [libjllama.so+0x2a9df2]  void std::_Bind<void (server_context::*(server_context*))()>::operator()<, void>()+0x24
C  [libjllama.so+0x29e3a2]  void std::__invoke_impl<void, std::_Bind<void (server_context::*(server_context*))()>&>(std::__invoke_other, std::_Bind<void (server_context::*(server_context*))()>&)+0x20
C  [libjllama.so+0x294137]  std::enable_if<std::__and_<std::is_void<void>, std::__is_invocable<std::_Bind<void (server_context::*(server_context*))()>&> >::value, void>::type std::__invoke_r<void, std::_Bind<void (server_context::*(server_context*))()>&>(std::_Bind<void (server_context::*(server_context*))()>&)+0x20
C  [libjllama.so+0x283fc1]  std::_Function_handler<void (), std::_Bind<void (server_context::*(server_context*))()> >::_M_invoke(std::_Any_data const&)+0x20
C  [libjllama.so+0x26e7f4]  std::function<void ()>::operator()() const+0x32
C  [libjllama.so+0x25286e]  server_queue::start_loop()+0x23c
C  [libjllama.so+0x2471e2]  Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}::operator()() const+0xa4
C  [libjllama.so+0x24bffa]  void std::__invoke_impl<void, Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}>(std::__invoke_other, Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}&&)+0x20
C  [libjllama.so+0x24bfaf]  std::__invoke_result<Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}>::type std::__invoke<Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}>(Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}&&)+0x20
C  [libjllama.so+0x24bf5c]  void std::thread::_Invoker<std::tuple<Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>)+0x28
C  [libjllama.so+0x24bf30]  std::thread::_Invoker<std::tuple<Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}> >::operator()()+0x18
C  [libjllama.so+0x24bf14]  std::thread::_State_impl<std::thread::_Invoker<std::tuple<Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}> > >::_M_run()+0x1c


siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x00007f7553bc5329

Registers:
RAX=0x00007f7553bc52f1, RBX=0x00007f72a49dc9c0, RCX=0x0000000000000235, RDX=0x00007f7250bfead8
RSP=0x00007f7250bfeaa0, RBP=0x00007f7250bfeab0, RSI=0x0000000000000000, RDI=0x00007f7553bc52f1
R8 =0x0000000000000002, R9 =0x0000000000000001, R10=0x000000000000000a, R11=0x00007f725e859084
R12=0xffffffffffffff88, R13=0x0000000000000002, R14=0x00007f72a9b4d980, R15=0x00007f72a9b4da87
RIP=0x00007f725f5ecc4e, EFLAGS=0x0000000000010206, CSGSFS=0x002b000000000033, ERR=0x0000000000000004
  TRAPNO=0x000000000000000e



shuttie commented Sep 24, 2024

And the last bit:

val params = new ModelParameters().setModelFilePath("qwen2-0_5b-instruct-q4_0.gguf")
val model = new LlamaModel(params)
Thread.sleep(1000)
model.close() // <-- no crash!

So it seems like a race condition on startup: the model is not yet fully loaded, but we already start unloading it. A hypothetical sketch of that race shape is below.
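To illustrate the suspected shape of the bug (a hypothetical analogue in plain Scala, not the library's actual code): loadModel spawns a worker thread (the Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1} frames above) that eventually touches the KV cache, while close() frees native state from the caller's thread. If the free wins the race, the worker dereferences freed memory:

class NativeStateLike {
  @volatile private var kvCache: Array[Byte] = _
  private val worker = new Thread(() => {
    kvCache = new Array[Byte](1 << 20) // "load the model": allocate state
    Thread.sleep(10)                   // startup work still in flight...
    kvCache(0) = 1                     // ...worker clears the KV cache; fails here (like the SIGSEGV) if close() already ran
  })
  worker.start()

  def close(): Unit = kvCache = null                          // races with the worker, like delete()
  def closeSafely(): Unit = { worker.join(); kvCache = null } // joining (or a ready signal) removes the race
}

The Thread.sleep(1000) workaround only makes the main thread lose the race reliably; waiting for the load thread to finish initialization would fix it properly.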
