Process finished with exit code -1073741819 (0xC0000005) while trying to infer CodeGemma-2B GGUF #77

Open
32kda opened this issue Sep 10, 2024 · 9 comments

32kda commented Sep 10, 2024

Hello!
I'm trying to get a response from the CodeGemma 2B GGUF, but the JVM crashes shortly after start without producing any model output.
The GGUF file was downloaded from HuggingFace.

OS: Windows 10

Code:
```java
package org.example;

import de.kherud.llama.InferenceParameters;
import de.kherud.llama.LlamaModel;
import de.kherud.llama.LlamaOutput;
import de.kherud.llama.ModelParameters;
import de.kherud.llama.args.MiroStat;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class Main {

    public static void main(String... args) throws IOException {
        ModelParameters modelParams = new ModelParameters()
                .setModelFilePath("D:/work/codegemma-2b.Q2_K.gguf")
                .setNGpuLayers(43);

        String system = "This is a conversation between User and Llama, a friendly chatbot.\n" +
                "Llama is helpful, kind, honest, good at writing, and never fails to answer any " +
                "requests immediately and with precision.\n";
        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        try (LlamaModel model = new LlamaModel(modelParams)) {
            System.out.print(system);
            String prompt = system;
            while (true) {
                prompt += "\nUser: ";
                System.out.print("\nUser: ");
                String input = reader.readLine();
                prompt += input;
                System.out.print("Llama: ");
                prompt += "\nLlama: ";
                InferenceParameters inferParams = new InferenceParameters(prompt)
                        .setTemperature(0.7f)
                        .setPenalizeNl(true)
                        .setMiroStat(MiroStat.V2);
                for (LlamaOutput output : model.generate(inferParams)) {
                    System.out.print(output);
                    prompt += output;
                }
            }
        }
    }

}
```

Program output attached:

log.txt

No JVM crash dump file was generated, for some reason.
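
(Side note: the hs_err file location can be pinned down explicitly with standard HotSpot flags - a sketch, assuming a plain launch of the Main class above and a placeholder path:

```
java -XX:ErrorFile=C:/temp/hs_err_pid%p.log -XX:+CreateMinidumpOnCrash org.example.Main
```

-XX:+CreateMinidumpOnCrash additionally enables Windows minidumps, which are disabled by default on client editions of Windows.)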

shuttie (Contributor) commented Sep 10, 2024

What is your CPU model? I see in the log that the llama.cpp binary was built with AVX2, which should be supported by almost all CPUs released in the last 12 years - but still.
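
If it's easier than a screenshot: a quick sketch that prints the CPU model from Java, relying on the Windows-specific PROCESSOR_IDENTIFIER environment variable (it names the CPU but doesn't list extensions like AVX2):

```java
public class CpuInfo {
    public static void main(String[] args) {
        // Windows-only: CPU family/model string, e.g. "Intel64 Family 6 ..., GenuineIntel"
        System.out.println(System.getenv("PROCESSOR_IDENTIFIER"));
        // The JVM's view of the architecture, e.g. "amd64"
        System.out.println(System.getProperty("os.arch"));
    }
}
```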

32kda (Author) commented Sep 11, 2024

Hello!
[image attached]

32kda (Author) commented Sep 17, 2024

Also seeing much the same problem on an i7-11370H.

shuttie (Contributor) commented Sep 18, 2024

Can you maybe experiment a bit:

  • Does this crash only with CodeGemma-2B? Does it work OK with other models? Have you tried the Q4/Q8 flavors of this model?
  • Can you run this model with vanilla llama.cpp on your hardware, without using java-llama?

A small nitpick: setNGpuLayers is ignored, as we don't yet have win64 CUDA support, only linux-x86_64.
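
For a CPU-only run the call can simply be dropped - a minimal sketch of the parameters, reusing the model path from the report above:

```java
// CPU-only setup: on win64 the GPU layer count is a no-op anyway,
// so setNGpuLayers is omitted entirely.
ModelParameters cpuOnlyParams = new ModelParameters()
        .setModelFilePath("D:/work/codegemma-2b.Q2_K.gguf");
```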

kherud (Owner) commented Sep 18, 2024

I would recommend using version 3.3.0 of the binding for now, if that works. For CPU it's functionally equivalent. I'll soon release a new version that updates to the newest available llama.cpp, which I hope will fix the problems.

32kda (Author) commented Sep 18, 2024

Some experiments so far:
Switching to 3.3.0 didn't help.
Trying the codegemma-2b.Q8_0.gguf model didn't help either.

shuttie (Contributor) commented Sep 18, 2024

OK, and what about using vanilla llama.cpp?

JakobPogacnikSouvent commented

I think this is related to issue #83. I was having the same problem, so I tried simplifying my code to the following:

```java
package org.example;

import de.kherud.llama.InferenceParameters;
import de.kherud.llama.LlamaModel;
import de.kherud.llama.LlamaOutput;
import de.kherud.llama.ModelParameters;
import de.kherud.llama.args.MiroStat;

public class Example {

    public static void main(String... args) {
        ModelParameters modelParams = new ModelParameters()
                .setModelFilePath("models/mistral-7b-instruct-v0.2.Q2_K.gguf")
                .setNGpuLayers(43);

        String system = "This is a conversation between User and Llama, a friendly chatbot.\n" +
                "Llama is helpful, kind, honest, good at writing, and never fails to answer any " +
                "requests immediately and with precision.\n";

        try (LlamaModel model = new LlamaModel(modelParams)) {
            System.out.print(system);
            String prompt = system;

            prompt += "\nUser: Why is the sky blue?";
            prompt += "\nLlama: ";

            InferenceParameters inferParams = new InferenceParameters(prompt)
                    .setTemperature(0.7f)
                    .setPenalizeNl(true)
                    .setMiroStat(MiroStat.V2)
                    .setStopStrings("\n");
            for (LlamaOutput output : model.generate(inferParams)) {
                System.out.print(output);
            }
        }
    }
}
```

after which I got additional info in the output:

```
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ff9c8ea2f58, pid=2616, tid=4044
#
# JRE version: Java(TM) SE Runtime Environment (23.0+37) (build 23+37-2369)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (23+37-2369, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, windows-amd64)
# Problematic frame:
# C  [msvcp140.dll+0x12f58]
#
# No core dump will be written. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# C:\Users\<username>\Documents\<path-to-file>\java-llamacpp-examples\hs_err_pid2616.log
[2.000s][warning][os] Loading hsdis library failed
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
```

Full output and generated log file attached below:
error.txt
hs_err_pid2616.log

32kda (Author) commented Oct 28, 2024

The vanilla one works for me, as does ollama.
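
For reference, a vanilla check of this kind is just a plain CLI run, roughly like the sketch below - the binary is named llama-cli in recent llama.cpp builds (main in older ones), and -m, -p, -n are the model path, prompt, and number of tokens to generate:

```
llama-cli -m codegemma-2b.Q2_K.gguf -p "Why is the sky blue?" -n 64
```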
