# How to Get GGUF Models

This guide provides step-by-step instructions for obtaining GGUF-formatted models that are compatible with LLaMA 2, such as the Gemma and TinyLlama series. These models can be used for the development, usage, and evaluation of crabml.

We will explore how to find models on platforms like Hugging Face and detail the conversion/quantization process using llama.cpp.

## Obtaining Models from Community Platforms

Platforms like Hugging Face allow users to upload and share their models. You can find the models you need by using the search tools provided on these platforms.

### How to Search for Models on Hugging Face

  1. Navigate to [Hugging Face Models](https://huggingface.co/models).

  2. Type your keywords, such as `gemma gguf`, into the search bar.

  3. Review the list of models that meet your search criteria.

  4. Choose a model to see more details and download the necessary files from the "Files and versions" tab (or download from the command line, as sketched after this list).

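If you prefer the command line, a GGUF file can also be fetched directly from a model repository's `resolve` endpoint. The repository and file names below are placeholders; substitute the ones shown on the model's "Files and versions" tab:

```sh
# Hypothetical example: download a single GGUF file from a Hugging Face repo.
# Replace `<repo>` (e.g. something like TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF) and `<file>`.
wget https://huggingface.co/<repo>/resolve/main/<file>.gguf
```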

## Converting/Quantizing with llama.cpp

To convert models to the GGUF format or apply specific quantization parameters, llama.cpp can be used. Always refer to the most recent llama.cpp documentation for updated commands and usage.

### Compiling llama.cpp

Execute the following steps to compile llama.cpp (a note on newer, CMake-based builds follows this list):

  1. Clone the llama.cpp repository to your local machine.

    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
  2. (Optional) If you wish to exclude k-quantization, set an environment variable before compiling.

    export LLAMA_NO_K_QUANTS=1
  3. Compile the source code to create the executables:

    make -j
  4. Verify the presence of the quantize executable:

    ❯ ls | grep 'quantize'
    .rwxr-xr-x 1.4M user 24 Mar 10:09 quantize
    .rwxr-xr-x 1.5M user 24 Mar 10:09 quantize-stats
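
Note: recent llama.cpp releases have moved from the Makefile to a CMake build, and the tool is named `llama-quantize` there. If `make` does not produce a `quantize` binary in your checkout, a build along these lines should work (output paths may vary by version):

```sh
# Build with CMake (newer llama.cpp versions); binaries are placed under build/bin.
cmake -B build
cmake --build build --config Release -j
ls build/bin | grep quantize
```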

### Converting the Model

To convert a non-GGUF model (*.pth, *.pt, *.bin) to GGUF using llama.cpp, proceed with the following:

  1. (Optional) Clone the Hugging Face model repo:

    # Make sure you have git-lfs installed (https://git-lfs.com)
    git lfs install
    # Replace `<repo>` with something like `PY007/TinyLlama-1.1B-Chat-v0.3`.
    git clone https://huggingface.co/<repo>
  2. Prepare the Python environment:

    python -m venv venv
    source venv/bin/activate
    python -m pip install -r requirements.txt
  3. Convert non-GGUF models with convert.py (a worked example follows this list):

    # Replace `<outtype>` with one of {f32, f16, q8_0}; note that requantizing from q8_0 is disabled
    # Replace `<model>` with the path to the model directory
    python convert.py <model> --outfile <model>-<outtype>.gguf --outtype <outtype>
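
For instance, assuming the `PY007/TinyLlama-1.1B-Chat-v0.3` repo from step 1 was cloned into the current directory (the paths and output names here are illustrative):

```sh
# Convert the cloned checkpoint (illustrative paths) to a 16-bit GGUF file.
python convert.py TinyLlama-1.1B-Chat-v0.3 \
    --outfile TinyLlama-1.1B-Chat-v0.3-f16.gguf \
    --outtype f16
```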

### Quantizing the Model

To quantize your GGUF model using llama.cpp, proceed with the following:

  1. Prepare your GGUF model file (e.g., <model>.gguf).

  2. Quantize the model using the `quantize` executable (a concrete example follows this list):

    # Replace `<type>` with the target quantization type, e.g. q4_0 or q8_0
    ./quantize <model>.gguf <model>-<type>.gguf <type>
  3. After the process is complete, you will have a quantized GGUF model file.
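
For example, to produce a 4-bit model from the f16 file created above (file names are illustrative; running `./quantize` with no arguments should print its usage text, including the quantization types supported by your build):

```sh
# Quantize the (illustrative) f16 GGUF file down to the q4_0 format.
./quantize TinyLlama-1.1B-Chat-v0.3-f16.gguf TinyLlama-1.1B-Chat-v0.3-q4_0.gguf q4_0
```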

## Conclusion

This document has guided you through obtaining GGUF-formatted models from Hugging Face and converting/quantizing them with the llama.cpp tool. For further assistance, consult the documentation on each platform or contact the model contributors.