docs: update instructions for llama2 (kaito-project#475)
**Reason for Change**:
Update instructions for building llama2 container images.

**Issue Fixed**:
N/A

**Notes for Reviewers**:
Verified that the container images run after consolidating the llama2 model
weight files into a single directory. Also fixed a path error in the llama2
Dockerfile that surfaced when running the build from the root of the repo.
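For illustration, consolidating the downloaded weight files might look like the
sketch below; the source paths are hypothetical and depend on how the upstream
download script laid out the files.

```bash
# Sketch only: paths below are assumptions, not from the repo.
# The goal is a single flat directory holding every weight file.
mkdir -p /tmp/llama2-weights
cp llama/llama-2-7b/consolidated.00.pth /tmp/llama2-weights/
cp llama/llama-2-7b/params.json /tmp/llama2-weights/
cp llama/tokenizer.model /tmp/llama2-weights/
export LLAMA_WEIGHTS_PATH=/tmp/llama2-weights
```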

Signed-off-by: Paul Yu <[email protected]>
pauldotyu authored Jun 18, 2024
1 parent fa0cafe commit e592eb3
Showing 3 changed files with 28 additions and 11 deletions.
2 changes: 1 addition & 1 deletion docker/presets/models/llama-2/Dockerfile

```diff
@@ -30,4 +30,4 @@ ARG VERSION
 RUN echo $VERSION > /workspace/llama/version.txt
 
 ADD ${WEIGHTS_PATH} /workspace/llama/llama-2/weights
-ADD kaito/presets/inference/${MODEL_TYPE} /workspace/llama/llama-2
+ADD presets/inference/${MODEL_TYPE} /workspace/llama/llama-2
```
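The path fix matters because `ADD` sources resolve relative to the docker build
context. A minimal sketch, assuming the repository is cloned into a directory
named `kaito` and the environment variables from the README below are set:

```bash
# Run the build from the repo root so the context (the final ".") is the
# repo itself; ADD presets/inference/${MODEL_TYPE} then resolves correctly.
# The old kaito/presets/... path only worked with the parent dir as context.
cd kaito
docker build \
  --file docker/presets/models/llama-2/Dockerfile \
  --build-arg WEIGHTS_PATH=${LLAMA_WEIGHTS_PATH} \
  --build-arg MODEL_TYPE=llama2-completion \
  --build-arg VERSION=0.0.1 \
  -t llama2:latest .
```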
18 changes: 13 additions & 5 deletions presets/models/llama2/README.md

````diff
@@ -19,21 +19,27 @@ The sample docker files and the source code of the inference API server are in t
 #### 2. Download models
 
 This step must be done manually. Llama2 model weights can be downloaded by following the instructions [here](https://github.com/facebookresearch/llama#download).
 
+#### 3. Build locally
+
+Set the following environment variables to specify the model name and the path to the downloaded model weights.
+```
+export LLAMA_MODEL_NAME=<one of the supported llama2 model names listed above>
+export LLAMA_WEIGHTS_PATH=<path to your downloaded model weight files>
+export VERSION=0.0.1
+```
+
-#### 3. Build locally
+> [!IMPORTANT]
+> The inference API server expects all the model weight files to be in the same directory. So, make sure to consolidate all downloaded files in the same directory and use that path in the `LLAMA_WEIGHTS_PATH` variable.
 Use the following command to build the llama2 inference service image from the root of the repo.
 ```
 docker build \
 --file docker/presets/inference/llama-2/Dockerfile \
---build-arg WEIGHTS_PATH=$LLAMA_WEIGHTS_PATH \
+--build-arg WEIGHTS_PATH=${LLAMA_WEIGHTS_PATH} \
 --build-arg MODEL_TYPE=llama2-completion \
---build-arg VERSION=0.0.1 \
--t $LLAMA_MODEL_NAME:latest .
+--build-arg VERSION=${VERSION} \
+-t ${LLAMA_MODEL_NAME}:${VERSION} .
 ```
 
 Then `docker push` the images to your private registry.
````
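As a sketch of that push step, assuming a private registry named
`myregistry.azurecr.io` (a hypothetical name; substitute your own):

```bash
# Tag the locally built image for the target registry, then push it.
docker tag ${LLAMA_MODEL_NAME}:${VERSION} myregistry.azurecr.io/${LLAMA_MODEL_NAME}:${VERSION}
docker push myregistry.azurecr.io/${LLAMA_MODEL_NAME}:${VERSION}
```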
````diff
@@ -52,6 +58,8 @@ inference:
 - <IMAGE PULL SECRETS>
 ```
 
+See [examples/inference](../../../examples/inference) for sample manifests.
+
 ## Usage
 
 The inference service endpoint is `/generate`.
````
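For reference, a request to the `/generate` endpoint might look like the sketch
below; the service address and JSON fields are illustrative assumptions, so
check the inference API server source under `presets/inference/` for the real
schema.

```bash
# Sketch only: <SERVICE-IP> and the payload fields are assumptions.
curl -X POST http://<SERVICE-IP>/generate \
  -H "Content-Type: application/json" \
  -d '{"prompts": ["A short poem about Kubernetes"], "max_gen_len": 128}'
```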
19 changes: 14 additions & 5 deletions presets/models/llama2chat/README.md

````diff
@@ -19,21 +19,28 @@ The sample docker files and the source code of the inference API server are in t
 #### 2. Download models
 
 This step must be done manually. Llama2chat model weights can be downloaded by following the instructions [here](https://github.com/facebookresearch/llama#download).
 
+#### 3. Build locally
+
+Set the following environment variables to specify the model name and the path to the downloaded model weights.
+```
+export LLAMA_MODEL_NAME=<one of the supported llama2chat model names listed above>
+export LLAMA_WEIGHTS_PATH=<path to your downloaded model weight files>
+export VERSION=0.0.1
+```
+
-#### 3. Build locally
+> [!IMPORTANT]
+> The inference API server expects all the model weight files to be in the same directory. So, make sure to consolidate all downloaded files in the same directory and use that path in the `LLAMA_WEIGHTS_PATH` variable.
+
 Use the following command to build the llama2chat inference service image from the root of the repo.
 ```
 docker build \
 --file docker/presets/inference/llama-2/Dockerfile \
---build-arg WEIGHTS_PATH=$LLAMA_WEIGHTS_PATH \
+--build-arg WEIGHTS_PATH=${LLAMA_WEIGHTS_PATH} \
 --build-arg MODEL_TYPE=llama2-chat \
---build-arg VERSION=0.0.1 \
--t $LLAMA_MODEL_NAME:latest .
+--build-arg VERSION=${VERSION} \
+-t ${LLAMA_MODEL_NAME}:${VERSION} .
 ```
 
 Then `docker push` the images to your private registry.
````
````diff
@@ -52,6 +59,8 @@ inference:
 - <IMAGE PULL SECRETS>
 ```
 
+See [examples/inference](../../../examples/inference) for sample manifests.
+
 ## Usage
 
 The inference service endpoint is `/chat`.
````
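Similarly, a request to the `/chat` endpoint might look like the sketch below;
the message format is an assumption for illustration, so check the inference
API server source under `presets/inference/` for the real schema.

```bash
# Sketch only: <SERVICE-IP> and the payload fields are assumptions.
curl -X POST http://<SERVICE-IP>/chat \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```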
