docs: update instructions for llama2 (kaito-project#475)
**Reason for Change**:
Update instructions for building llama2 container images.

**Issue Fixed**:
N/A

**Notes for Reviewers**:
Verified that the container images run after consolidating the llama2 model
weight files into a single directory. Also fixed a path error in the llama2
Dockerfile that surfaced when running the build from the root of the repo.
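For illustration, consolidating the downloaded weight files might look like the
sketch below; the source paths are hypothetical and depend on how the upstream
download script laid out the files.

```bash
# Sketch only: paths below are assumptions, not from the repo.
# The goal is a single flat directory holding every weight file.
mkdir -p /tmp/llama2-weights
cp llama/llama-2-7b/consolidated.00.pth /tmp/llama2-weights/
cp llama/llama-2-7b/params.json /tmp/llama2-weights/
cp llama/tokenizer.model /tmp/llama2-weights/
export LLAMA_WEIGHTS_PATH=/tmp/llama2-weights
```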

Signed-off-by: Paul Yu <[email protected]>
pauldotyu authored Jun 18, 2024
1 parent fa0cafe commit e592eb3
Showing 3 changed files with 28 additions and 11 deletions.
2 changes: 1 addition & 1 deletion docker/presets/models/llama-2/Dockerfile

```diff
@@ -30,4 +30,4 @@ ARG VERSION
 RUN echo $VERSION > /workspace/llama/version.txt
 
 ADD ${WEIGHTS_PATH} /workspace/llama/llama-2/weights
-ADD kaito/presets/inference/${MODEL_TYPE} /workspace/llama/llama-2
+ADD presets/inference/${MODEL_TYPE} /workspace/llama/llama-2
```
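The path fix matters because `ADD` sources resolve relative to the docker build
context. A minimal sketch, assuming the repository is cloned into a directory
named `kaito` and the environment variables from the README below are set:

```bash
# Run the build from the repo root so the context (the final ".") is the
# repo itself; ADD presets/inference/${MODEL_TYPE} then resolves correctly.
# The old kaito/presets/... path only worked with the parent dir as context.
cd kaito
docker build \
  --file docker/presets/models/llama-2/Dockerfile \
  --build-arg WEIGHTS_PATH=${LLAMA_WEIGHTS_PATH} \
  --build-arg MODEL_TYPE=llama2-completion \
  --build-arg VERSION=0.0.1 \
  -t llama2:latest .
```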
18 changes: 13 additions & 5 deletions presets/models/llama2/README.md

````diff
@@ -19,21 +19,27 @@ The sample docker files and the source code of the inference API server are in t
 #### 2. Download models
 
 This step must be done manually. Llama2 model weights can be downloaded by following the instructions [here](https://github.com/facebookresearch/llama#download).
 
+#### 3. Build locally
+
+Set the following environment variables to specify the model name and the path to the downloaded model weights.
+```
+export LLAMA_MODEL_NAME=<one of the supported llama2 model names listed above>
+export LLAMA_WEIGHTS_PATH=<path to your downloaded model weight files>
+export VERSION=0.0.1
+```
+
-#### 3. Build locally
+> [!IMPORTANT]
+> The inference API server expects all the model weight files to be in the same directory. So, make sure to consolidate all downloaded files in the same directory and use that path in the `LLAMA_WEIGHTS_PATH` variable.
 Use the following command to build the llama2 inference service image from the root of the repo.
 ```
 docker build \
 --file docker/presets/inference/llama-2/Dockerfile \
---build-arg WEIGHTS_PATH=$LLAMA_WEIGHTS_PATH \
+--build-arg WEIGHTS_PATH=${LLAMA_WEIGHTS_PATH} \
 --build-arg MODEL_TYPE=llama2-completion \
---build-arg VERSION=0.0.1 \
--t $LLAMA_MODEL_NAME:latest .
+--build-arg VERSION=${VERSION} \
+-t ${LLAMA_MODEL_NAME}:${VERSION} .
 ```
 
 Then `docker push` the images to your private registry.
````
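As a sketch of that push step, assuming a private registry named
`myregistry.azurecr.io` (a hypothetical name; substitute your own):

```bash
# Tag the locally built image for the target registry, then push it.
docker tag ${LLAMA_MODEL_NAME}:${VERSION} myregistry.azurecr.io/${LLAMA_MODEL_NAME}:${VERSION}
docker push myregistry.azurecr.io/${LLAMA_MODEL_NAME}:${VERSION}
```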
````diff
@@ -52,6 +58,8 @@ inference:
 - <IMAGE PULL SECRETS>
 ```
 
+See [examples/inference](../../../examples/inference) for sample manifests.
+
 ## Usage
 
 The inference service endpoint is `/generate`.
````
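For reference, a request to the `/generate` endpoint might look like the sketch
below; the service address and JSON fields are illustrative assumptions, so
check the inference API server source under `presets/inference/` for the real
schema.

```bash
# Sketch only: <SERVICE-IP> and the payload fields are assumptions.
curl -X POST http://<SERVICE-IP>/generate \
  -H "Content-Type: application/json" \
  -d '{"prompts": ["A short poem about Kubernetes"], "max_gen_len": 128}'
```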
19 changes: 14 additions & 5 deletions presets/models/llama2chat/README.md

````diff
@@ -19,21 +19,28 @@ The sample docker files and the source code of the inference API server are in t
 #### 2. Download models
 
 This step must be done manually. Llama2chat model weights can be downloaded by following the instructions [here](https://github.com/facebookresearch/llama#download).
 
+#### 3. Build locally
+
+Set the following environment variables to specify the model name and the path to the downloaded model weights.
+```
+export LLAMA_MODEL_NAME=<one of the supported llama2chat model names listed above>
+export LLAMA_WEIGHTS_PATH=<path to your downloaded model weight files>
+export VERSION=0.0.1
+```
+
-#### 3. Build locally
+> [!IMPORTANT]
+> The inference API server expects all the model weight files to be in the same directory. So, make sure to consolidate all downloaded files in the same directory and use that path in the `LLAMA_WEIGHTS_PATH` variable.
+
 Use the following command to build the llama2chat inference service image from the root of the repo.
 ```
 docker build \
 --file docker/presets/inference/llama-2/Dockerfile \
---build-arg WEIGHTS_PATH=$LLAMA_WEIGHTS_PATH \
+--build-arg WEIGHTS_PATH=${LLAMA_WEIGHTS_PATH} \
 --build-arg MODEL_TYPE=llama2-chat \
---build-arg VERSION=0.0.1 \
--t $LLAMA_MODEL_NAME:latest .
+--build-arg VERSION=${VERSION} \
+-t ${LLAMA_MODEL_NAME}:${VERSION} .
 ```
 
 Then `docker push` the images to your private registry.
````
````diff
@@ -52,6 +59,8 @@ inference:
 - <IMAGE PULL SECRETS>
 ```
 
+See [examples/inference](../../../examples/inference) for sample manifests.
+
 ## Usage
 
 The inference service endpoint is `/chat`.
````
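Similarly, a request to the `/chat` endpoint might look like the sketch below;
the message format is an assumption for illustration, so check the inference
API server source under `presets/inference/` for the real schema.

```bash
# Sketch only: <SERVICE-IP> and the payload fields are assumptions.
curl -X POST http://<SERVICE-IP>/chat \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```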
