PopTransformer is a framework that allows you to develop and run highly optimized transformer-based models (inference only) with the Poplar SDK on Graphcore IPUs. PopTransformer includes layers, operators, and models.
- [07/31/2023] Added support for LLMs, including ChatGLM2, Llama 2, and RWKV.
- [07/31/2023] Added support for inference with FP8 and INT4.
- [07/31/2023] Code was refactored and enhanced to make it easier to implement models.
To set up the development environment on the C600:
- (Optional) Create a Python virtual environment.
- Enable the Poplar SDK (all models are tested with SDK version 3.2.0): `source [path-to-sdk]/enable`
- Run `make` to compile the custom ops.
- Run `pip install -r requirements.txt` to install the Python requirements.
If you are using IPU-PODs or Bow Pods, run the following script to set up a Docker container:

```bash
bash docker/setup_container.sh
```
The following shows how you can run a simple example from the `examples` directory:

```bash
cd examples/gpt2
python inference.py --config-name='sharding' log_level='info' model.topk=5
```
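Here, `--config-name` selects a YAML file from the example's `conf` directory (in this case, typically `conf/sharding.yaml`), and arguments such as `log_level='info'` and `model.topk=5` are Hydra overrides for individual fields in that file.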
This section describes how to build a new model that you can run with PopTransformer.
It is best if you are familiar with the following before starting to write a new model that uses PopTransformer:
- The IPU architecture, programming model, and available tools are described in the IPU Programmer's Guide.
- The Poplar graph programming framework is described in the Poplar and PopLibs User Guide.
- The Poplar Advanced Runtime (PopART) for importing and executing models using the ONNX format is described in the PopART User Guide.
1. Preparation
1.1. Create a new directory in the `examples` directory for your model, for example `examples/your_model`. This directory will contain the running script and the configuration. Also, create a sub-directory called `conf` that will contain the configuration files: `examples/your_model/conf`.
1.2. Create an `inference.py` file in the model directory. We use Hydra to initialize classes from a YAML configuration file (see the sketch after step 1.3).
The file tree for the `examples` directory should look like:

```
├── examples
│   ├── chatglm
│   │   ├── conf
│   │   └── inference.py
│   ├── gpt2
│   │   ├── conf
│   │   └── inference.py
│   └── [your model]
│       ├── conf
│       └── inference.py
```
1.3. Create a directory for your model in the `poptransformer/models` directory. This will contain the model implementation.
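As a sketch of how the Hydra entry point from step 1.2 might look (the config node names and model methods below are assumptions for illustration, not the exact schema used by PopTransformer):

```python
# examples/your_model/inference.py -- a minimal, hypothetical sketch.
import hydra
from hydra.utils import instantiate
from omegaconf import DictConfig


@hydra.main(version_base=None, config_path="conf", config_name="inference")
def main(cfg: DictConfig):
    # Hydra loads conf/inference.yaml (or the file named by --config-name)
    # and instantiate() builds the class referenced by its `_target_` key.
    model = instantiate(cfg.model)
    model.build_graph()  # hypothetical method names, shown for shape only
    model.run()


if __name__ == "__main__":
    main()
```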
2. Implement layers

In `inference.py`, implement each layer your model needs, as follows:
2.1. Inherit from the base layer class, `BaseLayer`.
2.2. Override `collect_bind_layer_weights`: use `get_param_from_state_dict` to load weights from a pre-trained model, and bind them to the main graph with `add_initialized_input_tensor`.
2.3. Override the `__call__` function.
2.4. Build the tensor parallel layer (`TPCustomLayer`) for tensor parallel execution, if needed.
```python
class BaseCustomLayer(BaseLayer):
    def __init__(self):
        ...

    def collect_bind_layer_weights(self):
        # Load the weights and bind them to the graph here.
        weight_np = self.get_param_from_state_dict(...)
        self.weight_id = self.add_initialized_input_tensor(weight_np)

    def __call__(self, graph, x):
        # Build the inference process for this layer.
        return ops.matmul(graph, x, self.weight_id)


class TPCustomLayer(BaseCustomLayer):
    ...


class CustomLayer(TPCustomLayer, BaseCustomLayer):
    ...

    def __init__(self, *args, **kwargs):
        # Choose the parent layer you need by a parameter registered in REGISTRY.
        self.layer_class = ...
        super().__init__(*args, **kwargs)

    def collect_bind_layer_weights(self):
        return self.layer_class.collect_bind_layer_weights(self)

    def __call__(self, graph, x):
        # Dispatch to the chosen parent layer's __call__.
        return self.layer_class.__call__(self, graph, x)
```
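A hypothetical usage sketch, assuming `graph` and the input tensor id `x` come from the surrounding model-building code:

```python
# Names here are illustrative. The layer binds its weights when constructed,
# and calling it adds the corresponding ops to the graph.
layer = CustomLayer()
y = layer(graph, x)  # equivalent to CustomLayer.__call__(layer, graph, x)
```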
3. Implement the model

Next, implement your model in `poptransformer/models/your_model`. See `poptransformer/models/gpt2/model.py` for an example. Refer to the PopART User Guide for more information on the API.
3.1. Inherit from a base model class: `HFDecBaseModel` or `HFDec2stageBaseModel`.
3.2. Override functions as needed.
```python
class GPTDecModel(HFDecBaseModel):
    def __init__(self, **kwargs):
        ...

    def build_model_graph(self):
        # Build your model graph here.
        ...

    def build_input_dict(self, **kwargs):
        # Build the input-processing function for your model.
        ...

    def build_output_dict(self, anchor_arrays):
        # Build the output-processing function for your model.
        ...
```
4. Test and run

Write tests for your model, for example tests that compare the results from PopTransformer with those from other frameworks, such as PyTorch or TensorFlow.
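A minimal sketch of such a comparison test, assuming hypothetical helpers `run_poptransformer` and `run_reference` that run the same prompt through PopTransformer and a PyTorch/TensorFlow reference, each returning a numpy array of logits:

```python
import numpy as np


def test_logits_match_reference():
    # Both helpers are hypothetical stand-ins for your own glue code.
    ipu_logits = run_poptransformer("Hello, world")
    ref_logits = run_reference("Hello, world")
    # Reduced-precision execution (e.g. FP8) warrants loose tolerances.
    np.testing.assert_allclose(ipu_logits, ref_logits, rtol=1e-2, atol=1e-2)
```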
If you have done all the above and written tests, then you can simply run PopTransformer from the entry file `examples/your_model/inference.py`:

```bash
cd examples/your_model
python inference.py
```
The content of this repository is licensed under the Apache License, Version 2.0, except for the model code for Llama 2, which is licensed under the Llama 2 Community License Agreement.
Use of the pre-trained weights is subject to the following licenses:
- GPT2-XL: GPT-2 is licensed under a Modified MIT License
- ChatGLM: ChatGLM-6B model license
- RWKV: Apache-2.0 as specified in the model card for the original weights on Hugging Face. PopTransformer requires weights in the Hugging Face format, which are available in the RWKV Space on Hugging Face. According to the Hugging Face Terms of Service, the converted weights are also licensed under Apache-2.0.
- ChatGLM2: ChatGLM2-6B model license
- Llama 2: Llama 2 Community License Agreement