For Linux or WSL2, see the upstream moyix/fauxpilot project. This repository also works on Linux and macOS if you use pwsh there.
This is an attempt to build a locally hosted alternative to GitHub Copilot. It uses the SalesForce CodeGen models inside NVIDIA's Triton Inference Server with the FasterTransformer backend.
- Windows PowerShell or pwsh
- Docker
- docker compose (version >= 1.28)
- NVIDIA GPU (Compute Capability >= 6.0, i.e. GTX 10xx or newer)
- 7z-zstd (for Linux and macOS, you need zstd instead)
Note that the VRAM requirements listed by `setup.ps1` are totals -- if you have multiple GPUs, you can split the model across them. So, if you have two NVIDIA RTX 3080 GPUs, you should be able to run the 6B model by putting half on each GPU, as sketched below.
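For example, a minimal sketch of such a split, using the `-NumGpus` option described in the setup step below (model name and path are placeholders):

```powershell
# Hypothetical example: split codegen-6B-multi across two GPUs so that each
# GPU only needs to hold roughly half of the 13GB total VRAM requirement.
.\setup.ps1 -Silent -Model codegen-6B-multi -NumGpus 2 -ModelDir C:\foo
```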
lmao
Okay, fine, we now have some minimal information on the wiki and a discussion forum where you can ask questions. Still no formal support or warranty though!
- Install Docker and Docker Compose. The easiest way is to install Docker Desktop. You can run

  ```powershell
  docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
  ```

  to test that your CUDA setup is working. This should produce console output like the following:

  ```text
  Fri Aug 26 20:20:28 2022
  +-----------------------------------------------------------------------------+
  | NVIDIA-SMI 515.65.01    Driver Version: 516.94       CUDA Version: 11.7     |
  |-------------------------------+----------------------+----------------------+
  | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
  | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
  |                               |                      |               MIG M. |
  |===============================+======================+======================|
  |   0  NVIDIA GeForce ...  On   | 00000000:2B:00.0  On |                  N/A |
  | 41%   50C    P5    96W / 371W |  21480MiB / 24576MiB |      0%      Default |
  |                               |                      |                  N/A |
  +-------------------------------+----------------------+----------------------+

  +-----------------------------------------------------------------------------+
  | Processes:                                                                  |
  |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
  |        ID   ID                                                   Usage      |
  |=============================================================================|
  |    0   N/A  N/A        88      C   /tritonserver                 N/A        |
  +-----------------------------------------------------------------------------+
  ```
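  You can also confirm that your Docker Compose version meets the >= 1.28 requirement listed above:

  ```powershell
  # Print the installed Docker and Compose versions.
  docker --version
  docker compose version
  ```

  If you installed the older standalone v1 binary, use `docker-compose --version` instead.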
- Install 7z-zstd. As a suggestion, you can add the directory of 7z-zstd (usually `C:\Program Files\7-Zip-Zstandard`) to the `PATH`. Then restart the terminal, open `pwsh`, type `Get-Command -Name 7z`, and press Enter. If everything is OK, you will see some information about `7z.exe` instead of an error or warning message.
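  For reference, a minimal sketch of doing this from pwsh (the path is the usual install location and may differ on your machine):

  ```powershell
  # Add 7z-zstd to PATH for the current session only.
  $Env:Path += ';C:\Program Files\7-Zip-Zstandard'

  # Persist it for your user account so new terminals pick it up too.
  [Environment]::SetEnvironmentVariable('Path',
      [Environment]::GetEnvironmentVariable('Path', 'User') + ';C:\Program Files\7-Zip-Zstandard',
      'User')

  # Verify that 7z.exe now resolves.
  Get-Command -Name 7z
  ```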
- Run the setup script to choose a model to use. This will download the model from HuggingFace and then convert it for use with FasterTransformer:

  ```text
  $ .\setup.ps1
  [1] codegen-350M-mono (2GB total VRAM required; Python-only)
  [2] codegen-350M-multi (2GB total VRAM required; multi-language)
  [3] codegen-2B-mono (7GB total VRAM required; Python-only)
  [4] codegen-2B-multi (7GB total VRAM required; multi-language)
  [5] codegen-6B-mono (13GB total VRAM required; Python-only)
  [6] codegen-6B-multi (13GB total VRAM required; multi-language)
  [7] codegen-16B-mono (32GB total VRAM required; Python-only)
  [8] codegen-16B-multi (32GB total VRAM required; multi-language)
  Enter your choice [6]:
  Enter number of GPUs [1]:
  Where do you want to save the model [C:\Users\Frederisk\Documents\GitHub\fauxpilot\models]?:
  Downloading the model from HuggingFace, this will take a while...
  Done! Now run .\launch.ps1 to start the FauxPilot server.
  ```

  Alternatively, you can set the options by passing arguments. Run `.\setup.ps1 -Help` or `Get-Help -Name .\setup.ps1 -Full` for more details:

  ```powershell
  .\setup.ps1 -Silent -Model codegen-6B-multi -NumGpus 1 -ModelDir C:\foo
  ```
- Then you can just run `.\launch.ps1`. This process can take a considerable amount of time to load. In general, the model is loaded when you see output like this:

  ```text
  ......
  fauxpilot-windows-triton-1  | +-------------------+---------+--------+
  fauxpilot-windows-triton-1  | | Model             | Version | Status |
  fauxpilot-windows-triton-1  | +-------------------+---------+--------+
  fauxpilot-windows-triton-1  | | fastertransformer | 1       | READY  |
  fauxpilot-windows-triton-1  | +-------------------+---------+--------+
  ......
  fauxpilot-triton-1  | I0803 01:51:04.740423 93 grpc_server.cc:4587] Started GRPCInferenceService at 0.0.0.0:8001
  fauxpilot-triton-1  | I0803 01:51:04.740608 93 http_server.cc:3303] Started HTTPService at 0.0.0.0:8000
  fauxpilot-triton-1  | I0803 01:51:04.781561 93 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
  ```
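  Once the HTTP service is up, you can optionally probe Triton's standard readiness endpoint (assuming port 8000 is reachable from the host, as in the log above):

  ```powershell
  # Returns HTTP 200 once the Triton server reports it is ready.
  Invoke-WebRequest -Uri 'http://localhost:8000/v2/health/ready' -Method Get
  ```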
- Enjoy!
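As a quick smoke test, here is a minimal sketch of querying the completion API from pwsh. This assumes the server mirrors upstream moyix/fauxpilot, which exposes an OpenAI-compatible completions endpoint on port 5000; the port and engine name may differ in your deployment:

```powershell
# Hypothetical smoke test against an OpenAI-compatible completions endpoint.
$body = @{
    prompt     = 'def hello():'
    max_tokens = 16
} | ConvertTo-Json

Invoke-RestMethod -Uri 'http://localhost:5000/v1/engines/codegen/completions' `
    -Method Post -ContentType 'application/json' -Body $body
```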
Yes, it's possible. Please check this issue.
- API: Application Programming Interface
- CC: Compute Capability
- CUDA: Compute Unified Device Architecture
- FT: FasterTransformer
- JSON: JavaScript Object Notation
- gRPC: a Remote Procedure Call framework by Google
- GPT-J: A transformer model trained using Ben Wang's Mesh Transformer JAX
- REST: REpresentational State Transfer
The code logic of this repository is derived from moyix/fauxpilot and was refactored by me.