model-serving

Samantha API

Samantha API server

Deploying to Modal

Log in to Modal using the CLI.
Set the default environment: modal config set-environment staging
Deploy the staging app: modal deploy --env staging modal_staging.py
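
The deployment steps above can be sketched as a small shell helper. This is only an illustration: the `run` and `deploy_staging` function names and the `DRY_RUN` switch are assumptions introduced here, not part of the repository, and the sketch assumes you have already logged in to Modal.

```shell
# Print a command, then run it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Hypothetical helper wrapping the two commands from the steps above.
deploy_staging() {
  # Make "staging" the default environment for later modal commands.
  run modal config set-environment staging || return 1
  # Deploy the staging app definition from this directory.
  run modal deploy --env staging modal_staging.py
}
```

Source the file and call `deploy_staging`; setting `DRY_RUN=1` first prints the commands without running them, which is a cheap way to check what would be executed.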

Install and run vllm service

$ cd services/vllm
$ poetry install
$ poetry shell
$ python samantha_api/web.py --model ehartford/samantha-33b --tensor-parallel-size 2 --host 127.0.0.1 --port 8000 --backlog 4096

$ python -m model_api --model julep-ai/samantha-1-turbo

Set up SkyPilot to run the service on A100 spot instances

You can use this as a starting point:
https://github.com/julep-ai/samantha-monorepo/blob/main/infra/sky/vllm.yaml

Docs:

quickstart: https://skypilot.readthedocs.io/en/latest/getting-started/quickstart.html
spot jobs: https://skypilot.readthedocs.io/en/latest/examples/spot-jobs.html
services: https://skypilot.readthedocs.io/en/latest/serving/sky-serve.html

Setup:

Authenticate the gcloud CLI: run gcloud auth login, then gcloud auth application-default login
Install the SkyPilot nightly build: pip install --upgrade skypilot-nightly
Run sky check to confirm that SkyPilot detected the GCP credentials
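
Before running the setup steps, it can help to confirm that the required CLIs are actually on your PATH. The following is a minimal sketch; `missing_tools` and `preflight` are hypothetical helper names introduced for illustration, and the sketch assumes the two `gcloud auth` logins are done by hand because they are interactive.

```shell
# Print every listed CLI that is not on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}

# Hypothetical preflight: verify the tools exist, then let SkyPilot
# report which cloud credentials it detected.
preflight() {
  missing_tools gcloud sky || return 1
  sky check
}
```

Calling `preflight` after installing SkyPilot stops early with a `missing: ...` line if either tool is absent, instead of failing later in the middle of the service setup.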

Create service:

Edit the vllm.yaml file to include the setup instructions for our custom code
Run sky serve up -n vllm-service vllm.yaml to start the service (in-place updates are not supported, unfortunately)
Run sky serve logs vllm-service 1 to view logs (1 is the ID of the first replica; repeat for every replica)
Run watch -n10 sky serve status for a live view of service status
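
Since the logs command has to be repeated for every replica, a small loop can save some typing. This is a sketch under assumptions: `run` and `service_logs` are hypothetical helpers, replica IDs are assumed to be numbered consecutively from 1, and `sky serve logs` may keep streaming a replica's log until you interrupt it, in which case the loop only moves on after each interruption.

```shell
# Print a command, then run it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Show logs for replicas 1..N of a service, one replica at a time.
service_logs() {
  service="$1"
  replicas="$2"
  id=1
  while [ "$id" -le "$replicas" ]; do
    run sky serve logs "$service" "$id"
    id=$((id + 1))
  done
}
```

For example, `service_logs vllm-service 2` walks through replicas 1 and 2 of the service created above.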

Notes:

Right now sky serve up does not support environment variables, so set them manually in the file itself (and remember to remove them before committing to git).
Right now sky serve does not support updating a service, so if you change anything you have to run sky serve down vllm-service and then sky serve up ... again.
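
One way to work around both limitations is to keep placeholders in the committed vllm.yaml, render the real values into an untracked copy, and recreate the service from that copy. The sketch below is only a suggestion, not the repository's workflow: `render_yaml`, `redeploy`, the `__HF_TOKEN__` placeholder, and the `vllm.local.yaml` file name are all hypothetical.

```shell
# Print a command, then run it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Copy a template to an untracked file, replacing one placeholder with a value.
# Limitation: the value must not contain '|', '&', or backslashes (sed syntax).
render_yaml() {
  template="$1"
  output="$2"
  placeholder="$3"
  value="$4"
  sed "s|$placeholder|$value|g" "$template" > "$output"
}

# There is no in-place update, so tear the service down and bring it up again.
redeploy() {
  service="$1"
  yaml="$2"
  run sky serve down "$service" || return 1
  run sky serve up -n "$service" "$yaml"
}
```

A typical sequence would be `render_yaml vllm.yaml vllm.local.yaml __HF_TOKEN__ "$HF_TOKEN"` followed by `redeploy vllm-service vllm.local.yaml`, with `vllm.local.yaml` listed in .gitignore so the secret never reaches the repository.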
