Samantha API server
- Log in to Modal using the CLI.
- Set up the default environment:
modal config set-environment staging
- Deploy to the staging environment:
modal deploy --env staging modal_staging.py
$ cd services/vllm
$ poetry install
$ poetry shell
$ python samantha_api/web.py --model ehartford/samantha-33b --tensor-parallel-size 2 --host 127.0.0.1 --port 8000 --backlog 4096
$ python -m model_api --model julep-ai/samantha-1-turbo
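Loading a large model can take a while before the server starts accepting connections. The following is a minimal sketch of a readiness check, assuming only the `--host 127.0.0.1 --port 8000` values from the `samantha_api/web.py` command above. The helper name `wait_for_port` and its default timeouts are illustrative and not part of this repo. It only checks that the TCP port is open and makes no assumption about the HTTP routes the server exposes.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 600.0, interval: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: a successful connect only shows that something is
    listening on the port, not that the model has finished loading.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # `interval` doubles as the per-attempt connect timeout.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait and retry.
            time.sleep(interval)
    return False
```

For example, `wait_for_port("127.0.0.1", 8000)` blocks for up to ten minutes and returns `True` as soon as the port opens.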
You can use this SkyPilot config as a starting point:
https://github.com/julep-ai/samantha-monorepo/blob/main/infra/sky/vllm.yaml
- quickstart: https://skypilot.readthedocs.io/en/latest/getting-started/quickstart.html
- spot jobs: https://skypilot.readthedocs.io/en/latest/examples/spot-jobs.html
- services: https://skypilot.readthedocs.io/en/latest/serving/sky-serve.html
- Authenticate the gcloud CLI:
gcloud auth login
and then
gcloud auth application-default login
- Install the SkyPilot nightly build:
pip install --upgrade skypilot-nightly
- Run
sky check
to verify that SkyPilot detected the GCP credentials
- Edit the vllm.yaml file with the setup instructions for our custom code
- Run
sky serve up -n vllm-service vllm.yaml
to start the service (unfortunately, in-place updates are not supported)
- Run
sky serve logs vllm-service 1
to view the logs (1 is the ID of the first replica; repeat for every replica)
- Run
watch -n10 sky serve status
for live status of services
- Right now
sky serve up
does not support using environment variables for some reason, so set them manually in the file itself (and remember to unset them before committing to git)
- Right now
sky serve
does not support updating a service, which means that if you change anything, you have to run
sky serve down vllm-service
and then
sky serve up ...
again.
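The two workarounds above (setting variables by hand in the file, and the down-then-up cycle) could be scripted roughly as follows. This is a sketch under stated assumptions, not something that exists in this repo: the template file name `vllm.yaml.tmpl`, the variable name `HF_TOKEN`, and the helper names are all hypothetical. It also assumes that your SkyPilot version accepts `--yes` on `sky serve down` and `sky serve up` to skip the confirmation prompts. The secrets are substituted into a temporary copy, so the file tracked in git never contains them.

```python
import os
import subprocess
import tempfile


def render_template(text: str, names: list, env: dict) -> str:
    """Replace ${NAME} for each listed name; any other `$` usage is left untouched."""
    for name in names:
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        text = text.replace("${" + name + "}", env[name])
    return text


def redeploy_commands(service: str, yaml_path: str) -> list:
    """Build the down-then-up command pair (the --yes flag is an assumption)."""
    return [
        ["sky", "serve", "down", service, "--yes"],
        ["sky", "serve", "up", "-n", service, yaml_path, "--yes"],
    ]


def redeploy(service: str, template_path: str, names: list) -> None:
    """Render the template to a temporary file, then tear down and restart the service."""
    with open(template_path) as f:
        rendered = render_template(f.read(), names, dict(os.environ))
    # The rendered file holds secrets, so keep it outside the repo and delete it afterwards.
    with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as tmp:
        tmp.write(rendered)
    try:
        for cmd in redeploy_commands(service, tmp.name):
            subprocess.run(cmd, check=True)  # stops at the first failing command
    finally:
        os.remove(tmp.name)
```

A call such as `redeploy("vllm-service", "vllm.yaml.tmpl", ["HF_TOKEN"])` would run the whole cycle. If `sky serve down` returns before the old service is fully removed, you may need to wait until it disappears from `sky serve status` before bringing it up again.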