
feat: onnx runtime shared sessions #430

Merged · 27 commits · Nov 1, 2024
Conversation

kallebysantos (Contributor)

This PR is an adapted part of the #368 work, which was closed due to a change in proposal.

What kind of change does this PR introduce?

Feature, Enhancement

What is the current behavior?

Model sessions are eagerly evaluated and do not survive across worker life-cycles.

What is the new behavior?


This PR introduces shared-session logic along with other ort improvements, such as GPU support and optimizations.

👷 It's also a foundation for an owned ONNX runtime that can be integrated directly with the huggingface/transformers.js library, which will allow better inference without the need to couple models like gte-small into the edge-runtime image.

Tester Docker image:

You can pull a Docker image of this PR from Docker Hub:

# default runtime
docker pull kallebysantos/edge-runtime:latest

# gpu with cuda provider
docker pull kallebysantos/edge-runtime:latest-cuda

Session lifecycle:

This PR introduces a lazy map of ort::Sessions: each session is loaded once and then shared between worker cycles.
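
For illustration, here is a minimal Rust sketch of how such a lazy shared map could look. This is not the PR's actual code: the names sessions and get_or_create_session are hypothetical, and the Session builder methods follow the ort 2.x crate API, which may differ between versions.

use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

use ort::Session;

// Global, lazily-initialized map of model path -> shared session.
static SESSIONS: OnceLock<Mutex<HashMap<String, Arc<Session>>>> = OnceLock::new();

fn sessions() -> &'static Mutex<HashMap<String, Arc<Session>>> {
    SESSIONS.get_or_init(|| Mutex::new(HashMap::new()))
}

// The first caller loads the model; later callers clone the same Arc,
// so the session survives across worker life-cycles.
fn get_or_create_session(model_path: &str) -> ort::Result<Arc<Session>> {
    let mut map = sessions().lock().unwrap();
    if let Some(session) = map.get(model_path) {
        return Ok(Arc::clone(session));
    }
    let session = Arc::new(Session::builder()?.commit_from_file(model_path)?);
    map.insert(model_path.to_owned(), Arc::clone(&session));
    Ok(session)
}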

Cleaning up sessions:
Each ort::Session is held behind an Arc smart pointer and will only be dropped when no consumer is attached to it; to trigger this, users must explicitly call the EdgeRuntime.ai.tryCleanupUnusedSession() method.

NOTE: This method is only available for the main worker

// cleanup unused sessions every 30s
setInterval(async () => {
  const { activeUserWorkersCount } = await EdgeRuntime.getRuntimeMetrics();
  if (activeUserWorkersCount > 0) {
    return;
  }
  try {
    const cleanupCount = await EdgeRuntime.ai.tryCleanupUnusedSession();
    if (cleanupCount == 0) {
      return;
    }
    console.log('EdgeRuntime.ai.tryCleanupUnusedSession', cleanupCount);
  } catch (e) {
    console.error(e.toString());
  }
}, 30 * 1000);
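
On the Rust side, the cleanup can be pictured as dropping every map entry that no worker still references. Continuing the hypothetical sketch above (the real tryCleanupUnusedSession implementation may differ):

// A session whose Arc strong count is 1 is referenced only by the
// map itself; no worker is using it, so it can be removed safely.
fn try_cleanup_unused_sessions() -> usize {
    let mut map = sessions().lock().unwrap();
    let before = map.len();
    map.retain(|_, session| std::sync::Arc::strong_count(session) > 1);
    before - map.len()
}

Dropping the last Arc frees the underlying ort session and its memory.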

GPU Support:

GPU support allows session inference on specialized hardware and is backed by CUDA. There is nothing for the end user to configure; just request the Session for gte-small. To enable GPU inference, however, the Dockerfile now has two main build stages (the target must be specified during docker build):

edge-runtime (CPU only):
This stage builds the default edge-runtime, where ort::Sessions are loaded using the CPU.

docker build --target "edge-runtime" .

Resulting image size: ~150 MB

edge-runtime-cuda (GPU/CPU):
This stage builds the default edge-runtime on an nvidia/cuda base image, allowing sessions to load on the GPU or on the CPU (as fallback).

docker build --target "edge-runtime-cuda" .

Resulting image size: ~2.20 GB
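
For reference, registering the CUDA provider with a CPU fallback might look like this with the ort crate. This is a hedged sketch, not code from this PR; the provider types and builder methods follow the ort 2.x API and may vary by version.

use ort::{CPUExecutionProvider, CUDAExecutionProvider, Session};

fn build_session(model_path: &str) -> ort::Result<Session> {
    // Providers are tried in registration order: CUDA first,
    // falling back to CPU when no usable GPU is present.
    Session::builder()?
        .with_execution_providers([
            CUDAExecutionProvider::default().build(),
            CPUExecutionProvider::default().build(),
        ])?
        .commit_from_file(model_path)
}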

Each stage needs to install the appropriate onnx-runtime. To that end, install_onnx.sh has been updated with a 4th parameter, the --gpu flag, which downloads a CUDA build from the official microsoft/onnxruntime repository.

Using GPU image:

To use the GPU image, the docker-compose file must include the following properties for the functions service:

services:
  functions:
    # Built as described above
    image: supabase/edge-runtime:latest-cuda
    # or built directly by compose
    build:
      context: .
      dockerfile: Dockerfile
      target: edge-runtime-cuda

    # Required to use the GPU inside the container
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1 # Change here if more devices are installed
              capabilities: [gpu]

IMPORTANT NOTE: The target infrastructure must have the NVIDIA Container Toolkit installed to allow GPU support inside Docker.


Final considerations:

As described above, this is adapted work from #368, from which we split out only the core features that improve ort support for edge-runtime.

Finally, thanks to @nyannyacha, who helped me a lot 🙏

@nyannyacha (Collaborator)

👷 It's also a foundation for an owned ONNX runtime that can be integrated directly with the huggingface/transformers.js#947 library, which will allow better inference without the need to couple models like gte-small into the edge-runtime image.

If so, are you also preparing another PR after this one is merged?
Overall, the PR looks good. 😁

@nyannyacha (Collaborator)

Anyway, I'll be testing this locally with k6 soon.
If there are any issues, I'll let you know. 😋

@kallebysantos force-pushed the ai branch 2 times, most recently from 57d6ccd to 8601949 (October 31, 2024 16:33)
kallebysantos and others added 23 commits October 31, 2024 16:35
`install_onnx` script now supports `--gpu` flag to download runtime with
cuda provider

Signed-off-by: kallebysantos <[email protected]>
Signed-off-by: kallebysantos <[email protected]>
- Using `HashMap` allows reusing sessions between requests.

Signed-off-by: kallebysantos <[email protected]>
@nyannyacha (Collaborator) left a comment
LGTM 👍

I've load-tested locally with k6 and it seems to be working fine.

Many thanks for your time and effort in bringing this feature to us!
I look forward to seeing you again in your follow-up PR.

cc @laktek

@laktek merged commit fc80ebb into supabase:main on Nov 1, 2024
3 checks passed