feat: onnx runtime shared sessions #430
Conversation
If so, are you also preparing another PR after this one is merged?
Anyway, I'll be testing this locally with k6 soon.
`install_onnx` script now supports `--gpu` flag to download the runtime with the CUDA provider
- Using `HashMap` allows reusing sessions between requests.
LGTM 👍
I've load-tested it locally with k6 and it seems to be working fine.
Many thanks for your time and effort in bringing this feature to us!
I look forward to seeing you again in your follow-up PR.
cc @laktek
What kind of change does this PR introduce?
Feature, Enhancement
What is the current behavior?
Model sessions are eagerly evaluated and do not survive across worker life-cycles.
What is the new behavior?
This PR introduces shared-session logic along with other ort improvements, like GPU support and optimizations.
Tester Docker image:
You can get a Docker image of this PR from Docker Hub:
Session lifecycle:
This PR introduces a `Lazy` map of `ort::Session`s, which means sessions are loaded once and then shared between worker cycles (see the sketch below).
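For illustration only, here is a minimal sketch of that pattern, assuming a `once_cell::sync::Lazy` map and hypothetical `Session`/`load_session` stand-ins (the PR's real session type comes from the `ort` crate):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

use once_cell::sync::Lazy;

// Hypothetical stand-in for the real `ort` session type.
pub struct Session { /* model handle, etc. */ }

// Process-wide lazy map: each session is created once, then shared
// between worker cycles via cheap `Arc` clones.
static SESSIONS: Lazy<Mutex<HashMap<String, Arc<Session>>>> =
    Lazy::new(|| Mutex::new(HashMap::new()));

pub fn get_or_create_session(model_key: &str) -> Arc<Session> {
    let mut map = SESSIONS.lock().unwrap();
    map.entry(model_key.to_owned())
        .or_insert_with(|| Arc::new(load_session(model_key))) // load once
        .clone() // hand a shared reference to the requesting worker
}

// Hypothetical loader; in practice this would build an `ort` session
// from the model file identified by `model_key`.
fn load_session(_model_key: &str) -> Session {
    Session {}
}
```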
Cleaning up sessions:
Each `ort::Session` is attached to an `Arc` smart pointer and will only be dropped once no consumer holds it; for that, users must explicitly call the `EdgeRuntime.ai.tryCleanupUnusedSession()` method (sketched below).
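Continuing the sketch above (again, not the PR's actual code), a cleanup pass can drop every cached session that nothing besides the map itself still references:

```rust
// Sketch of a cleanup pass over the hypothetical `SESSIONS` map:
// `strong_count == 1` means only the map still references the session,
// so removing the entry drops it and frees the underlying model.
pub fn try_cleanup_unused_sessions() {
    let mut map = SESSIONS.lock().unwrap();
    map.retain(|_key, session| std::sync::Arc::strong_count(session) > 1);
}
```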
GPU Support:
The `gpu` support allows `session` inference on specialized hardware, backed by CUDA. There is no configuration for the final user to do; just request the `Session` for `gte-small`. But in order to enable `gpu` inference, the `Dockerfile` now has two main `build stages` (that should be specified during `docker build`; see the command-line example below):
edge-runtime (CPU only):
This stage builds the default `edge-runtime`, where `ort::Session`s are loaded using the CPU.
edge-runtime-cuda (GPU/CPU):
This stage builds the default `edge-runtime` in an `nvidia/cuda` machine that allows loading on the `GPU` or the `CPU` (as fallback).
Each stage needs to install the appropriate `onnx-runtime`. To that end, `install_onnx.sh` has been updated with a 4th parameter flag, `--gpu`, that downloads a `cuda` build from the official `microsoft/onnxruntime` repository.
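For reference, a hedged example of driving both pieces from the command line; the image tags and the script's first three arguments are illustrative placeholders, only the stage names and the `--gpu` flag come from this PR:

```bash
# Build the CPU-only stage (placeholder tag)
docker build --target edge-runtime -t edge-runtime:cpu .

# Build the CUDA-enabled stage on the nvidia/cuda base (placeholder tag)
docker build --target edge-runtime-cuda -t edge-runtime:cuda .

# Fetch the CUDA build of onnx-runtime; <version>, <arch> and <out-dir>
# are placeholders for the script's first three parameters.
./install_onnx.sh <version> <arch> <out-dir> --gpu
```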
Using GPU image:
In order to use the `gpu` image, the `docker-compose` file must include the following properties for the `functions` service:
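The exact snippet did not survive here; as a rough sketch, GPU access in Compose is usually granted through a device reservation like the following (assumed, not taken from this PR):

```yaml
services:
  functions:
    image: edge-runtime:cuda   # placeholder tag for the GPU image
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia   # expose the NVIDIA GPU to the container
              count: 1
              capabilities: [gpu]
```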
Final considerations:
As I described before, this is an adapted work from #368, from which we split out only the core features that improve ort support for `edge-runtime`.
Finally, thanks to @nyannyacha, who helped me a lot 🙏