
feat: exposing deno_cache + ai cache interceptor #446

Merged · 31 commits · Nov 24, 2024

Conversation

kallebysantos
Contributor

What kind of change does this PR introduce?

feature, enhancement

What is the current behavior?

When using huggingface/transformers.js, model assets can't be cached, so every worker life cycle requires a fresh fetch of those assets.

What is the new behavior?


deno_cache is now exposed to JS land, so the transformers.js library can use the global `caches` object to store model assets. An interceptor has also been attached to deno_cache, letting us intercept only `.onnx` requests and use them during session load.
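transformers.js talks to the standard Web Cache API (`caches.open`, `cache.match`, `cache.put`) when `env.useBrowserCache` is enabled. A minimal sketch of that cache-or-fetch flow, simulated here with an in-memory Map instead of the real deno_cache backend (all names below are illustrative, not this PR's actual API):

```typescript
// Hypothetical sketch: how a Cache-API-backed asset store behaves.
// The Map stands in for deno_cache; the real runtime persists entries.
class MemoryCache {
  private store = new Map<string, Uint8Array>();

  async match(url: string): Promise<Uint8Array | undefined> {
    return this.store.get(url);
  }

  async put(url: string, body: Uint8Array): Promise<void> {
    this.store.set(url, body);
  }
}

// First call misses and fetches (stubbed by `fetcher`); later calls hit
// the cache, so the worker never re-downloads the same model asset.
async function loadAsset(
  cache: MemoryCache,
  url: string,
  fetcher: (u: string) => Promise<Uint8Array>,
): Promise<{ bytes: Uint8Array; fromCache: boolean }> {
  const hit = await cache.match(url);
  if (hit) return { bytes: hit, fromCache: true };
  const bytes = await fetcher(url);
  await cache.put(url, bytes);
  return { bytes, fromCache: false };
}
```

The point of the PR is that this `caches` surface now exists inside the worker, so the library's existing browser-cache code path works unmodified.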

How does it work?

By attaching an interceptor, we can filter `.onnx` requests and return the URL's bytes instead of fetching the model. The transformers.js library then treats those bytes as if they were the model's own bytes and passes them along during session init.
The exposed ONNX runtime knows whether the incoming bytes are a URL string or real model bytes, so it can choose to load the model from the internal cache or from memory.
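The two halves of that handshake can be sketched as follows. This is an illustrative simulation, not the PR's actual code: the interceptor substitutes the URL's own bytes for `.onnx` requests, and the runtime side discriminates a URL string from raw model bytes by attempting a strict UTF-8 decode plus URL parse (function names are hypothetical):

```typescript
// 1) Interceptor side: for ".onnx" requests, hand back the URL's bytes
//    instead of downloading the model over the network.
function interceptOnnxRequest(url: string): Uint8Array | null {
  if (!url.endsWith(".onnx")) return null; // let other requests pass through
  return new TextEncoder().encode(url);
}

// 2) Runtime side: decide whether incoming bytes are a URL string (load
//    the model from the internal cache) or raw model bytes (load from memory).
function classifyIncomingBytes(
  bytes: Uint8Array,
): { kind: "url"; url: string } | { kind: "model" } {
  try {
    // fatal: true throws on invalid UTF-8, which rules out binary model data.
    const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    const url = new URL(text); // throws if the text is not a valid URL
    return { kind: "url", url: url.href };
  } catch {
    return { kind: "model" };
  }
}
```

This is why the library "thinks" it received the model's bytes: the byte payload round-trips through its loading path untouched, and only the runtime at the end of the chain inspects what it actually holds.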

When loading from the internal cache, this achieves almost the same behaviour as PR #368 (Pipeline RFC).

Final considerations:

This is an adapted work from #368, where we split out only the core features that improve ort support for edge-runtime.

Finally, thanks to @nyannyacha, who helped me with this proposal and code as well 🙏

@nyannyacha nyannyacha marked this pull request as ready for review November 21, 2024 08:33
crates/sb_ai/utils.rs — review thread (outdated, resolved)
@nyannyacha force-pushed the feat-cache-api branch 2 times, most recently from aa91f49 to 1f5024b on November 23, 2024 01:17
@kallebysantos force-pushed the feat-cache-api branch 2 times, most recently from 3e81242 to e9b2f50 on November 23, 2024 12:08
kallebysantos and others added 22 commits November 23, 2024 21:11
**NOTE:** cache is saved in the `temp folder`; consider moving it to a better
location
- adding a cache adapter that intercepts `transformers-cache`
- fetch and caching models from rust land
- using a separated file to store `ort` predictions snapshots for both
`x64` and `arm64`.
- Checking for `RUST_LOG` env, then executing tests as `debug` with
tracing enabled
- Applying url check to prevent request errors, matching between `Url`
and `Model Bytes`.
- Adding tests scenarios for `env.useBrowserCache = true`
Nyannyacha and others added 3 commits November 23, 2024 12:15
@laktek
Copy link
Contributor

laktek commented Nov 24, 2024

@kallebysantos @nyannyacha Thanks for these changes! Merging this PR (I will run some performance tests in our staging environment before rolling this out)

@laktek laktek merged commit 13ee2a2 into supabase:main Nov 24, 2024
3 checks passed

🎉 This PR is included in version 1.64.0 🎉

The release is available on the GitHub release page

Your semantic-release bot 📦🚀
