onnx_clip

An ONNX-based implementation of CLIP that doesn't depend on torch or torchvision. It also has a friendlier API than the original implementation.

This works by

  • running the text and vision encoders (the ViT-B/32 variant) in ONNX Runtime
  • using a pure NumPy version of the tokenizer
  • using a pure NumPy+PIL version of the preprocess function. The PIL dependency could also be removed with minimal code changes - see preprocessor.py. A rough sketch of this preprocessing is shown after this list.
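
For intuition, here is a rough sketch of what the NumPy+PIL preprocessing amounts to. The real implementation lives in preprocessor.py and may differ in detail; the mean/std values below are the standard CLIP normalization constants, while preprocess_sketch is a hypothetical name used only for this illustration.

import numpy as np
from PIL import Image

# Standard CLIP normalization constants (per RGB channel).
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073])
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711])

def preprocess_sketch(image: Image.Image, size: int = 224) -> np.ndarray:
    # Hypothetical sketch, not the shipped preprocessor: resize so the
    # shorter side is `size`, then center-crop to a square.
    w, h = image.size
    scale = size / min(w, h)
    image = image.resize((round(w * scale), round(h * scale)), Image.BICUBIC)
    left = (image.width - size) // 2
    top = (image.height - size) // 2
    image = image.crop((left, top, left + size, top + size))

    # Scale pixels to [0, 1], normalize per channel, move channels first.
    arr = np.asarray(image.convert("RGB"), dtype=np.float32) / 255.0
    arr = (arr - CLIP_MEAN) / CLIP_STD
    return arr.transpose(2, 0, 1).astype(np.float32)  # shape (3, size, size)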

Installation

To install, run the following in the root of the repository:

pip install .

Usage

All you need to do is instantiate the OnnxClip model class. An example:

from onnx_clip import OnnxClip, softmax, get_similarity_scores
from PIL import Image

images = [Image.open("onnx_clip/data/franz-kafka.jpg").convert("RGB")]
texts = ["a photo of a man", "a photo of a woman"]

# Your images/texts will get split into batches of this size before being
# passed to CLIP, to limit memory usage
onnx_model = OnnxClip(batch_size=16)

# Unlike the original CLIP, there is no need to run tokenization/preprocessing
# separately - simply run get_image_embeddings directly on PIL images/NumPy
# arrays, and run get_text_embeddings directly on strings.
image_embeddings = onnx_model.get_image_embeddings(images)
text_embeddings = onnx_model.get_text_embeddings(texts)

# To use the embeddings for zero-shot classification, you can use these two
# functions. Here we run on a single image, but any number is supported.
logits = get_similarity_scores(image_embeddings, text_embeddings)
probabilities = softmax(logits)

print("Logits:", logits)

for text, p in zip(texts, probabilities[0]):
    print(f"Probability that the image is '{text}': {p:.3f}")

Building & developing from source

Note: The following may give timeout errors due to the file sizes. If so, this can be fixed with Poetry version 1.1.13 - see this related issue.

Install, run, build and publish with Poetry

Install Poetry

curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python -

To set up the project and create a virtual environment, run the following command from the project's root directory.

poetry install

To build a source and wheel distribution of the library, run the following command from the project's root directory.

poetry build

Publishing a new version to PyPI (for project maintainers)

First, remove or move the downloaded LFS files so that they're not packaged with the code. Otherwise, the build produces a huge .whl file that PyPI rejects, which leads to confusing errors.

Then, follow this guide. tl;dr: go to the PyPI account page, generate an API token and put it into the $PYPI_PASSWORD environment variable. Then run

poetry publish --build --username lakera --password $PYPI_PASSWORD

Help

Please let us know how we can support you: [email protected].

LICENSE

See the LICENSE file in this repository.

The image franz-kafka.jpg is taken from here.
