Learn linear quantization with the Quanto library and downcasting with the Transformers library to compress and optimize generative AI models; short example sketches of both techniques follow below.
Topics: compression, optimize, quantization, model-compression, model-deployment, linear-quantization, transformers-library, model-optimization, hugging-face, generative-ai, downcasting, quanto-library, quantization-fundamentals
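A minimal sketch of the first technique: 8-bit linear quantization of a model's weights with Quanto. The checkpoint name (`EleutherAI/pythia-410m`) is only an illustrative assumption, and depending on the installed version the imports may live under `optimum.quanto` instead of `quanto`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from quanto import quantize, freeze, qint8

model_name = "EleutherAI/pythia-410m"  # example model; swap in any causal LM

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Replace Linear layers with quantized equivalents: weights go to int8,
# activations are left in full precision here.
quantize(model, weights=qint8, activations=None)

# freeze() converts the quantized weights in place, so the smaller
# int8 tensors are what the model actually stores from now on.
freeze(model)

# The quantized model is used like any other Transformers model.
inputs = tokenizer("Quantization shrinks models by", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Linear quantization stores each weight tensor as int8 values plus a scale, so memory drops to roughly a quarter of float32 while the de-quantized values stay close to the originals.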
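And a sketch of the second technique: downcasting a Transformers checkpoint to bfloat16 at load time via the `torch_dtype` argument. The model name and the memory-footprint helper are again illustrative assumptions, not part of the original material:

```python
import torch
from transformers import AutoModelForCausalLM

model_name = "EleutherAI/pythia-410m"  # example model

# Default load keeps float32 weights (~4 bytes per parameter).
model_fp32 = AutoModelForCausalLM.from_pretrained(model_name)

# Downcast at load time to bfloat16 (~2 bytes per parameter),
# roughly halving the memory footprint.
model_bf16 = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16
)

def footprint_gb(model):
    # Sum of parameter sizes in gigabytes.
    return sum(p.numel() * p.element_size() for p in model.parameters()) / 1e9

print(f"fp32: {footprint_gb(model_fp32):.2f} GB")
print(f"bf16: {footprint_gb(model_bf16):.2f} GB")
```

Unlike linear quantization, downcasting simply changes the floating-point precision of every weight, so it needs no calibration or special layers, at the cost of somewhat lower numerical precision.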