Adaptive Hybrid Quantization Framework for deploying 7B+ LLMs on low-VRAM devices (e.g., GTX 1050). Features surgical block alignment and Numba-accelerated inference.
One-click Windows installer for Z-Image Turbo AI image generation. Optimized for low-VRAM GPUs (4GB+). Features Gradio web UI, automatic setup, and GGUF model support.
Lightweight 6GB VRAM Gradio web app with auto-installer for running AuraFlow locally — no cloud, no clutter.
A privacy-first Generative AI pipeline for prototyping 3D-style game assets on consumer hardware. Optimized for low-VRAM (4GB) GPUs using PyTorch, Diffusers, and Streamlit.
Notebooks and workflows configured to run Wan 2.2 Animate inference smoothly with ComfyUI on Kaggle T4 GPUs.
Designed to test the capabilities of open-weights LLMs using BFCL benchmarks.