High-efficiency floating-point neural network inference operators for mobile, server, and Web
Efficient Deep Learning Systems course materials (HSE, YSDA)
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
The Tensor Algebra SuperOptimizer for Deep Learning
Everything you need to know about LLM inference
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
Batch normalization fusion for PyTorch. This is an archived repository and is no longer maintained.
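At inference time, a BatchNorm layer that follows a convolution can be folded into the convolution's weights, replacing two ops with one. A minimal sketch of the standard folding arithmetic in PyTorch (illustrative only, not this repository's code):

    import torch
    import torch.nn as nn

    def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
        """Fold eval-mode BatchNorm statistics into a preceding Conv2d."""
        fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                          conv.stride, conv.padding, conv.dilation,
                          conv.groups, bias=True)
        with torch.no_grad():
            # Per-output-channel scale: gamma / sqrt(running_var + eps)
            scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
            fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
            bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
            # New bias: (b - running_mean) * scale + beta
            fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
        return fused

    conv = nn.Conv2d(3, 8, kernel_size=3, bias=False).eval()
    bn = nn.BatchNorm2d(8).eval()
    x = torch.randn(1, 3, 16, 16)
    # The fused conv matches conv followed by BN, up to float rounding.
    assert torch.allclose(nn.Sequential(conv, bn)(x), fuse_conv_bn(conv, bn)(x), atol=1e-5)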
Optimize the layer structure of Keras models to reduce computation time.
A set of tools to make your life easier with TensorRT and ONNX Runtime. This repo is designed for YOLOv3.
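For context, running an exported detector through ONNX Runtime looks roughly like this. The model path and input shape below are placeholders; a real YOLOv3 export defines its own input names and may expect extra inputs such as the original image shape:

    import numpy as np
    import onnxruntime as ort

    # Hypothetical model path and shape; read the real ones from your export.
    session = ort.InferenceSession("yolov3.onnx", providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name           # take the name from the model itself
    dummy = np.random.rand(1, 3, 416, 416).astype(np.float32)
    outputs = session.run(None, {input_name: dummy})    # None = return all model outputs
    print([o.shape for o in outputs])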
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
[CVPR 2025] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Blog posts, reading reports, and code examples covering AGI/LLM-related knowledge.
Accelerating Long Context LLM Inference with Accuracy-Preserving Context Optimization in SGLang, vLLM, RAG, and Agentic AI.
Krasis is a hybrid LLM runtime focused on efficiently running larger models on consumer-grade, VRAM-limited hardware.
Optimizing Monocular Depth Estimation with TensorRT: Model Conversion, Inference Acceleration, and 3D Reconstruction
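Conversion pipelines like this one typically go PyTorch → ONNX → TensorRT engine. A hedged sketch of the export step, using a placeholder backbone and hypothetical file names rather than this project's actual model:

    import torch
    import torchvision

    # Placeholder backbone; a real depth-estimation model would be loaded here.
    model = torchvision.models.resnet18(weights=None).eval()
    dummy = torch.randn(1, 3, 224, 224)            # example input fixes the traced shapes
    torch.onnx.export(
        model, dummy, "depth_model.onnx",          # hypothetical output path
        input_names=["input"], output_names=["output"],
        opset_version=17,
    )
    # The .onnx file can then be built into a TensorRT engine, e.g. with trtexec:
    #   trtexec --onnx=depth_model.onnx --saveEngine=depth_model.engine --fp16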
LightTTS is a lightweight TTS inference framework optimized for CosyVoice2 and CosyVoice3, enabling fast and scalable speech synthesis in Python with support for stream and bistream modes.
Run 70B+ LLMs on a single 4GB GPU — no quantization required.
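Fitting a 70B model into 4 GB of VRAM generally relies on streaming the network through the GPU one layer at a time: load a layer's weights, run it, free them, move on. A toy illustration of that loop with tiny stand-in layers (not this project's actual API):

    import torch
    import torch.nn as nn

    # Hypothetical per-layer checkpoint files, saved ahead of time with torch.save.
    layer_files = [f"layer_{i}.pt" for i in range(4)]
    for f in layer_files:
        torch.save(nn.Linear(64, 64).state_dict(), f)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(1, 64).to(device)
    with torch.no_grad():
        for f in layer_files:
            layer = nn.Linear(64, 64)
            layer.load_state_dict(torch.load(f))    # load one layer from disk
            layer.to(device)
            x = layer(x)                            # run it
            del layer                               # free VRAM before the next layer
            if device == "cuda":
                torch.cuda.empty_cache()
    print(x.shape)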
Official code of Attention-MoA: Enhancing Mixture-of-Agents via Inter-Agent Semantic Attention and Deep Residual Synthesis
Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Predibase’s LoRAX framework inference server.
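KV caching, one of the techniques covered, avoids recomputing attention keys and values for previously generated tokens: each decode step computes K/V for the new token only and attends over the accumulated cache. A minimal single-head sketch with random weights:

    import torch
    import torch.nn.functional as F

    d = 16
    Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
    k_cache, v_cache = [], []       # grows by one entry per decoded token

    def decode_step(x):             # x: (1, d) embedding of the newest token
        q = x @ Wq
        k_cache.append(x @ Wk)      # compute K/V for the new token only
        v_cache.append(x @ Wv)
        K = torch.cat(k_cache)      # (t, d): all cached keys so far
        V = torch.cat(v_cache)
        attn = F.softmax(q @ K.T / d**0.5, dim=-1)   # (1, t)
        return attn @ V             # (1, d)

    for _ in range(5):
        out = decode_step(torch.randn(1, d))
    print(out.shape, len(k_cache))  # torch.Size([1, 16]) 5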
Cross-platform, modular neural network inference library; small and efficient.