FlashInfer: Kernel Library for LLM Serving (Python, updated Mar 25, 2026)
A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.
A distributed LLM inference program based on llama.cpp that lets multiple computers on a local network collaborate on large language model inference, with a cross-platform desktop UI built with Electron.
Code for paper "JMDC: A Joint Model and Data Compression System for Deep Neural Networks Collaborative Computing in Edge-Cloud Networks"
Analyze and generate unstructured data using LLMs, from quick experiments to billion token jobs.
Accelerate reproducible inference experiments for large language models with LLM-D! This lab automates the setup of a complete evaluation environment on OpenShift/OKD: GPU worker pools, core operators, observability, traffic control, and ready-to-run example workloads.
Source code of the paper "Private Collaborative Edge Inference via Over-the-Air Computation".
Super Ollama Load Balancer - Performance-aware routing for distributed Ollama deployments with Ray, Dask, and adaptive metrics
Pool your CUDA + ROCm GPUs into one OpenAI-compatible API. Speculative decoding proxy gives you 2-3x faster inference — for free, using hardware you already own. Stop renting GPU clouds. Be a tightwad.
Web UI for orchestrating distributed llama.cpp RPC GPU clusters with auto node discovery, telemetry, and one-click deployment.
Official impl. of ACM MM paper "Identity-Aware Attribute Recognition via Real-Time Distributed Inference in Mobile Edge Clouds". A distributed inference model for pedestrian attribute recognition with re-ID in an MEC-enabled camera monitoring system, jointly training pedestrian attribute recognition and re-ID.
Turn any Kubernetes cluster into a private LLM endpoint. One Helm command deploys distributed inference across commodity hardware: Raspberry Pis, old servers, mixed architectures. OpenAI-compatible API powered by llama.cpp RPC.
A comprehensive framework for multi-node, multi-GPU scalable LLM inference on HPC systems using vLLM and Ollama. Includes distributed deployment templates, benchmarking workflows, and chatbot/RAG pipelines for high-throughput, production-grade AI services
Encrypted Decentralized Inference and Learning (E.D.I.L.)
A cache-centric architecture, compatibility contracts, and protocols for KV cache handoff in LLM inference.
Distributed LLM inference across multiple machines. A central server routes OpenAI-compatible requests to llama.cpp client nodes, with automatic model distribution and mutual TLS security.
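The routing pattern in the entry above (a central server forwarding OpenAI-compatible requests to llama.cpp client nodes) can be sketched minimally as follows. The node URLs, the `InferenceRouter` class, and the round-robin policy are illustrative assumptions for this sketch, not that repository's actual code, which also handles model distribution and mutual TLS.

```python
# Minimal sketch of a central router that forwards OpenAI-compatible
# chat requests to llama.cpp worker nodes (round-robin is an assumed
# policy; real routers may weigh load, model placement, or health).
from itertools import cycle


class InferenceRouter:
    """Round-robin router over nodes exposing /v1/chat/completions."""

    def __init__(self, node_urls: list[str]) -> None:
        self._nodes = cycle(node_urls)

    def route(self, request: dict) -> tuple[str, dict]:
        # Pick the next node; the OpenAI-style payload is forwarded
        # unchanged so any compatible client keeps working.
        node = next(self._nodes)
        return f"{node}/v1/chat/completions", request


router = InferenceRouter(["http://10.0.0.2:8080", "http://10.0.0.3:8080"])
url, payload = router.route({"model": "llama-3", "messages": []})
```

A real deployment would send `payload` to `url` over HTTPS with client certificates (the mutual TLS the entry mentions); the sketch only shows the dispatch decision.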
🚀 Master GPU kernel programming and optimization for high-performance AI systems with this comprehensive learning guide and resource hub.
Distributed inference across heterogeneous hardware.
Practical guide to clustering NVIDIA DGX Spark nodes for multi-node vLLM inference (NCCL, RoCE, Ray), with troubleshooting playbooks and step-by-step notebooks.