    Repositories list

    • vllm

      Public
      vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      Updated May 14, 2025
    • JamAIBase

      Public
      The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real-time. Work together seamlessly to build and iterate on AI applications.
      Python
      Apache License 2.0
      Updated May 14, 2025
    • vllmtests

      Public
      A repository containing tools for testing vLLM correctness and performance regressions
      Shell
      Apache License 2.0
      Updated May 12, 2025
    • aiter

      Public
      AI Tensor Engine for ROCm
      Python
      MIT License
      Updated May 8, 2025
    • A repository that monitors the fast-changing ROCm/aiter repository and alerts users when AITER functions of interest (e.g., those used in vLLM or SGLang) have been updated at a given commit.
      Python
      Apache License 2.0
      Updated Apr 27, 2025
    • A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      Updated Apr 23, 2025
    • vLLM Workshop Content
      Apache License 2.0
      Updated Apr 3, 2025
    • Jupyter Notebook
      Updated Mar 20, 2025
    • TypeScript documentation of JamAISDK
      HTML
      Updated Mar 14, 2025
    • Python
      Apache License 2.0
      Updated Feb 24, 2025
    • The driver for LMCache core to run in vLLM
      Python
      Apache License 2.0
      Updated Jan 24, 2025
    • LMCache

      Public
      ROCm support of Ultra-Fast and Cheaper Long-Context LLM Inference
      Python
      Apache License 2.0
      Updated Jan 24, 2025
    • Python
      Updated Jan 23, 2025
    • Python
      Apache License 2.0
      Updated Jan 22, 2025
    • kvpress

      Public
      LLM KV cache compression made easy
      Python
      Apache License 2.0
      Updated Jan 21, 2025
    • litellm

      Public
      Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
      Python
      Other
      Updated Jan 13, 2025
    • Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
      C++
      Other
      Updated Dec 20, 2024
    • Mooncake

      Public
      Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
      C++
      Apache License 2.0
      Updated Dec 16, 2024
    • ROCm Implementation of torchac_cuda from LMCache
      Cuda
      Updated Dec 16, 2024
    • etalon

      Public
      LLM Serving Performance Evaluation Harness
      Python
      Apache License 2.0
      Updated Dec 16, 2024
    • Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
      Python
      MIT License
      Updated Dec 7, 2024
    • Efficient Triton Kernels for LLM Training
      Python
      BSD 2-Clause "Simplified" License
      Updated Dec 6, 2024
    • Efficient LLM Inference over Long Sequences
      Python
      Apache License 2.0
      Updated Nov 29, 2024
    • A calculator to estimate the memory footprint, capacity, and latency on NVIDIA, AMD, and Intel hardware
      Python
      Updated Nov 24, 2024
    • ROCm Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
      Cuda
      Apache License 2.0
      Updated Nov 21, 2024
    • Go ahead and axolotl questions
      Python
      Apache License 2.0
      Updated Nov 16, 2024
    • skypilot

      Public
      SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
      Python
      Apache License 2.0
      Updated Nov 7, 2024
    • A repository containing a CI/CD pipeline that builds Docker images with flash attention pre-compiled, to facilitate quicker development and deployment of other frameworks.
      Shell
      Apache License 2.0
      Updated Oct 26, 2024
    • ROCm fork of Fast and memory-efficient exact attention (the goal of this branch is to produce a flash attention PyPI package that can be readily installed and used).
      Python
      BSD 3-Clause "New" or "Revised" License
      Updated Oct 26, 2024
    • A Python client for the Unstructured hosted API
      Python
      MIT License
      Updated Oct 14, 2024