    Repositories list

    • vllm

      Public
      vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      Updated May 14, 2025
    • JamAIBase

      Public
      The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real-time. Work together seamlessly to build and iterate on AI applications.
      Python
      Apache License 2.0
      Updated May 14, 2025
    • vllmtests

      Public
      A repository containing tools for testing vLLM correctness and performance regressions
      Shell
      Apache License 2.0
      Updated May 12, 2025
    • aiter

      Public
      AI Tensor Engine for ROCm
      Python
      MIT License
      Updated May 8, 2025
    • A repository that monitors the fast-changing ROCm/aiter repository and alerts users when AITER functions of interest (e.g., those used in vLLM or SGLang) have been updated at a given commit.
      Python
      Apache License 2.0
      Updated Apr 27, 2025
    • A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      Updated Apr 23, 2025
    • vLLM Workshop Content
      Apache License 2.0
      Updated Apr 3, 2025
    • Jupyter Notebook
      Updated Mar 20, 2025
    • TypeScript documentation of JamAISDK
      HTML
      Updated Mar 14, 2025
    • Python
      Apache License 2.0
      Updated Feb 24, 2025
    • The driver for LMCache core to run in vLLM
      Python
      Apache License 2.0
      Updated Jan 24, 2025
    • LMCache

      Public
      ROCm support of Ultra-Fast and Cheaper Long-Context LLM Inference
      Python
      Apache License 2.0
      Updated Jan 24, 2025
    • Python
      Updated Jan 23, 2025
    • Python
      Apache License 2.0
      Updated Jan 22, 2025
    • kvpress

      Public
      LLM KV cache compression made easy
      Python
      Apache License 2.0
      Updated Jan 21, 2025
    • litellm

      Public
      Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
      Python
      Other
      Updated Jan 13, 2025
    • Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
      C++
      Other
      Updated Dec 20, 2024
    • Mooncake

      Public
      Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
      C++
      Apache License 2.0
      Updated Dec 16, 2024
    • ROCm Implementation of torchac_cuda from LMCache
      Cuda
      Updated Dec 16, 2024
    • etalon

      Public
      LLM Serving Performance Evaluation Harness
      Python
      Apache License 2.0
      Updated Dec 16, 2024
    • Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
      Python
      MIT License
      Updated Dec 7, 2024
    • Efficient Triton Kernels for LLM Training
      Python
      BSD 2-Clause "Simplified" License
      Updated Dec 6, 2024
    • Efficient LLM Inference over Long Sequences
      Python
      Apache License 2.0
      Updated Nov 29, 2024
    • A calculator to estimate the memory footprint, capacity, and latency on NVIDIA, AMD, and Intel hardware
      Python
      Updated Nov 24, 2024
    • ROCm Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
      Cuda
      Apache License 2.0
      Updated Nov 21, 2024
    • Go ahead and axolotl questions
      Python
      Apache License 2.0
      Updated Nov 16, 2024
    • skypilot

      Public
      SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
      Python
      Apache License 2.0
      Updated Nov 7, 2024
    • A repository containing a CI/CD pipeline that builds Docker images with flash attention pre-compiled, to facilitate quicker development and deployment of other frameworks.
      Shell
      Apache License 2.0
      Updated Oct 26, 2024
    • ROCm fork of Fast and memory-efficient exact attention (the goal of this branch is to produce a flash attention PyPI package that can be readily installed and used).
      Python
      BSD 3-Clause "New" or "Revised" License
      Updated Oct 26, 2024
    • A Python client for the Unstructured hosted API
      Python
      MIT License
      Updated Oct 14, 2024