Starred repositories
🎉 Modern CUDA Learn Notes with PyTorch: Tensor/CUDA Cores, 📖150+ CUDA Kernels with PyTorch bindings, 📖HGEMM/SGEMM (95%~99% cuBLAS performance), 📖100+ LLM/CUDA Blogs.
A bibliography and survey of the papers surrounding o1
Accessible large language models via k-bit quantization for PyTorch.
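A minimal sketch of how k-bit quantization is typically applied when loading a model, using the Hugging Face transformers integration of bitsandbytes; the checkpoint name is a placeholder, and the exact options should be checked against the bitsandbytes docs.

```python
# Sketch: load a causal LM with 4-bit (NF4) weights via the transformers
# integration of bitsandbytes. The model name below is hypothetical.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear-layer weights to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls still run in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "example-org/example-7b",               # hypothetical checkpoint
    quantization_config=bnb_config,
)
```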
Tutorials and useful scripts for using RapidStream.
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
Single-thread, end-to-end C++ implementation of the Bitnet (1.58-bit weight) model
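For reference, a small PyTorch sketch of 1.58-bit (ternary) weight quantization in the style of BitNet b1.58: weights are scaled by their mean absolute value and rounded to {-1, 0, +1}. This illustrates the arithmetic only, not the C++ implementation above, and uses a simple per-tensor scale.

```python
# Sketch of BitNet b1.58-style ternary quantization:
# scale = mean(|W|), W_q = clip(round(W / scale), -1, 1).
import torch

def quantize_ternary(w: torch.Tensor, eps: float = 1e-5):
    scale = w.abs().mean().clamp(min=eps)      # per-tensor scale (illustrative choice)
    w_q = (w / scale).round().clamp(-1, 1)     # values in {-1, 0, +1}
    return w_q, scale

w = torch.randn(4, 8)
w_q, scale = quantize_ternary(w)
w_hat = w_q * scale                            # dequantized approximation of w
```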
depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.
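A minimal torch.compile example for context; depyf is used to dump and inspect the code that torch.compile generates for a function like this. The prepare_debug context manager and dump directory follow depyf's documented usage and should be verified against the current docs.

```python
# Sketch: compile a small function with torch.compile, then use depyf to dump
# the generated/decompiled source for inspection.
import torch
import depyf

def toy_fn(x):
    return torch.sin(x) + torch.cos(x)

compiled_fn = torch.compile(toy_fn)

with depyf.prepare_debug("./depyf_dump"):  # writes generated source files here
    compiled_fn(torch.randn(8))
```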
Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
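To make "asymmetric 2-bit" concrete, here is a generic asymmetric quantize/dequantize sketch in PyTorch. It uses a single per-tensor scale and zero-point for illustration only; KIVI itself quantizes the key cache per-channel and the value cache per-token, and its kernels differ from this sketch.

```python
# Generic asymmetric b-bit quantization: map [min, max] onto [0, 2^b - 1]
# using a scale and a zero-point, then round.
import torch

def asym_quantize(x: torch.Tensor, bits: int = 2):
    qmax = 2 ** bits - 1
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min).clamp(min=1e-8) / qmax
    zero_point = (-x_min / scale).round()
    q = ((x / scale) + zero_point).round().clamp(0, qmax)
    return q, scale, zero_point

def asym_dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

k = torch.randn(16, 64)
q, s, zp = asym_quantize(k, bits=2)
k_hat = asym_dequantize(q, s, zp)   # low-precision approximation of k
```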
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
DHLS (Dynamic High-Level Synthesis) compiler based on MLIR
KV cache compression for high-throughput LLM inference
Development repository for the Triton language and compiler
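For a flavor of the Triton language, a standard element-wise vector-add kernel (essentially the introductory tutorial example from the Triton repository), shown here for illustration; inputs are expected to be CUDA tensors.

```python
# Element-wise vector addition in Triton: each program instance handles one
# BLOCK_SIZE-wide chunk of the output.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                  # guard the tail block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```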
RapidStream TAPA compiles task-parallel HLS programs into high-frequency FPGA accelerators.
A minimal Jekyll Theme to host your resume (CV) on GitHub with a few clicks.
FlagGems is an operator library for large language models implemented in Triton Language.
ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch
Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024)
Efficient Triton Kernels for LLM Training
Flux diffusion model implementation using quantized fp8 matmul; the remaining layers use faster half-precision accumulation, making it ~2x faster on consumer devices.
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs