#

gemm

Here are 72 public repositories matching this topic...

OpenNMT / CTranslate2

Fast inference engine for Transformer models

Updated Nov 25, 2024
C++

flame / how-to-optimize-gemm

matrix-multiplication gemm code-optimization gotoblas blis

Updated Jul 29, 2023
C

DefTruth / CUDA-Learn-Notes

📚Tensor/CUDA Cores, 📖150+ CUDA Kernels, ⚡️⚡️toy-hgemm library with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS 🎉🎉).

cuda gemm gemv hgemm

Updated Nov 28, 2024
Cuda

CNugteren / CLBlast

Tuned OpenCL BLAS

gpu opencl matrix-multiplication blas gemm blas-libraries clblas

Updated Nov 8, 2024
C++

flame / blislab

BLISlab: A Sandbox for Optimizing GEMM

matrix-multiplication gemm code-optimization blis

Updated Jun 17, 2021
C

Bruce-Lee-LY / cuda_hgemm

Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.

gpu cuda cublas nvidia gemm matrix-multiply tensor-core hgemm

Updated Sep 8, 2024
Cuda

salykova / matmul.c

High-Performance FP32 Matrix Multiplication on CPU

c cpu openmp matrix-multiplication gemm fast-matrix-multiplication sgemm

Updated Nov 16, 2024
C

yzhaiustc / Optimizing-SGEMM-on-NVIDIA-Turing-GPUs

Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.

optimization cuda nvidia gemm

Updated Nov 28, 2021
Cuda

mratsim / laser

The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for floats and integers

deep-learning assembler parallel openmp jit simd matrix-multiplication high-performance-computing blas convolution tensor compiler-optimization gemm runtime-cpu-detection

Updated Jan 4, 2024
Nim

ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.

python machine-learning amd gpu assembly opencl dnn matrix-multiplication neural-networks gpu-acceleration blas hip gpu-computing tensors tensor-contraction gemm radeon auto-tuning

Updated Nov 27, 2024
Python

coderonion / awesome-cuda-and-hpc

🔥🔥🔥 A collection of some awesome public CUDA, cuBLAS, TensorRT and High Performance Computing (HPC) projects.

Updated Oct 3, 2024

cp2k / dbcsr

DBCSR: Distributed Block Compressed Sparse Row matrix library

hpc linear-algebra mpi cuda matrix-multiplication blas sparse-matrix cp2k gemm openmp-parallelization

Updated Nov 25, 2024
Fortran

yui0 / slibs

Single file libraries for C/C++

Updated Aug 5, 2024
C

yzhaiustc / Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F

Stepwise optimizations of DGEMM on CPU, reaching performance faster than Intel MKL eventually, even under multithreading.

openmp simd blas avx512 gemm mkl

Updated Feb 3, 2022
C

ROCm / hipBLASLt

hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditional BLAS library

machine-learning amd assembly matrix-multiplication blas hip gpu-computing gemm rocm

Updated Nov 28, 2024
Assembly

Bruce-Lee-LY / cuda_hgemv

Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.

gpu cuda cublas nvidia gemm gemv matrix-multiply tensor-core hgemm cuda-core hgemv

Updated Sep 8, 2024
Cuda

enp1s0 / ozIMMU

FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme

cuda gemm mixed-precision tensorcore tensorcores

Updated Sep 7, 2024
Cuda

aredden / torch-cublas-hgemm

PyTorch half precision gemm lib w/ fused optional bias + optional relu/gelu

cuda pytorch gemm float16

Updated Aug 26, 2024
Cuda

CoffeeBeforeArch / mmul

Serial and parallel implementations of matrix multiplication

serial parallel matrix-multiplication benchmarks gemm mmul

Updated Feb 19, 2021
C++

hma02 / cublasHgemm-P100

Code for testing the native float16 matrix multiplication performance on Tesla P100 and V100 GPU based on cublasHgemm

gpu cublas precision gemm half-precision float16 p100 v100

Updated Aug 20, 2019
Cuda

Improve this page

Add a description, image, and links to the gemm topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the gemm topic, visit your repo's landing page and select "manage topics."