This repository covers a range of parallel programming topics, from CPU multithreading to CUDA, each aimed at improving efficiency and performance. Here’s a detailed look at what I learned:

Theme | Post |
---|---|
Basic Parallel Architectures | Let's Learn About Basic Parallel Architectures |
Thread Programming | Thread Programming Explored Through C++ |
Thread Management | How Do We Divide Work Evenly Among Threads in a Multithreaded Program? |
Matrix Multiplication (multi-threaded) | Ways to Improve Matrix Multiplication (matmul) Performance with Multithreading |
OpenMP | How to Use OpenMP to Make Multithreading Easier |
Graph Processing | Ways to Store Graph Structures More Efficiently |
Prefix sum | Prefix Sum: A Guide to Efficient Computation |
CUDA Programming Intro | CUDA Programming Basics |
CPU-GPU communication and thread indexing | CPU-GPU Communication and Image Processing Techniques Using CUDA |
CUDA thread hierarchy, memory hierarchy, GPU cache structure | CUDA and Nvidia GPU Architecture: Understanding the Thread Hierarchy, Memory Hierarchy, and GPU Cache Structure |
CUDA memories: registers, shared memory, global memory | CUDA Memories: Registers, Shared Memory, Global Memory |

Assignment | Description | Link |
---|---|---|
Assignment #1 | A Simple Filter on 1D Array | link |
Assignment #2 | Hash table locking | link |
Assignment #3 | Matrix Multiplication | link |
Assignment #4 | Matrix Multiplication using CUDA | link |
Assignment #5 | Sum Reduction | link |
Assignment #6 | CUDA Application of DNN | link |

- Post: Let's Learn About Basic Parallel Architectures
- Description: This section introduces the fundamental concepts of parallel architectures, laying the groundwork for more advanced topics.
- Post: Thread Programming Explored Through C++
- Description: Dive into thread programming with C++ and learn how to create and manage threads effectively.
- Post: How Do We Divide Work Evenly Among Threads in a Multithreaded Program?
- Description: Learn strategies for evenly distributing tasks among threads in a multithreaded environment to maximize performance.
- Post: Ways to Improve Matrix Multiplication (matmul) Performance with Multithreading
- Description: Explore methods to optimize matrix multiplication operations using multithreading techniques.
- Post: How to Use OpenMP to Make Multithreading Easier
- Description: Get acquainted with OpenMP, a powerful tool that simplifies multithreading and parallel programming.
- Post: Ways to Store Graph Structures More Efficiently
- Description: Discover efficient ways to store and process graph structures, crucial for handling complex data relationships.
- Post: Prefix Sum: A Guide to Efficient Computation
- Description: Gain a comprehensive understanding of the prefix sum algorithm and its applications in efficient computation.
- Post: CUDA Programming Basics
- Description: This section introduces CUDA programming for those new to GPU programming. The post covers the basics of CUDA: setting up a development environment, then writing and compiling a first CUDA program.
- Post: CPU-GPU Communication and Image Processing Techniques Using CUDA
- Description: This section explains in detail the hierarchical structure of CUDA threads: grids, blocks, and threads. The post shows how to compute a global thread index through thread indexing and includes example image-processing code.
- Post: CUDA and Nvidia GPU Architecture: Understanding the Thread Hierarchy, Memory Hierarchy, and GPU Cache Structure
- Description: This section delves into the advanced aspects of CUDA and Nvidia GPU architecture, including the hierarchical organization of threads, the different levels of memory, and the structure of GPU caches.
- Post: CUDA Memories: Registers, Shared Memory, Global Memory
- Description: This section explores the different types of memory in CUDA, focusing on registers, shared memory, and global memory. The post describes the characteristics of each memory type and provides strategies for using them effectively to make CUDA kernels more efficient.
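
To make the Thread Management theme above concrete, here is a minimal C++ sketch of one common way to split work evenly among threads. It is an illustration only and is not taken from the linked posts: the helper names `chunk_range` and `parallel_sum` are hypothetical, and the summation workload is an assumption chosen for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: returns the half-open range [begin, end) of items
// owned by worker `tid` when `n` items are split among `num_threads` workers.
// The first (n % num_threads) workers take one extra item, so chunk sizes
// differ by at most one.
std::pair<std::size_t, std::size_t> chunk_range(std::size_t n,
                                                std::size_t num_threads,
                                                std::size_t tid) {
    assert(num_threads > 0 && tid < num_threads);
    const std::size_t base = n / num_threads;
    const std::size_t extra = n % num_threads;
    const std::size_t begin = tid * base + std::min(tid, extra);
    const std::size_t end = begin + base + (tid < extra ? 1 : 0);
    return {begin, end};
}

// Hypothetical example workload: sum a vector with `num_threads` workers.
long long parallel_sum(const std::vector<int>& data, std::size_t num_threads) {
    // One result slot per worker, so no locking is needed.
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (std::size_t tid = 0; tid < num_threads; ++tid) {
        workers.emplace_back([&data, &partial, num_threads, tid] {
            const auto range = chunk_range(data.size(), num_threads, tid);
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(range.first);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(range.second);
            partial[tid] = std::accumulate(first, last, 0LL);
        });
    }
    for (auto& worker : workers) {
        worker.join();  // wait for every worker before combining results
    }
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

For example, `chunk_range(10, 3, tid)` yields the ranges [0, 4), [4, 7), and [7, 10) for `tid` 0, 1, and 2, so no worker holds more than one item beyond any other. A static split like this suits uniform per-item cost; uneven workloads may call for a dynamic scheme instead.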