CUDA Learning Repository

Welcome to your comprehensive CUDA programming journey!

🚀 Quick Start

cd ~/cuda-learning
./START

📚 Complete Curriculum Structure

Week 1: Fundamentals (01-basics/)

Lesson 01: First Kernel - Your first GPU program
Lesson 02: Threads & Blocks - Understanding parallelism
Lesson 03: Array Operations - Parallel data processing
Lesson 04: Memory Model - GPU memory hierarchy
Lesson 05: Vector Addition - Complete application

Week 2: Memory & Optimization (02-memory/)

Lesson 06: Shared Memory - Fast on-chip memory
Lesson 07: Parallel Reduction - Tree-based algorithms
Lesson 08: CUDA Streams - Asynchronous execution
Lesson 09: Atomic Operations - Thread-safe operations
Lesson 10: Texture Memory - Specialized caching

Week 3: Advanced Optimization (03-optimization/)

Lesson 11: Warp Primitives - Low-level optimizations
Lesson 12: Parallel Scan - Prefix sum algorithms
Lesson 13: Tensor Cores - AI acceleration hardware

Week 4: Advanced Features (04-advanced/)

Lesson 14: Dynamic Parallelism - GPU launches kernels
Lesson 15: CUDA Graphs - Eliminate launch overhead
Lesson 16: Multi-GPU - Scale beyond single GPU

Week 5: Real-World Projects (05-projects/)

Project 01: Image Blur - Real-world image processing
Project 02: Tokenizer & Embeddings - NLP fundamentals
Project 03: Attention Mechanism - Transformer building blocks
Project 04: GPU Hash Table - High-performance data structures
Project 05: Graph Algorithms - BFS, PageRank, SSSP
Project 06: Sparse Matrices - Scientific computing
Project 07: GPU Regex - Text processing at scale
Project 08: CNN - Deep learning from scratch
Project 09: Monte Carlo Finance - Financial simulations
Project 10: Scientific FFT - Signal processing

Week 6: Production & Deployment (06-production/)

Lesson 17: Profiling & Optimization - Systematic tuning
Lesson 18: CUTLASS & Templates - Reusable kernels
Lesson 19: Error Handling - Robust GPU code
Lesson 20: Deployment - Production systems

📈 Learning Path

For AI/ML Engineers

Basics (1-5) → Memory (6,9) → Tensor Cores (13) → 
Projects (2,3,8) → Multi-GPU (16) → Production (17-20)

For HPC Developers

Basics (1-5) → Memory (6-10) → Optimization (11-13) →
Projects (5,6,10) → Advanced (14-16) → Production (17-20)

For Systems Programmers

Basics (1-5) → Atomic Ops (9) → Warp Primitives (11) →
Projects (4,7) → CUDA Graphs (15) → Production (18-20)

🎯 What You'll Master

20 Comprehensive Lessons: From basics to production
10 Major Projects: Real-world applications
500+ Exercises: Hands-on practice
10-1000x Performance: Speedups over CPU baselines, depending on workload and hardware
Modern GPU Features: Including Tensor Cores, CUDA Graphs
Production Skills: Deployment, profiling, error handling

💻 Compilation Commands

# Basic compilation
nvcc -O3 lesson.cu -o lesson

# With debugging
nvcc -g -G lesson.cu -o lesson

# With libraries
nvcc -O3 lesson.cu -lcublas -lcusparse -o lesson

# For Tensor Cores (Volta+)
nvcc -O3 -arch=sm_70 lesson.cu -o lesson
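
To try the commands above before opening the lessons, a small stand-in source file can help. The sketch below is not a file from this repository: the name `lesson.cu`, the `CUDA_CHECK` macro, and the `run_vector_add` helper are assumptions made for illustration. It adds two vectors on the GPU and counts mismatches against a CPU result.

```cuda
// lesson.cu: hypothetical stand-in, not one of the repository's lessons.
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Illustrative helper: abort with a readable message if a runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s at %s:%d\n",           \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Each thread adds one pair of elements; the guard handles a partial last block.
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Runs the kernel on n elements and returns how many results differ from the CPU sum.
int run_vector_add(int n) {
    std::vector<float> a(n), b(n), c(n);
    for (int i = 0; i < n; ++i) {
        a[i] = static_cast<float>(i);
        b[i] = 2.0f * static_cast<float>(i);
    }

    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    CUDA_CHECK(cudaMalloc(&d_a, bytes));
    CUDA_CHECK(cudaMalloc(&d_b, bytes));
    CUDA_CHECK(cudaMalloc(&d_c, bytes));
    CUDA_CHECK(cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
    vector_add<<<blocks, threads>>>(d_a, d_b, d_c, n);
    CUDA_CHECK(cudaGetLastError());            // reports launch configuration errors

    // This blocking copy also waits for the kernel to finish.
    CUDA_CHECK(cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_a));
    CUDA_CHECK(cudaFree(d_b));
    CUDA_CHECK(cudaFree(d_c));

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (c[i] != a[i] + b[i]) ++mismatches;
    }
    return mismatches;
}

int main() {
    int mismatches = run_vector_add(1 << 20);
    std::printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Saved as lesson.cu, this should build with the basic command and with the debugging command; the -lcublas and -lcusparse flags are only needed once a lesson actually calls those libraries.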

📊 Performance You'll Achieve

Vector Operations: 10-100x speedup
Matrix Operations: 50-500x with optimization
Deep Learning: Understanding how PyTorch/TensorFlow work
Text Processing: Millions of strings/second
Scientific Computing: Real-time simulations

📖 Key Resources

Progress Tracker - Track your learning journey
Complete Overview - Full curriculum details
How to Start - Detailed starting guide
Welcome Guide - Setup confirmation & overview
Cheatsheet - CUDA quick reference
Free Resources - Additional learning materials
Week 1 Guide - Fundamentals overview
Week 2 Guide - Memory optimization
Week 3 Guide - Advanced optimization
Week 4 Guide - Advanced features
Week 5 Guide - Projects overview
Week 6 Guide - Production guide

🏆 Your Achievement

By completing this curriculum, you'll:

✅ Master GPU architecture and programming
✅ Build production-ready GPU applications
✅ Understand how AI frameworks work internally
✅ Join the elite group of GPU programmers

💡 Pro Tips

Start Simple: Don't skip the basics
Measure Everything: Always profile before optimizing
Think Parallel: Redesign algorithms for GPU
Hardware First: Understand the hardware limits
Practice Daily: Consistency is key
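
As one way to act on "Measure Everything", kernel time can be taken with CUDA events before reaching for a full profiler. This is a minimal sketch under assumptions: the `scale` kernel and the `time_scale_ms` helper are made up for illustration and do not come from the lessons, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel to measure: multiply every element by a factor.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Returns the elapsed GPU time in milliseconds for one launch on n elements.
// Error checking is omitted for brevity; real code should check every call.
float time_scale_ms(int n) {
    float* d_data = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    cudaMalloc(&d_data, bytes);
    cudaMemset(d_data, 0, bytes);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Warm-up launch so one-time startup costs are not included in the timing.
    scale<<<blocks, threads>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);                 // enqueue a timestamp before the kernel
    scale<<<blocks, threads>>>(d_data, 2.0f, n);
    cudaEventRecord(stop);                  // enqueue a timestamp after the kernel
    cudaEventSynchronize(stop);             // wait until the stop event has completed

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop); // elapsed time between the two events

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return ms;
}

int main() {
    std::printf("scale kernel: %.3f ms\n", time_scale_ms(1 << 24));
    return 0;
}
```

A single timed launch is noisy; averaging several runs, or using a profiler such as Nsight Systems, gives a more reliable picture.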

Ready to accelerate your code by 10-1000x? Start with Lesson 1! 🚀


h9-tec/cuda-mastery-guide
