Why GPU Matters: Accelerate Your Data Science Research in Python

Discover how to supercharge your data science research in Python with zero CUDA experience. Learn how cuDF and cuML bring pandas- and scikit-learn-style acceleration to your GPU, letting you process millions of rows and train models in seconds.

Python Conference (PyCon) Indonesia
13-14 December, 2025
@Trilogi University, Jakarta

https://pycon.id

Kuncahyo Setyo Nugroho

December 11, 2025

Transcript

  1. Why GPU Matters: Accelerate Your Data Science Research in Python

    Kuncahyo Setyo Nugroho, NVIDIA University Ambassador & Certified Instructor, AI Researcher at Bina Nusantara University. Sunday, 14 December 2025, PyCon Indonesia @Trilogi University
  2. Hi, nice to meet you!

    Kuncahyo Setyo Nugroho, AI Researcher (Associate), NVIDIA AI Research & Development Center (AIRDC), Bioinformatics & Data Science Research Center (BDSRC), Bina Nusantara (BINUS) University @Anggrek Campus, Jakarta 11530 (Role & Organization).
    NVIDIA DLI Workshop Certifications (Specialization: Deep Learning Applications): Fundamentals of Deep Learning; Fundamentals of Accelerated Data Science; Data Parallelism: How to Train Deep Learning Models on Multiple GPUs; Applications of AI for Anomaly Detection; Applications of AI for Predictive Maintenance; Building Transformer-Based NLP Applications; Building Conversational AI Applications; Building LLM Applications With Prompt Engineering.
    ksnugroho.id | ksnugroho26{at}gmail.com | kuncahyo.nugroho{at}binus.edu | linkedin.com/in/ksnugroho
  3. Industry Needs High Performance Data Science

    Exploding data size and complexity demand faster, scalable compute to keep pace with real-time analytics and industry-scale decision pipelines.
    Financial Services: billions of transactions, >1,000 dimensions for real-time fraud analysis.
    Retail: 100M+ items, 100M+ users, large-scale recommendation and personalization.
    Manufacturing: 10–50 GB/day of sensor data for real-time predictive maintenance.
    Healthcare: 1,000–10,000 dimensions, 10M–100M rows of single-cell data for drug discovery.
    Telecom: 10M–1B telemetry events/day for network & security anomaly detection.
    And many more …
  4. Challenges in Data Science Today

    Where performance bottlenecks slow down the entire data science workflow.
    Data sources (Imaging, Genomic, Medical Records, Tabular, Sensors) feed a Data Lake, which flows through Manage Data → Data Preparation → Training & Evaluation (Model Training*, Evaluate) → Deployment (Inference, Visualization).
    Data Preparation (data manipulation, feature engineering, etc.) and Training & Evaluation (cross-validation, hyperparameter tuning, etc.) are the two pain points for researchers.
    *Accelerating "model training" provides value, but it doesn't solve the whole problem.
  5. Data Processing Evolution

    Eliminating CPU–GPU transfers unlocks real performance gains.
    Hadoop Processing, Reading from Disk: HDFS Read → Query → HDFS Write → HDFS Read → ETL → HDFS Write → HDFS Read → ML Train.
    Spark In-Memory Processing: HDFS Read → Query → ETL → ML Train (25-100x improvement, less code, language flexible, primarily in-memory).
    Traditional GPU Processing: HDFS Read → GPU Read → Query → CPU Write → GPU Read → ETL → CPU Write → GPU Read → ML Train (5-10x improvement, more code, language rigid, substantially on GPU).
  6. Data Movement and Transformation

    Fragmented formats and CPU–GPU transfers create unnecessary overhead. Each application (App A, App B) must read and load data, then repeatedly copy and convert it between CPU and GPU memory (format mismatch, host–device transfer, etc.).
  7. Data Movement and Transformation

    What if the entire pipeline stayed on the GPU? App A and App B share GPU data after the initial read and load, with minimal or zero CPU copies.
  8. Learning from Apache Arrow

    A single columnar memory format removes unnecessary data movement. Before Arrow, each system had its own internal memory format, and 70-80% of computation was wasted on serialization and deserialization. With Arrow, all systems use the same memory format, so there is no overhead for cross-system communication. Learn more: https://arrow.apache.org/overview
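
    To make the idea concrete, a minimal sketch using pyarrow: a pandas DataFrame converted to an Arrow table can be handed to any Arrow-aware system without re-serialization (the small example data here is made up).

        import pandas as pd
        import pyarrow as pa

        # A small pandas DataFrame (illustrative data only).
        pdf = pd.DataFrame({"id": [1, 2, 3], "value": [0.1, 0.2, 0.3]})

        # Convert to Arrow's columnar format; other Arrow-aware systems
        # (cuDF, Spark, DuckDB, ...) can consume this table without
        # serialization/deserialization overhead.
        table = pa.Table.from_pandas(pdf)
        print(table.schema)

        # cuDF, for example, can ingest the same table directly on the GPU:
        # import cudf
        # gdf = cudf.DataFrame.from_arrow(table)
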
  9. Data Processing Evolution

    Eliminating CPU–GPU transfers unlocks real performance gains.
    Hadoop Processing, Reading from Disk: HDFS Read → Query → HDFS Write → HDFS Read → ETL → HDFS Write → HDFS Read → ML Train.
    Spark In-Memory Processing: HDFS Read → Query → ETL → ML Train (25-100x improvement, less code, language flexible, primarily in-memory).
    Traditional GPU Processing: HDFS Read → GPU Read → Query → CPU Write → GPU Read → ETL → CPU Write → GPU Read → ML Train (5-10x improvement, more code, language rigid, substantially on GPU).
    RAPIDS: Arrow Read → Query → ETL → ML Train (50-100x improvement, same code, language flexible, primarily on GPU).
  10. Data Science Workflow with RAPIDS

    Open source, end-to-end GPU-accelerated workflow built on CUDA. All stages share GPU memory: cuDF for data preparation, cuML for machine learning, cuGraph for graph analytics, PyTorch/TF as DL frameworks, cuXfilter for visualization, and Dask for scaling out. Data Preparation → Model Training & Analytics → Visualization.
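
    As an illustration of this workflow (not code from the slides), a minimal sketch where the data stays in GPU memory from preparation through training; the CSV file and column names are hypothetical.

        import cudf
        from cuml.linear_model import LogisticRegression

        # Data preparation with cuDF: the DataFrame lives in GPU memory.
        gdf = cudf.read_csv("transactions.csv")          # hypothetical file
        gdf = gdf.dropna()
        X = gdf[["amount", "n_items", "hour"]]           # hypothetical features
        y = gdf["is_fraud"]                              # hypothetical label

        # Model training with cuML: no copy back to the CPU in between.
        model = LogisticRegression().fit(X, y)
        print(model.predict(X.head(5)))
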
  11. RAPIDS cuDF

    A GPU DataFrame library in Python with a pandas-like API: pandas (CPU) vs. cuDF (GPU). Learn more: https://docs.rapids.ai/api/cudf/stable/user_guide
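
    For instance, a minimal side-by-side sketch of the pandas-like API; the CSV file and column names are placeholders.

        import pandas as pd
        import cudf

        # pandas on the CPU.
        pdf = pd.read_csv("sales.csv")                        # hypothetical file
        cpu_result = pdf.groupby("store")["revenue"].mean()

        # cuDF on the GPU: same method names, same semantics.
        gdf = cudf.read_csv("sales.csv")
        gpu_result = gdf.groupby("store")["revenue"].mean()

        # Convert back to pandas when a CPU object is needed downstream.
        print(gpu_result.to_pandas().head())
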
  12. cuDF's pandas Accelerator Mode

    The zero-code-change GPU accelerator for pandas, built on cuDF. Why should I use cudf.pandas?
    ▪ Zero-code-change acceleration. Just %load_ext cudf.pandas in Jupyter, or $ python -m cudf.pandas <script.py> in the CLI.
    ▪ Compatible with most third-party libraries that use pandas.
    ▪ Run the same code on CPU or GPU, no changes needed, not even the import statements.
    ▪ Automatic fallback to pandas on the CPU for unsupported functions or methods → ensures correctness without breaking your code.
    Learn more: https://docs.rapids.ai/api/cudf/stable/cudf_pandas
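
    A minimal usage sketch, run in a Jupyter cell; the data file and column names are placeholders.

        # Load the accelerator before pandas is imported; from here on, pandas
        # calls are executed by cuDF on the GPU where supported.
        %load_ext cudf.pandas

        import pandas as pd

        df = pd.read_csv("transactions.csv")                  # hypothetical file
        top = df.groupby("customer_id")["amount"].sum().nlargest(10)
        print(top)

        # From the command line, the equivalent is:
        #   python -m cudf.pandas script.py
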
  13. Accelerated pandas (Example)

    The zero-code-change GPU accelerator for pandas, built on cuDF: the pandas API is now GPU accelerated. In the example, the initial operations run entirely on the GPU because cuDF supports them; indexer_between_time isn't supported on the GPU, so that call falls back to pandas on the CPU, and its output is automatically copied back to the GPU; the following step, along with the groupby and aggregation, runs entirely on the GPU; and the results interoperate seamlessly with third-party libraries such as Seaborn and Matplotlib.
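
    The original notebook is not reproduced here, but a sketch of the kind of pipeline those callouts describe, run in a Jupyter cell, might look like this (file and column names are assumptions):

        %load_ext cudf.pandas

        import pandas as pd

        # Runs on the GPU: cuDF supports read_csv, datetime parsing, and filtering.
        df = pd.read_csv("trips.csv", parse_dates=["pickup_time"])
        df = df[df["fare"] > 0].set_index("pickup_time")

        # indexer_between_time is not GPU-supported, so cudf.pandas falls back to
        # CPU pandas here; the result is copied back to the GPU automatically.
        morning_idx = df.index.indexer_between_time("08:00", "10:00")
        morning = df.iloc[morning_idx]

        # Back on the GPU: groupby and aggregation are fully supported by cuDF.
        summary = morning.groupby("vendor_id")["fare"].agg(["mean", "count"])

        # Interoperates with third-party plotting libraries such as Matplotlib.
        summary["mean"].plot(kind="bar")
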
  14. RAPIDS cuML

    Accelerated machine learning with a scikit-learn-style API: scikit-learn (CPU) vs. cuML (GPU). Learn more: https://docs.rapids.ai/api/cuml/stable
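
    A minimal sketch with synthetic data, calling cuML's own estimators directly:

        from cuml.datasets import make_classification
        from cuml.model_selection import train_test_split
        from cuml.ensemble import RandomForestClassifier
        from cuml.metrics import accuracy_score

        # Synthetic data, generated directly on the GPU.
        X, y = make_classification(n_samples=100_000, n_features=20, random_state=0)
        y = y.astype("int32")  # the random forest expects integer class labels
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

        # Same estimator / fit / predict pattern as scikit-learn, trained on the GPU.
        clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
        print(accuracy_score(y_test, clf.predict(X_test)))
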
  15. cuML Accelerator Mode

    GPU-accelerated machine learning with zero code changes. Why should I use cuml.accel?
    ▪ Zero-code-change acceleration. Just %load_ext cuml.accel in Jupyter, or $ python -m cuml.accel <script.py> in the CLI.
    ▪ Automatic fallback to scikit-learn on the CPU for unsupported estimators or methods → ensures correctness without breaking your code.
    ▪ Supported libraries include scikit-learn, UMAP (umap-learn), and HDBSCAN (hdbscan).
    Learn more: https://docs.rapids.ai/api/cuml/stable/cuml-accel
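
    A minimal sketch, run in a Jupyter cell with synthetic data; the scikit-learn code itself is unchanged.

        %load_ext cuml.accel

        from sklearn.datasets import make_blobs
        from sklearn.cluster import KMeans

        # Plain scikit-learn code: with cuml.accel loaded it runs on the GPU,
        # and unsupported estimators fall back to CPU scikit-learn.
        X, _ = make_blobs(n_samples=500_000, centers=8, random_state=0)
        km = KMeans(n_clusters=8, random_state=0).fit(X)
        print(km.cluster_centers_[:2])

        # From the command line, the equivalent is:
        #   python -m cuml.accel script.py
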
  16. Accelerated XGBoost

    Fast, scalable gradient boosting on GPUs with minimal code changes.
    ▪ Up to 30× faster training on large datasets
    ▪ One-line GPU enablement: device="cuda"
    ▪ Optimized for large tabular data and GBDT workloads
    Learn more: https://xgboost.readthedocs.io/en/stable/python/rmm-examples/rmm_singlegpu.html
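
    A minimal sketch of the one-line change, using synthetic data (the device parameter requires XGBoost 2.0 or newer):

        import numpy as np
        import xgboost as xgb

        # Synthetic tabular data.
        rng = np.random.default_rng(0)
        X = rng.random((1_000_000, 50), dtype=np.float32)
        y = (X[:, 0] + X[:, 1] > 1.0).astype(np.int32)

        # device="cuda" is the only change needed to train on the GPU.
        clf = xgb.XGBClassifier(n_estimators=200, tree_method="hist", device="cuda")
        clf.fit(X, y)
        print(clf.predict(X[:5]))
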
  17. cuDF-cuML Tips & Tricks

    How to choose between direct GPU APIs and accelerator modes?
    Use cuDF/cuML (direct import) when:
    ▪ You want all operations to run on the GPU (CPU fallback is too slow).
    ▪ You need GPU-specific functionality not available in pandas or scikit-learn.
    ▪ You want maximum performance with no CPU processing.
    Use cudf.pandas/cuml.accel when:
    ▪ You have existing pandas / scikit-learn code and want to run it on GPU with zero code changes.
    ▪ You need your code to run on both GPU-enabled and CPU-only systems.
    ▪ You want automatic fallback to pandas / scikit-learn for unsupported operations.
    Tips & Tricks for best performance: prefer GPU-supported operations to avoid fallback; keep only necessary data in GPU memory; if GPU memory runs out → automatic CPU fallback → slowdown.
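
    For contrast with the accelerator-mode examples above, a direct-import sketch where everything is expected to stay on the GPU (file and column names are placeholders):

        import cudf
        from cuml.cluster import KMeans   # cuML's own estimator, not scikit-learn's

        gdf = cudf.read_csv("features.csv")              # hypothetical file
        gdf = gdf.fillna(0)

        # Both the DataFrame and the model live in GPU memory; in this style
        # there is no automatic CPU fallback.
        km = KMeans(n_clusters=5, random_state=0).fit(gdf)
        gdf["cluster"] = km.labels_
        print(gdf["cluster"].value_counts())
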
  18. Proven Faster for Data Science

    Realize efficiency and scalability with multi-GPU for massive datasets. End-to-end model training pipeline (cuIO/cuDF load and data preparation, data conversion, XGBoost machine learning). Benchmark: 200 GB CSV dataset; data preparation includes joins and variable transformations. CPU cluster configuration: CPU nodes (61 GB of memory, 8 vCPUs, 64-bit platform, Apache Spark). DGX cluster configuration: 16 NVIDIA DGX A100 systems. Time in seconds, shorter is better.
  19. Featured Components of the RAPIDS Ecosystem

    A unified ecosystem for GPU-accelerated data science workflows (CPU library → GPU library with RAPIDS):
    ▪ Data handling: pandas → cuDF, Dask-cuDF
    ▪ Machine learning: scikit-learn → cuML
    ▪ Graph analytics: NetworkX → cuGraph
    ▪ Visualization: Bokeh, Datashader → cuXfilter
    ▪ Image processing: scikit-image → cuCIM
    ▪ Spatial analytics / GIS: GeoPandas → cuSpatial
    And many more … Learn more: https://rapids.ai/ecosystem
  20. Illustration was generated with the help of Gemini (Nano Banana Pro).
  21. Deploying RAPIDS

    Deployment documentation to get you up and running with RAPIDS anywhere. Learn more: https://docs.rapids.ai/deployment/stable
  22. NVIDIA Deep Learning Institute

    Get the skills you need to fast-track your success. Hands-on, self-paced and instructor-led workshops in deep learning and accelerated computing!
    Download the course catalog: https://www.nvidia.com/dli
    Take self-paced courses online: https://www.nvidia.com/en-us/training/self-paced-courses
    Request onsite instructor-led workshops: https://developer.nvidia.com/dli/requests/public-lead
    Contact me if you're interested in arranging an instructor-led workshop for your organization.
  23. Discussion: any questions?

    Unlock the speed of GPUs with code you already know! Get started with RAPIDS: https://github.com/rapidsai/notebooks and https://github.com/rapidsai-community/tutorial