MachineLearningMastery.com: Making developers awesome at machine learning
https://machinelearningmastery.com/

Gradient Descent: The Engine of Machine Learning Optimization
Matthew Mayo, Fri, 02 Jan 2026
Editor's note: This article is a part of our series on visualizing the foundations of machine learning.
https://machinelearningmastery.com/gradient-descentthe-engine-of-machine-learning-optimization/

Train Your Large Model on Multiple GPUs with Tensor Parallelism
Adrian Tam, Wed, 31 Dec 2025
This article is divided into five parts; they are:
• An Example of Tensor Parallelism
• Setting Up Tensor Parallelism
• Preparing Model for Tensor Parallelism
• Train a Model with Tensor Parallelism
• Combining Tensor Parallelism with FSDP
Tensor parallelism originated from the Megatron-LM paper.
https://machinelearningmastery.com/train-your-large-model-on-multiple-gpus-with-tensor-parallelism/

Train Your Large Model on Multiple GPUs with Fully Sharded Data Parallelism
Adrian Tam, Tue, 30 Dec 2025
This article is divided into five parts; they are:
• Introduction to Fully Sharded Data Parallel
• Preparing Model for FSDP Training
• Training Loop with FSDP
• Fine-Tuning FSDP Behavior
• Checkpointing FSDP Models
Sharding is a term originally used in database management systems, where it refers to dividing a database into smaller units, called shards, to improve performance.
https://machinelearningmastery.com/train-your-large-model-on-multiple-gpus-with-fully-sharded-data-parallelism/

Beyond Short-term Memory: The 3 Types of Long-term Memory AI Agents Need
If you've built chatbots or worked with language models, you're already familiar with how AI systems handle memory within a single conversation.
https://machinelearningmastery.com/beyond-short-term-memory-the-3-types-of-long-term-memory-ai-agents-need/
Vinod Chugani, Tue, 30 Dec 2025

Train Your Large Model on Multiple GPUs with Pipeline Parallelism
Adrian Tam, Mon, 29 Dec 2025
This article is divided into six parts; they are:
• Pipeline Parallelism Overview
• Model Preparation for Pipeline Parallelism
• Stage and Pipeline Schedule
• Training Loop
• Distributed Checkpointing
• Limitations of Pipeline Parallelism
Pipeline parallelism means creating the model as a pipeline of stages.
https://machinelearningmastery.com/train-your-large-model-on-multiple-gpus-with-pipeline-parallelism/

5 Python Libraries for Advanced Time Series Forecasting
Iván Palomares Carrascosa, Mon, 29 Dec 2025
Predicting the future has always been the holy grail of analytics.
https://machinelearningmastery.com/5-python-libraries-for-advanced-time-series-forecasting/

Training a Model on Multiple GPUs with Data Parallelism
Adrian Tam, Fri, 26 Dec 2025
This article is divided into two parts; they are:
• Data Parallelism
• Distributed Data Parallelism
If you have multiple GPUs, you can combine them to operate as a single GPU with greater memory capacity.
https://machinelearningmastery.com/training-a-model-on-multiple-gpus-with-data-parallelism/

Train a Model Faster with torch.compile and Gradient Accumulation
Adrian Tam, Thu, 25 Dec 2025
This article is divided into two parts; they are:
• Using `torch.` …
https://machinelearningmastery.com/train-a-model-faster-with-torch-compile-and-gradient-accumulation/

Training a Model with Limited Memory using Mixed Precision and Gradient Checkpointing
Adrian Tam, Wed, 24 Dec 2025
This article is divided into three parts; they are:
• Floating-point Numbers
• Automatic Mixed Precision Training
• Gradient Checkpointing
Let's get started! The default data type in PyTorch is the IEEE 754 32-bit floating-point format, also known as single precision.
https://machinelearningmastery.com/training-a-model-with-limited-memory-using-mixed-precision-and-gradient-checkpointing/

Practical Agentic Coding with Google Jules
Matthew Mayo, Wed, 24 Dec 2025
If you have an interest in agentic coding, there's a pretty good chance you've heard of …
https://machinelearningmastery.com/?p=22758
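The Megatron-style split behind the tensor-parallelism article above, dividing the output dimension of a linear layer across devices, can be sketched with plain Python lists standing in for per-device weight shards. The names `matvec` and `split_rows` are illustrative, not part of any library:

```python
# Sketch of tensor parallelism: each "device" holds a slice of the
# weight matrix along the output dimension and computes a slice of
# the layer output; concatenation recovers the full result.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in W]

def split_rows(W, parts):
    """Split W into `parts` row blocks (assumes even division)."""
    n = len(W) // parts
    return [W[i * n:(i + 1) * n] for i in range(parts)]

W = [[1, 2], [3, 4], [5, 6], [7, 8]]   # 4 output features, 2 inputs
x = [1, 1]

# Full computation on a single "device"
full = matvec(W, x)

# Tensor-parallel: each shard computes its slice, then concatenate
shards = split_rows(W, 2)
parallel = [y for shard in shards for y in matvec(shard, x)]

assert parallel == full
```

In a real setup each shard would live on a different GPU, and an all-gather collective would perform the concatenation.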
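Carrying the database analogy from the FSDP article over to parameters: FSDP stores only one shard of each parameter per rank and gathers the full tensor just in time for computation. A minimal sketch, with illustrative names (`shard`, `all_gather`) rather than the PyTorch FSDP API:

```python
# Sketch of parameter sharding: a flat parameter vector is divided
# into one shard per rank; an "all-gather" reconstructs the full
# parameter before a layer's forward pass.

def shard(params, world_size):
    """Split a flat parameter list into world_size nearly-even shards."""
    k, r = divmod(len(params), world_size)
    shards, start = [], 0
    for rank in range(world_size):
        size = k + (1 if rank < r else 0)
        shards.append(params[start:start + size])
        start += size
    return shards

def all_gather(shards):
    """Reassemble the full parameter list from every rank's shard."""
    return [p for s in shards for p in s]

params = [0.1, 0.2, 0.3, 0.4, 0.5]
shards = shard(params, 2)          # rank 0 holds 3 values, rank 1 holds 2
assert all_gather(shards) == params
```

The memory win is that each rank stores only `1/world_size` of the parameters between layer computations.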
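The "pipeline of stages" idea can be sketched as a GPipe-style schedule: splitting the batch into micro-batches lets the stages work concurrently instead of idling while one batch traverses the chain. `gpipe_schedule` is a hypothetical helper, not a PyTorch API:

```python
# Sketch of a GPipe-style pipeline schedule: at clock tick t, stage s
# works on micro-batch t - s, so different stages overlap on
# different micro-batches.

def gpipe_schedule(num_stages, num_microbatches):
    """Return, per tick, the (stage, micro_batch) pairs that run."""
    ticks = []
    for t in range(num_stages + num_microbatches - 1):
        ticks.append([(s, t - s) for s in range(num_stages)
                      if 0 <= t - s < num_microbatches])
    return ticks

schedule = gpipe_schedule(3, 4)
# 3 stages x 4 micro-batches finish in 6 ticks instead of 12 sequential
assert len(schedule) == 6
assert schedule[0] == [(0, 0)]                   # only stage 0 busy
assert schedule[2] == [(0, 2), (1, 1), (2, 0)]   # all stages busy
```

The near-idle ticks at the start and end are the "pipeline bubble", one of the limitations the article's last part refers to.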
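The core mechanic of data parallelism can be sketched without any GPU: the batch is split across replicas, each computes a local gradient, and averaging them (what an all-reduce does) matches the single-device big-batch gradient. The toy model `y = w * x` and the function names are illustrative:

```python
# Sketch of data parallelism: equal batch shards per replica, local
# gradients averaged by an "all-reduce", matching the full-batch
# gradient computed on one device.

def grad(w, batch):
    """d/dw of mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 0.5

# Single-device gradient over the whole batch
g_full = grad(w, batch)

# Two "replicas", each holding half the batch; all-reduce = average
local = [grad(w, batch[:2]), grad(w, batch[2:])]
g_reduced = sum(local) / len(local)

assert abs(g_full - g_reduced) < 1e-12
```

The averaging is exact here because the shards are equal-sized; DistributedDataParallel in PyTorch relies on the same property.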
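The gradient-accumulation half of the torch.compile article can be sketched in plain Python: gradients from several micro-batches are accumulated, and the optimizer steps only once per `accum_steps` micro-batches, mimicking a larger batch on limited memory. The toy model `y = w * x` and all names are illustrative:

```python
# Sketch of gradient accumulation: scale each micro-batch gradient by
# 1/accum_steps, sum them, and update the weight only after the last
# micro-batch of the group.

def grad(w, batch):
    """d/dw of mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # y = 2x
micro_batches = [data[:2], data[2:]]
accum_steps, lr = len(micro_batches), 0.05

w, g_accum = 0.0, 0.0
for epoch in range(60):
    for i, mb in enumerate(micro_batches, start=1):
        g_accum += grad(w, mb) / accum_steps   # scale like a bigger batch
        if i % accum_steps == 0:               # step once per group
            w -= lr * g_accum
            g_accum = 0.0

assert abs(w - 2.0) < 1e-9   # converges to the true slope
```

In PyTorch the same pattern appears as calling `backward()` on a scaled loss each micro-batch and `optimizer.step()` only every `accum_steps` iterations.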
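The single- versus half-precision trade-off that motivates mixed-precision training can be observed with the standard library's `struct` module, which can round a Python float (64-bit) through the IEEE 754 32-bit (`'f'`) and 16-bit (`'e'`) formats; `round_trip` is an illustrative helper:

```python
import struct

def round_trip(fmt, x):
    """Round a 64-bit Python float through an IEEE 754 struct format."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

pi = 3.141592653589793
pi32 = round_trip('<f', pi)   # single precision: ~7 decimal digits
pi16 = round_trip('<e', pi)   # half precision: ~3 decimal digits

assert pi32 != pi and abs(pi32 - pi) < 1e-6
assert abs(pi16 - pi) < 1e-2 and abs(pi16 - pi) > abs(pi32 - pi)
```

The much larger half-precision error is why mixed-precision training typically keeps a float32 master copy of the weights while doing most arithmetic in 16-bit formats.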