MachineLearningMastery.com
https://machinelearningmastery.com/
Making developers awesome at machine learning

Gradient Descent: The Engine of Machine Learning Optimization
https://machinelearningmastery.com/gradient-descentthe-engine-of-machine-learning-optimization/
Fri, 02 Jan 2026 11:00:17 +0000 | Matthew Mayo
Editor's note: This article is a part of our series on visualizing the foundations of machine learning.

Train Your Large Model on Multiple GPUs with Tensor Parallelism
https://machinelearningmastery.com/train-your-large-model-on-multiple-gpus-with-tensor-parallelism/
Wed, 31 Dec 2025 21:22:39 +0000 | Adrian Tam
This article is divided into five parts; they are: • An Example of Tensor Parallelism • Setting Up Tensor Parallelism • Preparing Model for Tensor Parallelism • Train a Model with Tensor Parallelism • Combining Tensor Parallelism with FSDP. Tensor parallelism originated from the Megatron-LM paper.
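
The excerpt names the idea without showing the mechanics, so here is a minimal sketch of Megatron-style tensor parallelism, assuming PyTorch's built-in `torch.distributed.tensor.parallel` API, a toy two-layer MLP, and a two-GPU mesh (none of which are taken from the article itself):

```python
# Run with: torchrun --nproc_per_node=2 tp_sketch.py
import os
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel, RowwiseParallel, parallelize_module,
)

class MLP(nn.Module):
    def __init__(self, dim=1024):
        super().__init__()
        self.up = nn.Linear(dim, 4 * dim)    # to be split column-wise
        self.down = nn.Linear(4 * dim, dim)  # to be split row-wise

    def forward(self, x):
        return self.down(torch.relu(self.up(x)))

torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
mesh = init_device_mesh("cuda", (2,))  # one mesh dimension spanning 2 GPUs
model = parallelize_module(
    MLP().cuda(), mesh,
    {"up": ColwiseParallel(), "down": RowwiseParallel()},
)
out = model(torch.randn(8, 1024, device="cuda"))  # communication happens inside
```

Column-splitting the first linear layer and row-splitting the second keeps the intermediate activation sharded, so only one all-reduce is needed per forward pass, which is the core Megatron-LM observation.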

Train Your Large Model on Multiple GPUs with Fully Sharded Data Parallelism
https://machinelearningmastery.com/train-your-large-model-on-multiple-gpus-with-fully-sharded-data-parallelism/
Tue, 30 Dec 2025 22:12:18 +0000 | Adrian Tam
This article is divided into five parts; they are: • Introduction to Fully Sharded Data Parallel • Preparing Model for FSDP Training • Training Loop with FSDP • Fine-Tuning FSDP Behavior • Checkpointing FSDP Models. Sharding is a term originally used in database management systems, where it refers to dividing a database into smaller units, called shards, to improve performance.
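
Carrying the database analogy over: FSDP divides each parameter tensor into shards, one per rank, and gathers them only while a layer is computing. The following is a minimal sketch, assuming PyTorch's `FullyShardedDataParallel` wrapper and a toy model (both my choices, not code from the article):

```python
# Run with: torchrun --nproc_per_node=<num_gpus> fsdp_sketch.py
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = nn.Sequential(
    nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)
).cuda()
model = FSDP(model)  # parameters, gradients, and optimizer state are sharded

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(8, 1024, device="cuda")
loss = model(x).square().mean()
loss.backward()   # gradients are reduce-scattered back to their owning shards
optimizer.step()  # each rank updates only its own shard
```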

Beyond Short-term Memory: The 3 Types of Long-term Memory AI Agents Need
https://machinelearningmastery.com/beyond-short-term-memory-the-3-types-of-long-term-memory-ai-agents-need/
Tue, 30 Dec 2025 11:00:59 +0000 | Vinod Chugani
If you've built chatbots or worked with language models, you're already familiar with how AI systems handle memory within a single conversation.

Train Your Large Model on Multiple GPUs with Pipeline Parallelism
https://machinelearningmastery.com/train-your-large-model-on-multiple-gpus-with-pipeline-parallelism/
Mon, 29 Dec 2025 20:56:53 +0000 | Adrian Tam
This article is divided into six parts; they are: • Pipeline Parallelism Overview • Model Preparation for Pipeline Parallelism • Stage and Pipeline Schedule • Training Loop • Distributed Checkpointing • Limitations of Pipeline Parallelism. Pipeline parallelism means creating the model as a pipeline of stages.
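
To make "a pipeline of stages" concrete, here is a rough two-stage sketch; the use of `torch.distributed.pipelining`, the stage split, and all sizes are assumptions for illustration, not the article's code:

```python
# Run with: torchrun --nproc_per_node=2 pp_sketch.py
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.pipelining import PipelineStage, ScheduleGPipe

dist.init_process_group("nccl")
rank = dist.get_rank()
device = torch.device(f"cuda:{rank}")
torch.cuda.set_device(device)

# Each rank builds only its own stage of the model.
if rank == 0:
    stage_mod = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to(device)
else:
    stage_mod = nn.Linear(4096, 1024).to(device)

stage = PipelineStage(stage_mod, stage_index=rank, num_stages=2, device=device)
schedule = ScheduleGPipe(stage, n_microbatches=4)  # split batch into 4 micro-batches

x = torch.randn(32, 1024, device=device)
if rank == 0:
    schedule.step(x)       # first stage feeds the input
else:
    out = schedule.step()  # last stage emits the output
```

Micro-batching is what keeps both GPUs busy: while stage 1 processes micro-batch i, stage 0 is already working on micro-batch i+1.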

5 Python Libraries for Advanced Time Series Forecasting
https://machinelearningmastery.com/5-python-libraries-for-advanced-time-series-forecasting/
Mon, 29 Dec 2025 11:00:23 +0000 | Iván Palomares Carrascosa
Predicting the future has always been the holy grail of analytics.

Training a Model on Multiple GPUs with Data Parallelism
https://machinelearningmastery.com/training-a-model-on-multiple-gpus-with-data-parallelism/
Fri, 26 Dec 2025 06:44:15 +0000 | Adrian Tam
This article is divided into two parts; they are: • Data Parallelism • Distributed Data Parallelism. If you have multiple GPUs, you can combine them to operate as a single GPU with greater memory capacity.
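
As a concrete illustration of the "combine them to operate as a single GPU" idea, here is a minimal `DistributedDataParallel` sketch; the model and data are placeholders, not from the article:

```python
# Run with: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = nn.Linear(128, 10).cuda()
model = DDP(model, device_ids=[local_rank])  # each rank holds a full replica

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(64, 128, device="cuda")  # in practice: this rank's data shard
y = torch.randn(64, 10, device="cuda")
loss = nn.functional.mse_loss(model(x), y)
loss.backward()   # gradients are averaged across ranks via all-reduce
optimizer.step()  # replicas stay in sync by applying identical updates
```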

Train a Model Faster with torch.compile and Gradient Accumulation
https://machinelearningmastery.com/train-a-model-faster-with-torch-compile-and-gradient-accumulation/
Thu, 25 Dec 2025 16:44:48 +0000 | Adrian Tam
This article is divided into two parts; they are: • Using `torch.compile` • …
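
A minimal sketch of how the two techniques in the title combine; the model, data, and accumulation factor of 4 are illustrative assumptions, not the article's code:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).cuda()
model = torch.compile(model)  # JIT-compile the forward (and backward) graph
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

accum_steps = 4  # effective batch size = 4 x micro-batch size
for step in range(100):
    x = torch.randn(16, 128, device="cuda")
    y = torch.randn(16, 10, device="cuda")
    # Scale the loss so the accumulated gradients average rather than sum
    loss = nn.functional.mse_loss(model(x), y) / accum_steps
    loss.backward()  # gradients accumulate in .grad across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```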

Training a Model with Limited Memory using Mixed Precision and Gradient Checkpointing
https://machinelearningmastery.com/training-a-model-with-limited-memory-using-mixed-precision-and-gradient-checkpointing/
Wed, 24 Dec 2025 17:43:03 +0000 | Adrian Tam
This article is divided into three parts; they are: • Floating-point Numbers • Automatic Mixed Precision Training • Gradient Checkpointing. Let's get started! The default data type in PyTorch is the IEEE 754 32-bit floating-point format, also known as single precision.
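
Since each single-precision value costs 4 bytes, the two levers the article covers are computing in half precision where it is safe and recomputing activations instead of storing them. A minimal sketch combining both (the toy model and sizes are my assumptions):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()).cuda()
head = nn.Linear(1024, 10).cuda()
params = list(block.parameters()) + list(head.parameters())
optimizer = torch.optim.AdamW(params, lr=1e-4)
scaler = torch.amp.GradScaler("cuda")  # rescales the loss to avoid fp16 underflow

x = torch.randn(32, 1024, device="cuda")
y = torch.randn(32, 10, device="cuda")

with torch.amp.autocast("cuda", dtype=torch.float16):
    # checkpoint() frees block's activations and recomputes them in backward
    h = checkpoint(block, x, use_reentrant=False)
    loss = nn.functional.mse_loss(head(h), y)

scaler.scale(loss).backward()
scaler.step(optimizer)   # skips the step if inf/nan gradients are detected
scaler.update()
```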

Practical Agentic Coding with Google Jules
https://machinelearningmastery.com/?p=22758
Wed, 24 Dec 2025 15:13:48 +0000 | Matthew Mayo
If you have an interest in agentic coding, there's a pretty good chance you've heard of …