Description
System Info
- Platform: Windows-10
- transformers version: 4.43.4
- Python version: 3.10.11
- PyTorch version (GPU?): 2.3.1+cu121
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
When the Trainer does not set a warmup, the lr_scheduler is set to linear, and an interrupted run is resumed to complete all steps, the learning rates differ from those of a run that trains all steps from the beginning. Here are the specific learning rates:
Learning rates for training from the beginning for each step:
- Step 1: "learning_rate": 1e-05,
- Step 2: "learning_rate": 1e-05,
- Step 3: "learning_rate": 9e-06,
- Step 4: "learning_rate": 8.000000000000001e-06,
- Step 5: "learning_rate": 7e-06,
- Step 6: "learning_rate": 6e-06,
- Step 7: "learning_rate": 5e-06,
- Step 8: "learning_rate": 4.000000000000001e-06,
- Step 9: "learning_rate": 3e-06,
- Step 10: "learning_rate": 2.0000.
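For reference, the decay factors of a no-warmup linear schedule can be reproduced with a short sketch. This mirrors the multiplier used by `get_linear_schedule_with_warmup` under the assumptions `num_warmup_steps=0`, `num_training_steps=10`, and a base learning rate of 1e-05 (the helper name below is mine, not the library's):

```python
# Decay factor of a linear schedule after warmup
# (assumptions: num_warmup_steps=0, num_training_steps=10, base lr 1e-05).
def linear_decay(current_step, num_training_steps=10, num_warmup_steps=0):
    return max(
        0.0,
        (num_training_steps - current_step)
        / max(1, num_training_steps - num_warmup_steps),
    )

base_lr = 1e-05
lrs = [base_lr * linear_decay(t) for t in range(10)]
print(lrs)  # 1e-05 down to 1e-06 in steps of 1e-06
```

Note that the logged values above start with 1e-05 twice, so the logged sequence appears offset by one step from these raw factors.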
If training is continued from a checkpoint at step 5, the learning rates for each step are:
- Step 6: "learning_rate": 7e-06,
- Step 7: "learning_rate": 7e-06,
- Step 8: "learning_rate": 6e-06,
- Step 9: "learning_rate": 5e-06,
- Step 10: "learning_rate": 4.000000000000001e-06.
Why are the learning rates for step 6 and step 7 different when training continues from a checkpoint compared to training from the start?
Reproduction steps:
- Train from the beginning for 10 steps, save a checkpoint for each step, and record the learning rate in each step.
- Delete the checkpoints for steps 6 through 10 in the folder, so that the step-5 checkpoint is the latest one.
- Then call `trainer.train(resume_from_checkpoint=True)` to continue training from step 5 and, after training completes, record the learning rate in each new checkpoint.
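The scheduler half of the resume path can be simulated in plain PyTorch, outside the Trainer. This is only a sketch: the 10 total steps, base lr of 1e-05, and linear-to-zero lambda are taken from the report above, and `make_sched` is a hypothetical helper, not a Trainer API:

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

TOTAL_STEPS = 10
BASE_LR = 1e-05

def make_sched():
    # Hypothetical helper: one dummy parameter, SGD, linear decay to zero.
    param = torch.nn.Parameter(torch.zeros(1))
    opt = torch.optim.SGD([param], lr=BASE_LR)
    sched = LambdaLR(opt, lambda t: max(0.0, (TOTAL_STEPS - t) / TOTAL_STEPS))
    return opt, sched

# Uninterrupted run: record the lr in effect for each of the 10 steps.
opt, sched = make_sched()
full = []
for _ in range(TOTAL_STEPS):
    full.append(opt.param_groups[0]["lr"])
    opt.step()
    sched.step()

# Simulated interruption: advance 5 steps, then save the scheduler state.
opt_a, sched_a = make_sched()
for _ in range(5):
    opt_a.step()
    sched_a.step()
state = sched_a.state_dict()

# "Resume" with a freshly built optimizer/scheduler and the saved state.
opt_b, sched_b = make_sched()
sched_b.load_state_dict(state)
resumed = []
for _ in range(5):
    resumed.append(opt_b.param_groups[0]["lr"])
    opt_b.step()
    sched_b.step()

# From the second resumed step onward the two runs line up; the first
# resumed value can differ because load_state_dict restores last_epoch
# but does not re-apply the lr to the optimizer until the next step().
print(full[5:], resumed)
```

Comparing `resumed` against `full[5:]` is one way to narrow down whether the offset comes from the restored scheduler state itself or from how the Trainer logs the learning rate around a resume.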
Expected behavior
Please explain why the learning rate does not continue the same schedule as when training from the beginning, which would give, for example:
Step 6: "learning_rate": 6e-06,
Step 7: "learning_rate": 5e-06.