
Trainer resume from checkpoint: the learning rate is not the same as when retraining; the learning rate is discontinuous #34053

@LBJ6666


System Info

  • Platform: Windows-10
  • transformers version: 4.43.4
  • Python version: 3.10.11
  • PyTorch version (GPU?): 2.3.1+cu121

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

When the Trainer is configured with no warmup and the lr_scheduler set to linear, and training is resumed from an interruption to complete all steps, the learning rates differ from those of a run that trains all steps from the beginning.
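For reference, here is a minimal sketch of the setup described above (the model, dataset, and output_dir are placeholders I've assumed, not taken from the original run):

```python
from transformers import Trainer, TrainingArguments

# Sketch of the reported configuration: linear decay, no warmup,
# 10 total steps, with a checkpoint and an LR log entry every step.
# `model` and `train_dataset` are assumed to be defined elsewhere.
args = TrainingArguments(
    output_dir="out",               # hypothetical path
    max_steps=10,
    learning_rate=1e-5,
    lr_scheduler_type="linear",
    warmup_steps=0,                 # no warmup
    save_steps=1,                   # save a checkpoint at every step
    logging_steps=1,                # log the learning rate at every step
    per_device_train_batch_size=1,
)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```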

Learning rates for training from the beginning for each step:

  • Step 1: "learning_rate": 1e-05,
  • Step 2: "learning_rate": 1e-05,
  • Step 3: "learning_rate": 9e-06,
  • Step 4: "learning_rate": 8.000000000000001e-06,
  • Step 5: "learning_rate": 7e-06,
  • Step 6: "learning_rate": 6e-06,
  • Step 7: "learning_rate": 5e-06,
  • Step 8: "learning_rate": 4.000000000000001e-06,
  • Step 9: "learning_rate": 3e-06,
  • Step 10: "learning_rate": 2e-06.

If training is continued from a checkpoint at step 5, the learning rates for each step are:

  • Step 6: "learning_rate": 7e-06,
  • Step 7: "learning_rate": 7e-06,
  • Step 8: "learning_rate": 6e-06,
  • Step 9: "learning_rate": 5e-06,
  • Step 10: "learning_rate": 4.000000000000001e-06.

Why do the learning rates at steps 6 and 7 differ when training is resumed from a checkpoint, compared to training from the start?

Reproduction steps:

  1. Train from the beginning for 10 steps, save a checkpoint at every step, and record the learning rate at each step.
  2. Delete the checkpoints for steps 6 through 10 from the output folder.
  3. Run trainer.train(resume_from_checkpoint=True) to continue training from step 5, and after training completes, record the learning rate at each step from the new checkpoints (see the sketch after this list).
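The resume step, assuming the same `trainer` as in the sketch above:

```python
# With checkpoint-6 through checkpoint-10 deleted from output_dir,
# resume_from_checkpoint=True picks up the latest remaining checkpoint
# (checkpoint-5) and continues training through step 10.
trainer.train(resume_from_checkpoint=True)
```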

Expected behavior

Please explain why the learning rate after resuming is not continuous with the schedule from training from the beginning, which would give, for example:
Step 6: "learning_rate": 6e-06,
Step 7: "learning_rate": 5e-06,
and so on.
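For comparison, this sketch reproduces the linear schedule in isolation with get_linear_schedule_with_warmup (the dummy parameter exists only so an optimizer can be constructed). The Trainer's logged values appear shifted by one step relative to the raw schedule, but either way a correctly resumed run should continue the same strictly decreasing sequence:

```python
import torch
from transformers import get_linear_schedule_with_warmup

# Dummy parameter so an optimizer can be constructed.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.AdamW([param], lr=1e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=10
)

for step in range(1, 11):
    lr = scheduler.get_last_lr()[0]  # LR in effect for this step
    print(f"step {step}: lr={lr}")
    optimizer.step()
    scheduler.step()
# Prints 1e-05, 9e-06, 8e-06, ..., 1e-06:
# the learning rate decreases by 1e-06 at every step, with no repeats.
```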
