Abnormal model fine-tuning behavior #362

@darkprices

2024-10-10,14:19:42 | INFO | Rank 0 | train LMDB file contains 4000 images and 4000 pairs.
2024-10-10,14:19:42 | INFO | Rank 0 | val LMDB file contains 500 images and 500 pairs.
2024-10-10,14:19:42 | INFO | Rank 0 | Params:
2024-10-10,14:19:42 | INFO | Rank 0 | accum_freq: 1
2024-10-10,14:19:42 | INFO | Rank 0 | aggregate: True
2024-10-10,14:19:42 | INFO | Rank 0 | batch_size: 128
2024-10-10,14:19:42 | INFO | Rank 0 | bert_weight_path: None
2024-10-10,14:19:42 | INFO | Rank 0 | beta1: 0.9
2024-10-10,14:19:42 | INFO | Rank 0 | beta2: 0.98
2024-10-10,14:19:42 | INFO | Rank 0 | checkpoint_path: /workspace/code/experiments/demo_data_finetune_vit-b-16_roberta-base_bs128_8gpu/checkpoints
2024-10-10,14:19:42 | INFO | Rank 0 | clip_weight_path: None
2024-10-10,14:19:42 | INFO | Rank 0 | context_length: 52
2024-10-10,14:19:42 | INFO | Rank 0 | debug: False
2024-10-10,14:19:42 | INFO | Rank 0 | device: cuda:0
2024-10-10,14:19:42 | INFO | Rank 0 | distillation: False
2024-10-10,14:19:42 | INFO | Rank 0 | eps: 1e-06
2024-10-10,14:19:42 | INFO | Rank 0 | freeze_vision: False
2024-10-10,14:19:42 | INFO | Rank 0 | gather_with_grad: False
2024-10-10,14:19:42 | INFO | Rank 0 | grad_checkpointing: False
2024-10-10,14:19:42 | INFO | Rank 0 | kd_loss_weight: 0.5
2024-10-10,14:19:42 | INFO | Rank 0 | local_device_rank: 0
2024-10-10,14:19:42 | INFO | Rank 0 | log_interval: 1
2024-10-10,14:19:42 | INFO | Rank 0 | log_level: 20
2024-10-10,14:19:42 | INFO | Rank 0 | log_path: /workspace/code/experiments/demo_data_finetune_vit-b-16_roberta-base_bs128_8gpu/out_2024-10-10-06-19-39.log
2024-10-10,14:19:42 | INFO | Rank 0 | logs: /workspace/code/experiments/
2024-10-10,14:19:42 | INFO | Rank 0 | lr: 0.0005
2024-10-10,14:19:42 | INFO | Rank 0 | mask_ratio: 0
2024-10-10,14:19:42 | INFO | Rank 0 | max_epochs: 100
2024-10-10,14:19:42 | INFO | Rank 0 | max_steps: 3200
2024-10-10,14:19:42 | INFO | Rank 0 | name: demo_data_finetune_vit-b-16_roberta-base_bs128_8gpu
2024-10-10,14:19:42 | INFO | Rank 0 | num_workers: 4
2024-10-10,14:19:42 | INFO | Rank 0 | precision: amp
2024-10-10,14:19:42 | INFO | Rank 0 | rank: 0
2024-10-10,14:19:42 | INFO | Rank 0 | report_training_batch_acc: True
2024-10-10,14:19:42 | INFO | Rank 0 | reset_data_offset: True
2024-10-10,14:19:42 | INFO | Rank 0 | reset_optimizer: True
2024-10-10,14:19:42 | INFO | Rank 0 | resume: /code/xx/Chinese-CLIP/pretrained_weights/clip_cn_vit-b-16.pt
2024-10-10,14:19:42 | INFO | Rank 0 | save_epoch_frequency: 1
2024-10-10,14:19:42 | INFO | Rank 0 | save_step_frequency: 999999
2024-10-10,14:19:42 | INFO | Rank 0 | seed: 123
2024-10-10,14:19:42 | INFO | Rank 0 | skip_aggregate: False
2024-10-10,14:19:42 | INFO | Rank 0 | skip_scheduler: False
2024-10-10,14:19:42 | INFO | Rank 0 | teacher_model_name: None
2024-10-10,14:19:42 | INFO | Rank 0 | text_model: RoBERTa-wwm-ext-base-chinese
2024-10-10,14:19:42 | INFO | Rank 0 | train_data: /workspace/code/demo_data/lmdb/train
2024-10-10,14:19:42 | INFO | Rank 0 | use_augment: True
2024-10-10,14:19:42 | INFO | Rank 0 | use_bn_sync: False
2024-10-10,14:19:42 | INFO | Rank 0 | use_flash_attention: False
2024-10-10,14:19:42 | INFO | Rank 0 | val_data: /workspace/code/demo_data/lmdb/valid
2024-10-10,14:19:42 | INFO | Rank 0 | valid_batch_size: 128
2024-10-10,14:19:42 | INFO | Rank 0 | valid_epoch_interval: 1
2024-10-10,14:19:42 | INFO | Rank 0 | valid_num_workers: 1
2024-10-10,14:19:42 | INFO | Rank 0 | valid_step_interval: 150
2024-10-10,14:19:42 | INFO | Rank 0 | vision_model: ViT-B-16
2024-10-10,14:19:42 | INFO | Rank 0 | warmup: 100
2024-10-10,14:19:42 | INFO | Rank 0 | wd: 0.001
2024-10-10,14:19:42 | INFO | Rank 0 | world_size: 1
2024-10-10,14:19:42 | INFO | Rank 0 | Use GPU: 0 for training
2024-10-10,14:19:42 | INFO | Rank 0 | => begin to load checkpoint '/code/xx/Chinese-CLIP/pretrained_weights/clip_cn_vit-b-16.pt'
2024-10-10,14:20:15 | INFO | Rank 0 | => loaded checkpoint '/code/xx/Chinese-CLIP/pretrained_weights/clip_cn_vit-b-16.pt' (epoch 15 @ 0 steps)
2024-10-10,14:20:23 | INFO | Rank 0 | Global Steps: 1/3200 | Train Epoch: 1 [128/4096 (3%)] | Loss: 5.140202 | Image2Text Acc: 5.47 | Text2Image Acc: 3.91 | Data Time: 6.993s | Batch Time: 8.429s | LR: 0.000005 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:24 | INFO | Rank 0 | Global Steps: 2/3200 | Train Epoch: 1 [256/4096 (6%)] | Loss: 5.020943 | Image2Text Acc: 6.25 | Text2Image Acc: 5.47 | Data Time: 0.503s | Batch Time: 0.813s | LR: 0.000010 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:25 | INFO | Rank 0 | Global Steps: 3/3200 | Train Epoch: 1 [384/4096 (9%)] | Loss: 4.862580 | Image2Text Acc: 6.25 | Text2Image Acc: 3.91 | Data Time: 0.025s | Batch Time: 0.385s | LR: 0.000015 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:25 | INFO | Rank 0 | Global Steps: 4/3200 | Train Epoch: 1 [512/4096 (12%)] | Loss: 5.083204 | Image2Text Acc: 4.69 | Text2Image Acc: 3.12 | Data Time: 0.036s | Batch Time: 0.330s | LR: 0.000020 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:29 | INFO | Rank 0 | Global Steps: 5/3200 | Train Epoch: 1 [640/4096 (16%)] | Loss: 4.380543 | Image2Text Acc: 4.69 | Text2Image Acc: 9.38 | Data Time: 4.093s | Batch Time: 4.395s | LR: 0.000025 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:31 | INFO | Rank 0 | Global Steps: 6/3200 | Train Epoch: 1 [768/4096 (19%)] | Loss: 4.561520 | Image2Text Acc: 3.91 | Text2Image Acc: 6.25 | Data Time: 1.478s | Batch Time: 1.772s | LR: 0.000030 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:31 | INFO | Rank 0 | Global Steps: 7/3200 | Train Epoch: 1 [896/4096 (22%)] | Loss: 4.347610 | Image2Text Acc: 3.91 | Text2Image Acc: 7.81 | Data Time: 0.036s | Batch Time: 0.330s | LR: 0.000035 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:32 | INFO | Rank 0 | Global Steps: 8/3200 | Train Epoch: 1 [1024/4096 (25%)] | Loss: 4.256195 | Image2Text Acc: 7.03 | Text2Image Acc: 3.91 | Data Time: 0.040s | Batch Time: 0.334s | LR: 0.000040 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:37 | INFO | Rank 0 | Global Steps: 9/3200 | Train Epoch: 1 [1152/4096 (28%)] | Loss: 4.305431 | Image2Text Acc: 2.34 | Text2Image Acc: 2.34 | Data Time: 4.561s | Batch Time: 4.863s | LR: 0.000045 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:39 | INFO | Rank 0 | Global Steps: 10/3200 | Train Epoch: 1 [1280/4096 (31%)] | Loss: 4.286503 | Image2Text Acc: 3.91 | Text2Image Acc: 7.81 | Data Time: 1.666s | Batch Time: 1.964s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:39 | INFO | Rank 0 | Global Steps: 11/3200 | Train Epoch: 1 [1408/4096 (34%)] | Loss: 4.256180 | Image2Text Acc: 5.47 | Text2Image Acc: 3.12 | Data Time: 0.041s | Batch Time: 0.338s | LR: 0.000055 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:39 | INFO | Rank 0 | Global Steps: 12/3200 | Train Epoch: 1 [1536/4096 (38%)] | Loss: 4.268936 | Image2Text Acc: 5.47 | Text2Image Acc: 6.25 | Data Time: 0.036s | Batch Time: 0.330s | LR: 0.000060 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:44 | INFO | Rank 0 | Global Steps: 13/3200 | Train Epoch: 1 [1664/4096 (41%)] | Loss: 4.263233 | Image2Text Acc: 6.25 | Text2Image Acc: 6.25 | Data Time: 3.959s | Batch Time: 4.263s | LR: 0.000065 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:46 | INFO | Rank 0 | Global Steps: 14/3200 | Train Epoch: 1 [1792/4096 (44%)] | Loss: 4.249680 | Image2Text Acc: 7.03 | Text2Image Acc: 4.69 | Data Time: 2.528s | Batch Time: 2.829s | LR: 0.000070 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:47 | INFO | Rank 0 | Global Steps: 15/3200 | Train Epoch: 1 [1920/4096 (47%)] | Loss: 4.208305 | Image2Text Acc: 6.25 | Text2Image Acc: 5.47 | Data Time: 0.040s | Batch Time: 0.339s | LR: 0.000075 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:47 | INFO | Rank 0 | Global Steps: 16/3200 | Train Epoch: 1 [2048/4096 (50%)] | Loss: 4.351048 | Image2Text Acc: 5.47 | Text2Image Acc: 6.25 | Data Time: 0.038s | Batch Time: 0.333s | LR: 0.000080 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:50 | INFO | Rank 0 | Global Steps: 17/3200 | Train Epoch: 1 [2176/4096 (53%)] | Loss: 4.289299 | Image2Text Acc: 3.91 | Text2Image Acc: 4.69 | Data Time: 2.945s | Batch Time: 3.242s | LR: 0.000085 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:53 | INFO | Rank 0 | Global Steps: 18/3200 | Train Epoch: 1 [2304/4096 (56%)] | Loss: 4.244534 | Image2Text Acc: 3.91 | Text2Image Acc: 5.47 | Data Time: 2.589s | Batch Time: 2.889s | LR: 0.000090 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:54 | INFO | Rank 0 | Global Steps: 19/3200 | Train Epoch: 1 [2432/4096 (59%)] | Loss: 4.298996 | Image2Text Acc: 5.47 | Text2Image Acc: 4.69 | Data Time: 0.036s | Batch Time: 0.334s | LR: 0.000095 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:54 | INFO | Rank 0 | Global Steps: 20/3200 | Train Epoch: 1 [2560/4096 (62%)] | Loss: 4.175068 | Image2Text Acc: 7.03 | Text2Image Acc: 2.34 | Data Time: 0.038s | Batch Time: 0.332s | LR: 0.000100 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:20:57 | INFO | Rank 0 | Global Steps: 21/3200 | Train Epoch: 1 [2688/4096 (66%)] | Loss: 4.202049 | Image2Text Acc: 6.25 | Text2Image Acc: 5.47 | Data Time: 2.378s | Batch Time: 2.680s | LR: 0.000105 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:21:00 | INFO | Rank 0 | Global Steps: 22/3200 | Train Epoch: 1 [2816/4096 (69%)] | Loss: 4.255169 | Image2Text Acc: 5.47 | Text2Image Acc: 6.25 | Data Time: 3.118s | Batch Time: 3.419s | LR: 0.000110 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:21:00 | INFO | Rank 0 | Global Steps: 23/3200 | Train Epoch: 1 [2944/4096 (72%)] | Loss: 4.340736 | Image2Text Acc: 6.25 | Text2Image Acc: 6.25 | Data Time: 0.044s | Batch Time: 0.343s | LR: 0.000115 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:21:01 | INFO | Rank 0 | Global Steps: 24/3200 | Train Epoch: 1 [3072/4096 (75%)] | Loss: 4.433716 | Image2Text Acc: 1.56 | Text2Image Acc: 6.25 | Data Time: 0.041s | Batch Time: 0.340s | LR: 0.000120 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:21:03 | INFO | Rank 0 | Global Steps: 25/3200 | Train Epoch: 1 [3200/4096 (78%)] | Loss: 4.339813 | Image2Text Acc: 3.91 | Text2Image Acc: 6.25 | Data Time: 1.788s | Batch Time: 2.085s | LR: 0.000125 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:21:06 | INFO | Rank 0 | Global Steps: 26/3200 | Train Epoch: 1 [3328/4096 (81%)] | Loss: 4.351143 | Image2Text Acc: 2.34 | Text2Image Acc: 3.12 | Data Time: 2.790s | Batch Time: 3.092s | LR: 0.000130 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:21:06 | INFO | Rank 0 | Global Steps: 27/3200 | Train Epoch: 1 [3456/4096 (84%)] | Loss: 4.369926 | Image2Text Acc: 2.34 | Text2Image Acc: 4.69 | Data Time: 0.043s | Batch Time: 0.338s | LR: 0.000135 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:21:07 | INFO | Rank 0 | Global Steps: 28/3200 | Train Epoch: 1 [3584/4096 (88%)] | Loss: 4.199516 | Image2Text Acc: 3.12 | Text2Image Acc: 3.12 | Data Time: 0.037s | Batch Time: 0.335s | LR: 0.000140 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:21:10 | INFO | Rank 0 | Global Steps: 29/3200 | Train Epoch: 1 [3712/4096 (91%)] | Loss: 4.327763 | Image2Text Acc: 3.91 | Text2Image Acc: 5.47 | Data Time: 3.056s | Batch Time: 3.354s | LR: 0.000145 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:21:12 | INFO | Rank 0 | Global Steps: 30/3200 | Train Epoch: 1 [3840/4096 (94%)] | Loss: 4.432281 | Image2Text Acc: 2.34 | Text2Image Acc: 4.69 | Data Time: 1.928s | Batch Time: 2.226s | LR: 0.000150 | logit_scale: 4.605 | Global Batch Size: 128
2024-10-10,14:21:12 | INFO | Rank 0 | Global Steps: 31/3200 | Train Epoch: 1 [3968/4096 (97%)] | Loss: 4.358601 | Image2Text Acc: 6.25 | Text2Image Acc: 5.47 | Data Time: 0.037s | Batch Time: 0.332s | LR: 0.000155 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:21:13 | INFO | Rank 0 | Global Steps: 32/3200 | Train Epoch: 1 [4096/4096 (100%)] | Loss: 4.322407 | Image2Text Acc: 4.69 | Text2Image Acc: 5.47 | Data Time: 0.037s | Batch Time: 0.332s | LR: 0.000160 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:21:13 | INFO | Rank 0 | Begin to eval on validation set (epoch 1 @ 32 steps)...
2024-10-10,14:21:34 | INFO | Rank 0 | Validation Result (epoch 1 @ 32 steps) | Valid Loss: 4.217743 | Image2Text Acc: 3.91 | Text2Image Acc: 5.86 | logit_scale: 4.604 | Valid Batch Size: 128
2024-10-10,14:21:34 | INFO | Rank 0 | train LMDB file contains 4000 images and 4000 pairs.
2024-10-10,14:21:34 | INFO | Rank 0 | val LMDB file contains 500 images and 500 pairs.
2024-10-10,14:21:51 | INFO | Rank 0 | Saved checkpoint /workspace/code/experiments/demo_data_finetune_vit-b-16_roberta-base_bs128_8gpu/checkpoints/epoch1.pt (epoch 1 @ 32 steps) (writing took 17.00214672088623 seconds)
2024-10-10,14:22:08 | INFO | Rank 0 | Saved checkpoint /workspace/code/experiments/demo_data_finetune_vit-b-16_roberta-base_bs128_8gpu/checkpoints/epoch_latest.pt (epoch 1 @ 32 steps) (writing took 16.851421356201172 seconds)
2024-10-10,14:22:16 | INFO | Rank 0 | Global Steps: 33/3200 | Train Epoch: 2 [128/4096 (3%)] | Loss: 4.297310 | Image2Text Acc: 3.91 | Text2Image Acc: 5.47 | Data Time: 6.647s | Batch Time: 6.951s | LR: 0.000165 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:16 | INFO | Rank 0 | Global Steps: 34/3200 | Train Epoch: 2 [256/4096 (6%)] | Loss: 4.284641 | Image2Text Acc: 4.69 | Text2Image Acc: 1.56 | Data Time: 0.042s | Batch Time: 0.339s | LR: 0.000170 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:17 | INFO | Rank 0 | Global Steps: 35/3200 | Train Epoch: 2 [384/4096 (9%)] | Loss: 4.414612 | Image2Text Acc: 3.12 | Text2Image Acc: 4.69 | Data Time: 0.604s | Batch Time: 0.902s | LR: 0.000175 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:17 | INFO | Rank 0 | Global Steps: 36/3200 | Train Epoch: 2 [512/4096 (12%)] | Loss: 4.766368 | Image2Text Acc: 3.12 | Text2Image Acc: 2.34 | Data Time: 0.036s | Batch Time: 0.331s | LR: 0.000180 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:22 | INFO | Rank 0 | Global Steps: 37/3200 | Train Epoch: 2 [640/4096 (16%)] | Loss: 4.304634 | Image2Text Acc: 3.12 | Text2Image Acc: 2.34 | Data Time: 4.513s | Batch Time: 4.818s | LR: 0.000185 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:22 | INFO | Rank 0 | Global Steps: 38/3200 | Train Epoch: 2 [768/4096 (19%)] | Loss: 4.475136 | Image2Text Acc: 3.12 | Text2Image Acc: 3.91 | Data Time: 0.038s | Batch Time: 0.334s | LR: 0.000190 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:24 | INFO | Rank 0 | Global Steps: 39/3200 | Train Epoch: 2 [896/4096 (22%)] | Loss: 4.387520 | Image2Text Acc: 3.91 | Text2Image Acc: 3.91 | Data Time: 1.873s | Batch Time: 2.171s | LR: 0.000195 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:25 | INFO | Rank 0 | Global Steps: 40/3200 | Train Epoch: 2 [1024/4096 (25%)] | Loss: 4.462852 | Image2Text Acc: 3.12 | Text2Image Acc: 3.91 | Data Time: 0.040s | Batch Time: 0.335s | LR: 0.000200 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:29 | INFO | Rank 0 | Global Steps: 41/3200 | Train Epoch: 2 [1152/4096 (28%)] | Loss: 4.601013 | Image2Text Acc: 2.34 | Text2Image Acc: 1.56 | Data Time: 3.525s | Batch Time: 3.828s | LR: 0.000205 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:29 | INFO | Rank 0 | Global Steps: 42/3200 | Train Epoch: 2 [1280/4096 (31%)] | Loss: 4.392643 | Image2Text Acc: 3.12 | Text2Image Acc: 3.91 | Data Time: 0.040s | Batch Time: 0.338s | LR: 0.000210 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:30 | INFO | Rank 0 | Global Steps: 43/3200 | Train Epoch: 2 [1408/4096 (34%)] | Loss: 4.625900 | Image2Text Acc: 0.78 | Text2Image Acc: 3.91 | Data Time: 1.126s | Batch Time: 1.420s | LR: 0.000215 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:31 | INFO | Rank 0 | Global Steps: 44/3200 | Train Epoch: 2 [1536/4096 (38%)] | Loss: 4.499672 | Image2Text Acc: 0.00 | Text2Image Acc: 2.34 | Data Time: 0.036s | Batch Time: 0.331s | LR: 0.000220 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:35 | INFO | Rank 0 | Global Steps: 45/3200 | Train Epoch: 2 [1664/4096 (41%)] | Loss: 4.556229 | Image2Text Acc: 3.12 | Text2Image Acc: 4.69 | Data Time: 3.978s | Batch Time: 4.282s | LR: 0.000225 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:35 | INFO | Rank 0 | Global Steps: 46/3200 | Train Epoch: 2 [1792/4096 (44%)] | Loss: 4.524231 | Image2Text Acc: 3.91 | Text2Image Acc: 3.12 | Data Time: 0.035s | Batch Time: 0.335s | LR: 0.000230 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:37 | INFO | Rank 0 | Global Steps: 47/3200 | Train Epoch: 2 [1920/4096 (47%)] | Loss: 4.618797 | Image2Text Acc: 0.78 | Text2Image Acc: 3.12 | Data Time: 1.077s | Batch Time: 1.372s | LR: 0.000235 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:37 | INFO | Rank 0 | Global Steps: 48/3200 | Train Epoch: 2 [2048/4096 (50%)] | Loss: 4.657127 | Image2Text Acc: 1.56 | Text2Image Acc: 2.34 | Data Time: 0.036s | Batch Time: 0.335s | LR: 0.000240 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:41 | INFO | Rank 0 | Global Steps: 49/3200 | Train Epoch: 2 [2176/4096 (53%)] | Loss: 4.595833 | Image2Text Acc: 2.34 | Text2Image Acc: 1.56 | Data Time: 3.846s | Batch Time: 4.150s | LR: 0.000245 | logit_scale: 4.604 | Global Batch Size: 128
2024-10-10,14:22:42 | INFO | Rank 0 | Global Steps: 50/3200 | Train Epoch: 2 [2304/4096 (56%)] | Loss: 4.590302 | Image2Text Acc: 2.34 | Text2Image Acc: 2.34 | Data Time: 0.038s | Batch Time: 0.337s | LR: 0.000250 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:22:43 | INFO | Rank 0 | Global Steps: 51/3200 | Train Epoch: 2 [2432/4096 (59%)] | Loss: 4.545975 | Image2Text Acc: 3.91 | Text2Image Acc: 1.56 | Data Time: 1.587s | Batch Time: 1.887s | LR: 0.000255 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:22:44 | INFO | Rank 0 | Global Steps: 52/3200 | Train Epoch: 2 [2560/4096 (62%)] | Loss: 4.431618 | Image2Text Acc: 3.12 | Text2Image Acc: 3.91 | Data Time: 0.040s | Batch Time: 0.340s | LR: 0.000260 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:22:47 | INFO | Rank 0 | Global Steps: 53/3200 | Train Epoch: 2 [2688/4096 (66%)] | Loss: 4.410482 | Image2Text Acc: 3.91 | Text2Image Acc: 3.12 | Data Time: 3.403s | Batch Time: 3.703s | LR: 0.000265 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:22:48 | INFO | Rank 0 | Global Steps: 54/3200 | Train Epoch: 2 [2816/4096 (69%)] | Loss: 4.380985 | Image2Text Acc: 2.34 | Text2Image Acc: 5.47 | Data Time: 0.042s | Batch Time: 0.340s | LR: 0.000270 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:22:50 | INFO | Rank 0 | Global Steps: 55/3200 | Train Epoch: 2 [2944/4096 (72%)] | Loss: 4.415821 | Image2Text Acc: 3.91 | Text2Image Acc: 3.12 | Data Time: 1.674s | Batch Time: 1.969s | LR: 0.000275 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:22:50 | INFO | Rank 0 | Global Steps: 56/3200 | Train Epoch: 2 [3072/4096 (75%)] | Loss: 4.612484 | Image2Text Acc: 2.34 | Text2Image Acc: 3.12 | Data Time: 0.038s | Batch Time: 0.334s | LR: 0.000280 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:22:54 | INFO | Rank 0 | Global Steps: 57/3200 | Train Epoch: 2 [3200/4096 (78%)] | Loss: 5.101124 | Image2Text Acc: 0.78 | Text2Image Acc: 3.91 | Data Time: 3.150s | Batch Time: 3.456s | LR: 0.000285 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:22:54 | INFO | Rank 0 | Global Steps: 58/3200 | Train Epoch: 2 [3328/4096 (81%)] | Loss: 5.060188 | Image2Text Acc: 1.56 | Text2Image Acc: 0.78 | Data Time: 0.039s | Batch Time: 0.335s | LR: 0.000290 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:22:56 | INFO | Rank 0 | Global Steps: 59/3200 | Train Epoch: 2 [3456/4096 (84%)] | Loss: 4.785370 | Image2Text Acc: 1.56 | Text2Image Acc: 3.91 | Data Time: 1.727s | Batch Time: 2.025s | LR: 0.000295 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:22:56 | INFO | Rank 0 | Global Steps: 60/3200 | Train Epoch: 2 [3584/4096 (88%)] | Loss: 4.811279 | Image2Text Acc: 1.56 | Text2Image Acc: 2.34 | Data Time: 0.036s | Batch Time: 0.337s | LR: 0.000300 | logit_scale: 4.603 | Global Batch Size: 128
2024-10-10,14:23:00 | INFO | Rank 0 | Global Steps: 61/3200 | Train Epoch: 2 [3712/4096 (91%)] | Loss: 4.825073 | Image2Text Acc: 1.56 | Text2Image Acc: 0.78 | Data Time: 3.007s | Batch Time: 3.313s | LR: 0.000305 | logit_scale: 4.602 | Global Batch Size: 128
2024-10-10,14:23:00 | INFO | Rank 0 | Global Steps: 62/3200 | Train Epoch: 2 [3840/4096 (94%)] | Loss: 4.803360 | Image2Text Acc: 0.00 | Text2Image Acc: 0.78 | Data Time: 0.039s | Batch Time: 0.334s | LR: 0.000310 | logit_scale: 4.602 | Global Batch Size: 128
2024-10-10,14:23:01 | INFO | Rank 0 | Global Steps: 63/3200 | Train Epoch: 2 [3968/4096 (97%)] | Loss: 4.794815 | Image2Text Acc: 2.34 | Text2Image Acc: 1.56 | Data Time: 1.087s | Batch Time: 1.382s | LR: 0.000315 | logit_scale: 4.602 | Global Batch Size: 128
2024-10-10,14:23:02 | INFO | Rank 0 | Global Steps: 64/3200 | Train Epoch: 2 [4096/4096 (100%)] | Loss: 4.777557 | Image2Text Acc: 0.00 | Text2Image Acc: 2.34 | Data Time: 0.036s | Batch Time: 0.332s | LR: 0.000320 | logit_scale: 4.602 | Global Batch Size: 128
2024-10-10,14:23:02 | INFO | Rank 0 | Begin to eval on validation set (epoch 2 @ 64 steps)...
2024-10-10,14:23:18 | INFO | Rank 0 | Validation Result (epoch 2 @ 64 steps) | Valid Loss: 4.740685 | Image2Text Acc: 1.37 | Text2Image Acc: 1.37 | logit_scale: 4.602 | Valid Batch Size: 128
2024-10-10,14:23:19 | INFO | Rank 0 | train LMDB file contains 4000 images and 4000 pairs.
2024-10-10,14:23:19 | INFO | Rank 0 | val LMDB file contains 500 images and 500 pairs.
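
The values in the log above can be sanity-checked with a short sketch. It assumes, based only on the LR column of the log, that the learning rate ramps linearly from 0 to `lr` (0.0005) over `warmup` (100) steps; the function name `warmup_lr` is hypothetical and is not taken from the Chinese-CLIP code. It also computes the chance-level baseline for a 128-way in-batch contrastive task (loss ln 128, accuracy 1/128), which is a general property of cross-entropy over uniform predictions and not something the log states.

```python
import math


def warmup_lr(step: int, base_lr: float = 5e-4, warmup: int = 100) -> float:
    """Linear warmup (assumed from the logged LR values).

    Only meaningful for 1 <= step <= warmup; the schedule after warmup
    is not visible in the log and is not modelled here.
    """
    return base_lr * step / warmup


# Global batch size reported in the log.
batch_size = 128

# Cross-entropy of a uniform guess over `batch_size` candidates.
chance_loss = math.log(batch_size)
# Top-1 accuracy (in percent) of a uniform guess.
chance_acc = 100.0 / batch_size

for step in (1, 32, 64):
    print(f"step {step}: lr = {warmup_lr(step):.6f}")
# step 1: lr = 0.000005
# step 32: lr = 0.000160
# step 64: lr = 0.000320

print(f"chance-level loss: {chance_loss:.3f}")  # 4.852
print(f"chance-level acc:  {chance_acc:.2f}%")  # 0.78%
```

Under these assumptions, the training loss (about 4.78) and accuracies (0.00 to 2.34) logged at the end of epoch 2 sit close to the chance-level values, at a point where the learning rate is still rising during warmup.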
