How to train VQA on my custom data? #73

xiaoqiang-lu · 2022-04-18T18:06:34Z

Hello! I am trying to fine-tune OFA-large on VQA with a custom dataset, following the fine-tuning instructions in the repo. I have checked my .tsv and .pkl files several times, and they match the format of the sample you provided. But after running "bash train_vqa_distributed.sh", the terminal just prints:

total_num_updates 40000
warmup_updates 1000
lr 5e-5
patch_image_size 480

GPU usage rises to a certain level, then suddenly drops to zero, and the program exits. I am training on a single server with 2 GPUs. Looking forward to your reply, and thanks for sharing your work!

yangapku · 2022-04-21T15:01:38Z

Hi, could you please provide the exact script you ran on your machine and the type of GPU cards you are using? I will check it in my environment.

yangapku · 2022-04-21T15:19:07Z

Moreover, for fine-tuning on custom VQA-formatted data, please also refer to the recent issue #76 for more information.

xiaoqiang-lu · 2022-04-22T04:07:19Z

Thanks for your reply! At first I was using two 3080 Ti cards; I have now replaced them with four V100 cards, but the same problem occurs. The script on my machine:

GPUS_PER_NODE=4
WORKER_CNT=1
export MASTER_ADDR=127.0.0.1
export MASTER_PORT=8214
export RANK=0

The rest is unchanged. I also made my own ans2label.pkl file.
Here is a part of my .tsv file without imgbase64.

Here is a part of my .pkl file.

yangapku · 2022-04-22T04:16:44Z

Hi, have you checked the path of $log_file defined in your training script? The running log is saved to this file rather than printed to stdout. The program may have ended for other reasons, which may be recorded in the log. Please share more information if you find this log file.
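As a minimal sketch of the check suggested above, the last lines of the log can be read with a few lines of Python. The helper name `tail` and the example path are hypothetical; use whatever path $log_file resolves to in your own training script.

```python
from collections import deque


def tail(path, n=50):
    """Return the last n lines of a text file (hypothetical helper).

    errors="replace" keeps the read from failing on stray non-UTF-8 bytes,
    which can show up in training logs.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        # A bounded deque keeps only the final n lines while streaming the file.
        return list(deque(f, maxlen=n))


if __name__ == "__main__":
    import sys

    # Usage (the path is an assumption): python tail_log.py path/to/your.log
    if len(sys.argv) > 1:
        print("".join(tail(sys.argv[1])), end="")
```

A traceback from a crashed worker usually sits at the very end of the file, so the last few dozen lines are typically enough to share.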

xiaoqiang-lu · 2022-04-22T05:13:40Z

Thanks! It seems that a problem with my images is causing this. I am using the code from your reply in issue #56 to generate imgbase64.
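The thread does not say what was wrong with the images. One quick sanity check, sketched here using only the standard library, is to confirm that each imgbase64 string is valid base64 and that the decoded bytes start with a known image signature. The function name `sniff_image_base64` is hypothetical, and the check covers only JPEG and PNG headers; it does not prove that the whole image decodes.

```python
import base64
import binascii

# File signatures ("magic bytes") for the two formats checked here.
MAGIC = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
}


def sniff_image_base64(b64_string):
    """Return "jpeg" or "png" if the string decodes to such a header, else None."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        raw = base64.b64decode(b64_string, validate=True)
    except (binascii.Error, ValueError):
        return None
    for magic, name in MAGIC.items():
        if raw.startswith(magic):
            return name
    return None
```

Running this over the image column of every TSV line and printing the line numbers that return None can narrow down which samples to regenerate.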

xiaoqiang-lu · 2022-04-22T06:42:59Z

I have solved the above problem, but another problem occurs.

yangapku · 2022-04-22T07:39:39Z

Hi, please check whether the fields of the input data line that caused this error correspond to the specified selected_cols. By default, selected_cols is set to 0,5,2,3,4 in the script, which sequentially fetches the 0th (uniq_id), 5th (image), 2nd (question), 3rd (answer info), and 4th (predict_objects) fields from each input TSV line. If any of these fields does not match, errors may occur.

xiaoqiang-lu · 2022-04-22T09:07:44Z

I have checked the input data line, and it is the same as the example. I printed column_l and its length; column_l is correct: [img_id, imgbase64, question, answer, objects].

yangapku · 2022-04-22T09:44:50Z

Hi, I think there is a misunderstanding about how each data line is organized. As mentioned in the readme, the fields in each line of the TSV file follow this exact order: question-id, image-id, question, answer (with confidence), predicted object labels, and image base64 string. There are therefore 6 fields in total in the TSV file (the image-id field is not used). With selected_cols=0,5,2,3,4, the program sequentially fetches the 0th (question-id), 5th (image), 2nd (question), 3rd (answer info), and 4th (predict_objects) fields from each input TSV line, producing a sample that is further processed in the __getitem__ method of VqaGenDataset.
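The field layout described above can be checked before training with a short sketch like the following. It is not the repo's loader: the function name `select_fields` and the constants are hypothetical, and the only facts taken from the thread are the 6-field layout and selected_cols=0,5,2,3,4.

```python
# Layout from the thread: question-id, image-id, question,
# answer (with confidence), predicted object labels, image base64 string.
EXPECTED_FIELDS = 6
SELECTED_COLS = [0, 5, 2, 3, 4]


def select_fields(line, selected_cols=SELECTED_COLS, expected=EXPECTED_FIELDS):
    """Split one TSV line and return the fields picked by selected_cols.

    Raises ValueError when the line does not have the expected field count,
    which is the mismatch discussed in this thread.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != expected:
        raise ValueError(
            "expected %d tab-separated fields, got %d" % (expected, len(fields))
        )
    # Resulting order: question-id, image, question, answer info, predicted objects.
    return [fields[i] for i in selected_cols]
```

A TSV line with only 5 fields, for example [img_id, imgbase64, question, answer, objects], fails this check because index 5 does not exist and the remaining indices point at the wrong columns.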

yangapku · 2022-04-22T09:55:00Z

By the way, when preparing the dataset TSV file, I would also recommend splitting each original training sample that has more than one golden answer into multiple samples, each containing only one of the answers. This takes full advantage of the supervision from the ground-truth answers of the training samples. Otherwise, only the golden answer with the highest confidence score will be used as supervision.
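The splitting step recommended above could look roughly like this. The sketch assumes that multiple answers in the answer field are joined by "&&" and that each one looks like "confidence|!+answer"; both are assumptions to verify against your own TSV and the repo's sample data. The function name `split_multi_answer_row` is hypothetical.

```python
def split_multi_answer_row(fields, answer_col=3, answer_sep="&&"):
    """Expand one TSV row with several golden answers into one row per answer.

    fields: the 6 fields of one TSV line, already split on tabs.
    answer_sep: assumed separator between answers (verify against your data).
    All other fields, including the question-id, are copied unchanged.
    """
    answers = [a for a in fields[answer_col].split(answer_sep) if a]
    rows = []
    for answer in answers:
        row = list(fields)
        row[answer_col] = answer
        rows.append(row)
    return rows
```

A row with a single answer comes back as one unchanged row, so the function can be applied to every line of the training TSV and the results written back with "\t".join(row).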

hieptran1812 · 2022-11-04T02:59:20Z

"Thanks! It seems to be a problem with my image that is causing this, I am using the code you replied to in issue #56 for imgbase64."

How did you resolve this problem? I'm having the same problem. Thanks.

yangapku self-assigned this Apr 19, 2022

yangapku closed this as completed May 3, 2022

yangapku mentioned this issue May 28, 2022

How to test on VizWiz dataset #117

Closed
