- See the ScienceQA repo for instructions on setting up the dataset.
- Convert the ScienceQA dataset to the LLaVA conversation-style format.
python scripts/convert_sqa_to_llava.py \
convert_to_llava \
--base-dir /path/to/ScienceQA/data/scienceqa \
--prompt-format "QCM-LEA" \
--split {train,val,minival,test,minitest}
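The brace list after --split reads as a set of choices, not literal input; in bash the braces would expand to all five names as separate arguments. A minimal sketch, assuming the script accepts one split per invocation, builds the same command for each split. The helper name `build_convert_cmd` is hypothetical, and the base directory is the placeholder path from the command above.

```python
import shlex

# Split names taken from the --split choices listed above.
SPLITS = ["train", "val", "minival", "test", "minitest"]


def build_convert_cmd(base_dir, split, prompt_format="QCM-LEA"):
    """Return the argv list for converting one ScienceQA split (hypothetical helper)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return [
        "python", "scripts/convert_sqa_to_llava.py",
        "convert_to_llava",
        "--base-dir", base_dir,
        "--prompt-format", prompt_format,
        "--split", split,
    ]


if __name__ == "__main__":
    base_dir = "/path/to/ScienceQA/data/scienceqa"  # placeholder path
    for split in SPLITS:
        # Only prints the commands; to run one, pass the list to
        # subprocess.run(cmd, check=True) from the repository root.
        print(shlex.join(build_convert_cmd(base_dir, split)))
```

Printing the commands first makes it easy to check the paths before anything is executed.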
- Pretraining
You can download our pretrained projector weights from our Model Zoo, or train your own projector weights using pretrain.sh.
- Finetuning
See finetune_sqa.sh.
- Multiple-GPU inference
You can run the evaluation on multiple GPUs and concatenate the generated jsonl files. Please refer to our script for batch evaluation and results gathering.
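The concatenation step can be sketched in Python as follows. This is an illustration, not the repository's batch script: the helper name `merge_jsonl` and the per-GPU file names (`chunk0.jsonl`, `chunk1.jsonl`, ...) are assumptions, and the output path simply reuses the answers file name from the single-GPU example.

```python
from pathlib import Path


def merge_jsonl(chunk_paths, out_path):
    """Concatenate JSON Lines files in the given order; return the number of records written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in chunk_paths:
            with open(path, encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue  # skip blank lines such as a trailing newline
                    out.write(line + "\n")
                    count += 1
    return count


if __name__ == "__main__":
    # Hypothetical layout: one answers file per GPU in the results directory.
    results_dir = Path("vqa/results/ScienceQA")
    # Note: sorted() is lexicographic, so chunk10 sorts before chunk2.
    chunks = sorted(results_dir.glob("chunk*.jsonl"))
    if chunks:
        n = merge_jsonl(chunks, results_dir / "test_llava-13b.jsonl")
        print(f"merged {n} records from {len(chunks)} files")
```

Each line of a jsonl file is an independent record, so the merged file can be passed to the evaluation step in place of a single-GPU answers file.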
- Single-GPU inference
(a) Generate LLaVA responses on the ScienceQA dataset
python -m llava.eval.model_vqa_science \
--model-path liuhaotian/llava-lcs558k-scienceqa-vicuna-13b-v1.3 \
--question-file /path/to/ScienceQA/data/scienceqa/llava_test_QCM-LEA.json \
--image-folder /path/to/ScienceQA/data/scienceqa/images/test \
--answers-file vqa/results/ScienceQA/test_llava-13b.jsonl \
--conv-mode llava_v1
(b) Evaluate the generated responses
python eval_science_qa.py \
--base-dir /path/to/ScienceQA/data/scienceqa \
--result-file vqa/results/ScienceQA/test_llava-13b.jsonl \
--output-file vqa/results/ScienceQA/test_llava-13b_output.json \
--output-result vqa/results/ScienceQA/test_llava-13b_result.json
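If the evaluation in step (b) gives unexpected results, one thing worth checking is that the answers file from step (a) parses as JSON Lines and holds the expected number of records. The sketch below makes no assumption about the fields in each record; the helper name `count_jsonl_records` is hypothetical.

```python
import json


def count_jsonl_records(path):
    """Return (number of valid records, line numbers that failed to parse)."""
    n_valid = 0
    bad_lines = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                json.loads(line)
                n_valid += 1
            except json.JSONDecodeError:
                bad_lines.append(lineno)
    return n_valid, bad_lines
```

For example, `count_jsonl_records("vqa/results/ScienceQA/test_llava-13b.jsonl")` returns the record count together with any line numbers that need a closer look.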
For reference, we attach our prediction files test_sqa_llava_lcs_558k_sqa_12e_vicuna_v1_3_13b.json and test_sqa_llava_13b_v0.json, both for comparison when reproducing our results and for more detailed analysis.