# BellXP/holistic_evaluation

## Visual Perception

| Model | ImageNet | CIFAR10 | OxfordIIITPet | Flowers102 | VCR1_OC | VCR1_MCI | MSCOCO_OC | MSCOCO_MCI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LLaVA | 15.59 | 67.82 | 6.27 | 8.15 | 27.29 | 65.67 | 21.96 | 74.50 |
| Lynx | 36.84 | 72.52 | 52.00 | 46.32 | 59.67 | 85.20 | 70.99 | 88.31 |
| VPGTrans | 24.47 | 72.53 | 20.66 | 18.51 | 20.25 | 68.11 | 27.95 | 65.82 |
| BLIP2 | 18.33 | 69.83 | 37.69 | 26.31 | 25.04 | 87.89 | 56.07 | 86.44 |
| InstructBLIP-T5 | 22.80 | 65.61 | 32.30 | 37.06 | 23.30 | 87.98 | 28.23 | 89.33 |
| InstructBLIP | 25.96 | 71.18 | 35.08 | 34.82 | 27.62 | 84.50 | 46.93 | 90.68 |
| BLIVA | 31.48 | 79.79 | 33.47 | 32.49 | 61.83 | 91.71 | 73.92 | 82.58 |
| LLaMA-Adapter-v2 | 22.74 | 64.58 | 43.88 | 31.78 | 25.16 | 71.28 | 38.03 | 82.79 |
| Otter-Image | 15.03 | 65.62 | 12.51 | 10.46 | 32.47 | 77.13 | 48.34 | 71.15 |
| Cheetah | 19.45 | 66.10 | 26.06 | 29.81 | 57.74 | 75.71 | 36.27 | 78.63 |

## Visual Knowledge Acquisition

| Model | IIIT5K | IC13 | IC15 | Total-Text | CUTE80 | SVT | SVTP | COCO-Text | WordArt | CTW | HOST | WOST | SROIE | FUNSD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LLaVA | 31.43 | 15.68 | 25.61 | 24.74 | 33.68 | 16.38 | 27.91 | 16.95 | 35.34 | 14.50 | 16.23 | 20.20 | 0.07 | 0.85 |
| Lynx | 27.33 | 15.45 | 22.82 | 27.09 | 37.15 | 15.46 | 23.26 | 15.00 | 39.51 | 16.60 | 17.22 | 19.99 | 0.36 | 2.55 |
| VPGTrans | 61.13 | 69.34 | 53.49 | 54.54 | 71.18 | 70.79 | 62.48 | 36.36 | 61.88 | 52.67 | 49.92 | 56.71 | 0.00 | 0.68 |
| BLIP2 | 80.83 | 80.31 | 69.38 | 69.35 | 84.38 | 86.24 | 80.47 | 54.86 | 74.06 | 69.08 | 57.70 | 70.03 | 0.22 | 0.85 |
| InstructBLIP-T5 | 83.17 | 83.25 | 72.12 | 71.87 | 86.81 | 87.33 | 82.33 | 58.25 | 75.84 | 70.67 | 60.02 | 72.89 | 0.22 | 1.02 |
| InstructBLIP | 81.53 | 83.02 | 70.73 | 70.11 | 79.51 | 87.17 | 80.78 | 54.49 | 74.26 | 66.86 | 61.55 | 72.27 | 0.22 | 0.85 |
| BLIVA | 87.47 | 87.38 | 75.88 | 76.34 | 85.42 | 90.42 | 85.58 | 61.02 | 78.56 | 71.50 | 68.67 | 78.10 | 0.36 | 2.04 |
| LLaMA-Adapter-v2 | 36.43 | 20.87 | 28.79 | 30.29 | 37.85 | 20.87 | 30.54 | 20.80 | 38.58 | 17.88 | 16.76 | 22.64 | 0.43 | 1.19 |
| Otter-Image | 21.83 | 9.79 | 22.97 | 18.10 | 27.78 | 15.61 | 24.03 | 15.23 | 25.15 | 10.88 | 12.00 | 15.98 | 0.00 | 0.85 |
| Cheetah | 75.10 | 72.64 | 60.57 | 60.41 | 70.49 | 72.18 | 67.44 | 46.38 | 63.47 | 57.95 | 51.20 | 62.58 | 0.07 | 1.70 |

## Visual Reasoning

| Model | DocVQA | TextVQA | STVQA | OCRVQA | OKVQA | GQA | IconQA | VSR | WHOOPS | ScienceQA | VizWiz |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LLaVA | 5.44 | 36.16 | 27.21 | 25.71 | 50.16 | 43.60 | 44.47 | 55.57 | 26.41 | 43.53 | 61.54 |
| Lynx | 10.43 | 52.80 | 42.60 | 44.80 | 66.43 | 61.86 | 69.13 | 63.19 | 39.14 | 62.82 | 78.34 |
| VPGTrans | 3.63 | 22.64 | 18.18 | 19.62 | 44.93 | 33.69 | 43.79 | 50.23 | 22.64 | 10.01 | 56.32 |
| BLIP2 | 3.61 | 29.40 | 20.01 | 29.58 | 42.15 | 44.68 | 48.99 | 63.43 | 25.70 | 70.20 | 58.09 |
| InstructBLIP-T5 | 4.49 | 35.58 | 26.07 | 57.84 | 54.36 | 48.89 | 55.27 | 64.46 | 24.03 | 70.00 | 65.16 |
| InstructBLIP | 4.88 | 36.78 | 26.01 | 58.90 | 59.18 | 49.49 | 35.37 | 54.38 | 28.17 | 34.95 | 62.16 |
| BLIVA | 7.23 | 45.40 | 32.38 | 62.34 | 69.54 | 60.87 | 39.49 | 76.40 | 36.56 | 29.30 | 73.39 |
| LLaMA-Adapter-v2 | 8.19 | 43.68 | 32.69 | 36.67 | 57.17 | 44.80 | 41.91 | 51.96 | 26.77 | 55.88 | 69.94 |
| Otter-Image | 4.08 | 28.14 | 26.34 | 24.65 | 42.75 | 45.69 | 44.81 | 55.71 | 14.87 | 61.92 | 49.07 |
| Cheetah | 4.43 | 27.70 | 19.82 | 33.84 | 41.64 | 43.84 | 53.20 | 52.11 | 27.39 | 52.45 | 60.65 |

## Visual Commonsense

| Model | ImageNetVC_others | ImageNetVC_color | ImageNetVC_shape | ImageNetVC_material | ImageNetVC_component | VCR |
| --- | --- | --- | --- | --- | --- | --- |
| LLaVA | 61.79 | 58.17 | 22.48 | 51.49 | 54.98 | 11.84 |
| Lynx | 69.77 | 78.28 | 28.70 | 58.14 | 70.04 | 34.56 |
| VPGTrans | 53.93 | 56.54 | 28.11 | 58.53 | 61.26 | 22.45 |
| BLIP2 | 47.20 | 47.76 | 11.79 | 35.02 | 61.51 | 55.50 |
| InstructBLIP-T5 | 69.73 | 58.92 | 41.70 | 56.95 | 77.28 | 56.97 |
| InstructBLIP | 65.17 | 64.09 | 30.31 | 54.09 | 81.58 | 12.69 |
| BLIVA | 78.15 | 76.32 | 32.62 | 71.60 | 90.22 | 11.16 |
| LLaMA-Adapter-v2 | 69.37 | 62.26 | 28.96 | 63.74 | 65.68 | 21.10 |
| Otter-Image | 63.48 | 53.81 | 28.54 | 54.98 | 64.51 | 24.52 |
| Cheetah | 61.61 | 54.13 | 24.91 | 52.44 | 65.83 | 23.16 |

## Object Hallucination

| Model | MSCOCO_pope_random | MSCOCO_pope_popular | MSCOCO_pope_adversarial |
| --- | --- | --- | --- |
| LLaVA | 62.16 | 52.27 | 51.07 |
| Lynx | 85.46 | 83.13 | 79.40 |
| VPGTrans | 60.76 | 56.60 | 56.53 |
| BLIP2 | 81.48 | 80.70 | 78.97 |
| InstructBLIP-T5 | 77.56 | 75.57 | 70.03 |
| InstructBLIP | 90.31 | 83.43 | 80.70 |
| BLIVA | 91.79 | 71.77 | 85.33 |
| LLaMA-Adapter-v2 | 75.50 | 59.33 | 56.93 |
| Otter-Image | 82.41 | 71.43 | 67.43 |
| Cheetah | 83.33 | 73.07 | 69.63 |
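Each model above has one score per POPE split (random / popular / adversarial). When a single per-model hallucination score is convenient, the splits can be averaged; the sketch below uses the exact numbers from the Object Hallucination table, but the unweighted mean is an assumed aggregation, not necessarily how this repository reports results.

```python
# Scores copied from the Object Hallucination table above
# (random, popular, adversarial POPE splits on MSCOCO).
POPE_SCORES = {
    "LLaVA":            (62.16, 52.27, 51.07),
    "Lynx":             (85.46, 83.13, 79.40),
    "VPGTrans":         (60.76, 56.60, 56.53),
    "BLIP2":            (81.48, 80.70, 78.97),
    "InstructBLIP-T5":  (77.56, 75.57, 70.03),
    "InstructBLIP":     (90.31, 83.43, 80.70),
    "BLIVA":            (91.79, 71.77, 85.33),
    "LLaMA-Adapter-v2": (75.50, 59.33, 56.93),
    "Otter-Image":      (82.41, 71.43, 67.43),
    "Cheetah":          (83.33, 73.07, 69.63),
}

def mean_pope(scores: dict) -> dict:
    """Unweighted mean across the three POPE splits, rounded to 2 dp.
    NOTE: equal weighting of splits is an assumption for illustration."""
    return {model: round(sum(v) / len(v), 2) for model, v in scores.items()}

# Rank models by average POPE score, best first.
for model, score in sorted(mean_pope(POPE_SCORES).items(),
                           key=lambda kv: -kv[1]):
    print(f"{model:18s} {score:.2f}")
```

Under this scheme InstructBLIP comes out on top on average, even though BLIVA has the best random-split score, which is exactly the kind of ordering change a single aggregate can hide.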
