BellXP/Aliyun

Overall Score

Rank Model Score
🏅️ GPT4V 1450.00
🥈 Qwen 925.00
🥉 internlm 733.33
4 Cheetah 663.33
5 Otter-Image 636.67
6 LLaMA-Adapter-v2 555.00
7 MiniGPT-4 463.33
8 PandaGPT 445.00
9 VPGTrans 440.00
10 LLaVA 325.00
11 Otter 188.33
12 BLIP2 186.67
13 InstructBLIP 78.33
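
The README does not state how the overall score is computed. The numbers appear consistent with a plain sum of the 19 per-category scores, though, and the sketch below checks that assumption for two models. The values are copied from the category tables in this README in the order they appear; the names `category_scores`, `reported_overall`, and `overall_score` are hypothetical and not part of the repository.

```python
# Assumption: overall score = sum of the 19 per-category scores.
# Scores are copied from this README's category tables, in table order.
category_scores = {
    "GPT4V": [
        70.00, 95.00, 80.00, 90.00, 100.00, 80.00, 66.67, 66.67, 70.00, 86.67,
        50.00, 100.00, 80.00, 95.00, 46.67, 60.00, 80.00, 80.00, 53.33,
    ],
    "InstructBLIP": [
        0.00, 0.00, 0.00, 5.00, 0.00, 0.00, 53.33, 0.00, 0.00, 0.00,
        0.00, 0.00, 0.00, 0.00, 0.00, 6.67, 0.00, 0.00, 13.33,
    ],
}

# Overall scores as reported in the "Overall Score" table.
reported_overall = {"GPT4V": 1450.00, "InstructBLIP": 78.33}


def overall_score(scores):
    """Sum the per-category scores and round to two decimals."""
    return round(sum(scores), 2)


for model, scores in category_scores.items():
    total = overall_score(scores)
    # The published category scores are rounded to two decimals,
    # so allow a small tolerance when comparing against the table.
    matches = abs(total - reported_overall[model]) < 0.05
    print(f"{model}: summed={total:.2f} reported={reported_overall[model]:.2f} match={matches}")
```

If the sum assumption holds, the maximum possible overall score is 1900 (19 categories at 100 each), which gives a rough sense of scale for the table above.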

visual knowledge reasoning

Rank Model Score
🏅️ GPT4V 70.00
🥈 internlm 50.00
🥉 Qwen 30.00
4 Cheetah 20.00
5 Otter-Image 20.00
6 VPGTrans 15.00
7 LLaVA 10.00
8 MiniGPT-4 10.00
9 LLaMA-Adapter-v2 5.00
10 Otter 5.00
11 InstructBLIP 0.00
12 PandaGPT 0.00
13 BLIP2 0.00

multi-image comprehension

Rank Model Score
🏅️ GPT4V 95.00
🥈 Qwen 50.00
🥉 LLaMA-Adapter-v2 40.00
4 Cheetah 40.00
5 PandaGPT 35.00
6 internlm 30.00
7 LLaVA 30.00
8 Otter-Image 25.00
9 MiniGPT-4 15.00
10 VPGTrans 5.00
11 Otter 5.00
12 InstructBLIP 0.00
13 BLIP2 0.00

multi-image comparison

Rank Model Score
🏅️ GPT4V 80.00
🥈 Cheetah 70.00
🥉 Otter-Image 55.00
4 Qwen 55.00
5 PandaGPT 45.00
6 LLaMA-Adapter-v2 40.00
7 internlm 30.00
8 VPGTrans 25.00
9 MiniGPT-4 10.00
10 LLaVA 0.00
11 Otter 0.00
12 InstructBLIP 0.00
13 BLIP2 0.00

temporal understanding

Rank Model Score
🏅️ GPT4V 90.00
🥈 Cheetah 35.00
🥉 PandaGPT 25.00
4 Qwen 25.00
5 LLaMA-Adapter-v2 20.00
6 Otter-Image 20.00
7 VPGTrans 15.00
8 MiniGPT-4 5.00
9 InstructBLIP 5.00
10 internlm 0.00
11 LLaVA 0.00
12 Otter 0.00
13 BLIP2 0.00

object hallucination

Rank Model Score
🏅️ GPT4V 100.00
🥈 internlm 50.00
🥉 Cheetah 40.00
4 Qwen 40.00
5 LLaMA-Adapter-v2 30.00
6 Otter-Image 30.00
7 LLaVA 20.00
8 PandaGPT 20.00
9 VPGTrans 10.00
10 MiniGPT-4 10.00
11 Otter 0.00
12 InstructBLIP 0.00
13 BLIP2 0.00

meme understanding

Rank Model Score
🏅️ GPT4V 80.00
🥈 Qwen 73.33
🥉 PandaGPT 46.67
4 internlm 33.33
5 VPGTrans 26.67
6 LLaMA-Adapter-v2 26.67
7 Cheetah 26.67
8 Otter-Image 26.67
9 MiniGPT-4 20.00
10 LLaVA 6.67
11 Otter 0.00
12 InstructBLIP 0.00
13 BLIP2 0.00

GUI navigation

Rank Model Score
🏅️ internlm 100.00
🥈 LLaMA-Adapter-v2 86.67
🥉 Qwen 86.67
4 VPGTrans 73.33
5 Cheetah 73.33
6 BLIP2 73.33
7 GPT4V 66.67
8 Otter-Image 66.67
9 MiniGPT-4 60.00
10 PandaGPT 60.00
11 InstructBLIP 53.33
12 Otter 40.00
13 LLaVA 33.33

referring recognition

Rank Model Score
🏅️ GPT4V 66.67
🥈 Qwen 20.00
🥉 LLaMA-Adapter-v2 13.33
4 internlm 13.33
5 Otter-Image 13.33
6 Cheetah 6.67
7 MiniGPT-4 6.67
8 PandaGPT 6.67
9 VPGTrans 0.00
10 LLaVA 0.00
11 Otter 0.00
12 InstructBLIP 0.00
13 BLIP2 0.00

object counting

Rank Model Score
🏅️ GPT4V 70.00
🥈 MiniGPT-4 40.00
🥉 VPGTrans 30.00
4 Otter-Image 30.00
5 LLaMA-Adapter-v2 20.00
6 internlm 20.00
7 Qwen 20.00
8 Cheetah 10.00
9 LLaVA 10.00
10 Otter 0.00
11 InstructBLIP 0.00
12 PandaGPT 0.00
13 BLIP2 0.00

image assessment

Rank Model Score
🏅️ GPT4V 86.67
🥈 Qwen 60.00
🥉 Cheetah 53.33
4 internlm 46.67
5 MiniGPT-4 46.67
6 VPGTrans 40.00
7 LLaMA-Adapter-v2 40.00
8 LLaVA 33.33
9 Otter 33.33
10 PandaGPT 33.33
11 Otter-Image 26.67
12 InstructBLIP 0.00
13 BLIP2 0.00

medical service

Rank Model Score
🏅️ GPT4V 50.00
🥈 Otter-Image 45.00
🥉 Qwen 45.00
4 internlm 40.00
5 LLaMA-Adapter-v2 30.00
6 MiniGPT-4 25.00
7 Cheetah 20.00
8 PandaGPT 10.00
9 VPGTrans 5.00
10 LLaVA 0.00
11 Otter 0.00
12 InstructBLIP 0.00
13 BLIP2 0.00

visual commonsense reasoning

Rank Model Score
🏅️ GPT4V 100.00
🥈 Qwen 50.00
🥉 VPGTrans 20.00
4 Cheetah 20.00
5 LLaVA 20.00
6 Otter-Image 20.00
7 PandaGPT 20.00
8 LLaMA-Adapter-v2 10.00
9 MiniGPT-4 10.00
10 Otter 10.00
11 internlm 0.00
12 InstructBLIP 0.00
13 BLIP2 0.00

industry tool

Rank Model Score
🏅️ GPT4V 80.00
🥈 Otter-Image 60.00
🥉 MiniGPT-4 50.00
4 Qwen 50.00
5 Cheetah 40.00
6 internlm 30.00
7 LLaVA 30.00
8 VPGTrans 20.00
9 Otter 20.00
10 LLaMA-Adapter-v2 10.00
11 PandaGPT 10.00
12 InstructBLIP 0.00
13 BLIP2 0.00

visual recognition

Rank Model Score
🏅️ GPT4V 95.00
🥈 internlm 90.00
🥉 Qwen 80.00
4 Otter-Image 65.00
5 VPGTrans 55.00
6 Cheetah 55.00
7 LLaMA-Adapter-v2 50.00
8 PandaGPT 40.00
9 MiniGPT-4 35.00
10 LLaVA 25.00
11 Otter 15.00
12 InstructBLIP 0.00
13 BLIP2 0.00

intelligence quotient

Rank Model Score
🏅️ GPT4V 46.67
🥈 LLaVA 33.33
🥉 internlm 26.67
4 Otter-Image 20.00
5 Otter 20.00
6 VPGTrans 13.33
7 LLaMA-Adapter-v2 13.33
8 MiniGPT-4 13.33
9 PandaGPT 13.33
10 Qwen 13.33
11 BLIP2 13.33
12 Cheetah 6.67
13 InstructBLIP 0.00

code generation

Rank Model Score
🏅️ internlm 66.67
🥈 GPT4V 60.00
🥉 Cheetah 40.00
4 MiniGPT-4 40.00
5 Qwen 40.00
6 LLaMA-Adapter-v2 33.33
7 Otter-Image 33.33
8 PandaGPT 26.67
9 BLIP2 26.67
10 VPGTrans 20.00
11 LLaVA 20.00
12 Otter 20.00
13 InstructBLIP 6.67

emotional quotient

Rank Model Score
🏅️ GPT4V 80.00
🥈 Qwen 80.00
🥉 internlm 53.33
4 LLaMA-Adapter-v2 33.33
5 Cheetah 33.33
6 Otter-Image 33.33
7 MiniGPT-4 26.67
8 VPGTrans 20.00
9 PandaGPT 20.00
10 LLaVA 13.33
11 Otter 6.67
12 InstructBLIP 0.00
13 BLIP2 0.00

embodied agent

Rank Model Score
🏅️ GPT4V 80.00
🥈 Qwen 73.33
🥉 BLIP2 60.00
4 Cheetah 53.33
5 LLaMA-Adapter-v2 46.67
6 internlm 46.67
7 Otter-Image 46.67
8 VPGTrans 33.33
9 LLaVA 33.33
10 PandaGPT 33.33
11 MiniGPT-4 26.67
12 Otter 13.33
13 InstructBLIP 0.00

OCR

Rank Model Score
🏅️ GPT4V 53.33
🥈 Qwen 33.33
🥉 Cheetah 20.00
4 VPGTrans 13.33
5 MiniGPT-4 13.33
6 InstructBLIP 13.33
7 BLIP2 13.33
8 LLaMA-Adapter-v2 6.67
9 internlm 6.67
10 LLaVA 6.67
11 Otter-Image 0.00
12 Otter 0.00
13 PandaGPT 0.00
