
Overall Score

Rank	Model	Score
🏅️	GPT4V	1450.00
🥈	Qwen	925.00
🥉	internlm	733.33
4	Cheetah	663.33
5	Otter-Image	636.67
6	LLaMA-Adapter-v2	555.00
7	MiniGPT-4	463.33
8	PandaGPT	445.00
9	VPGTrans	440.00
10	LLaVA	325.00
11	Otter	188.33
12	BLIP2	186.67
13	InstructBLIP	78.33
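
The README does not say how the overall score is computed. Judging from the numbers alone, it appears to be the plain sum of a model's 19 per-category scores listed below. The sketch checks that assumption for GPT4V, with the scores copied from the category tables; the names `gpt4v_scores` and `total` are illustrative and not part of the repository.

```python
# Per-category scores for GPT4V, copied from the category tables in this README.
# Assumption: the overall score is the unweighted sum of these 19 values.
gpt4v_scores = {
    "visual knowledge reasoning": 70.00,
    "multi-image comprehension": 95.00,
    "multi-image comparison": 80.00,
    "temporal understanding": 90.00,
    "object hallucination": 100.00,
    "meme understanding": 80.00,
    "gui navigation": 66.67,
    "referring recognition": 66.67,
    "object counting": 70.00,
    "image assessment": 86.67,
    "medical service": 50.00,
    "visual commonsense reasoning": 100.00,
    "industry tool": 80.00,
    "visual recognition": 95.00,
    "intelligence quotient": 46.67,
    "code generation": 60.00,
    "emotional quotient": 80.00,
    "embodied agent": 80.00,
    "ocr": 53.33,
}

total = sum(gpt4v_scores.values())

# The listed overall score is 1450.00; the tables round each category to two
# decimals, so allow a small tolerance when comparing.
print(f"sum of category scores: {total:.2f}")
print("matches listed overall score:", abs(total - 1450.00) < 0.05)
```

The same check also holds for Qwen (925.00) and internlm (733.33) within rounding, which suggests that categories are weighted equally and that a model scoring 0.00 in several categories is penalized in direct proportion.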

visual knowledge reasoning

Rank	Model	Score
🏅️	GPT4V	70.00
🥈	internlm	50.00
🥉	Qwen	30.00
4	Cheetah	20.00
5	Otter-Image	20.00
6	VPGTrans	15.00
7	LLaVA	10.00
8	MiniGPT-4	10.00
9	LLaMA-Adapter-v2	5.00
10	Otter	5.00
11	InstructBLIP	0.00
12	PandaGPT	0.00
13	BLIP2	0.00

multi-image comprehension

Rank	Model	Score
🏅️	GPT4V	95.00
🥈	Qwen	50.00
🥉	LLaMA-Adapter-v2	40.00
4	Cheetah	40.00
5	PandaGPT	35.00
6	internlm	30.00
7	LLaVA	30.00
8	Otter-Image	25.00
9	MiniGPT-4	15.00
10	VPGTrans	5.00
11	Otter	5.00
12	InstructBLIP	0.00
13	BLIP2	0.00

multi-image comparison

Rank	Model	Score
🏅️	GPT4V	80.00
🥈	Cheetah	70.00
🥉	Otter-Image	55.00
4	Qwen	55.00
5	PandaGPT	45.00
6	LLaMA-Adapter-v2	40.00
7	internlm	30.00
8	VPGTrans	25.00
9	MiniGPT-4	10.00
10	LLaVA	0.00
11	Otter	0.00
12	InstructBLIP	0.00
13	BLIP2	0.00

temporal understanding

Rank	Model	Score
🏅️	GPT4V	90.00
🥈	Cheetah	35.00
🥉	PandaGPT	25.00
4	Qwen	25.00
5	LLaMA-Adapter-v2	20.00
6	Otter-Image	20.00
7	VPGTrans	15.00
8	MiniGPT-4	5.00
9	InstructBLIP	5.00
10	internlm	0.00
11	LLaVA	0.00
12	Otter	0.00
13	BLIP2	0.00

object hallucination

Rank	Model	Score
🏅️	GPT4V	100.00
🥈	internlm	50.00
🥉	Cheetah	40.00
4	Qwen	40.00
5	LLaMA-Adapter-v2	30.00
6	Otter-Image	30.00
7	LLaVA	20.00
8	PandaGPT	20.00
9	VPGTrans	10.00
10	MiniGPT-4	10.00
11	Otter	0.00
12	InstructBLIP	0.00
13	BLIP2	0.00

meme understanding

Rank	Model	Score
🏅️	GPT4V	80.00
🥈	Qwen	73.33
🥉	PandaGPT	46.67
4	internlm	33.33
5	VPGTrans	26.67
6	LLaMA-Adapter-v2	26.67
7	Cheetah	26.67
8	Otter-Image	26.67
9	MiniGPT-4	20.00
10	LLaVA	6.67
11	Otter	0.00
12	InstructBLIP	0.00
13	BLIP2	0.00

gui navigation

Rank	Model	Score
🏅️	internlm	100.00
🥈	LLaMA-Adapter-v2	86.67
🥉	Qwen	86.67
4	VPGTrans	73.33
5	Cheetah	73.33
6	BLIP2	73.33
7	GPT4V	66.67
8	Otter-Image	66.67
9	MiniGPT-4	60.00
10	PandaGPT	60.00
11	InstructBLIP	53.33
12	Otter	40.00
13	LLaVA	33.33

referring recognition

Rank	Model	Score
🏅️	GPT4V	66.67
🥈	Qwen	20.00
🥉	LLaMA-Adapter-v2	13.33
4	internlm	13.33
5	Otter-Image	13.33
6	Cheetah	6.67
7	MiniGPT-4	6.67
8	PandaGPT	6.67
9	VPGTrans	0.00
10	LLaVA	0.00
11	Otter	0.00
12	InstructBLIP	0.00
13	BLIP2	0.00

object counting

Rank	Model	Score
🏅️	GPT4V	70.00
🥈	MiniGPT-4	40.00
🥉	VPGTrans	30.00
4	Otter-Image	30.00
5	LLaMA-Adapter-v2	20.00
6	internlm	20.00
7	Qwen	20.00
8	Cheetah	10.00
9	LLaVA	10.00
10	Otter	0.00
11	InstructBLIP	0.00
12	PandaGPT	0.00
13	BLIP2	0.00

image assessment

Rank	Model	Score
🏅️	GPT4V	86.67
🥈	Qwen	60.00
🥉	Cheetah	53.33
4	internlm	46.67
5	MiniGPT-4	46.67
6	VPGTrans	40.00
7	LLaMA-Adapter-v2	40.00
8	LLaVA	33.33
9	Otter	33.33
10	PandaGPT	33.33
11	Otter-Image	26.67
12	InstructBLIP	0.00
13	BLIP2	0.00

medical service

Rank	Model	Score
🏅️	GPT4V	50.00
🥈	Otter-Image	45.00
🥉	Qwen	45.00
4	internlm	40.00
5	LLaMA-Adapter-v2	30.00
6	MiniGPT-4	25.00
7	Cheetah	20.00
8	PandaGPT	10.00
9	VPGTrans	5.00
10	LLaVA	0.00
11	Otter	0.00
12	InstructBLIP	0.00
13	BLIP2	0.00

visual commonsense reasoning

Rank	Model	Score
🏅️	GPT4V	100.00
🥈	Qwen	50.00
🥉	VPGTrans	20.00
4	Cheetah	20.00
5	LLaVA	20.00
6	Otter-Image	20.00
7	PandaGPT	20.00
8	LLaMA-Adapter-v2	10.00
9	MiniGPT-4	10.00
10	Otter	10.00
11	internlm	0.00
12	InstructBLIP	0.00
13	BLIP2	0.00

industry tool

Rank	Model	Score
🏅️	GPT4V	80.00
🥈	Otter-Image	60.00
🥉	MiniGPT-4	50.00
4	Qwen	50.00
5	Cheetah	40.00
6	internlm	30.00
7	LLaVA	30.00
8	VPGTrans	20.00
9	Otter	20.00
10	LLaMA-Adapter-v2	10.00
11	PandaGPT	10.00
12	InstructBLIP	0.00
13	BLIP2	0.00

visual recognition

Rank	Model	Score
🏅️	GPT4V	95.00
🥈	internlm	90.00
🥉	Qwen	80.00
4	Otter-Image	65.00
5	VPGTrans	55.00
6	Cheetah	55.00
7	LLaMA-Adapter-v2	50.00
8	PandaGPT	40.00
9	MiniGPT-4	35.00
10	LLaVA	25.00
11	Otter	15.00
12	InstructBLIP	0.00
13	BLIP2	0.00

intelligence quotient

Rank	Model	Score
🏅️	GPT4V	46.67
🥈	LLaVA	33.33
🥉	internlm	26.67
4	Otter-Image	20.00
5	Otter	20.00
6	VPGTrans	13.33
7	LLaMA-Adapter-v2	13.33
8	MiniGPT-4	13.33
9	PandaGPT	13.33
10	Qwen	13.33
11	BLIP2	13.33
12	Cheetah	6.67
13	InstructBLIP	0.00

code generation

Rank	Model	Score
🏅️	internlm	66.67
🥈	GPT4V	60.00
🥉	Cheetah	40.00
4	MiniGPT-4	40.00
5	Qwen	40.00
6	LLaMA-Adapter-v2	33.33
7	Otter-Image	33.33
8	PandaGPT	26.67
9	BLIP2	26.67
10	VPGTrans	20.00
11	LLaVA	20.00
12	Otter	20.00
13	InstructBLIP	6.67

emotional quotient

Rank	Model	Score
🏅️	GPT4V	80.00
🥈	Qwen	80.00
🥉	internlm	53.33
4	LLaMA-Adapter-v2	33.33
5	Cheetah	33.33
6	Otter-Image	33.33
7	MiniGPT-4	26.67
8	VPGTrans	20.00
9	PandaGPT	20.00
10	LLaVA	13.33
11	Otter	6.67
12	InstructBLIP	0.00
13	BLIP2	0.00

embodied agent

Rank	Model	Score
🏅️	GPT4V	80.00
🥈	Qwen	73.33
🥉	BLIP2	60.00
4	Cheetah	53.33
5	LLaMA-Adapter-v2	46.67
6	internlm	46.67
7	Otter-Image	46.67
8	VPGTrans	33.33
9	LLaVA	33.33
10	PandaGPT	33.33
11	MiniGPT-4	26.67
12	Otter	13.33
13	InstructBLIP	0.00

ocr

Rank	Model	Score
🏅️	GPT4V	53.33
🥈	Qwen	33.33
🥉	Cheetah	20.00
4	VPGTrans	13.33
5	MiniGPT-4	13.33
6	InstructBLIP	13.33
7	BLIP2	13.33
8	LLaMA-Adapter-v2	6.67
9	internlm	6.67
10	LLaVA	6.67
11	Otter-Image	0.00
12	Otter	0.00
13	PandaGPT	0.00


BellXP/Aliyun
