Text2Chart31

Official PyTorch implementation of "Text2Chart31: Instruction Tuning for Chart Generation with Automatic Feedback" (EMNLP 2024 Main, Oral Presentation).

Paper: Link
Project Page: Link

Abstract

Large language models (LLMs) have demonstrated strong capabilities across various language tasks, notably through instruction-tuning methods. However, LLMs face challenges in visualizing complex, real-world data through charts and plots. Firstly, existing datasets rarely cover a full range of chart types, such as 3D, volumetric, and gridded charts. Secondly, supervised fine-tuning methods do not fully leverage the intricate relationships within rich datasets, including text, code, and figures. To address these challenges, we propose a hierarchical pipeline and a new dataset for chart generation. Our dataset, Text2Chart31, includes 31 unique plot types referring to the Matplotlib library, with 11.1K tuples of descriptions, code, data tables, and plots. Moreover, we introduce a reinforcement learning-based instruction tuning technique for chart generation tasks without requiring human feedback. Our experiments show that this approach significantly enhances the model performance, enabling smaller models to outperform larger open-source models and be comparable to state-of-the-art proprietary models in data visualization tasks.

Dataset File

Code, Description, Data Table and Reasoning Step:
- Training set (Text2Chart31): ./prepare-data/Text2Chart-31-train.json
- Test set: ./prepare-data/Text2Chart-31-test.json
Dataset file including the figures:
- Training set (Text2Chart31): download
- Test set: download
Text2Chart31-v2:
- As noted in Table 1 of the paper, Text2Chart31-v2 was constructed and published with the camera-ready version, while the experiments in the paper were conducted with Text2Chart31. We will upload the experimental results for Text2Chart31-v2 to this GitHub page soon. You can download Text2Chart31-v2 here. (To be announced soon.)
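
As a quick sanity check after cloning, the JSON files can be inspected without assuming anything about their schema. The sketch below is not part of the repository: the helper name summarize_json is hypothetical, and the only assumption is that each file is valid JSON whose top level is a list or a dict.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_json(path):
    """Load a JSON file and report its top-level shape.

    Assumption: the file is valid JSON with a list or dict at the top level.
    Nothing is assumed about field names.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)

    # Treat a top-level dict as a collection of its values.
    records = list(data.values()) if isinstance(data, dict) else list(data)

    # Count how often each key appears across dict-shaped records.
    key_counts = Counter()
    for record in records:
        if isinstance(record, dict):
            key_counts.update(record.keys())

    return {"num_records": len(records), "keys": dict(key_counts)}


if __name__ == "__main__":
    train_path = Path("./prepare-data/Text2Chart-31-train.json")
    if train_path.exists():
        print(summarize_json(train_path))
    else:
        print(f"{train_path} not found; run this from the repository root.")
```

Printing the key counts first makes it easy to see which fields every record carries before writing any task-specific loading code.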

LoRA checkpoints

Unzip each downloaded checkpoint under the checkpoint folder, then run the inference code.
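
The unzip step can also be done from Python. This is a minimal sketch rather than a script from the repository: the function name extract_checkpoint and the archive name task1-llama3-sft.zip are hypothetical, and it assumes the download is a regular .zip file.

```python
import zipfile
from pathlib import Path


def extract_checkpoint(archive_path, checkpoint_dir="checkpoint"):
    """Extract a downloaded checkpoint archive into the checkpoint folder.

    Returns the sorted list of member names found in the archive.
    """
    target = Path(checkpoint_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())


if __name__ == "__main__":
    # Hypothetical file name; use whatever name the download link gives you.
    archive = Path("task1-llama3-sft.zip")
    if archive.exists():
        for name in extract_checkpoint(archive):
            print(name)
    else:
        print(f"{archive} not found; download a checkpoint archive first.")
```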

Supervised fine-tuned model

Task	Model	Checkpoints
Task 1	Llama 3 Instruct	download
Task 1	Code Llama 13B	download
Task 2	Llama 3 Instruct	download
Task 3	Llama 3 Instruct	download

RL-tuned model

Task	Model	Checkpoints
Task 1	Llama 3 Instruct	download
Task 3	Llama 3 Instruct	download
Task 1	Code Llama 13B	download

Llama 3 Instruct models for Task 1 and Task 3 are jointly optimized with our algorithm here.

Reward model checkpoint

OPT model (Llama 3 Instruct): download
OPT model (Code Llama 13B): download

Training code

Supervised fine-tuning

Task 1 (Llama 3 Instruct): Run python sft-task1.py
Task 1 (Code Llama 13B): Run python sft-task1-cli.py
Task 2: Run python sft-task2.py
Task 3: Run python sft-task3.py
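
Fine-tuning runs are long, so it can help to keep each run's console output in a file. The wrapper below is one possible way to do that and is not part of the repository: run_and_log and the logs directory are hypothetical names, and the script names in the loop are copied from the list above.

```python
import subprocess
import sys
from pathlib import Path


def run_and_log(script, log_dir="logs"):
    """Run a script with the current interpreter and save its output.

    Returns (return_code, log_path).
    """
    log_path = Path(log_dir) / (Path(script).stem + ".log")
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("w", encoding="utf-8") as log:
        # stderr is merged into stdout so tracebacks stay in order in the log.
        result = subprocess.run(
            [sys.executable, script], stdout=log, stderr=subprocess.STDOUT
        )
    return result.returncode, log_path


if __name__ == "__main__":
    # Run the Llama 3 Instruct SFT scripts in order; stop at the first failure.
    for script in ["sft-task1.py", "sft-task2.py", "sft-task3.py"]:
        if not Path(script).exists():
            print(f"{script} not found; run this from the repository root.")
            break
        code, log_path = run_and_log(script)
        print(f"{script}: exit code {code}, log at {log_path}")
        if code != 0:
            break
```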

RL fine-tuning

Download the reward model and SFT model checkpoints before running RL fine-tuning.

Task 1 & Task 3 (Llama 3 Instruct): Run python rl-task1-task3.py
Task 1 (Code Llama 13B): Run python rl-task1-cli.py
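
Because RL fine-tuning depends on checkpoints downloaded earlier, a small pre-flight check can catch a missing or empty folder before a long job starts. This is a sketch under assumptions: missing_checkpoints is a hypothetical helper, and the folder names sft-model and reward-model are placeholders, since the actual names depend on the archives you unzipped.

```python
from pathlib import Path


def missing_checkpoints(required, checkpoint_dir="checkpoint"):
    """Return the entries of `required` that are absent or empty under checkpoint_dir."""
    root = Path(checkpoint_dir)
    missing = []
    for name in required:
        path = root / name
        # An empty directory counts as missing (for example, after a failed unzip).
        if not path.exists() or (path.is_dir() and not any(path.iterdir())):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical folder names; replace with the names of the unzipped archives.
    required = ["sft-model", "reward-model"]
    absent = missing_checkpoints(required)
    if absent:
        print("Missing before RL fine-tuning:", ", ".join(absent))
    else:
        print("All checkpoints found.")
```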

Generating samples

Task 1

Llama 3 Instruct:
- Base model: Run python generate-llama3-base.py
- SFT model: Run python generate-llama3-bf16-sft.py
- RL model: Run python generate-llama3-bf16-rl.py
Code Llama 13B:
- Base model: Run python generate-cli-base.py
- SFT model: Run python generate-cli-sft.py
- RL model: Run python generate-cli-bf16-rl.py

Task 2

SFT model: Run python generate2-llama3-sft.py (you need to train the model first).

Task 3

Base model: Run python generate3-llama3-base.py
SFT model: Run python generate3-llama3-bf16-sft.py
RL model: Run python generate3-llama3-bf16-rl.py
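
The script names above follow a task / model / variant pattern, which can be captured in a lookup table when scripting several generation runs. The sketch below is illustrative only: the keys (task1, llama3, codellama13b, and so on) and both function names are hypothetical labels, while the script file names are copied from the lists above.

```python
# (task, model, variant) -> script name, as listed in this section.
GENERATION_SCRIPTS = {
    ("task1", "llama3", "base"): "generate-llama3-base.py",
    ("task1", "llama3", "sft"): "generate-llama3-bf16-sft.py",
    ("task1", "llama3", "rl"): "generate-llama3-bf16-rl.py",
    ("task1", "codellama13b", "base"): "generate-cli-base.py",
    ("task1", "codellama13b", "sft"): "generate-cli-sft.py",
    ("task1", "codellama13b", "rl"): "generate-cli-bf16-rl.py",
    ("task2", "llama3", "sft"): "generate2-llama3-sft.py",
    ("task3", "llama3", "base"): "generate3-llama3-base.py",
    ("task3", "llama3", "sft"): "generate3-llama3-bf16-sft.py",
    ("task3", "llama3", "rl"): "generate3-llama3-bf16-rl.py",
}


def generation_script(task, model, variant):
    """Return the script name for a (task, model, variant) combination."""
    key = (task, model, variant)
    if key not in GENERATION_SCRIPTS:
        raise KeyError(f"no generation script listed for {key}")
    return GENERATION_SCRIPTS[key]


def available_variants(task, model):
    """List the variants (base, sft, rl) that have a script for this task and model."""
    return sorted(v for (t, m, v) in GENERATION_SCRIPTS if (t, m) == (task, model))


if __name__ == "__main__":
    print(generation_script("task1", "codellama13b", "rl"))
    print(available_variants("task2", "llama3"))
```

A table like this also makes the gaps visible: only an SFT script is listed for Task 2, and no Code Llama 13B scripts are listed for Task 2 or Task 3.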

Evaluation

We will release the evaluation code soon!

Citation

If you find this code useful, please don't forget to cite the paper! 🙂

@inproceedings{pesaranzadeh2024text2chart31,
    title = "Text2Chart31: Instruction Tuning for Chart Generation with Automatic Feedback",
    author = "Pesaran zadeh, Fatemeh and Kim, Juyeon and Kim, Jin-Hwa and Kim, Gunhee",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    year = "2024",
}
