Multimodal (text + layout/format + image) pre-training for multilingual Document AI
LayoutXLM is a multimodal pre-trained model for multilingual document understanding that aims to bridge language barriers in visually-rich document understanding. Experimental results show that it significantly outperforms the existing SOTA cross-lingual pre-trained models on the XFUND dataset.
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding. Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei. arXiv preprint, 2021.
layoutxlm-base | huggingface
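As a quick sanity check that the checkpoint loads, it can be used through the HuggingFace transformers integration (LayoutXLM reuses the LayoutLMv2 model classes together with its own processor). The snippet below is a minimal sketch, not code from this repo: it assumes a transformers version with the LayoutXLM/LayoutLMv2 classes and detectron2 installed for the visual backbone, and it uses a placeholder image (`document.png`) plus made-up words and boxes.

```python
# Minimal sketch (not from this repo): load the pre-trained checkpoint with
# HuggingFace transformers. LayoutXLM reuses the LayoutLMv2 model classes and
# ships its own sentencepiece-based processor.
from PIL import Image
from transformers import LayoutXLMProcessor, LayoutLMv2Model

# apply_ocr=False: we supply our own words and boxes instead of running
# Tesseract inside the processor.
processor = LayoutXLMProcessor.from_pretrained("microsoft/layoutxlm-base", apply_ocr=False)
model = LayoutLMv2Model.from_pretrained("microsoft/layoutxlm-base")

image = Image.open("document.png").convert("RGB")            # placeholder document image
words = ["Invoice", "Total", "100"]                          # placeholder OCR words
boxes = [[80, 40, 200, 70], [80, 90, 160, 120], [200, 90, 260, 120]]  # boxes normalized to 0-1000

encoding = processor(image, words, boxes=boxes, return_tensors="pt")
outputs = model(**encoding)
print(outputs.last_hidden_state.shape)  # (1, text tokens + visual tokens, hidden size)
```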
Fine-tuning Example on XFUND
Please refer to layoutlmft for installation and the fine-tuning scripts; the commands below are run from the layoutlmft directory.
Semantic entity recognition (SER):

```bash
cd layoutlmft
python -m torch.distributed.launch --nproc_per_node=4 examples/run_xfun_ser.py \
    --model_name_or_path microsoft/layoutxlm-base \
    --output_dir /tmp/test-ner \
    --do_train \
    --do_eval \
    --lang zh \
    --max_steps 1000 \
    --warmup_ratio 0.1 \
    --fp16
```
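Once fine-tuning finishes, the checkpoint written to `--output_dir` can be used for SER prediction on a single document. The snippet below is only a rough sketch under the assumption that the saved checkpoint loads with the transformers `LayoutLMv2ForTokenClassification` class and carries the XFUND label map; the inputs (`form.png`, words, boxes) are placeholders, and the in-repo evaluation is what `--do_eval` above already runs.

```python
# Illustrative sketch only: SER inference with a fine-tuned checkpoint
# (assumes /tmp/test-ner is the output_dir of run_xfun_ser.py above and is
# compatible with the transformers LayoutLMv2 classes).
import torch
from PIL import Image
from transformers import LayoutXLMProcessor, LayoutLMv2ForTokenClassification

processor = LayoutXLMProcessor.from_pretrained("microsoft/layoutxlm-base", apply_ocr=False)
model = LayoutLMv2ForTokenClassification.from_pretrained("/tmp/test-ner")

image = Image.open("form.png").convert("RGB")         # placeholder scanned form
words = ["姓名", "张三"]                               # placeholder OCR words
boxes = [[100, 50, 180, 80], [200, 50, 280, 80]]      # boxes normalized to 0-1000

encoding = processor(image, words, boxes=boxes, return_tensors="pt")
with torch.no_grad():
    logits = model(**encoding).logits                 # (1, text sequence length, num_labels)
pred_ids = logits.argmax(-1).squeeze(0).tolist()
print([model.config.id2label[i] for i in pred_ids])   # BIO tags over sub-word tokens
```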
Relation extraction (RE):

```bash
cd layoutlmft
python -m torch.distributed.launch --nproc_per_node=4 examples/run_xfun_re.py \
    --model_name_or_path microsoft/layoutxlm-base \
    --output_dir /tmp/test-ner \
    --do_train \
    --do_eval \
    --lang zh \
    --max_steps 2500 \
    --per_device_train_batch_size 2 \
    --warmup_ratio 0.1 \
    --fp16
```
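The tables below report F1 scores per language. For SER this is the usual entity-level F1 over BIO-tagged spans; the toy example below shows how such a score is computed with seqeval. This mirrors how HuggingFace token-classification examples evaluate and is an assumption on my part; the exact metric code in layoutlmft may differ in detail.

```python
# Toy illustration of entity-level F1 in the BIO scheme with seqeval
# (hypothetical tags; XFUND uses HEADER/QUESTION/ANSWER entity types).
from seqeval.metrics import f1_score

y_true = [["B-QUESTION", "I-QUESTION", "B-ANSWER", "O"]]   # gold: one QUESTION span, one ANSWER span
y_pred = [["B-QUESTION", "I-QUESTION", "O", "O"]]          # predicted: the QUESTION span only
print(f1_score(y_true, y_pred))  # precision 1.0, recall 0.5 -> F1 ≈ 0.67
```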
Results on XFUND
Language-specific fine-tuning

| Task | Model | FUNSD | ZH | JA | ES | FR | IT | DE | PT | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|
| Semantic Entity Recognition | xlm-roberta-base | 0.667 | 0.8774 | 0.7761 | 0.6105 | 0.6743 | 0.6687 | 0.6814 | 0.6818 | 0.7047 |
|  | infoxlm-base | 0.6852 | 0.8868 | 0.7865 | 0.6230 | 0.7015 | 0.6751 | 0.7063 | 0.7008 | 0.7207 |
|  | layoutxlm-base | 0.794 | 0.8924 | 0.7921 | 0.7550 | 0.7902 | 0.8082 | 0.8222 | 0.7903 | 0.8056 |
| Relation Extraction | xlm-roberta-base | 0.2659 | 0.5105 | 0.5800 | 0.5295 | 0.4965 | 0.5305 | 0.5041 | 0.3982 | 0.4769 |
|  | infoxlm-base | 0.2920 | 0.5214 | 0.6000 | 0.5516 | 0.4913 | 0.5281 | 0.5262 | 0.4170 | 0.4910 |
|  | layoutxlm-base | 0.5483 | 0.7073 | 0.6963 | 0.6896 | 0.6353 | 0.6415 | 0.6551 | 0.5718 | 0.6432 |
Zero-shot transfer learning

| Task | Model | FUNSD | ZH | JA | ES | FR | IT | DE | PT | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|
| SER | xlm-roberta-base | 0.667 | 0.4144 | 0.3023 | 0.3055 | 0.371 | 0.2767 | 0.3286 | 0.3936 | 0.3824 |
|  | infoxlm-base | 0.6852 | 0.4408 | 0.3603 | 0.3102 | 0.4021 | 0.2880 | 0.3587 | 0.4502 | 0.4119 |
|  | layoutxlm-base | 0.794 | 0.6019 | 0.4715 | 0.4565 | 0.5757 | 0.4846 | 0.5252 | 0.539 | 0.5561 |
| RE | xlm-roberta-base | 0.2659 | 0.1601 | 0.2611 | 0.2440 | 0.2240 | 0.2374 | 0.2288 | 0.1996 | 0.2276 |
|  | infoxlm-base | 0.2920 | 0.2405 | 0.2851 | 0.2481 | 0.2454 | 0.2193 | 0.2027 | 0.2049 | 0.2423 |
|  | layoutxlm-base | 0.5483 | 0.4494 | 0.4408 | 0.4708 | 0.4416 | 0.4090 | 0.3820 | 0.3685 | 0.4388 |
Multitask fine-tuning

| Task | Model | FUNSD | ZH | JA | ES | FR | IT | DE | PT | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|
| SER | xlm-roberta-base | 0.6633 | 0.883 | 0.7786 | 0.6223 | 0.7035 | 0.6814 | 0.7146 | 0.6726 | 0.7149 |
|  | infoxlm-base | 0.6538 | 0.8741 | 0.7855 | 0.5979 | 0.7057 | 0.6826 | 0.7055 | 0.6796 | 0.7106 |
|  | layoutxlm-base | 0.7924 | 0.8973 | 0.7964 | 0.7798 | 0.8173 | 0.821 | 0.8322 | 0.8241 | 0.8201 |
| RE | xlm-roberta-base | 0.3638 | 0.6797 | 0.6829 | 0.6828 | 0.6727 | 0.6937 | 0.6887 | 0.6082 | 0.6341 |
|  | infoxlm-base | 0.3699 | 0.6493 | 0.6473 | 0.6828 | 0.6831 | 0.6690 | 0.6384 | 0.5763 | 0.6145 |
|  | layoutxlm-base | 0.6671 | 0.8241 | 0.8142 | 0.8104 | 0.8221 | 0.8310 | 0.7854 | 0.7044 | 0.7823 |
If you find LayoutXLM useful in your research, please cite the following paper:
```bibtex
@article{Xu2020LayoutXLMMP,
  title = {LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding},
  author = {Yiheng Xu and Tengchao Lv and Lei Cui and Guoxin Wang and Yijuan Lu and Dinei Florencio and Cha Zhang and Furu Wei},
  year = {2021},
  eprint = {2104.08836},
  archivePrefix = {arXiv},
  primaryClass = {cs.CL}
}
```
The content of this project itself is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
For help or issues using LayoutXLM, please submit a GitHub issue.
For other communications related to LayoutXLM, please contact Lei Cui ([email protected]) or Furu Wei ([email protected]).