GitHub - jujbob/multilingual-models: This is a page to upload multilingual language resources.

Multilingual models

This repository contains the additional resources used in the paper Multilingual Dependency Parsing for Low-Resource Languages: Case Studies on North Saami and Komi-Zyrian, written by KyungTae Lim, Niko Partanen and Thierry Poibeau at the LATTICE laboratory, Paris.

We also participated in the CoNLL 2018 shared task using these multilingual embeddings together with ELMo embeddings that we trained ourselves. We placed 2nd in UAS and 4th in LAS out of 27 teams, and had the best-performing tagger and parser for Saami with the multilingual models (see the paper SEx BiST: A Multi-Source Trainable Parser with Deep Contextualized Lexical Representations).
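For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how unlabeled and labeled attachment scores are usually defined. It is not the shared task's evaluation script: it assumes gold and predicted analyses share the same tokenization, whereas the official evaluation works from raw text and has to align tokens first. The function name and the toy sentence are hypothetical.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as percentages.

    Each argument is a list with one (head_index, relation) pair per token.
    Assumption: both lists describe the same tokens in the same order.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty analysis")

    total = len(gold)
    # UAS: the predicted head matches the gold head.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS: both the head and the relation label match.
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * uas_hits / total, 100.0 * las_hits / total


# Toy four-token sentence (invented for illustration); head 0 marks the root.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (3, "punct")]

uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.1f} LAS={las:.1f}")
```

In the toy sentence the third token has the right head but the wrong label, so it counts toward UAS only. The fourth token has the wrong head, so it counts toward neither.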

The additional materials include:

Bilingual dictionaries extracted from Giellatekno infrastructure's SVN repository:
Pretrained monolingual and multilingual word embeddings, the latter aligned with VecMap
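As a rough illustration of how aligned embeddings can be used, the sketch below loads two embedding files and looks up the nearest target-language word for a source-language word by cosine similarity. It assumes the files are in the word2vec text format (a `count dim` header line followed by one `word v1 ... vd` row per word). The actual file names and layout in this repository are not stated here, so the in-memory toy data, the vectors and the function names are all hypothetical.

```python
import io
import math


def load_embeddings(handle):
    """Read word2vec text format from an open text handle into a dict."""
    _count, dim = (int(x) for x in handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed rows
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(word, source, target, k=1):
    """Return the k target words closest to `word` in the shared space."""
    query = source[word]
    ranked = sorted(target, key=lambda w: cosine(query, target[w]), reverse=True)
    return ranked[:k]


# Toy stand-ins for two aligned embedding files; the vectors are invented.
sme = load_embeddings(io.StringIO("2 3\nbeana 0.9 0.1 0.0\nguolli 0.0 0.2 0.9\n"))
fin = load_embeddings(io.StringIO("2 3\nkoira 1.0 0.0 0.1\nkala 0.1 0.1 1.0\n"))

print(nearest("beana", sme, fin))
```

With real files, the `io.StringIO` objects would be replaced by `open(path, encoding="utf-8")`. A linear scan like this is only practical for small vocabularies.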

The Komi-Zyrian UD corpora were later split into two sections, one for written and another for spoken language, and they can be found in the Lattice and IKDP repositories within the UD infrastructure. In this study we used version 0.1, which is located in an earlier repository that was not yet ready to be integrated into UD. We are fully aware that this version contains errors and inconsistencies, as at that point there were several open questions about applying the UD annotation model to a new language.

Users interested in the Komi treebanks are strongly encouraged to look into the dev branches of these treebanks, since they reflect the state that will be included in the next UD release, 2.3.

ELMo models (applied in the CoNLL 2018 shared task)

During the CoNLL 2018 shared task, we trained ELMo embeddings on the data set provided by the shared task organizers.

English, French, Japanese, Chinese and Korean (download, 1.7 GB):

Papers

@inproceedings{lim:hal-01856178,
  TITLE = {{Multilingual Dependency Parsing for Low-Resource Languages: Case Studies on North Saami and Komi-Zyrian}},
  AUTHOR = {Lim, KyungTae and Partanen, Niko and Poibeau, Thierry},
  URL = {https://hal.archives-ouvertes.fr/hal-01856178},
  BOOKTITLE = {{Language Resource and Evaluation Conference}},
  ADDRESS = {Miyazaki, Japan},
  ORGANIZATION = {{ELRA}},
  YEAR = {2018},
  MONTH = May,
  KEYWORDS = {dependency parsing ; word embeddings ; Uralic languages},
  PDF = {https://hal.archives-ouvertes.fr/hal-01856178/file/600.pdf},
  HAL_ID = {hal-01856178},
  HAL_VERSION = {v1},
}


@InProceedings{lim-EtAl:2018:K18-2,
  author    = {Lim, KyungTae  and  Park, Cheoneum  and  Lee, Changki  and  Poibeau, Thierry},
  title     = {{SEx} {BiST}: A Multi-Source Trainable Parser with Deep Contextualized Lexical Representations},
  booktitle = {Proceedings of the {CoNLL} 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies},
  month     = {October},
  year      = {2018},
  address   = {Brussels, Belgium},
  publisher = {Association for Computational Linguistics},
  pages     = {143--152},
  abstract  = {We describe the SEx BiST parser (Semantically EXtended Bi-LSTM parser) developed at Lattice for the CoNLL 2018 Shared Task (Multilingual Parsing from Raw Text to Universal Dependencies). The main characteristic of our work is the encoding of three different modes of contextual information for parsing: (i) Treebank feature representations, (ii) Multilingual word representations, (iii) ELMo representations obtained via unsupervised learning from external resources. Our parser performed well in the official end-to-end evaluation (73.02 LAS -- 4th/26 teams, and 78.72 UAS -- 2nd/26); remarkably, we achieved the best UAS scores on all the English corpora by applying the three suggested feature representations. Finally, we were also ranked 1st at the optional event extraction task, part of the 2018 Extrinsic Parser Evaluation campaign.},
  url       = {http://www.aclweb.org/anthology/K18-2014}
}

TODO: Add list of papers and posters with links

