Multilingual models

This repository contains the additional resources used in the paper Multilingual Dependency Parsing for Low-Resource Languages: Case Studies on North Saami and Komi-Zyrian, written by KyungTae Lim, Niko Partanen and Thierry Poibeau at the LATTICE laboratory, Paris.

We also participated in the CoNLL 2018 shared task with these multilingual embeddings, together with ELMo embeddings we trained ourselves. We placed 2nd in UAS and 4th in LAS out of 26 teams, and obtained the best-performing tagger and parser for North Saami with the multilingual models (see the paper SEx BiST: A Multi-Source Trainable Parser with Deep Contextualized Lexical Representations).
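For illustration only, multilingual word embeddings of this kind are commonly distributed in word2vec text format, which can be loaded for example with gensim. This is a minimal sketch under that assumption; the file name multilingual.vec and the query word are hypothetical placeholders, not names of files actually shipped in this repository.

# A minimal sketch, assuming embeddings in word2vec text format and
# the gensim library; "multilingual.vec" is a hypothetical file name.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("multilingual.vec", binary=False)

# In a shared cross-lingual space, nearest neighbours can come from any
# of the aligned languages (assuming the word is in the vocabulary).
print(vectors.most_similar("giella", topn=5))  # "giella": 'language' in North Saami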

The additional materials include:

The Komi-Zyrian UD corpora were later split into two sections, one for written and one for spoken language; these can be found in the Lattice and IKDP repositories within the UD infrastructure. In this study we used version 0.1, which is located in an earlier repository that was not yet ready to be integrated into UD. We are fully aware that this version contains errors and inconsistencies, as at that point there were still several open questions in applying the UD annotation model to a new language.

Users interested in the Komi treebanks are strongly encouraged to look into the dev branches of these treebanks, since they reflect the state that will be included in the next UD release, 2.3.
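For readers who want to inspect these treebanks programmatically, here is a minimal Python sketch using the third-party conllu package; the file name is a hypothetical placeholder, and the field names follow recent versions of the package.

# A minimal sketch, assuming the third-party "conllu" package
# (pip install conllu); the path below is a hypothetical placeholder.
from conllu import parse

with open("kpv-ud-test.conllu", encoding="utf-8") as f:
    sentences = parse(f.read())

# Each sentence is a TokenList; each token maps CoNLL-U columns to values.
for token in sentences[0]:
    print(token["id"], token["form"], token["upos"], token["head"], token["deprel"])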

ELMo models (applied in the CoNLL 2018 shared task)

During the CoNLL 2018 shared task, we trained ELMo embeddings using the datasets provided by the shared task organizers.
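As an illustration of how such weights can be used downstream, the sketch below loads an ELMo model with AllenNLP's ElmoEmbedder (the API of the AllenNLP 0.x series); the options and weight file names are hypothetical placeholders, not the actual names of the files distributed here.

# A minimal sketch, assuming AllenNLP 0.x; the file names below are
# hypothetical placeholders for the options/weights of a trained model.
from allennlp.commands.elmo import ElmoEmbedder

elmo = ElmoEmbedder(
    options_file="elmo_options.json",
    weight_file="elmo_weights.hdf5",
)

# embed_sentence returns an array of shape (3, num_tokens, 1024):
# one 1024-dimensional vector per token for each of the three biLM layers.
vectors = elmo.embed_sentence(["This", "is", "a", "test"])
print(vectors.shape)  # (3, 4, 1024)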

Papers

@inproceedings{lim:hal-01856178,
  title       = {{Multilingual Dependency Parsing for Low-Resource Languages: Case Studies on North Saami and Komi-Zyrian}},
  author      = {Lim, KyungTae and Partanen, Niko and Poibeau, Thierry},
  booktitle   = {{Language Resources and Evaluation Conference}},
  address     = {Miyazaki, Japan},
  organization = {{ELRA}},
  month       = May,
  year        = {2018},
  keywords    = {dependency parsing ; word embeddings ; Uralic languages},
  url         = {https://hal.archives-ouvertes.fr/hal-01856178},
  pdf         = {https://hal.archives-ouvertes.fr/hal-01856178/file/600.pdf},
  hal_id      = {hal-01856178},
  hal_version = {v1},
}


@InProceedings{lim-EtAl:2018:K18-2,
  author    = {Lim, KyungTae  and  Park, Cheoneum  and  Lee, Changki  and  Poibeau, Thierry},
  title     = {{SEx} {BiST}: A Multi-Source Trainable Parser with Deep Contextualized Lexical Representations},
  booktitle = {Proceedings of the {CoNLL} 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies},
  month     = {October},
  year      = {2018},
  address   = {Brussels, Belgium},
  publisher = {Association for Computational Linguistics},
  pages     = {143--152},
  abstract  = {We describe the SEx BiST parser (Semantically EXtended Bi-LSTM parser) developed at Lattice for the CoNLL 2018 Shared Task (Multilingual Parsing from Raw Text to Universal Dependencies). The main characteristic of our work is the encoding of three different modes of contextual information for parsing: (i) Treebank feature representations, (ii) Multilingual word representations, (iii) ELMo representations obtained via unsupervised learning from external resources. Our parser performed well in the official end-to-end evaluation (73.02 LAS -- 4th/26 teams, and 78.72 UAS -- 2nd/26); remarkably, we achieved the best UAS scores on all the English corpora by applying the three suggested feature representations. Finally, we were also ranked 1st at the optional event extraction task, part of the 2018 Extrinsic Parser Evaluation campaign.},
  url       = {http://www.aclweb.org/anthology/K18-2014}
}

TODO: Add list of papers and posters with links
