maidis/awesome-machine-translation

Awesome Machine Translation

A list of awesome Machine Translation frameworks, libraries, software and papers. Inspired by awesome-machine-learning.

If you want to contribute to this list (please do), send me a pull request or contact me @anilozbek. Also, a listed repository should be deprecated if:

  • The repository's owner explicitly says that "this library is not maintained".
  • It has not been committed to for a long time (2 to 3 years).
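
The second criterion can be checked mechanically once you know a repository's last commit date. The following is a minimal sketch, assuming that date is already available and taking 2 years (the lower end of the range above) as the default threshold. `is_stale` is a hypothetical helper written for illustration, not part of any tool in this list.

```python
from datetime import date


def is_stale(last_commit: date, today: date, years: int = 2) -> bool:
    """Return True if last_commit is at least `years` years before `today`.

    The 2-year default is an assumption taken from the 2 to 3 year range
    in the contribution guidelines; raise it to 3 for a looser rule.
    """
    try:
        cutoff = today.replace(year=today.year - years)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year
        cutoff = today.replace(year=today.year - years, day=28)
    return last_commit <= cutoff
```

For example, a repository last updated on 2020-05-01 counts as stale on 2023-01-01, while one updated on 2022-06-01 does not.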

You can also find an updated list of machine translation frameworks, libraries, software and papers at machinetranslate.org.

Aligners 🌌

  • Bleualign - A machine translation based sentence alignment tool for parallel text.
  • Bleualign-cpp - A C++ sentence alignment tool based on Bleualign. Bleualign-cpp is expected to be used together with document-aligner.
  • hunalign - A tool that aligns bilingual text on the sentence level.
  • Getting started with Sentence Alignment - A list of sentence alignment tools.
  • LF Aligner - A tool to create translation memories from texts and their translations. It relies on Hunalign for automatic sentence pairing.
  • Vecalign - An accurate sentence alignment algorithm that works in about 100 languages, without the need for a machine translation system or lexicon.
  • Web Align Toolkit - An online parallel text aligner and format converter.
  • yalign - A sentence aligner for comparable corpora.
  • yasa - Yet Another Sentence Aligner.
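
Several of the aligners above combine lexical evidence with sentence-length statistics. hunalign, for instance, falls back on Gale-Church length information. The following is a minimal sketch of only the length-based part: a dynamic program in the spirit of Gale & Church (1993) that scores 1-1, 1-0, 0-1, 2-1, 1-2 and 2-2 "beads" by character-length mismatch. The prior and variance values are commonly quoted defaults and should be treated as tunable assumptions. This is not the implementation used by any tool listed here.

```python
import math

# Bead priors and length-model parameters (assumed defaults, tune per language pair)
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
          (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}
MEAN = 1.0       # expected target characters per source character
VARIANCE = 6.8   # variance of that ratio


def _norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bead_cost(len_src: int, len_tgt: int, bead: tuple) -> float:
    """Negative log-probability of aligning spans with these character lengths."""
    m = (len_src + len_tgt / MEAN) / 2.0
    delta = 0.0 if m == 0 else (len_tgt - len_src * MEAN) / math.sqrt(m * VARIANCE)
    # Two-tailed probability of a length mismatch at least this large
    p_len = max(2.0 * (1.0 - _norm_cdf(abs(delta))), 1e-300)  # avoid log(0)
    return -math.log(p_len) - math.log(PRIORS[bead])


def align(src: list, tgt: list) -> list:
    """Align two sentence lists; return (source indices, target indices) beads."""
    n, k = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (k + 1) for _ in range(n + 1)]
    back = [[None] * (k + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is final before use
    for i in range(n + 1):
        for j in range(k + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in PRIORS:
                ni, nj = i + di, j + dj
                if ni > n or nj > k:
                    continue
                ls = sum(len(s) for s in src[i:ni])
                lt = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + bead_cost(ls, lt, (di, dj))
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the beads
    beads, i, j = [], n, k
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

With two 20-character source sentences and one 41-character target sentence, the cheapest path is a single 2-1 bead, `[([0, 1], [0])]`. Real aligners add dictionary or translation-based similarity on top of this, because length alone cannot distinguish sentences of similar size.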

Applications 💻

  • Argos Translate - An open-source offline translation library written in Python. Uses OpenNMT for translations, SentencePiece for tokenization, Stanza for sentence boundary detection, and PyQt for GUI.
  • Canopy Speak - A freemium smart medical phrase mobile app.
  • CTranslate-NMT-Web-Interface - A Machine Translation web interface for OpenNMT and FairSeq models using CTranslate and Streamlit.
  • DesktopTranslator - A local cross-platform machine translation GUI, based on CTranslate2.
  • Intento - A simple API to third-party machine translation services from many vendors.
  • iTranslate - A translation and dictionary app.
  • LibreOffice Translate - An extension providing neural machine translation for LibreOffice with a single click.
  • LibreTranslate - A free and open source machine translation API.
  • Local-NMT - A pre-trained Hugging Face machine translation engine with a UI that runs on your local computer.
  • Mantra - Highly accurate automatic translation of manga.
  • Skype Translator - A real-time voice and text translator.
  • Slatona Translator - A translation app for macOS that annotates word senses.
  • translateLocally - Fast and secure translation on your local machine, powered by Marian and Bergamot.

Books 📚

  • Learning Machine Translation - Cyril Goutte, Nicola Cancedda, Marc Dymetman, George Foster - 2008 - The book looks first at enabling technologies that solve problems that are not Machine Translation proper but are linked closely to the development of a Machine Translation system, and then presents some Machine Translation techniques.
  • Machine Translation - Thierry Poibeau - 2018 - A concise, nontechnical overview of the development of machine translation, including the different approaches, evaluation issues, and market potential.
  • Machine Translation - Pushpak Bhattacharyya - 2015 - A book that compares and contrasts the salient principles and practices of rule-based machine translation, statistical machine translation, and example-based machine translation.
  • Statistical Machine Translation - Philipp Koehn - An introductory text on statistical machine translation (SMT) that provides all of the theories and methods needed to build a statistical machine translator.
  • Syntax-based Statistical Machine Translation - Philip Williams, Rico Sennrich, Matt Post, Philipp Koehn - 2016 - A comprehensive introduction to the syntax-based statistical machine translation models.
  • Makine Çevirisi - Erdinç Aslan - 2019 - Turkish - A book that will provide a good introduction to students taking courses such as Translation Technologies and those starting to work in the field of machine translation.
  • Neural Machine Translation - Philipp Koehn - 2020 - A book that introduces the challenge of machine translation and evaluation, including historical, linguistic, and applied context, then develops the core deep learning methods used for natural language applications.
  • Machine Translation: Foundations and Models - Tong Xiao, Jingbo Zhu - 2020... - Chinese - A book that gives a systematic introduction to the basic knowledge and modeling methods of machine translation and, on this basis, discusses some cutting-edge machine translation technologies. It is suitable for senior undergraduates and graduate students in computer science, artificial intelligence, and related majors, and can also serve as reference material for researchers in natural language processing, especially machine translation.

Companies and Paid Services 💰

  • KantanAI - A SaaS-based Machine Translation platform.
  • Lingua Custodia - A machine translation company that specializes in finance.
  • SYSTRAN - One of the oldest machine translation companies.
  • SDL Machine Translation - Neural and statistical machine translation services.
  • Unbabel - A company that provides AI-powered, human-refined translation for customer support.
  • Waverly Labs - A tech startup in NYC at the convergence of wearable technology and machine translation.

Frameworks 🖼

  • Apertium - An open source rule-based machine translation platform.
  • Bergamot - A cross-platform C++ library focusing on optimized machine translation on consumer-grade devices.
  • EnglishTurkishTranslation-CPP - An English-Turkish phrase-based translation library.
  • fairseq - A sequence modeling toolkit to train custom models for translation, summarization, language modeling and other text generation tasks.
  • Joey NMT - A minimalist NMT toolkit for educational purposes.
  • Marian - A neural machine translation framework written in pure C++ with minimal dependencies.
  • ModernMT - An adaptive neural machine translation system that adapts to context and learns from corrections.
  • Moses - A statistical machine translation system that lets you automatically train translation models for any language pair.
  • Nematus - An attention-based encoder-decoder model for neural machine translation, built in TensorFlow.
  • NiuTrans.NMT - A fast neural machine translation system developed in C++ that relies on NiuTensor for fast tensor APIs.
  • NiuTrans.SMT - An open source statistical machine translation system developed entirely in C++.
  • OpenNMT - An open source initiative for neural machine translation and neural sequence modeling.
  • Sockeye - A sequence-to-sequence framework with a focus on Neural Machine Translation based on PyTorch.
  • THUMT - An open source toolkit for neural machine translation.

Hardware 🎧

  • ili - An instant offline translation device for travelers.

Online MT Services 🌐

  • Bing Microsoft Translator - A service to translate texts or entire web pages into different languages.
  • DeepL Translator - A translation service powered by neural network technology. It launched in 2017 with seven major European languages and has since added more.
  • Google Translate - A free service that instantly translates words, phrases, and web pages between English and over 100 other languages.
  • Masakhane - A machine translation service for African languages.
  • ModernMT - ModernMT online demo.
  • MyDutchPal's Neural MT Gateway - A free online neural machine translation system to translate short pieces of text.
  • NiuTrans - A neural machine translation engine for 115 languages.
  • SYSTRAN Translate - A demonstrator of SYSTRAN's MT engines.
  • THUMT - THUMT online demo.
  • Ubiqus Online Text Translation - Free online translation, for information purposes only, in English, French, German, Spanish, Italian, and Dutch. Input is limited to 2,500 characters (about 350 words).
  • Yandex.Translate - A web service provided by Yandex, intended for the translation of text or web pages into another language.
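
Free web services often cap input length. The Ubiqus service above, for example, accepts up to 2,500 characters. A common workaround is to split longer text into chunks at sentence boundaries before submitting each piece. The following is a minimal sketch under those assumptions. `chunk_text` is a hypothetical helper, not part of any service's API, and its regex sentence splitter is deliberately naive: it breaks after `.`, `!` or `?` followed by whitespace.

```python
import re


def chunk_text(text: str, limit: int = 2500) -> list[str]:
    """Split text into pieces of at most `limit` characters, preferring sentence breaks."""
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate          # sentence still fits in the current chunk
        else:
            chunks.append(current)       # close the full chunk, start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

For example, `chunk_text("One. Two. Three.", limit=9)` returns `["One. Two.", "Three."]`. Packing whole sentences keeps each request self-contained, so the MT engine never sees a sentence cut in half unless that single sentence exceeds the limit.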

Organizations and Events 🎉

  • AAMT - Asia-Pacific Association for Machine Translation.
  • AMTA - Association for Machine Translation in the Americas.
  • EAMT (European Association for Machine Translation) - An organization that serves the growing community of people interested in MT and translation tools, including users, developers, and researchers.
  • WMT18 - A conference that builds on a series of annual workshops and conferences on statistical machine translation, going back to 2006.

Other MT Lists 📝

Papers 📄

Parallel Texts ⏸️

  • Avrupa Birliği İlerleme Raporları (European Union Progress Reports) - Regular progress reports on Turkey, in Turkish and English, prepared by the European Commission.
  • Corpora Cleaning Tools - Tools for filtering and cleaning parallel and monolingual corpora in order to train better (neural) machine translation systems.
  • OmegaWiki - A collaborative project to produce a free, multilingual dictionary for every language with lexicological, terminological and thesaurus information.
  • OPUS - A growing collection of translated texts from the web.
  • Publicly accessible translation memories - Several online services allowing access to aggregated translation memories.
  • turkish-parallel-corpora - Some English-Turkish parallel texts for use in machine translation systems.

Tools 🛠

  • Corpora Cleaning Tools - Tools for filtering and cleaning parallel and monolingual corpora in order to train better (neural) machine translation systems.
  • MT-Tools - A collection of common machine translation tools.
  • MTData - A tool that locates, downloads, and extracts machine translation corpora.
  • Multiword Expression Tools - Tools for extracting multiword expressions from parallel corpora for use with the Moses statistical machine translation system.
  • OpusFilter - A tool for filtering and combining parallel corpora.
  • SMT Corpus Tools - A tool set to process corpus files for machine translation.
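
Corpus cleaning tools such as OpusFilter and Corpora Cleaning Tools apply heuristics roughly like these to parallel data before training. The following is a hypothetical, minimal filter written for illustration only. It is not the API or configuration format of any listed tool, and its thresholds (`max_ratio=3.0`, `max_words=100`) are assumptions that you would tune per corpus.

```python
from typing import Iterable, Iterator, Tuple

Pair = Tuple[str, str]


def filter_pairs(pairs: Iterable[Pair], max_ratio: float = 3.0,
                 max_words: int = 100) -> Iterator[Pair]:
    """Yield cleaned (source, target) pairs that pass simple heuristics."""
    seen = set()
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        # Drop pairs with an empty side
        if not src or not tgt:
            continue
        # Drop untranslated copies (source identical to target)
        if src == tgt:
            continue
        n_src, n_tgt = len(src.split()), len(tgt.split())
        # Drop overly long segments
        if max(n_src, n_tgt) > max_words:
            continue
        # Drop pairs whose word-count ratio suggests a misalignment
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue
        # Drop exact duplicates
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        yield src, tgt
```

Because it is a generator, the filter can stream over corpora too large to hold in memory. Only the duplicate-tracking set grows with the data.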

Tutorials and Blogs 🎒