Make language-data available in MediaWiki core #372

@winstonsung

We would like to bring language-data to MediaWiki core.

Questions

Should we move this repository to Gerrit/GitLab?

Reason for Gerrit:

  • We could easily keep the list of users with CR +2 rights the same as for mediawiki/core on Gerrit.
  • Integrating Depends-On across different platforms would be hard.
    (There's no CI injection/dependency feature for libraries in Wikimedia Gerrit.)

Reason for GitLab: Contributors aren't required to accept third-party privacy policies.


It should be under Gerrit rather than GitLab because of the project layout decision.

This repository should fall under mediawiki/libs (i.e., be named mediawiki/libs/LanguageData and be included in /vendor in mediawiki/core), since it should contain PHP code, and all mediawiki/libs/ projects are on Gerrit while none of them are on GitLab.

https://www.mediawiki.org/wiki/GitLab/Migration_status


Should composer.json be exported?

It looks like composer.json needs to be exported. Should it be removed from the export-ignore entries in .gitattributes?

https://gerrit.wikimedia.org/r/c/mediawiki/vendor/+/1056254

Nikki wrote:

The language-data format doesn't support all the data they have (multiple scripts, Wikidata IDs, English names, parent language/families, etc.), and it requires data that is hard to get (autonyms). I think it would need big changes if it's ever going to be useful for things other than selecting a MediaWiki interface language.

Considerations

  • BCP 47 Language/script/region/variant subtags
  • ISO codes
    • NOTE: These actually differ from BCP 47 subtags.
  • MediaWiki internal language codes
  • Wikidata IDs
  • WikiLambda ZID
  • Autonyms (language name written in its local writing system)
  • The script in which a language is written
    • Multiple scripts
  • The regions in which the language is spoken/written
  • Translations of language names
    • English names
  • Language fallback chains
  • Parent language/families
  • The writing mode of the text
    • The directionality of the text
    • The writing-mode property of the text
  • Time formats
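
To make the list above more concrete, here is a rough sketch of what a single language record covering these considerations might look like. It is only an illustration: the class name `LanguageRecord`, every field name, and the `display_name` helper are hypothetical and are not the actual language-data schema. The autonym is optional here because, as Nikki noted, autonyms can be hard to obtain, and `scripts` is a list so that languages written in multiple scripts fit.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LanguageRecord:
    """Hypothetical record shape; NOT the actual language-data schema."""

    bcp47: str                            # BCP 47 tag
    mediawiki_code: str                   # MediaWiki internal code (may differ from BCP 47)
    iso639_3: Optional[str] = None        # ISO code, kept separate from the BCP 47 tag
    wikidata_id: Optional[str] = None
    wikilambda_zid: Optional[str] = None
    autonym: Optional[str] = None         # optional: may be unknown
    scripts: list[str] = field(default_factory=list)     # one or more script codes
    regions: list[str] = field(default_factory=list)
    names: dict[str, str] = field(default_factory=dict)  # translated names by language code
    fallbacks: list[str] = field(default_factory=list)   # fallback chain, in order
    parent: Optional[str] = None          # parent language/family
    direction: str = "ltr"                # text directionality
    writing_mode: str = "horizontal-tb"   # writing-mode property
    time_formats: dict[str, str] = field(default_factory=dict)

    def display_name(self, ui_language: str = "en") -> str:
        """Prefer a translated name, then the autonym, then the bare tag."""
        return self.names.get(ui_language) or self.autonym or self.bcp47


# Example with two scripts and no autonym or Wikidata ID filled in.
serbian = LanguageRecord(
    bcp47="sr",
    mediawiki_code="sr",
    iso639_3="srp",
    scripts=["Cyrl", "Latn"],
    names={"en": "Serbian"},
)
```

With this shape, `serbian.display_name()` falls back gracefully when the autonym or a translation is missing, which is one way to avoid making hard-to-get data mandatory.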

Bug: T190129
