Tokenizer, POS-tagger, and dependency-parser with Transformers and SuPar.
>>> import esupar
>>> nlp=esupar.load("ja")
>>> doc=nlp("太郎は花子が読んでいる本を次郎に渡した")
>>> print(doc)
1 太郎 _ PROPN _ _ 12 nsubj _ SpaceAfter=No
2 は _ ADP _ _ 1 case _ SpaceAfter=No
3 花子 _ PROPN _ _ 5 nsubj _ SpaceAfter=No
4 が _ ADP _ _ 3 case _ SpaceAfter=No
5 読ん _ VERB _ _ 8 acl _ SpaceAfter=No
6 で _ SCONJ _ _ 5 mark _ SpaceAfter=No
7 いる _ AUX _ _ 5 aux _ SpaceAfter=No
8 本 _ NOUN _ _ 12 obj _ SpaceAfter=No
9 を _ ADP _ _ 8 case _ SpaceAfter=No
10 次郎 _ PROPN _ _ 12 obl _ SpaceAfter=No
11 に _ ADP _ _ 10 case _ SpaceAfter=No
12 渡し _ VERB _ _ 0 root _ SpaceAfter=No
13 た _ AUX _ _ 12 aux _ _
>>> import deplacy
>>> deplacy.render(doc,Japanese=True)
太郎 PROPN ═╗<════════╗ nsubj(主語)
は   ADP   <╝         ║ case(格表示)
花子 PROPN ═╗<══╗     ║ nsubj(主語)
が   ADP   <╝   ║     ║ case(格表示)
読ん VERB  ═╗═╗═╝<╗   ║ acl(連体修飾節)
で   SCONJ <╝ ║   ║   ║ mark(標識)
いる AUX   <══╝   ║   ║ aux(動詞補助成分)
本   NOUN  ═╗═════╝<╗ ║ obj(目的語)
を   ADP   <╝       ║ ║ case(格表示)
次郎 PROPN ═╗<╗     ║ ║ obl(斜格補語)
に   ADP   <╝ ║     ║ ║ case(格表示)
渡し VERB  ═╗═╝═════╝═╝ root(親)
た   AUX   <╝           aux(動詞補助成分)
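The CoNLL-U text printed in the session above can also be read back with plain Python. The following is a minimal sketch, not part of esupar: `Token` and `parse_conllu` are hypothetical names, and it assumes the text is standard tab-separated CoNLL-U with ten columns per token.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    id: int
    form: str
    upos: str
    head: int
    deprel: str


def parse_conllu(text: str) -> List[Token]:
    """Read the ID, FORM, UPOS, HEAD and DEPREL columns of CoNLL-U text."""
    tokens = []
    for line in text.splitlines():
        cols = line.split("\t")
        # Skip blank lines, comments, multiword ranges (1-2) and empty nodes (1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        tokens.append(Token(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    return tokens


# Rows copied from the session above; the spaces are turned back into tabs here.
rows = [
    "1 太郎 _ PROPN _ _ 12 nsubj _ SpaceAfter=No",
    "2 は _ ADP _ _ 1 case _ SpaceAfter=No",
    "3 花子 _ PROPN _ _ 5 nsubj _ SpaceAfter=No",
    "4 が _ ADP _ _ 3 case _ SpaceAfter=No",
    "5 読ん _ VERB _ _ 8 acl _ SpaceAfter=No",
    "6 で _ SCONJ _ _ 5 mark _ SpaceAfter=No",
    "7 いる _ AUX _ _ 5 aux _ SpaceAfter=No",
    "8 本 _ NOUN _ _ 12 obj _ SpaceAfter=No",
    "9 を _ ADP _ _ 8 case _ SpaceAfter=No",
    "10 次郎 _ PROPN _ _ 12 obl _ SpaceAfter=No",
    "11 に _ ADP _ _ 10 case _ SpaceAfter=No",
    "12 渡し _ VERB _ _ 0 root _ SpaceAfter=No",
    "13 た _ AUX _ _ 12 aux _ _",
]
sample = "\n".join("\t".join(row.split(" ")) for row in rows)

tokens = parse_conllu(sample)
root = next(t for t in tokens if t.head == 0)  # the token whose HEAD is 0
dependents = [t.form for t in tokens if t.head == root.id]
print(root.form, dependents)
```

With a live pipeline, `parse_conllu(str(doc))` could replace the hand-built `sample`, since `print(doc)` shows `str(doc)`.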
esupar.load(model)
loads a natural language processing pipeline based on Universal Dependencies. The available model
options are:
model="ja": Japanese model bert-base-japanese-upos (default)
model="ja_large": Japanese model bert-large-japanese-upos
model="ja_luw_small": Japanese long-unit-word model roberta-small-japanese-char-luw-upos
model="ja_luw_base": Japanese long-unit-word model bert-base-japanese-luw-upos
model="ja_luw_large": Japanese long-unit-word model bert-large-japanese-luw-upos
model="ko": Korean model roberta-base-korean-upos
model="ko_large": Korean model roberta-large-korean-upos
model="ko_morph_base": Korean morpheme model roberta-base-korean-morph-upos
model="ko_morph_large": Korean morpheme model roberta-large-korean-morph-upos
model="zh": Chinese model chinese-bert-wwm-ext-upos
model="zh_base": Chinese model chinese-roberta-base-upos
model="zh_large": Chinese model chinese-roberta-large-upos
model="lzh": Classical Chinese model roberta-classical-chinese-base-upos
model="lzh_large": Classical Chinese model roberta-classical-chinese-large-upos
model="th": Thai model roberta-base-thai-spm-upos
model="vi": Vietnamese model bert-base-vietnamese-upos
model="en": English model roberta-base-english-upos
model="en_large": English model roberta-large-english-upos
model="de": German model bert-base-german-upos
model="de_large": German model bert-large-german-upos
model="sr": Serbian (Cyrillic and Latin) model roberta-base-serbian-upos
model="cop": Coptic model roberta-base-coptic-upos
model="ain": Ainu model roberta-base-ainu-upos
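As an illustration of choosing among these options, the sketch below picks a key by language and preferred size before handing it to `esupar.load`. `pick_model` and `load_pipeline` are hypothetical helpers, not part of esupar, and the long-unit-word and morpheme variants are left out to keep the tables short.

```python
# Model keys copied from the list above (long-unit-word and morpheme
# variants omitted for brevity).
BASE_KEYS = {"ja", "ko", "zh", "lzh", "th", "vi", "en", "de", "sr", "cop", "ain"}
LARGE_KEYS = {"ja_large", "ko_large", "zh_large", "lzh_large", "en_large", "de_large"}


def pick_model(lang: str, large: bool = False) -> str:
    """Hypothetical helper: return a key suitable for esupar.load()."""
    if lang not in BASE_KEYS:
        raise ValueError(f"no model listed for {lang!r}")
    key = f"{lang}_large"
    # Fall back to the base key when no large model is listed for the language.
    return key if large and key in LARGE_KEYS else lang


def load_pipeline(lang: str, large: bool = False):
    """Hypothetical wrapper around esupar.load(); requires esupar to be installed."""
    import esupar  # imported lazily so pick_model() works without esupar
    return esupar.load(pick_model(lang, large))


print(pick_model("en", large=True))
print(pick_model("th", large=True))
```

The first call returns `en_large`; the second falls back to `th` because no large Thai model appears in the list.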
pip3 install esupar --user
On Cygwin64, make sure to get the packages python37-devel, python37-pip, python37-cython, python37-numpy, python37-wheel, gcc-g++, mingw64-x86_64-gcc-g++, git, curl, make, and cmake, and then:
curl -L https://raw.githubusercontent.com/KoichiYasuoka/CygTorch/master/installer/supar.sh | sh
pip3.7 install esupar
!pip install esupar
Try notebook.
Koichi Yasuoka (安岡孝一)