Skip to content

Latest commit

 

History

History
12 lines (8 loc) · 457 Bytes

README.md

File metadata and controls

12 lines (8 loc) · 457 Bytes

zemberekMorphTR

This repo provides a way for morhological segmentation of Turkish corpora by using Zemberek.

The segmentations are done as a preprocessing step for Neural Machine Translation, Eng-Tr and Tr-Eng.

There is also a jar provided for direct usage ZemberekJar.jar, only works with Java>=7. Usage of the jar is as follows: java -jar zemberekJar -i <input_file> -o <output_file> -d -s -d : disambiguate -s : short suffix list