
Tokenizers

Tokenizers can be passed to the ngrams.NewIndex function to change how input data is tokenized. See the ngrams README for more details.

Default Word Tokenizer (default)
```go
// Word tokenizer that emits line breaks as distinct tokens.
withBreaks := tokenizers.NewDefaultWordTokenizer(false)

// Word tokenizer that does not emit line breaks as tokens.
withoutBreaks := tokenizers.NewDefaultWordTokenizer(true)
```

New tokenizers can be created by satisfying the tokenizers.Tokenizer interface.