description: An abstract base class for splitting text.
# text.Splitter
An abstract base class for splitting text.
`text.Splitter(name=None)`
A Splitter is a module that splits strings into pieces. Generally, the pieces
returned by a splitter correspond to substrings of the original string, and can
be encoded using either strings or integer ids (where integer ids could be
created by hashing strings or by looking them up in a fixed vocabulary table
that maps strings to ids).
Each Splitter subclass must implement a `split` method, which subdivides each
string in an input Tensor into pieces. For example:
```
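>>> import tensorflow as tf
>>> import tensorflow_text as tf_text
>>> class SimpleSplitter(tf_text.Splitter):
...   def split(self, input):
...     return tf.strings.split(input)
>>> print(SimpleSplitter().split(["hello world", "this is a test"]))
<tf.RaggedTensor [[b'hello', b'world'], [b'this', b'is', b'a', b'test']]>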
```
## Methods
### split
`@abc.abstractmethod`
`split(input)`
Splits the input tensor into pieces.
Generally, the pieces returned by a splitter correspond to substrings of the
original string, and can be encoded using either strings or integer ids.
#### Example:
```
>>> print(tf_text.WhitespaceTokenizer().split("small medium large"))
tf.Tensor([b'small' b'medium' b'large'], shape=(3,), dtype=string)
```
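The integer-id encoding mentioned above can also come from a fixed vocabulary table. The sketch below shows one way that lookup might be done with core TensorFlow ops. The three-word vocabulary, the `-1` id for out-of-vocabulary pieces, and the use of `tf.strings.split` in place of a concrete splitter are all assumptions made for illustration.

```python
import tensorflow as tf

# Assumed toy vocabulary: each string maps to its position in the list.
vocab = ["small", "medium", "large"]
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.constant(vocab),
        values=tf.constant([0, 1, 2], dtype=tf.int64)),
    default_value=-1)  # assumed id for pieces missing from the vocabulary

# String pieces, one ragged row per input string.
pieces = tf.strings.split(tf.constant(["small medium", "large huge"]))

# Look up every piece while preserving the ragged row structure.
ids = tf.ragged.map_flat_values(table.lookup, pieces)
print(ids.to_list())  # [[0, 1], [2, -1]]
```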
| Args | Description |
|---|---|
| `input` | An N-dimensional UTF-8 string (or optionally integer) `Tensor` or `RaggedTensor`. |

| Returns |
|---|
| An N+1-dimensional UTF-8 string or integer `Tensor` or `RaggedTensor`. For each string from the input tensor, the final, extra dimension contains the pieces that string was split into. |
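
The N to N+1 dimension rule can be checked directly. This sketch uses `tf.strings.split`, the op that the `SimpleSplitter` example delegates to, as a stand-in for a concrete splitter. It assumes other splitters follow the same shape contract, as the Returns description states.

```python
import tensorflow as tf

scalar_in = tf.constant("small medium large")               # rank 0
vector_in = tf.constant(["hello world", "this is a test"])  # rank 1

scalar_out = tf.strings.split(scalar_in)  # dense Tensor of pieces
vector_out = tf.strings.split(vector_in)  # RaggedTensor, one row per string

print(scalar_in.shape.rank, "->", scalar_out.shape.rank)  # 0 -> 1
print(vector_in.shape.rank, "->", vector_out.shape.rank)  # 1 -> 2

# The added dimension is ragged because each string yields a different
# number of pieces.
print(vector_out.row_lengths().numpy().tolist())  # [2, 4]
```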