Very slow for inputs like "a" * 100000 #195

Open
@vvolhejn

Description

I've noticed that Tiktoken is really slow for strings of repeated characters like `"a" * 100_000`. Interestingly, when you add spaces, like `"a " * 50_000`, the performance is orders of magnitude better:

[Screenshot, 2023-09-18 09:56:21: timings for the two inputs]
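One plausible reason the spaces matter: as far as I understand, tiktoken first splits text with a regex pre-tokenizer and then runs BPE on each piece separately. The sketch below uses a hypothetical `PRETOKENIZE` pattern, a simplified ASCII-only approximation in the style of GPT-2's (the real encodings use Unicode classes such as `\p{L}` that Python's stdlib `re` does not support), to show how the two inputs get chunked.

```python
import re

# Hypothetical, simplified ASCII-only approximation of a GPT-2-style
# pre-tokenizer pattern; NOT tiktoken's actual pattern.
PRETOKENIZE = re.compile(r" ?[A-Za-z]+| ?[0-9]+| ?[^\sA-Za-z0-9]+|\s+(?!\S)|\s+")


def pieces(text: str) -> list[str]:
    """Split text into the chunks that BPE would then encode independently."""
    return PRETOKENIZE.findall(text)


for text in ("a" * 100_000, "a " * 50_000):
    ps = pieces(text)
    # Number of pieces, and the length of the longest piece.
    print(len(ps), max(len(p) for p in ps))
# prints "1 100000" then "50001 2"
```

Under this approximation the run of `a`s stays one 100,000-character piece, while the spaced version becomes 50,001 pieces of at most two characters each.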

Is this a bug or a fundamental property of BPE?

My Tiktoken version is 0.5.0.
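Whether this counts as a bug likely depends on how the merge loop is implemented rather than on BPE itself. A naive greedy BPE merge rescans every adjacent pair after each merge, so encoding one piece of length n costs O(n²) rank lookups. If the pre-tokenizer keeps a long run of letters together as a single piece, that quadratic cost is paid on the whole run, whereas many short pieces stay cheap. The sketch below assumes a tiny hypothetical rank table `TOY_RANKS` and a hypothetical `naive_bpe` function; I have not checked it against tiktoken's Rust implementation.

```python
# Hypothetical toy rank table (lower rank = merged earlier).
# Real vocabularies are far larger; this only makes runs of "a" mergeable.
TOY_RANKS = {b"aa": 0, b"aaaa": 1, b"aaaaaaaa": 2}


def naive_bpe(piece: bytes, ranks: dict[bytes, int]) -> tuple[list[bytes], int]:
    """Greedy BPE: repeatedly merge the adjacent pair with the lowest rank.

    Returns the final tokens and the number of pair lookups performed.
    """
    parts = [piece[i:i + 1] for i in range(len(piece))]
    lookups = 0
    while len(parts) > 1:
        best_rank, best_i = None, None
        # Full rescan of all adjacent pairs on every iteration: O(len(parts)).
        for i in range(len(parts) - 1):
            lookups += 1
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best_rank is None or rank < best_rank):
                best_rank, best_i = rank, i
        if best_i is None:
            break  # no adjacent pair is in the vocabulary
        # Replace the pair with its concatenation (list splicing is also O(n)).
        parts[best_i:best_i + 2] = [parts[best_i] + parts[best_i + 1]]
    return parts, lookups


for n in (512, 1_024, 2_048):
    tokens, lookups = naive_bpe(b"a" * n, TOY_RANKS)
    print(n, len(tokens), lookups)
```

Each doubling of n roughly quadruples the lookup count. Keeping candidate pairs in a priority queue over a linked list of parts is one standard way to bring a single piece down to O(n log n).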
