Char class with exception in token regexp causes WASM compilation to explode #3496

Closed as not planned
@rabbiveesh

Description

Problem

When I try to build my parser (https://github.com/tree-sitter-perl/tree-sitter-perl) with tree-sitter build --wasm, the build consumes about 28 GB of RAM and is then OOM-killed. That does not look like expected behavior when building a WASM parser.

It turns out that patterns like /[[_\p{XID_Start}]--[\u{b7}\u{387}\u{1369}-\u{1370}\u{19da}\u{2118}\u{212e}]][[\p{XID_Continue}]--[\u{b7}\u{387}\u{1369}-\u{1370}\u{19da}\u{2118}\u{212e}]]*/v (a character class minus certain character ranges) cause clang to choke while compiling the ts_lex function.

Steps to reproduce

git clone https://github.com/tree-sitter-perl/tree-sitter-perl
cd tree-sitter-perl
git checkout 65c237b
tree-sitter generate
tree-sitter build --wasm

Expected behavior

The WASM parser builds successfully.

Tree-sitter version (tree-sitter --version)

tree-sitter 0.22.6 (installed from linuxbrew)

Operating system/version

Linux/Ubuntu 22.04
