Regexp Separator not working OOTB with (Recursive)CharacterSplitter

### Checked other resources

- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a similar question and didn't find it.
- [X] I am sure that this is a bug in LangChain rather than my code.
- [X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

### Example Code

```python
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
import re

text_splitter = CharacterTextSplitter(
    # Without the r-prefix, Python turns "\b" into a backspace character,
    # so the pattern never matches and the text is not split.
    separator="\bCHAPTER\b",  # doesn't work
    # separator=r"\bCHAPTER\b",  # works
    chunk_size=500,
    chunk_overlap=0,
    is_separator_regex=True,
)


# full_book holds the full text of the book as a single string
char_chunks = text_splitter.split_text(full_book)
print([c[0:10] for c in char_chunks])
len(char_chunks), len(char_chunks[0])
```

['Acknowledg', '1\nIntroduc', '2\nOrganizi']

### Error Message and Stack Trace (if applicable)

Non-working case with a plain string: `separator="\bCHAPTER\b",`

```
['Acknowledg']
(1, 2996)
```

Working case with r-string: `separator=r"\bCHAPTER\b",`

```
['Acknowledg', '1\nIntroduc', '2\nOrganizi']
(3, 696)
```

### Description

We just spent two hours trying to figure out how to use the recursive/character text splitter with regexp separators.

It turned out that neither the docs nor the code had the right information: r-strings are not mentioned anywhere in the docs, and the example doesn't use any either. The docs also say the separators are "interpreted as regex", which is not true.

https://python.langchain.com/docs/how_to/recursive_text_splitter/

> `is_separator_regex`: Whether the separator list (defaulting to `["\n\n", "\n", " ", ""]`) should be *interpreted* as regex.

We thought strings were automatically turned into regexps, but that doesn't seem to be the case: the splitter only escapes the separator when `is_separator_regex` is False.

See https://github.com/langchain-ai/langchain/blob/master/libs/text-splitters/langchain_text_splitters/character.py#L24-L93
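
As far as we can tell, the behaviour boils down to something like the following. This is only a minimal sketch of what we observed, not the library's actual code; `build_pattern` and `split_on` are hypothetical names made up for illustration.

```python
import re


def build_pattern(separator: str, is_separator_regex: bool) -> str:
    """Hypothetical helper: use the separator as-is, or escape it to match literally."""
    return separator if is_separator_regex else re.escape(separator)


def split_on(text: str, separator: str, is_separator_regex: bool) -> list[str]:
    """Hypothetical helper: split on the resulting pattern and drop empty chunks."""
    pattern = build_pattern(separator, is_separator_regex)
    return [chunk for chunk in re.split(pattern, text) if chunk]


sample = "part1part22part"

# Treated as a regex: runs of digits act as separators.
as_regex = split_on(sample, r"\d+", is_separator_regex=True)
# Treated as a literal: the text contains no literal "\d+", so nothing is split.
as_literal = split_on(sample, r"\d+", is_separator_regex=False)

print(as_regex)    # ['part', 'part', 'part']
print(as_literal)  # ['part1part22part']
```

Either way, the string reaches `re` exactly as Python parsed it, so any escape sequence Python has already resolved in the literal is gone by then.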

So the solution was :exploding_head: to use r-strings such as `r"^CHAPTER \d+$"`; otherwise you get only a single chunk, because the regexp is never matched as a separator.
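
For anyone hitting the same thing, here is a small standard-library sketch of the difference, with no LangChain involved. The sample text is made up; the point is only that Python turns `"\b"` in a plain string into a backspace character (`\x08`) before the regex engine ever sees it.

```python
import re

# Made-up sample text standing in for the book.
text = "Acknowledgements CHAPTER 1\nIntroduction CHAPTER 2\nOrganizing"

plain = "\bCHAPTER\b"   # "\b" becomes the backspace character \x08
raw = r"\bCHAPTER\b"    # backslash + "b" survives: regex word boundary

print(len(plain), len(raw))  # 9 11

plain_chunks = re.split(plain, text)
raw_chunks = re.split(raw, text)

print(len(plain_chunks))  # 1, the pattern only matches literal backspaces
print(len(raw_chunks))    # 3

# Doubling the backslashes in a plain string is equivalent to the r-string.
print("\\bCHAPTER\\b" == raw)  # True
```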

Not sure how any of the language-specific splitters that use regexps actually work?

e.g. Markdown
https://github.com/langchain-ai/langchain/blob/master/libs/text-splitters/langchain_text_splitters/character.py#L440-L443
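
One possible explanation, as a rough sketch: a plain string only causes trouble when Python itself resolves the escape into something the regex engine reads differently, as with `"\b"`. A `"\n"` becomes a real newline, which a regex matches literally, and doubled backslashes like `"\\*"` survive intact. The separators below are illustrative assumptions, not copied from the linked lines.

```python
import re

# Illustrative Markdown-style separators (assumed, not taken from the library).
heading_sep = "\n#{1,6} "     # "\n" is a real newline; "#{1,6}" has no backslash
rule_sep = "\n\\*\\*\\*+\n"   # doubled backslashes give the regex \*\*\*+

md = "intro\n# Title\nbody\n***\nend"

by_heading = re.split(heading_sep, md)
by_rule = re.split(rule_sep, md)

print(by_heading)  # ['intro', 'Title\nbody\n***\nend']
print(by_rule)     # ['intro\n# Title\nbody', 'end']
```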

### System Info

```
System Information
------------------
> OS:  Darwin
> OS Version:  Darwin Kernel Version 23.6.0: Wed Jul 31 20:49:39 PDT 2024; root:xnu-10063.141.1.700.5~1/RELEASE_ARM64_T6000
> Python Version:  3.11.10 (main, Sep  7 2024, 01:03:31) [Clang 15.0.0 (clang-1500.3.9.4)]

Package Information
-------------------
> langchain_core: 0.2.35
> langchain: 0.2.14
> langchain_community: 0.2.12
> langsmith: 0.1.104
> langchain-genai-website: Installed. No version info available.
> langchain_anthropic: 0.1.13
> langchain_aws: 0.1.6
> langchain_cli: 0.0.22
> langchain_experimental: 0.0.64
> langchain_fireworks: 0.1.3
> langchain_google_genai: 1.0.4
> langchain_google_vertexai: 1.0.4
> langchain_groq: 0.1.5
> langchain_openai: 0.1.22
> langchain_text_splitters: 0.2.2
> langserve: 0.1.1
...
> tomlkit: 0.12.0
> typer[all]: Installed. No version info available.
> typing-extensions: 4.12.2
> uvicorn: 0.30.6
```
