Reggy
Friendly regular expressions for text analytics. Typical regex features are removed or adjusted to make natural-language queries easier, and patterns can be matched incrementally against streaming text.
API Usage
// use the high-level Pattern API for simple use cases
let mut p = new.unwrap;
assert_eq!
// transpile to normal regex (https://docs.rs/regex/) syntax
let ast = parse.unwrap;
assert_eq!;
// perform an incremental search with several patterns at once
let money = parse.unwrap;
let people = parse.unwrap;
let mut search = new;
// call step() to begin searching a stream
let jane_match = Match ;
assert_eq!;
// call step() again to continue with the same search state
// note "John Doe" matches across the step boundary
let john_match = Match ;
let money_match_1 = Match ;
assert_eq!;
// call finish() to retrieve any pending matches once the stream is done
let money_match_2 = Match ;
assert_eq!;
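To illustrate how a match can span a step boundary, here is a minimal std-only sketch of incremental matching for a single literal phrase. It is not Reggy's implementation: the `LiteralStream` type and its `step` method are hypothetical names chosen for this example, matching is case-sensitive, and word boundaries are ignored. The idea is to keep a short tail of the stream between calls so that a phrase split across two chunks is still found.

```rust
/// Hypothetical streaming matcher for one literal phrase (not Reggy's API).
struct LiteralStream {
    needle: String,
    buffer: String, // text received but not yet discarded
    offset: usize,  // absolute byte offset of the start of `buffer`
}

impl LiteralStream {
    fn new(needle: &str) -> Self {
        assert!(!needle.is_empty(), "needle must not be empty");
        LiteralStream {
            needle: needle.to_string(),
            buffer: String::new(),
            offset: 0,
        }
    }

    /// Feed the next chunk; return absolute (start, end) byte spans of new matches.
    fn step(&mut self, chunk: &str) -> Vec<(usize, usize)> {
        self.buffer.push_str(chunk);
        let mut found = Vec::new();
        let mut pos = 0;
        while let Some(i) = self.buffer[pos..].find(self.needle.as_str()) {
            let end = pos + i + self.needle.len();
            found.push((self.offset + pos + i, self.offset + end));
            pos = end;
        }
        // Keep a tail of at most `needle.len() - 1` bytes: too short to hide a
        // complete match, long enough to hold the beginning of one.
        let keep = self.needle.len() - 1;
        let mut cut = self.buffer.len().saturating_sub(keep).max(pos);
        // Never cut in the middle of a UTF-8 character.
        while !self.buffer.is_char_boundary(cut) {
            cut -= 1;
        }
        self.buffer.drain(..cut);
        self.offset += cut;
        found
    }
}
```

Feeding "Hello, John" and then " Doe paid" to a `LiteralStream::new("John Doe")` yields no match on the first step and the span (7, 15) on the second. A literal search never has matches left pending at the end of the stream, which is why this sketch has no counterpart to finish().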
Pattern Language
Reggy is case-insensitive by default. Spaces match any amount of whitespace (i.e. \s+). All the reserved characters mentioned below (\, (, ), ?, |, *, +, and !) may be escaped with a backslash for a literal match. Patterns are surrounded by implicit Unicode word boundaries (i.e. \b).
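The rules above can be sketched as a small rewriting function that turns a Reggy-style pattern into conventional regex syntax. This is an illustration under stated assumptions, not the crate's transpiler: `to_regex_sketch` is a hypothetical name, the exact output of the real crate may differ, and the unconditional \b wrapping is a simplification that does not suit patterns beginning or ending with a non-word character such as $.

```rust
/// Hypothetical helper (not part of the crate's API): rewrite a Reggy-style
/// pattern into conventional regex syntax using the rules described above.
fn to_regex_sketch(pattern: &str) -> String {
    // Implicit word boundary and case-insensitivity around the whole pattern.
    let mut out = String::from(r"\b(?i:");
    let mut chars = pattern.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // A backslash keeps the next character as written (an escape or \d).
            '\\' => {
                out.push('\\');
                if let Some(next) = chars.next() {
                    out.push(next);
                }
            }
            // A run of spaces matches any amount of whitespace.
            ' ' => {
                while chars.peek() == Some(&' ') {
                    chars.next();
                }
                out.push_str(r"\s+");
            }
            // `(!...)` opens a case-sensitive group; `(...)` a plain group.
            '(' => {
                if chars.peek() == Some(&'!') {
                    chars.next();
                    out.push_str("(?-i:");
                } else {
                    out.push_str("(?:");
                }
            }
            // Special in ordinary regex but not reserved here, so escape them.
            '.' | '$' | '^' | '[' | ']' | '{' | '}' => {
                out.push('\\');
                out.push(c);
            }
            _ => out.push(c),
        }
    }
    out.push_str(r")\b");
    out
}
```

Under these assumptions, `to_regex_sketch("dogs?")` returns `\b(?i:dogs?)\b`, and a pattern containing `(!USA)` produces a `(?-i:USA)` group that switches case sensitivity back on for that part only.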
Examples
Make a letter optional with ?
dogs? matches dog and dogs
Create two or more options with |
dog|cat matches dog and cat
Perform operations on groups of characters with (...)
the qualit(y|ies) required matches the quality required and the qualities required
the only( one)? around matches the only around and the only one around
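One way to check your reading of ?, | and (...) is to enumerate every string such a pattern accepts. The sketch below does this for the subset of the language shown so far. It is a hypothetical helper named `expand`, not part of Reggy: spaces are treated literally, case is ignored, and escapes, \d, *, + and (!...) are not handled.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical helper: list every string matched by a pattern that uses
/// only literal characters, `?`, `|`, and `(...)`.
fn expand(pattern: &str) -> Vec<String> {
    let mut chars = pattern.chars().peekable();
    alternation(&mut chars)
}

/// alternation := sequence ('|' sequence)*
fn alternation(chars: &mut Peekable<Chars<'_>>) -> Vec<String> {
    let mut options = sequence(chars);
    while chars.peek() == Some(&'|') {
        chars.next();
        options.extend(sequence(chars));
    }
    options
}

/// sequence := (atom '?'?)*  where atom is a character or a group
fn sequence(chars: &mut Peekable<Chars<'_>>) -> Vec<String> {
    let mut results = vec![String::new()];
    while let Some(&c) = chars.peek() {
        if c == '|' || c == ')' {
            break;
        }
        chars.next();
        let mut atom = if c == '(' {
            let inner = alternation(chars);
            chars.next(); // consume the closing ')'
            inner
        } else {
            vec![c.to_string()]
        };
        // `?` makes the preceding character or group optional.
        if chars.peek() == Some(&'?') {
            chars.next();
            atom.insert(0, String::new());
        }
        // Append every variant of the atom to every string built so far.
        let mut next = Vec::new();
        for prefix in &results {
            for suffix in &atom {
                next.push(format!("{}{}", prefix, suffix));
            }
        }
        results = next;
    }
    results
}
```

For instance, `expand("the only( one)? around")` returns the two strings "the only around" and "the only one around", in that order.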
Create a case-sensitive group with (!...)
United States of America|(!USA) matches USA, not usa
Match digits with \d
\d.\d\d matches 3.14
Match zero-or-more characters with *, or one-or-more characters with +
$(\d?\d?\d,)*\d?\d?\d.\d\d matches $20.66 and $4,670,055.32