Skip to content

Latest commit

 

History

History
862 lines (588 loc) · 14.5 KB

parsers.md

File metadata and controls

862 lines (588 loc) · 14.5 KB

List of parser combinators

Note: when samples use a local input variable representing the input text to parse, and a parser variable, the result is usually the outcome of calling var result = parser.Parse(input) or var success = parser.TryParse(input, out var result).

Terms and Literals

These are lowest level elements of a grammar, like a '.' (dot), predefined strings like "hello", numbers, and more. In Parlot they usually are accessed from the Literals or Terms classes, with the difference that Terms will return a parser that accepts blank spaces before the element.

Terms and Literals are accessed using the Parsers.Terms and Parsers.Literals static classes.

WhiteSpace

Matches blank spaces, optionally including new lines. Returns a TextSpan with the matched spaces. This parser is not available in the Terms static class.

Parser<TextSpan> WhiteSpace(bool includeNewLines = false)

Usage

var input = "   \thello world  ";
var parser = Literals.WhiteSpace();

Result:

"   \t"

NonWhiteSpace

Matches any non-blank spaces, optionally including new lines. Returns a TextSpan with the matched characters.

Parser<TextSpan> NonWhiteSpace(bool includeNewLines = false)

Usage:

var input = "hello world";
var parser = Terms.NonWhiteSpace();

Result:

"hello"

Text

Matches a given string, optionally in a case insensitive way.

Parser<string> Text(string text, bool caseInsensitive = false)

Usage:

var input = "hello world";
var parser = Terms.Text("hello");

Result:

"hello"

Char

Matches a given character.

Parser<char> Char(char c)

Usage:

var input = "hello world";
var parser = Terms.Char("h");

Result:

'h'

Integer

Matches an integral numeric value and an optional leading sign.

Parser<long> Integer()

Usage:

var input = "-1234";

Result:

-1234

Decimal

Matches a numeric value with optional digits and leading sign. The exponent is supported.

Parser<decimal> Decimal()

Usage:

var input = "-1234.56";
var parser = Terms.Decimal(NumberOptions.AllowSign);

Result:

-1234.56

Number

Matches a numeric value of any .NET type. The NumberOptions enumeration enables to customize how the number is parsed. The return type can be any numeric .NET type that is compatible with the selected options.

Parser<T> Number() where T : INumber<T>

Usage:

var input = "-1,234.56e1";
var parser = Terms.Number<double>(NumberOptions.Float | NumberOptions.AllowGroupSeparators);

Result:

-12345.6

String

Matches a quoted string literal, optionally use single or double enclosing quotes.

Parser<TextSpan> String(StringLiteralQuotes quotes = StringLiteralQuotes.SingleOrDouble)

Usage:

var input = "'hello\\nworld'";
var parser = Terms.String();

Result:

'hello\\nworld'

Identifier

Matches an identifier, optionally with extra allowed characters. Default start chars are [$_a-zA-Z]. Other chars also include digits.

Parser<TextSpan> Identifier(Func<char, bool> extraStart = null, Func<char, bool> extraPart = null)

Usage:

var input = "slice_text();";
var parser = Terms.Identifier();

Result:

slice_text

Pattern

Matches a consecutive characters with a specific predicate, optionally defining a minimum and maximum size.

Parser<TextSpan> Pattern(Func<char, bool> predicate, int minSize = 1, int maxSize = 0)

Usage:

var input = "ababcad";
var parser = Terms.Pattern(c => c == 'a' || c == 'b');

Result:

abab

AnyOf

Matches any chars from a list of chars.

Parser<TextSpan> AnyOf(ReadOnlySpan<char> values, int minSize = 1, int maxSize = 0)
Parser<TextSpan> AnyOf(SearchValue<char> searchValues, int minSize = 1, int maxSize = 0)

Usage:

var input = "ababcad";
var parser = Terms.AnyOf("ab");

Result:

abab

Combining parsers

Or

Matches any of two parsers.

Parser<T> Or<T>(this Parser<T> parser, Parser<T> or)

Another overload accepts to return a common base class from the two parsers

Parser<T> Or<A, B, T>(this Parser<A> parser, Parser<B> or) 

Usage:

var parser = Terms.Text("one").Or(Terms.Text("1"));
parser.Parse("1");
parser.Parse("one");
parser.Parse("hello")

Result:

"1"
"one"
null

Multiple Or() calls can be used in a row, e.g., a.Or(b).Or(c).Or(d)

And

Matches two consecutive parsers. The result is a strongly typed tuple containing the two individual results.

Parser<ValueTuple<T1, T2>> And<T1, T2>(this Parser<T1> parser, Parser<T2> and)

Usage:

var parser = Terms.Text("hello").And(Terms.Text("world"));
parser.Parse("hello world").Item1;
parser.Parse("hello world").Item2;
parser.Parse("hello");

Result:

"hello"
"world"
null

Multiple And() calls can be used in a row, e.g. to match a variable assignment like age = 12

var input = "age = 12";
var parser = Terms.Identifier().And(Terms.Char('=')).And(Terms.Integer());
var result = parser.Parse(input);

Assert.Equal("age", result.Item1);
Assert.Equal('=', result.Item2);
Assert.Equal(12, result.Item3);

AndSkip

Behaves like And but skips the later one's result.

Parser<T1> AndSkip<T1, T2>(this Parser<T1> parser, Parser<T2> and)```

Usage:

var parser = Terms.Text("hello").AndSkip(Terms.Text("world"));
parser.Parse("hello world");
parser.Parse("hello");

Result:

"hello"
null

This is useful to expect successive terms but to ignore some of them and make the result as lean as possible.

var input = "age = 12";
var parser = Terms.Identifier().AndSkip(Terms.Char('=')).And(Terms.Integer());
var result = parser.Parse(input);

Assert.Equal("age", result.Item1);
Assert.Equal(12, result.Item2);

SkipAnd

Behaves like And but skips the former one's result.

Parser<T2> SkipAnd<T1, T2>(this Parser<T1> parser, Parser<T2> and)

Usage:

var parser = Terms.Text("hello").SkipAnd(Terms.Text("world"));
parser.Parse("hello world");
parser.Parse("hello");

Result:

"world"
null

This is useful to expect successive terms but to ignore some of them and make the result as lean as possible.

var input = "age = 12";
var parser = Terms.Identifier().And(Terms.Char('=')).SkipAnd(Terms.Integer());
var result = parser.Parse(input);

Assert.Equal("age", result.Item1);
Assert.Equal(12, result.Item2);

Cardinality Parsers

ZeroOrOne

Makes an existing parser optional.

Parser<T> ZeroOrOne<T>(Parser<T> parser)

Usage:

var parser = ZeroOrOne(Terms.Text("hello"));
parser.Parse("hello");
parser.Parse(""); // returns null but with a successful state

Result:

"hello"
null

ZeroOrMany

Executes a parser as long as it's successful. The result is a list of all individual results.

Parser<List<T>> ZeroOrMany<T>(Parser<T> parser)

Usage:

var parser = ZeroOrMany(Terms.Text("hello"));
parser.Parse("hello hello");
parser.Parse("");

Result:

[ "hello", "hello" ]
[]

OneOrMany

Executes a parser as long as it's successful, and is successful if at least one occurrence is found. The result is a list of all individual results.

Parser<List<T>> OneOrMany<T>(Parser<T> parser)

Usage:

var parser = OneOrMany(Terms.Text("hello"));
parser.Parse("hello hello");
parser.Parse("");

Result:

[ "hello", "hello" ]
null

Not

Succeeds if the parser is not matching.

Parser<T> Not<T>(Parser<T> parser)

Usage:

var parser = Not(Terms.Text("hello"));
parser.Parse("hello");
parser.Parse("world");

Result:

hello // failure
null // success

Coordination parsers

Separated

Matches all occurrences of a parser that are separated by another one. If a separator is not followed by a value, it is not consumed.

Parser<List<T>> Separated<U, T>(Parser<U> separator, Parser<T> parser)

Usage:

var parser = Separated(Terms.Text(","), Terms.Integer());
parser.Parse("1, 2, 3");
parser.Parse("1,2;3");

Result:

[1, 2, 3]
[1, 2]

Between

Matches a parser when between two other ones.

Parser<T> Between<A, T, B>(Parser<A> before, Parser<T> parser, Parser<B> after)

Usage:

var parser = Between(Terms.Char('['), Terms.Integer(), Terms.Char(']'));
parser.Parse("[ 1 ]");
parser.Parse("[ 1");

Result:

1
0 // failure

SkipWhiteSpace

Matches a parser after any blank spaces. This parser respects the Scanner options related to multi-line grammars.

Parser<T> SkipWhiteSpace<T>(Parser<T> parser)

Usage:

var parser = SkipWhiteSpace(Literals.Text("abc"));
parser.Parse("abc");
parser.Parse("  abc");

Result:

"abc"
"abc"

Note: This parser is used by all Terms (e.g., Terms.Text) to skip blank spaces before a Literal.

Deferred

Creates a parser that can be referenced before it is actually defined. This is used when there is a cyclic dependency between parsers.

Deferred<T> Deferred<T>()

Usage:

var parser = Deferred<string>();
var group = Between(Terms.Char('('), parser, Terms.Char(')'));
parser.Parser = Terms.Integer().Or(group);

parser.Parse("((1))");
parser.Parse("1");

Result:

1
1

Recursive

Creates a parser that can reference itself.

Deferred<T> Recursive<T>(Func<Deferred<T>, Parser<T>> parser)

Usage:

var number = Terms.Decimal();
var minus = Terms.Char('-');

var parser = Recursive<decimal>((u) =>
    minus.And(u)
        .Then(static x => 0 - x.Item2)
    .Or(number)
    );

parser.Parse("--1");

Result:

1

Capture

Ignores the individual result of a parser and returns the whole matched string instead.

This can be used for pattern matching when each part of a pattern isn't more useful that the whole result.

Parser<TextSpan> Capture<T>(Parser<T> parser)

Usage:

var parser = Terms.Identifier().And(Terms.Char('=')).And(Terms.Integer());
var capture = Capture(parser);

capture.Parse("age = 12");

Result:

"age = 12"

Flow parsers

Then

Convert the result of a parser. This is usually used to create custom data structures when a parser succeeds, or to convert it to another type.

Parser<U> Then<U>(Func<T, U> conversion)
Parser<U> Then<U>(Func<ParseContext, T, U> conversion)
Parser<U> Then<U>(U value)

Usage:

var parser = 
    Terms.Integer()
    .AndSkip(Terms.Char(','))
    .And(Terms.Integer())
    .Then(x => new Point(x.Item1, y.Item2));

parser.Parse("1,2");

Result:

Point { x: 1, y: 2}

When the previous results or the ParseContext are not used then the version without delegates can be used:

var parser = OneOf(
    Terms.Text("not").Then(UnaryOperator.Not),
    Terms.Text("-").Then(UnaryOperator.Negate)
);

Else

Returns a value if the previous parser failed.

Usage:

var parser = Terms.Integer().Else<string>(0).And(Terms.Text("years"));

capture.Parse("years");
capture.Parse("123 years");

Result:

(0, "years")
(123, "years")

ThenElse

Converts the result of a parser, or returns a value if it didn't succeed. This parser always succeeds.

NB: It is implemented using Then() and Else() parsers.

Parser<U> ThenElse<U>(Func<T, U> conversion, U elseValue)
Parser<U> ThenElse<U>(Func<ParseContext, T, U> conversion, U elseValue)
Parser<U> ThenElse<U>(U value, U elseValue)

Usage:

var parser = 
    Terms.Integer().ThenElse<long?>(x => x, null)

parser.Parse("abc");

Result:

(long?)null

When the previous results or the ParseContext are not used then the version without delegates can be used:

var parser = OneOf(
    Terms.Text("not").Then(UnaryOperator.Not),
    Terms.Text("-").Then(UnaryOperator.Negate)
);

ElseError

Fails parsing with a custom error message when the inner parser didn't match.

Parser<T> ElseError(string message)

Usage:

var parser = 
    Terms.Integer().ElseError("Expected an integer")
    .AndSkip(Terms.Char(',').ElseError("Expected a comma"))
    .And(Terms.Integer().ElseError("Expected an integer"))
    .Then(x => new Point(x.Item1, y.Item2));

parser.Parse("1,");

Result:

failure: "Expected an integer at (1:3)

Error

Fails parsing with a custom error message when the inner parser matched.

Parser<T> Error(string message)
Parser<U> Error<U>(string message)

Usage:

var parser = 
    Terms.Char('a')
    .Or(Terms.Char('b')
    .Or(Terms.Char('c').Error("Unexpected char c")));

parser.Parse("c");

Result:

failure: "Unexpected char c"

When

Adds some additional logic for a parser to succeed.

Parser<T> When(Func<T, bool> predicate)

Switch

Returns the next parser based on some custom logic that can't be defined statically. It is typically used in conjunction with a ParseContext instance which has custom options.

Parser<U> Switch<U>(Func<ParseContext, T, Parser<U>> action)

Usage:

var parser = Terms.Integer().And(Switch((context, x) =>
{
    var customContext = context as CustomParseContext;
    return Literals.Char(customContext.IntegerSeparator);
});

For performance reasons it is recommended to return a static Parser instance. Otherwise each Parse execution will allocate and it will usually be the same objects.

Eof

Expects the end of the string.

Parser<T> Eof()

Discard

Discards the previous result and replaces it with the default value or a custom one.

Parser<U> Discard<U>()
Parser<U> Discard<U>(U value)

Lookup

Builds a parser that lists all possible matches to improve performance. Most parsers implement ISeekable parsers in order to provide OneOf a way to build a lookup table and identify the potential next parsers in the chain. Some parsers don't implement ISeekable because they are built too late, like Deferred. The Lookup parser circumvents that lack.

Parser<T> Lookup<U>(params ReadOnlySpan<char> expectedChars)
Parser<T> Lookup(params ISeekable[] parsers)

Other parsers

AnyCharBefore

Returns any characters until the specified parser is matched.

Parser<TextSpan> AnyCharBefore<T>(Parser<T> parser, bool canBeEmpty = false, bool failOnEof = false, bool consumeDelimiter = false)

Always

Always returns successfully, with an optional return type or value.

Parser<T> Always<T>()
Parser<object> Always()
Parser<T> Always<T>(T value)

OneOf

Like Or, with an unlimited list of parsers.

Parser<T> OneOf<T>(params Parser<T>[] parsers)