Writing a first grammar

After reading this tutorial, you will be able to write a grammar with basic functionality and understand its core concepts.

What is a grammar?
A grammar is a set of rules describing a language. It can be written in many different notations, the most popular being the Backus-Naur Form. In this project, Grammar is an abstract class that provides the basics needed to build a language.

What is an expression?
In the parsing world, two kinds of entities are used to describe a language. A terminal is a leaf describing how to read a part of a text, and a non-terminal is a branch describing how to group multiple leaves or branches. In Fluent Parser, an expression is either a terminal or a non-terminal. Some expressions parse text, while others decorate another expression or combine several of them.

How to build a grammar?
To use a grammar, create a new class deriving from Grammar. Then describe your language in the grammar's constructor by combining expressions. Once the complete tree of expressions has been built, assign the root expression to the grammar's Root property.

public class ParseNumberGrammar : Grammar<int>
{
    public ParseNumberGrammar()
    {
        Root = ParseNumber.Int32 + EndOfText;
    }
}


This grammar combines two expressions with the sequence operator (+). The first reads a number and the second makes sure the text has been parsed entirely. Combined, they ensure the text consists of a number and nothing else.

Note: The type T in Grammar<T> corresponds to the type returned from parsing a text. Here, it is the number coming from ParseNumber.Int32, so the grammar itself returns a number.

How to use a grammar?

Using a grammar is simple: instantiate the new class and call the method ParseValue(string text):

static void Main()
{
    ParseNumberGrammar grammar = new ParseNumberGrammar();
    int number = grammar.ParseValue("1234");

    ...
}


Building a complex grammar

Expressions can be combined in multiple ways to build complex logic. For example, a grammar reading the content of a dictionary could be built like this:

public class DictionaryGrammar : Grammar<Dictionary<string, string>>
{
    public DictionaryGrammar()
    {
        // Reads a string composed of any character
        // except a comma or an end of line.
        Expr<string> parseValue = ParseString.Regex("[^,\r\n]+");

        // Builds a nonterminal combining three expressions
        // using the sequence operator.
        Expr<Tuple<string, string>> parsePair =
            parseValue + "," + parseValue;

        // Repeats the pair 0 to n times, using "\n" as the separator.
        Expr<Tuple<string, string>[]> parsePairCollection =
            parsePair.Repeat(0, "\n");

        // Converts the list of tuples into a dictionary.
        Expr<Dictionary<string, string>> parseDictionary =
            parsePairCollection.Convert(x => BuildDictionary(x));

        Root = parseDictionary;
    }

    private static Dictionary<string, string>        
        BuildDictionary(Tuple<string, string>[] pairs)
    {
        var dict = new Dictionary<string, string>();
        foreach (Tuple<string, string> pair in pairs)
            dict.Add(pair.Item1, pair.Item2);

        return dict;
    }
}
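
As a usage sketch, the dictionary grammar can be called the same way as the number grammar shown earlier. The input string and the class name DictionaryDemo are illustrative assumptions. ParseValue is the only library call used, and since this tutorial does not say how a parse failure is reported, none is handled here.

```csharp
using System;
using System.Collections.Generic;
// Also add the using directive for the Fluent Parser namespace,
// which this tutorial does not name.

public static class DictionaryDemo
{
    public static void Main()
    {
        // Two "key,value" pairs separated by "\n", matching the
        // separator passed to Repeat in DictionaryGrammar.
        string text = "apple,red\nbanana,yellow";

        DictionaryGrammar grammar = new DictionaryGrammar();
        Dictionary<string, string> colors = grammar.ParseValue(text);

        // Print every pair the grammar produced.
        foreach (KeyValuePair<string, string> entry in colors)
            Console.WriteLine(entry.Key + " => " + entry.Value);
    }
}
```

Note that Dictionary.Add throws an ArgumentException for a duplicate key, so an input that repeats a key would fail inside BuildDictionary as it is written.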


Creating an expression

There are two types of expressions. Some return a typed value (Expr<TResult>) and others return no value (Symbol). There are multiple ways to build an expression. Terminals are built through the following static classes and methods:
  1. ParseSymbol : Offers multiple methods to build an expression returning no value.
  2. ParseString : Offers multiple methods to build an expression of type Expr<string>.
  3. ParseNumber : Offers multiple methods to build an expression returning a number (e.g. Expr<int>).
  4. ParseOperators(leaf, separator) : Builds an operator expression on which operators and grouping behavior can be registered (used to parse mathematical expressions).
  5. Symbol(text) : Shortcut to ParseSymbol.Exact.
  6. FutureExpr() : Builds an expression to be defined later (used for recursion).

Non-terminals used for composition or decoration are available as extension methods:
  1. expr.Optional() : Returns an expression that can be omitted (0 to 1).
  2. expr.Repeat(minimum, [separator]) : Repeats an expression (minimum to n).
  3. expr.Convert(delegate) : Transforms each resulting value into a new value.
  4. expr.Surround(left, right) : Combines three expressions into a sequence (left + expr + right).
  5. expr.Join(right) : Combines two expressions into a sequence (expr + right).
  6. expr.Where(delegate) : Adds a condition for the success of an expression.
  7. expr.ToSymbol() : Transforms an expression (Expr<T>) into a symbol (Symbol).
  8. expr.ToExpr() : Transforms a symbol (Symbol) into an expression (Expr<object>).
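
To illustrate how these extension methods might chain together, here is a minimal sketch of a grammar for a bracketed list of positive integers such as "[1,2,3]". Only the method names come from the lists above. The exact signatures are assumptions: that Where accepts a predicate on the parsed value, that Repeat on an Expr<int> yields an Expr<int[]> (by analogy with the dictionary example), and that Surround accepts symbols built with Symbol(text). The class name PositiveListGrammar is hypothetical.

```csharp
// Add the using directive for the Fluent Parser namespace,
// which this tutorial does not name.

public class PositiveListGrammar : Grammar<int[]>
{
    public PositiveListGrammar()
    {
        // Terminal: an integer, accepted only when the condition holds.
        // Assumption: Where takes a predicate on the parsed value.
        Expr<int> positive = ParseNumber.Int32.Where(n => n > 0);

        // Non-terminal: one or more integers separated by commas.
        // Assumption: Repeat produces an array of the repeated values.
        Expr<int[]> items = positive.Repeat(1, ",");

        // Non-terminal: the sequence "[" + items + "]".
        // Assumption: Surround accepts symbols and keeps the inner value.
        Expr<int[]> list = items.Surround(Symbol("["), Symbol("]"));

        // Require the text to end right after the closing bracket.
        Root = list + EndOfText;
    }
}
```

Each step wraps the previous expression instead of modifying it, which is why the intermediate expressions can be named and reused like ordinary values.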

Last edited Dec 20, 2010 at 8:27 PM by Zumten, version 6