MetaSharp Grammar (M#g) Specs - 1.0

Overview

This document will attempt to define the specifications for the MetaSharp grammar syntax.

Transformation

The primary principle of MetaSharp is that of transformation. It can be summarized as a transformation engine. At the heart of all compilers is this very same principle, thus MetaSharp itself can be expressed grammatically using a transformation language.

OMeta’s key insight is the realization that all of the passes in a traditional compiler are essentially pattern matching operations...
- Alessandro Warth
In Ometa the concept of Pattern Matching is identical to that of Transformation in MetaSharp. The terminology is interchangable. Two important patterns in understanding compilers and transformation are the Finite State Machine and the Visitor Pattern. The grammar language is intended to be able to express both of these design patterns. If you extend the pattern matching principal to include the ability to match arbitrary objects in addition to text you are now able to extened your grammar to be able to express both finite state machines and visitors all in the same, unified syntax.

An additional principal of OMeta that is to be carried into MetaSharp is the ability to re-use existing grammars and have "inheritable" grammars, or Object Oriented grammars. Using the .NET platform for implementation makes this feature fairly easy to accomplish and Extensible, Reusable grammars are a primary focus.

Syntax

MetaSharp's grammar syntax is an extension of the Lang grammar, thus it is intermixed with standard language constructs. Specifically grammars are declared as if they were types inside of namespaces and may contain standard class members and expression trees in projections.

Grammar

The grammar syntax is used to define a prioritized list of patterns. A grammar can "inherit" from other grammar declarations as well as re-use patterns declared in other grammars. Defining an already named pattern takes priority for this grammar and inheriting grammars, requiring an override keyword. Specifying a grammar to inherit from is conceptually akin to specifying which steps need to be taken to process the input prior to the grammar. Grammars generate a class that ultimately inherits from MetaSharp.Transformation.Transformer.

For example, the inheritance chain for the following example is Example < Parser < Interleaver < Tokenizer < Transformer. Which is to say that input processed by this grammar should be text and it will be tokenized and interleaved before being processed by the Example grammar.
Example
grammar Example < Parser:
end

Rules

Rules make up the bulk of a grammar definition. The input stream is passed to a Rule named Main in the grammar and the Main rule must match each element in the input stream. The projection of the main rule is the output of the grammar.

Rules may be preceded by an optional override keyword. If a rule matches a rule defined on a super type then the rule will override the one defined on the supertype, changing its behavior.
Example
Example = "foo" -> new Foo();
override Bar = "BAR" | super;

Patterns

Patterns are everything found on the left hand side of a projection (->). Matching is made up of basic logical operators, basic quantifiers and several fundamental other patterns. The MetaSharp parser is a packrat parser that operates on a hierarchical object stream.
Types
Name Syntax
Character 'x'
String "xyz"
Number 7 or 7.11 or 0xFF
Range "a".."z" or 0..100
Null null
Bool true or false
Array [x y z]
Object x { y = z }
Any any
Default default
Fail fail
Super super
Reference X
Variable x:X
Call or Switch X(y)
  • The any keyword matches exactly one of anything in the stream, provided the stream has anything to read. Use this carefully with quantifiers.
  • The default pattern is also unique, it always matches, even on EOF, consumes nothing from the stream and projects an empty set by default. It is especially useful for the last pattern in an Or clause.
Examples
CharacterExample = 'x' 'y' 'z';

StringExample = "xyz";

ArrayExample = [x y z]
              | [x | y | z]
              | [x+]
              | [[x], [y], [z]];

ObjectExample = Foo { }
               | Foo { Name = n:Letter+ }
               | Foo { Name = ![ ], Children = c:[Child*] } -> Bar { Children = c };

Foo = X -> true | default-> false;

End = !any; // !any will only match on <EOF>. 

Logical Operators

Name Syntax
And x y z
Or x | y | z
Not !x
Group ( x ( y z ) )
Projection x -> y

Any group expression many contain any combination of nested logical expressions. Groups may also contain inner projections. They are equivalent to anonymous patterns. Variables are only reference-able in the scope they are declared in or a higher scope.

A not may only apply to a single expression but that expression can be a group expression. The not operator has precedence over quatifiers. For example "!x+" is equivalent to "(!x)+". The not operator will perform a look ahead and if the pattern is matched the not operator will instead return Fail, however if the pattern is not matched null will be projected.

Or operators will perform a look ahead and will return on the first match made in the order specified. Therefore it's possible to have multiple matches but not possible to have an ambiguous grammar since it is order dependent. The Or pattern has a higher precedence then the projection pattern thus "x | y -> z" is equivalent to "x | (y -> z)".

Quantifiers

Name Syntax
Zero or One X?
Zero or More X*
One or More X+
Exactly N X#N

Quantifiers specify how many times the expression must match in order to be true. Quantifiers are matched as look ahead and if their quantity is not matched then it returns false. The Zero or More quantifier always returns true, though it may return an empty set while Zero Or One will return null if the pattern is not matched.

Projections

Projections are also known as everything on the right hand side of the pattern. A projection may contain either a single Expression or a collection of statements, as defined by the Lang grammar. Statement Blocks are surrounded by brackets "{ ... }" and must contain return statements. Projections may access variables defined in scope of the projections pattern.
Example
Example = "xyz" -> new char[] { 'x' 'y' 'z' }; // actually projects ['x', 'y', 'z']

Block = "block" name:Identifier "{" s:Statements* "}" -> {
  BlockNode b = new BlockNode();
  b.Name = name as string;
  b.Statements = s.Cast<CodeStatement>().ToNodes();
  return b;
}

References

[1] Ometa, Alessandro Warth - http://www.tinlizzie.org/ometa/

[2] Experimenting with Programming Languages, Chapter 1.1, Allesandro Warth - http://www.vpri.org/pdf/tr2008003_experimenting.pdf

[3] Finite State Machine - http://en.wikipedia.org/wiki/Finite_state_machine

[4] Visitor Pattern - http://en.wikipedia.org/wiki/Visitor_pattern

[5] OMeta: an Object-Oriented Language for Pattern Matching, Chapter 3: O is for Object-Oriented, Alessandro Warth - http://www.tinlizzie.org/~awarth/papers/dls07.pdf

Last edited Jul 27, 2011 at 3:25 AM by justinc, version 19

Comments

No comments yet.