BNF and EBNF

When you are describing a context-free grammar such as the syntax of a formal language e.g. a programming language when you are constructing a compiler, you most commonly use the so called Backus-Naur form (usually just called BNF).
BNF was invented by John Backus, who developed FORTRAN, when he needed to express the grammar of ALGOL.
It was then only recognized as Backus Normal Form but after the Danish programmer Peter Naur had made some changes to it the meaning of N was changed to Naur. BNF consists of terminals, nonterminals and some metasymbols which you write expressions, called production, that maps your grammar. Terminals are literals (like ‘a’, ‘b’, ‘c’, ‘1′, ‘3′, ‘4′, ‘!’) and nonterminals are composed by terminals and other nonterminals.
Literal = '1' | '2' | 'a' | '-'.

 

Nonterminal = {Literal}.

 

In the productions above the elements 1, 2, a, - are terminals and Literal, Nonterminal nonterminals.
The conventions used when naming terminals are that they are capitalized and nonterminals not.
With BNF you can do recursions and other things by using different metasymbols. Following are the most common ones from both standard BNF and the Extended BNF.
{}  Repeat 0 or more times
 ()  Grouping
[]  Optional element
 |   Or
 <>  Encloses nonterminals (used in EBNF)   

Here is an example using all elements and some others too.

(* a simple program in EBNF − Wikipedia *)
program = 'PROGRAM' , white space , identifier , white space ,
           'BEGIN' , white space ,
           { assignment , ";" , white space } ,
           'END.' ;
identifier = alphabetic character , { alphabetic character | digit } ;
number = [ "-" ] , digit , { digit } ;
string = '"' , { all characters − '"' } , '"' ;
assignment = identifier , ":=" , ( number | identifier | string ) ;
alphabetic character = "A" | "B" | "C" | "D" | "E" | "F" | "G"
                     | "H" | "I" | "J" | "K" | "L" | "M" | "N"
                     | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
                     | "V" | "W" | "X" | "Y" | "Z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
white space = ? white space characters ? ;
all characters = ? all visible characters ? ;

 

With this grammar you can parse text that is syntactically correct with the grammar. For example this code:

PROGRAM DEMO1
BEGIN
  A0:=3;
  B:=45;
  H:=-100023;
  C:=A;
  D123:=B34A;
  BABOON:=GIRAFFE;
  TEXT:="Hello world!";
END.

There are some good engines available on the market, both commercial and free. I have tried GOLD Parser Generator, a free grammar parser engine where you can build grammars and later port them so that you can use them in your own programming projects.

Why am I interested in this? Lexical parsers are a very interesting and also important piece in compilators. Almost every parser uses BNF. I am planning to construct my own compiler in the future so this will come in handy.

0 Responses to “BNF and EBNF”


  1. No Comments

Leave a Reply




Pages

Categories

RSS MSDN Highlights

  • An error has occurred; the feed is probably down. Try again later.