
    CM0266 DISCRETE MATHEMATICS II AUTUMN 2009

    LECTURE 1 LANGUAGES AND GRAMMARS

    1.1 Languages and Grammars

    1.2 Formal Languages

    1.3 Formal Grammars

    1.4 Languages Generated by Grammars

    1.1 Languages and Grammars

    The grammar of English tells us whether a combination of words is a valid sentence.

    "The dog runs fast" is a valid sentence.

    "dog fast jump" is not a valid sentence.

    Unlike natural languages, a formal language is specified by a well-defined set of rules of syntax. The valid

    sentences of a formal language can be described by a grammar.

    Grammars help us answer two classes of problems:

    (1) How can we determine whether a combination of words is a valid sentence in a formal language?

    (2) How can we generate the valid sentences of a formal language?

    Example 1.1. The following grammar generates a subset of the English language:

    [The production rules of this grammar are missing from the source.]

    We can form a valid sentence using a sequence of these replacement rules, stopping when no more rules can be

    applied. The replacement rules of a grammar are called productions.

    For instance, we can perform the following sequence of replacements:

    Exercise 1.1. Form all other valid sentences generated by this grammar.


    1.2 Formal Languages

    A formal language is a set of finite-length words (or strings) over some finite alphabet.

    A typical alphabet would be {a, b}, a typical string over that alphabet would be ababba, and a typical language over that alphabet containing that string would be the set of all strings which contain the same number of a's as b's.

    The empty word is allowed and is usually denoted by $\lambda$.

    Note that while the alphabet is a finite set and every string has finite length, a language may contain an

    infinite number of strings.

    Some examples of formal languages include:

    The set of all words over {a, b}.

    The set $\{a^n : n \text{ is a prime number}\}$.

    The set of syntactically correct programs in Java.

    The set of inputs upon which a certain Turing machine halts.
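    One way to make the first examples concrete is to treat a language as a membership test on strings. The following Python sketch is illustrative only: the function names are assumptions introduced here, not part of the notes. It tests membership in the language of strings over {a, b} with as many a's as b's, and in the language of prime-length runs of a's.

```python
def over_alphabet(word: str, alphabet: set[str]) -> bool:
    """True if every symbol of word belongs to alphabet."""
    return all(symbol in alphabet for symbol in word)


def same_number_of_as_and_bs(word: str) -> bool:
    """Membership test: strings over {a, b} with as many a's as b's."""
    return over_alphabet(word, {"a", "b"}) and word.count("a") == word.count("b")


def is_prime(n: int) -> bool:
    """Trial-division primality test (adequate for short strings)."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))


def prime_run_of_as(word: str) -> bool:
    """Membership test for {a^n : n is a prime number}."""
    return over_alphabet(word, {"a"}) and is_prime(len(word))


print(same_number_of_as_and_bs("ababba"))  # True
print(prime_run_of_as("aaaa"))             # False
```

    Both languages are infinite, yet each membership test only ever inspects one finite string.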

    Definition 1.1.

    An alphabet (or vocabulary) $V$ is a finite, non-empty set of elements called symbols.

    A string (or word) over an alphabet $V$ is a concatenation of a finite number of elements of $V$.

    The empty string or null string, denoted by $\lambda$, is the string containing no symbols (note that $\{\lambda\} \neq \emptyset$).

    The set of all words over an alphabet $V$ is denoted by $V^*$.

    A language over an alphabet $V$ is a subset of $V^*$.

    Languages can be specified in various ways:

    (1) List all the words in the language: $L = \{0, 00, 000, 0000, 00000\}$.

    (2) Give some criteria that a word must satisfy: $L = \{0^n \mid n \in \mathbb{N}\}$.

    (3) Using a formal grammar.
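    The first two ways of specifying a language can be sketched in Python as a finite set and a predicate. This is a minimal illustration under stated assumptions: the names `all_words`, `listed` and `in_criterion_language` are hypothetical, and because the notes do not say whether the natural numbers include 0, the smallest allowed exponent is left as a parameter `min_n`.

```python
from itertools import count, islice, product


def all_words(alphabet):
    """Yield every word over the alphabet, shortest first (the set is infinite)."""
    symbols = sorted(alphabet)
    for length in count(0):
        for letters in product(symbols, repeat=length):
            yield "".join(letters)


# (1) A finite language given by listing its words.
listed = {"0", "00", "000", "0000", "00000"}


# (2) A language given by a criterion: words of the form 0^n with n >= min_n.
def in_criterion_language(word, min_n=1):
    return set(word) <= {"0"} and len(word) >= min_n


print(list(islice(all_words({"0", "1"}), 7)))
# ['', '0', '1', '00', '01', '10', '11']
print(all(in_criterion_language(w) for w in listed))  # True
```

    The listed language is a finite subset of the language given by the criterion, and both are subsets of the set of all words over {0, 1}.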

    1.3 Formal Grammars

    Formal grammars describe formal languages.

    Definition 1.2. A phrase-structure or Type-0 grammar $G = (V, T, S, P)$ consists of

    (1) A finite alphabet $V$

    (2) A subset $T \subset V$ of terminal symbols

    (3) A start symbol $S \in V \setminus T$

    (4) A finite set $P$ of production rules, which transform strings over $V$ into strings over $V$.


    Phrase structure grammars are also called unrestricted grammars.

    The elements of $V \setminus T$ are called non-terminal symbols.

    Every production in $P$ must contain at least one non-terminal symbol on its left side.

    A derivation is a sequence of rule applications, starting with S.

    The terminal symbols T correspond to the alphabet of the associated formal language.
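    The definition and the remarks above can be captured as a small data structure that checks its own well-formedness. The sketch below is one possible encoding, not a standard API: the class name `Grammar`, the use of single-character symbols, and the representation of productions as (left, right) string pairs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A phrase-structure grammar G = (V, T, S, P) over single-character symbols."""
    V: frozenset   # alphabet
    T: frozenset   # terminal symbols, a proper subset of V
    S: str         # start symbol, a non-terminal
    P: tuple       # productions as (left, right) pairs of strings

    def __post_init__(self):
        if not self.T < self.V:
            raise ValueError("T must be a proper subset of V")
        nonterminals = self.V - self.T
        if self.S not in nonterminals:
            raise ValueError("S must be a non-terminal symbol")
        for left, right in self.P:
            if not set(left + right) <= self.V:
                raise ValueError(f"{left} -> {right} uses symbols outside V")
            if not any(symbol in nonterminals for symbol in left):
                raise ValueError(f"{left} -> {right} has no non-terminal on its left side")


# A small grammar with one non-terminal S and productions S -> aSb, S -> ab.
G = Grammar(V=frozenset("Sab"), T=frozenset("ab"), S="S",
            P=(("S", "aSb"), ("S", "ab")))
print(sorted(G.V - G.T))  # ['S']
```

    Rejecting a production such as a -> b at construction time enforces the rule that every left side contains at least one non-terminal.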

    Example 1.2. Consider the phrase structure grammar $G = (V, T, S, P)$ defined by

    $V = \{S, a, b\}$

    $T = \{a, b\}$

    $P = \{S \to aSb,\ S \to ab\}$

    By applying the first production $n - 1$ times, then applying the second production once:

    $S \Rightarrow aSb \Rightarrow aaSbb \Rightarrow \cdots \Rightarrow a^{n-1}Sb^{n-1} \Rightarrow a^n b^n$

    The only strings in the language generated by this grammar are $a^n b^n$, where $n \geq 1$.
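    The derivation in Example 1.2 can be replayed mechanically. The following sketch (the function name `derive_anbn` is an assumption) applies the first production n - 1 times and the second production once, recording every intermediate string.

```python
def derive_anbn(n: int) -> list[str]:
    """Return the derivation S => ... => a^n b^n as a list of strings (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    forms = ["S"]
    for _ in range(n - 1):
        forms.append(forms[-1].replace("S", "aSb", 1))  # production S -> aSb
    forms.append(forms[-1].replace("S", "ab", 1))       # production S -> ab
    return forms


print(" => ".join(derive_anbn(3)))  # S => aSb => aaSbb => aaabbb
```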

    Exercise 1.2. Describe the typical strings of the language generated by adding the production rule $S \to SS$ to the set of productions $P$ in the above example.

    Definition 1.3. Let $G = (V, T, S, P)$ be a phrase structure grammar. Let $w_0 = x y_0 z$ (i.e. the concatenation of $x$, $y_0$ and $z$) and $w_1 = x y_1 z$ be strings (words) over $V$.

    (1) If $y_0 \to y_1$ is a production of $G$, we say that $w_1$ is directly derivable from $w_0$, and we write $w_0 \Rightarrow w_1$.

    (2) If $w_0, w_1, \ldots, w_n$ are strings over $V$ such that

    $w_0 \Rightarrow w_1,\ w_1 \Rightarrow w_2,\ \ldots,\ w_{n-1} \Rightarrow w_n \quad (n \geq 0)$

    we say that $w_n$ is derivable from $w_0$, and we write $w_0 \stackrel{*}{\Rightarrow} w_n$.

    (3) The sequence of steps used to obtain $w_n$ from $w_0$ is called a derivation.
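    Direct derivability translates naturally into code: split the string as x, y0, z at every occurrence of a left side y0 and substitute y1. The sketch below is a minimal illustration; the names `directly_derivable` and `is_derivation`, and the encoding of productions as (y0, y1) pairs, are assumptions.

```python
def directly_derivable(w0: str, productions) -> set[str]:
    """All strings w1 obtained from w0 by rewriting one occurrence of some y0 as y1."""
    results = set()
    for y0, y1 in productions:
        start = w0.find(y0)
        while start != -1:
            # w0 = x + y0 + z with x = w0[:start], z = w0[start + len(y0):]
            results.add(w0[:start] + y1 + w0[start + len(y0):])
            start = w0.find(y0, start + 1)
    return results


def is_derivation(steps, productions) -> bool:
    """True if each string in steps is directly derivable from its predecessor."""
    return all(b in directly_derivable(a, productions)
               for a, b in zip(steps, steps[1:]))


P = [("S", "aSb"), ("S", "ab")]
print(sorted(directly_derivable("aSb", P)))                  # ['aaSbb', 'aabb']
print(is_derivation(["aaSbb", "aaaSbbb", "aaaabbbb"], P))    # True
```

    A one-element list counts as a derivation with n = 0 steps, matching the condition that n may be zero.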

    Example 1.3. Consider the phrase structure grammar $G = (V, T, S, P)$ defined by

    $V = \{S, a, b\}$

    $T = \{a, b\}$

    $P = \{S \to aSb,\ S \to ab\}$

    The string $aaSbb$ is directly derivable from $aSb$, since $S \to aSb$ is a production in the grammar.

    The string $aaaabbbb$ is derivable from $aaSbb$ since $aaSbb \Rightarrow aaaSbbb \Rightarrow aaaabbbb$, using the productions $S \to aSb$ and $S \to ab$ in succession.


    1.4 Languages Generated by Grammars

    The language generated by a formal grammar G, denoted by L(G), is the set of all strings over V that can be

    generated, starting with the start symbol, by applying production rules until no more non-terminal symbols are

    present in the string.

    Definition 1.4. Let $G = (V, T, S, P)$ be a phrase structure grammar. The language generated by $G$, denoted by $L(G)$, is the set of all strings of terminal symbols that are derivable from the start symbol $S$,

    $L(G) = \{w \in T^* : S \stackrel{*}{\Rightarrow} w\}$
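    For small grammars, part of L(G) can be enumerated by searching all derivations from the start symbol. The sketch below is an illustration under a stated assumption: no production shortens a string, so any intermediate string longer than the bound can be discarded. The name `language_up_to` and its parameters are hypothetical.

```python
from collections import deque


def language_up_to(productions, terminals, start="S", max_len=6):
    """Terminal strings of length <= max_len that are derivable from start.

    Assumes no production shortens a string, so intermediate strings
    longer than max_len can never lead back to a short word.
    """
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if all(symbol in terminals for symbol in form):
            words.add(form)  # no non-terminal left, so no production applies
            continue
        for left, right in productions:
            pos = form.find(left)
            while pos != -1:
                new = form[:pos] + right + form[pos + len(left):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(left, pos + 1)
    return words


P = [("S", "aSb"), ("S", "ab")]
print(sorted(language_up_to(P, {"a", "b"}), key=len))  # ['ab', 'aabb', 'aaabbb']
```

    Such a bounded search can only suggest what L(G) looks like; describing the whole language still requires an argument about all derivations.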

    In the following two examples we will find the language generated by a phrase structure grammar.

    Example 1.4. Describe the language generated by $G = (V, T, S, P)$ where

    $V = \{S, A, a, b\}$

    $T = \{a, b\}$

    $P = \{S \to aA,\ S \to b,\ A \to aa\}$

    Solution:

    From the start symbol $S$ we derive $aA$ using the production $S \to aA$.

    We can also use the production $S \to b$ to derive $b$.

    From $aA$, the production $A \to aa$ can be used to derive $aaa$.

    No additional words can be derived, so we conclude that $L(G) = \{aaa, b\}$.

    Example 1.5. Describe the language generated by $G = (V, T, S, P)$ where

    $V = \{S, 0, 1\}$

    $T = \{0, 1\}$

    $P = \{S \to 11S,\ S \to 0\}$

    Solution:

    From $S$ we can derive either $0$ using $S \to 0$ or $11S$ using $S \to 11S$.

    From $11S$ we can derive either $110$ or $1111S$, etc.

    At any stage of derivation we can either add two 1s at the end of the string, or terminate by adding a 0 at the end of the string.

    Hence $L(G) = \{1^{2n}0 : n \geq 0\} = \{0, 110, 11110, 1111110, \ldots\}$
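    The closed form found in Example 1.5 can be checked against the grammar for short words. The sketch below is illustrative only: it hard-codes the two productions, generates every word up to a length bound, and compares the result with a regular expression for an even number of 1s followed by a single 0. The names `PRODUCTIONS`, `CLOSED_FORM` and `generated_words` are assumptions.

```python
import re

PRODUCTIONS = [("S", "11S"), ("S", "0")]
CLOSED_FORM = re.compile(r"(11)*0")  # an even number of 1s, then one 0


def generated_words(max_len: int) -> set[str]:
    """Terminal strings of length <= max_len derivable from S."""
    words, frontier = set(), {"S"}
    while frontier:
        next_frontier = set()
        for form in frontier:
            for left, right in PRODUCTIONS:
                new = form.replace(left, right, 1)  # each form holds exactly one S
                if len(new) > max_len:
                    continue
                if "S" in new:
                    next_frontier.add(new)
                else:
                    words.add(new)
        frontier = next_frontier
    return words


words = generated_words(7)
print(sorted(words, key=len))                        # ['0', '110', '11110', '1111110']
print(all(CLOSED_FORM.fullmatch(w) for w in words))  # True
```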

    Example 1.6. Consider the grammar $G = (V, T, S, P)$ with

    $V = \{S, B, a, b, c\}$

    $T = \{a, b, c\}$

    $P = \{S \to aBSc,\ S \to abc,\ Ba \to aB,\ Bb \to bb\}$


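    Some examples of the derivation of strings in $L(G)$ are:

    $S \Rightarrow abc$

    $S \Rightarrow aBSc \Rightarrow aBabcc \Rightarrow aaBbcc \Rightarrow aabbcc$

    $S \Rightarrow aBSc \Rightarrow aBaBScc \Rightarrow aBaBabccc \Rightarrow aaBBabccc \Rightarrow aaBaBbccc \Rightarrow aaaBBbccc \Rightarrow aaaBbbccc \Rightarrow aaabbbccc$

    This grammar defines the language $L(G) = \{a^n b^n c^n \mid n > 0\}$, where $a^n$ denotes a string of $n$ consecutive $a$'s.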
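    Because two of the productions in Example 1.6 have left sides of length two, it is easy to slip when deriving by hand. The sketch below, with the hypothetical helper `is_single_step`, checks that each step of the longest derivation above rewrites exactly one occurrence of a left side.

```python
PRODUCTIONS = [("S", "aBSc"), ("S", "abc"), ("Ba", "aB"), ("Bb", "bb")]


def is_single_step(before: str, after: str) -> bool:
    """True if `after` arises from `before` by rewriting one occurrence of a left side."""
    for left, right in PRODUCTIONS:
        for i in range(len(before) - len(left) + 1):
            if (before.startswith(left, i)
                    and before[:i] + right + before[i + len(left):] == after):
                return True
    return False


derivation = ["S", "aBSc", "aBaBScc", "aBaBabccc", "aaBBabccc",
              "aaBaBbccc", "aaaBBbccc", "aaaBbbccc", "aaabbbccc"]
print(all(is_single_step(u, v) for u, v in zip(derivation, derivation[1:])))  # True
```

    The productions Ba -> aB move each B to the right past the a's, and Bb -> bb turns a B into a b only once it sits next to a b, which is what keeps the a's, b's and c's in order.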

    Example 1.7. Find two phrase structure grammars that generate the language $L = \{0^m 1^n : m, n \geq 0\}$.

    Solution:

    One answer is $G_1 = (V, T, S, P)$ where

    $V = \{S, 0, 1\}$

    $T = \{0, 1\}$

    $P = \{S \to 0S,\ S \to S1,\ S \to \lambda\}$

    The string $0^m 1^n$ is obtained by applying the first production $m$ times, the second production $n$ times, and finally the third production once to remove $S$.

    Another answer is $G_2 = (V, T, S, P)$ where

    $V = \{S, A, 0, 1\}$

    $T = \{0, 1\}$

    $P = \{S \to 0S,\ S \to S1A,\ S \to \lambda,\ A \to \lambda\}$
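    A bounded search can compare the two grammars on short words, although it is no substitute for a proof. The sketch below rests on assumptions that should be read carefully: the productions of the second grammar are taken to be S -> 0S, S -> S1A, S -> (empty string) and A -> (empty string), the empty string is written as "" in Python, and the function name `words_up_to` is hypothetical. Intermediate strings are pruned by their number of terminal symbols, which suffices here because every production that adds a non-terminal also adds a terminal.

```python
import re


def words_up_to(productions, max_terminals, nonterminals="SA", start="S"):
    """Terminal strings with at most max_terminals symbols derivable from start."""
    seen, stack, words = {start}, [start], set()
    while stack:
        form = stack.pop()
        if not any(s in nonterminals for s in form):
            words.add(form)  # only terminals remain (possibly the empty string)
            continue
        for left, right in productions:
            pos = form.find(left)
            while pos != -1:
                new = form[:pos] + right + form[pos + len(left):]
                terminal_count = sum(s not in nonterminals for s in new)
                if terminal_count <= max_terminals and new not in seen:
                    seen.add(new)
                    stack.append(new)
                pos = form.find(left, pos + 1)
    return words


G1 = [("S", "0S"), ("S", "S1"), ("S", "")]
G2 = [("S", "0S"), ("S", "S1A"), ("S", ""), ("A", "")]
TARGET = re.compile(r"0*1*")  # strings of the form 0^m 1^n

print(sorted(words_up_to(G1, 2), key=lambda w: (len(w), w)))
# ['', '0', '1', '00', '01', '11']
print(words_up_to(G1, 4) == words_up_to(G2, 4))  # True
```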

    Exercise 1.3. Check that the grammar $G_2$ does indeed generate $L$.
