derivation language


Background

Let G=(Σ,N,P,σ) be a formal grammar. A pair (W1,W2) of words over Σ is said to correspond to the production pq if

W1=u1pv1  and  W2=u2qv2

for some words ui,vj over Σ. We also say that (W1,W2) is a derivation step, and write W1W2.

Recall that a derivation in G is a finite sequencePlanetmathPlanetmath of words

W1,W2,,Wn

over Σ such that WiWi+1, for i=1,,n-1. The derivation is also written

W1W2Wn.

The derivation above has n-1 steps. Zero-step derivations are also permitted. These are just words over Σ.

The reflexive transitive closure of is *. Thus, V*W means that there is a derivation starting with V and ending with W. There may be more than one derivation from V to W.

If we consider each production as a “letter” in the alphabet P, then the above derivation can be represented by a “word” over P in the following manner: the “word” is formed by taking concatenationsMathworldPlanetmath of the “letters”, where concatenations correspond to successive applications of productions in P:

[p1q1][p2q2][pn-1qn-1].

Derivation language is thus a certain collectionMathworldPlanetmath of derivation words, formally defined below.

Definitions

Treating productions as “letters” is really nothing more than labeling each production with some symbol. Formally, call a labeling of an alphabet P a surjection f:FP, where F is some alphabet. For any pP, a label for p is an element xF such that f(x)=p. We will only be interested in injectivePlanetmathPlanetmath labeling (hence a bijection) from now on.

Definition. Suppose we are given a labeling f of the set P of productions in the grammar G. Given a derivation D:W1W2Wn, a derivation word U for D is defined inductively as follows:

  1. 1.

    if n=1, then the empty wordPlanetmathPlanetmathPlanetmath U:=λ,

  2. 2.

    if n=2, then U:=xF is a label for a production that W1W2 corresponds to,

  3. 3.

    if n>2, then U:=U1U2, where

    • U1 is a derivation word for the derivation W1Wi,

    • U2 is a derivation word for the derivation WiWn.

If U is a derivation word for derivation D, let us abbreviate this by writing f[U]=D. Note that we are not applying the labeling f to U, it is merely a notational convenience.

A derivation word is sometimes called a control word, for it defines or controls whether and how a word may be derived from another word. Note that any W1*W2 may correspond to several distinct derivations, and hence several distinct derivation words. Also, the same derivation word may correspond to distinct derivations.

For example, let G be a grammar over two symbols ( and ) with productions σλ, σ(σ), and σσσ (G generates the parenthesis language 𝐏𝐚𝐫𝐞𝐧𝟏) Label the productions as a,b,c respectively. Then the derivation σ*(()()) correspond to derivation words bcbbaa and bcbaba. Notice that σσ(σ)σ and σσσ(σ) both correspond to the derivation word b.

Definition. The derivation language of a grammar G=(Σ,N,P,σ) is the set of all derivation words for derivations on words generated by G. In other words, consider the labeling f:FP. The derivation language of G is the set

{UF*f[U] is a derivation of the form σ*u for some uN*}.

The derivation language of G is also called the Szilard language of G, named after its inventor, and is denoted by Sz(G).

For example, let G be the grammar over a the letter a, with productions given by σσ, σa. If the productions are labeled b,c, then Sz(G)={bncn0}. Note that L(G)={a}. Likewise, if G is the grammar over a, with productions σAσ, Aλ, and σa, labeled p,q,r respectively, then L(G)={a}. However, Sz(G) is quite different from Sz(G):

Sz(G) ={uF* u=vrw, where v{p,q}*,w{q}*,
#u(p)=#u(q) and #x(p)#x(q) for all xu}

where

  • F={p,q,r},

  • #u(r) means the number of occurrences of r in word u,

  • vu means that v is a prefix of u.

Remarks. Let G be a formal grammar.

  • Some properties of Sz(G):

    1. (a)

      Sz(G) is always context-sensitive.

    2. (b)

      If G is regularPlanetmathPlanetmath, so is Sz(G).

    3. (c)

      if G is context-free, Sz(G) may not be; in fact, for any context-free language L, there is a context-free grammar G such that L=L(G) but Sz(G) is not context-free.

    4. (d)

      There exists a context-free language L such that Sz(G) is not context-free for any grammar G generating L.

  • However, if one considers the subset SzL(G) of Sz(G) consisting of all derivation words corresponding to leftmost derivations, then SzL(G) is context-free.

  • It is an open question that, given any (context-sensitive) languagePlanetmathPlanetmath L, whether there is a grammar G such that L=Sz(G).

References

  • 1 A. Salomaa, Formal Languages, Academic Press, New York (1973).
  • 2 G. E. Révész, Introduction to Formal Languages, Dover Publications (1991).
Title derivation language
Canonical name DerivationLanguage
Date of creation 2013-03-22 18:58:15
Last modified on 2013-03-22 18:58:15
Owner CWoo (3771)
Last modified by CWoo (3771)
Numerical id 13
Author CWoo (3771)
Entry type Definition
Classification msc 68Q42
Classification msc 68Q45
Synonym Szilard language