Dyck language


The importance of using parentheses can be illustrated by looking at the following expression:

((2-1)(-(1+2))4)÷((1+2)2)

There is no ambiguity in computing the result, which is -2. If we remove all the parentheses in the expression, we get

2-1-1+24÷1+22

which does not make much sense, unless we know the order of arithmetic operations in advance. In additionPlanetmathPlanetmath, without using parentheses, the result will differ depending on how the order of operations is assigned.

Now, if we remove all the symbols in the first expression above except the parentheses, we get

(()(()))(())

an expression known as a word of “well-balanced” parentheses.

Formally, let Σ1={(,)} be an alphabet consisting of the left and right parentheses. Given word u over Σ1, let D1(u) be the number of occurrences of the left parentheses in u minus the number of occurrences of the right parentheses in u.

Definition. A word u over Σ1 is said to be a word of well-balanced parentheses, if

  1. 1.

    D1(u)=0, and

  2. 2.

    D1(v)0 for any prefix v of u.

For simplicity, we also say that u is a well-balanced word over Σ1.

Given this definition, the word above is well-balanced, but ()(())) and )())(( are not.

Definition. The set of well-balanced words over Σ1 is called the parenthesis language or Dyck language over Σ1, and is denoted by 𝐏𝐚𝐫𝐞𝐧𝟏.

The 1 in Σ1 denotes that only one type of parentheses is used in the languagePlanetmathPlanetmath.

By inductionMathworldPlanetmath, it is not hard to see that 𝐏𝐚𝐫𝐞𝐧𝟏 can be generated by the following grammarMathworldPlanetmath:

  1. 1.

    terminal set is Σ1,

  2. 2.

    non-terminal set is the singleton consisting of the start symbol σ,

  3. 3.

    productions are σλ (the empty wordPlanetmathPlanetmath), σσσ, and σ(σ).

As a result, 𝐏𝐚𝐫𝐞𝐧𝟏 is context-free. Furthermore, 𝐏𝐚𝐫𝐞𝐧𝟏 is a deterministic language, and hence unambiguous.

More generally, one can consider expressions involving more than one type of parentheses, such as [], {}, and .

Definition. Let Σn={(1,)1,,(n,)n} be an alphabet consisting of n types of parentheses, a left and a right one for each type. The Dyck language over Σn, written 𝐏𝐚𝐫𝐞𝐧𝒏, is the language generated by the following grammar:

  1. 1.

    terminal set is Σn,

  2. 2.

    non-terminal set is the singleton consisting of the start symbol σ,

  3. 3.

    productions are σλ (the empty word), σσσ, and σ(iσ)i for each i in {1,,n}.

As before, 𝐏𝐚𝐫𝐞𝐧𝒏 is context-free, and deterministic in particular, and hence unambiguous.

Words in 𝐏𝐚𝐫𝐞𝐧𝒏 are also called well-balanced. However, it is a little more complicated to characterize what a well-balanced word is. The two criteria above for the case n=1, while necessary, are not sufficient enough to describe the “well-nestedness” of parentheses when n>1. For example, if n=2, and the parentheses considered are {} and [], then the word [{]} satisfy both criteria above, but fail to be well-nested.

In order to fully characterize a well-balanced word over Σn, we first define, for each i=1,,n, the function Di much the same way as D1: so that Di(u) is the number of left parentheses (i, minus the number of right parentheses )i. Call a word u partially balanced if, for every i=1,,n:

  1. 1.

    Di(u)=0, and

  2. 2.

    Di(v)0 for every prefix v of u.

Next, write u=u1um, where each uk is a symbol in Σn. Let u(j) be the prefix u1uj. Given a position j in u, if uj is a left parenthesis, say (i, then there is a corresponding right parenthesis )i in u to the right of uj, positioned at, say k, satisfying the equation Di(u(j))=Di(u(k))+1. This is a straightforward result of the two criteria above. Let j+ be the least such position such that the equation holds. Now, if uj is right parenthesis, then for some position k<j, we have k+=j. This means that, given any position j in u, there is a unique pair of positions (j0,j1), such that

  • either j=j0 or j=j1, and

  • j0+=j1.

Define, for each j, the word u[j] to be the subword of u with starting position j0 and ending position j1. Now, we are ready to state the last criterion in order that u be well-balanced:

  1. 3.

    for each position j in u, the word u[j] is partially balanced.

It can be shown, the set of words satisfying all three criteria above is 𝐏𝐚𝐫𝐞𝐧𝒏. Furthermore, if n=1, the third criterion can be derived from the first two criteria.

Other than being deterministic, some basic properties of 𝐏𝐚𝐫𝐞𝐧𝒏:

  • 𝐏𝐚𝐫𝐞𝐧𝒎𝐏𝐚𝐫𝐞𝐧𝒏, for any mn.

  • 𝐏𝐚𝐫𝐞𝐧𝒏 is monoidal (it is a monoid): λ𝐏𝐚𝐫𝐞𝐧𝒏, and if u,v𝐏𝐚𝐫𝐞𝐧𝒏, then uv𝐏𝐚𝐫𝐞𝐧𝒏.

  • More generally, 𝐏𝐚𝐫𝐞𝐧𝒏 is insertion closed: if u,vw𝐏𝐚𝐫𝐞𝐧𝒏, then vuw𝐏𝐚𝐫𝐞𝐧𝒏.

  • 𝐏𝐚𝐫𝐞𝐧𝒏 is also deletion closed: if u1vu2,v𝐏𝐚𝐫𝐞𝐧𝒏, then u1u2𝐏𝐚𝐫𝐞𝐧𝒏.

  • 𝐏𝐚𝐫𝐞𝐧𝒏 is not prefix-free.

  • Suppose f:ΣnΣm* is a function, such that for each i=1,,n, f maps (i and )i to some ui and vi respectively such that uivi𝐏𝐚𝐫𝐞𝐧𝒎. Then the extensionPlanetmathPlanetmathPlanetmath f*:Σn*Σm*, when restricted to 𝐏𝐚𝐫𝐞𝐧𝒏, is a language homomorphismPlanetmathPlanetmathPlanetmathPlanetmathPlanetmathPlanetmathPlanetmathPlanetmath from 𝐏𝐚𝐫𝐞𝐧𝒏 to 𝐏𝐚𝐫𝐞𝐧𝒎.

Remark. It can be shown that the number of words of length 2n in 𝐏𝐚𝐫𝐞𝐧𝟏 is the n-th Catalan numberMathworldPlanetmath. For a proof of this, see this entry (http://planetmath.org/ExampleOfCatalanNumbers). The idea is to visualize a word in 𝐏𝐚𝐫𝐞𝐧𝟏 as a path in a two-dimensional latticeMathworldPlanetmath, which can be generated as follows: given a word u of length 2n, the path p(u) starts from (0,0) (which corresponds to the first symbol of u). If point (i,j) is on the path, then the next point on the path is (i+1,k), where k=j+1 if the i-th symbol of u is (, otherwise k=j-1. So the increase from one point to the next is either (1,1), or (1,-1). As a result, the path P(u) has the property that it never dips below the x-axis, and it ends at (2n,0). Paths defined this way are also known as Dyck pathsMathworldPlanetmath.

References

  • 1 D. C. Kozen, Automata and Computability, Springer, New York (1997).
  • 2 J.E. Hopcroft, J.D. Ullman, Formal Languages and Their RelationMathworldPlanetmathPlanetmath to Automata, Addison-Wesley, (1969).
Title Dyck language
Canonical name DyckLanguage
Date of creation 2013-03-22 18:55:25
Last modified on 2013-03-22 18:55:25
Owner CWoo (3771)
Last modified by CWoo (3771)
Numerical id 17
Author CWoo (3771)
Entry type Definition
Classification msc 68Q45
Classification msc 68Q42
Synonym well-nested
Synonym fully balanced
Synonym parenthesis language
Related topic DyckPaths
Related topic ExampleOfCatalanNumbers
Defines well-balanced