\chapter[Compilers]{Compilers}
\index{algorithms of compilers}\index{compilers}%
When a programmer writes down the solution of a problem, she
writes a program in a special programming language.
These programming languages are very different from the native
languages of computers,
the \ki{machine languages.} Therefore we have to produce the
executable forms
of the programs created by the programmer.
We need a software or hardware tool that translates
the \ki{source language program} -- written in a high level
programming
language -- into the \ki{target language program,} written in a lower
level programming language, mostly machine code.
\index{source!language program}%
\index{program!source!language}%
\index{target!language program}%
\index{program!target!language}%
There are two fundamental methods to execute a program written in a
higher
level language. The first is using an \ki{interpreter.}
\index{interpreter}%
In this case, the generated machine code is not saved but executed
immediately. The interpreter can be considered a special
computer whose machine code is the high level language.
Essentially, when we use an interpreter, we create a two-level
machine: its lower level is
the real computer, on which the higher level, the interpreter, is built.
The higher level is usually realised by a computer program,
but, for some programming languages, there are special hardware
interpreter
machines.
The second method is using a \ki{compiler} program.
\index{compiler}%
The difference between this method and the first is that here the result
of the translation is not executed immediately, but is saved in an
intermediate file called the
\ki{target program.}
The target program may be executed later, and only then is the result of
the program
obtained. In this case,
in contrast with interpreters, the times of translation and
execution are distinguishable.
With respect to translation, the two methods are
identical, since both the interpreter and the compiler generate
target programs. For this reason we speak about compilers only.
We will deal with these translator programs, called compilers
(Figure \ref{fig-fordprog}).
\begin{figure}[h!]
\begin{picture}(154,60)(0,-8)
\put(160,5){\fbox{\makebox(55,20)%
{\shortstack{\textit{translator}}}}}
\put(60,5){\fbox{\makebox(65,20)%
{\shortstack{\textit{source language}\\ \textit{program}}}}}
\put(250,5){\fbox{\makebox(65,20)%
{\shortstack{\textit{target language}\\ \textit{program}}}}}
\put(147,12.5){\makebox(0,0){$\longrightarrow$}}
\put(237,12.5){\makebox(0,0){$\longrightarrow$}}
%
\end{picture}
\caption{The translator.}
\label{fig-fordprog}
\end{figure}
\newpage
Our task is to study the algorithms of compilers.
This chapter deals with the translators of high level imperative
programming languages; the translation methods of logical or
functional languages will not be investigated.
First the structure of compilers will be given. Then we will deal with
scanners, that is, lexical analysers.
\index{scanner}%
\index{lexical!analyser}%
\index{analyser!lexical}%
In the topic of parsers -- syntactic analysers -- the two most
successful
methods will be studied:
\index{syntactic!analyser}%
\index{analyser!syntactic}%
the \textit{LL$(1)$} and the \textit{LALR}$(1)$ parsing methods.
\index{LL(1) parser}%
\index{parser!LL(1)}%
\index{LALR(1)!parser}%
\index{parser!LALR(1)}%
The advanced methods of semantic analysis
\index{semantic!analyser}%
\index{analyser!semantic}%
use \textit{O-ATG} grammars,
\index{O-ATG grammar}%
\index{grammar!O-ATG}%
and the task of code generation
\index{code!generator}%
is also described by grammars of this type.
These topics are not considered in this book, nor will we study such
important and
interesting problems as
symbol table handling, error repairing or code optimising.
\index{symbol!table}%
\index{error!repair}%
\index{code!optimiser}%
The reader can find modern and efficient methods
for these tasks in the bibliography.
%%%
\section{The structure of compilers}
A compiler translates the source language program (in short, source
program)
\index{source!language program}%
\index{program!source!language}%
\index{source!program}%
\index{program!source}%
into a target language program (in short, target program).
\index{target!language program}%
\index{program!target!language}%
\index{target!program}%
\index{program!target}%
Moreover, it creates a list by which the programmer can check her
own
program. This list contains the detected errors, too.
Using the notation \textit{program~(input)(output)}
the compiler can be written as
\index{compiler}%
\index{list}%
\begin{center}
\textit{compiler~(source program)(target program, list)} \koz .
\end{center}
In what follows, the structure of compilers is studied, and the tasks of
the program elements are described,
using the above
notation.
The first program of a compiler transforms the source language
program into a character stream that is easy to handle. This program is
the \ki{source handler.}
\index{source!handler}%
\index{handler!source}%
\index{character stream}%
\begin{center}
\textit{source handler~(source program)(character stream)}.
\end{center}
The form of the source program depends
on the operating system.
The source handler reads the file of the source program using the
operating system, and omits the characters signalling
the end of lines, since these characters have no importance in the next
steps
of compilation.
This modified, ``filtered'' character stream will be the input data of the
next
steps.
The list created by the compiler has to contain the original source
language
program written by the programmer, not this modified character
stream. Hence
we define a \ki{list handler} program,
\index{list-handler}%
\index{handler!list}%
\begin{center}
\textit{list handler~(source program, errors)(list)} \koz ,
\end{center}
which creates the list according to the file conventions of the operating
system,
and puts this list on secondary memory.
It is practical to join the source handler and the list handler programs,
since they have the same input files. This joined program is also called
the \ki{source
handler.}
\index{source!handler}%
\index{handler!source}%
\begin{center}
\textit{source handler~(source program, errors)(character stream,
list)} \koz .
\end{center}
\indent
The target program is created by the compiler from the generated
target code.
It is also located on secondary memory,
usually in a relocatable binary form. Of course this form depends on
the operating
system.
This task is done by the \ki{code handler} program.
\index{code!handler}%
\index{handler!code}%
\begin{center}
\textit{code handler~(target code)(target program)} \koz .
\end{center}
Using the above programs, the structure of a compiler is the following
(Figure \ref{fig-comszerk}):
\begin{verse}
\textit{source handler~(source program, errors)
(character stream, list)}, \\
$\star$\textit{compiler}$\star$~\textit{(character stream)(target code,
errors)},
\\
\textit{code handler~(target code)(target program)} \koz .
\end{verse}
\begin{figure}[t!]
\begin{picture}(110,98)(-5,-5)
\put(163,35){\fbox{\makebox(55,20)%
{\shortstack{$\star$\textit{compiler}$\star$}}}}
\put(75,35){\fbox{\makebox(55,20)%
{\shortstack{\textit{source}\\ \textit{handler}}}}}
\put(75,75){\fbox{\makebox(55,20)%
{\shortstack{\textit{source}\\ \textit{program}}}}}
\put(75,-5){\fbox{\makebox(55,20)%
{\textit{list}}}}
\put(250,35){\fbox{\makebox(55,20)%
{\shortstack{\textit{code}\\ \textit{handler}}}}}
\put(250,-5){\fbox{\makebox(55,20)%
{\shortstack{\textit{target}\\ \textit{program}}}}}
%
\put(151,45){\makebox(0,0){$\longrightarrow$}}
\put(151,39){\makebox(0,0){$\longleftarrow$}}
\put(105,65){\makebox(0,0){$\downarrow$}}
\put(105,25){\makebox(0,0){$\downarrow$}}
\put(238,42){\makebox(0,0){$\longrightarrow$}}
\put(280,25){\makebox(0,0){$\downarrow$}}
%
\end{picture}
\caption{The structure of compilers.}
\label{fig-comszerk}
\end{figure}
This decomposition is not a sequence: the three
program elements are not executed sequentially. The decomposition
consists of three independent working units. Their connections are
indicated by their inputs and outputs.
In what follows we do not deal with the handlers, because of their
dependence on computers, operating systems and peripherals --
although the outer form, the connection with the user and the
availability of the compiler are determined mainly by these programs.
The task of the program
$\star$\textit{compiler}$\star$ is the translation.
It consists of two main subtasks: analysing the input character stream,
and synthesising the target code.
The first problem of the \textit{analysis}
\index{analyser}%
is to determine which characters belong together
in the character stream. These are the symbolic items, e.g.,
the constants, names of variables, keywords, operators.
This is done by the \ki{lexical analyser}, in short,
\ki{scanner.}
\index{lexical!analyser}%
\index{analyser!lexical}%
\index{scanner}%
From the character stream the scanner makes a \ki{series of symbols}
and during this task it detects \ki{lexical errors.}
\index{series of symbols}%
\index{lexical!error}%
\index{error!lexical}%
\begin{center}
\textit{scanner~(character stream)%
(series of symbols, lexical errors)} \koz.
\end{center}
This series of symbols is the input of the
\ki{syntactic analyser,} in short, \ki{parser.}
\index{syntactic!analyser}%
\index{analyser!syntactic}%
\index{parser}%
Its task is to check the syntactic structure of the
program. This process is similar to the checking of the verb, the
subject, the predicates and the attributes of a sentence by a language
teacher in a language lesson.
The errors detected during this analysis are the \ki{syntactic errors.}
\index{syntactic!error}%
\index{error!syntactic}%
The result of the syntactic analysis is the syntax tree of the program,
or some similar equivalent structure.
\begin{center}
\textit{parser~(series of symbols)%
(syntactically analysed program, syntactic errors)} \koz.
\end{center}
The third program of the analysis is the \ki{semantic analyser.}
\index{semantic!analyser}%
\index{analyser!semantic}%
Its task is to check the static semantics. For example, when the
semantic analyser checks
declarations and the types of variables
in the expression \texttt{a + b}, it verifies whether the variables
\texttt{a} and \texttt{b} are declared, whether they are of the same
type, and whether they have values. The
errors detected by this program are the \ki{semantic errors.}
\index{semantic!error}%
\index{error!semantic}%
\begin{center}
\textit{semantic analyser~(syntactically analysed program)%
(analysed program, semantic errors)} \koz .
\end{center}
The output of the semantic analyser is the input of the
programs of \ki{synthesis.}
\index{synthesis}%
The first step of the synthesis is the code generation,
which is done by the \ki{code generator:}
\index{code!generator}%
\begin{center}
\textit{code generator~(analysed program)(target code)}.
\end{center}
The target code usually depends on the computer and the operating
system.
It is usually an assembly language program or machine code.
\index{assembly language program}%
\index{program!assembly language}%
The next step of synthesis is the \ki{code optimisation:}
\index{code!optimiser}%
\begin{center}
\textit{code optimiser~(target code)(target code)}.
\end{center}
The code optimiser transforms the target code in such a way that
the new code is better in many respects, for example in running time or
size.
\begin{figure}[t!]
\begin{picture}(115,170)(5,15)
\put(90,20){\fbox{\makebox(70,140){%
\put(4,60){\fbox{\makebox(55,20)%
{\shortstack{\textit{scanner}}}}}
\put(4,15){\fbox{\makebox(55,20)%
{\shortstack{\textit{parser}}}}}
\put(4,-30){\fbox{\makebox(55,20)%
{\shortstack{\textit{semantic}\\ \textit{analyzer}}}}}
\put(33,100){\makebox(0,0){\textit{ANALYSIS}}}
\put(33,47){\makebox(0,0){$\downarrow$}}
\put(33,3){\makebox(0,0){$\downarrow$}}
}}}
%
\put(200,20){\fbox{\makebox(70,140){%
\put(4,15){\fbox{\makebox(55,20)%
{\shortstack{\textit{code}\\ \textit{generator}}}}}
\put(4,-30){\fbox{\makebox(55,20)%
{\shortstack{\textit{code}\\ \textit{optimizer}}}}}
\put(33,100){\makebox(0,0){\textit{SYNTHESIS}}}
\put(33,3){\makebox(0,0){$\downarrow$}}
}}}
%
\put(185,155){\makebox(0,0){$\longrightarrow$}}
%
\end{picture}
\caption{The programs of the analysis and the synthesis.}
\label{fig-analszin}
\end{figure}
As follows from the considerations above, a compiler consists of
the following components (the structure of the
$\star$compiler$\star$ program is shown in Figure
\ref{fig-analszin}):
\begin{verse}
\textit{source handler~(source program, errors)%
(character stream, list)}, \\
\textit{scanner~(character stream)%
(series of symbols, lexical errors)}, \\
\textit{parser~(series of symbols)%
(syntactically analysed program, syntactic errors)}, \\
\textit{semantic analyser~(syntactically analysed program)%
(analysed program, semantic errors)}, \\
\textit{code generator~(analysed program)(target code)}, \\
\textit{code optimiser~(target code)(target code)}, \\
\textit{code handler(target code)(target program)}.
\end{verse}
The algorithm of the part of the compiler that performs the analysis
and synthesis
is the following:
\begin{alg}{$\star$Compiler$\star$}%
\inddef{\textsc{Compiler}@$\star$\textsc{Compiler}$\star$}
1 \' determine the symbolic items in the text of source program \\
2 \' check the syntactic correctness of the series of symbols \\
3 \' check the semantic correctness of the series of symbols \\
4 \' generate the target code \\
5 \' optimise the target code
\end{alg}
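The pipeline described by the algorithm above can be sketched in code. The following Python sketch is illustrative only: every component function is a hypothetical placeholder standing for the algorithms of the later sections.

```python
# A minimal sketch of the *compiler* pipeline described above.
# Every stage here is a hypothetical placeholder; a real compiler
# replaces each one with the algorithms of the later sections.

def scanner(character_stream):
    # character stream -> (series of symbols, lexical errors)
    return character_stream.split(), []

def parser(symbols):
    # series of symbols -> (syntactically analysed program, syntactic errors)
    return {"tree": symbols}, []

def semantic_analyser(tree):
    # syntactically analysed program -> (analysed program, semantic errors)
    return tree, []

def code_generator(analysed):
    # analysed program -> target code (a placeholder instruction list)
    return ["LOAD", "ADD"] if analysed else []

def code_optimiser(target_code):
    # target code -> (better) target code; here the identity function
    return target_code

def compiler(character_stream):
    symbols, lex_errors = scanner(character_stream)
    tree, syn_errors = parser(symbols)
    analysed, sem_errors = semantic_analyser(tree)
    target = code_optimiser(code_generator(analysed))
    return target, lex_errors + syn_errors + sem_errors

target_code, errors = compiler("a + b")
```

Note how the connections between the stages are given only by their inputs and outputs, exactly as in the decomposition above.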
\noindent The objects mentioned in the first two points will be analysed
in
the
next sections.
\begin{gyak}
\ujgyak Using the above notations, give the structure of interpreters.\gyakindex{interpreter}
\ujgyak Take a programming language, and write program fragments in
which
there are lexical, syntactic and semantic
errors.\gyakindex{error!lexical}\gyakindex{error!syntactic}\gyakindex{error!semantic}
\ujgyak Give aspects in which the code optimiser can
create better target code than the original.
\end{gyak}
%%%
\section{Lexical analysis}
\label{sec-lex}
The \textit{source handler} transforms the source
program into a character stream.
The main task of the lexical analyser (scanner) is recognising the \ki
{symbolic units} in
this character stream. These symbolic units are called \ki{symbols.}
Unfortunately, in different programming languages the same symbolic
units
consist of different character streams, and different symbolic units may
consist of the same character streams. For example, there is a
programming language in which the
character streams \texttt{1.} and \texttt{.10} mean real numbers. If we
concatenate these symbols, then the result is the \texttt{1..10}
character stream.
The fact that the sign of an algebraic operation is missing between the
two numbers
will be detected by the next analyser, which does the
syntactic analysis. However,
there are programming languages in which this character stream
is decomposed into three components: 1 and 10 are the lower and
upper limits of
an interval type variable.
The lexical analyser determines not only the characters of a symbol, but
also the
attributes derived from the surrounding text. Such attributes are, e.g.,
the type and value of a symbol.
The scanner assigns codes to the symbols, the same code to the same
sort of symbols.
For example the code of all integer numbers is the same; another
unique code is assigned to variables.
The lexical analyser transforms the character stream into a series of
symbol codes,
\index{series of symbols}%
and the attributes of a symbol are written in this series
immediately after the code of the symbol concerned.
The output information of the lexical analyser is not \idez{readable}:
it is usually a series of binary codes. We note that, from the viewpoint
of the compiler, from this step of the compilation onwards it does not
matter from which characters a symbol was made; e.g., the code of the
\textit{if}
symbol
could have been made from the English \textit{if}, the Hungarian
\textit{ha} or the German
\textit{wenn} characters.
Therefore, for a programming language using English keywords, it is easy
to construct another programming language using keywords of another
language.
In the compiler of this new programming language only the lexical
analyser has
to be modified; the other parts of the compiler remain unchanged.
%%%
\subsection{The automaton of the scanner}
\label{subsec-lex-aut}%
The exact definition of symbolic units can be given by
\textit{regular grammars}, \textit{regular expressions} or
\textit{deterministic finite automata}.
\index{regular!grammar}%
\index{grammar!regular}%
\index{regular!expression}%
\index{expression!regular}%
\index{deterministic!finite!automaton}%
\index{finite!automaton!deterministic}%
\index{automaton!deterministic!finite}%
The theories of regular grammars, regular expressions and
deterministic finite automata were studied in previous chapters.
In principle the lexical analyser could be a part of the syntactic
analysis. The main reason to distinguish these analysers
is that a lexical analyser made from a regular grammar
is much simpler than a lexical analyser made from a context-free
grammar. Context-free grammars are used to create syntactic
analysers.
One of the most popular methods to create the lexical analyser
is the following:
\begin{enumerate}
\item describe symbolic units in the language of regular expressions,
and from this information construct the deterministic finite automaton
which is equivalent to these regular expressions,
\item implement this deterministic finite automaton.
\end{enumerate}
We note that regular expressions are used for describing symbols
because they are more convenient and readable than
regular grammars.
There are standard programs, such as the \texttt{lex} of UNIX systems,
\index{lex}%
that generate a complete lexical analyser from regular
expressions. Moreover, there are generator programs that
produce the automaton of the scanner, too.
A very simple implementation of the deterministic finite automaton uses
multidirectional \key{case} instructions. The conditions of the branches
are the characters of the state transitions, and the instructions of a
branch represent the new state the automaton reaches when it carries out
the
given state transition.
The main principle of the lexical analyser is building a symbol from
the \textit{longest series of characters}. For example the string
\texttt{ABC} is one three-letter symbol, rather than three one-letter
symbols. This means that the alternative instructions of the \key{case}
branches read characters as long as they are parts of the constructed
symbol.
Functions can belong to the final states of the automaton. For
example, such a
function may convert constant symbols into the inner binary forms of
constants,
or write identifiers into the symbol table.
The input stream of the lexical analyser contains tabulator and
space characters as well, since the source handler expunges only the
carriage return and
line feed characters.
In most programming languages it is possible to write many spaces or
tabulators
between symbols. From the point of view of compilers these characters
have no
importance after their recognition, hence they are called \ki{white
spaces.}
\inddef{white space}%
Expunging white spaces is the task of the lexical analyser. The white
space is described by
the following regular expression:
$$( \textit{space\/} \mid \textit{tab\/} )^{*} \koz , $$
\noindent where \textit{space} and \textit{tab} denote the
characters which build the white space symbols, and $\mid$ is the
symbol
for the \textit{or} operation. No action has to be taken for
these white space symbols; the scanner does not pass these
symbols to
the syntactic analyser.
Some examples of regular expressions:
\begin{pelda}{}
\label{pld-regkif}%
Introduce the following notations:
Let $D$ be an arbitrary digit, and let $L$ be an arbitrary letter,
$$ D\in \{0,1,\ldots,9\},\ \textnormal{and}\
L \in \{a,b,\ldots,z,A,B,\ldots,Z\} \koz ,$$
\noindent
let the non-visible characters be denoted by their short names, and
let $\varepsilon$ be the name of the empty character stream.
$\textit{Not\/}( a )$ denotes a character distinct from $a$. The regular
expressions are:
\begin{enumerate}
\item real number:
$(+ \mid - \mid \varepsilon )D^{+}.D^{+}(\mathsf{e}(+ \mid - \mid
\varepsilon )D^{+} \mid \varepsilon ) $,
\item positive integer and real number:
$(D^{+}(\varepsilon \mid .)) \mid (D^{*}.D^{+})$,
\item identifier:
$(L \mid \_\, )(L \mid D \mid \_\, )^{*}$,
\item comment:
$\textnormal{-\,-}(\textit{Not\/}(\mathit{eol}))^{*}\mathit{eol} $,
\item comment terminated by \#\# :
$\#\#((\# \mid \varepsilon )\textit{Not\/}(\#))^{*}\#\# $,
\item string of characters:
$"(\textit{Not\/}(") \mid "\,")^{*}\ " $.
\end{enumerate}
\begin{figure}[t!]
\hspace*{\fill}
\xymatrix{
& & *++[o][F]{} \ar[dr]^D \\
\ar[r] & *++[o][F]{} \ar[ur]^{\centerdot} \ar[dr]^D & &
*++[o][F=]{} \ar@(ur,dr)[]^D \\
& & *++[o][F=]{} \ar[ur]^{\centerdot} \ar@(dr,dl)[]^D
}
\hspace*{\fill}
\caption{The positive integer and real number.}
\label{fig-lexpl2}
\end{figure}
Deterministic finite automata constructed from regular expressions 2
and 3
are in Figures \ref{fig-lexpl2} and \ref{fig-lexpl3}.
\end{pelda}
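Some of these regular expressions can be transcribed into the notation of a practical regex engine. The following Python sketch is an illustration; restricting letters to the ASCII range is our assumption, matching the definition of $L$ above.

```python
import re

# Transcriptions of regular expressions 2, 3, 4 and 6 of the example
# into Python's re notation (letters restricted to ASCII, as L above).
patterns = {
    # 2. positive integer and real number: (D+(eps | .)) | (D* . D+)
    "number":     re.compile(r"\d+\.?|\d*\.\d+"),
    # 3. identifier: (L | _)(L | D | _)*
    "identifier": re.compile(r"[A-Za-z_][A-Za-z0-9_]*"),
    # 4. comment: --(Not(eol))* eol
    "comment":    re.compile(r"--[^\n]*\n"),
    # 6. string of characters: "(Not(") | "")*"
    "string":     re.compile(r'"([^"]|"")*"'),
}

def matches(kind, text):
    # fullmatch: the whole text must be one symbol of the given kind
    return patterns[kind].fullmatch(text) is not None
```

For instance, \texttt{12.3} and \texttt{1.} are accepted as numbers, while \texttt{1..10} is not one symbol.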
\begin{figure}[t!]
\hspace*{\fill}
\xymatrix{
\\
\ar[r] & *++[o][F]{} \ar[r]^{L \mid \_}
& *++[o][F=]{} \ar@(ur,dr)[]^{L \mid D \mid \_}
}
\hspace*{\fill}
\caption{The identifier.}
\label{fig-lexpl3}
\end{figure}
The task of the lexical analyser is to determine the text of symbols,
but not all the characters of a regular expression belong to the symbol.
As in the 6th example, the first and the last
\texttt{"} characters do not belong to the symbol. To solve this
problem,
a buffer is created for the scanner. After the recognition of a symbol,
the characters of this symbol will be in the buffer. Now the
deterministic finite automaton is supplemented by a $T$ transfer
function, where $T(a)$ means that the character $a$ is inserted into
the buffer.
\begin{figure}[t!]
\hspace*{\fill}
$\xymatrix{
\ar[r] & *++[o][F]{} \ar[r]^{\_} & *++[o][F]{} \ar[r]^{\_}
& *++[o][F]{} \ar[r]^{\textit{eol}} \ar@(dr,dl)[]^{\textit{Not\/}
(\mathit{eol\/})}
& *++[o][F=]{}
}$
\hspace*{\fill}
\caption{A comment.}
\label{fig-lext4}
\end{figure}
\begin{pelda}
The 4th and 6th regular expressions of Example
\ref{pld-regkif}
are supplemented by the $T$ function; the automata for these
expressions are shown in Figures
\ref{fig-lext4} and \ref{fig-lext6}.
The automaton of the 4th regular expression has no $T$ function,
since it recognises comments.
The automaton of the 6th regular expression recognises
\texttt{This is a "string"} from the character string
\texttt{"This is a ""string"""}.
\end{pelda}
\begin{figure}[t!]
\hspace*{\fill}
\xymatrix{
\ar[r] & *++[o][F]{} \ar[r]^{"}
& *++[o][F]{} \ar@(u,r)[]^{T(\textit{Not\/}("))} \ar@/^{5mm}/[d]^
{"} \\
& & *++[o][F=]{} \ar@/^{5mm}/[u]^{T(")}
}
\hspace*{\fill}
\caption{The character string.}
\label{fig-lext6}
\end{figure}
Now we give the algorithm of the lexical analyser defined by a
deterministic finite automaton. (A state given by a one-element set will
be denoted by
the only element of the set.)
Let $A = (Q,\Sigma, \delta, q_0, F)$ be the deterministic finite
automaton
which is the scanner. We augment the alphabet $\Sigma$
with a new notion:
let \textit{others} denote all the characters not in
$\Sigma$. Accordingly, we modify the transition function $\delta$:
$$\delta'(q,a) = \left\{ \begin{array}{ll}
\delta(q,a), & \textnormal{if}\ a \neq \textit{others}
\koz , \\
\emptyset, & \textnormal{otherwise
\koz .}
\end{array}
\right. $$
The algorithm of the analysis, using the augmented automaton $A'$, is
the following:
\begin{algN}{Lex-analyse$(x\#,A')$}\inddef{\textsc{Lex-analyse}}
1 \' $q \leftarrow q_0$, $a \leftarrow$ {first character of} $x$ \\
2 \' $s' \leftarrow \textit{analyzing}$ \\
3 \' \key{while} \= $a \neq \#$ and $s' = \textit{analyzing}$\\
4 \' \> \key{do} \key{if} \= $\delta'(q,a) \neq \emptyset$ \\
5 \' \> \> \key{then} \= $q \leftarrow \delta'(q,a)$ \\
6 \' \> \> \> $a \leftarrow $ {next character of} $x$ \\
7 \' \> \> \key{else} \> $s' \leftarrow \textit{error}$ \\
8 \' \key{if} \= $s' = \textit{analyzing}$ and $q \in F$ \\
9 \' \> \key{then} \= $s' \leftarrow \textit{O.K.}$ \\
10 \' \> \key{else} \> $s' \leftarrow \textit{ERROR}$ \\
11 \' \key{return} $s',a$
\end{algN}
The algorithm has two parameters: the first one is the input character
string
terminated by \#, the second one is the automaton of the scanner.
In line 1 the state of the scanner is set to $q_0$, the start state
of the
automaton, and the first
character of the input string is determined.
The variable $s'$ indicates that the algorithm is analysing the input
string;
the text \textit{analyzing} is assigned to this variable in line 2.
In line 5 a state transition is executed. It can be seen that the
above
augmentation is needed to terminate in the case of an unexpected,
invalid
character.
In lines 8--10 the value \textit{O.K.} means that the analysed
character string is
correct, and the value \textit{ERROR} signals that a lexical error was
detected.
In the case of successful termination the variable $a$ contains the
$\#$ character;
at erroneous termination it contains the invalid character.
We note that the algorithm \textsc{Lex-Analyse} recognises one symbol
only,
and then it terminates. A program written in a programming
language
consists of a lot of symbols, hence after recognising a symbol, the
algorithm has to
be continued by detecting the next symbol.
The work of the analyser is restarted at the start state of the
automaton.
We propose the full algorithm of the lexical analyser as an exercise
(see Exercise \ref{fld-lex-teljes}).
\begin{pelda}
The automaton of the identifier in point 3 of Example
\ref{pld-regkif} is shown in Figure
\ref{fig-lexpl3}.
The start state is 0, and the final state is 1. The transition function
of
the
automaton is the following:
\vspace{-6mm}
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
\begin{center}
\begin{tabular}{c|ccc}
$\delta$ & $L$ & $\_$ & $D$ \\
\hline
0 & 1 & 1 & $\emptyset$ \\
1 & 1 & 1 & 1 \\
\end{tabular}
\end{center}
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
The augmented transition function of the automaton:
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
\begin{center}
\begin{tabular}{c|cccc}
$\delta'$ & $L$ & $\_$ & $D$ & \textit{others} \\
\hline
0 & 1 & 1 & $\emptyset$ & $\emptyset$ \\
1 & 1 & 1 & 1 & $\emptyset$ \\
\end{tabular}
\end{center}
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
\newpage
The algorithm \textsc{Lex-Analyse} gives the series of states
$0111111$ and the sign \textit{O.K.} for the input string
\texttt{abc123\#}, it gives the sign \textit{ERROR} for the input
string
\texttt{9abc\#}, and the series $0111$ and the sign \textit{ERROR} for
the input string
\texttt{abc}$\chi$\texttt{123}.
\end{pelda}
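The algorithm \textsc{Lex-analyse} and the augmented automaton of this example can be transcribed directly. In the following Python sketch the value \texttt{None} plays the role of the empty set $\emptyset$:

```python
# A direct transcription of the Lex-analyse algorithm, applied to the
# augmented identifier automaton of the example (states 0 and 1,
# final state 1; every character outside L, D, _ acts as "others").

def char_class(ch):
    if ch.isalpha() and ch.isascii():
        return "L"
    if ch.isdigit():
        return "D"
    if ch == "_":
        return "_"
    return "others"

# delta'(q, a): missing entries (None) play the role of the empty set.
DELTA = {
    (0, "L"): 1, (0, "_"): 1,
    (1, "L"): 1, (1, "_"): 1, (1, "D"): 1,
}

def lex_analyse(text):
    """text is terminated by '#'; returns (status, last character read)."""
    q, i = 0, 0
    a = text[i]
    status = "analyzing"
    while a != "#" and status == "analyzing":
        nq = DELTA.get((q, char_class(a)))
        if nq is not None:
            q = nq                      # state transition (line 5)
            i += 1
            a = text[i]                 # next character (line 6)
        else:
            status = "error"            # invalid character (line 7)
    if status == "analyzing" and q == 1:  # 1 is the final state
        status = "O.K."
    else:
        status = "ERROR"
    return status, a
```

On the inputs of the example this sketch gives \textit{O.K.} for \texttt{abc123\#} and \textit{ERROR} for \texttt{9abc\#} and \texttt{abc}$\chi$\texttt{123}.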
%%%
\subsection{Special problems}
\label{sec-spec-probl}%
In this subsection we investigate the problems that emerge during the
running
of the lexical analyser, and supply solutions for these problems.
%%%
\subsubsection{Keywords, standard words}
All programming languages allow identifiers
with special names and predefined meanings: the \ki
{keywords}.
\inddef{keyword}%
Keywords may be used only in their original sense. However, there are
identifiers which also have a predefined meaning but which
may be redefined in programs. These words are called
\ki{standard words.}
\index{standard word}%
The numbers of keywords and standard words vary among programming
languages. For example, there is a programming language in
which
three keywords are used for the zero value:
\texttt{zero,} \texttt{zeros} and \texttt{zeroes.}
Now we investigate how the lexical analyser recognises keywords
and standard words,
and how it distinguishes them from identifiers created by the
programmers.
Using a standard word differently from its original meaning
causes extra
difficulty, not only in the compilation process but also in the
readability of the program,
as in the next example:
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
\texttt{if if then else = then;}
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
\noindent or if we declare procedures which have names \texttt{begin}
and
\texttt{end}:
\begin{verbatim}
begin
begin; begin end; end; begin end;
end;
\end{verbatim}
The recognition of keywords and standard words is a simple task if they
are
written using special characters (for example bold characters), or if
they are enclosed between special prefix and postfix characters (for
example
between apostrophes).
We give two methods to analyse keywords.
\begin{enumerate}
\item All keywords are written as a regular expression, and the
implementation of
the automaton created from this expression is prepared. The
disadvantage of this
method is the size of the analyser program. It will be large even if the
descriptions of keywords whose first letters are the same are
contracted.
\item Keywords are stored in a special keyword table. The words in
the character stream can be determined by a general identifier
recogniser. Then, by a simple
search algorithm, we check whether this word is in the keyword
table.
If this word is in the table then it is a keyword; otherwise it is an
identifier
defined by the user.
This method is very simple, but the efficiency of the search
depends on the structure of the keyword table and on the search
algorithm.
A well-selected hash function and an adequate keyword table
can be very
efficient.
\end{enumerate}
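The second method can be sketched in a few lines. In the Python sketch below a hash-based set plays the role of the keyword table; the keyword list itself is a hypothetical example.

```python
# Sketch of the keyword-table method: a general identifier recogniser
# finds a word, then a table lookup decides keyword vs. identifier.
# The keyword list below is a hypothetical example.

KEYWORD_TABLE = {"if", "then", "else", "begin", "end", "while", "do"}

def classify(word):
    # A hash-based set gives an efficient search in the keyword table.
    return "keyword" if word in KEYWORD_TABLE else "identifier"
```

Here the hash-based set realises the well-selected hash function mentioned above, so each lookup takes expected constant time.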
If it is possible to write standard words in the programming language,
then the
lexical analyser recognises the standard words using one of the above
methods.
But the meaning of a standard word depends on its context. To
decide whether
it has its original meaning or was redefined by the programmer is
the task of the syntactic analyser.
%%%
\subsubsection{Look ahead}
\label{subsec-eloreolv}%
Since the lexical analyser creates a symbol from the longest character
stream,
it has to look ahead one or more characters
to locate the right end of a symbol.
\index{lookahead}%
There is a classical example of this problem: the next two FORTRAN
statements:
\begin{verbatim}
DO 10 I = 1.1000
DO 10 I = 1,1000
\end{verbatim}
\noindent In the FORTRAN programming language space characters
are not
significant: they do not play any part in the meaning of a statement.
Hence the character between
\texttt{1}
and \texttt{1000} decides whether the statement is a \texttt{DO}
loop statement or an assignment statement for the
\texttt{DO10I} identifier.
To mark the right end of the symbol, we introduce the symbol \texttt{/}
into the
description of regular expressions. Its name is the
\ki{lookahead operator.}
\inddef{lookahead!operator}%
Using this symbol the description of the
above
\texttt{DO} keyword is the following:
$$\texttt{DO}\ \texttt{/}\ ( \textit{letter\/} \mid \textit{digit\/} )^* =
(
\textit{letter\/} \mid \textit{digit\/} )^* \texttt{,}$$
\noindent This definition means that the lexical analyser recognises
the first two letters,
\texttt{D} and \texttt{O}, as the \texttt{DO} keyword if,
looking
ahead after the letter \texttt{O}, there are letters or digits, then
there is an equal sign, after this sign there are letters or digits
again,
and finally there is a \idez{\ \texttt{,}\ } character.
The lookahead operator implies that the lexical analyser has to look
ahead after
the \texttt{DO} characters. We remark that using this lookahead
method the
lexical analyser detects the \texttt{DO} keyword even if there is an
error in the character stream, such as in the
\texttt{DO2A=3B,}
character stream, but in a correct assignment statement it does not
detect
the \texttt{DO} keyword.
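Many practical regex engines offer a corresponding construct: in Python's \texttt{re} notation the lookahead operator is written \texttt{(?=...)}, which checks the following context without consuming it. A sketch of the above \texttt{DO} rule, under the assumption that spaces have already been removed and that only upper-case letters and digits occur:

```python
import re

# The lookahead operator / of the text corresponds to (?=...) here:
# the context after DO is checked but not consumed by the match.
do_keyword = re.compile(r"DO(?=[A-Z0-9]*=[A-Z0-9]*,)")

def starts_with_do_keyword(stmt):
    # stmt is a FORTRAN statement with the spaces already removed
    return do_keyword.match(stmt) is not None
```

For \texttt{DO10I=1,1000} the comma is found by the lookahead and \texttt{DO} is detected; for \texttt{DO10I=1.1000} it is not, so the statement is an assignment to \texttt{DO10I}. As remarked above, the erroneous \texttt{DO2A=3B,} is also accepted as starting with the \texttt{DO} keyword.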
In the next example we deal with positive integers.
The definition of integer numbers is a prefix of the definition of
real numbers, and the definition of real numbers is a prefix of
the definition of real numbers containing an explicit power-part.
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
$\begin{array}{ll}
\textit{positive integer}: \ &\ D^{+} \\
\textit{positive real}: \ &\ D^{+}.D^{+} \\
&\ \textnormal{and}\ D^{+}.D^{+}\mathtt{e}(+ \mid - \mid
\varepsilon )D^{+}
\end{array} $
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
\noindent The automaton for all three of these expressions is the
automaton
of the longest character stream, the real number
containing an explicit power-part.
The problem of the lookahead symbols is resolved using the following
algorithm. Put the characters into a buffer, and attach an auxiliary
piece of information to each character. This information is
\idez{invalid} if the character string ending with this read character
is not correct; otherwise it is the type of the symbol.
If the automaton is in a final-state, then the automaton recognises a
real number with explicit power-part. If the automaton is in an
internal state, and there is no possibility to read a next character,
then the longest character stream which has valid information is the
recognised symbol.
\begin{pelda}{}
Consider the \texttt{12.3e+f\#} character stream,
where the character \texttt{\#} is the endsign of the analysed text. If
in this
character stream there was a positive integer number in the place of
character
\texttt{f}, then this character stream would be a real number.
The content of the buffer of the lexical analyser:
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
$\begin{array}{ll}
\texttt{1} \ &\ \textit{integer number} \\
\texttt{12} \ &\ \textit{integer number} \\
\texttt{12.} \ &\ \textit{invalid} \\
\texttt{12.3} \ &\ \textit{real number} \\
\texttt{12.3e} \ &\ \textit{invalid} \\
\texttt{12.3e+} \ &\ \textit{invalid} \\
\texttt{12.3e+f} \ &\ \textit{invalid} \\
\texttt{12.3e+f\#} \ &\
\end{array} $
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
\noindent The recognised symbol is the \textit{real number}
\texttt{12.3}. The lexical analysis continues at the text
\texttt{e+f}.
\end{pelda}
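The buffering strategy of this example can be sketched in Python. This is a minimal illustration only: the patterns, the helper names (\texttt{classify}, \texttt{scan\_number}) and the decision to buffer whole prefixes instead of single characters are our own simplifications.

```python
import re

# Patterns of the three token classes of the example: the definition of
# integers is a prefix of reals, which is a prefix of reals with power-part.
INTEGER = re.compile(r'\d+$')
REAL = re.compile(r'\d+\.\d+$')
REAL_POWER = re.compile(r'\d+\.\d+e[+-]?\d+$')

def classify(prefix):
    """The auxiliary information stored beside the buffered prefix."""
    if REAL_POWER.match(prefix):
        return 'real with power-part'
    if REAL.match(prefix):
        return 'real number'
    if INTEGER.match(prefix):
        return 'integer number'
    return 'invalid'

def scan_number(text):
    """Buffer the characters one by one; the recognised symbol is the
    longest prefix whose auxiliary information is not 'invalid'."""
    buffer, best = '', None
    for ch in text:
        if ch == '#':          # end sign of the analysed text
            break
        buffer += ch
        info = classify(buffer)
        if info != 'invalid':
            best = (buffer, info)
    return best
```

For the text \texttt{12.3e+f\#} the sketch recognises the real number \texttt{12.3}, while for \texttt{12.3e+4\#} it recognises the whole stream as a real number with power-part.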
The number of lookahead characters may be determined from the
definition of the programming language. In modern languages this
number is at most two.
%%%
\subsubsection{The symbol table.}
\label{subsec-szimbtabla}%
There are programming languages, for example C, in which
lowercase and uppercase letters are different. In this
case the lexical analyser uses the characters of all symbols without
modification. Otherwise the lexical analyser converts all characters
to lowercase or all characters to uppercase. It is advisable to execute
this transformation in the source handler program.
In the case of simpler programming languages the lexical analyser
writes the characters of the detected symbol into the symbol table,
\inddef{symbol!table}%
if this symbol is not there yet. After this, or if the symbol
was already in the symbol table, the lexical analyser returns the
table address of this symbol, and writes this information into
its output.
These data will be important at semantic analysis and code generation.
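The described use of the symbol table can be sketched with a small Python class; the class name, the method name and the use of an index as \idez{table address} are our own choices, not part of the text above.

```python
class SymbolTable:
    """A minimal symbol table: each distinct symbol is stored once, and
    the lexical analyser receives its table address (here: an index)."""
    def __init__(self):
        self._index = {}    # symbol text -> address
        self._symbols = []  # address -> symbol text

    def lookup_or_insert(self, text):
        if text not in self._index:      # write it in only if not there yet
            self._index[text] = len(self._symbols)
            self._symbols.append(text)
        return self._index[text]         # this address goes to the output
```

Two occurrences of the same identifier thus get the same address, which the later phases (semantic analysis, code generation) can rely on.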
%%%
\subsubsection{Directives.}
In programming languages the \ki{directives}
\inddef{directive}%
serve to control the compiler. The lexical analyser
identifies directives and recognises their operands,
and usually it has further tasks with these
directives.
If the directive is the \texttt{if} of the conditional compilation,
then the lexical analyser has to detect all parameters of
this condition, and it has to evaluate the value of the branch. If
this value is \texttt{false}, then it has to omit the
next lines until the \texttt{else} or \texttt{endif}
directive. This means that the lexical analyser performs
syntactic and semantic checking, and creates code-style
information.
This task is more complicated if the programming language allows
nested conditions.
Other types of directives are the substitution of macros and
including files into the source text. These tasks are far away
from the original task of the lexical analyser.
The usual way to solve these problems is the following. The compiler
first executes a pre-processing program, and this program performs
all of the tasks prescribed by the directives.
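The skipping of branches with a false condition, including nested conditions, can be sketched as follows in Python. The directive names \texttt{\#if}, \texttt{\#else}, \texttt{\#endif} and the restriction that a condition is the literal \texttt{true} or \texttt{false} are simplifying assumptions of this sketch.

```python
def preprocess(lines):
    """Keep the lines of active branches only; directives may be nested.
    A condition is simply the literal 'true' or 'false' here."""
    output = []
    stack = []    # (activity of the enclosing branch, branch condition)
    active = True
    for line in lines:
        s = line.strip()
        if s.startswith('#if'):
            cond = s.split()[1] == 'true'
            stack.append((active, cond))
            active = active and cond
        elif s == '#else':
            enclosing, cond = stack[-1]
            active = enclosing and not cond
        elif s == '#endif':
            enclosing, _ = stack.pop()
            active = enclosing
        elif active:
            output.append(line)
    return output
```

The stack is what makes nesting work: a branch is emitted only if its own condition holds and every enclosing branch is active.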
\begin{gyak}
\ujgyak
Give a regular expression for the comments of a programming language.
In this
language the delimiters of comments are $/*$ and $*/$, and inside
a comment the characters $/$ and $*$ may occur, but
the string $*/$ is forbidden.
\ujgyak Modify the result of the previous question if we suppose that
the programming language allows nested comments.
\ujgyak Give a regular expression for positive integer numbers in which
leading and trailing zero characters are prohibited.
Give a deterministic finite automaton for this regular expression.
\ujgyak Write a program which re-creates the original source program
from the
output of the lexical analyser. Pay attention to the nice and correct
positioning
of the re-created character streams.
\end{gyak}
%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Syntactic analysis}
The perfect definition of a programming language includes
the definition of its \textit{syntax}
\index{syntax}%
and \textit{semantics}.
\index{semantics}%
The syntax of programming languages cannot be described by context
free grammars. \index{context free!grammar}\index{grammar!context free}%
It is possible using context dependent grammars,
\index{context dependent!grammar}%
\index{grammar!context dependent}%
two-level grammars or
\index{two-level grammar}%
\index{grammar!two-level}%
attribute grammars.
\index{attribute!grammar}%
\index{grammar!attribute}%
For these grammars there are no efficient parsing methods,
hence the description of a language consists of two parts.
The main part of the syntax is given using context free grammars,
and for the remaining part a context dependent or an attribute
grammar
is applied. For example, the description of the program structure or the
description of the statement structure belongs to the first part,
and the type checking, the scope of variables or the correspondence
of formal and actual parameters belong to the second part.
The checking of properties written by context free grammars is called
\textit{syntactic analysis} or \textit{parsing}.
\index{syntactic!analysis}%
\index{analyser!syntactic}%
\index{parsing}%
\index{parser}%
Properties that cannot be described by context free grammars
form the \textit{static semantics}.
\index{static!semantics}%
\index{semantics!static}%
These properties are checked by the \textit{semantic analyser}.
\index{semantic!analyser}%
\index{analyser!semantic}%
The conventional semantics has the name \textit{run-time semantics}
or
\textit{dynamic semantics}.
\index{dynamic semantics}%
\index{semantics!dynamic}%
\index{run-time semantics}%
\index{semantics!run-time}%
The dynamic semantics can be given by verbal methods or some
interpreter methods, where the operation of the program is
given by the series of state-alterations of the interpreter
and its environment.
We deal with \textit{context free grammars},
\index{context free!grammar}%
\index{grammar!context free}%
and in this section we will use \textit{extended grammars}
for the syntactic analysis.
We investigate methods of checking properties
that are described by context free grammars. First we give the
basic notions of syntactic analysis, then
the parsing algorithms will be studied.
\begin{defi}
\label{def-mondatforma}
Let $G = (N,T,P,S)$\ be a grammar. If \
$S \stackrel{*}\Longrightarrow \alpha$ and $\alpha \in (N \cup T)
^*$
then $\alpha$ is a \textbf{sentential form}.
\index{sentential form}%
If $S \stackrel{*}\Longrightarrow x$ and $x \in T^*$ then \ $x$\
is a \textbf{sentence} of the language defined by the grammar.
\index{sentence}%
\end{defi}
The sentence has an important role in parsing. The program written by
a
programmer
is a series of terminal symbols, and this series is a sentence if
it is correct, that is, it has no syntactic errors.
\index{sentence}%
\begin{defi}
\label{def-reszmondat}%
Let \ $G = (N,T,P,S)$\ be a grammar and
\ $\alpha =\alpha_{1}\beta\alpha _{2}$\ be a sentential form
($\alpha, \alpha_1,
\alpha_2, \beta \in (N \cup T)^*$).
We say that $\beta$ is a \textbf{phrase} of \ $\alpha$, if
\index{phrase}%
there is a symbol $A \in N$ such that
$S \stackrel{*}\Longrightarrow \alpha_{1}A\alpha_{2}$ and \
$A \stackrel{*}\Longrightarrow \beta$.
We say that $\beta$ is a \textbf{simple phrase} of $\alpha$,
\index{simple!phrase}%
\index{phrase!simple}%
if $A \rightarrow \beta \in P$.
\end{defi}
We note that every sentence is a phrase.
The leftmost simple phrase has an important role in parsing;
it has its own name.
\begin{defi}
\label{def-nyel}
The leftmost simple phrase of a sentence is the \textbf
{handle}.
\index{handle}%
\end{defi}
The leaves of the \textit{syntax tree} of a sentence are terminal
symbols, the other
nodes of the tree are nonterminal symbols, and the root of the
tree is the start symbol of the grammar.
In an ambiguous grammar there is at least one sentence, which has
several syntax trees. It means that this sentence has more than one
analysis, and therefore there are several target programs for this
sentence. This ambiguity raises a lot of problems, therefore
compilers translate
only languages generated by unambiguous grammars.
\index{unambiguous grammar}%
\index{grammar!unambiguous}%
We suppose that the grammar $G$ has the following properties:
\begin{enumerate}
\item the grammar is \textit{cycle free,}
\index{cycle free grammar}%
\index{grammar!cycle free}%
that is, it has no derivation of the form
$A \stackrel{+}\Longrightarrow A$\ ($A \in N$),
\item the grammar is \textit{reduced},
\index{reduced grammar}%
\index{grammar!reduced}%
that is, there are no \idez{unused symbols} in the grammar:
every nonterminal occurs in some derivation, and from every
nonterminal we can derive a part of a sentence.
This last property means that for all
$A \in N$ it is true that
$S \stackrel{*}\Longrightarrow \alpha A\beta \stackrel{*}
\Longrightarrow
\alpha y\beta \stackrel{*}\Longrightarrow xyz $,
where $A \stackrel{*}\Longrightarrow y$ and ${\mid}y{\mid} > 0$ \
($\alpha, \beta \in (N \cup T)^*,\ x, y, z \in T^*$).
\end{enumerate}
As we have seen, the lexical analyser translates the program
written by a programmer into a series of terminal symbols,
and this series is the input of the syntactic analyser.
The task of the syntactic analyser is to decide whether this
series is a sentence of the grammar or not.
To achieve this goal, the parser creates the syntax tree of
the series of symbols. From the known start symbol and the
leaves of the syntax tree the parser creates all
vertices and edges of the tree, that is, it creates a
derivation of the program.
If this is possible, then we say that the program is an element of
the language. It means that the program is syntactically correct.
Henceforward we will deal with \textit{left to right} parsing methods.
\index{left to right!parsing}%
\index{parser!left to right}%
These methods read the symbols of the programs left to right.
All of the real compilers use this method.
To create the inner part of the syntax tree there are
several methods. One of these methods builds the syntax tree
from its start symbol $S$. This method is called
\textit{top-down} method.
\index{top-down parsing}%
\index{parser!top-down}%
If the parser goes from the leaves to the symbol $S$, then
it uses the \textit{bottom-up} parsing method.
\index{bottom-up parsing}%
\index{parser!bottom-up}%
We deal with top-down parsing methods in Subsection
\ref{subsec-felulrol-lefele}. We investigate
bottom-up parsers in Subsection
\ref{subsec-alulrol-fel}; nowadays these methods are used in real
compilers.
%%%
\subsection[\textit{LL}$(1)$ parser]%
{\textit{LL}{\boldmath{$(1)$}} parser}
\label{subsec-felulrol-lefele}
If we analyse from top to bottom, then we start with the start symbol.
This symbol is the root of the syntax tree; we attempt to construct the
syntax tree. Our goal is that the leaves of the tree are the terminal
symbols.
First we review the notions that are necessary in the top-down
parsing.
Then the ${\textit{LL}}(1)$ table methods and the recursive descent
method will be analysed.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection[\textit{LL}$(k)$ grammars]%
{\textit{LL}{\boldmath{$(k)$}} grammars}
Our methods build the syntax tree top-down and read the symbols
of the program left to right. To this end we try to create
terminals on the left side of sentential forms.
\begin{defi}
\label{def-legbal-hely}%
\index{leftmost!direct!derivation}%
\index{derivation!leftmost!direct}%
If\ $A \rightarrow \alpha \in P$ then the \textbf{leftmost direct
derivation}
of the sentential form $xA\beta$\
($x \in T^*,\ \alpha, \beta \in (N \cup T)^*$) is
\ $x\alpha\beta$,
and
$$xA\beta \underset{leftmost}{\Longrightarrow} x\alpha\beta
\koz .$$
\end{defi}
\begin{defi}
\label{def-legbal-lev}%
\index{leftmost!derivation}%
\index{derivation!leftmost}%
If all direct derivations in \ $S \stackrel{*}\Longrightarrow x$\
($x \in T^*$)
are leftmost, then this derivation is said to be a
\textbf{leftmost derivation}, and \
$$S \underset{leftmost}{\overset{*}{\Longrightarrow}} x \koz .$$
\end{defi}
In a leftmost derivation terminal symbols appear at the left
side of the sentential forms. Therefore we use leftmost derivations
in all top-down parsing methods. Hence,
when we deal with top-down methods, we do not write the word
``leftmost'' at the arrows.
One might as well say that we create all possible syntax trees.
Reading their leaves from left to right, we obtain the sentences of the
language. Then we compare these sentences with the parseable text,
and if a sentence is the same as the parseable text, then we can read
the steps of parsing from the syntax tree which belongs to this
sentence. But this method is not practical; generally
it is even impossible to apply.
A good idea is the following. We start at the start symbol of the
grammar, and using
leftmost derivations we try to create the text of the program.
If we use an unsuitable derivation at one of the steps of parsing,
then we find that,
at the next step, we cannot apply a proper derivation.
In this case there are terminal symbols at the left side of the
sentential form that are not the same as those in our parseable text.
For the leftmost terminal symbols we state the theorem as follows.
\begin{tetel}
\label{tet-felulrol-le}%
If $S \stackrel{*}\Longrightarrow x\alpha \stackrel{*}\Longrightarrow
yz$
\ ($\alpha \in (N \cup T)^*,\ x, y, z \in T^*$) and ${\mid}x{\mid} =
{\mid}y{\mid}$, then $x = y$ \koz .
\end{tetel}
The proof of this theorem is trivial. It is not possible to
change the leftmost terminal symbols $x$ of sentential forms
using derivation rules of a context free grammar.
This theorem is used during the building of the syntax tree
to check that the leftmost terminals of the tree are the same as the
leftmost
symbols of the parseable text. If they are different, then
we took a wrong direction while building this syntax tree.
In this case we have to backtrack, and we have to apply another
derivation rule. If this is impossible (for example because there are no
more derivation rules), then we have to backtrack once again.
General top-down methods are realized by using backtrack
algorithms, but these backtrack steps make the parser very slow.
Therefore we will deal only with grammars that have parsing
methods without backtracks.
The main properties of \textit{LL}$(k)$ grammars are
the following.
If, by creating the leftmost derivation
$S \stackrel{*}\Longrightarrow wx$\ ($w, x \in T^*$),
we obtain the sentential form
$S \stackrel{*}\Longrightarrow wA\beta$
($A \in N,\ \beta \in (N \cup T)^*$) at some step of this derivation,
and our goal is to achieve $A\beta \stackrel{*}\Longrightarrow x$,
then the next step of the derivation for nonterminal $A$
is determinable unambiguously from the first
$k$ symbols of $x$.
To look ahead $k$ symbols we define the function
\textit{First}$_k$.
\begin{defi}
\inddef{First$_k$}%
Let \textbf{First}{\boldmath{$_{k}(\alpha)$}}\ $(k \ge 0,
\ \alpha \in (N \cup T)^*)$ be the set as follows.
$${\textit{First}}_k(\alpha ) = $$
$$\{
x \ \mid \ \alpha \stackrel{*}\Longrightarrow x\beta\
\textit{and}\ |x| = k \}\ \cup\
\{ x \ \mid\ \alpha \stackrel{*}\Longrightarrow x\
\textit{and}\ |x| < k \}\ \ $$
$$(x \in T^*,\ \beta \in (N \cup T)^*) \koz .
$$
\end{defi}
The set ${\textit{First}}_k(x)$ consists of the first $k$ symbols of
$x$;
for $|x| < k$
it consists of the whole $x$. If
$\alpha \stackrel{*}\Longrightarrow
\varepsilon$, then
$\varepsilon \in {\textit{First}}_k(\alpha )$.
\begin{defi}
\label{def-llk}
\index{LL(k)!grammar}%
\index{grammar!LL(k)}%
The grammar $G$ is an \textbf{\textit{LL}{\boldmath{$(k)$}}
grammar}\
$( k \ge 0 )$, if for derivations
$$\begin{array}{c} S \stackrel{*}\Longrightarrow w A \beta
\Longrightarrow w \alpha_{1} \beta \stackrel{*}\Longrightarrow w x
\koz ,\\
S \stackrel{*}\Longrightarrow w A \beta \Longrightarrow
w \alpha_{2} \beta \stackrel{*}\Longrightarrow w y \koz
\end{array}$$
\noindent ($A \in N,\ x, y, w \in T^*,\ \alpha_1, \alpha_2, \beta \in
(N \cup T)^*$) the equality
$${\textit{First}}_k(x) = {\textit{First}}_k(y)$$
\noindent implies
$$\alpha_{1} = \alpha_{2} \koz .$$
\end{defi}
By this definition, if a grammar is an \textit{LL}$(k)$
grammar, then the $k$ symbols after the parsed $w$ determine
the next derivation rule unambiguously
(Figure \ref{fig-llk}).
\begin{figure}[t!]
\begin{picture}(140,115)(-5,-15)
\put(160,93){\fbox{\makebox(10,6){$S$}}}
\put(122,76){\line(6,2){37}}
\put(215,76){\line(-6,2){37}}
\put(120,65){\fbox{\makebox(31,6){$w$}}}
\put(117,46){\line(1,4){3.5}}
\put(160,65){\fbox{\makebox(10,6){$A$}}}
\put(157,46){\line(1,4){3.5}}
\put(180,46){\line(-1,4){3.5}}
\put(179,65){\fbox{\makebox(33,6){$\beta$}}}
\put(223,46){\line(-1,4){3.5}}
\put(115,35){\fbox{\makebox(31,6){$w$}}}
\put(112,16){\line(1,4){3.5}}
\put(155,35){\fbox{\makebox(20,6){$\alpha$}}}
\put(235,16){\line(-3,4){10}}
\put(152,16){\line(1,4){3.5}}
\put(184,35){\fbox{\makebox(33,6){$\beta$}}}
\put(110,5){\fbox{\makebox(31,6){$w$}}}
\put(150,5){\fbox{\makebox(80,6){\ \ \ $x$\hfill}}}
\put(150,0){\line(0,-1){8}}
\put(161,-10){\makebox(0,0){$\longmapsto^{\mbox{\textit{k}}}$}}
%
\end{picture}
\caption{\textit{LL}$(k)$ grammar.}
\label{fig-llk}
\end{figure}
One can see from this definition that if a grammar is an
$\textit{LL}(k_{0})$ grammar then for all
$k > k_{0}$ it is also an \textit{LL}$(k)$ grammar.
If we speak about an \textit{LL}$(k)$ grammar, then we also mean
that $k$ is the least number such that the properties of
the definition are true.
\begin{pelda}
The next grammar is an \textit{LL}$(1)$ grammar. Let
$G = ( \{ A,S \}, \{ a,b \}, P, S ) $ be a grammar whose
derivation rules are:
$S \rightarrow AS \mid \varepsilon$
$A \rightarrow aA \mid b $
\noindent We have to use the derivation $S \rightarrow AS$ for the
start symbol $S$ if the next symbol of the parseable text is
$a$ or $b$.
We use the derivation
$S \rightarrow \varepsilon$ if the next symbol is the mark \#.
\end{pelda}
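The decisions of this example can also be demonstrated by a small recursive descent recogniser, written here as a Python sketch; the function names are our own, and the mark \texttt{\#} closes the text as above.

```python
def parse(text):
    """Recogniser for S -> AS | epsilon, A -> aA | b; the mark '#'
    closes the text. Returns True iff the text is a sentence."""
    pos = 0

    def look():
        return text[pos]

    def eat(symbol):
        nonlocal pos
        if text[pos] != symbol:
            raise SyntaxError('unexpected symbol')
        pos += 1

    def S():
        # use S -> AS if the next symbol is 'a' or 'b'
        if look() in ('a', 'b'):
            A()
            S()
        # on the mark '#' use S -> epsilon: nothing to do

    def A():
        if look() == 'a':      # A -> aA
            eat('a')
            A()
        else:                  # A -> b
            eat('b')

    try:
        S()
        return look() == '#'
    except SyntaxError:
        return False
```

Each procedure chooses its rule by looking at one symbol only, exactly as the \textit{LL}$(1)$ property of the grammar promises.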
\begin{pelda}
The next grammar is an \textit{LL}(2) grammar.
Let
$G = ( \{ A,S \}, \{ a,b \}, P, S )$ be a grammar whose
derivation rules are:
$S \rightarrow abA \mid \varepsilon $
$A \rightarrow Saa \mid b $
\noindent One can see that at the last step of derivations
$$S \Longrightarrow abA \Longrightarrow abSaa
\overset{S \rightarrow abA}{\Longrightarrow} ababAaa$$
and
$$S \Longrightarrow abA \Longrightarrow abSaa
\overset{S \rightarrow \varepsilon}{\Longrightarrow} abaa$$
if we look ahead one symbol, then in both derivations
we obtain the symbol $a$. The proper rule for the symbol $S$ can be
determined
only by looking ahead two symbols ($ab$ or $aa$).
\end{pelda}
There are context free grammars that are not \textit{LL}$(k)$
grammars.
For example, the next grammar is not an \textit{LL}$(k)$ grammar for
any
$k$.
\begin{pelda}
Let $G = ( \{A,B,S\},\{a,b,c\},P,S)$ be a grammar whose
derivation rules are:
$S \rightarrow A \mid B$
$A \rightarrow aAb \mid ab$
$B \rightarrow aBc \mid ac $
\noindent $L(G)$ consists of the sentences $a^{i}b^{i}$ and $a^{i}c^{i}
$\ $(i \ge 1)$.
If we analyse the sentence $a^{k+1}b^{k+1}$, then at the first step
we
cannot decide by looking ahead $k$ symbols whether we have to use the
derivation
$S \rightarrow A$ or $S \rightarrow B$, since
for all $k$
${\textit{First}}_k(a^{k}b^{k}) = {\textit{First}}_k(a^{k}c^{k})=
\{ a^{k} \}$.
\end{pelda}
By the definition of the \textit{LL}$(k)$ grammar,
if we get the sentential form
$wA\beta$ using leftmost derivations, then
the next $k$ symbols determine the next rule for symbol $A$.
This is stated in the next theorem.
\begin{tetel}
\label{tetel-llk}%
Grammar $G$ is an \textit{LL}$(k)$ grammar iff
$$S \stackrel{*}\Longrightarrow wA\beta,\ \textit{and}\ A \rightarrow
\gamma \mid \delta\ \
(\gamma \neq \delta,\ w \in T^*,\ A \in N,\ \beta, \gamma, \delta \in (N
\cup T)^*)$$
implies
$${\textit{First}}_k(\gamma \beta ) \cap
{\textit{First}}_k(\delta \beta ) = \emptyset \koz . $$
\end{tetel}
If there is an
$A \rightarrow \varepsilon$ rule in the grammar, then
the set ${\textit{First}}_k$ consists of
the $k$ length prefixes of the terminal series generated from
$\beta$. This implies that, for deciding the property
\textit{LL}$(k)$, we have to check not only the derivation rules, but
also infinitely many derivations.
We can give methods that are usable in practice for
\textit{LL}$(1)$
grammars only. We define the follower series, which
follow a symbol or a series of symbols.
\begin{defi}
\label{def-follow-k}
\index{Follow$_k$}%
\textbf{Follow}{\boldmath{$_{k}(\beta)$}} $ = \{ x \mid S
\stackrel{*}\Longrightarrow \alpha\beta\gamma\
\textit{and}\ x \in {\textit{First}}_k(\gamma) \}$,
and if $\varepsilon \in {\textit{Follow}}_k(\beta)$,
then
${\textit{Follow}}_k(\beta) =
{\textit{Follow}}_k(\beta) \setminus \{ \varepsilon \} \cup \{ \# \}$
\ \ ($\alpha, \beta, \gamma \in (N \cup T)^*,\ x \in T^*$) \koz .
\end{defi}
The second part of the definition is necessary because
if there are no symbols after $\beta$ in the derivation
$\alpha\beta\gamma$, that is,
$\gamma =
\varepsilon$, then the only symbol after $\beta$ is the mark
\#.
\textit{Follow}$_1(A)$ \ ($A \in N$) consists of the terminal symbols
that can stand immediately after the symbol $A$ in the derivation
$$S \stackrel{*}\Longrightarrow \alpha A\gamma
\stackrel{*}\Longrightarrow \alpha Aw \ (\alpha, \gamma \in (N \cup T)
^*,
\ w \in T^*).$$
\begin{tetel}
\label{tetel-ll1}%
The grammar $G$ is an \textit{LL$(1)$} grammar iff,
for all nonterminal $A$ and for all derivation rules $A \rightarrow
\gamma \mid \delta $,
$${\textit{First}}_{1}(\gamma {\textit{Follow}}_{1}(A)) \cap
{\textit{First}}_{1}(\delta {\textit{Follow}}_{1}(A)) = \emptyset
\koz . $$
\end{tetel}
In this theorem the expression
${\textit{First}}_{1}(\gamma
{\textit{Follow}}_{1}(A))$
means that we have to concatenate the elements of the set
$\textit{Follow}_{1}(A)$ to
$\gamma$ one by one, and we have to
apply the function
${\textit{First}}_{1}$ to all elements of this new set.
It is evident that Theorem \ref{tetel-ll1}
is suitable to decide whether a grammar is
\textit{LL}$(1)$ or it is not.
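As an illustration of the theorem, consider again the grammar with the rules $S \rightarrow AS \mid \varepsilon$, $A \rightarrow aA \mid b$ of the earlier example. Here
$${\textit{Follow}}(S) = \{ \# \} \koz , \ \ {\textit{Follow}}(A) = \{ a, b, \# \} \koz ,$$
and for the two rules of $S$
$${\textit{First}}(AS\ {\textit{Follow}}(S)) = \{ a, b \} \koz , \ \ {\textit{First}}(\varepsilon\ {\textit{Follow}}(S)) = \{ \# \} \koz ,$$
while for the two rules of $A$
$${\textit{First}}(aA\ {\textit{Follow}}(A)) = \{ a \} \koz , \ \ {\textit{First}}(b\ {\textit{Follow}}(A)) = \{ b \} \koz .$$
Both pairs of sets are disjoint, therefore the grammar is an \textit{LL}$(1)$ grammar.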
Henceforward we deal with
\textit{LL}$(1)$ languages determined by \textit{LL}$(1)$ grammars,
and we investigate
the parsing methods of
\textit{LL}$(1)$ languages.
For the sake of simplicity, we omit the indexes of the functions
${\textit{First}}_1$ and ${\textit{Follow}}_{1}$.
The elements of the set {\textit{First}}$(\alpha)$ are determined
using the next
algorithm.
\begin{algN}{First$(\alpha)$}\inddef{\textsc{First}}
1 \' \key{if} \= $\alpha = \varepsilon$ \\
2 \' \> \key{then} $F\leftarrow \{\varepsilon\}$ \\
3 \' \key{if} \= $\alpha = a$, where $ a \in T$ \\
4 \' \> \key{then} $F\leftarrow \{a\}$
\algnewpage
5 \' \key{if} \= $\alpha = A$, where $A \in N$ \\
6 \' \> \key{then} \= \key{if} \= $A \rightarrow \varepsilon \in P$
\\
7 \' \> \> \> \key{then} \= $F\leftarrow \{\varepsilon\}$ \\
8 \' \> \> \> \key{else} \> $F\leftarrow \emptyset$ \\
9 \' \>\>\key{for} \= all $A \rightarrow Y_1 Y_2 \dots Y_m \in P$\ \ $(m
\geq 1)$ \\
10 \' \>\> \> \key{do} \= $F \leftarrow F \cup (\textsc{First}(Y_1)
\setminus \{\varepsilon\})$\\
11 \' \>\> \> \> \key{for} \= $k \leftarrow 1$ \key{to} $m-1 $ \\
12 \' \>\> \> \> \>\key{do} \key{if} \=$Y_1 Y_2 \dots Y_k
\overset{*}{\Longrightarrow} \varepsilon$ \\
13 \' \>\> \> \> \>\> \key{then} $F \leftarrow F \cup (\textsc{First}(Y_
{k+1})\setminus \{\varepsilon\})$ \\
14 \' \>\> \> \>\key{if} \=$Y_1 Y_2 \dots Y_m
\overset{*}{\Longrightarrow} \varepsilon$ \\
15 \' \>\> \> \>\> \key{then} $F \leftarrow F \cup \{\varepsilon\}$ \\
16 \' \key{if} \= $\alpha = Y_1 Y_2 \dots Y_m \ (m \geq 2)$ \\
17 \' \> \key{then} \= $F \leftarrow (\textsc{First}(Y_1) \setminus
\{\varepsilon\})$\\
18 \' \> \> \key{for} \= $k \leftarrow 1\ \key{to}\ m-1 $ \\
19 \' \> \> \>\key{do} \key{if} \=$Y_1 Y_2 \dots Y_k
\overset{*}{\Longrightarrow} \varepsilon$ \\
20 \' \> \> \>\> \key{then} $F \leftarrow F \cup (\textsc{First}(Y_
{k+1})\setminus \{\varepsilon\})$ \\
21 \' \> \>\key{if} \=$Y_1 Y_2 \dots Y_m
\overset{*}{\Longrightarrow} \varepsilon$ \\
22 \' \> \>\> \key{then} $F \leftarrow F \cup \{\varepsilon\}$ \\
23 \' \key{return} $F$
\end{algN}
In lines 1--4 the set is given for $\varepsilon$ and a
terminal symbol $a$. In lines 5--15 we construct the
elements of this set for a nonterminal $A$.
If $\varepsilon$ is derived from $A$, then we put the
symbol $\varepsilon$ into the set in lines 6--7 and 14--15.
If the argument is a symbol stream then the elements of the
set are constructed in lines 16--22.
We notice that we can terminate the \key{for} cycle
in lines 11 and 18 if $Y_k \in T$, since in this case
it is not possible to derive symbol $\varepsilon$ from
$Y_1 Y_2 \dots Y_k$.
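The sets ${\textit{First}}_1$ can also be computed by a simple fixpoint iteration instead of the recursion above. The following Python sketch does this for $k = 1$; the representation of the grammar (a dictionary that maps every nonterminal to the list of its right-hand sides, the empty string standing for $\varepsilon$) is our own choice.

```python
EPS = ''   # stands for the empty word epsilon

def first_sets(grammar, terminals):
    """Compute First_1 for all nonterminals by fixpoint iteration."""
    first = {A: set() for A in grammar}

    def first_of(seq):
        """First_1 of a symbol string, from the sets computed so far."""
        result = set()
        for Y in seq:
            f = {Y} if Y in terminals else first[Y]
            result |= f - {EPS}
            if EPS not in f:
                return result
            # Y may derive epsilon: go on to the next symbol
        result.add(EPS)
        return result

    changed = True
    while changed:
        changed = False
        for A, rules in grammar.items():
            for rhs in rules:
                new = first_of(rhs) - first[A]
                if new:
                    first[A] |= new
                    changed = True
    return first

# Grammar of the earlier example: S -> AS | epsilon, A -> aA | b
G = {'S': [('A', 'S'), ()], 'A': [('a', 'A'), ('b',)]}
```

For this grammar the iteration yields ${\textit{First}}(S) = \{a, b, \varepsilon\}$ and ${\textit{First}}(A) = \{a, b\}$.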
In Theorem \ref{tetel-ll1} and hereafter, it is necessary
to know the elements of the set
${\textit{Follow}}(A)$. The next algorithm constructs this set.
\begin{algN}{Follow$(A)$}\inddef{\textsc{Follow}}
1 \' \key{if} \= $A = S$ \\
2 \' \> \key{then} \= $F\leftarrow \{\#\}$ \\
3 \' \> \key{else} \> $F\leftarrow \emptyset$ \\
4 \' \key{for} \= all rules $B\rightarrow \alpha A \beta \in P$ \\
5 \' \> \key{do} \= \key{if} \= $|\beta| > 0$ \\
6 \' \>\>\> \key{then} \= $F\leftarrow F \cup (\textsc{First}(\beta)
\setminus \{\varepsilon\})$ \\
7 \' \>\>\>\> \key{if} \= $\beta
\overset{*}{\Longrightarrow} \varepsilon$ \\
8 \' \>\>\>\>\> \key{then} $F\leftarrow F \cup \textsc{Follow}(B)$ \\
9 \' \>\>\> \key{else} \> $F\leftarrow F \cup \textsc{Follow}(B)$ \\
10 \' \key{return} $F$
\end{algN}
The elements of the set
${\textit{Follow}}(A)$ are collected in the set $F$.
In lines 4--9 we check which symbols may stand immediately after the
argument
when it appears at the right side of a derivation rule.
It is obvious that $\varepsilon$ is not in this set, and the symbol
$\#$ is in the set only if the argument can be the rightmost symbol of a
sentential form.
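The construction of the ${\textit{Follow}}$ sets can likewise be written as a fixpoint iteration. In the Python sketch below the ${\textit{First}}_1$ sets are assumed to be known already, and they are given as constants for the example grammar $S \rightarrow AS \mid \varepsilon$, $A \rightarrow aA \mid b$.

```python
EPS = ''   # stands for the empty word epsilon
GRAMMAR = {'S': [('A', 'S'), ()], 'A': [('a', 'A'), ('b',)]}
TERMINALS = {'a', 'b'}
FIRST = {'S': {'a', 'b', EPS}, 'A': {'a', 'b'}}

def first_of(seq):
    """First_1 of a symbol string, using the precomputed FIRST sets."""
    result = set()
    for Y in seq:
        f = {Y} if Y in TERMINALS else FIRST[Y]
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)
    return result

def follow_sets(start):
    follow = {A: set() for A in GRAMMAR}
    follow[start].add('#')        # the end mark follows the start symbol
    changed = True
    while changed:
        changed = False
        for B, rules in GRAMMAR.items():
            for rhs in rules:
                for i, A in enumerate(rhs):
                    if A not in GRAMMAR:
                        continue  # terminals have no Follow set
                    beta = rhs[i + 1:]
                    f = first_of(beta)
                    new = f - {EPS}
                    if EPS in f:  # beta may vanish: Follow(B) follows A too
                        new |= follow[B]
                    if not new <= follow[A]:
                        follow[A] |= new
                        changed = True
    return follow
```

For this grammar the iteration yields ${\textit{Follow}}(S) = \{\#\}$ and ${\textit{Follow}}(A) = \{a, b, \#\}$.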
%%%
\subsubsection{Parsing with table}
Suppose that we analyse a series of terminal symbols
$xay$ and the part $x$ has already been analysed without errors.
We analyse the text with a top-down method, so we use leftmost
derivations.
Suppose that our
sentential form is
$xY\alpha$, that is, it has the form
$xB\alpha$ or $xb\alpha$ ($Y \in (N \cup T),\ B \in N,\ a,b \in T,\ x, y \in
T^*,
\ \alpha \in (N \cup T)^*$)\ (Figure \ref{fig-mesz}).
\begin{figure}[t!]%
\begin{picture}(143,95)(50,25)
\put(160,93){\fbox{\makebox(10,6){$S$}}}
\thicklines
\put(122,76){\line(6,2){37}}
\put(215,76){\line(-6,2){37}}
\put(120,65){\fbox{\makebox(31,6){$x$}}}
\put(117,46){\line(1,4){3.5}}
\put(160,65){\fbox{\makebox(10,6){$B$}}}
\put(157,46){\line(1,4){3.5}}
\put(179,65){\fbox{\makebox(33,6){$\alpha$}}}
\put(223,46){\line(-1,4){3.5}}
\put(115,35){\fbox{\makebox(31,6){$x$}}}
\put(155,35){\fbox{\makebox(62,6)[l]{$\ a\ y$}}}
\end{picture}
%
\begin{picture}(143,95)(65,25)
\put(160,93){\fbox{\makebox(10,6){$S$}}}
\thicklines
\put(122,76){\line(6,2){37}}
\put(215,76){\line(-6,2){37}}
\put(120,65){\fbox{\makebox(31,6){$x$}}}
\put(117,46){\line(1,4){3.5}}
\put(160,65){\fbox{\makebox(10,6){$b$}}}
\put(157,46){\line(1,4){3.5}}
\put(179,65){\fbox{\makebox(33,6){$\alpha$}}}
\put(223,46){\line(-1,4){3.5}}
\put(115,35){\fbox{\makebox(31,6){$x$}}}
\put(155,35){\fbox{\makebox(62,6)[l]{$\ a\ y$}}}
\end{picture}
\caption{The sentential form and the analysed text.}
\label{fig-mesz}
\end{figure}
In the first case the next step is the substitution of symbol $B$.
We know the next element of the input series, this is the terminal
$a$, therefore we can determine the correct substitution of symbol
$B$.
This substitution is the rule
$B\ \rightarrow\ \beta$ for which
$a \in \textit{First}(\beta \textit{Follow}(B))$.
If there is such a rule then, according to the definition of
\textit{LL}$(1)$ grammar, there is exactly one.
If there is no such rule, then we have found a
\textit{syntactic error}.
\index{syntactic!error}%
\index{error!syntactic}%
In the second case the next symbol of the sentential form is
the terminal symbol $b$, thus we expect the symbol $b$
to be the next symbol of the analysed text. If this is the case,
that is, $a = b$, then the symbol $a$ is a correct symbol and we can go
further:
we put the symbol $a$ into the already analysed text.
If
$a \neq b$, then we have found a \textit{syntactic error}.
\index{syntactic!error}%
\index{error!syntactic}%
We can see that the position of the error is known, and the erroneous
symbol is the
terminal symbol $a$.
The action of the parser is the following.
\index{parser!LL(1)}%
Let \# be the sign of the right end of the analysed text, that is,
the mark \# is the last symbol of the text. We use a stack during the
analysis;
the bottom of the stack is marked by \#, too.
We give serial numbers to the derivation rules, and during the analysis
we write the
number of each applied rule
into a list. At the end of parsing we can construct the syntax tree
from this
list (Figure \ref{fig-llelemz}).
We denote the \textit{state of the parser} by triples
\index{parser!state}%
\index{state!parser}%
$(ay\#,X\alpha\#,v)$.
The symbol $ay\#$ is the text not analysed yet.
$X\alpha\#$
is the part of the sentential form corresponding to the not analysed
text;
this information is in the stack, the symbol $X$ is at the top of the
stack.
$v$ is the list of the serial numbers of production rules.
During the analysis we observe the symbol $X$ at the top of
the stack
and
the symbol $a$, the first symbol of the not yet analysed text.
The name of the symbol $a$ is
\textit{actual symbol}.
\index{actual!symbol}%
\index{symbol!actual}%
There are pointers to the top of the stack and to the actual symbol.
\begin{figure}[t!]
\begin{picture}(238,100)(-25,-20)
\put(132,75){\fbox{\makebox(20,5){$x$}}}
\put(160,75){\fbox{\makebox(6,5){$a$}}}
\put(174,75){\fbox{\makebox(40,5){\ \ $y$\hfill}}}
\put(222,75){\fbox{\makebox(6,5){$\#$}}}
\put(128,35){\fbox{\makebox(60,15){\textit{parser}}}}
\put(180,0){\framebox{\makebox(46,8){\ \ $v$\hfill}}}
\put(80,-11){\line(0,1){80}}
\put(95,-11){\line(0,1){80}}
\put(80,-11){\line(1,0){15}}
\put(80,4){\line(1,0){15}}
\put(80,52){\line(1,0){15}}
\put(80,37){\line(1,0){15}}
%\put(88,55){\makebox(0,0){$X$}}
\put(88,44){\makebox(0,0){$X$}}
\put(88,19){\makebox(0,0){$\alpha$}}
\put(87,-5){\makebox(0,0){$\#$}}
%
\put(127,45){\vector(-1,0){30}}
\put(165,55){\vector(0,1){15}}
\put(185,29){\vector(0,-1){15}}
%
\end{picture}
\caption{The structure of the \textit{LL$(1)$} parser.}
\label{fig-llelemz}
\end{figure}
We use a top down parser, therefore the initial content of the stack
is
$S\#$. If the initial analysed text is
$xay$, then the initial state of the parsing process is the triple
\index{parser!initial state}%
\index{initial state!parser}%
$(xay\#, S\#, \varepsilon)$, where
$\varepsilon$ is the sign of the empty list.
We analyse the text, the series of symbols, using a
\textit{parsing table.}
\index{parsing!table}%
\index{table!parsing}%
\index{pop}%
\index{accept}%
\index{error}%
The rows of this table are labelled by the symbols that may occur at the
top of the stack,
the columns by the possible actual symbols, and
the mark \# labels the last row and the last column of the table.
Hence the number of rows of the table is greater by one than the
number of symbols of the grammar, and the number of columns
is greater by one than the number of terminal symbols of the grammar.
The element $T[X,a]$ of the table is as follows.
$$T[X,a] = \left\{
\begin{array}{ll}
(\beta,i), & \ \textnormal{if}\ X \rightarrow \beta\
\textnormal{is the}\ i\textnormal{-th derivation rule} \koz , \\
& \ a \in {\textit{First}}(\beta)\ \textnormal{or} \\
& \ (\varepsilon \in {\textit{First}}(\beta)\
\textnormal{and}\ a \in {\textit{Follow}}(X)) \koz , \\
\textit{pop}, & \ \textnormal{if}\ X = a \koz , \\
\textit{accept},\ & \ \textnormal{if}\ X = {\textsf{\#}}
\ \textnormal{and}\ a = {\textsf{\#}} \koz , \\
\textit{error} & \ \textnormal{otherwise} \koz .
\end{array}
\right.
\index{pop}%
\index{accept}%
\index{error}%
$$
We fill in the parsing table using the following algorithm.
\begin{algN}{LL(1)-Table-Fill-in$(G)$}\inddef{\textsc{LL(1)-Table-Fill-in}}
1 \' \key{for} \= all $A\in N$\\
2 \' \> \key{do} \= \key{if} \= $A\rightarrow \alpha \in P$
is the $i$-th rule \\
3 \' \> \> \> \key{then} \key{for} \= all $a \in \textsc{First}(\alpha)$ \\
4 \' \> \> \> \> \key{do} $T[A,a] \leftarrow (\alpha, i)$ \\
5 \' \> \> \> \> \key{if} \= $\varepsilon \in \textsc{First}(\alpha)$ \\
6 \' \> \> \> \> \> \key{then} \key{for} \= all $a \in \textsc{Follow}
(A)$ \\
7 \' \> \> \> \> \> \> \key{do} $T[A,a] \leftarrow (\alpha, i)$\\
8 \'\key{for} \= all $a\in T$\\
9 \' \> \key{do} $T[a,a] \leftarrow \textit{pop}$ \\
10 \' $T[\#,\#] \leftarrow \textit{accept}$\\
11 \' \key{for} \= all $X \in (N \cup T \cup \{\#\})$ and all $a \in (T \cup
\{\#\})$\\
12 \' \> \key{do} \= \key{if} \= $T[X,a] =$ \idez{empty} \\
13 \' \> \> \> \key{then} $T[X,a] \leftarrow \textit{error}$ \\
14 \' \key{return} $T$
\end{algN}
In line 10 we write \textit{accept} into the right lower
corner of the table. In lines 8--9 we write \textit{pop} into the main diagonal of the
square labelled by terminal symbols. Lines 1--7
write a tuple whose first element is the right-hand side of a derivation
rule and whose second element is the serial number of this rule.
In lines 12--13 we write \textit{error} into the remaining empty positions.
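The table-filling algorithm can also be sketched in executable form. The following Python fragment is an illustration only, not part of the original algorithm: it hard-codes the \textit{First} and \textit{Follow} sets of the expression grammar of Example \ref{pld-ll1} instead of computing them, and the names \texttt{fill\_table}, \texttt{RULES}, \texttt{FIRST}, \texttt{FOLLOW} are our own. The empty string stands for $\varepsilon$ and missing table entries mean \textit{error}.

```python
# Sketch of LL(1)-Table-Fill-in for the expression grammar of Example
# pld-ll1. First/Follow sets are hard-coded from the text, "" marks
# epsilon, [] an empty right-hand side; absent entries mean "error".
RULES = {                      # nonterminal -> list of (rhs, rule number)
    "E":  [(["T", "E'"], 1)],
    "E'": [(["+", "T", "E'"], 2), ([], 3)],
    "T":  [(["F", "T'"], 4)],
    "T'": [(["*", "F", "T'"], 5), ([], 6)],
    "F":  [(["(", "E", ")"], 7), (["i"], 8)],
}
FIRST = {1: {"(", "i"}, 2: {"+"}, 3: {""}, 4: {"(", "i"},
         5: {"*"}, 6: {""}, 7: {"("}, 8: {"i"}}
FOLLOW = {"E'": {")", "#"}, "T'": {"+", ")", "#"}}
TERMINALS = ["+", "*", "(", ")", "i"]

def fill_table():
    T = {}
    for A, alternatives in RULES.items():       # lines 1-7 of the algorithm
        for rhs, i in alternatives:
            for a in FIRST[i] - {""}:
                T[A, a] = (rhs, i)
            if "" in FIRST[i]:                  # epsilon in First(alpha)
                for a in FOLLOW[A]:
                    T[A, a] = (rhs, i)
    for a in TERMINALS:                         # lines 8-9: main diagonal
        T[a, a] = "pop"
    T["#", "#"] = "accept"                      # line 10
    return T
```

The entries produced agree with the parsing table of Example \ref{pld-ll1}; for instance the entry for $(E',\#)$ is $(\varepsilon,3)$.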
The actions of the parser are described by state transitions.
The initial state is
$(x\#,S\#,\varepsilon)$, where $x$ is the initial text, and the parsing
process
finishes when the parser reaches the state
$(\#,\#,w)$; this state is the \textit{final state.}
\index{parser!final state}%
\index{final state!parsing}%
If, in an intermediate step, the text is
$ay\#$ and the symbol $X$ is at the top of the
stack,
then the possible state transitions are as follows.
$$(ay\#,X\alpha\#,v)\ \rightarrow\ \left\{
\begin{array}{ll}
(ay\#,\beta \alpha\#,vi), \ &\ \ \textnormal{if}\ T[X,a] =
(\beta ,i) \koz , \\
(y\#,\alpha\#,v), \ &\ \ \textnormal{if}\ T
[X,a] = \textit{pop} \koz , \\
\textit{O.K.}, \ &\ \ \textnormal{if}\ T[X,a] = \textit
{accept} \koz , \\
\textit{ERROR}, \ &\ \ \textnormal{if}\ T[X,a] = \textit{error}
\koz .
\end{array}
\right.
$$
The sign \textit{O.K.} means that the analysed text is syntactically
correct, while
\textit{ERROR} means that a syntactic error has been detected.
The actions of this parser are described by the next algorithm.
\begin{algN}{LL(1)-Parser$(xay\#,T)$}\inddef{\textsc{LL(1)-Parser}}
1 \' $s \leftarrow (xay\#, S\#, \varepsilon)$, $s' \leftarrow \textit
{analyze}$ \\
2 \' \key{repeat} \= \\
3 \' \> \key{if} \= $s = (ay\#, A\alpha\#, v)$ and $T[A,a] = (\beta,i)$ \\
4 \' \> \> \key{then} $s \leftarrow (ay\#, \beta\alpha\#, vi)$ \\
5 \' \> \> \key{else}
\key{if} \= $s = (ay\#, a\alpha\#, v)$ \\
6 \' \> \> \> \key{then} \= $s \leftarrow (y\#, \alpha\#, v)$
\` $\rhd$ Then $T[a,a] = \textit{pop}$.\\
7 \' \>\> \> \key{else} \> \key{if} \= $s = (\#,\#,v)$ \\
8 \' \> \> \> \> \key{then} \= $ s' \leftarrow \textit{O.K.}$
\` $\rhd$ Then $T[\#,\#] = \textit{accept}$.
\\
9 \' \> \> \> \> \key{else} \> $ s' \leftarrow \textit{ERROR}$
\` $\rhd$ Then $T[A,a] = \textit{error}$. \\
10 \' \key{until} $ s' = \textit{O.K.}$ \key{or} $s' = \textit{ERROR}$ \\
11 \' \key{return} $s',s$
\end{algN}
The input parameters of this algorithm are the text
$xay$ and the parsing table $T$.
The variable $s'$ describes the state of the parser: its value is
\textit{analyse} during the analysis, and it is either \textit{O.K.} or
\textit{ERROR} at the end.
The parser determines its action using the actual symbol $a$, the
symbol at
the top of the stack, and the parsing table $T$.
In lines 3--4 the parser builds the syntax tree using the derivation
rule
$A \rightarrow \beta$. In lines 5--6 the parser executes a pop action,
since the terminal symbol $a$ is at the top of the stack.
In lines 8--9 the algorithm finishes its work: if the stack is empty and
the whole text has been read, the text is accepted; otherwise a syntactic error has been detected.
At the end of this work the variable $s'$ contains
\textit{O.K.} or \textit{ERROR},
and the triple $s$ appears at the output of the algorithm.
If the text was correct, then we can
create the syntax tree of the analysed text from the third element of
the triple. If there was an error, then
the first element of the triple points to the position of the erroneous
symbol.
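The state-transition loop of \textsc{LL(1)-Parser} can be sketched in executable form. The Python fragment below is an illustration, not the original algorithm: it hard-codes the parsing table of the example that follows, the name \texttt{ll1\_parse} and the list representation of the stack are our own choices, and on an error it returns the unread part of the text (the first element of the triple).

```python
# Sketch of the LL(1)-Parser state-transition loop. TABLE copies the
# entries of the parsing table of Example pld-ll1; pairs not listed
# mean "error". The stack grows to the left: stack[0] is the top.
TABLE = {
    ("E", "("): (["T", "E'"], 1),   ("E", "i"): (["T", "E'"], 1),
    ("E'", "+"): (["+", "T", "E'"], 2),
    ("E'", ")"): ([], 3),           ("E'", "#"): ([], 3),
    ("T", "("): (["F", "T'"], 4),   ("T", "i"): (["F", "T'"], 4),
    ("T'", "*"): (["*", "F", "T'"], 5),
    ("T'", "+"): ([], 6), ("T'", ")"): ([], 6), ("T'", "#"): ([], 6),
    ("F", "("): (["(", "E", ")"], 7), ("F", "i"): (["i"], 8),
}
TERMINALS = {"+", "*", "(", ")", "i"}

def ll1_parse(text, start="E"):
    """Return ('O.K.', rule numbers) or ('ERROR', unread text)."""
    word = list(text) + ["#"]        # the text with the end mark
    stack = [start, "#"]             # initial stack content S#
    v = []                           # serial numbers of applied rules
    while True:
        X, a = stack[0], word[0]
        if X == "#" and a == "#":
            return "O.K.", v                     # accept
        if X in TERMINALS or X == "#":
            if X == a:
                stack.pop(0)                     # pop
                word.pop(0)
            else:
                return "ERROR", "".join(word)
        elif (X, a) in TABLE:
            beta, i = TABLE[X, a]                # expand X -> beta
            stack[0:1] = beta
            v.append(i)
        else:
            return "ERROR", "".join(word)
```

On the input $i+i*i$ the function collects the rule numbers $1,4,8,6,2,4,8,5,8,6,3$, which agrees with the sequence 14862485863 derived in the examples below.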
\begin{pelda}{}
\label{pld-ll1}%
Let $G$ be a grammar
$G = ( \{ E,E',T,T',F \}, \{ +,*,(,),i \}, P, E ) , $
where the set $P$ of derivation rules:
$E \rightarrow TE'$
$E' \rightarrow +TE' \mid \varepsilon$
$T \rightarrow FT'$
$T' \rightarrow *FT' \mid \varepsilon$
$F \rightarrow (E) \mid i $
From these rules we can determine the ${\textit{Follow}}(A)$ sets.
To fill in the parsing table, the following sets are required:
${\textit{First}}(TE') = \{ ( , i \}$,
${\textit{First}}(+TE') = \{ + \}$,
${\textit{First}}(FT') = \{ ( , i \}$,
${\textit{First}}(*FT') = \{ * \}$,
${\textit{First}}((E)) = \{ ( \}$,
${\textit{First}}(i) = \{ i \}$,
${\textit{Follow}}(E') = \{ ) , \# \}$,
${\textit{Follow}}(T') = \{ +, ) , \# \} . $
The parsing table is as follows. The empty positions in the table mean
\textit{error}.
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
\begin{center}$
\begin{tabular}{||l||l|l|l|l|l|l||}
\hhline{|t:=:t:======:t|}
& + & * & ( & ) & $i$ & \# \\
\hhline{|:=::======:|}
$E$ & & & $(TE',1)$ & & $(TE',1)$ & \\
$E'$ & $(+TE',2)$ & & & $(\varepsilon,3)$ & &
$(\varepsilon,3)$ \\
$T$ & & & $(FT',4)$ & & $(FT',4)$ & \\
$T'$ & $(\varepsilon,6)$ & $(*FT',5)$ & & $(\varepsilon,6)$ & &
$(\varepsilon,6)$ \\
$F$ & & & $((E),7)$ & & $(i,8)$ & \\
\hhline{||-||-|-|-|-|-|-||}
+ & \textit{pop} & & & & & \\
* & & \textit{pop} & & & & \\
( & & & \textit{pop} & & & \\
) & & & & \textit{pop} & & \\
$i$ & & & & & \textit{pop} & \\
\hhline{||-||-|-|-|-|-|-||}
\# & & & & & & \textit{accept} \\
\hhline{|b:=:b:======:b|}
\end{tabular}$
\end{center}
\end{pelda}
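The \textit{First} sets listed in the example can also be computed mechanically. The following Python fragment is a generic sketch (the function names and the representation of $\varepsilon$ by the empty string are our own choices): it obtains the \textit{First} sets of the nonterminals by fixed-point iteration over the rules, and \texttt{first\_of} extends them to arbitrary symbol strings such as $TE'$.

```python
# Computing the First sets of the grammar of Example pld-ll1 by
# fixed-point iteration; "" marks epsilon, [] an epsilon rule.
RULES = [("E", ["T", "E'"]), ("E'", ["+", "T", "E'"]), ("E'", []),
         ("T", ["F", "T'"]), ("T'", ["*", "F", "T'"]), ("T'", []),
         ("F", ["(", "E", ")"]), ("F", ["i"])]
NONTERMS = {A for A, _ in RULES}

def first_of(alpha, first):
    """First set of the symbol string alpha; "" stands for epsilon."""
    out = set()
    for X in alpha:
        fx = first[X] if X in NONTERMS else {X}  # terminal: First(X)={X}
        out |= fx - {""}
        if "" not in fx:                         # X cannot vanish: stop
            return out
    out.add("")                                  # every symbol can vanish
    return out

def first_sets():
    first = {A: set() for A in NONTERMS}
    changed = True
    while changed:                               # iterate until no set grows
        changed = False
        for A, rhs in RULES:
            add = first_of(rhs, first)
            if not add <= first[A]:
                first[A] |= add
                changed = True
    return first
```

For example, \texttt{first\_of} applied to the string $TE'$ yields $\{ (, i \}$, in accordance with ${\textit{First}}(TE')$ given in the example.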
\newpage
\begin{pelda}
Using the parsing table of the previous example, analyse the text
$i+i*i$.
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
\begin{tabular}{llcrrlc}
($i+i*i\#$,\ $S\#$,\ $\varepsilon$)
& $\xrightarrow{(TE',1)}$ & ( & $i+i*i\#$, & $TE'\#$, & 1 & ) \\
& $\xrightarrow{(FT',4)}$ & ( & $i+i*i\#$, & $FT'E'\#$, & 14 & ) \\
& $\xrightarrow{(i,8)}$ & ( & $i+i*i\#$, & $iT'E'\#$, & 148 & ) \\
& $\xrightarrow{\textit{pop}}$ & ( & $+i*i\#$, & $T'E'\#$, & 148 & )
\\
& $\xrightarrow{(\varepsilon,6)}$ & ( & $+i*i\#$, & $E'\#$, & 1486
& ) \\
& $\xrightarrow{(+TE',2)}$ & ( & $+i*i\#$, & $+TE'\#$, & 14862
& ) \\
& $\xrightarrow{\textit{pop}}$ & ( & $i*i\#$, & $TE'\#$, & 14862 & )
\\
& $\xrightarrow{(FT',4)}$ & ( & $i*i\#$, & $FT'E'\#$, & 148624 & )
\\
& $\xrightarrow{(i,8)}$ & ( & $i*i\#$, & $iT'E'\#$, & 1486248 & )
\\
& $\xrightarrow{\textit{pop}}$ & ( & $*i\#$, & $T'E'\#$, & 1486248
& ) \\
& $\xrightarrow{(*FT',5)}$ & ( & $*i\#$, & $*FT'E'\#$, & 14862485
& ) \\
& $\xrightarrow{\textit{pop}}$ & ( & $i\#$, & $FT'E'\#$, & 14862485
& ) \\
& $\xrightarrow{(i,8)}$ & ( & $i\#$, & $iT'E'\#$, & 148624858 & )
\\
& $\xrightarrow{\textit{pop}}$ & ( & $\#$, & $T'E'\#$, & 148624858
& ) \\
& $\xrightarrow{(\varepsilon,6)}$ & ( & $\#$, & $E'\#$, &
1486248586 & ) \\
& $\xrightarrow{(\varepsilon,3)}$ & ( & $\#$, & $\#$, &
14862485863 & ) \\
& $\xrightarrow{\textit{accept}}$ & & \textit{O.K.}
\end{tabular}
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
The syntax tree of the analysed text is shown in Figure
\ref{fig-ll1fa}.
\end{pelda}
\begin{figure}[t!]
\hspace*{\fill}
\xymatrix{
& & E \ar@{-}[dl] \ar@{-}[dr] \\
& T \ar@{-}[dl] \ar@{-}[d] & & E' \ar@{-}[dl] \ar@{-}[d] \ar@{-}[dr] \\
F \ar@{-}[d] & T' \ar@{-}[d] & + & T \ar@{-}[dl] \ar@{-}[dr] &
E' \ar@{-}[dr] \\
i & {\varepsilon} & F \ar@{-}[d] & & T' \ar@{-}[dl] \ar@{-}[d]
\ar@{-}[dr] & {\varepsilon} \\
& & i & {*} & F \ar@{-}[d] & T' \ar@{-}[d] \\
& & & & i & {\varepsilon} }
\hspace*{\fill}
\caption{The syntax tree of the sentence $i+i*i$.}
\label{fig-ll1fa}
\end{figure}
\subsubsection{Recursive-descent parsing method}
There is another frequently used
method for backtrack-free top-down parsing. Its essence is that we
write a real program for the
given grammar: we create a procedure for each symbol of the grammar,
and
the recursive procedure calls realize the
stack of the parser and the stack management.
This is a top-down parsing method, and the procedures call
each other recursively; this is the origin of the name of the
method, the \ki{recursive-descent method.}\index{recursive-descent method}%
To check the terminal symbols we create the procedure
\textit{Check}.
\index{check}%
Let the parameter of this procedure be the \idez{expected symbol},
that is,
the leftmost unchecked terminal symbol of the sentential form, and let
the
\textit{actual symbol} be the symbol which is analysed at that moment.
\index{actual!symbol}\index{symbol!actual}%
\begin{verbatim}
procedure Check(a);
begin
if actual_symbol = a
then Next_symbol
else Error_report
end;
\end{verbatim}
The procedure \textit{Next\_symbol} reads the next symbol; it is a call
to
the lexical analyser, which determines the next symbol and
puts it
into the variable \textit{actual\_symbol}.
The procedure \textit{Error\_report} creates an error report and then
finishes the parsing.
\index{error!report}%
We create the procedures of the symbols of the grammar as follows.
The procedure of the nonterminal symbol $A$ is the following.
\begin{verbatim}
procedure A;
begin
T(A)
end;
\end{verbatim}
\noindent where \verb+T(A)+ is determined by the symbols of the
right-hand sides of the
derivation rules having the symbol $A$ on their left-hand side.
The grammars which are used for syntactic analysis are
\textit{reduced grammars}.
\index{grammar!reduced}%
\index{reduced grammar}%
This means that there are no unnecessary symbols in the grammar, and
every nonterminal symbol occurs on the left-hand side of at least one
production rule.
Therefore, if we consider the symbol $A$, there is at least one
$A \rightarrow \alpha$ production rule.
\begin{enumerate}
\item If there is only one production rule for the symbol $A$,
\begin{enumerate}
\item let the program of the rule $A \rightarrow a$ be the call
\verb+ Check(a)+,
\item for the rule $A \rightarrow B$ we give the procedure call
\verb+ B +,
\item for the rule $A \rightarrow X_{1}X_{2} \ldots X_{n}$\ $( n \geq
2 )$
we give the next block:
\begin{verbatim}
begin
T(X_1);
T(X_2);
...
T(X_n)
end;
\end{verbatim}
\vspace{-4mm}
\end{enumerate}
\vspace{-4mm}
\item If there are more rules for the symbol $A$:
\begin{enumerate}
\item If the rules $A \rightarrow \alpha_{1} \mid \alpha_{2} \mid
\ldots
\mid \alpha_{n}$ are $\varepsilon$-free, that is, $\varepsilon$ cannot
be derived from any
$\alpha_i$ ($1 \le i \le n$),
then the program
\verb+T(A)+ is
\begin{verbatim}
case actual_symbol of
First(alpha_1) : T(alpha_1);
First(alpha_2) : T(alpha_2);
...
First(alpha_n) : T(alpha_n)
end;
\end{verbatim}
\noindent where \texttt{First(alpha\_i)} denotes the set
${\textit{First}}(\alpha_i)$.
We note that this is the first point of the recursive-descent
method
where we use the fact that the grammar is an
\textit{LL}(1) grammar.
\item We use ${\textit{LL}}(1)$ grammars to define programming
languages,
therefore it is not convenient to require that the grammar be
$\varepsilon$-free.
For the rules $A \rightarrow \alpha_{1} \mid \alpha_{2} \mid \ldots
\mid \alpha_{n-1}
\mid
\varepsilon$ we create the next \verb+T(A)+
program:
\begin{verbatim}
case actual_symbol of
First(alpha_1) : T(alpha_1);
First(alpha_2) : T(alpha_2);
...
First(alpha_(n-1)) : T(alpha_(n-1));
Follow(A) : skip
end;
\end{verbatim}
\noindent where \texttt{Follow(A)} denotes the set ${\textit{Follow}}(A)$.
In general, if among the rules $A \rightarrow \alpha_1 \mid
\alpha_2 \mid \ldots \mid \alpha_n$ there is an $i$ ($ 1 \le
i \le n )$ with
$\alpha_i \stackrel{*}\Longrightarrow \varepsilon$, that is,
$\varepsilon \in {\textit{First}}(\alpha_i)$,
then the $i$-th row of the \verb+case+ statement is \\
\verb+Follow(A) : skip +
\end{enumerate}
\end{enumerate}
In the program \verb+T(A)+, where possible, we use an
\texttt{if-then-else} or \texttt{while} statement instead of the
\texttt{case} statement.
The start procedure, that is the main program of this
parsing program, is the procedure which is created for the start symbol
of the grammar.
We can create the recursive-descent parsing program with
the next algorithm. The input of this algorithm is the grammar $G$,
and the result is the parsing program $P$.
In this algorithm we use a \textsc{Write}-\textsc{Program}
\inddef{\textsc{Write-Program}}%
procedure, which concatenates the new program lines to
the program $P$. We will not go into the details of this algorithm.
\begin{algN}{Create-Rec-Desc$(G)$}\inddef{\textsc{Create-Rec-Desc}}
1 \' $P \leftarrow \emptyset$ \\
2 \' \textsc{Write-Program}( \\
3 \' \hspace*{8mm}\= \texttt{procedure Check(a);} \\
4 \' \> \texttt{begin} \\
5 \' \> \texttt{if} \= \verb+actual_symbol = a + \\
6 \' \> \> \verb+then Next_symbol +\\
7 \' \> \> \texttt{else Error\_report} \\
8 \' \> \texttt{end;} \\
9 \' \> ) \\
10 \' \key{for} \= all symbol $A \in N$ of the grammar $G$ \\
11 \' \> \key{do} \= \key{if} \= $A = S$ \\
12 \' \> \> \> \key{then} \=\textsc{Write-Program}( \\
13 \' \> \> \> \> \hspace*{8mm}\=\texttt{program S;}\\
14 \' \> \> \> \> \> \texttt{begin} \\
15 \' \> \> \> \> \> \textsc{\ \ Rec-Desc-Stat}$(S,P)$ \\
16 \' \> \> \> \> \> \texttt{end;} \\
17 \' \> \> \> \> \> ) \\
18 \' \> \> \> \key{else} \=\textsc{Write-Program}( \\
19 \' \> \> \> \> \hspace*{8mm}\=\texttt{procedure A;}\\
20 \' \> \> \> \> \> \texttt{begin} \\
21 \' \> \> \> \> \> \textsc{\ \ Rec-Desc-Stat}$(A,P)$ \\
22 \' \> \> \> \> \> \texttt{end;} \\
23 \' \> \> \> \> \> ) \\
24 \' \key{return} $P$
\end{algN}
The algorithm creates the \textit{Check} procedure in lines 2--9.
Then, for all nonterminals of the grammar $G$, it
determines their procedures using the algorithm
\textsc{Rec-Desc-Stat}.
In lines 11--17 we can see that for the start symbol $S$
we create the main program.
The output of the algorithm is the parsing program.
\begin{alg}{Rec-Desc-Stat$(A,P)$}\inddef{\textsc{Rec-Desc-Stat}}
1 \' \key{if} \= there is only one rule $A \rightarrow \alpha$ \\
2 \' \> \key{then} \= \textsc{Rec-Desc-Stat1}$(\alpha,P)$
\` $\rhd$ $A \rightarrow \alpha .$ \\
3 \' \> \key{else} \> \textsc{Rec-Desc-Stat2}$(A,
(\alpha_1,\dots,\alpha_n),P)$
\` $\rhd$ $A \rightarrow \alpha_1\mid\dots\mid\alpha_n .$ \\
4 \' \key{return} $P$
\end{alg}
The form of the statements of the parsing program depends on the
derivation rules of the symbol $A$. Therefore the algorithm \textsc{Rec-Desc-Stat}
divides the task into two parts.
The algorithm \textsc{Rec-Desc-Stat1} deals with the case when there
is only one derivation rule, and the algorithm
\textsc{Rec-Desc-Stat2} creates the program for the alternatives.
\begin{algN}{Rec-Desc-Stat1$(\alpha,P)$}\inddef{\textsc{Rec-Desc-Stat1}}
1 \' \key{if} \= $\alpha = a$ \\
2 \' \> \key{then} \=\textsc{Write-Program}( \\
3 \' \> \> \hspace*{8mm}\=\texttt{Check(a)} \\
4 \' \> \> \> ) \\
5 \' \key{if} \= $\alpha = B$ \\
6 \' \> \key{then} \=\textsc{Write-Program}( \\
7 \' \> \> \hspace*{8mm}\=\texttt{B} \\
8 \' \> \> \> ) \\
9 \' \key{if} \= $\alpha = X_1 X_2 \dots X_n$\ $(n \geq 2)$ \\
10 \' \> \key{then} \= \textsc{Write-Program}( \\
11 \' \> \> \hspace*{8mm}\=\texttt{begin} \\
12 \' \> \> \> \ \ \textsc{Rec-Desc-Stat1}$(X_1,P)$ \verb+;+ \\
13 \' \> \> \> \ \ \textsc{Rec-Desc-Stat1}$(X_2,P)$ \verb+;+ \\
14 \' \> \> \> \ \ $\dots$ \\
15 \' \> \> \> \ \ \textsc{Rec-Desc-Stat1}$(X_n,P)$ \\
16 \' \> \> \> \verb+end;+ \\
17 \' \key{return} $P$
\end{algN}
\begin{algN}{Rec-Desc-Stat2$(A,(\alpha_1,\dots,\alpha_n),P)$}
\inddef{\textsc{Rec-Desc-Stat2}}
1 \' \key{if} \= the rules $\alpha_1,\dots,\alpha_n$ are $\varepsilon$-
free \\
2 \' \> \key{then} \=\textsc{Write-Program}( \\
3 \' \> \> \> \hspace*{8mm}\=\verb+case actual_symbol of+ \\
4 \' \> \> \> \> \ \ \verb+First(alpha_1) : +\textsc{Rec-Desc-Stat1}
$(\alpha_1,P)$ \verb+;+ \\
5 \' \> \> \> \> \ \ \verb+...+ \\
6 \' \> \> \> \> \ \ \verb+First(alpha_n) : +\textsc{Rec-Desc-Stat1}
$(\alpha_n,P)$ \\
7 \' \> \> \> \> \verb+end;+ \\
8 \' \> \> \> \> ) \\
9 \' \key{if} there is an $\varepsilon$-rule, $\alpha_i = \varepsilon$
$(1 \leq i \leq n)$\\
10 \' \> \key{then} \=\textsc{Write-Program}( \\
11 \' \> \> \> \hspace*{8mm}\= \verb+case actual_symbol of+ \\
12 \' \> \> \> \> \ \ \verb+First(alpha_1) : +\textsc{Rec-Desc-Stat1}
$(\alpha_1,P)$ \verb+;+ \\
13 \' \> \> \> \> \ \ \verb+...+ \\
14 \' \> \> \> \> \ \ \verb+First(alpha_(i-1)) : +\textsc{Rec-Desc-Stat1}
$(\alpha_{i-1},P)$ \verb+;+ \\
15 \' \> \> \> \> \ \ \verb+Follow(A) : skip;+ \\
16 \' \> \> \> \> \ \ \verb2First(alpha_(i+1)) : 2\textsc{Rec-Desc-Stat1}
$(\alpha_{i+1},P)$ \verb+;+ \\
17 \' \> \> \> \> \ \ \verb+...+ \\
18 \' \> \> \> \> \ \ \verb+First(alpha_n) : +\textsc{Rec-Desc-Stat1}
$(\alpha_n,P)$ \\
19 \' \> \> \> \> \verb+end;+ \\
20 \' \> \> \> \> ) \\
21 \' \key{return} $P$
\end{algN}
These two algorithms create the program described above.
The recursive-descent parsing method checks the end of the parsed text
with the next modification. We generate a
new derivation rule for the end mark \#. If the start symbol of the
grammar is $S$,
then we create the new rule
$S' \rightarrow S\#$, where the new symbol $S'$ is the start symbol of
our new grammar. The mark \# is considered a terminal symbol.
Then we generate the parsing program for this new grammar.
\begin{pelda}
We augment the grammar of Example \ref{pld-ll1} in the above manner.
The production rules are as follows.
$S' \rightarrow E\#$
$E \rightarrow TE'$
$E' \rightarrow +TE' \mid \varepsilon$
$T \rightarrow FT'$
$T' \rightarrow *FT' \mid \varepsilon$
$F \rightarrow (E) \mid i $
In Example \ref{pld-ll1} we gave the necessary
\textit{First} and \textit{Follow} sets. We use the following sets:
${\textit{First}}(+TE') = \{ + \}$,
${\textit{First}}(*FT') = \{ * \}$,
${\textit{First}}((E)) = \{ ( \}$,
${\textit{First}}(i) = \{ i \}$,
${\textit{Follow}}(E') = \{ ) , \# \}$,
${\textit{Follow}}(T') = \{ +, ) , \# \} . $
In the comments of the program lines we indicate which of these
sets is used. A comment begins with the character
pair \verb+--+.
The program of the recursive-descent parser is the following.
\begin{verbatim}
program S';
begin
E;
Check(#)
end.
procedure E;
begin
T;
E'
end;
procedure E';
begin
case actual_symbol of
+ : begin -- First(+TE')
Check(+);
T;
E'
end;
),# : skip -- Follow(E')
end
end;
procedure T;
begin
F;
T'
end;
procedure T';
begin
case actual_symbol of
* : begin -- First(*FT')
Check(*);
F;
T'
end;
+,),# : skip -- Follow(T')
end
end;
procedure F;
begin
case actual_symbol of
( : begin -- First((E))
Check(();
E;
Check())
end;
i : Check(i) -- First(i)
end
end;
\end{verbatim}
We can see that the main program of this parser belongs to the symbol
$S'$.
\end{pelda}
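The Pascal-style parser of the example can be transliterated into a short executable sketch. The following Python version is an illustration only: the names \texttt{parse} and \texttt{ParseError} are our own, \textit{Check}, \textit{Next\_symbol} and \textit{Error\_report} are realized with a position counter and an exception, and the procedures $E'$, $T'$ are spelled \texttt{Ep}, \texttt{Tp}.

```python
# Transliteration of the recursive-descent parser of the augmented
# expression grammar S' -> E#. The input is a string of the terminals
# + * ( ) i; the end mark '#' is appended automatically.
class ParseError(Exception):
    pass

def parse(text):
    symbols = list(text) + ["#"]       # '#' is the end mark
    pos = [0]

    def actual():
        return symbols[pos[0]]         # the actual symbol

    def check(a):                      # procedure Check(a)
        if actual() == a:
            pos[0] += 1                # Next_symbol
        else:
            raise ParseError(actual()) # Error_report

    def E():                           # E -> T E'
        T(); Ep()

    def Ep():                          # E' -> + T E' | epsilon
        if actual() == "+":            # First(+TE')
            check("+"); T(); Ep()
        elif actual() in (")", "#"):   # Follow(E'): skip
            pass
        else:
            raise ParseError(actual())

    def T():                           # T -> F T'
        F(); Tp()

    def Tp():                          # T' -> * F T' | epsilon
        if actual() == "*":            # First(*FT')
            check("*"); F(); Tp()
        elif actual() in ("+", ")", "#"):  # Follow(T'): skip
            pass
        else:
            raise ParseError(actual())

    def F():                           # F -> ( E ) | i
        if actual() == "(":            # First((E))
            check("("); E(); check(")")
        elif actual() == "i":          # First(i)
            check("i")
        else:
            raise ParseError(actual())

    E()                                # program S': E; Check(#)
    check("#")
    return "O.K."
```

The call stack of the running program plays the role of the parser's stack: each pending procedure call corresponds to an unfinished node of the syntax tree.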
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection[\textit{LR}$(1)$ parsing]%
{\textit{LR}{\boldmath{$(1)$}} parsing}
\label{subsec-alulrol-fel}
If we analyse bottom-up, then we start with the program text.
We search for the handle of the sentential form, and we substitute the
nonterminal symbol that belongs to the handle for this
handle. After this first step, we repeat this procedure several times.
Our goal is to reach the start symbol of the grammar. This
symbol will be the root of the syntax tree, and by this time
the terminal symbols of the program text are the leaves of the tree.
First we review the notions which are necessary for the parsing.
To analyse bottom-up, we have to determine the handle of the
sentential form. The problem is to create a good method which
finds the handle, and to find the best substitution if there is more
than one possibility.
\begin{defi}
\label{def-legjobb-hely}%
\index{rightmost!substitution}%
\index{substitution!rightmost}%
If \ $A \rightarrow \alpha \in P$, then the \ki{rightmost substitution}
of the sentential form \ $\beta Ax$\
($x \in T^*,\ \alpha, \beta \in (N \cup T)^*$)
is\ $\beta\alpha x$,
that is \
$$\beta Ax \underset{rightmost}{\Longrightarrow} \beta\alpha x
\koz .$$
\end{defi}
\begin{defi}
\label{def-legjobb-lev}%
\index{rightmost!derivation}%
\index{derivation!rightmost}%
If the derivation \ $S \stackrel{*}\Longrightarrow x$\
($x \in T^*$)
all of the substitutions were rightmost substitution, then
this derivation is a \ki{rightmost derivation,}
$$S \underset{rightmost}{\overset{*}{\Longrightarrow}} x \koz .$$
\end{defi}
In a rightmost derivation, the symbols to the right of the rightmost
nonterminal of the sentential form are terminal symbols. By the
connection between the notion of the handle and
rightmost derivations, if we apply the steps of a rightmost derivation
backwards, then we obtain the steps of a bottom-up parsing.
Hence bottom-up parsing is equivalent to the \idez{inverse}
of a rightmost derivation. Therefore, when we deal with
bottom-up methods, we will not write the word \idez{rightmost} at the
arrows.
General bottom-up parsing methods are realized by using backtrack
algorithms. They are similar to the top-down parsing methods. But
the backtrack steps make the parser very slow.
Therefore we deal only with grammars that have parsing methods
without backtracks.
Henceforth we present a very efficient algorithm for a large class
of context-free grammars. This class contains the grammars of the
programming languages.
The parsing is called \textit{LR}$(k)$\ parsing;
the grammar is called \textit{LR}$(k)$\
\index{LR(k)!parsing}%
\index{parser!LR(k)}%
\index{LR(k)!grammar}%
\index{grammar!LR(k)}%
grammar. \textit{LR} refers to the \idez{Left to Right} method,
and $k$ means that, looking ahead $k$ symbols, we can
determine
the
handles of the sentential forms.
The \textit{LR}$(k)$\ parsing method is a shift-reduce method.
We deal with {\textit{LR$(1)$}} parsing only, since
for every \textit{LR}$(k)$\ $( k > 1 )$
grammar there is an equivalent \textit{LR$(1)$} grammar.
This fact is very important for us since, using this type of grammars,
it is enough to look ahead one symbol in all cases.
Creating \textit{LR}$(k)$ parsers is not an easy task.
However, there are standard programs (for example
\texttt{yacc} in UNIX systems)
\index{yacc}%
that create the complete parsing program from the derivation rules
of
a grammar. Using these programs, the task of writing parsers is not too
hard.
After studying the \textit{LR}$(k)$\ grammars we will deal
with the
\textit{LALR}$(1)$\ parsing method. This method is used in the
compilers of modern programming languages.
%%%
\subsubsection[\textit{LR}$(k)$ grammars.]
{\textit{LR}{\boldmath{$(k)$}} grammars}
As we did previously, we write a mark \# to the right end
of the text to be analysed. We introduce a new
nonterminal symbol
$S'$ and a new rule $S' \rightarrow S$ into the grammar.
\begin{defi}{}
\label{def-kieg-gr}%
The \textbf{augmented grammar} $G'$ belonging to the grammar $G
= (N,T,P,S)$ is
\index{augmented grammar}%
\index{grammar!augmented}%
$$G' = ( N \cup \{ S' \}, T, P \cup \{ S' \rightarrow S \}, S' ) \koz .$$
\end{defi}
Assign serial numbers to the derivation rules of the grammar, and let $S'
\rightarrow S$ be the 0th rule. Using this numbering, the application of
the 0th rule
means that the parsing process is concluded and the
text is correct.
We note that if the original start symbol $S$ does not occur on
the right-hand side of any rule, then there is no need for this augmentation.
However, for the sake of generality, we deal with augmented
grammars only.
\begin{defi}{}
The augmented grammar $G'$ is an \textbf{\textit{LR}{\boldmath{$(k)$}}\ grammar}
$( k \ge 0 )$,
\index{LR(k)!grammar}%
\index{grammar!LR(k)}%
if, for the derivations
$$\begin{array}{c}S' \stackrel{*}\Longrightarrow \alpha A w
\Longrightarrow \alpha \beta w \koz ,\\
S' \stackrel{*}\Longrightarrow \gamma B x \Longrightarrow \gamma
\delta x
= \alpha \beta y \koz
\end{array}$$
($A,B \in N,\ x,y,w \in T^*,\ \alpha,\beta,\gamma,\delta \in (N \cup T)
^*$)
the equality
$${\textit{First}}_{k}(w) = {\textit{First}}_{k}(y)$$
implies
$$\alpha = \gamma, A = B\ \textnormal{and}\ x = y \koz.$$
\end{defi}
The feature of \textit{LR}$(k)$\ grammars is that, in the sentential
form
$\alpha \beta w$, looking ahead $k$ symbols from $w$
unambiguously decides whether
$\beta$ is the handle. If the handle is $\beta$, then we have
to
reduce the form using the rule
$A \rightarrow \beta$, which results in the new sentential form
$\alpha Aw$. The reason is the following:
suppose that, for the sentential forms
$\alpha \beta w$ and
$\alpha \beta y$ (their prefixes
$\alpha \beta$ are the same),
${\textit{First}}_{k}(w) = {\textit{First}}_{k}(y)$,
and we can reduce $\alpha \beta w$ to
$\alpha Aw$ and $\alpha \beta y$
to $\gamma Bx$. In this case, since the grammar is an \textit{LR}$(k)$\
grammar, $\alpha =\gamma$ and $A = B$ hold.
Therefore in this case either $\beta$ is the handle or $\beta$ is never
the handle.
\begin{figure}[t!]
\begin{picture}(143,115)(55,0)
\put(160,93){\fbox{\makebox(10,6){$S'$}}}
\thicklines
\put(122,76){\line(6,2){37}}
\put(215,76){\line(-6,2){37}}
\put(120,65){\fbox{\makebox(31,6){$\alpha$}}}
\put(117,46){\line(1,4){3.5}}
\put(160,65){\fbox{\makebox(10,6){$A$}}}
\put(157,46){\line(1,4){3.5}}
\put(181,46){\line(-1,4){3.5}}
\put(179,65){\fbox{\makebox(33,6){$w$}}}
\put(223,46){\line(-1,4){3.5}}
\put(115,35){\fbox{\makebox(31,6){$\alpha$}}}
\put(155,35){\fbox{\makebox(20,6){$\beta$}}}
\put(184,35){\fbox{\makebox(33,6){$w$}}}
\put(184,30){\line(0,-1){8}}
\put(195,20){\makebox(0,0){$\longmapsto^{\mbox{\textit{k}}}$}}
\end{picture}
%
\begin{picture}(143,115)(55,0)
\put(160,93){\fbox{\makebox(10,6){$S'$}}}
\thicklines
\put(122,76){\line(6,2){37}}
\put(215,76){\line(-6,2){37}}
\put(120,65){\fbox{\makebox(22,6){$\gamma$}}}
\put(117,46){\line(1,4){3.5}}
\put(151,65){\fbox{\makebox(10,6){$B$}}}
\put(147,46){\line(1,4){3.5}}
\put(172,46){\line(-1,4){3.5}}
\put(170,65){\fbox{\makebox(42,6){$x$}}}
\put(223,46){\line(-1,4){3.5}}
\put(115,35){\fbox{\makebox(22,6){$\gamma$}}}
\put(146,35){\fbox{\makebox(20,6){$\delta$}}}
\put(175,35){\fbox{\makebox(42,6){$x$}}}
\put(115,15){\fbox{\makebox(31,6){$\alpha$}}}
\put(155,15){\fbox{\makebox(20,6){$\beta$}}}
\put(184,15){\fbox{\makebox(33,6){$y$}}}
\put(184,10){\line(0,-1){8}}
\put(195,0){\makebox(0,0){$\longmapsto^{\mbox{\textit{k}}}$}}
\end{picture}
%
\caption{The ${\textit{LR}}(k)$ grammar.}
\label{fig-lr}
\end{figure}
\begin{pelda}{}
Let $G' = ( \{ S',S \}, \{ a \}, P', S' )$ be a grammar and let
the derivation rules be as follows.
$S' \rightarrow\ S$
$S \rightarrow\ Sa \mid a$
\noindent This grammar is not an \textit{LR$(0)$}\ grammar, since,
using the notations of the definition, in the derivations
\begin{center}
\begin{tabbing}
xx \= xxxx \= xx \= xx \= xx \= xxxx \= xx \= xxx \= xx \= xxxx \= xx \=
xx \= xx \kill\\
$S'$ \> $\stackrel{*}\Longrightarrow$ \> $\varepsilon$ \> $S'$ \>
$\varepsilon$ \> $\Longrightarrow$
\> $\varepsilon$ \> $S$ \> $\varepsilon$, \\
\> \> $\alpha$ \> $A$ \> $w$ \> \> $\alpha$ \> $\beta$ \> $w$
\\[1mm]
$S'$ \> $\stackrel{*}\Longrightarrow$ \> $\varepsilon$ \> $S'$ \>
$\varepsilon$ \> $\Longrightarrow$
\> $\varepsilon$ \> $Sa$ \> $\varepsilon$ \> = \> $\varepsilon$ \>
$S$ \> $a$, \\
\> \> $\gamma$ \> $B$ \> $x$ \> \> $\gamma$ \> $\delta$ \> $x$
\> \> $\alpha$ \> $\beta$ \> $y$
\end{tabbing}
\end{center}
\noindent it holds that
${\textit{First}}_{0}(\varepsilon ) = {\textit{First}}_{0}(a) =
\varepsilon,$
but $\gamma Bx \neq \alpha Ay$.
\end{pelda}
\begin{pelda}{}
The next grammar is an \textit{LR}$(1)$ grammar.
$G = ( \{ S',S \}, \{ a,b \}, P', S' )$,
the derivation rules are:
$S' \rightarrow\ S$
$S \rightarrow\ SaSb \mid \varepsilon$ .
\end{pelda}
In the next example we show that there is a context-free grammar
that is not an \textit{LR}$(k)$\ grammar for any $k$
$( k \ge 0 )$.
\begin{pelda}{}
Let
$G' = ( \{ S',S \}, \{ a \}, P', S' ) $ be a grammar and let the
derivation rules be
$S' \rightarrow\ S$
$S \rightarrow\ aSa \mid a$
\noindent Now for all $k$\ $( k \ge 0 )$
$$
S' \stackrel{*}\Longrightarrow a^{k}Sa^{k}
\Longrightarrow a^{k}aa^{k} = a^{2k+1} \koz ,$$
$$S' \stackrel{*}\Longrightarrow a^{k+1}Sa^{k+1}
\Longrightarrow a^{k+1}aa^{k+1} = a^{2k+3} \koz ,$$
and
$${\textit{First}}_{k}(a^{k})~={\textit{First}}_{k}(aa^{k+1}) = a^
{k} \koz ,$$
but
$$ a^{k+1}Sa^{k+1}~\neq ~a^{k}Sa^{k+2} \koz .$$
\end{pelda}
It is not certain that, for an \textit{LL}$(k)$ $(k > 1)$ grammar, we can
find an equivalent \textit{LL}(1) grammar. However,
\textit{LR}$(k)$ grammars have this nice property.
\begin{tetel}
For every \textit{LR}$(k)$ $(k > 1)$ grammar there is an equivalent
\textit{LR}(1) grammar.
\index{LR(1)!grammar}%
\index{grammar!LR(1)}%
\end{tetel}
The great significance of this theorem is that it makes it sufficient
to study
\textit{LR}$(1)$ grammars instead of
\textit{LR}$(k)$ $(k > 1)$ grammars.
%%%
\subsubsection[\textit{LR}$(1)$ canonical sets.]%
{\textit{LR}{\boldmath{$(1)$}} canonical sets}
Now we define a very important notion of
\textit{LR} parsing.
\begin{defi}
\label{def-jarhato}%
If $\beta$ is the handle of the $\alpha \beta x$ \
($\alpha,\beta \in (N \cup T)^*, x \in
T^*$) sentential form, then the prefixes of
$\alpha \beta$ are the
\ki{viable prefixes} of
$\alpha \beta x$.
\index{viable prefix}%
\end{defi}
\begin{pelda}{}
\label{pelda-lr0gr1}
Let
$G' = ( \{ E,T,S' \}, \{ i,+,(,) \}, P', S' )$ be a grammar and the
derivation rules as follows.
$(0) \ S' \rightarrow\ E$
$(1) \ E \rightarrow\ T$
$(2) \ E \rightarrow\ E + T $
$(3) \ T \rightarrow\ i$
$(4) \ T \rightarrow\ ( E )$
$E+(i+i)$ is a sentential form, and the first $i$
is the handle. The viable prefixes of this sentential form are
$E,\ E+,\ E+(,\ E+(i.$
\end{pelda}
By the above definition, symbols after the handle are not parts of
any viable prefix. Hence the task of finding the handle is the
task of finding the longest viable prefix.
For a given grammar, the set of viable prefixes is determined, but it is
obvious that this set is not always finite.
The significance of viable prefixes is the following.
We can assign states of a deterministic finite automaton to viable
prefixes,
and we can assign state transitions to the symbols of the grammar.
From the initial state we go to a state along the symbols of a viable
prefix.
Using this property, we will give a method to create an automaton that
executes the task of parsing.
\begin{defi}
If $A \rightarrow \alpha \beta$ is a rule of a $G'$ grammar,
then let
\index{LR(1)!item}%
$$\left[ A \rightarrow \alpha .\beta , a \right],
\ \ ( a \in T \cup \{ \# \} ) \koz ,$$
\noindent be an
\ki{LR{\boldmath{$(1)$}}-item,} where
$A \rightarrow \alpha .\beta$ is the \ki{core} of the \textit{LR}$(1)$-item,
\index{LR(1)!item!core}%
and $a$ is the \ki{lookahead symbol} of the \textit{LR}$(1)$-item.
\index{LR(1)!item!lookahead symbol}%
\end{defi}
\begin{figure}[t!]%
\begin{picture}(143,55)(0,60)
\put(160,93){\fbox{\makebox(10,6){$A$}}}
\thicklines
\put(122,76){\line(6,2){37}}
\put(215,76){\line(-6,2){37}}
\put(120,65){\fbox{\makebox(31,6){$\alpha$}}}
\put(162,65){\mbox{.}}
\put(167,65){\fbox{\makebox(43,6){$\beta$}}}
\put(220,65){\fbox{\makebox(10,6){$a$}}}
\end{picture}
\caption{The $\left[ A \rightarrow \alpha . \beta, a \right]$ {\textit
{LR$(1)$}}
-item.}
\label{fig-lr1elem}
\end{figure}
The lookahead symbol plays a role only in reductions, i.e. when the item
has the form
$\left[ A \rightarrow \alpha ., a \right]$: we can execute a
reduction only if the symbol $a$ follows the handle $\alpha$.
\begin{defi}
The \textit{LR}$(1)$-item
$\left[ A \rightarrow \alpha .\beta , a \right]$
is \ki{valid}
\index{valid!LR(1)-item}%
\index{LR(1)!item!valid}%
for the viable prefix $\gamma \alpha$ if
$$S' \stackrel{*}\Longrightarrow \gamma Ax
\Longrightarrow \gamma \alpha \beta x \
(\gamma \in(N \cup T)^*,\ x \in T^*) \koz ,$$
\noindent and $a$ is the first symbol of $x$, or
$a = \#$ if
$x = \varepsilon$.
\end{defi}
\begin{pelda}{}
\label{pld-lr1gr2}%
Let
$G' = ( \{ S',S,A \}, \{ a,b \}, P', S' )$ be a grammar and let
the derivation rules be as follows.

$(0) \ S' \rightarrow S$\\
$(1) \ S \rightarrow AA$\\
$(2) \ A \rightarrow aA$\\
$(3) \ A \rightarrow b$
Using these rules, we can derive
$S' \stackrel{*}\Longrightarrow aaAab \Longrightarrow aaaAab$.
Here $aaa$ is a viable prefix,
and $\left[ A \rightarrow a.A, a \right] $
is valid for this viable prefix.
Similarly,
$S' \stackrel{*}\Longrightarrow AaA \Longrightarrow AaaA$,
and the {\textit{LR$(1)$}}-item
$\left[ A \rightarrow a.A, \# \right]$ is valid for viable prefix
$Aaa$.
\end{pelda}
To create an
\textit{LR}$(1)$ parser, we construct
the canonical sets of
\textit{LR}$(1)$-items. To achieve this we have to define the
\textit{closure}
\index{closure}%
and \textit{read}
\index{read}%
functions.
\begin{defi}{}
Let the set ${\cal{H}}$ be a set of
\textit{LR$(1)$}-items for a given grammar. The set
\ki{closure({\boldmath{$\cal{H}$}})} consists of the following
\textit{LR$(1)$}-items:
\begin{enumerate}
\item every element of the set ${\cal{H}}$ is an element of the set
$\textit{closure}({\cal{H}})$,
\item if $\left[ A \rightarrow \alpha .B\beta , a \right] \in
\textit{closure}({\cal{H}})$, and $B \rightarrow \gamma$
is a derivation rule of the grammar, then
$\left[ B \rightarrow .\gamma , b \right] \in
\textit{closure}({\cal{H}})$ for all
$b \in {\textit{First}}(\beta a)$,
\item the set $\textit{closure}({\cal{H}})$ is extended using
step 2
until no more items can be added to it.
\end{enumerate}
\end{defi}
By the definitions, if the \textit{LR$(1)$}-item
$\left[ A \rightarrow \alpha .B\beta , a \right]$
is valid for the viable prefix
$\delta \alpha$, then the \textit{LR$(1)$}-item
$\left[ B \rightarrow .\gamma , b \right]$
is valid for the same viable prefix in the case of
$b \in {\textit{First}}(\beta a)$.
(Figure \ref{fig-lr1clos}).
It is obvious that the function
\textit{closure} creates all \textit{LR$(1)$}-items which are valid for
the viable prefix $\delta\alpha$.
\begin{figure}[t!]
\begin{picture}(260,115)(5,-5)
\put(160,93){\fbox{\makebox(10,6){$S'$}}}
\put(122,76){\line(6,2){37}}
\put(215,76){\line(-6,2){37}}
\put(120,65){\fbox{\makebox(31,6){$\delta$}}}
\put(160,65){\fbox{\makebox(10,6){$A$}}}
\put(179,65){\fbox{\makebox(10,6){$a$}}}
\put(198,65){\fbox{\makebox(14,6){$x$}}}
\put(106,46){\line(2,2){14}}
\put(251,46){\line(-5,2){33}}
\put(210,46){\line(-5,2){33}}
\put(146,46){\line(2,2){14}}
\put(105,35){\fbox{\makebox(31,6){$\delta$}}}
\put(145,35){\fbox{\makebox(15,6){$\alpha$}}}
\put(169,35){\mbox{.}}
\put(173,35){\fbox{\makebox(10,6){$B$}}}
\put(192,35){\fbox{\makebox(12,6){$\beta$}}}
\put(213,35){\fbox{\makebox(10,6){$a$}}}
\put(232,35){\fbox{\makebox(14,6){$x$}}}
\put(174,16){\line(0,1){12}}
\put(198,16){\line(-2,4){6}}
\put(105,5){\fbox{\makebox(31,6){$\delta$}}}
\put(145,5){\fbox{\makebox(15,6){$\alpha$}}}
\put(169,5){\mbox{.}}
\put(173,5){\fbox{\makebox(20,6){$\gamma$}}}
\put(202,5){\fbox{\makebox(29,6){$\beta a$}}}
\put(240,5){\fbox{\makebox(14,6){$x$}}}
%
\end{picture}
\caption{The function
$\textit{closure}(\left[ A \rightarrow \alpha . B\beta, a \right])$.}
\label{fig-lr1clos}
\end{figure}
We can define the function
$\textit{closure}({\cal{H}})$, i.e. the closure of set
${\cal{H}}$ by the following algorithm. The result of this algorithm is
the set ${\cal{K}}$.
\begin{alg}{Closure-Set-of-Items$({\cal{H}})$} \inddef{Closure-Set-of-Items@\textsc{Closure-Set-of-Items}}
1 \' ${\cal{K}} \leftarrow \emptyset$ \\
2 \' \key{for} \= all LR(1)-items $E \in {\cal{H}}$\\
3 \' \> \key{do} ${\cal{K}} \leftarrow {\cal{K}}\ \cup\ $ \textsc{Closure-Item}$(E)$ \\
4 \' \key{return} ${\cal{K}}$
\end{alg}
\begin{algN}{Closure-Item$(E)$}\inddef{Closure-Item@\textsc{Closure-Item}}
1 \' ${\cal{K}}_E \leftarrow \{E\}$ \\
2 \' \key{if} \= the LR(1)-item $E$ has form $[A\rightarrow\alpha.B\beta, a]$ \\
3 \' \> \key{then} \= $I\leftarrow \emptyset$ \\
4 \' \> \> $J\leftarrow {\cal{K}}_E$
\algnewpage
5 \' \> \> \key{repeat} \= \\
6 \' \> \> \> \key{for} \= all LR(1)-items $\in J$ which have form
$[C\rightarrow\gamma. D\delta, b]$ \\
7 \' \> \> \> \> \key{do} \key{for} \= all rules $D\rightarrow \eta
\in P$ \\
8 \' \> \> \> \> \> \key{do} \key{for} \= all symbols
$c \in$ \textsc{First}$(\delta b)$ \\
9 \' \> \> \> \> \> \> \key{do} $I \leftarrow I\ \cup\ (\{[D\rightarrow .\eta, c]\} \setminus {\cal{K}}_E)$ \\
10 \' \> \> \> $J \leftarrow I$\\
11 \' \> \> \> \key{if} \= $I \neq \emptyset$ \\
12 \' \> \> \> \> \key{then} \= ${\cal{K}}_E
\leftarrow {\cal{K}}_E \cup I$ \\
13 \' \> \> \> \> \> $I \leftarrow \emptyset$ \\
14 \' \> \> \key{until} $J = \emptyset$\\
15 \' \key{return} ${\cal{K}}_E$
\end{algN}
The algorithm \textsc{Closure-Item} creates ${\cal{K}}_E$,
the closure of the item $E$.
If, in the argument $E$, the ``point'' is followed by a terminal symbol,
then the result
is this item only (line 1). If in $E$ the ``point'' is followed by a
nonterminal symbol $B$, then
we can create new items from every rule having the symbol $B$ on
its left side (line 9). We have to check this condition for all new items, too; the
\key{repeat} cycle is in lines 5--14. These steps are executed until no more items can be added
(line 14). The set $J$ contains the items to be checked, the set $I$ contains the
new items. We can find the operation $J \leftarrow I$ in line 10.
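To make the closure computation concrete, it can be sketched in Python for the grammar of Example \ref{pld-lr1gr2} ($S' \rightarrow S$, $S \rightarrow AA$, $A \rightarrow aA \mid b$). The tuple representation of the items and the helper \texttt{first\_seq} are our own choices, not part of the pseudocode; since this grammar has no $\varepsilon$-rules, \textit{First} of a sequence is \textit{First} of its first symbol.

```python
# A minimal sketch of the closure function, assuming LR(1)-items are
# represented as tuples (lhs, rhs, dot position, lookahead symbol).
GRAMMAR = {
    "S'": [("S",)],
    "S": [("A", "A")],
    "A": [("a", "A"), ("b",)],
}
NONTERMS = set(GRAMMAR)

def first_seq(seq):
    # First of a symbol sequence; valid here because the grammar
    # has no epsilon-rules and no left recursion.
    sym = seq[0]
    if sym not in NONTERMS:
        return {sym}
    out = set()
    for rhs in GRAMMAR[sym]:
        out |= first_seq(rhs)
    return out

def closure(items):
    # Step 2 of the definition, repeated until no new item appears.
    result = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot, la) in list(result):
            if dot < len(rhs) and rhs[dot] in NONTERMS:
                for b in first_seq(rhs[dot + 1:] + (la,)):
                    for gamma in GRAMMAR[rhs[dot]]:
                        item = (rhs[dot], gamma, 0, b)
                        if item not in result:
                            result.add(item)
                            changed = True
    return result
```

For the item $[S' \rightarrow .S, \#]$ this sketch yields six items, in accordance with the set ${\cal{H}}_0$ computed later, where $[A \rightarrow .aA, a/b]$ abbreviates two items.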
\begin{defi}
Let $\cal{H}$ be a set of \textit{LR$(1)$}-items for the grammar $G$.
Then the set \ki{read({\boldmath{${\cal{H}},X$}})}\ ($X \in (N \cup T)$)
consists of the following \textit{LR$(1)$}-items.
\begin{enumerate}
\item if $\left[ A \rightarrow \alpha .X\beta , a \right] \in
{\cal{H}}$, then all items of the set
$\textit{closure}( \left[ A \rightarrow \alpha X .\beta, a \right] )$
are in $\textit{read}({\cal{H}},X)$,
\item the set $\textit{read}({\cal{H}},X)$ is extended using step 1
until no more items can be added to it.
\end{enumerate}
\end{defi}
The function $\textit{read}({\cal{H}},X)$ ``reads symbol $X$'' in the items
of
${\cal{H}}$; after this operation the ``point'' in the items
moves to the right side of $X$.
If the set ${\cal{H}}$ contains the valid \textit{LR$(1)$}-items for
the viable prefix $\gamma$ then the set
$\textit{read}({\cal{H}},X)$ contains the
valid \textit{LR$(1)$}-items for the
viable prefix $\gamma X$.
The algorithm \textsc{Read-Set} executes the function
\textit{read}. The result is the set
${\cal{K}}$.
\begin{alg}{Read-Set$({\cal{H}},Y)$}\inddef{\textsc{Read-Set}}
1 \' ${\cal{K}} \leftarrow \emptyset$ \\
2 \' \key{for} \= all $E \in {\cal{H}}$ \\
3 \' \> \key{do} ${\cal{K}} \leftarrow {\cal{K}}\ \cup\ $
\textsc{Read-item}$(E,Y)$ \\
4 \' \key{return} ${\cal{K}}$
\end{alg}
\begin{alg}{Read-Item$(E,Y)$}\inddef{\textsc{Read-Item}}
1 \' \key{if} \= $E = [A\rightarrow\alpha. X\beta, a]$ and $X = Y$ \\
2 \' \> \key{then} \= ${\cal{K}}_{E,Y} \leftarrow$
\textsc{Closure-Item}$([A\rightarrow\alpha X. \beta, a])$\\
3 \' \> \key{else} \> ${\cal{K}}_{E,Y} \leftarrow \emptyset$ \\
4 \' \key{return} ${\cal{K}}_{E,Y}$
\end{alg}
Using these algorithms we can create all items of the state which the
automaton reaches after reading the symbol $Y$.
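The dot-advancing core of the \textit{read} function can be sketched as follows; the items are again tuples $(A, \textit{rhs}, \textit{dot}, a)$, which is our own representation, and the closure step that follows in \textsc{Read-Item} is omitted here.

```python
# Sketch of the dot-advancing step of the read function; in Read-Item
# the resulting items are then closed with Closure-Item.
def advance(items, X):
    """Move the dot over X in every item of the form [A -> alpha.X beta, a]."""
    return {(lhs, rhs, dot + 1, la)
            for (lhs, rhs, dot, la) in items
            if dot < len(rhs) and rhs[dot] == X}

# Two items of H_0 for the example grammar with rules S' -> S and S -> AA.
items = {("S'", ("S",), 0, "#"), ("S", ("A", "A"), 0, "#")}
```

Reading $S$ in this set moves the point only in the first item; reading a symbol that follows no point yields the empty set.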
Now we introduce the following notation for \textit{LR$(1)$}-items,
to give shorter descriptions. Let
$$\left[ A \rightarrow \alpha .X\beta , a/b \right]$$
\noindent be a notation for items
$$\left[ A \rightarrow \alpha .X\beta , a \right]\ \textnormal{and}\
\left[ A \rightarrow \alpha .X\beta , b \right].$$
\begin{pelda}{}
The \textit{LR$(1)$}-item
$\left[ S' \rightarrow .S, \# \right]$ is an item of the grammar
in the example \ref{pld-lr1gr2}. For this item
$$\textit{closure}( \left[ S' \rightarrow .S, \# \right] ) =
\{ \left[ S' \rightarrow .S, \# \right],
\left[ S \rightarrow .AA, \# \right],
\left[ A \rightarrow .aA, a/b \right],
\left[ A \rightarrow .b, a/b \right] \} \koz .$$
\end{pelda}
We can create the \textit{canonical sets of {LR}$(1)$-items} or shortly the
\textit{LR}$(1)$-\textit{canonical sets}
\index{LR(1)!canonical set}%
\index{canonical!set!LR(1)}%
with the following method.
\begin{defi}
\label{def-lr1-kan}%
\textit{Canonical sets of
{LR}$(1)$-items} ${\cal{H}}_0,{\cal{H}}_1,\ldots,{\cal{H}}_m$
are the following.
\begin{itemize}
\item ${\cal{H}}_{0} = \textit{closure}( \left[ S'
\rightarrow .S, \# \right] )$,
\item
\noindent Create the set
$\textit{read}( {\cal{H}}_0 ,X)$
for a symbol $X$. If this set is not empty and not equal to the
canonical set ${\cal{H}}_0$, then it is the next canonical set
${\cal{H}}_1$.
Repeat this operation for all possible terminal and nonterminal symbols
$X$.
If we get a nonempty set which is not equal to any of the previous sets,
then this set is a new canonical set, and its index is greater by one than
the maximal index of the previously generated canonical sets.
\item Repeat the above operation for all
previously generated canonical sets and for all symbols of the grammar
until no new canonical sets can be created.
\end{itemize}
The sets
$$ {\cal{H}}_0, {\cal{H}}_1, \ldots, {\cal{H}}_m $$
\noindent are the
\textit{canonical sets of
{LR}$(1)$-items} of the grammar $G$.
\end{defi}
The number of \textit{LR}$(1)$-items for a grammar is
finite,
hence the above method terminates in finite time.
The next algorithm creates canonical sets of the grammar $G$.
\begin{algN}{Create-Canonical-Sets$(G)$}\inddef{\textsc{Create-Canonical-Sets}}
1 \' $i \leftarrow 0$ \\
2 \' ${\cal{H}}_i \leftarrow$ \textsc{Closure-Item}$([S' \rightarrow . S, \#])$ \\
3 \' $I \leftarrow \{ {\cal{H}}_i \} , K \leftarrow \{ {\cal{H}}_i \}$ \\
4 \' \key{repeat}\=\\
5 \' \> $L \leftarrow K$\\
6 \' \> \key{for} \= all $M \in I$ \\
7 \' \> \> \key{do} \= $I \leftarrow I \setminus \{ M \}$ \\
8 \' \> \> \> \key{for} \= all $X \in T \cup N$ \\
9 \' \> \> \> \> \key{do} \=$J \leftarrow$
\textsc{Closure-Set-of-Items}(\textsc{Read-Set}$(M,X))$ \\
10 \' \> \> \> \> \>\key{if} \= $J \neq \emptyset$ and $J \not\in K$ \\
11 \' \> \> \>\> \> \>\key{then} \= $i \leftarrow i+1$ \\
12 \' \> \> \> \> \> \> \> ${\cal{H}}_i \leftarrow J$ \\
13 \' \> \> \> \> \> \> \> $K \leftarrow K\ \cup\ \{ {\cal{H}}_i \}$ \\
14 \' \> \> \> \> \> \> \> $I \leftarrow I\ \cup\ \{ {\cal{H}}_i \}$ \\
15 \' \key{until} $K = L$ \\
16 \' \key{return} $K$
\end{algN}
The result of the algorithm is $K$.
The first canonical set is the set ${\cal{H}}_0$, created in line 2.
Further canonical sets are created by the functions
\textsc{Closure-Set-of-Items}(\textsc{Read-Set}) in line 9.
Line 10 checks whether the new set differs from the
previous sets, and if the answer is true then
this set is added as a new set in lines 11--14.
The \key{for} cycle in lines 6--14 guarantees that
these operations are executed for all previously generated sets.
The \key{repeat} cycle in lines 4--15 generates new canonical sets as
long as it is possible.
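The whole construction (closure, \textit{read}, and the worklist of \textsc{Create-Canonical-Sets}) can be sketched in Python for the grammar of Example \ref{pld-lr1gr2}. The item representation is our own, and the \textit{First} computation relies on the grammar having no $\varepsilon$-rules.

```python
from collections import deque

GRAMMAR = {"S'": [("S",)], "S": [("A", "A")], "A": [("a", "A"), ("b",)]}
NONTERMS = set(GRAMMAR)
TERMS = {"a", "b"}

def first_seq(seq):
    # First of a sequence; the grammar has no epsilon-rules.
    sym = seq[0]
    if sym not in NONTERMS:
        return {sym}
    return set().union(*(first_seq(rhs) for rhs in GRAMMAR[sym]))

def closure(items):
    res = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot, la) in list(res):
            if dot < len(rhs) and rhs[dot] in NONTERMS:
                for b in first_seq(rhs[dot + 1:] + (la,)):
                    for gamma in GRAMMAR[rhs[dot]]:
                        if (rhs[dot], gamma, 0, b) not in res:
                            res.add((rhs[dot], gamma, 0, b))
                            changed = True
    return frozenset(res)

def read(items, X):
    moved = {(l, r, d + 1, a)
             for (l, r, d, a) in items if d < len(r) and r[d] == X}
    return closure(moved) if moved else frozenset()

def canonical_sets():
    h0 = closure({("S'", ("S",), 0, "#")})
    sets, work = [h0], deque([h0])
    while work:                        # the repeat cycle of the algorithm
        m = work.popleft()
        for X in sorted(TERMS | NONTERMS):
            j = read(m, X)
            if j and j not in sets:    # nonempty and new canonical set
                sets.append(j)
                work.append(j)
    return sets
```

For this grammar the sketch produces ten canonical sets, in accordance with ${\cal{H}}_0, \ldots, {\cal{H}}_9$ of the next example (possibly in a different order).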
\begin{pelda}{}
\label{pld-lr1gr2-kan}%
The canonical sets of \textit{LR}$(1)$-items for the Example
\ref{pld-lr1gr2} are as follows.
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
\noindent$\begin{array}{lcllll}
{\cal{H}}_0 && &= \textit{closure}( \left[ S' \rightarrow .S, \# \right] )
&=&\{%
\left[ S' \rightarrow .S, \# \right],
\left[ S \rightarrow .AA, \# \right], \\
& & & & &
\left[ A \rightarrow .aA, a/b \right],
\left[ A \rightarrow .b, a/b \right] \} \\[1mm]
{\cal{H}}_1 & =& \textit{read}({\cal{H}}_{0},S)
&=\textit{closure}( \left[ S' \rightarrow S., \# \right] )
&=&\{%
\left[ S' \rightarrow S., \# \right] \} \\[1mm]
{\cal{H}}_2 & =& \textit{read}({\cal{H}}_{0},A)
&=\textit{closure}( \left[ S \rightarrow A.A, \# \right] )
&=&\{%
\left[ S \rightarrow A.A, \# \right],
\left[ A \rightarrow .aA, \# \right], \\
& & & &&
\left[ A \rightarrow .b, \# \right] \} \\[1mm]
{\cal{H}}_3 & =& \textit{read}({\cal{H}}_{0},a)
&=\textit{closure}( \left[ A \rightarrow a.A, a/b \right] )
&=&\{%
\left[ A \rightarrow a.A, a/b \right],
\left[ A \rightarrow .aA, a/b \right], \\
& & & &&
\left[ A \rightarrow .b, a/b \right] \} \\[1mm]
{\cal{H}}_4 & =& \textit{read}({\cal{H}}_{0},b)
&=\textit{closure}( \left[ A \rightarrow b., a/b \right] )
&=&\{%
\left[ A \rightarrow b., a/b \right] \} \\[1mm]
{\cal{H}}_5 & =& \textit{read}({\cal{H}}_{2},A)
&=\textit{closure}( \left[ S \rightarrow AA., \# \right] )
&=&\{%
\left[ S \rightarrow AA., \# \right] \} \\[1mm]
{\cal{H}}_6 & =& \textit{read}({\cal{H}}_{2},a)
&=\textit{closure}( \left[ A \rightarrow a.A, \# \right] )
&=&\{%
\left[ A \rightarrow a.A, \# \right],
\left[ A \rightarrow .aA, \# \right], \\
& & && &
\left[ A \rightarrow .b, \# \right] \} \\[1mm]
\end{array}$
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
\noindent$\begin{array}{lcllll}
{\cal{H}}_7 & =& \textit{read}({\cal{H}}_{2},b)
&=\textit{closure}( \left[ A \rightarrow b., \# \right] )
&=&\{%
\left[ A \rightarrow b., \# \right] \} \\[1mm]
{\cal{H}}_8 & =& \textit{read}({\cal{H}}_{3},A)
&=\textit{closure}( \left[ A \rightarrow aA., a/b \right] )
&=&\{%
\left[ A \rightarrow aA., a/b \right] \} \\[1mm]
& & \textit{read}({\cal{H}}_{3}, a ) &={\cal{H}}_3 \\[1mm]
& & \textit{read}({\cal{H}}_{3}, b ) &={\cal{H}}_4 \\[1mm]
{\cal{H}}_9 & =& \textit{read}({\cal{H}}_{6},A)
&=\textit{closure}( \left[ A \rightarrow aA., \# \right] )
&=&\{%
\left[ A \rightarrow aA., \# \right] \} \\[1mm]
& & \textit{read}({\cal{H}}_{6}, a ) &={\cal{H}}_6 \\[1mm]
& & \textit{read}({\cal{H}}_{6}, b ) &={\cal{H}}_7
\end{array}$
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
\noindent The automaton of the parser is in Figure \ref{fig-lr1gr2-aut}.
\end{pelda}
\begin{figure}[t!]
\hspace*{\fill}
\xymatrix{
& & *++[o][F=]{\;1\;} & *++[o][F=]{\;5\;} \\
\ar[r] & *++[o][F]{\;0\;} \ar[r]^A \ar[rdd]^a \ar@/_{5mm}/[rddd]_b \ar
[ru]^S
& *++[o][F]{\;2\;} \ar[ru]^A \ar[r]^a \ar[rd]^b
& *++[o][F]{\;6\;} \ar@(ur,u)[]_a \ar[r]^A \ar[d]^b
& *++[o][F=]{\;9\;} \\
& & & *++[o][F=]{\;7\;} \\
& & *++[o][F]{\;3\;} \ar@(ur,u)[]_a \ar[r]^A \ar[d]^b & *++[o][F=]{\;8
\;} \\
& & *++[o][F=]{\;4\;} }
\hspace*{\fill}
\caption{The automaton of the Example \ref{pld-lr1gr2}.}
\label{fig-lr1gr2-aut}
\end{figure}
\vspace{-2mm}
%%%
\subsubsection[ \textit{LR}$(1)$ parser.]
{ \textit{LR}{\boldmath{$(1)$}} parser}
\index{LR(1)!parser}%
If the canonical sets of \textit{LR}$(1)$-items
$${\cal{H}}_0,{\cal{H}}_1,\ldots,{\cal{H}}_m$$
\noindent have been created, then we assign the state $k$ of an automaton
to the
set ${\cal{H}}_{k}$.
The relation between the states of the automaton and
the canonical sets of \textit{LR}$(1)$-items is stated by the next
theorem.
This theorem is the \ki{``great'' theorem of the {LR{\boldmath{$(1)
$}}}-parsing.}
\begin{tetel}
The set of the \textit{LR}$(1)$-items being valid for a viable prefix
$\gamma$ can be assigned to the automaton-state $k$ such that
there is a
path from the initial state to state $k$ labeled by $\gamma$.
\end{tetel}
This theorem states that we can create the automaton of the parser
using canonical sets. Now we give a method to create this \textit{LR}$(1)$ parser
from canonical sets of \textit{LR}$(1)$-items.
The deterministic finite automaton can be described by a table,
which is called the
\textit{LR}$(1)$ \textit{parsing table}.
\index{LR(1)!parsing!table}%
\index{parsing!table}%
\index{table!parsing}%
The rows of the table are assigned to the states of the automaton.
The parsing table has two parts. The first is the
\textit{action} table.
\index{action table}%
Since the operations of the parser are determined by the symbols of the
analysed text, the \textit{action} table is divided into columns
labeled by the terminal symbols.
The \textit{action} table contains information about the action to be
performed at the given state for the given symbol. These actions
can be
shifts or reductions.
The sign of a shift operation is $sj$, where $j$ is the next state.
The sign of the reduction is $ri$, where $i$ is the serial number of
the applied rule.
The reduction by the rule having the serial number zero means the
termination of the parsing and that the parsed text is syntactically
correct; for this reason we call this operation \textit{accept}.
\index{accept}%
The second part of the parsing table is the \textit{goto} table.
\index{goto table}%
This table contains information about the state transitions caused by
nonterminal symbols. (Transitions on terminal symbols are in the
\textit{action} table.)
Let $\{ 0,1,\ldots,m \}$ be the set of states of the automaton.
The $i$-th row of the table is filled in from the
\textit{LR}$(1)$-items of
canonical set ${\cal{H}}_{i}$.
The $i$-th row of the \textit{action} table:
\begin{itemize}
\item if $\left[ A \rightarrow \alpha .a\beta, b \right] \in {\cal{H}}_{i}
$
and $\textit{read}({\cal{H}}_{i},a) = {\cal{H}}_{j}$ then
$\textit{action}[i,a] = sj$,
\item if $\left[ A \rightarrow \alpha ., a \right] \in {\cal{H}}_{i}$ and
$A \neq S'$,
then $\textit{action}[i,a] = rl$,
where $A \rightarrow \alpha$ is the $l$-th rule of the grammar,
\item if $\left[ S' \rightarrow S., \# \right] \in {\cal{H}}_{i}$,
then
$\textit{action}[i,\#] = \textit{accept}$.
\end{itemize}
\index{accept}%
The method of filling in the \textit{goto} table:
\begin{itemize}
\item if $\textit{read}({\cal{H}}_{i},A) = {\cal{H}}_{j}$, then
$\textit{goto}[i,A] = j$.
\end{itemize}
\begin{itemize}
\item In both tables we have to write the text \textit{error} into the
empty positions.
\index{error}%
\end{itemize}
These \textit{action} and \textit{goto} tables are called
\textit{canonical parsing tables}.
\index{LR(1)!parsing!table}%
\index{canonical!parsing table}%
\index{parsing!table}%
\index{table!parsing}%
\begin{tetel}
\label{tetel-lre-tabl}%
The augmented grammar $G'$ is
\textit{LR}$(1)$ grammar iff we can fill in the parsing tables
created for this grammar without conflicts.
\end{tetel}
We can fill in the parsing tables with the next algorithm.
\begin{algN}{Fill-in-LR(1)-Table$(G)$}\inddef{\textsc{Fill-in-LR(1)-Table}}
1 \' \key{for} \= all LR(1) canonical sets ${\cal{H}}_i$ \\
2 \' \> \key{do} \= \key{for} \= all LR(1)-items of ${\cal{H}}_i$ \\
3 \' \> \> \> \key{if} \= $[ A \rightarrow \alpha .a\beta, b]
\in {\cal{H}}_{i}$
and $\textit{read}({\cal{H}}_{i},a) = {\cal{H}}_{j}$ \\
4 \' \> \> \> \> \key{then} \textit{action}$[i,a] = sj$\\
5 \' \> \> \> \key{if} \= $[ A \rightarrow \alpha ., a ]
\in {\cal{H}}_{i}$
and $A \neq S'$ and
$A \rightarrow \alpha$ the $l$-th rule \\
6 \' \> \> \> \> \key{then} $\textit{action}[i,a] = rl$ \\
7 \' \> \> \> \key{if} \= $\left[ S' \rightarrow S., \# \right] \in {\cal
{H}}_{i}$ \\
8 \' \> \> \> \> \key{then} $\textit{action}[i,\#] = \textit{accept}$
\\
9 \' \> \> \> \key{if} \= $\textit{read}({\cal{H}}_{i},A) = {\cal{H}}_
{j}$ \\
10 \' \> \> \> \> \key{then} $\textit{goto}[i,A] = j$ \\
11 \' \> \key{for} \= all $a \in ( T \cup \{\#\})$ \\
12 \' \> \> \key{do} \= \key{if} \= $\textit{action}[i,a] =$ \idez
{empty} \\
13 \' \> \> \> \> \key{then} $\textit{action}[i,a] \leftarrow \textit
{error}$ \\
14 \' \> \key{for} \= all $X \in N$\\
15 \' \> \> \key{do} \= \key{if} \= $\textit{goto}[i,X] =$ \idez
{empty} \\
16 \' \> \> \> \key{then} $\textit{goto}[i,X] \leftarrow \textit{error}$
\\
17 \' \key{return} \textit{action}, \textit{goto}
\end{algN}
We fill in the tables row by row. In lines
3--8 of the algorithm we fill in the \textit{action} table, in lines 9--10
we fill in the \textit{goto} table.
In lines 11--16 we write \textit{error} into the positions which remained empty.
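The filling of one row of the \textit{action} table can be sketched as follows. The canonical sets ${\cal{H}}_0$ and ${\cal{H}}_4$ of the example grammar are written out literally, and the map \texttt{read\_targets} (terminal $\mapsto$ index $j$ with $\textit{read}({\cal{H}}_i,a) = {\cal{H}}_j$) is our own simplification of the pseudocode.

```python
# Rules in the book's numbering: (0) S'->S, (1) S->AA, (2) A->aA, (3) A->b.
RULES = [("S'", ("S",)), ("S", ("A", "A")), ("A", ("a", "A")), ("A", ("b",))]

def action_row(items, read_targets):
    """Fill one row of the action table from a canonical set.
    read_targets maps a terminal a to the index j with read(H_i, a) = H_j."""
    row = {}
    for (lhs, rhs, dot, la) in items:
        if dot < len(rhs) and rhs[dot] in read_targets:
            row[rhs[dot]] = "s%d" % read_targets[rhs[dot]]      # shift
        elif dot == len(rhs):
            # complete item: accept for the start rule, reduction otherwise
            row[la] = "accept" if lhs == "S'" else "r%d" % RULES.index((lhs, rhs))
    return row

# H_0 of the example: read(H_0, a) = H_3 and read(H_0, b) = H_4.
h0 = {("S'", ("S",), 0, "#"), ("S", ("A", "A"), 0, "#"),
      ("A", ("a", "A"), 0, "a"), ("A", ("a", "A"), 0, "b"),
      ("A", ("b",), 0, "a"), ("A", ("b",), 0, "b")}
# H_4 of the example: reduce by rule 3 on lookaheads a and b.
h4 = {("A", ("b",), 1, "a"), ("A", ("b",), 1, "b")}
```

These two rows reproduce rows 0 and 4 of the parsing table shown below; the remaining positions stay empty, i.e. they are \textit{error} entries.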
\begin{figure}[t!]
\begin{picture}(238,110)(-25,-20)
\put(132,75){\fbox{\makebox(20,5){$x$}}}
\put(160,75){\fbox{\makebox(6,5){$a$}}}
\put(174,75){\fbox{\makebox(40,5){\ \ $y$\hfill}}}
\put(222,75){\fbox{\makebox(6,5){$\#$}}}
\put(128,35){\fbox{\makebox(60,15){\textit{parser}}}}
\put(80,-11){\line(0,1){80}}
\put(95,-11){\line(0,1){80}}
\put(80,-11){\line(1,0){15}}
\put(80,4){\line(1,0){15}}
\put(80,52){\line(1,0){15}}
\put(80,37){\line(1,0){15}}
%\put(88,55){\makebox(0,0){$X$}}
\put(88,44){\makebox(0,0){$k$}}
\put(88,19){\makebox(0,0){$ $}}
\put(87,-5){\makebox(0,0){$0$}}
%
\put(65,-11){\line(0,1){80}}
\put(65,-11){\line(1,0){15}}
\put(65,4){\line(1,0){15}}
\put(65,52){\line(1,0){15}}
\put(65,37){\line(1,0){15}}
%\put(88,55){\makebox(0,0){$X$}}
\put(73,44){\makebox(0,0){$X$}}
\put(73,19){\makebox(0,0){$\alpha$}}
\put(72,-5){\makebox(0,0){$\#$}}
%
\put(127,45){\vector(-1,0){30}}
\put(165,55){\vector(0,1){15}}
%
\end{picture}
\caption{The structure of the \textit{LR}$(1)$ parser.}
\label{fig-lrelemz}
\end{figure}
Now we deal with the steps of the \textit{LR}$(1)$ parsing
(Figure \ref{fig-lrelemz}).
The \textit{state of the parsing}
\index{parser!state}%
is described by configurations. A configuration
of the \textit{LR}$(1)$ parser consists of
two parts: the first is the stack and the second is the unread
input text.
The stack of the parser is a double stack: we write or read two data
elements with the operations
\textit{push} or \textit{pop}.
The stack consists of pairs of symbols; the first element of each pair
is a terminal or nonterminal symbol, and the second element
is the serial number of a state of the automaton.
The initial content of the stack is
$\#0$.
The \textit{start configuration} is
\index{parser!start configuration}%
$(\#0, z\#)$,
where $z$ denotes the unread text.
The parsing is successful if the parser moves
to the \textit{final state}.
\index{parser!final state}%
In the final state the content of the stack is $\#0$,
and the parser is at the end of the text.
Suppose that the parser is in the configuration
$(\#0 \ldots Y_k i_k, ay\# )$. The next move of the parser
is determined by
$\textit{action}[i_k,a]$.
State transitions are the following.
\begin{itemize}
\item If $\textit{action}[i_{k},a] = sl$, i.e. the parser executes a
shift, then the actual symbol $a$ and the new state $l$ are written into
the
stack. That is, the new configuration is
$$(\#0 \ldots Y_k i_k, ay\#) \rightarrow (\#0 \ldots Y_k i_k a i_l, y\#)
\koz . $$
\item If $\textit{action}[i_k,a] = rl,$
then we execute a reduction by the $l$-th rule
$A \rightarrow \alpha$. In this step we delete $| \alpha |$ rows,
i.e. we delete $2 | \alpha |$
elements from the stack, and then we determine the new state
using the \textit{goto} table. If after the deletion there is the state
$i_{k-r}$ at the top of the stack, then the new state is
$\textit{goto}[i_{k-r},A] =
i_l$.
$$(\#0\ldots Y_{k-r} i_{k-r} Y_{k-r+1} i_{k-r+1} \ldots Y_k i_k, y\#)
\rightarrow
(\#0\ldots Y_{k-r} i_{k-r} A i_l, y\#) \koz ,$$
\noindent where $| \alpha | = r$.
\item If $\textit{action}[i_{k},a] = \textit{accept}$,
then the parsing is completed, and the analysed text was correct.
\item If $\textit{action}[i_{k},a] = \textit{error}$,
then the parsing terminates, and a \textit{syntactic error} was
discovered
at the symbol $a$.
\index{syntactic!error}%
\index{error!syntactic}%
\end{itemize}
The \textit{LR}$(1)$ parser is often named \textit{canonical}
\textit{LR}$(1)$ \textit{parser}.
\index{canonical!parser}%
\index{parser!canonical}%
Denote the \textit{action} and \textit{goto} tables together by $T$.
We can give the following algorithm for the steps of the parser.
\begin{algN}{LR(1)-Parser$(xay\#,T)$}\inddef{\textsc{LR(1)-Parser}}
1 \' $s \leftarrow (\#0, xay\#)$, $s' \leftarrow \textit{parsing}$ \\
2 \' \key{repeat} \= \\
3 \' \> $s =
(\#0\ldots Y_{k-r} i_{k-r} Y_{k-r+1} i_{k-r+1} \ldots Y_k i_k, ay\#)$
\\
4 \' \> \key{if} \=
$\textit{action}[i_{k},a] = sl$ \\
5 \' \> \> \key{then} $s \leftarrow (\#0 \ldots Y_k i_k a i_l, y\#)$ \\
6 \' \> \> \key{else}
\key{if} \= $\textit{action}[i_k,a] = rl$ and
$A \rightarrow \alpha$ is the $l$-th rule and \\
7 \' \> \> \> $| \alpha | = r$ and $\textit{goto}[i_{k-r},A] =
i_l$\\
8 \' \> \> \> \key{then} $s \leftarrow (\#0\ldots Y_{k-r} i_{k-r} A i_l,
ay\#)$ \\
9 \' \>\> \> \key{else} \key{if} \= $\textit{action}[i_{k},a] = \textit
{accept}$ \\
10 \' \> \> \> \> \key{then} \= $ s' \leftarrow \textit{O.K.}$
\\
11 \' \> \> \> \> \key{else} \> $ s' \leftarrow \textit{ERROR}$ \\
12 \' \key{until} $ s' = \textit{O.K.}$ \key{or} $s' = \textit{ERROR}$ \\
13 \' \key{return} $s',s$
\end{algN}
The input parameters of the algorithm are the text $xay\#$ and the table
$T$.
The variable $s'$ indicates the action of the parser. It has value
\textit{parsing} in the intermediate states, and its value is
\textit{O.K.} or
\textit{ERROR} at the final states.
In line 3 we detail the configuration of the parser, which is necessary in
lines 6--8. Using the \textit{action} table, the parser determines its move
from the state $i_k$
at the top of the stack and from the actual symbol $a$.
In lines 4--5 we execute a shift step, in lines 6--8 a reduction.
The algorithm is completed in lines 9--11.
At this moment, if the parser is at the end of the text and executes the
\textit{accept} action, then the text is correct, otherwise a
syntax error was detected. According to this,
the output of the algorithm is
\textit{O.K.} or \textit{ERROR}, and the final configuration is at the
output, too. In the case of error, the first symbol of the second element of the
configuration is the erroneous symbol.
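The driver loop of \textsc{LR(1)-Parser} can be sketched in Python with the tables of the next example hard-coded. The string encoding of the entries ($sj$, $rl$, accept) and the pair representation of the stack are our own choices.

```python
# Action and goto tables of the grammar S' -> S, S -> AA, A -> aA | b,
# transcribed from the parsing table; missing entries mean error.
ACTION = {(0, "a"): "s3", (0, "b"): "s4", (1, "#"): "acc",
          (2, "a"): "s6", (2, "b"): "s7", (3, "a"): "s3", (3, "b"): "s4",
          (4, "a"): "r3", (4, "b"): "r3", (5, "#"): "r1",
          (6, "a"): "s6", (6, "b"): "s7", (7, "#"): "r3",
          (8, "a"): "r2", (8, "b"): "r2", (9, "#"): "r2"}
GOTO = {(0, "S"): 1, (0, "A"): 2, (2, "A"): 5, (3, "A"): 8, (6, "A"): 9}
RULES = {1: ("S", 2), 2: ("A", 2), 3: ("A", 1)}   # rule l: (lhs, |alpha|)

def parse(text):
    stack = [("#", 0)]           # start configuration: the stack contains #0
    pos = 0
    while True:
        state, a = stack[-1][1], text[pos]
        act = ACTION.get((state, a))
        if act is None:                 # empty position: error
            return False
        if act == "acc":                # accept
            return True
        if act.startswith("s"):         # shift: push the symbol and new state
            stack.append((a, int(act[1:])))
            pos += 1
        else:                           # reduction by rule l: pop |alpha| pairs
            lhs, r = RULES[int(act[1:])]
            del stack[-r:]
            stack.append((lhs, GOTO[(stack[-1][1], lhs)]))
```

The call \texttt{parse("abb\#")} accepts, reproducing the configuration sequence of the trace in the example below.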
\begin{pelda}The \textit{action} and \textit{goto} tables of the \textit
{LR}$(1)$ parser
for the grammar of Example \ref{pld-lr1gr2} are as follows.
The empty positions denote \textit{errors}.
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
\begin{center}
$\begin{array}{||c||ccc||cc||}
\hhline{|t:=:t:===:t:==:t|}
\textnormal{state} & \multicolumn{3}{c||}{action} & \multicolumn
{2}
{c||}{goto} \\
%\cline{2-4}\cline{5-6} \\
\hhline{||~||---||--||}
& a & b & \# & S & A \\
\hhline{|:=::===::==:|}
0 & s3 & s4 & & 1 & 2 \\
1 & & & \textit{accept} & & \\
2 & s6 & s7 & & & 5 \\
3 & s3 & s4 & & & 8 \\
4 & r3 & r3 & & & \\
5 & & & r1 & & \\
6 & s6 & s7 & & & 9 \\
7 & & & r3 & & \\
8 & r2 & r2 & & & \\
9 & & & r2 & &\\
\hhline{|b:=:b:===:b:==:b|}
\end{array}$
\end{center}
\end{pelda}
\begin{pelda}
Using the tables of the previous example, analyse the text $abb\#$.
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
$\begin{array}[b]{lrllrl|l}
& & & & & \hspace*{7mm} & \textit{rule} \\
(\#0, & abb\#) & \xrightarrow{s3}\
& (\#0a3, & bb\#) & & \\
& & \xrightarrow{s4} & (\#0a3b4, & b\#) & & \\
& & \xrightarrow{r3} & (\#0a3A8, & b\#) & & A \rightarrow b \\
& & \xrightarrow{r2} & (\#0A2, & b\#) & & A \rightarrow aA \\
& & \xrightarrow{s7} & (\#0A2b7, & \#) & & \\
& & \xrightarrow{r3} & (\#0A2A5, & \#) & & A \rightarrow b \\
& & \xrightarrow{r1} & (\#0S1, & \#) & & S \rightarrow AA\\
& & \xrightarrow{\textit{accept}} & \textit{O.K.}
\end{array}$
The syntax tree of the sentence is in Figure {\ref{fig-lr1fa}}.
\end{pelda}
\begin{figure}[t!]
\hspace*{\fill}
\xymatrix{
& & S' \ar@{-}[d] \\
& & S \ar@{-}[dl] \ar@{-}[dr] \\
& A \ar@{-}[dl] \ar@{-}[dr] & & A \ar@{-}[d] \\
a & & A \ar@{-}[d] & b \\
& & b
}
\hspace*{\fill}
\caption{The syntax tree of the sentence $abb$.}
\label{fig-lr1fa}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%
\subsubsection[\textit{LALR}$(1)$\ parser.]
{\textit{LALR}{\boldmath{$(1)$}}\ parser}
\index{LALR(1)!parser}%
Our goal is to decrease the number of states of the parser, since
not only the size but the speed of the compiler is dependent on the
number of states.
At the same time, we do not wish to cut radically the set of
\textit{LR}$(1)$ grammars
and languages by using our new method.
There are many \textit{LR}$(1)$-items in the canonical sets
that are very similar: their cores are the same, and only their
lookahead symbols differ. If two or more canonical
sets differ only in the lookahead symbols of their items, then we merge
these sets.
If the canonical sets ${\cal{H}}_i$ and ${\cal{H}}_j$ are mergeable,
then let
${\cal{K}}_{\left[ i,j\right]} = {\cal{H}}_i \cup {\cal{H}}_j \koz .$
We execute all possible mergings of \textit{LR}$(1)$ canonical sets.
After renumbering the indexes we obtain sets
${\cal{K}}_0,{\cal{K}}_1,\ldots,{\cal{K}}_n$;
these are the \textit{merged LR$(1)$ canonical sets} or
\textit{{\textit{LALR}$(1)$} canonical sets}.
\index{canonical!set!merged}%
\index{merged canonical set}%
\index{canonical!set!LALR(1)}%
\index{LALR(1)!canonical set}%
We create the \textit{LALR}$(1)$ parser from these merged canonical
sets.
\begin{pelda}{}
\label{pld-lr1gr2-egy}%
Using the \textit{LR}$(1)$ canonical sets
of Example \ref{pld-lr1gr2-kan},
we can merge the following canonical sets:
${\cal{H}}_{3}$ and ${\cal{H}}_{6}$,
${\cal{H}}_{4}$ and ${\cal{H}}_{7}$,
${\cal{H}}_{8}$ and ${\cal{H}}_{9}$.
In Figure \ref{fig-lr1gr2-aut} it can be seen that the
mergeable sets are in equivalent or similar positions in the automaton.
\end{pelda}
There is no difficulty with the function \textit{read} if we use merged
canonical sets.
If
$${\cal{K}} =
{\cal{H}}_1 \cup {\cal{H}}_2 \cup \ldots \cup {\cal{H}}_k \koz ,$$
$$\textit{read}({\cal{H}}_1,X) = {\cal{H}}_1^{'},
\textit{read}({\cal{H}}_2,X) = {\cal{H}}_2^{'},
\ldots,
\textit{read}({\cal{H}}_k,X) = {\cal{H}}_k^{'} \koz , $$
and
$${\cal{K'}} =
{\cal{H}}_1^{'} \cup {\cal{H}}_2^{'} \cup \ldots \cup {\cal{H}}_k^
{'} \koz ,$$
then
$$\textit{read}({\cal{K}},X) = {\cal{K'}} \koz . $$
We can prove this in the following way.
By the definition of function \textit{read},
the set $\textit{read}({\cal{H}},X)$ depends on
the core of \textit{LR}$(1)$-items in ${\cal{H}}$ only,
and it is independent of the lookahead symbols.
Since the cores of \textit{LR}$(1)$-items in the sets
${\cal{H}}_{1},{\cal{H}}_{2},\ldots
,{\cal{H}}_{k}$ are the same, the cores of \textit{LR}$(1)$-items of
$$\textit{read}({\cal{H}}_{1},X), \textit{read}({\cal{H}}_
{2},X),\ldots,%
\textit{read}({\cal{H}}_{k},X)$$
\noindent are also the same. It follows that these sets are mergeable
into a set ${\cal{K'}}$, thus
$\textit{read}({\cal{K}},X) = {\cal{K'}}$.
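The merging itself can be sketched as follows; the items are again tuples $(A, \textit{rhs}, \textit{dot}, a)$ (our own representation), and the sets ${\cal{H}}_3$ and ${\cal{H}}_6$ of Example \ref{pld-lr1gr2-kan} are written out literally.

```python
# Sketch: merge LR(1) canonical sets whose cores coincide.
def core(item_set):
    # the core of a set drops the lookahead symbols of its items
    return frozenset((lhs, rhs, dot) for (lhs, rhs, dot, la) in item_set)

def merge_by_core(canonical_sets):
    merged = {}
    for h in canonical_sets:
        c = core(h)
        merged[c] = merged.get(c, frozenset()) | h
    return list(merged.values())

# H_3 and H_6 of the example: same cores, different lookahead symbols.
h3 = {("A", ("a", "A"), 1, "a"), ("A", ("a", "A"), 1, "b"),
      ("A", ("a", "A"), 0, "a"), ("A", ("a", "A"), 0, "b"),
      ("A", ("b",), 0, "a"), ("A", ("b",), 0, "b")}
h6 = {("A", ("a", "A"), 1, "#"), ("A", ("a", "A"), 0, "#"),
      ("A", ("b",), 0, "#")}
```

Applied to $[{\cal{H}}_3, {\cal{H}}_6]$, the sketch produces a single merged set containing the nine items of the union.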
However, after merging canonical sets of \textit{LR}$(1)$-items,
elements of this set can raise difficulties.
Suppose that
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
${\cal{K}}_{\left[ i,j\right]} = {\cal{H}}_i \cup {\cal{H}}_j$.
\begin{itemize}
\item After merging there are no \textit{shift-shift conflicts}.
\index{shift-shift conflict}%
\index{conflict!shift-shift}%
If
$$\left[A \rightarrow \alpha .a\beta , b \right] \in {\cal{H}}_i \koz $$
and
$$\left[B \rightarrow \gamma .a\delta, c \right] \in {\cal{H}}_j \koz $$
then there is a shift for the symbol $a$, and we saw that the
function \textit{read} does not cause a problem, i.e. the set
$\textit{read}({\cal{K}}_{\left[ i,j\right]},a)$ is equal to the set
$\textit{read}({\cal{H}}_i,a) \cup \textit{read}({\cal{H}}_j,a)$.
\item If there is an item
$$\left[A \rightarrow \alpha .a\beta , b \right] \koz $$
in the canonical set ${\cal{H}}_i$ and there is an item
$$\left[B \rightarrow \gamma ., a \right]$$
in the set ${\cal{H}}_j$, then the merged set is an inadequate
set for the symbol $a$, i.e. there is a
\textit{shift-reduce conflict}
\index{shift-reduce conflict}%
\index{conflict!shift-reduce}%
in the merged set.
But this case never happens.
The cores of both items occur in the set ${\cal{H}}_i$ and in the set
${\cal{H}}_j$, since these sets are mergeable and thus may differ
in lookahead symbols only.
It follows that there is an item
$\left[A \rightarrow \alpha .a\beta , c \right]$
in the set ${\cal{H}}_j$.
Using Theorem
\ref{tetel-lre-tabl} we get that the grammar is not an
\textit{LR}$(1)$ grammar; we get the shift-reduce conflict
from the set ${\cal{H}}_j$ for the \textit{LR}$(1)$ parser, too.
\item However, after merging
\textit{reduce-reduce conflict}
\index{reduce-reduce conflict}%
\index{conflict!reduce-reduce}%
may arise. The properties of \textit{LR}$(1)$ grammar do not exclude
this case.
In the next example we show such a case.
\end{itemize}
\begin{pelda}{}
\label{pld-red-red}%
Let
$G' = ( \{ S',S,A,B \}, \{ a,b,c,d,e \}, {P'}, S' )$ be a grammar whose
derivation rules are as follows.
$$\begin{array}{lll}
S' & \rightarrow & S \\
S & \rightarrow & aAd\ \mid\ bBd\ \mid\ aBe\ \mid\ bAe \\
A & \rightarrow & c \\
B & \rightarrow & c
\end{array}$$
\noindent This grammar is an \textit{LR}$(1)$ grammar. For the viable
prefix
$ac$ the \textit{LR}$(1)$-items
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
$$\{ \left[ A \rightarrow c., d \right], \left[ B \rightarrow c., e \right] \}
\koz , $$
\noindent for the viable prefix $bc$ the \textit{LR}$(1)$-items
$$\{ \left[ A \rightarrow c., e \right], \left[ B \rightarrow c., d \right] \}
$$
\noindent create two canonical sets.
After merging these two sets we get a reduce-reduce conflict.
If the input symbol is $d$ or $e$ then the handle is $c$, but we
cannot decide whether we have to use the rule
$A \rightarrow c$ or the rule
$B \rightarrow c$ for the reduction.
\end{pelda}
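The reduce-reduce conflict of this example can be checked mechanically. The sketch below uses the same simplified string representation of \textit{LR}$(1)$-items as before; this encoding is an assumption of the illustration, not the book's notation.

```python
def reduce_reduce_conflicts(item_set):
    """Map each lookahead to the set of complete items (dot at the end)
    selecting a reduction; a conflict is a lookahead with more than one."""
    reduces = {}
    for core, lookahead in item_set:
        if core.endswith("."):            # complete item: a reduce action
            reduces.setdefault(lookahead, set()).add(core)
    return {la: cores for la, cores in reduces.items() if len(cores) > 1}

# The two canonical sets reached by the viable prefixes ac and bc:
H_ac = {("A -> c.", "d"), ("B -> c.", "e")}
H_bc = {("A -> c.", "e"), ("B -> c.", "d")}

assert reduce_reduce_conflicts(H_ac) == {}   # each set alone is conflict free
conflicts = reduce_reduce_conflicts(H_ac | H_bc)
# After merging, both d and e allow reducing by A -> c and by B -> c.
```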
Now we give the method for creating an
\textit{LALR}$(1)$\ parsing table.
First we give the canonical sets of \textit{LR}$(1)$-items
$${\cal{H}}_{1}, {\cal{H}}_{2}, \ldots , {\cal{H}}_{m} \koz ,$$
\noindent then
we merge those canonical sets in which the sets constructed from
the cores of the items are identical.
Let
$${\cal{K}}_{1}, {\cal{K}}_{2}, \ldots , {\cal{K}}_{n}\ \ ( n \le m )
\koz $$
\noindent be the \textit{LALR}$(1)$ canonical sets.
For the calculation of the size of the
\textit{action} and
\index{action table}%
\textit{goto} tables
\index{goto table}%
and for filling in these tables we use the sets
${\cal{K}}_{i}$\ \ $(1 \le i \le n)$.
The method is the same as for the
\textit{LR}$(1)$ parser.
The constructed tables are called
\textit{LALR}$(1)$\ \textit{parsing tables}.
\index{LALR(1)!parsing!table}%
\index{parsing!table}%
\index{table!parsing}%
\begin{defi}
If filling in the
\textit{LALR}$(1)$\ parsing tables does not produce conflicts then
the grammar is said to be an
\textbf{\textit{LALR}}{\boldmath{$(1)$}}\ grammar.
\end{defi}
\index{LALR(1)!grammar}%
\index{grammar!LALR(1)}%
The run of the \textit{LALR}$(1)$\ parser is the same as that of the
\textit{LR}$(1)$ parser.
\begin{pelda}{}
\label{pld-lalr12}
Denote the result of merging canonical sets ${\cal{H}}_{i}$ and
${\cal{H}}
_{j}$
by
${\cal{K}}_{\left[ i,j\right]}$. Let
$\left[i,j\right]$
be the state belonging to this set.
The \textit{LR}$(1)$ canonical sets of the grammar of Example
\ref{pld-lr1gr2} were given in Example
\ref{pld-lr1gr2-kan}, and the mergeable sets were seen in
Example \ref{pld-lr1gr2-egy}.
For this grammar we can create the following \textit{LALR}$(1)$\
parsing tables.
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
\begin{center}
$\begin{array}{||c||ccc||cc||}
\hhline{|t:=:t:===:t:==:t|}
\textnormal{state} & \multicolumn{3}{c||}{action} & \multicolumn{2}{c||}{goto} \\
%\cline{2-4}\cline{5-6}
\hhline{||~||---||--||}
& a & b & \# & S & A \\
\hhline{|:=::===::==:|}
0 & s\left[3,6\right] & s\left[4,7\right] & & 1 & 2 \\
1 & & & accept & & \\
2 & s\left[3,6\right] & s\left[4,7\right] & & & 5 \\
\left[3,6\right] & s\left[3,6\right] & s\left[4,7\right] & & & \left[8,9
\right] \\
\left[4,7\right] & r3 & r3 & r3 & & \\
5 & & & r1 & & \\
\left[8,9\right] & r2 & r2 & r2 & & \\
\hhline{|b:=:b:===:b:==:b|}
\end{array}$
\end{center}
Filling in the
\textit{LALR}$(1)$ tables is conflict free, therefore the grammar
is an
\textit{LALR}$(1)$\ grammar. The automaton of this parser is in Figure
\ref{fig-lalr12-aut}.
\end{pelda}
\begin{figure}[t!]
\hspace*{\fill}
\xymatrix{
& & *++[o][F=]{\;1\;} & *++[o][F=]{\;5\;} \\
\ar[r] & *++[o][F]{\;0\;} \ar[r]^A \ar@/_{8mm}/[rr]_a \ar@/_{10mm}/
[rrd]_b \ar[ru]^S
& *++[o][F]{\;2\;} \ar[ru]^A \ar[r]^a \ar[rd]_(.6){b}
& *++[o][F]{3\hspace*{-0.2mm},\hspace*{-0.5mm}6} \ar@(ur,u)[]_a
\ar[r]^A \ar[d]^b
& *++[o][F=]{8\hspace*{-0.2mm},\hspace*{-0.5mm}9} \\
& & & *++[o][F=]{4\hspace*{-0.2mm},\hspace*{-0.5mm}7} }
\hspace*{\fill}
\caption{The automaton of Example \ref{pld-lalr12}.}
\label{fig-lalr12-aut}
\end{figure}
\begin{pelda}
Analyse the text $abb\#$ using the parsing table of the
previous example.
\hspace{\fill}\vspace{1.5truemm}\nobreak\hspace{\fill}
\break\noindent
$\begin{array}[b]{lrllrl|l}
& & & & & \hspace*{7mm} & \textit{rule} \\
(\#0, & abb\#) & \xrightarrow{s\left[3,6\right]}\
& (\#0a\left[3,6\right], & bb\#) & & \\
& & \xrightarrow{s\left[4,7\right]} &
(\#0a\left[3,6\right]b\left[4,7\right], & b\#) & & \\
& & \xrightarrow{r3} & (\#0a\left[3,6\right]A[8,9], & b\#) & & A
\rightarrow b \\
& & \xrightarrow{r2} & (\#0A2, & b\#) & & A \rightarrow aA \\
& & \xrightarrow{s\left[4,7\right]} & (\#0A2b\left[4,7\right], & \#) &
& \\
& & \xrightarrow{r3} & (\#0A2A5, & \#) & & A \rightarrow b \\
& & \xrightarrow{r1} & (\#0S1, & \#) & & S \rightarrow AA\\
& & \xrightarrow{\textit{accept}} & \textit{O.K.}
\end{array}$
The syntax tree of the parsed text is in Figure
\ref{fig-lr1fa}.
\end{pelda}
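The run shown above can be reproduced with a small table-driven driver. The tables below transcribe the example's \textit{LALR}$(1)$ parsing tables, with the merged states $[3,6]$, $[4,7]$, $[8,9]$ renamed to the strings "36", "47", "89"; this renaming and the Python encoding are assumptions of the sketch, not part of the book's method.

```python
# Rule numbers as in the example: 1: S -> AA, 2: A -> aA, 3: A -> b.
# For the driver only the left-hand side and right-hand-side length matter.
RULES = {1: ("S", 2), 2: ("A", 2), 3: ("A", 1)}

ACTION = {
    ("0", "a"): ("s", "36"), ("0", "b"): ("s", "47"),
    ("1", "#"): ("acc", None),
    ("2", "a"): ("s", "36"), ("2", "b"): ("s", "47"),
    ("36", "a"): ("s", "36"), ("36", "b"): ("s", "47"),
    ("47", "a"): ("r", 3), ("47", "b"): ("r", 3), ("47", "#"): ("r", 3),
    ("5", "#"): ("r", 1),
    ("89", "a"): ("r", 2), ("89", "b"): ("r", 2), ("89", "#"): ("r", 2),
}
GOTO = {("0", "S"): "1", ("0", "A"): "2", ("2", "A"): "5", ("36", "A"): "89"}

def parse(text):
    """Run the LR driver; return the list of applied rule numbers,
    or None on a syntax error. The stack holds states only."""
    stack, i, applied = ["0"], 0, []
    while True:
        act = ACTION.get((stack[-1], text[i]))
        if act is None:
            return None                      # empty table entry: error
        kind, arg = act
        if kind == "s":                      # shift: push state, advance
            stack.append(arg)
            i += 1
        elif kind == "r":                    # reduce by rule arg
            lhs, length = RULES[arg]
            del stack[len(stack) - length:]  # pop |rhs| states
            stack.append(GOTO[(stack[-1], lhs)])
            applied.append(arg)
        else:                                # accept
            return applied
```

On the input of the example, `parse("abb#")` applies the rules 3, 2, 3, 1, i.e. the reductions $A \rightarrow b$, $A \rightarrow aA$, $A \rightarrow b$, $S \rightarrow AA$ of the trace above.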
As can be seen from the previous example, all
\textit{LALR}$(1)$\ grammars are \textit{LR}$(1)$ grammars. The
converse assertion is not true. In Example
\ref{pld-red-red} there is a grammar which is
\textit{LR}$(1)$ but not an \textit{LALR}$(1)$\ grammar.
Programming languages can be described by
\textit{LALR}$(1)$\ grammars. The \textit{LALR}$(1)$\ method is the
method most frequently used in
compilers of programming languages. The advantage of the \textit{LALR}$(1)$\
parser is that the sizes of its parsing tables are smaller than the sizes
of the \textit{LR}$(1)$ parsing tables.
For example, the \textit{LALR}$(1)$ parsing tables for the Pascal
language
have a few hundred lines, whilst the \textit{LR}$(1)$ parsing tables for
this
language
have a few thousand lines.
%%%%%%%%%%%%%%%%%%%%%
\begin{gyak}
\ujgyak Find the \textit{LL$(1)$} grammars among
the following grammars
(we give their derivation rules only).
\begin{enumerate}
\item $\begin{array}[t]{lll}
S & \rightarrow & ABc \\
A & \rightarrow & a \mid \varepsilon \\
B & \rightarrow & b \mid \varepsilon
\end{array}$
\item $\begin{array}[t]{lll}
S & \rightarrow & Ab \\
A & \rightarrow & a \mid B \mid \varepsilon \\
B & \rightarrow & b \mid \varepsilon
\end{array}$
\item $\begin{array}[t]{lll}
S & \rightarrow & ABBA \\
A & \rightarrow & a \mid \varepsilon \\
B & \rightarrow & b \mid \varepsilon
\end{array}$
\item $\begin{array}[t]{lll}
S & \rightarrow & aSe \mid A \\
A & \rightarrow & bAe \mid B \\
B & \rightarrow & cBe \mid d
\end{array}$
\end{enumerate}
\ujgyak Prove that the next grammars are \textit{LL$(1)$}\ grammars
(we give their derivation rules only).
\begin{enumerate}
\item $\begin{array}[t]{lll}
S & \rightarrow & Bb \mid Cd \\
B & \rightarrow & aB \mid \varepsilon \\
C & \rightarrow & cC \mid \varepsilon
\end{array}$
\item $\begin{array}[t]{lll}
S & \rightarrow & aSA \mid \varepsilon \\
A & \rightarrow & c \mid bS
\end{array}$
\item $\begin{array}[t]{lll}
S & \rightarrow & AB \\
A & \rightarrow & a \mid \varepsilon \\
B & \rightarrow & b \mid \varepsilon
\end{array}$
\end{enumerate}
\ujgyak Prove that the next grammars are not \textit{LL$(1)$}\
grammars
(we give their derivation rules only).
\begin{enumerate}
\item $\begin{array}[t]{lll}
S & \rightarrow & aAa \mid Cd \\
A & \rightarrow & abS \mid c
\end{array}$
\item $\begin{array}[t]{lll}
S & \rightarrow & aAaa \mid bAba \\
A & \rightarrow & b \mid \varepsilon
\end{array}$
\item $\begin{array}[t]{lll}
S & \rightarrow & abA \mid \varepsilon \\
A & \rightarrow & Saa \mid b
\end{array}$
\end{enumerate}
\ujgyak Show that an \textit{LL$(0)$} language has only one sentence.
\ujgyak Prove that the next grammars are \textit{LR$(0)$}\ grammars
(we give their derivation rules only).
\begin{enumerate}
\item $\begin{array}[t]{lll}
S' & \rightarrow & S \\
S & \rightarrow & aSa \mid aSb \mid c
\end{array}$
\item $\begin{array}[t]{lll}
S' & \rightarrow & S \\
S & \rightarrow & aAc \\
A & \rightarrow & Abb \mid b
\end{array}$
\end{enumerate}
\ujgyak Prove that the next grammars are \textit{LR$(1)$}\ grammars
(we give their derivation rules only).
\begin{enumerate}
\item $\begin{array}[t]{lll}
S' & \rightarrow & S \\
S & \rightarrow & aSS \mid b
\end{array}$
\item $\begin{array}[t]{lll}
S' & \rightarrow & S \\
S & \rightarrow & SSa \mid b
\end{array}$
\end{enumerate}
\ujgyak Prove that the next grammars are not \textit{LR$(k)$}\
grammars for
any $k$ (we give their derivation rules only).
\begin{enumerate}
\item $\begin{array}[t]{lll}
S' & \rightarrow & S \\
S & \rightarrow & aSa \mid bSb \mid a \mid b
\end{array}$
\item $\begin{array}[t]{lll}
S' & \rightarrow & S \\
S & \rightarrow & aSa \mid bSa \mid ab \mid ba
\end{array}$
\end{enumerate}
\ujgyak Prove that the next grammars are \textit{LR$(1)$}\ but are not
\textit{LALR$(1)$} grammars (we give their derivation rules only).
\begin{enumerate}
\item $\begin{array}[t]{lll}
S' & \rightarrow & S \\
S & \rightarrow & Aa \mid bAc \mid Bc \mid bBa \\
A & \rightarrow & d \\
B & \rightarrow & d
\end{array}$
\item $\begin{array}[t]{lll}
S' & \rightarrow & S \\
S & \rightarrow & aAcA \mid A \mid B \\
A & \rightarrow & b \mid Ce \\
B & \rightarrow & dD \\
C & \rightarrow & b \\
D & \rightarrow & CcS \mid
CcD
\end{array}$
\end{enumerate}
\ujgyak Create parsing table for the above \textit{LL$(1)$}\ grammars.
\ujgyak Using the recursive descent method, write the parsing program
for the above \textit{LL$(1)$}\ grammars.
\ujgyak Create canonical sets and the parsing tables for the above
\textit{LR$(1)$}\ grammars.
\vspace{-4mm}
\ujgyak Create merged canonical sets and the parsing tables for the
above \textit{LALR$(1)$}\ grammars.
\end{gyak}
\begin{fld}
\ujfld{Lexical analysis of a program text}
{The algorithm \textsc{Lex-Analyse} in Section
\ref{sec-lex}\label{fld-lex-teljes}
gives a scanner for the text that is described by \textit{only one}
regular expression or deterministic finite automaton, i.e. this scanner
is able to analyse only one symbol.
Create an automaton which executes total lexical analysis of a
program language, and give the algorithm
\textsc{Lex-analyse-language}
\index{\textsc{Lex-Analyse-Language}}%
for this automaton.
Let the input of the algorithm be the text of a program, and the output
be the series of symbols. It is obvious that if the automaton goes into a
final state then its next run begins at the initial state, for
analysing the next symbol. The algorithm finishes its work when it is at the end of the
text or a lexical error is detected.
}
%
\ujfld{Series of symbols augmented with data of symbols}
{Modify the algorithm of the previous task in such a way that the output
is the series of symbols augmented with the appropriate attributes. For
example, the attribute of a variable is the character string of its name, or
the attribute of a number is its value and type.
It is practical to write pointers to the symbols in place of the data.}
%
\ujfld{\textit{LALR}{\boldmath{$(1)$}}\ parser from \textit{LR}{\boldmath{$(0)$}} canonical sets}
{If we omit lookahead symbols from the \textit{LR}$(1)$-items then
we get
\ki{LR{\boldmath{$(0)$}}-items.}
\index{LR(0)!item}%
We can define functions \textit{closure} and \textit{read}
\index{closure}%
for \textit{LR}$(0)$-items too,
\index{read}%
ignoring the lookahead symbols.
Using a method similar to the method of \textit{LR}$(1)$,
we can construct
\ki{LR{\boldmath{$(0)$}} canonical sets}
$$ {\cal{I}}_0, {\cal{I}}_1, \dots, {\cal{I}}_n \koz .$$
\index{LR(0)!canonical set}%
\index{canonical!set!LR(0)}%
One can observe that the number of merged canonical sets is equal to
the number of
\textit{LR}$(0)$ canonical sets, since the cores of
\textit{LR}$(1)$-items of the merged canonical sets are the same as
the items of the
\textit{LR}$(0)$ canonical sets.
Therefore the number of states of the
\textit{LALR}$(1)$\ parser is equal to the number of states of its
\textit{LR}$(0)$
parser.
Using this property, we can construct
\textit{LALR}$(1)$ canonical sets from \textit{LR}$(0)$ canonical sets,
by completing the items of the
\textit{LR}$(0)$ canonical sets with lookahead symbols.
The result of this procedure is the set of
\textit{LALR}$(1)$ canonical sets.
It is obvious that the point stands at the first position of the right
side of an \textit{LR}$(1)$-item only if this item was constructed by
the function \textit{closure}.
(We notice that there is one exception, the
$\left[ S' \rightarrow .S \right]$
item of the canonical set ${\cal{H}}_{0}$.)
Therefore there is no need to store all items of the \textit{LR}$(1)$ canonical sets.
Let the
\ki{kernel} of the canonical set ${\cal{H}}_{0}$ be
\index{LR(1)!canonical set!kernel}%
\index{canonical!set!kernel}%
the \textit{LR}$(1)$-item $\left[ S' \rightarrow .S, \# \right]$,
and let the kernel of any other canonical set be
the set of the \textit{LR}$(1)$-items in which the point is not at the
first position of the right side of the item.
We give an
\textit{LR}$(1)$ canonical set by its kernel, since all of its items can be
constructed from the kernel using the function \textit{closure}.
If we complete the items of the kernels of the \textit{LR}$(0)$ canonical
sets with lookahead symbols,
then we get the kernels of the merged
\textit{LR}$(1)$ canonical sets. That is,
if the kernel of an \textit{LR}$(0)$ canonical set is
${\cal{I}}_{j}$, then by completion we get from it
the kernel of the \textit{LR}$(1)$ canonical set
${\cal{K}}_{j}$.
If we know ${\cal{I}}_{j}$ then we can construct
$\textit{read}({\cal{I}}_{j},X)$ easily.
If
$\left[ B \rightarrow \gamma .C\delta \right] \in {\cal{I}}_{j}$,\
$C \stackrel{*}\rightarrow A\eta$\ and $A \rightarrow X\alpha$, then
$\left[ A \rightarrow X.\alpha \right] \in \textit{read}({\cal{I}}_
{j},X).$
For \textit{LR}$(1)$-items, if
$\left[ B \rightarrow \gamma .C\delta , b \right] \in {\cal{K}}_{j}$,
\ $C \stackrel{*}\rightarrow A\eta$\ and $A \rightarrow X\alpha$
then we have to determine also the lookahead symbols, i.e.
the symbols $a$ such that
$\left[ A \rightarrow X.\alpha , a \right] \in
\textit{read}({\cal{K}}_{j},X)$.
If $\eta \delta \neq \varepsilon$ and
$a \in {\textit{First}}(\eta \delta b)$ then it is sure that
$\left[ A \rightarrow X.\alpha , a \right] \in \textit{read}({\cal{K}}_
{j},X)$.
In this case, we say that the lookahead symbol was
\ki{spontaneously generated} for this item of canonical set
$\textit{read}({\cal{K}}_{j},X)$.
\index{spontaneously generated}%
The symbol $b$ does not play an important role in the construction of the
lookahead symbol.
If $\eta \delta = \varepsilon$ then $\left[ A \rightarrow X.\alpha , b
\right]$ is an element of the set
$\textit{read}({\cal{K}}_{j},X)$, and the lookahead symbol is
$b$. In this case we say that the lookahead symbol is
\ki{propagated} from
${\cal{K}}_{j}$
\index{propagation}%
into the item of the set $\textit{read}({\cal{K}}_{j},X)$.
If the kernel ${\cal{I}}_{j}$ of an \textit{LR}$(0)$ canonical set is
given then
we construct the propagated and spontaneously generated lookahead
symbols
for items of $\textit{read}({\cal{K}}_{j},X)$ by the following
algorithm.
For all items
$\left[ B \rightarrow \gamma .\delta \right] \in {\cal{I}}_{j}$
we construct the set
${\cal{K}}_{j} = \textit{closure}(\left[ B \rightarrow \gamma .\delta ,
@ \right])$,
where
$@$ is a dummy symbol. Then:
\begin{itemize}
\item if
$\left[ A \rightarrow \alpha .X\beta , a \right] \in {\cal{K}}_{j}$ and
$a \neq @$
then
$\left[ A \rightarrow \alpha X.\beta , a \right] \in
\textit{read}({\cal{K}}_{j},X)$
and the symbol $a$ is spontaneously generated into the item of
the set $\textit{read}({\cal{K}}_{j},X)$,
\item if
$\left[ A \rightarrow \alpha .X\beta , @ \right] \in {\cal{K}}_{j}$
then
$\left[ A \rightarrow \alpha X.\beta , @ \right] \in
\textit{read}({\cal{K}}_{j},X)$, and the symbol $@$ is propagated
from
${\cal{K}}_j$ into the item of the set
$\textit{read}({\cal{K}}_{j},X)$.
\end{itemize}
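The classification step of this algorithm can be sketched as follows. The closure result is given here as ready-made data, since computing \textit{closure} itself needs the full grammar machinery; the string item encoding and the sample items are assumptions of the sketch.

```python
DUMMY = "@"   # the dummy lookahead symbol of the algorithm above

def classify(closure_items):
    """Split the items of closure([B -> gamma.delta, @]) into spontaneously
    generated ones (concrete lookahead) and propagated ones (dummy symbol)."""
    spontaneous, propagated = [], []
    for core, lookahead in closure_items:
        if lookahead == DUMMY:
            propagated.append(core)          # lookahead comes from K_j later
        else:
            spontaneous.append((core, lookahead))
    return spontaneous, propagated

# Hypothetical closure of [S -> a.Ab, @] in a grammar with a rule A -> c:
# the item A -> .c receives the concrete lookahead b spontaneously,
# while the kernel item keeps the dummy symbol, i.e. it propagates.
items = {("S -> a.Ab", "@"), ("A -> .c", "b")}
spontaneous, propagated = classify(items)
```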
The kernel of the canonical set ${\cal{K}}_{0}$ has only one element.
The core of this element is
$\left[ S' \rightarrow .S \right]$. For this item we
can give the lookahead symbol
$\#$ directly. Since the cores of the
kernels of all
canonical sets ${\cal{K}}_{j}$ are given, using the above method
we can calculate all of the propagated and spontaneously generated
symbols.
Give the algorithm which constructs \textit{LALR}$(1)$ canonical sets
from
\textit{LR$(0)$} canonical sets using the methods of propagation and
spontaneous generation.}
\end{fld}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{fejmegj}
The theory and practice of compilers, computers and program
languages are of the same age. The construction of the first compilers dates
back
to the 1950s. Writing a compiler was a very hard task at
that time;
the first Fortran compiler took 18 man-years to implement
\cite{Aho73}. From that time more and more precise definitions and
solutions have been given to
the problems of compilation, and better and better
methods and utilities have been used in the construction of
translators.
The development of formal
languages and automata was a great leap forward, and we can say
that this development was
urged on
by the demand for writing compilers.
Nowadays this task is a routine project. New results and new
discoveries are expected only in the field of code optimisation.
One of the earliest nondeterministic and
backtrack algorithms appeared in the 1960's. The first two dynamic
programming
algorithms were the
CYK
(Cocke-Younger-Kasami)\nevindex{Cocke, J.}%
\nevindex{Younger, D. H.}\nevindex{Kasami, T.} algorithm from
1965--67 and the Earley-algorithm from 1965.
\nevindex{Earley, J.} The idea of precedence parsers
is from the end of 1970's and from the beginning of 1980's.
The \textit{LR}(k) grammars were defined by Knuth
\nevindex{Knuth, Donald Ervin} in 1965; the definition of \textit{LL}(k)
grammars dates from the beginning of the 1970s.
\textit{LALR}(1) grammars were studied
by De Remer\nevindex{De Remer, F. L.} in 1971; the
elaboration of \textit{LALR}(1) parsing methods was finished at
the beginning of the 1980s
\cite{Aho86,Aho72,Aho73}.
By the middle of the 1980s it became obvious that
the \textit{LR} parsing methods are the really efficient methods,
and since then the \textit{LALR}(1) methods have been used in compilers
\cite{Aho86}.
A lot of excellent books deal with the theory and practice of
compilers. Perhaps the most successful of them was the book of
Gries \cite{Gries};\nevindex{Gries, David}
in this book there are interesting results for precedence grammars.
The first successful book on the new \textit{LR}
algorithms was that of Aho and Ullman \cite{Aho72}; we can also find the CYK and
the Earley algorithms there.
It was followed by the ``dragon book'' of
Aho and Ullman \cite{Aho73}; its extended and corrected edition was
published in 1986 by Aho, Sethi and Ullman \cite{Aho86}.
\nevindex{Aho, Alfred V.}\nevindex{Sethi, Ravi}\nevindex{Ullman, Jeffrey David}
Without aiming at completeness, we mention the books of Fischer and LeBlanc \cite{fisch},
Tremblay and Sorenson \cite{trem},\nevindex{Fischer, C. N.}\nevindex
{LeBlanc, R. J.} \nevindex{Tremblay, Jean-Paul}\nevindex{Sorenson, Paul G.}
Waite and Goos \cite{waite}, \nevindex{Waite, William M.}
\nevindex{Goos, Gerhard} Hunter \cite{hunter}, \nevindex{Hunter, Robin} Pittman \cite
{pittman} \nevindex{Pittman, Thomas} and Mak \cite{mak}.
\nevindex{Mak, Ronald} Advanced achievements are discussed in
recently published books, among others in the book of
Muchnick \cite{Muchnick}, \nevindex{Muchnick, Steven S.}
Grune, Bal, Jacobs and Langendoen \cite{Grune},\nevindex{Grune, Dick}\nevindex{Bal, Henri E.}
\nevindex{Jacobs, Ceriel J. H.}\nevindex{Langendoen, Koen G.} in the book of Cooper and Torczon
\cite{Cooper}\nevindex{Cooper, Keith D.}\nevindex{Torczon, Linda}
and in a chapter of the book by Louden \cite{Louden}.\nevindex{Louden, Kenneth C.}
\end{fejmegj}