p$ then this
probability converges fast to 0 as $n\to \infty $ while if $f<p$ then it
converges fast to 1. Let
\begin {align}
D(f,p) &= f\ln \frac {f}{p} + (1-f)\ln \frac {1-f}{1-p} \notag
\\ &\geq f\ln \frac {f}{p}-f = f\ln \frac {f}{ep}\;, \label {e.Kullback-small}
\end {align}
where the inequality (useful for small $f$ and $e p < f$)
comes via $1>1-p>1-f$ and
$\ln (1-f)\geq -\frac {f}{1-f}$ from~\eqref {e.log-bounds}.
Using the concavity of the logarithm, it can be shown that $D(f,p)$ is always
nonnegative, and is 0 only if $f=p$ (see Exercise~\ref {xrc.D(f,p)}).
\begin {tetel}[Large deviations for coin-toss]\label {t.large-dev}
If $f>p$ then
\begin {equation*}
\mathbf {P}\mskip 1mu\mathopen [\, S_{n} \geq f n\,\mathclose ] \leq e^{-n D(f,p)} \koz.
\end {equation*}
\end {tetel}
This theorem shows that
if $f>p$ then $\mathbf {P}\mskip 1mu\mathopen [\,S_n > f n\,\mathclose ]$ converges to 0 exponentially fast.
Inequality~\eqref {e.Kullback-small} will allow the following simplification:
\begin {equation} \label {e.small-prob}
\mathbf {P}\mskip 1mu\mathopen [\,S_n \geq f n\,\mathclose ] \leq e^{-n f\ln \frac {f}{e p}}
= {\left ( \frac {e p}{f}\right )}^{n f},
\end{equation}
useful for small $f$ and $ep<f$.
\begin {biz}
For a constant $\alpha >1$ (to be chosen later),
let $Y_{n}$ be the random variable that is $\alpha $ if $X_{n}=1$ and $1$ if
$X_{n}=0$, and let $P_{n} = Y_{1}\dotsm Y_{n}=\alpha ^{S_{n}}$: then
\begin {equation*}
\mathbf {P}\mskip 1mu\mathopen [\, S_{n} \geq f n\,\mathclose ] =
\mathbf {P}\mskip 1mu\mathopen [\, P_{n} \geq \alpha ^{fn}\,\mathclose ] \koz .
\end {equation*}
Applying the Markov inequality~\eqref {e.Markov}
and~\eqref {e.expected-multipl}, we get
\begin {equation*}
\mathbf {P}\mskip 1mu\mathopen [\, P_{n} \geq \alpha ^{fn}\,\mathclose ] \leq \mathbf {E}\mskip 1muP_{n}/\alpha ^{fn}
= (\mathbf {E}\mskip 1muY_{1}/\alpha ^{f})^{n},
\end {equation*}
where $\mathbf {E}\mskip 1muY_{1} = p\alpha + (1-p)$.
Let us choose $\alpha = \frac {f(1-p)}{p(1-f)}$; this is $>1$ if $p<f$.
Then $\mathbf {E}\mskip 1muY_{1} = \frac {1-p}{1-f}$, hence
\begin {equation*}
\mathbf {E}\mskip 1muY_{1}/\alpha ^{f} = \frac {p^{f}(1-p)^{1-f}}{f^{f}(1-f)^{1-f}} = e^{-D(f,p)} \koz ,
\end {equation*}
which proves the theorem.
In particular, for $f > p=1/2$ we have $D(f,1/2) = \ln 2 - h(f)$, where
$h(f) = -f\ln f - (1-f)\ln (1-f)$ is the entropy, hence:
\begin {align*}
2^{-n} \sum _{i \geq f n}^{n}\binom {n}{i} &=
\mathbf {P}\mskip 1mu\mathopen [\, S_{n} \geq f n\,\mathclose ] \leq e^{-n D(f,p)} = 2^{-n}e^{n h(f)},
\\ \sum _{i\geq fn}^{n}\binom {n}{i} &\leq e^{n h(f)} \koz .
\end {align*}
Substituting $g=1-f$, and noting the symmetries
$\binom {n}{fn}=\binom {n}{gn}$, $h(f)=h(g)$
and~\eqref {e.entr.small.p} gives~\eqref {e.binom-sum}.
\end{biz}
\begin{megj}
Inequality~\eqref {e.small-prob} also follows from the trivial estimate
$\mathbf {P}\mskip 1mu\mathopen [\, S_{n} \geq f n\,\mathclose ] \leq \binom {n}{f n} p^{f n}$ combined
with~\eqref {e.binom-bound}.
\end{megj}
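As a numerical sanity check (a small Python sketch, not part of the formal development; the values $n=100$, $p=0.3$, $f=0.5$ are only illustrative), one can compare the exact binomial tail with the bound of Theorem~\ref {t.large-dev}:

```python
import math

def D(f, p):
    """Divergence D(f,p) = f ln(f/p) + (1-f) ln((1-f)/(1-p))."""
    return f * math.log(f / p) + (1 - f) * math.log((1 - f) / (1 - p))

def binom_tail(n, p, m):
    """Exact P[S_n >= m] for S_n ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(m, n + 1))

n, p, f = 100, 0.3, 0.5
exact = binom_tail(n, p, math.ceil(f * n))
bound = math.exp(-n * D(f, p))
assert exact <= bound                 # P[S_n >= f n] <= e^{-n D(f,p)}
assert D(f, p) > 0                    # D(f,p) > 0 for f != p
assert abs(D(p, p)) < 1e-12           # and D(p,p) = 0
```

The exact tail is in fact considerably smaller than $e^{-nD(f,p)}$; the value of the theorem lies in the simple exponential form of the bound.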
\begin{gyak}
\ujgyak \label{xrc.D(f,p)}
Prove the statement made in the main text that $D(f,p)$ is always
nonnegative, and is 0 only if $f=p$.
\ujgyak \label{xrc.Chernoff}
For $f = p + \delta $, derive from Theorem~\ref {t.large-dev} the useful bound
\begin{equation*}
\mathbf {P}\mskip 1mu\mathopen [\, S_{n} \geq f n\,\mathclose ] \leq e^{-2\delta ^{2} n} \koz .
\end{equation*}
\textit{Hint.} Let $F(x) = D(x,p)$, and use the Taylor formula:
$F(p+\delta ) = F(p) + F'(p)\delta + F''(p+\delta ')\delta ^{2}/2$, where $0\leq \delta '\leq \delta.$
\ujgyak \label {xrc.dependent}
Prove that in Theorem~\ref {t.large-dev},
the assumption that the $X_{i}$ are independent and identically
distributed can be weakened: it suffices to assume the single inequality
\[
\mathbf {P}\mskip 1mu\mathopen [\,X_i=1\mid X_{1},\dots ,X_{i-1}\,\mathclose ] \leq p \koz .
\]
\end {gyak}
\section {Logic circuits}
In a model of computation taking errors into account,
the natural assumption is that errors occur \emph {everywhere}.
The most familiar kind of computer, which is
separated into a single processor and memory,
seems extremely vulnerable under such conditions:
while the processor is not ``looking'',
noise may cause irreparable damage in the memory.
Let us therefore rather consider computation
models that are \emph {parallel}: information is being processed
everywhere in the system, not only in some distinguished places.
Then error correction can be built into the work of every part of the
system.
We will concentrate on
the best known parallel computation model: Boolean circuits.
\subsection{Boolean functions and expressions}
Let us look inside a computer (actually, inside an integrated
circuit, with a microscope).
Discouraged by a lot of physical detail irrelevant to abstract
notions of computation, we will decide to look at the blueprints of
the circuit designer, at the stage when they show the smallest
elements of the circuit still according to their computational functions.
We will see a network of lines that can be in two states (of electric
potential), ``high'' or ``low'', or in other words ``true'' or ``false'', or, as we will write, 1
or 0. The points connected by these lines are the familiar \ki {logic components}:
at the lowest level of computation, a typical computer
processes \ki {bits}. Integers, floating-point numbers, characters are all represented as
strings of bits, and the usual arithmetical operations can be composed of bit operations.
\begin {defi}
A \ki {Boolean vector function} is a mapping $f:\{0,1\}^{n}\to \{0,1\}^{m}$.
Most of the time, we will take $m=1$ and speak
of a \ki {Boolean function}.
\end {defi}
The variables in $f(x_{1},\ldots ,x_{n})$ are sometimes called \ki {Boolean
variables} or \ki {bits}.
\begin {pelda}
Given an undirected graph $G$ with $N$ nodes, suppose we want to
study the question whether it has a Hamiltonian cycle
(a sequence $(u_{1},\dots ,u_{N})$ listing all vertices of $G$ such
that $(u_{i},u_{i+1})$ is an edge for each $i<N$, and $(u_{N},u_{1})$ is also
an edge). This question can be encoded as a Boolean function of the
$\binom {N}{2}$ bits describing the adjacency relation of $G$.
\end {pelda}
\begin {gyak}
\ujgyak
Show that for an appropriate constant $c>0$, some Boolean functions over $n$
variables cannot be computed by any Boolean circuit with at most $c 2^{n}/n$
nodes. [\textit {Hint.} For a given constant $c>0$, upper-bound the number of Boolean circuits with
at most $c 2^{n}/n$ nodes and compare it with the number of Boolean
functions over $n$ variables.]
\ujgyak \label {xrc.fooling-maj}
Consider a circuit $\mathcal {M}_{3}^{r}$ with $3^{r}$ inputs, whose single
output bit is computed from the inputs by $r$ levels of 3-input majority gates.
Show that there is an input vector $\boldsymbol {x}$ which is 1 in only
$n^{1/\log 3}$ positions but with which $\mathcal {M}_{3}^{r}$ outputs 1.
Thus a small minority of the inputs, when cleverly arranged, can command
the result of this circuit.
\end {gyak}
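The construction behind Exercise~\ref {xrc.fooling-maj} can be sketched in a few lines of Python (not part of the original text; the wiring assumed is the natural recursive one, where gate $i$ of a level reads inputs $3i$, $3i+1$, $3i+2$ of the level below). Making two of the three subtrees output 1 at every level needs only $2^{r}$ ones among the $n=3^{r}$ inputs, and $2^{r}=n^{1/\log 3}$ with logarithms in base 2:

```python
def maj3(a, b, c):
    return 1 if a + b + c >= 2 else 0

def eval_tree(bits):
    """Evaluate r levels of 3-input majority gates over 3^r input bits."""
    while len(bits) > 1:
        bits = [maj3(*bits[i:i + 3]) for i in range(0, len(bits), 3)]
    return bits[0]

def fooling_input(r):
    """Input with 2^r ones out of 3^r positions forcing output 1:
    recursively make two of the three subtrees evaluate to 1."""
    if r == 0:
        return [1]
    sub = fooling_input(r - 1)
    return sub + sub + [0] * len(sub)

r = 5
x = fooling_input(r)
assert len(x) == 3**r and sum(x) == 2**r
assert eval_tree(x) == 1
```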
\section{Expensive fault-tolerance in Boolean circuits}
Let $\mathcal {N}$ be a Boolean circuit as given in Definition~\ref {d.logic-circuit}.
When noise is allowed then the values
\begin {equation*}
y_{v} = \mathrm {val}_{\boldsymbol {x}}(v)
\end {equation*}
will not be determined by the formula~\eqref {e.gate-assn} anymore.
Instead, they will be random variables $Y_{v}$.
The random assignment $(Y_v : v\in V)$ will be called a \ki {random configuration}.
\begin {defi}
At vertex $v$, let
\begin {equation}\label {e.failure}
Z_v = b_v( Y_{\arg _1(v)}, \ldots , Y_{\arg _k(v)} ) \oplus Y_v \koz .
\end {equation}
\begin{figure}
\begin{equation*}
\includegraphics {figs/gate-fail.eps}
\end{equation*}
\caption {\label {f.gate-fail} Failure at a gate.}
\end{figure}
In other words, $Z_v=1$ if the output value $Y_v$ is not equal to the value
computed by the noise-free gate $b_v$ from its inputs $Y_{\arg _j(v)}$.
(See Figure~\ref {f.gate-fail}.)
The set of vertices where $Z_{v}$ is non-zero is the set of \ki {faults}.
Let us call the difference $\mathrm {val}_{\boldsymbol {x}}(v)\oplus Y_v$ the \ki {deviation} at
node $v$.
\end{defi}
Let us impose conditions on the kind of noise that will be allowed.
Each fault should occur only with probability at most $\epsilon $,
two specific faults should only occur with probability at most $\epsilon ^{2}$, and so
on.
\begin{defi}\label {d.admissible}
For an $\epsilon >0$,
let us say that the random configuration $(Y_v : v\in V)$
is \ki {$\epsilon $-admissible} if
\begin{description}
\item[\rm{(a)}] $Y_{\mathrm {inp}(i)}=x_i$ for $i=1,\dots ,n$.
\item[\rm{(b)}] For every set $C$ of non-input nodes, we have
\begin{equation}\label {e.admissible}
\mathbf{P}\mskip 1mu\mathopen [\,Z_v=1 \text {\upshape { for all }} v\in C\,\mathclose ]\leq \epsilon ^{|C|}\koz.
\end{equation}
\end{description}
\end{defi}
In other words, in an $\epsilon $-admissible random configuration,
the probability of having faults at $k$ different specific gates is
at most $\epsilon ^{k}$.
This is how we require that not only is the fault probability low
but also, faults do not ``conspire''.
The admissibility condition is satisfied if faults occur independently with
probability $\leq \epsilon.$
Our goal is to build a circuit that will work correctly,
with high probability, despite the ever-present noise:
in other words, in which errors \emph {do not accumulate}.
This concept is formalized below.
\begin{defi}
We say that the circuit
$\mathcal {N}$ with output node $w$ is $(\epsilon ,\delta )$-\ki {resilient} if for all inputs
$\boldsymbol {x}$, all $\epsilon $-admissible configurations $\mathbf {Y}$, we have
$\mathbf {P}\mskip 1mu\mathopen [\,Y_{w}\neq \mathrm {val}_{\boldsymbol {x}}(w)\,\mathclose ] \leq \delta $.
\end{defi}
Let us explore this concept.
There is no $(\epsilon ,\delta )$-resilient circuit with $\delta <\epsilon $, since
even the last gate can fail with probability $\epsilon $.
So, let us, a little more generously, allow $\delta >2\epsilon $.
Clearly, for each circuit $\mathcal {N}$ and for each $\delta >0$ we can
choose $\epsilon $ small enough so that $\mathcal {N}$ is $(\epsilon ,\delta )$-resilient.
But this is not what we are after: hopefully, one does not need
more reliable gates every time one builds a larger circuit.
So, we hope to find a function
\begin {equation*}
F(N,\delta )
\end {equation*}
and an $\epsilon _{0}>0$ with the property that for all $\epsilon <\epsilon _{0}$,
$\delta \geq 2\epsilon $, and for every Boolean circuit $\mathcal {N}$ of size
$N$ there is some $(\epsilon ,\delta )$-resilient circuit $\mathcal {N}'$
of size $F(N,\delta )$ computing the same function as $\mathcal {N}$.
If we achieve this then we can say that we prevented the accumulation
of errors.
Of course, we want to make $F(N,\delta )$ relatively small,
and $\epsilon _{0}$ large (allowing more noise).
The function $F(N,\delta )/N$ can be called the \ki {redundancy}: the factor by
which we need to increase the size of the circuit to make it resilient.
Note that the problem is nontrivial even with, say, $\delta =1/3$.
Unless the accumulation of errors is prevented we will
lose gradually all information about the desired output, and no $\delta <1/2$
could be guaranteed.
How can we correct errors?
A simple idea is this: do ``everything'' 3 times and then
continue with the result obtained by majority vote.
\begin{defi} For an odd natural number $d$, a
$d$-input majority gate is a Boolean function that
outputs the value equal to the majority of its inputs.
\end{defi}
Note that a $d$-input majority can be computed using
$O(d)$ gates of type AND and NOT.
Why should majority voting help? The following \emph {informal discussion} helps in understanding the benefits
and pitfalls. Suppose for a moment that the output is a single bit.
If the probability of each of the three
independently computed results failing is
$\delta $ then the probability that at least $2$ of them fail is bounded by
$3\delta ^{2}$.
Since the majority vote itself can fail with some
probability $\epsilon $ the total
probability of failure is bounded by $3\delta ^{2}+\epsilon $.
We decrease the probability $\delta $ of failure, provided the
condition $3\delta ^{2}+\epsilon <\delta $ holds.
We found that if $\delta $ is small, then repetition and majority vote can
``make it'' smaller.
Of course, in order to keep the error probability from accumulating,
we would have to perform this majority operation repeatedly.
Suppose, for example, that our computation has $t$ stages.
Our bound on the probability of faulty output after stage $i$ is
$\delta _{i}$.
We plan to perform the majority operation after each stage $i$.
Let us perform stage $i$ three times.
The probability of failure is now bounded by
\begin {equation}\label {e.accum}
\delta _{i+1} = \delta _{i}+3\delta ^{2}+\epsilon \koz .
\end {equation}
Here, the error probabilities of the different stages accumulate, and
even if $3\delta ^{2}+\epsilon <\delta $ we only get a bound $\delta _{t}< (t-1)\delta $.
So, this strategy will not work for arbitrarily large computations.
Here is a somewhat mad idea to avoid accumulation: repeat
\emph{everything} before the end
of stage $i$ three times, not only stage $i$ itself.
In this case, the growing bound~\eqref {e.accum} would be replaced with
\begin {equation*}
\delta _{i+1} = 3(\delta _{i}+\delta )^{2}+\epsilon \koz .
\end {equation*}
Now if $\delta _{i}<\delta $ and $12\delta ^{2}+\epsilon <\delta $ then also $\delta _{i+1}<\delta $, so
errors do not accumulate.
But we paid an enormous price: the fault-tolerant version of
the computation reaching stage $(i+1)$ is
3 times larger than the one reaching stage $i$.
To make $t$ stages fault-tolerant this way will cost a factor of
$3^{t}$ in size.
This way, the function $F(N,\delta )$ introduced above may become
exponential in $N$.
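The contrast between the two recursions can be observed numerically. In the following Python sketch (not part of the original text; the values $\delta =0.01$, $\epsilon =10^{-4}$ are illustrative and satisfy $12\delta ^{2}+\epsilon <\delta $), the first recursion drifts upward linearly while the second stays below $\delta $:

```python
eps, delta = 1e-4, 0.01          # illustrative; 12*delta**2 + eps < delta

# Recursion (e.accum): repeat only stage i; errors accumulate.
d_accum = delta
for _ in range(1000):
    d_accum += 3 * delta**2 + eps     # adds 3*delta^2 + eps per stage

# Full-repetition scheme: delta_{i+1} = 3*(delta_i + delta)^2 + eps.
d_mad = 0.0
for _ in range(1000):
    d_mad = 3 * (d_mad + delta)**2 + eps

assert d_mad < delta             # errors do not accumulate
assert d_accum > 10 * delta      # linear drift after 1000 stages
```

The price of the second scheme is, of course, the $3^{t}$ blow-up in size just discussed.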
The theorem below formalizes the above discussion.
\begin {tetel}\label {t.expensive}
Let $R$ be a finite and complete basis for Boolean functions.
If $2\epsilon \leq \delta \leq 0.01$ then
every function can be computed by an $(\epsilon ,\delta )$-resilient circuit over
$R$.
\end {tetel}
\begin {proof}
For simplicity, we will prove the result for a complete basis that contains
the three-argument majority function and no functions with more
than three arguments.
We also assume that faults occur independently.
Let $\mathcal {N}$ be a noise-free circuit of depth $t$ computing function $f$.
We will prove that there is an $(\epsilon ,\delta )$-resilient
circuit $\mathcal {N}'$ of depth $2 t$ computing $f$.
The proof is by induction on $t$.
The sufficient conditions on $\epsilon $ and $\delta $ will emerge from the
proof.
The statement is certainly true for $t=1$, so suppose $t>1$.
Let $g$ be the output gate of the circuit $\mathcal {N}$,
then $f(\boldsymbol {x})=g(f_1(\boldsymbol {x}),f_2(\boldsymbol {x}),f_3(\boldsymbol {x}))$.
The subcircuits $\mathcal {N}_i$ computing the functions $f_i$ have depth $\leq t-1$.
By the inductive assumption, there exist $(\epsilon ,\delta )$-resilient
circuits $\mathcal {N}'_i$ of depth $\leq 2t-2$ that compute $f_i$.
Let $\mathcal {M}$ be a new circuit containing copies of the circuits
$\mathcal {N}'_i$ (with the corresponding input nodes merged), with a new
node in which $f(\boldsymbol {x})$ is computed as $g$ is applied to the outputs of
$\mathcal {N}'_i$.
Then the probability of error of $\mathcal {M}$ is at most $3\delta +\epsilon <4\delta $
if $\epsilon <\delta $ since each circuit $\mathcal {N}'_i$ can err with probability
$\delta $ and the node with gate $g$ can fail with probability $\epsilon $.
Let us now form $\mathcal {N}'$ by taking three copies of $\mathcal {M}$ (with the
inputs merged) and adding a new node computing the majority of the
outputs of these three copies.
The error probability of $\mathcal {N}'$ is at most
$3(4\delta )^2+\epsilon = 48\delta ^{2} + \epsilon $.
Indeed, error will be due to either a fault at the majority gate
or an error in at least two of the three independent copies of $\mathcal {M}$.
So under condition
\begin {equation}\label {e.delta-cond}
48\delta ^{2} + \epsilon \leq \delta \koz ,
\end {equation}
the circuit $\mathcal {N}'$ is $(\epsilon ,\delta )$-resilient.
This condition will be satisfied by $2\epsilon \leq \delta \leq 0.01$.
\end{proof}
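That condition~\eqref {e.delta-cond} follows from $2\epsilon \leq \delta \leq 0.01$ can also be checked mechanically: in the worst case $\epsilon =\delta /2$, and $48\delta ^{2}+\delta /2\leq \delta $ whenever $\delta \leq 1/96$. A small Python sketch (not part of the original text):

```python
# Check condition (e.delta-cond): 48*delta^2 + eps <= delta whenever
# 2*eps <= delta <= 0.01 (the worst case is eps = delta/2).
for i in range(1, 101):
    delta = i / 10000            # delta ranges over (0, 0.01]
    eps = delta / 2
    assert 48 * delta**2 + eps <= delta
```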
The circuit $\mathcal {N}'$ constructed in the proof above is at
least $3^{t}$ times larger than $\mathcal {N}$.
So, the redundancy is enormous.
Fortunately, we will see a much more economical solution.
But there are interesting circuits with small depth, for which the $3^{t}$
factor is not extravagant.
\begin {tetel}\label {t.resilient-majority}
Over the complete basis consisting of all 3-argument Boolean functions,
for all sufficiently small $\epsilon >0$, if $2\epsilon \leq \delta \leq 0.01$ then
for each $n$ there is an $(\epsilon ,\delta )$-resilient
Boolean circuit of input size $n$,
depth $\leq 4\log (n+1)$ and size $(n+1)^{7}$ outputting a near-majority
(as given in Definition~\ref {d.near-maj}).
\end {tetel}
\begin{proof}
Apply Theorem~\ref {t.expensive} to the circuit from part~\eqref {i.add} of Theorem~\ref {t.addition}:
it gives a new, $4\log (n+1)$-deep $(\epsilon ,\delta )$-resilient circuit computing a near-majority.
The size of any such circuit with 3-input gates is at most
$3^{4\log (n+1)}=(n+1)^{4\log 3} < (n+1)^{7}.$
\end {proof}
\begin {gyak}
\ujgyak \label {xrc.iter-maj-converg}
Exercise~\ref {xrc.fooling-maj}
suggests that the iterated majority vote $\mathcal {M}_{3}^{r}$ is not safe
against manipulation. However, it works very well under some circumstances.
Let the input to $\mathcal {M}_{3}^{r}$ be a vector
$\boldsymbol {X}= (X_{1},\dots ,X_{n})$ of independent Boolean random
variables with $\mathbf {P}\mskip 1mu\mathopen [\,X_{i}=1\,\mathclose ]=p<1/6$.
Denote the (random) output bit of the circuit by $Z$.
Assuming that our majority gates can fail with probability $\leq \epsilon \leq p/2$
independently, prove
\begin {equation*}
\mathbf {P}\mskip 1mu\mathopen [\,Z=1\,\mathclose ] \leq \max \{10\epsilon ,\; 0.3(p/0.3)^{2^{r}}\} \koz .
\end {equation*}
\textit{Hint.} Define $g(p) = \epsilon + 3p^2$,
$g_{0}(p) = p$, $g_{i+1}(p) = g(g_{i}(p))$, and
prove $\mathbf {P}\mskip 1mu\mathopen [\,Z=1\,\mathclose ] \leq g_{r}(p)$.
\ujgyak \label {xrc.dobr-ort-lb}
We say that a circuit $\mathcal {N}$ computes the function $f(x_{1},\dots ,x_{n})$
in an $(\epsilon ,\delta )$-\ki {input-robust} way, if the following holds:
For any input vector $\boldsymbol {x}=(x_{1},\dots ,x_{n})$, for any
vector $\boldsymbol {X}= (X_{1},\dots ,X_{n})$ of
independent Boolean random variables
``perturbing it'' in the sense
$\mathbf {P}\mskip 1mu\mathopen [\,X_{i}\neq x_{i}\,\mathclose ]\leq \epsilon $, for the output $Y$ of
circuit $\mathcal {N}$ on input $\boldsymbol {X}$
we have $\mathbf {P}\mskip 1mu\mathopen [\,Y=f(\boldsymbol {x})\,\mathclose ]\geq 1-\delta $.
Show that if the function $x_{1}\oplus \dots \oplus x_{n}$ is computable by an
$(\epsilon ,1/4)$-input-robust circuit then $\epsilon \leq 1/n$.
\end {gyak}
\section {Safeguarding intermediate results}\label {s.cables}
In this section, we will see ways to introduce
fault-tolerance that scale up better. Namely, we will show:
\begin {tetel}\label {t.rel-bool}
There are constants $R_{0},\epsilon _{0}$ such that for
\begin {equation*}
F(N,\delta ) = N \log (N/\delta ) \koz ,
\end {equation*}
for all $\epsilon <\epsilon _{0}$, $\delta \geq 3\epsilon $,
for every deterministic computation of size
$N$ there is an $(\epsilon ,\delta )$-resilient
computation of size $R_{0} F(N,\delta )$ with the same result.
\end {tetel}
Let us introduce a concept that will simplify the error analysis
of our circuits, making it independent of the input vector $x$.
\begin{defi}
In a Boolean circuit $\mathcal {N}$, let us call a majority gate at a node $v$ a
\ki {correcting majority gate} if for every input vector $\boldsymbol {x}$ of $\mathcal {N}$,
all input wires of node $v$ have the same value.
Consider a computation of such a circuit $\mathcal {N}$.
This computation will make some nodes and wires of $\mathcal {N}$
\ki {tainted}.
We define taintedness by the following rules:
\begin{description}
\item[\rm{--}] The input nodes are untainted.
\item[\rm{--}] If a node is tainted then all of its output wires are tainted.
\item[\rm{--}] A correcting majority gate is tainted if either it fails or
a majority of its inputs are tainted.
\item[\rm{--}] Any other gate is tainted if either it fails or one of its inputs
is tainted.
\end{description}
\end{defi}
Clearly, if for all $\epsilon $-admissible random configurations
the output is tainted with probability $\leq \delta $ then the
circuit is $(\epsilon ,\delta )$-resilient.
\subsection{Cables}
So far, we have only made use of redundancy
idea~\eqref {i.repeat} of the introduction to the present chapter: repeating
computation steps. Let us now try to use idea~\eqref {i.code} (keeping information in redundant
form) in Boolean circuits.
To protect information traveling from gate to gate, we replace each wire of
the noiseless circuit by a ``cable'' of $k$ wires (where $k$ will be
chosen appropriately). Each wire within the cable is supposed to carry the same bit of
information, and we hope that a majority will carry this bit even if some of the wires fail.
\begin{defi}
In a Boolean circuit $\mathcal {N}'$, a
certain set of edges is allowed to be called a \ki {cable} if in a
noise-free computation of this circuit, each edge carries the same Boolean
value. The \ki {width} of the cable is its number of elements.
Let us fix an appropriate constant threshold $\vartheta $.
Consider any possible computation of the noisy version of the circuit
$\mathcal {N}'$, and a cable of width $k$ in $\mathcal {N}'$.
This cable will be called \ki {$\vartheta $-safe} if at most $\vartheta k$
of its wires are tainted.
\end{defi}
\begin {figure}
\begin {equation*}
\includegraphics {figs/cables.eps}
\end {equation*}
\caption {An executive organ.}
\end {figure}
Let us take a Boolean circuit $\mathcal {N}$ that we want to make resilient.
As we replace wires of $\mathcal {N}$ with cables of $\mathcal {N}'$ containing $k$ wires
each, we will replace each noiseless 2-argument gate at a node
$v$ by a module called the \ki {executive organ} of $k$ gates, which
for each $i=1,\ldots ,k$, passes the $i$th wire of both incoming cables
into the $i$th node of the organ.
Each of these nodes contains a gate of one and the same type $b_{v}$.
The wires emerging from these nodes form the \ki {output cable} of the
executive organ.
The number of tainted wires in this output cable may become too high:
indeed, if there were $\vartheta k$ tainted wires in the $x$ cable and also in the $y$ cable
then there could be as many as $2\vartheta k$ such wires in the $g(x,y)$ cable
(not even counting the possible new taints added by faults in the
executive organ). The crucial part of the construction is to attach to the executive
organ a so-called \ki {restoring organ}: a
module intended to decrease the taint in a cable.
\begin{figure}
\begin {equation*}
\includegraphics {figs/restoring.eps}
\end {equation*}
\caption {A restoring organ.}
\end {figure}
\subsection{Compressors}
How to build a restoring organ?
Keeping in mind that this organ itself must also work in noise,
one solution is to build (for an appropriate $\delta '$)
a special $(\epsilon ,\delta ')$-resilient
circuit that computes the near-majority of its
$k$ inputs in $k$ independent copies.
Theorem~\ref {t.resilient-majority} provides a circuit of size
$k(k+1)^{7}$ to do this.
It turns out that, at least asymptotically, there is a better solution.
We will look for a \emph {very simple} restoring organ: one whose own
noise we can analyse easily.
What could be simpler than a circuit having only \emph {one level} of gates?
We fix an odd positive integer constant $d$ (for example, $d=3$).
Each gate of our organ will be a $d$-input majority gate.
\begin{defi}
A \ki{multigraph} is a graph in which between any two vertices there may be
several edges, not just 0 or 1. Let us call a bipartite
multigraph with $k$ inputs and $k$ outputs $d$-\ki {half-regular}
if each output node has degree $d$. Such a graph is a \ki {$(d,\alpha ,\gamma ,k)$-compressor} if it
has the following property: for every set $E$ of at most $\alpha k$
inputs, the number of those output points
connected to at least $d/2$ elements of $E$ (with multiplicity)
is at most $\gamma \alpha k$.
\end {defi}
The compressor property is interesting generally when $\gamma <1$.
For example, in a $(5, 0.1, 0.5, k)$-compressor the outputs have
degree $5$, and the majority operation in these nodes
decreases every error set confined to
10\% of all inputs to just 5\% of all outputs.
A compressor with the right parameters could serve as our
restoring organ: it decreases a minority to a smaller minority and may in
this way restore the safety of a cable.
But, are there compressors?
\begin{tetel}\label{t.compr}
For all $\gamma <1$, all integers $d$ with
\begin {equation}\label {e.compr.proportion}
1 < \gamma (d-1)/2 \koz ,
\end {equation}
there is an $\alpha $ such that for all integer $k>0$
there is a $(d, \alpha , \gamma , k)$-compressor.
\end {tetel}
Note that for $d=3$, the theorem does not guarantee a compressor
with $\gamma <1$.
\begin {biz}
We will not give an explicit construction for the multigraph, we will just
show that it exists.
We will select a $d$-half-regular multigraph randomly
(each such multigraph with the same probability), and show that it will be
a $(d, \alpha , \gamma ,k)$-compressor with positive probability.
This proof method is called the \ki {probabilistic method}.
Let
\begin {equation*}
s={\lfloor d/2\rfloor } \koz .
\end {equation*}
Our construction will be somewhat more general, allowing $k'\neq k$ outputs.
Let us generate a random bipartite $d$-half-regular
multigraph with $k$ inputs and $k'$ outputs in the following way.
To each output, we draw edges from $d$ random input nodes chosen
independently and with uniform distribution over all inputs.
Let $A$ be an input set of size $\alpha k$, let $v$ be an
output node and let $E_v$ be the event that $v$ has $s+1$ or more edges
from $A$. Then we have
\[
\mathbf {P}\mskip 1mu(E_v) \leq \binom {d}{s+1}\alpha ^{s+1}
= \binom {d}{s}\alpha ^{s+1} =: p \koz .
\]
On the average (in expected value), the event $E_{v}$ will occur for $p k'$
different output nodes $v$.
For an input set $A$, let $F_A$ be the event that the set of nodes $v$
for which $E_v$ holds has size $>\gamma \alpha k'$.
By inequality~\eqref {e.small-prob} we have
\[
\mathbf {P}\mskip 1mu(F_{A}) \leq {\left ( \frac {e p}{\gamma \alpha }\right )}^{k'\gamma \alpha } \koz .
\]
The number $M$ of sets $A$ of inputs with $\leq \alpha k$ elements
is, using inequality~\eqref {e.entr.small.p},
\[
M \leq \sum _{i \leq \alpha k}\binom {k}{i}
\leq {\left ( \frac {e}{\alpha }\right )}^{\alpha k} \koz .
\]
The probability that our random graph is not a compressor is at
most as large as the probability that there is at least one input set
$A$ for which event $F_A$ holds.
This can be bounded by
\[
M\cdot \mathbf {P}\mskip 1mu(F_A) \leq e^{-\alpha Dk'}
\]
where
\begin{align*}
D &= -(\gamma s-k/k')\ln \alpha -
\gamma {\bigl (\,\ln {\scriptstyle \binom {d}{s}} - \ln {\gamma }+1\,\bigr )} - k/k' \koz.
\end{align*}
As we decrease $\alpha$ the first term of this expression dominates. Its
coefficient is positive according to the
assumption~\eqref{e.compr.proportion}. We will have $D>0$ if
\begin{align*}
\alpha < \mathrm {exp}{\left (
-\frac {\gamma {\bigl (\,\ln {\scriptstyle \binom {d}{s}} - \ln {\gamma }+1\,\bigr )} + k/k'}
{\gamma s- k/k'}\right )} \koz .
\end {align*}
\end {biz}
\begin{pelda}\label {x.compressor-1}
Choosing $\gamma =0.4$, $d=7$, the value $\alpha = 10^{-7}$ will work.
\end{pelda}
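The claim of Example~\ref {x.compressor-1} can be verified by evaluating the exponent $D$ from the proof of Theorem~\ref {t.compr} (a Python sketch, not part of the original text; we take $k'=k$):

```python
import math

def D_exponent(gamma, d, alpha, ratio=1.0):
    """Exponent D from the proof of Theorem t.compr, with ratio = k/k'.
    D > 0 means the random d-half-regular multigraph is a
    (d, alpha, gamma, k)-compressor with positive probability."""
    s = d // 2                                   # s = floor(d/2)
    return (-(gamma * s - ratio) * math.log(alpha)
            - gamma * (math.log(math.comb(d, s)) - math.log(gamma) + 1)
            - ratio)

gamma, d, alpha = 0.4, 7, 1e-7
assert 1 < gamma * (d - 1) / 2                   # condition (e.compr.proportion)
assert D_exponent(gamma, d, alpha) > 0           # Example x.compressor-1 works
assert D_exponent(gamma, d, 1e-2) < 0            # a larger alpha does not
```

The margin by which $D>0$ is small, since the coefficient $\gamma s-1=0.2$ of $-\ln \alpha $ is small; this is why such a tiny $\alpha $ is needed at $d=7$.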
We turn a $(d,\alpha ,\gamma ,k)$-compressor into a restoring organ
$\mathcal {R}$, by placing $d$-input majority gates into its outputs.
If the majority elements sometimes
fail then the output of $\mathcal {R}$ is random.
Assume that at most $\alpha k$ inputs of $\mathcal {R}$ are tainted.
Then more than $(\gamma +\rho )\alpha k$ outputs can be tainted only
if at least $\alpha \rho k$ majority gates fail.
Let
\begin {equation*}
p_{\mathcal {R}}
\end {equation*}
be the probability of this event.
Assuming that the gates of $\mathcal {R}$
fail independently with probability $\leq \epsilon $,
inequality~\eqref {e.small-prob} gives
\begin{equation}\label {e.imperfect}
p_{\mathcal {R}} \leq {\left ( \frac {e\epsilon }{\alpha \rho }\right )}^{\alpha \rho k} \koz .
\end{equation}
\begin{pelda}\label {x.imperfect-use-1}
Choose $\gamma =0.4$, $d=7$, $\alpha =10^{-7}$
as in Example~\ref {x.compressor-1},
further $\rho = 0.14$ (this will satisfy the inequality~\eqref {e.rho-ineq} needed later).
With $\epsilon =10^{-9}$, we get $p_{\mathcal {R}}\leq e^{-10^{-8}k}$.
The attractively small degree $d=7$ led to an extremely
unattractive probability bound on the failure of the whole compressor.
This bound does decrease exponentially with cable width $k$, but only
an extremely large $k$ would make it small.
\end{pelda}
\begin{pelda}\label {x.imperfect-use-2}
Choosing again $\gamma =0.4$, but $d=41$ (voting in each gate of the
compressor over 41 wires instead of 7),
leads to somewhat more realistic results.
This choice allows $\alpha =0.15$.
With $\rho =0.14$, $\epsilon =10^{-9}$ again, we get $p_{\mathcal {R}} \leq e^{-0.32k}$.
These numbers look less frightening, but we will still need many
scores of wires in the cable to drive down the probability of compression
failure.
And although in practice our computing components fail with frequency much
less than $10^{-9}$, we may want to look at the largest $\epsilon $ that still
can be tolerated.
\end{pelda}
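The two bounds just computed can be reproduced mechanically from inequality~\eqref {e.imperfect} (a Python sketch, not part of the original text; the cable width $k=100$ is illustrative):

```python
import math

def p_R_bound(eps, alpha, rho, k):
    """Right-hand side of (e.imperfect): (e*eps/(alpha*rho))**(alpha*rho*k)."""
    return (math.e * eps / (alpha * rho)) ** (alpha * rho * k)

eps, rho, k = 1e-9, 0.14, 100
# Example x.imperfect-use-1 (d = 7, alpha = 1e-7): bound <= e^{-1e-8 k}
assert p_R_bound(eps, 1e-7, rho, k) <= math.exp(-1e-8 * k)
# Example x.imperfect-use-2 (d = 41, alpha = 0.15): bound <= e^{-0.32 k}
assert p_R_bound(eps, 0.15, rho, k) <= math.exp(-0.32 * k)
```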
\begin{figure}
\begin{equation*}
\includegraphics {figs/supergate.eps}
\end {equation*}
\caption {\label {f.supergate}
An executive organ followed by a restoring organ.}
\end{figure}
\subsection{Propagating safety}
Compressors allow us to construct a reliable
Boolean circuit all of whose cables are safe.
\begin{defi}\label{d.Cab}
Given a Boolean circuit $\mathcal {N}$ with a single bit of output (for simplicity),
a cable width $k$ and a Boolean circuit $\mathcal {R}$ with $k$ inputs and $k$
outputs, let
\begin {equation*}
\mathcal {N}' = \mathrm {Cab}(\mathcal {N},\mathcal {R})
\end {equation*}
be the Boolean circuit that we obtain as follows.
The input nodes of $\mathcal {N}'$ are the same as those of $\mathcal {N}$.
We replace each wire of $\mathcal {N}$ with a cable of width $k$, and each gate
of $\mathcal {N}$ with an executive organ followed by a restoring organ that is a
copy of the circuit $\mathcal {R}$.
The new circuit has $k$ outputs: the outputs of the restoring organ of
$\mathcal {N}'$ belonging to the last gate of $\mathcal {N}$.
\end{defi}
In noise-free computations, on every input, the output of $\mathcal {N}'$ is the
same as the output of $\mathcal {N}$, but in $k$ identical copies.
\begin{lemma}\label{l.propagate}
There are constants $d,\epsilon _{0},\vartheta ,\rho >0$
and for every cable width $k$ a circuit $\mathcal {R}$ of size $2k$ and gate size
$\leq d$ with the following property.
For every Boolean circuit $\mathcal {N}$ of gate size
$\leq 2$ and number of nodes $N$, for every $\epsilon <\epsilon _{0}$, for every
$\epsilon $-admissible configuration of $\mathcal {N}' = \mathrm {Cab}(\mathcal {N},\mathcal {R})$, the probability that
not every cable of $\mathcal {N}'$ is $\vartheta $-safe is
$< 2N(\frac {e\epsilon }{\vartheta \rho })^{\vartheta \rho k}$.
\end{lemma}
\begin{biz}
We know that there are $d,\alpha $ and $\gamma <1/2$ with the property that for
all $k$ a $(d, \alpha , \gamma , k)$-compressor exists.
Let $\rho $ be chosen to satisfy
\begin{equation}\label {e.rho-ineq}
\gamma (2+\rho ) + \rho \leq 1 \koz,
\end{equation}
and define
\begin{equation}\label {e.thg-def}
\vartheta =\alpha /(2+\rho) \koz.
\end{equation}
Let $\mathcal {R}$ be a restoring organ built from a
$(d, \alpha , \gamma , k)$-compressor.
Consider a gate $v$ of circuit $\mathcal {N}$, and the corresponding executive organ
and restoring organ in $\mathcal {N}'$.
Let us estimate the probability of the event $E_{v}$
that the input cables of this combined organ are
$\vartheta $-safe but its output cable is not.
Assume that the two incoming cables are safe: then at most $2\vartheta k$ of
the outputs of the executive organ
are tainted due to the incoming cables: new taint can still occur
due to failures.
Let $E_{v1}$ be the event that the executive organ taints at least
$\rho \vartheta k$ more of these outputs.
Then $\mathbf {P}\mskip 1mu(E_{v1}) \leq (\frac {e \epsilon }{\rho \vartheta })^{\rho \vartheta k}$,
using the estimate~\eqref {e.imperfect}.
The outputs of the executive organ are the inputs of the restoring organ.
If no more than $(2+\rho )\vartheta k=\alpha k$ of these are tainted
then an organ operating perfectly would decrease the number of
tainted wires to $\gamma (2+\rho )\vartheta k$.
Let $E_{v2}$ be the event that the restoring organ taints an additional
$\rho \vartheta k$ of these wires.
Then again, $\mathbf {P}\mskip 1mu(E_{v2}) \leq (\frac {e \epsilon }{\rho \vartheta })^{\rho \vartheta k}$.
If neither $E_{v1}$ nor $E_{v2}$ occur then at most
$\gamma (2+\rho )\vartheta k+ \rho \vartheta k\leq \vartheta k$ (see~\eqref {e.rho-ineq})
tainted wires emerge from the
restoring organ, so the outgoing cable is safe.
Therefore $E_{v}\subset E_{v1}\cup E_{v2}$ and hence
$\mathbf {P}\mskip 1mu(E_{v})\leq 2(\frac {e \epsilon }{\rho \vartheta })^{\rho \vartheta k}$.
Let $V=\{1,\dots ,N\}$ be the nodes of the circuit $\mathcal {N}$.
Since the incoming cables of the whole circuit $\mathcal {N}'$ are safe, the event
that there is some cable that is not safe is contained in
$E_{1}\cup E_{2}\cup \dots \cup E_{N}$; hence the probability is bounded by
$2 N (\frac {e \epsilon }{\rho \vartheta })^{\rho \vartheta k}$.
\end{biz}
\subsection{Endgame}
\begin{figure}
\begin{equation*}
\includegraphics{figs/neumann.eps}
\end{equation*}
\caption {Reliable circuit from a fault-free circuit.}
\end{figure}
\begin{biz}[Proof of Theorem~\protect \ref {t.rel-bool}]
We will prove the theorem only for the case when our computation is a
Boolean circuit with a single bit of output.
The generalization with more bits of output is straightforward.
The proof of Lemma~\ref {l.propagate} gives us a circuit $\mathcal {N}'$ whose
output cable is safe except for an event of probability
$<2 N (\frac {e \epsilon }{\rho \vartheta })^{\rho \vartheta k}$.
Let us choose $k$ in such a way that this becomes $\leq \delta /3$:
\begin{equation}\label {e.cable-size}
k\geq \frac {\log (6N/\delta )}{\rho \vartheta \log \frac {\rho \vartheta }{e \epsilon _{0}}} \koz .
\end{equation}
It remains to add a little circuit to this output cable to
extract from it the majority reliably.
This can be done using Theorem~\ref{t.resilient-majority}, adding a small
extra circuit of size $(k+1)^{7}$ that can be called the \ki {coda}
to $\mathcal {N}'$.
Let us call the resulting circuit $\mathcal {N}''$.
The probability that the output cable is unsafe is $<\delta /3$.
The probability that the output cable is safe but the ``coda'' circuit
fails is bounded by $2\epsilon $.
So, the probability that $\mathcal {N}''$ fails is
$\leq 2\epsilon + \delta /3 \leq \delta $, by the assumption $\delta \geq 3\epsilon $.
Let us estimate the size of $\mathcal {N}''$.
By~\eqref {e.cable-size}, we can choose cable width $k= O(\log (N/\delta ))$.
We have $|\mathcal {N}'|\leq 2kN$, hence
\begin{equation*}
|\mathcal {N}''|\leq 2kN + (k+1)^{7} = O(N \log (N/\delta )).
\end{equation*}
\end{biz}
\begin{pelda}\label{x.endgame}
Take the constants of Example~\ref{x.imperfect-use-2}, with
$\vartheta$ defined in equation~\eqref{e.thg-def}:
then $\epsilon_{0}=10^{-9}$, $d=41$, $\gamma =0.4$, $\rho =0.14$, $\alpha =0.15$,
$\vartheta =0.07$, giving
\begin{equation*}
\frac{1}{\rho \vartheta \ln \frac {\rho \vartheta }{e \epsilon _{0}}} \approx 6.75 \koz,
\end{equation*}
so making $k$ as small as possible (ignoring that it must be integer), we
get $k\approx 6.75\ln (N/\delta )$.
With $\delta =10^{-8}$, $N=10^{12}$ this allows $k=323$.
In addition to this truly unpleasant cable size,
the size of the ``coda'' circuit is
$(k+1)^{7}\approx 4\cdot 10^{17}$, which dominates the size of the rest of
$\mathcal {N}''$ (though as $N\to \infty $ it becomes asymptotically negligible).
\end{pelda}
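The arithmetic of this example is easy to check with a short script (a sketch; the constants are the ones quoted above, and the rounding conventions are ours):

```python
import math

# Constants quoted in the example.
eps0, rho, theta = 1e-9, 0.14, 0.07

# The factor multiplying ln(N/delta) in the cable width (e.cable-size).
factor = 1.0 / (rho * theta * math.log(rho * theta / (math.e * eps0)))
print(round(factor, 2))      # about 6.76; the text rounds to 6.75

# Cable width for delta = 1e-8, N = 1e12.
delta, N = 1e-8, 1e12
k = factor * math.log(6 * N / delta)
print(round(k))              # 323, as in the example

# Size of the "coda" circuit.
coda = (round(k) + 1) ** 7
print(f"{coda:.1e}")         # about 4e17
```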
As Example~\ref {x.endgame} shows, the actual price in redundancy computable
from the proof is unacceptable in practice.
The redundancy $O(\lg (N/\delta ))$ sounds good, since it is only logarithmic
in the size of the computation, and by choosing a rather large majority
gate (41 inputs), the factor $6.75$ in the $O(\cdot)$ here also does not
look bad; still, we do not expect the final price of reliability to be this high.
How much can this redundancy be improved by optimization or other methods?
Problem~\ref{xrc.dobr-restoring} shows that in a slightly more restricted
error model (all faults are independent and have the same probability),
with more randomization, better constants can be achieved.
Exercises~\ref {xrc.iter-endgame}, \ref {xrc.compr-compr}
and~\ref{xrc.dobr-endgame} are concerned with an improved construction for
the ``coda'' circuit.
Exercise~\ref{xrc.no-coda} shows that the coda circuit can be omitted
completely. But none of these improvements bring redundancy to acceptable level.
Even aside from the discomfort caused by their random choice (this can be
helped), concentrators themselves are rather large and unwieldy.
The problem is probably with using circuits as a model for computation.
There is no natural way to break up a general circuit into subunits of
non-constant size in order to deal with the reliability problem in
modular style.
\subsection{The construction of compressors}
This subsection is sketchier than the preceding ones, and
assumes some knowledge of linear algebra.
We have shown that compressors exist.
How expensive is it to find a $(d,\alpha ,\gamma ,k)$-compressor, say, with
$d=41$, $\alpha =0.15$, $\gamma =0.4$, as in Example~\ref {x.imperfect-use-2}?
In a deterministic algorithm, we could search through all the approximately
$d^{k}$ $d$-half-regular bipartite graphs.
For each of these, we could check all possible input sets of size
$\leq \alpha k$: as we know, their number is $\leq (e/\alpha )^{\alpha k}<2^{k}$.
The cost of checking each subset is $O(k)$,
so the total number of operations is $O(k(2d)^{k})$.
Though this number is exponential in $k$,
recall that in our error-correcting
construction, $k=O(\log (N/\delta ))$ for the size $N$ of the noiseless
circuit: therefore the total number of operations needed
to find a compressor is polynomial in $N$.
The proof of Theorem~\ref {t.compr} shows that a randomly chosen
$d$-half-regular bipartite graph is
a compressor with large probability.
Therefore there is a faster, randomized algorithm for finding a compressor.
Pick a random $d$-half-regular
bipartite graph, check if it is a compressor: if it is not, repeat.
We will be done in a constant expected number of repetitions.
This is a faster algorithm, but is still exponential in $k$, since
each checking takes $\Omega (k(e/\alpha )^{\alpha k})$ operations.
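The randomized search just described can be sketched as follows; the function names are ours, and the checker implements the compressor property in the form used in the proofs above (no input set of size $\leq \alpha k$ may give more than a $\gamma$-fraction of outputs a majority of tainted edges). The check is feasible only for small $k$:

```python
import random
from itertools import combinations

def random_half_regular(k, d, rng):
    """A d-half-regular bipartite multigraph: each of the k outputs
    picks d inputs at random (list of edge lists)."""
    return [[rng.randrange(k) for _ in range(d)] for _ in range(k)]

def is_compressor(graph, d, alpha, gamma, k):
    """For every input set A with |A| <= alpha*k, at most gamma*|A|
    outputs may receive more than d/2 of their edges from A."""
    for size in range(1, int(alpha * k) + 1):
        for A in combinations(range(k), size):
            A = set(A)
            bad = sum(1 for edges in graph
                      if sum(j in A for j in edges) > d / 2)
            if bad > gamma * size:
                return False
    return True

def find_compressor(k, d, alpha, gamma, seed=0, tries=1000):
    """Repeat the random choice until the check succeeds (or give up)."""
    rng = random.Random(seed)
    for _ in range(tries):
        g = random_half_regular(k, d, rng)
        if is_compressor(g, d, alpha, gamma, k):
            return g
    return None
```

For the realistic parameters ($d=41$, $k$ in the hundreds) the check itself takes time exponential in $k$, as noted above; the sketch only illustrates the structure of the algorithm.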
Is it possible to construct a compressor explicitly, avoiding
any search that takes exponential time in $k$?
The answer is yes.
Here, however, we will only show that the
compressor property is implied by a certain property involving linear
algebra, one that can be checked in polynomial time.
Certain explicitly constructed
graphs are known that possess this property.
These are generally sought after not so much for their compressor property
as for their \emph{expander} property\index{expander property} (see the section on reliable storage).
For vectors $\boldsymbol {v},\boldsymbol {w}$, let $(\boldsymbol {v},\boldsymbol {w})$ denote their inner product.
A $d$-half-regular bipartite multigraph with $2k$ nodes
can be defined by an \ki {incidence matrix} $\boldsymbol {M}=(m_{ij})$, where
$m_{ij}$ is the number of edges connecting input $j$ to output $i$.
Let $\boldsymbol {e}$ be the vector $(1,1,\ldots ,1)^{T}$.
Then $\boldsymbol {M}\boldsymbol {e}=d\boldsymbol {e}$, so $\boldsymbol {e}$ is an \ki {eigenvector} of $\boldsymbol {M}$ with
\ki {eigenvalue} $d$.
Moreover, $d$ is the largest eigenvalue of $\boldsymbol {M}$.
Indeed, writing $|\boldsymbol {x}|_{1}=\sum _{i}|x_{i}|$ for any
row vector $\boldsymbol {x}=(x_{1},\dots ,x_{k})$, we have
$|\boldsymbol {x}\boldsymbol {M}|_{1} \leq d |\boldsymbol {x}|_{1}$.
\begin{tetel}
Let $G$ be a multigraph defined by the matrix $\boldsymbol {M}$.
For all $\gamma >0$, and
\begin{equation}\label {e.first-second-eigen}
\mu < d\sqrt {\gamma }/2,
\end{equation}
there is an $\alpha >0$ such that if the second largest eigenvalue of
the matrix $\boldsymbol {M}^{T}\boldsymbol {M}$ is $\mu ^{2}$ then $G$ is a
$(d,\alpha ,\gamma ,k)$-compressor.
\end{tetel}
\begin{biz}
The matrix $\boldsymbol {M}^{T}\boldsymbol {M}$ has largest eigenvalue $d^2$.
Since it is symmetric, it has a basis of orthogonal eigenvectors
$\boldsymbol {e}_1,\ldots ,\boldsymbol {e}_{k}$ of unit length with corresponding nonnegative
eigenvalues
\[
\lambda _1^2\geq \cdots \geq \lambda _{k}^2
\]
where $\lambda _1=d$ and $\boldsymbol {e}_1=\boldsymbol {e}/\sqrt {k}$.
Recall that in the orthonormal basis $\{\boldsymbol {e}_{i}\}$, any vector $\boldsymbol {f}$ can be
written as $\boldsymbol {f}= \sum _{i}(\boldsymbol {f},\boldsymbol {e}_{i})\boldsymbol {e}_{i}$.
For an arbitrary vector $\boldsymbol {f}$, we can estimate $|\boldsymbol {M}\boldsymbol {f}|^2$ as follows.
\begin {align*}
|\boldsymbol {M}\boldsymbol {f}|^2 &= (\boldsymbol {M}\boldsymbol {f},\boldsymbol {M}\boldsymbol {f})=(\boldsymbol {f},\boldsymbol {M}^{T}\boldsymbol {M}\boldsymbol {f})
=\sum _i\lambda _i^2(\boldsymbol {f},\boldsymbol {e}_i)^2
\\ &\leq d^2(\boldsymbol {f},\boldsymbol {e}_1)^2+\mu ^2\sum _{i>1}(\boldsymbol {f},\boldsymbol {e}_i)^2
\leq d^2(\boldsymbol {f},\boldsymbol {e}_1)^2+\mu ^2(\boldsymbol {f},\boldsymbol {f})
\\ &= d^2(\boldsymbol {f},\boldsymbol {e})^2/k+ \mu ^2(\boldsymbol {f},\boldsymbol {f})\koz.
\end{align*}
Let now $A\subset \{1,\dots ,k\}$
be a set of size $\alpha k$ and $\boldsymbol {f}=(f_{1},\dots ,f_{k})^{T}$ where
$f_{j}=1$ for $j\in A$ and 0 otherwise.
Then, coordinate $i$ of $\boldsymbol {M}\boldsymbol {f}$ counts the number $d_i$ of edges coming
from the set $A$ to the node $i$.
Also, $(\boldsymbol {f},\boldsymbol {e})=(\boldsymbol {f},\boldsymbol {f})=|A|$, the number of elements of $A$.
We get
\begin {align*}
\sum _i d_i^2 &= |\boldsymbol {M}\boldsymbol {f}|^{2} \leq d^2(\boldsymbol {f},\boldsymbol {e})^2/k+ \mu ^2(\boldsymbol {f},\boldsymbol {f})
= d^2 \alpha ^2k+ \mu ^2\alpha k,
\\ k^{-1}\sum _i (d_i/d)^2 &\leq \alpha ^2 + (\mu /d)^2\alpha \koz.
\end{align*}
Suppose now that there are $c\alpha k$ nodes $i$ with $d_i>d/2$; then this implies
\[
c\alpha \leq 4(\mu /d)^2\alpha + 4\alpha ^2 \koz.
\]
Since~\eqref {e.first-second-eigen} implies
$4(\mu /d)^{2}<\gamma $, it follows that $\boldsymbol {M}$ is a
$(d,\alpha ,\gamma ,k)$-compressor for small enough $\alpha $.
\end{biz}
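The two largest eigenvalues of $\boldsymbol {M}^{T}\boldsymbol {M}$ can indeed be computed in polynomial time; here is a dependency-free sketch using power iteration with deflation (the function names are ours; since $\boldsymbol {M}^{T}\boldsymbol {M}$ is symmetric positive semidefinite, plain power iteration applies):

```python
import random

def gram(M):
    """A = M^T M for a matrix given as a list of rows."""
    k = len(M[0])
    return [[sum(row[a] * row[b] for row in M) for b in range(k)]
            for a in range(k)]

def power_iteration(A, steps=200, seed=1):
    """Largest eigenvalue (with eigenvector) of a symmetric positive
    semidefinite matrix A, by plain power iteration."""
    rng = random.Random(seed)
    v = [rng.random() for _ in A]
    for _ in range(steps):
        w = [sum(a * x for a, x in zip(row, v)) for row in A]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(A[i][j] * v[j] for j in range(len(v)))
              for i in range(len(v)))
    return lam, v

def top_two_eigenvalues(M):
    """lambda_1^2 and mu^2: the two largest eigenvalues of M^T M."""
    A = gram(M)
    lam1, v1 = power_iteration(A)
    # Deflation: subtract lam1 * v1 v1^T and iterate again.
    B = [[A[i][j] - lam1 * v1[i] * v1[j] for j in range(len(A))]
         for i in range(len(A))]
    lam2, _ = power_iteration(B)
    return lam1, lam2
```

For instance, for the complete bipartite multigraph on $4+4$ nodes ($d=4$, $\boldsymbol{M}$ the all-ones matrix) this returns $\lambda _{1}^{2}=d^{2}=16$ and $\mu ^{2}=0$, so condition~\eqref {e.first-second-eigen} holds for every $\gamma >0$.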
It is actually sufficient to look for graphs with large $k$
and $\mu /d< c < 1$ where $d,c$ are constants.
To see this, let us define the product of two bipartite
multigraphs with $2k$ vertices by the
multigraph belonging to the product of the corresponding matrices.
Suppose that $\boldsymbol {M}$ is symmetric:
then its second largest eigenvalue is $\mu $ and
the ratio of the two largest eigenvalues of $\boldsymbol {M}^r$ is $(\mu /d)^r$.
Therefore using $\boldsymbol {M}^{r}$ for a
sufficiently large $r$ as our matrix, the
condition~\eqref {e.first-second-eigen} can be satisfied.
Unfortunately, taking the power
will increase the degree $d$, taking us probably even
farther away from practical realizability.
We have found that a compressor with the desired parameters
can be constructed as soon as we find
multigraphs with arbitrarily large
sizes $2k$, with symmetric matrices $\boldsymbol {M}_{k}$ and with a
ratio of the two largest eigenvalues of $\boldsymbol {M}_{k}$ bounded
by a constant $c<1$ independent of $k$.
There are various constructions of such multigraphs
(see the references in the historical overview).
The estimation of the desired eigenvalue quotient is never very simple.
\begin {gyak}
\ujgyak\label{xrc.iter-endgame}
The proof of Theorem~\ref {t.rel-bool} uses a ``coda'' circuit of size $(k+1)^{7}$.
Once we have proved this theorem we could, of course, apply it to the
computation of the final majority itself: this would reduce the size of the
coda circuit to $O(k\log k)$. Try out this approach on the numerical examples considered above to see
whether it results in a significant improvement.
\ujgyak\label{xrc.compr-compr}
The proof of Theorem~\ref{t.compr} provided also
bipartite graphs with the compressor property, with
$k$ inputs and $k'< 0.8k$ outputs.
An idea to build a smaller ``coda'' circuit in the proof of
Theorem~\ref{t.rel-bool} is to concatenate several such compressors,
decreasing the number of cables in a geometric series.
Explore this idea, keeping in mind, however, that as $k$ decreases,
the ``exponential'' error estimate in inequality~\eqref {e.imperfect}
becomes weaker.
\ujgyak\label {xrc.input-indep-errors}
In a noisy Boolean circuit, let $F_{v}=1$ if the gate at vertex
$v$ fails and $0$ otherwise.
Further, let $T_{v}=1$ if $v$ is tainted, and 0 otherwise.
Suppose that the distribution of the random variables
$F_{v}$ does not depend on the Boolean input vector.
Show that then the joint distribution of the random variables $T_{v}$
is also independent of the input vector.
\ujgyak\label{xrc.randomized-maj}
This exercise extends the result of Exercise~\ref{xrc.iter-maj-converg}
to random input vectors: it shows that if a random input vector has
only a small number of errors, then the
iterated majority vote $\mathcal {M}_{3}^{r}$ of
Exercise~\ref {xrc.fooling-maj}
may still work for it, if we rearrange the
input wires randomly. Let $k=3^{r}$, and
let $\boldsymbol {j}= (j_{1},\dots ,j_{k})$ be a vector of integers
$j_{i}\in \{1,\dots ,k\}$.
We define a Boolean circuit $C(\boldsymbol {j})$ as follows.
This circuit takes input vector $\boldsymbol {x}=(x_{1},\dots ,x_{k})$, computes the vector
$\boldsymbol {y}=(y_{1},\dots ,y_{k})$ where $y_{i}=x_{j_{i}}$
(in other words, just leads a wire from input node $j_{i}$ to
an ``intermediate node'' $i$)
and then inputs $\boldsymbol {y}$ into the circuit $\mathcal {M}_{3}^{r}$.
Denote the (possibly random) output bit of $C(\boldsymbol {j})$ by $Z$.
For any fixed input vector $\boldsymbol {x}$,
assuming that our majority gates can fail with probability $\leq \epsilon \leq \alpha /2$
independently, denote $q(\boldsymbol {j},\boldsymbol {x}) := \mathbf {P}\mskip 1mu\mathopen [\,Z=1\,\mathclose ]$.
Assume that the input is a vector
$\boldsymbol {X}=(X_{1},\dots ,X_{k})$ of
(not necessarily independent) Boolean random variables, with
$p(\boldsymbol {x}) := \mathbf {P}\mskip 1mu\mathopen [\,\boldsymbol {X}=\boldsymbol {x}\,\mathclose ]$.
Denoting $|\boldsymbol {X}|=\sum _{i}X_{i}$, assume
$\mathbf {P}\mskip 1mu\mathopen [\,|\boldsymbol {X}|> \alpha k\,\mathclose ]\leq \rho <1$.
Prove that there is a choice of the vector $\boldsymbol {j}$ for which
\begin{equation*}
\sum _{\boldsymbol {x}}p(\boldsymbol {x})
q(\boldsymbol {j},\boldsymbol {x}) \leq \rho +\max \{10\epsilon ,\; 0.3(\alpha /0.3)^{2^{k}}\} \koz .
\end{equation*}
The choice may depend on the distribution of the random vector $\boldsymbol {X}$.
\textit{Hint.} Choose the vector $\boldsymbol {j}$ (and hence the circuit $C(\boldsymbol {j})$)
randomly, as a random vector
$\boldsymbol {J}=(J_{1},\dots ,J_{k})$ where the variables $J_{i}$ are independent and
uniformly distributed over $\{1,\dots ,k\}$, and denote
$s(\boldsymbol {j}) := \mathbf {P}\mskip 1mu\mathopen [\,\boldsymbol {J}=\boldsymbol {j}\,\mathclose ]$.
Then prove
\begin{equation*}
\sum _{j} s(\boldsymbol {j})\sum _{\boldsymbol {x}}p(\boldsymbol {x})
q(\boldsymbol {j},\boldsymbol {x}) \leq \rho +\max \{10\epsilon ,\; 0.3(\alpha /0.3)^{2^{k}}\}.
\end{equation*}
For this, interchange the averaging over $\boldsymbol {x}$ and $\boldsymbol {j}$.
Then note that $\sum _{j} s(\boldsymbol {j})q(\boldsymbol {j},\boldsymbol {x})$ is the probability of
$Z=1$ when the ``wires'' $J_{i}$ are chosen randomly ``on the fly'' during
the computation of the circuit.
\ujgyak\label {xrc.dobr-endgame}
Using the notation of Exercise~\ref{xrc.input-indep-errors},
suppose, as there, that the random variables
$F_{v}$ are independent of each other, and their
distribution does not depend on the Boolean input vector.
Take the Boolean circuit $\mathrm {Cab}(\mathcal {N},\mathcal {R})$ introduced in
Definition~\ref {d.Cab}, and define the random Boolean vector
$\boldsymbol {T}=(T_{1},\dots ,T_{k})$ where $T_{i}=1$ if and only if the $i$th output
node is tainted. Apply Exercise~\ref {xrc.randomized-maj} to show that there is a circuit
$C(\boldsymbol {j})$ that can be attached to the output nodes
to play the role of the ``coda'' circuit in the proof of Theorem~\ref {t.rel-bool}.
The size of $C(\boldsymbol {j})$ is only linear in $k$, not $(k+1)^{7}$ as for
the coda circuit in the proof there.
But, we assumed a little more about the fault distribution, and also
the choice of the ``wiring'' $\boldsymbol {j}$ depends on the circuit $\mathrm {Cab}(\mathcal {N},\mathcal {R})$.
\end {gyak}
\section{The reliable storage problem}
There is hardly any simpler computation than not doing anything,
just keeping the input unchanged.
This task does not fit well, however, into the simple model of Boolean
circuits as introduced above.
\subsection{Clocked circuits}
\begin{figure}
\begin{equation*}
\includegraphics {figs/shift-reg.eps}
\end {equation*}
\caption {\label{f.shift-reg}A shift register.}
\end {figure}
An obvious element of ordinary computations is missing from
the above described Boolean circuit model: \emph {repetition}.
If we want to repeat some computation steps, then we need to
introduce \emph {timing} into the work of computing elements
and to \emph {store} the partial results between consecutive steps.
Let us look at the drawings of the circuit designer again.
We will see components like in Figure~\ref {f.shift-reg},
with one ingoing edge and no operation associated
with them; these will be called \ki {shift registers}.
The shift registers are controlled by one central \ki {clock}
(invisible on the drawing). At each clock pulse, the assignment value on the incoming edge
jumps onto the outgoing edges and ``stays in'' the register.
Figure~\ref {f.adder} shows how shift registers may be used inside a circuit.
\begin{defi}
A \ki{clocked circuit} over a complete
basis $Q$ is given by a tuple just like a Boolean
circuit in~\eqref{e.logic-circuit}.
Also, the circuit defines a graph $G=(V,E)$ similarly.
Recall that we identified nodes with the natural numbers $1,\dots ,N$.
To each non-input node $v$ either a gate $b_{v}$ is assigned as before, or
a \ki{shift register:} in this case $k_{v}=1$ (there is only one argument).
We do not require the graph to be acyclic, but we do
require every directed cycle (if there is any) to pass through at least
one shift register.
\begin{figure}[!ht]
\begin{equation*}
\includegraphics{figs/adder.eps}
\end{equation*}
\caption{\label{f.adder} Part of a circuit which
computes the sum of two binary numbers $x,y$.
We feed the digits of $x$ and $y$ beginning with the lowest-order
ones, at the input nodes.
The digits of the sum come out on the output edge.
A shift register holds the carry.}
\end{figure}
The circuit works in a sequence $t=0,1,2,\dots $ of \ki {clock cycles}.
Let us denote the input vector
at clock cycle $t$ by $\boldsymbol {x}^{t}=(x_{1}^{t},\ldots ,x_{n}^{t})$,
the shift register states
by $\boldsymbol {s}^{t}=(s_{1}^{t},\ldots ,s_{k}^{t})$, and the output vector by
$\boldsymbol {y}^{t}=(y_{1}^{t},\ldots ,y_{m}^{t})$.
The part of the circuit going from the inputs and the shift registers
to the outputs and the shift registers defines two Boolean vector functions
$\lambda : \{0,1\}^{k}\times \{0,1\}^{n}\to \{0,1\}^{m}$ and
$\tau : \{0,1\}^{k}\times \{0,1\}^{n}\to \{0,1\}^{k}$.
The operation of the clocked circuit is described by the following
equations (see Figure~\ref {f.computer}, which does not show any
inputs and outputs).
\begin{equation}\label {e.FSM}
\boldsymbol {y}^{t} = \lambda (\boldsymbol {s}^{t}, \boldsymbol {x}^{t}),
\quad \boldsymbol {s}^{t+1} = \tau (\boldsymbol {s}^{t}, \boldsymbol {x}^{t}) \koz .
\end{equation}
\end{defi}
\begin{figure}
\begin{equation*}
\includegraphics{figs/computer.eps}
\end{equation*}
\caption{\label{f.computer} A ``computer'' consists of some memory (shift registers) and a Boolean
circuit operating on it. We can define the \ki {size of computation} as the
size of the computer times the number of steps.}
\end{figure}
Frequently, we have no inputs or outputs during the work of the circuit,
so the equations~\eqref {e.FSM} can be simplified to
\begin{equation}\label{e.FSM-simple}
\boldsymbol {s}^{t+1} = \tau (\boldsymbol {s}^{t}) \koz .
\end{equation}
How to use a clocked circuit described by this equation for computation?
We write some initial values into the shift registers,
and propagate the assignment using the gates, for
the given clock cycle.
Now we send a clock pulse to the registers, causing them to
write new values to their output edges (which are identical to the
input edges of the circuit).
After this, the new assignment is computed, and so on.
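The cycle of operation just described can be written out as a generic simulator for equations~\eqref {e.FSM}; a sketch in which the names are ours and $\lambda ,\tau $ are supplied as ordinary functions:

```python
def run_clocked_circuit(lam, tau, s0, inputs):
    """Simulate y^t = lam(s^t, x^t), s^{t+1} = tau(s^t, x^t)
    for t = 0, 1, 2, ...; returns the list of outputs y^0, y^1, ..."""
    s, outputs = s0, []
    for x in inputs:
        outputs.append(lam(s, x))   # combinational part computes the output
        s = tau(s, x)               # clock pulse: registers load the new state
    return outputs

# A one-register example: the state accumulates the XOR of the inputs,
# and the output shows the state before the current input is absorbed.
outs = run_clocked_circuit(lambda s, x: s, lambda s, x: s ^ x, 0, [1, 0, 1, 1])
```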
How to compute a \emph {function} $f(x)$ with the help of such a circuit?
Here is a possible convention.
We enter the input $x$ (only in the first step), and then
run the circuit until it signals at an extra output edge
that the desired result $f(x)$ can be read off the other output nodes.
\begin{pelda}
This example uses a convention different from the one described above:
new input bits are supplied in every step, and the output is also delivered
continuously. For the binary adder of Figure~\ref{f.adder},
let $u^{t}$ and $v^{t}$ be the two input bits in cycle
$t$, let $c^{t}$ be the content of the carry, and $w^{t}$ be the
output in the same cycle. Then the equations~\eqref {e.FSM} now have the form
\begin{equation*}
w^{t} = u^{t}\oplus v^{t}\oplus c^{t},
\quad c^{t+1} = \mathop {\mathrm {Maj}}(u^{t}, v^{t}, c^{t}) \koz ,
\end{equation*}
where $\mathop {\mathrm {Maj}}$ is the majority operation.
\end{pelda}
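The adder equations can be run directly; a sketch, feeding the digits lowest-order first as in the figure caption:

```python
def majority(a, b, c):
    return (a & b) | (a & c) | (b & c)

def serial_add(x, y, cycles):
    """Bit-serial addition with a one-bit carry register:
    w^t = u^t xor v^t xor c^t,  c^{t+1} = Maj(u^t, v^t, c^t)."""
    c, out = 0, []
    for t in range(cycles):
        u, v = (x >> t) & 1, (y >> t) & 1   # digits, lowest-order first
        out.append(u ^ v ^ c)               # output digit of the sum
        c = majority(u, v, c)               # the shift register holds the carry
    return sum(w << t for t, w in enumerate(out))

print(serial_add(13, 11, 5))   # 24
```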
\subsection{Storage}
A clocked circuit is an interesting parallel computer, but
let us now pose a task for it that is trivial in the absence
of failures: information storage.
We would like to store a certain amount of information in such a
way that it can be recovered after some time, despite failures in the
circuit. For this, the transition function $\tau $ introduced in~\eqref {e.FSM-simple}
cannot be just the identity: it
will have to perform some \emph {error-correcting} operations.
The restoring organs discussed earlier are natural candidates.
Indeed, suppose that we use $k$ memory cells to store a bit of
information. We can call the content of this $k$-tuple \ki {safe} when the number of
memory cells that dissent from the correct value is under some threshold
$\vartheta k$.
Let the rest of the circuit be a restoring organ built
on a $(d, \alpha , \gamma , k)$-compressor with $\alpha =0.9\vartheta $.
Suppose that the input cable is safe.
Then the probability that after the transition, the new output cable (and
therefore the new state) is
not safe is $O(e^{-ck})$ for some constant $c$.
Suppose we keep the circuit running for $t$ steps.
Then the probability that the state is not safe in some of these
steps is $O(te^{-ck})$ which is small as long as $t$ is significantly
smaller than $e^{ck}$.
When storing $m$ bits of information, the probability that
any of the bits loses its safety in some step is $O(mt e^{-c k})$.
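To get a feel for the numbers: the per-bit bound $O(te^{-ck})$ gives a storage horizon that grows exponentially with the cable width $k$. A sketch, with the constant $c$ and the constant hidden in the $O(\cdot )$ chosen arbitrarily for illustration:

```python
import math

def max_safe_steps(m, k, c, target=0.01):
    """Largest t for which m * t * exp(-c*k) stays below `target`
    (the constant hidden in the O(.) is taken to be 1 here)."""
    return int(target * math.exp(c * k) / m)

# The storage horizon grows exponentially with the cable width k:
t_short = max_safe_steps(m=1000, k=50, c=0.5)
t_long = max_safe_steps(m=1000, k=100, c=0.5)
```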
To make this discussion rigorous, an error model must be introduced for
clocked circuits.
Since we will only consider simple
transition functions $\tau $ like the majority vote above, with a
single computation step between times $t$ and $t+1$, we will make the model
also very simple.
\begin{defi}
Consider a clocked circuit described by equation~\eqref {e.FSM-simple},
where at each time instant $t=0,1,2,\dots $,
the configuration is described by the bit vector
$\boldsymbol {s}^{t} = (s^{t}_{1},\dots ,s^{t}_{n})$.
Consider a sequence of random bit vectors
$\boldsymbol {Y}^{t}=(Y^{t}_{1},\dots ,Y^{t}_{n})$ for $t=0,1,2,\dots $.
Similarly to~\eqref {e.failure} we define
\begin {equation}\label {e.failure-clocked}
Z_{i,t} = \tau (\boldsymbol {Y}^{t-1})_{i}\oplus Y^{t}_{i} \koz .
\end{equation}
Thus, $Z_{i,t}=1$ says that a
\ki {failure} occurs at the space-time point $(i,t)$.
The sequence $\{\boldsymbol {Y}^{t}\}$ will be called $\epsilon $-\ki {admissible}
if~\eqref {e.admissible} holds for every set $C$ of
space-time points with $t>0$.
\end{defi}
By the construction just described,
it is possible to keep $m$ bits of information for $T$
steps in
\begin{equation}\label {e.feluj-mem}
O(m\lg (mT))
\end{equation}
memory cells.
More precisely, the cable $\boldsymbol {Y}^{T}$ will be safe with large
probability in any admissible evolution $\boldsymbol {Y}^{t}$ ($t=0,\dots ,T$).
Can we do better? The reliable information storage problem is related to the
problem of \ki{information transmission:} given a \ki {message} $\boldsymbol {x}$,
a \ki{sender} wants to transmit it to a \ki {receiver} through a
\ki{noisy channel.}
Only now the sender and the receiver are the same person, and the noisy
channel is just the passing of time.
Below, we develop some basic concepts of reliable information transmission,
and then we will apply them to the construction of
a reliable data storage scheme that is
more economical than the naive, repetition-based solution seen above.
\subsection{Error-correcting codes}
\subsubsection{Error detection}
To protect information, we can use redundancy in a way more efficient than
repetition.
We might even add only a single redundant bit to our message.
Let $\boldsymbol {x}=(x_{1},\dots ,x_{6})$, $(x_{i}\in \{0,1\})$
be the word we want to protect.
Let us create the \ki {error check bit}
\begin{equation*}
x_{7} = x_{1} \oplus \dots \oplus x_{6} \koz .
\end{equation*}
For example, $\boldsymbol {x}= 110010$, $\boldsymbol {x}' = 1100101$.
Our \ki {codeword} $\boldsymbol {x}' = (x_{1},\dots ,x_{7})$ will be subject to noise
and it turns into a new word, $\boldsymbol {y}$.
If $\boldsymbol {y}$ differs from $\boldsymbol {x}'$ in a single changed (not deleted or added)
bit then we will \ki {detect}
this, since then $\boldsymbol {y}$ violates the \ki {error check relation}
\begin{equation*}
y_{1}\oplus \dots \oplus y_{7} = 0 \koz .
\end{equation*}
We will not be able to correct the error, since we do not know
which bit was corrupted.
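The check bit and the detection test can be written out in a few lines (a sketch; the function names are ours):

```python
def add_check_bit(bits):
    """Append the error check bit x7 = x1 xor ... xor x6."""
    parity = 0
    for b in bits:
        parity ^= b
    return bits + [parity]

def violates_check(word):
    """True if y1 xor ... xor yn = 0 fails, i.e. an odd number
    of bits was corrupted."""
    parity = 0
    for b in word:
        parity ^= b
    return parity != 0

codeword = add_check_bit([1, 1, 0, 0, 1, 0])   # 1100101, as in the text
flipped = codeword[:]
flipped[3] ^= 1                                 # a single corrupted bit
# violates_check(codeword) is False; violates_check(flipped) is True
```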
\subsubsection{Correcting a single error.}
To also \ki{correct} corrupted bits, we need to add more error check bits.
We may try to add two more bits:
\begin{align*}
x_{8} &= x_{1} \oplus x_{3}\oplus x_{5} \koz ,
\\ x_{9} &= x_{1} \oplus x_{2}\oplus x_{5}\oplus x_{6} \koz .
\end {align*}
Then an uncorrupted word $\boldsymbol {y}$ must satisfy the error check relations
\begin{align*}
y_{1}\oplus \dots \oplus y_{7} &=0 \koz ,
\\ y_{1} \oplus y_{3}\oplus y_{5} \oplus y_{8} &=0 \koz ,
\\ y_{1} \oplus y_{2}\oplus y_{5}\oplus y_{6} \oplus y_{9} &=0 \koz ,
\end{align*}
or, in matrix notation $\boldsymbol {H}\boldsymbol {y}\bmod 2 = 0$, where
\setcounter{MaxMatrixCols}{13}
\begin{equation*}
\boldsymbol {H}= \begin {pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0
\\ 1 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 0
\\ 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & 1
\end{pmatrix} = (\boldsymbol {h}_{1},\dots ,\boldsymbol {h}_{9}) \koz .
\end {equation*}
Note $\boldsymbol {h}_{1} = \boldsymbol {h}_{5}$.
The matrix $\boldsymbol {H}$ is called the \ki {error check matrix}, or \ki {parity
check matrix}.
Another way to write the error check relations is
\begin {equation*}
y_{1} \boldsymbol {h}_{1} \oplus \dots \oplus y_{5} \boldsymbol {h}_{5}
\oplus \dots \oplus y_{9} \boldsymbol {h}_{9} = 0.
\end {equation*}
Now if $\boldsymbol {y}$ is corrupted, even if only in a single position,
unfortunately we still cannot correct it:
since $\boldsymbol {h}_{1}=\boldsymbol {h}_{5}$,
the error could be in position 1 or 5 and
we could not tell the difference.
If we choose our error-check matrix $\boldsymbol {H}$ in such a way
that the column vectors $\boldsymbol {h}_{1},\boldsymbol {h}_{2},\dots $ are
\emph {all different} (of course also from 0),
then we can always correct an error, provided there is only one.
Indeed, if the error was in position 3 then
\begin {equation*}
\boldsymbol {H}\boldsymbol {y}\bmod 2 = \boldsymbol {h}_{3}.
\end {equation*}
Since all vectors $\boldsymbol {h}_{1},\boldsymbol {h}_{2},\dots $ are different, if we see the
vector $\boldsymbol {h}_{3}$ we can infer that the bit $y_{3}$ is corrupted.
This code is called the \ki{Hamming code.}
For example, the following error check matrix defines the Hamming
code of size 7:
\begin {equation}\label {e.Hamming-7}
\boldsymbol {H}= \begin {pmatrix}
1 & 1 & 1 & 0 & 1 & 0 & 0
\\ 1 & 0 & 1 & 1 & 0 & 1 & 0
\\ 1 & 1 & 0 & 1 & 0 & 0 & 1
\end {pmatrix} = (\boldsymbol {h}_{1},\dots ,\boldsymbol {h}_{7}) \koz.
\end {equation}
In general, if we have $s$ error check bits then our code can have size
$2^{s}-1$, hence the number of bits left to store information (the
\ki {information bits}) is $k= 2^{s} - s - 1$.
So, to protect $m$ bits of information from a single error, the Hamming code
adds $\approx \log m$ error check bits.
This is much better than repeating every bit 3 times.
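The size-7 Hamming code of~\eqref {e.Hamming-7} can be implemented directly from its error check matrix. Since columns 5, 6, 7 of $\boldsymbol {H}$ form an identity matrix, the last three bits can serve as the check bits; the function names below are ours:

```python
# Error check matrix H of the 7-bit Hamming code, over F_2.
H = [
    [1, 1, 1, 0, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]

def encode(info):
    """Extend the 4 information bits y1..y4 with check bits y5, y6, y7
    chosen so that H y = 0 (mod 2)."""
    y = list(info) + [0, 0, 0]
    for r in range(3):
        y[4 + r] = sum(H[r][j] * y[j] for j in range(4)) % 2
    return y

def syndrome(y):
    """H y mod 2; zero exactly for uncorrupted codewords."""
    return [sum(H[r][j] * y[j] for j in range(7)) % 2 for r in range(3)]

def correct(y):
    """Correct at most one flipped bit: a nonzero syndrome equals
    the column h_j of the corrupted position j."""
    s = syndrome(y)
    if any(s):
        j = next(j for j in range(7)
                 if [H[r][j] for r in range(3)] == s)
        y = y[:]
        y[j] ^= 1
    return y
```

Flipping any single bit of a codeword yields a word whose syndrome is the corresponding column of $\boldsymbol {H}$; since all columns are distinct and nonzero, the flipped position is identified uniquely.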
\subsubsection{Codes.} Let us summarize the error-correction scenario in general terms.
\begin{figure}
\begin{equation*}
\includegraphics{figs/channel}
\end {equation*}
\caption{Transmission through a noisy channel.}
\end{figure}
In order to fight noise, the sender \ki{encodes} the \ki{message} $\boldsymbol {x}$
by an \ki{encoding function} $\phi _{*}$ into a longer string
$\phi _{*}(\boldsymbol {x})$ which, for simplicity, we also assume to be binary.
This \ki{codeword} will be changed by noise into a string $\boldsymbol{y}.$
The receiver gets $\boldsymbol{y}$ and applies to it a
\ki{decoding function} $\phi^{*}$.
\begin{defi}
The pair of functions $\phi_{*}:\{0,1\}^{m} \to \{0,1\}^{n}$ and
$\phi^{*}:\{0,1\}^{n} \to \{0,1\}^{m}$ is
called a \ki{code} if $\phi^{*}(\phi _{*}(\boldsymbol {x}))=\boldsymbol {x}$
holds for all $\boldsymbol{x}\in \{0,1\}^{m}$.
The strings $\boldsymbol{x}\in \{0,1\}^{m}$ are called \ki {messages},
words of the form $\boldsymbol{y}=\phi_{*}(\boldsymbol{x})\in \{0,1\}^{n}$ are
called \ki{codewords.} (Sometimes the set of all codewords by itself is also called a code.)
For every message $\boldsymbol{x}$, the set of words
$C_{\boldsymbol{x}}=\mathopen \{\,\boldsymbol {y}: \phi ^{*}(\boldsymbol {y})=\boldsymbol {x}\,\mathclose \}$ is called the
\ki{decoding set} of $\boldsymbol{x}.$ (Of course, different decoding sets are disjoint.) The number
\begin{equation*}
R= m/n
\end{equation*}
is called the \ki{rate} of the code.
We say that our code \ki{corrects $t$ errors} if for all possible messages
$\boldsymbol{x}\in \{0,1\}^{m},$ if the received word
$\boldsymbol{y}\in \{0,1\}^{n}$ differs from the codeword
$\phi_{*}(\boldsymbol{x})$ in at most $t$ positions, then $\phi^{*}(\boldsymbol{y})=\boldsymbol{x}.$
\end{defi}
If the rate is $R$ then the $n$-bit codewords carry $Rn$ bits of useful information.
In terms of decoding sets, a code corrects $t$ errors if each decoding set
$C_{\boldsymbol{x}}$ contains all words that differ from $\phi_{*}(\boldsymbol{x})$ in at most
$t$ symbols (the set of these words is a kind of ``ball'' of radius $t$).
The Hamming code corrects a single error, and its rate is close to 1.
One of the important questions connected with error-correcting codes is how
much do we have to lower the rate in order to correct more errors.
Having a notion of codes, we can formulate the main result of this section
about information storage.
\begin{tetel}[Network information storage]\label{t.netw-stor}
There are constants $\epsilon ,c_{1},c_{2},R>0$ with the following property.
For all sufficiently large $m,$
there is a code $(\phi _{*},\phi ^{*})$ with
message length $m$ and codeword length $n\leq m/R$,
and a Boolean clocked circuit $\mathcal{N}$ of
size $O(n)$ with $n$ inputs and $n$ outputs, such that the
following holds.
Suppose that at time 0, the memory cells of the circuit contain
string $\boldsymbol{Y}_{0}=\phi_{*}(\boldsymbol {x}).$
Suppose further that the evolution $\boldsymbol {Y}_{1},\boldsymbol {Y}_{2},\dots ,\boldsymbol {Y}_{t}$
of the circuit has $\epsilon$-admissible failures.
Then we have
\begin{equation*}
\mathbf{P}\mskip 1mu\mathopen [\,\phi ^{*}(\boldsymbol {Y}_{t})\neq \boldsymbol {x}\,\mathclose ] < t (c_{1}\epsilon )^{c_{2}n}\koz.
\end{equation*}
\end{tetel}
This theorem shows that it is possible to store $m$ bits of
information for time $t$, in a clocked circuit of size
\begin{equation*}
O(\max (\log t, m)) \koz .
\end{equation*}
As long as the storage time $t$ is below the exponential bound
$e^{cm}$ for a certain constant $c$, this
circuit size is only a constant times larger than the amount $m$ of
information it stores. (In contrast, in~\eqref {e.feluj-mem}
we needed an extra factor $\log m$ when we used
a separate restoring organ for each bit.)
The theorem says nothing about how difficult it is to
compute the codeword $\phi_{*}(\boldsymbol{x})$ at the beginning and how difficult it is
to carry out the decoding $\phi^{*}(\boldsymbol{Y}_{t})$ at the end.
Moreover, it is desirable to perform these two operations also in a
noise-tolerant fashion.
We will return to the problem of decoding later.
\subsubsection{Linear algebra.}
Since we will be dealing more with bit matrices,
it is convenient to introduce the algebraic structure
\[
\mathbb {F}_{2}=(\{0,1\},+,\cdot)\koz,
\]
which is a two-element \ki{field.}
Addition and multiplication in $\mathbb{F}_{2}$ are defined modulo 2 (of course,
for multiplication this is no change).
It is also convenient to vest the set $\{0,1\}^{n}$ of binary strings with
the structure $\mathbb{F}_{2}^{n}$ of an $n$-dimensional vector space over the
field $\mathbb{F}_{2}.$
Most theorems and algorithms of basic linear algebra apply to arbitrary
fields: in particular, one can define the row rank of a matrix as the
maximum number of linearly independent rows, and similarly the column rank.
Then it is a theorem that the row rank is equal to the column rank.
From now on, in algebraic operations over bits or bit vectors, we
will write $+$ in place of $\oplus$ unless this leads to confusion.
To save space, we will frequently write column vectors horizontally:
we write
\begin{equation*}
\begin{pmatrix}
x_{1}
\\ \vdots
\\ x_{n}
\end{pmatrix} = (x_{1},\dots ,x_{n})^{T}\koz,
\end{equation*}
where $\boldsymbol {A}^{T}$ denotes the transpose of matrix $\boldsymbol {A}$.
We will write
\begin{equation*}
\boldsymbol{I}_{r}
\end{equation*}
for the identity matrix over the vector space $\mathbb {F}_{2}^{r}.$
\subsubsection{Linear codes.} Let us generalize the idea of the Hamming code.
\begin{defi}
A code $(\phi_{*},\phi^{*})$ with message length $m$ and codeword length
$n$ is \ki{linear} if, when viewing the message and code vectors as
vectors over the field $\mathbb{F}_{2}$, the encoding
function can be computed according to the formula
\begin{equation*}
\phi_{*}(\boldsymbol {x}) = \boldsymbol {G}\boldsymbol {x}\koz,
\end{equation*}
with an $n\times m$ matrix $\boldsymbol{G}$ called the \ki{generator matrix} of the code.
The number $m$ is called the number of \ki {information bits} in the
code, the number
\begin{equation*}
k= n- m
\end{equation*}
the number of \ki{error-check bits.}
\end{defi}
\begin{pelda}\label{x.Hamming-7}
The matrix $\boldsymbol{H}$ in~\eqref{e.Hamming-7} can be written as
$\boldsymbol {H}= (\boldsymbol{K}, \boldsymbol{I}_{3})$, with
\begin{equation*}
\boldsymbol{K}=
\begin{pmatrix}
1 & 1 & 1 & 0
\\ 1 & 0 & 1 & 1
\\ 1 & 1 & 0 & 1
\end{pmatrix} \koz .
\end{equation*}
Then the error check relation can be written as
\begin{equation*}
\boldsymbol{y}=
\begin{pmatrix}
\boldsymbol{I}_{4} \\ -\boldsymbol{K}\end{pmatrix}\begin{pmatrix}
y_{1} \\ \vdots \\ y_{4}
\end {pmatrix} \koz .
\end{equation*}
This shows that the bits $y_{1},\dots ,y_{4}$ can be taken to be the message
bits, or ``information bits'', of the code, making
the Hamming code a linear code with the generator matrix
$(\boldsymbol{I}_{4}, -\boldsymbol{K})^{T}.$
(Of course, $-\boldsymbol{K}= \boldsymbol{K}$ over the field $\mathbb{F}_{2}.)$
\end{pelda}
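As a quick sanity check of this example, the following Python sketch (with names of our own choosing) encodes every 4-bit message as $\boldsymbol{y}=(\boldsymbol{x}, \boldsymbol{K}\boldsymbol{x})$, recalling that $-\boldsymbol{K}=\boldsymbol{K}$ over $\mathbb{F}_{2}$, and verifies the error-check relation $\boldsymbol{H}\boldsymbol{y}=0$:

```python
# The matrix K of Example x.Hamming-7; H = (K, I_3) is the error-check matrix.
K = [[1,1,1,0],
     [1,0,1,1],
     [1,1,0,1]]
H = [row + [1 if i == r else 0 for i in range(3)] for r, row in enumerate(K)]

def encode(x):
    """phi_*(x): the four information bits followed by the three check bits Kx."""
    checks = [sum(K[r][j] * x[j] for j in range(4)) % 2 for r in range(3)]
    return x + checks

for m in range(16):                          # all 4-bit messages
    x = [(m >> i) & 1 for i in range(4)]
    y = encode(x)
    # H y = Kx + I_3 (Kx) = 2Kx = 0 over F_2
    assert all(sum(H[i][j] * y[j] for j in range(7)) % 2 == 0 for i in range(3))
```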
The following statement is proved using standard linear algebra, and it
generalizes the relation between error check matrix and generator matrix
seen in Example~\ref{x.Hamming-7}.
\begin{allit}\label {p.linalg}
Let $k,m>0$ be given with $n= m+k$.
\begin{description}
\item[\rm{(a)}] For every $n\times m$ matrix $\boldsymbol {G}$ of rank $m$
over $\mathbb{F}_{2}$ there is a $k\times n$
matrix $\boldsymbol{H}$ of rank $k$ with the property
\begin{equation}\label{e.G-H}
\mathopen \{\,\boldsymbol {G}\boldsymbol {x}: \boldsymbol {x}\in \mathbb {F}_{2}^{m}\,\mathclose \} =
\mathopen \{\,\boldsymbol {y}\in \mathbb {F}_{2}^{n} : \boldsymbol {H}\boldsymbol {y}= 0\,\mathclose \}.
\end{equation}
\item[\rm{(b)}] For every $k\times n$ matrix $\boldsymbol {H}$ of rank $k$
over $\mathbb {F}_{2}$ there is an $n\times m$ matrix $\boldsymbol {G}$
of rank $m$ with property~\eqref {e.G-H}.
\end{description}
\end{allit}
\begin{defi}
For a vector $\boldsymbol{x}$, let $|\boldsymbol{x}|$ denote the number of its nonzero
elements: we will also call it the \ki {weight} of $\boldsymbol{x}.$
\end{defi}
In what follows it will be convenient to define a code starting from an
error-check matrix $\boldsymbol {H}$.
If the matrix has rank $k$ then the code has rate
\begin{equation*}
R= 1 - k/n.
\end{equation*}
We can fix any subset $S$ of $k$ linearly independent columns, and call
the indices $i\in S$ \ki {error check bits} and the indices $i\not \in S$ the
\ki{information bits}.
(In Example~\ref{x.Hamming-7}, we chose $S=\{5,6,7\}$.)
Important operations can be performed on a code, however, without fixing
any separation into error-check bits and information bits.
\subsection{Refreshers}
Correcting a single error was not too difficult; finding a similar scheme
to correct 2 errors is much harder.
However, in storing $n$ bits, typically $\epsilon n$ (much more
than 2) of those bits will be corrupted in every step.
There are ingenious and quite efficient codes of positive rate
(independent of $n$) correcting even this many errors.
When applied to information storage, however,
the error-correction mechanism itself must also work in noise,
so we are looking for a particularly simple one.
It works in our favor, however, that
not all errors need to be corrected: it is sufficient to cut down
their number, similarly to the restoring organ in reliable Boolean
circuits above.
For simplicity, as gates of our circuit
we will allow certain Boolean functions with a large, but constant,
number of arguments.
On the other hand, our Boolean circuit will have just depth 1, similarly to
a restoring organ of Section~\ref{s.cables}.
The output of each gate is the input of a memory cell (shift register).
For simplicity, we identify the gate and the memory cell and
call it a \ki{cell.}
At each clock tick, a cell reads its inputs from other cells, computes a Boolean function on
them, and stores the result (till the next clock tick). But now,
instead of a majority vote among the values of the input cells, the Boolean
function computed by each cell will be slightly more complicated.
Our particular restoring operation will be defined
with the help of a certain $k\times n$ parity check matrix $\boldsymbol {H}=(h_{ij})$.
Let $\boldsymbol {x}=(x_1,\ldots ,x_n)^{T}$ be a vector of bits.
For each $j=1,\dots ,n$, let $V_j$ (from ``vertical'') be the
set of those indices $i$ with $h_{ij}=1$.
For integer $i=1,\dots ,k$, let $H_i$ (from ``horizontal'') be the set
of those indices $j$ with $h_{ij}=1$.
Then the condition $\boldsymbol {H}\boldsymbol {x}= 0$ can also be expressed by saying that for
all $i$, we have $\sum _{j\in H_i} x_j \equiv 0 \pmod 2$.
The sets $H_i$ are called the \ki {parity check sets}
belonging to the matrix $\boldsymbol {H}$.
From now on, the indices $i$ will be called \ki {checks}, and the
indices $j$ \ki{locations.}
\begin{defi}
A linear code $\boldsymbol{H}$ is a \ki{low-density parity-check code} with bounds
$K,N>0$ if the following conditions are satisfied:
\begin{description}
\item[\rm{(a)}] For each $j$ we have $|V_j|\leq K$;
\item[\rm{(b)}] For each $i$ we have $|H_i|\leq N$.
\end{description}
In other words, the weight of each row is at most $N$ and the weight of
each column is at most $K$.
\end{defi}
In our constructions, we will keep the bounds $K,N$ constant while the
length $n$ of codewords grows.
Consider a situation when $\boldsymbol {x}$ is a codeword corrupted by some errors.
To check whether bit $x_j$ is incorrect we may check all the sums
\[
s_{i}=\sum _{j\in H_i} x_j
\]
for all $i\in V_j$. If all these sums are 0 then we would not suspect $x_j$ to be in
error. If only one of these is nonzero, we will know that $\boldsymbol {x}$ has some
errors but we may still think that the error is not in bit $x_j.$
But if a significant number of these sums is nonzero then we may
suspect that $x_j$ is a culprit and may want to change it.
This idea suggests the following definition.
\begin{defi}
For a low-density parity-check code $\boldsymbol{H}$ with bounds $K,N$,
the \ki{refreshing operation} associated with the code is
the following, to be performed simultaneously for all locations $j$:
\begin{quotation}
Find out whether more than ${\lfloor K/2\rfloor }$ of the sums $s_{i}$ are
nonzero among the ones for $i\in V_j$.
If this is the case, flip $x_j$.
\end{quotation}
Let $\boldsymbol{x}^{\boldsymbol{H}}$ denote the vector obtained from $\boldsymbol{x}$ by this
operation. For parameters $0<\vartheta ,\gamma <1$, let us call $\boldsymbol {H}$ a
$(\vartheta, \gamma, K,N, k,n)$-\ki{refresher} if for each vector $\boldsymbol{x}$ of
length $n$ with weight $|\boldsymbol{x}|\leq \vartheta n$ the weight of the
resulting vector decreases thus: $|\boldsymbol{x}^{\boldsymbol{H}}|\leq \gamma \vartheta n.$
\end{defi}
Notice the similarity of refreshers to compressors.
The following lemma shows the use of refreshers, and is an example of the
advantages of linear codes.
\begin{lemma}
For a $(\vartheta , \gamma , K,N, k,n)$-refresher $\boldsymbol {H}$, let
$\boldsymbol {x}$ be an $n$-vector and $\boldsymbol {y}$ a codeword of length $n$
with $|\boldsymbol {x}-\boldsymbol {y}|\leq \vartheta n$.
Then $|\boldsymbol {x}^{\boldsymbol {H}}-\boldsymbol {y}|\leq \gamma \vartheta n$.
\end{lemma}
\begin{biz}
Since $\boldsymbol {y}$ is a codeword, $\boldsymbol {H}\boldsymbol {y}= 0$, implying
$\boldsymbol {H}(\boldsymbol {x}-\boldsymbol {y}) = \boldsymbol {H}\boldsymbol {x}$.
Therefore the error correction flips the same bits in $\boldsymbol {x}-\boldsymbol {y}$ as in
$\boldsymbol {x}$: $(\boldsymbol {x}-\boldsymbol {y})^{\boldsymbol {H}}-(\boldsymbol {x}-\boldsymbol {y}) = \boldsymbol {x}^{\boldsymbol {H}}-\boldsymbol {x}$, giving
$\boldsymbol {x}^{\boldsymbol {H}}-\boldsymbol {y}= (\boldsymbol {x}-\boldsymbol {y})^{\boldsymbol {H}}$.
So, if $|\boldsymbol {x}-\boldsymbol {y}|\leq \vartheta n$, then
$|\boldsymbol {x}^{\boldsymbol {H}}-\boldsymbol {y}|=|(\boldsymbol {x}-\boldsymbol {y})^{\boldsymbol {H}}| \leq \gamma \vartheta n$.
\end{biz}
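The refreshing operation and the invariance property just proved can be checked mechanically. The Python sketch below uses the Hamming parity-check matrix of Example~\ref{x.Hamming-7} merely as a convenient small $\boldsymbol{H}$ (it is not claimed to be a good refresher), with bound $K=3$ on the column weights:

```python
# Parity-check matrix of the Hamming code, used here only as a small example H.
H = [[1,1,1,0,1,0,0],
     [1,0,1,1,0,1,0],
     [1,1,0,1,0,0,1]]

def refresh(x, H, K):
    """One refreshing step: flip x_j if more than floor(K/2) of the sums
    s_i with i in V_j are nonzero."""
    s = [sum(h[j] * x[j] for j in range(len(x))) % 2 for h in H]   # the sums s_i
    out = []
    for j in range(len(x)):
        failing = sum(s[i] for i in range(len(H)) if H[i][j] == 1)
        out.append(x[j] ^ (1 if failing > K // 2 else 0))
    return out

y = [1,0,1,0,0,0,1]       # a codeword: H y = 0 over F_2
x = [1,1,0,0,1,0,1]       # an arbitrary word
K = 3                     # bound on the column weights of H

# The lemma: the flips depend on x only through Hx, so refreshing commutes
# with adding a codeword (over F_2, subtraction = addition = XOR).
lhs = [a ^ b for a, b in zip(refresh(x, H, K), x)]
xy  = [a ^ b for a, b in zip(x, y)]
rhs = [a ^ b for a, b in zip(refresh(xy, H, K), xy)]
assert lhs == rhs
```

A codeword itself is left unchanged, since all its parity sums vanish.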
\begin{tetel}\label {t.ec-matrix}
There is a parameter $\vartheta >0$ and integers $K>N>0$
such that for all sufficiently large codelength $n$ and $k=Nn/K$
there is a $(\vartheta , 1/2, K,N, k,n)$-refresher with at least
$n-k= (1-N/K)n$ information bits.
In particular, we can choose
$N=100$, $K=120$, $\vartheta = 1.31\cdot 10^{-4}$.
\end{tetel}
\begin{figure}\label{f.storage}
\begin{equation*}
\includegraphics{figs/storage.eps}
\end {equation*}
\caption{Using a refresher.}
\end{figure}
We postpone the proof of this theorem, and apply it first.
\begin{biz}[Proof of Theorem~\ref{t.netw-stor}]
Theorem~\ref {t.ec-matrix} provides us with a device for information
storage. Indeed, we can implement the operation $\boldsymbol{x}\to \boldsymbol{x}^{\boldsymbol{H}}$ using a single
gate $g_{j}$ of at most $KN$ inputs for each bit $j$ of $\boldsymbol{x}$.
Now as long as the inequality
$|\boldsymbol{x}-\boldsymbol{y}|\leq \vartheta n$ holds for some codeword $\boldsymbol{y}$, the inequality
$|\boldsymbol{x}^{\boldsymbol{H}}-\boldsymbol{y}|\leq \gamma \vartheta n$ follows with $\gamma =1/2$.
Of course, some gates will fail and introduce new deviations
resulting in some $\boldsymbol{x}'$ rather than $\boldsymbol{x}^{\boldsymbol {H}}$.
Let $e\epsilon < \vartheta /2$ and $\rho = 1 - \gamma (=1/2)$.
Then just as earlier, the probability that there are
more than $\rho \vartheta n$ failures is bounded by the exponentially decreasing
expression $(e\epsilon /\rho \vartheta )^{\rho \vartheta n}$.
With fewer than $\rho \vartheta n$ new deviations,
we will still have $|\boldsymbol {x}'-\boldsymbol {y}|< (\gamma +\rho )\vartheta n< \vartheta n$.
The probability that at any time $\leq t$ the number of failures is
more than $\rho \vartheta n$ is bounded by
\begin{equation*}
t (e\epsilon /\rho \vartheta )^{\rho \vartheta n} < t (6\epsilon /\vartheta )^{(1/2)\vartheta n}\koz.
\end{equation*}
\end{biz}
\begin{pelda}
Let $\epsilon =10^{-9}$.
Using the sample values in Theorem~\ref {t.ec-matrix} we
can take $N=100$, $K=120$, so the information rate is
$1-N/K=1/6$.
With the corresponding values of $\vartheta $, and $\gamma =\rho =1/2$,
we have $\rho \vartheta =6.57\cdot 10^{-5}$.
The probability that there are
more than $\rho \vartheta n$ failures is bounded by
\begin{equation*}
(e\epsilon /\rho \vartheta )^{\rho \vartheta n} = (10^{-4}e/6.57)^{6.57\cdot 10^{-5}n}
\approx e^{-6.63\cdot 10^{-4}n}.
\end{equation*}
This is exponentially decreasing with $n$, albeit initially very
slowly: it is not really small until $n=10^{4}$.
Still, for $n=10^{6}$, it gives $e^{-663}\approx 1.16\cdot 10^{-288}$.
\end{pelda}
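The arithmetic of this example is easy to reproduce; a short computation (taking $\rho\vartheta = 6.57\cdot 10^{-5}$ from the text):

```python
import math

# Arithmetic of the example: eps = 1e-9, rho*theta = 6.57e-5.
eps = 1e-9
rt = 6.57e-5                                  # rho * vartheta
coef = rt * math.log(math.e * eps / rt)       # per-n exponent of the bound
assert abs(coef + 6.63e-4) < 1e-6             # bound ~ e^{-6.63e-4 n}
bound_at_1e6 = math.exp(coef * 10**6)         # the bound at n = 10^6
assert 1.0e-289 < bound_at_1e6 < 1.3e-288     # ~ e^{-663}
```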
\subsubsection{Decoding?}
In order to use a refresher for information storage, first we need to enter
the encoded information into it, and at the end, we need to decode the
information from it.
How can this be done in a noisy environment?
We have nothing particularly smart to say here about encoding besides the
reference to the general reliable computation scheme discussed earlier.
On the other hand, it turns out that if $\epsilon $ is sufficiently small then
\emph{decoding can be avoided} altogether.
Recall that in our codes, it is possible to designate certain symbols as
information symbols.
So, in principle it is sufficient to read out these symbols.
The question is only how likely it is that any one of these symbols will be
corrupted. The following theorem bounds from above the probability
for any symbol to be corrupted, at any time.
\begin{tetel}\label {t.no-decoding}
For parameters $\vartheta ,\gamma >0$, integers $K>N>0$, codelength
$n$, with $k=Nn/K,$ consider a $(\vartheta , 1/2, K,N, k,n)$-refresher.
Build a Boolean clocked circuit $\mathcal {N}$ of
size $O(n)$ with $n$ inputs and $n$ outputs based on this refresher,
just as in the proof of Theorem~\ref {t.netw-stor}.
Suppose that at time 0, the memory cells of the circuit contain
string $\boldsymbol{Y}_{0}=\phi _{*}(\boldsymbol{x})$.
Suppose further that the evolution $\boldsymbol{Y}_{1},\boldsymbol{Y}_{2},\dots ,\boldsymbol{Y}_{t}$
of the circuit has $\epsilon$-admissible failures.
Let $\boldsymbol{Y}_{t}=(Y_{t}(1),\dots ,Y_{t}(n))$ be the bits stored at time $t$.
Then $\epsilon < (2.1KN)^{-10}$ implies
\begin{equation*}
\mathbf{P}\mskip 1mu\mathopen [\,Y_{t}(j) \neq Y_{0}(j)\,\mathclose]
\leq c\epsilon + t(6\epsilon /\vartheta )^{(1/2)\vartheta n}
\end{equation*}
for some $c$ depending on $N,K$.
\end{tetel}
\begin{megj}
What we are bounding is only the probability of a corrupt symbol in the
particular position $j$.
Some of the symbols will certainly be corrupt, but any one symbol one
points to will be corrupt only with probability $\leq c\epsilon $.
The upper bound on $\epsilon$ required in the condition
of the theorem is very severe, underscoring the
theoretical character of this result.
\end{megj}
\begin{biz}
As usual, it is sufficient to assume $\boldsymbol {Y}_{0}=0$.
Let $D_{t}=\mathopen \{\,j : Y_{t}(j)=1\,\mathclose \}$, and let
$E_{t}$ be the set of circuit elements $j$ which fail at time $t$.
Let us define the following sequence of integers:
\begin{equation*}
b_{0}=1,\quad b_{u+1} = {\lceil (4/3)b_{u}\rceil },
\quad c_{u} = {\lceil (1/3)b_{u}\rceil }\koz.
\end{equation*}
It is easy to see by induction that
\begin{equation}\label {e.b-sum}
b_{0} + \dots + b_{u-1} \leq 3b_{u} \leq 9 c_{u}\koz.
\end{equation}
The first members of the sequence $b_{u}$ are 1, 2, 3, 4, 6, 8, 11, 15, 20, 27, 36,
and those of $c_{u}$ are 1, 1, 1, 2, 2, 3, 4, 5, 7, 9, 12.
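The recursion for $b_{u}$ and $c_{u}$, and inequality~\eqref{e.b-sum}, can be verified mechanically:

```python
import math

# b_{u+1} = ceil((4/3) b_u), c_u = ceil((1/3) b_u), and inequality (e.b-sum).
b = [1]
for _ in range(10):
    b.append(math.ceil(4 * b[-1] / 3))
c = [math.ceil(x / 3) for x in b]

assert b == [1, 2, 3, 4, 6, 8, 11, 15, 20, 27, 36]
assert c == [1, 1, 1, 2, 2, 3, 4, 5, 7, 9, 12]
for u in range(1, len(b)):
    assert sum(b[:u]) <= 3 * b[u] <= 9 * c[u]     # (e.b-sum)
```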
\begin{lemma}
Suppose that $Y_{t}(j_{0})\neq 0$.
Then either there is a time $t'<t$ at which more than $(1/2)\vartheta n$
circuit elements failed, or there is a sequence of sets
$B_{u}\subseteq D_{t-u}$ for $0\leq u < v$ and a set $C\subseteq E_{t-v}$
with the following properties.
\begin{description}
\item[\rm{(a)}] For $u>0$, every element of $B_{u}$ shares some error-check with
some element of $B_{u-1}$.
Also every element of $C$ shares some error-check with
some element of $B_{v-1}$.
\item[\rm{(b)}] We have $|E_{t-u}\cap B_{u}| < |B_{u}|/3$ for $u<v$,
on the other hand $C\subseteq E_{t-v}$.
\item[\rm{(c)}] We have $B_{0}=\{j_{0}\}$, $|B_{u}|=b_{u}$ for $u<v$,
and $|C|=c_{v}$.
\end{description}
\end{lemma}
\begin{biz}
We will define the sequence $B_{u}$ recursively, and will say when to stop.
If $j_{0}\in E_{t}$ then we set $v=0$, $C=\{j_{0}\}$, and stop.
Otherwise, let $B_{0}=\{j_{0}\}$.
Suppose that $B_{u}$ is already defined; let us define $B_{u+1}$
(or $C$, if $v=u+1$).
Let $B'_{u+1}$ be the set of those $j$ that share some error-check with an
element of $B_{u}$, and let $B''_{u+1} = B'_{u+1}\cap D_{t-u-1}$.
The refresher property implies that either $|D_{t-u-1}| >\vartheta n$ or
\begin{equation*}
|B_{u}\mathbin {\raise 0.15ex\hbox {$\smallsetminus $}}E_{t-u}| \leq (1/2)|B''_{u+1}| \koz .
\end{equation*}
In the former case, there must have been some time $t'<t-u$ with
$|E_{t'}| > (1/2)\vartheta n$, otherwise $D_{t-u-1}$ could never become
larger than $\vartheta n.$
In the latter case,
the property $|E_{t-u}\cap B_{u}| < (1/3)|B_{u}|$ implies
\begin{align*}
(2/3)|B_{u}| &< |B_{u}\mathbin {\raise 0.15ex\hbox {$\smallsetminus $}}E_{t-u}|\leq (1/2)|B''_{u+1}| \koz ,
\\ (4/3)b_{u} &< |B''_{u+1}| \koz.
\end{align*}
Now if $|E_{t-u-1} \cap B''_{u+1}| < (1/3)|B''_{u+1}|$
then let $B_{u+1}$ be any subset of $B''_{u+1}$ with size
$b_{u+1}$ (there is one), else let $v=u+1$ and
$C\subseteq E_{t-u-1} \cap B''_{u+1}$ a set of size $c_{v}$
(there is one).
This construction has the required properties.
\end{biz}
For a given $B_{u}$, the number of different choices for $B_{u+1}$
is bounded by
\begin{equation*}
\binom {|B'_{u+1}|}{b_{u+1}}\leq \binom {KNb_{u}}{b_{u+1}} \leq {\left ( \frac {eKNb_{u}}{b_{u+1}}\right )}^{b_{u+1}}
\leq {\left ( (3/4)eKN\right )}^{b_{u+1}} \leq {\left ( 2.1KN\right )}^{b_{u+1}}\koz,
\end{equation*}
where we used~\eqref{e.binom-bound}.
Similarly, the number of different choices for $C$
is bounded by
\begin{equation*}
\binom{KNb_{v-1}}{c_{v}} \leq \mu ^{c_{v}} \text{ with }
\mu =2.1KN \koz.
\end{equation*}
It follows that the number of choices for the whole sequence
$B_{1},\dots ,B_{v-1},C$ is bounded by
\begin{equation*}
\mu ^{b_{1}+\dots +b_{v-1}+c_{v}} \koz.
\end{equation*}
On the other hand, the probability for a fixed $C$ to have
$C\subseteq E_{v}$ is $\leq \epsilon ^{c_{v}}$.
This way, we can bound the probability that the sequence
ends exactly at $v$ by
\begin{equation*}
p_{v} \leq \epsilon ^{c_{v}}\mu ^{b_{1}+\dots +b_{v-1}+c_{v}}
\leq \epsilon ^{c_{v}}\mu ^{10 c_{v}} \koz,
\end{equation*}
where we used~\eqref{e.b-sum}. For small $v$ this gives
\begin{equation*}
p_{0} \leq \epsilon ,\quad
p_{1}\leq \epsilon \mu ,\quad
p_{2}\leq \epsilon \mu ^{3},\quad
p_{3}\leq \epsilon ^{2}\mu ^{6},\quad
p_{4}\leq \epsilon ^{2}\mu ^{10},\quad
p_{5}\leq \epsilon ^{3}\mu ^{16} \koz.
\end{equation*}
Therefore
\begin{equation*}
\sum _{v=0}^{\infty } p_{v} \leq \sum _{v=0}^{5} p_{v}
+\sum _{v=6}^{\infty } (\mu ^{10}\epsilon )^{c_{v}}
\leq \epsilon (1+\mu +\mu ^{3}) + \epsilon ^{2}(\mu ^{6}+\mu ^{10})+
\frac {\epsilon ^{3}\mu ^{16}}{1-\epsilon \mu ^{10}} \koz,
\end{equation*}
where we used $\epsilon \mu ^{10}<1$ and
the property $c_{v+1}>c_{v}$ for $v\geq 5$.
We can bound the last expression by $c\epsilon $ with an appropriate constant $c$.
We found that the event $Y_{t}(j) \neq Y_{0}(j)$ happens either if
there is a time $t'<t$ at which more than $(1/2)\vartheta n$ circuit
elements failed (this has probability bound
$t (6\epsilon /\vartheta )^{(1/2)\vartheta n}$ as seen in the proof of
Theorem~\ref {t.netw-stor}), or an event of probability $\leq c\epsilon $
occurs.
\end{biz}
\subsubsection{Expanders.}
We will construct our refreshers from bipartite multigraphs with a
property similar to compressors, called expanders.
\begin{defi}
Here, we will distinguish the two parts of the bipartite multigraphs
not as inputs and outputs but as left nodes and right nodes.
A bipartite multigraph $B$ is \ki{$(N,K)$-regular} if the points of the
left set have degree $N$ and the points of the right set have degree $K$.
Consider such a graph, with the left set having $n$ nodes (then the right
set has $nN/K$ nodes).
For a subset $E$ of the left set of $B$, let $\mathrm {Nb}(E)$ consist of
the points connected by some edge to some element of $E$.
We say that the graph $B$ \ki{expands} $E$ by a factor $\lambda $ if we
have $|\mathrm {Nb}(E)| \geq \lambda |E|$.
For $\alpha ,\lambda >0$, our graph $B$ is an $(N,K, \alpha , \lambda , n)$-expander\inddef{expander} if
$B$ expands every subset $E$ of size $\leq \alpha n$ of the left set
by a factor $\lambda $.
\end{defi}
\begin{figure}
\begin{equation*}
\includegraphics{figs/expander.eps}
\end{equation*}
\caption{\label{f.expander}A regular expander.}
\end{figure}
\begin{defi}
Given an $(N,K)$-regular bipartite multigraph $B$, with left set
$\{u_{1},\dots ,u_{n}\}$ and right set $\{v_{1},\dots ,v_{k}\}$,
we assign to it a parity-check code $\boldsymbol {H}(B)$ as follows:
$h_{ij}=1$ if $v_{i}$ is connected to $u_{j}$, and 0 otherwise.
\end{defi}
Now for every possible error set $E$, the set $\mathrm {Nb}(E)$ describes the set of
parity checks that the elements of $E$ participate in.
Under some conditions, the lower bound on the size of $\mathrm {Nb}(E)$ guarantees
that a sufficient number of errors will be corrected.
\begin{tetel}\label {t.expander-code} Let $B$ be an $(N, K, \alpha , (7/8)N, n)$-expander with integer $\alpha n$.
Let $k=Nn/K$.
Then $\boldsymbol {H}(B)$ is a $((3/4)\alpha , 1/2, K,N, k,n)$-refresher.
\end{tetel}
\begin{biz}
More generally, for any $\epsilon >0$,
let $B$ be an $(N,K,\alpha ,(3/4+\epsilon )N,n)$-expander with integer $\alpha n$.
We will prove that $\boldsymbol {H}(B)$ is a
$(\alpha (1 + 4\epsilon )/2, (1-4\epsilon ), K,N, k,n)$-refresher.
For an $n$-dimensional bit vector $\boldsymbol {x}$ with $A=\mathopen \{\,j: x_{j}=1\,\mathclose \}$,
$a=|A|=|\boldsymbol {x}|$, assume
\begin{equation}\label {e.a-ub}
a\leq n\alpha (1 + 4\epsilon )/2 \koz .
\end{equation}
Our goal is to show $|\boldsymbol {x}^{\boldsymbol {H}}|\leq a(1-4\epsilon )$: in other words, that in
the corrected vector the number of errors decreases at least by a factor of
$(1-4\epsilon )$.
Let $F$ be the set of bits in $A$ that the error correction operation
fails to flip, with $f=|F|$,
and $G$ the set of bits that were 0 but that the operation turns to 1,
with $g=|G|$.
Our goal is to bound $|F\cup G| = f+g$.
The key observation is that each element of $G$ shares at least
half of its neighbors with elements of $A$, and similarly,
each element of $F$ shares at least half of its neighbors
with other elements of $A$.
Therefore both $F$ and $G$ contribute relatively weakly
to the expansion of $A\cup G$.
Since this expansion is assumed strong, the size of $|F\cup G|$ must be
limited.
Let
\begin{equation*}
\delta = |\mathrm {Nb}(A)|/(Na) \koz .
\end{equation*}
By expansion, $\delta \geq 3/4+\epsilon $ \koz.
First we show $|A\cup G| \leq \alpha n$.
Assume namely that, on the contrary,
$|A\cup G| > \alpha n$, and let $G'$ be a subset of $G$ such that
$|A\cup G'|=\alpha n=: p$ (an integer, according to the assumptions of the
theorem).
By expansion,
\begin {equation*}
(3/4+\epsilon )Np \leq |\mathrm {Nb}(A\cup G')|\koz.
\end {equation*}
Each bit in $G$ has at most $N/2$ neighbors that are not neighbors of
$A$; so,
\begin {equation*}
|\mathrm {Nb}(A\cup G')| \leq \delta Na+ N(p - a)/2 \koz.
\end {equation*}
Combining these:
\begin {align*}
\delta a+ (p - a)/2 &\geq (3/4+\epsilon ) p,
\\ a&\geq p (1+4\epsilon )/(4\delta -2) \geq \alpha n(1+4\epsilon )/2,
\end {align*}
since $\delta \leq 1$.
This contradiction with~\eqref {e.a-ub} shows $|A\cup G| \leq \alpha n$.
Now $|A\cup G|\leq \alpha n$ implies (recalling that each element of $G$
contributes at most $N/2$ new neighbors):
\begin {align}
\nonumber (3/4 + \epsilon )N(a+g) &\leq |\mathrm {Nb}(A\cup G)| \leq \delta Na+ (N/2) g,
\\\nonumber (3/4 + \epsilon )(a+g) &\leq \delta a+ g/2 \koz,
\\\label {e.g-v}
(3/4 + \epsilon )a+(1/4+\epsilon )g&\leq \delta a \koz.
\end {align}
Each $j\in F$ must share at least half of its neighbors with others in $A$.
Therefore $j$ contributes at most $N/2$ neighbors on its own;
the contribution of the other $N/2$ must be divided by 2, so the
total contribution of $j$ to the neighbors of $A$ is at most
$(3/4)N$:
\begin {align*}
\delta Na&= |\mathrm {Nb}(A)| \leq N(a-f) + (3/4)Nf= N(a- f/4) \koz,
\\ \delta a&\leq a- f/4 \koz.
\end {align*}
Combining with~\eqref {e.g-v}:
\begin {align*}
(3/4 + \epsilon )a+(1/4+\epsilon )g&\leq a- f/4 \koz,
\\ (1-4\epsilon )a&\geq f+ (1+4\epsilon )g\geq f+g \koz.
\end {align*}
\end {biz}
\subsubsection{Random expanders.} Are there expanders good enough for Theorem~\ref{t.expander-code}?
The maximum expansion factor is the degree $N$ and we require a factor of $(7/8)N.$
It turns out that random choice works here, too, similarly to the one used
in the ``construction'' of compressors.
The choice has to be done in a way that the result is an
$(N,K)$-regular bipartite multigraph of left size $n$.
We will start with $Nn$ left nodes $u_{1},\dots ,u_{Nn}$
and $Nn$ right nodes $v_{1},\dots ,v_{Nn}$.
Now we choose a random \ki {matching}, that is a set of $Nn$ edges with
the property that every left node is connected by an edge
to exactly one right node.
Let us call the resulting graph $M$.
We obtain $B$ now as follows: we collapse each group of $N$ left nodes
into a single node: $u_{1},\dots ,u_{N}$ into one node,
$u_{N+1},\dots ,u_{2N}$ into another node, and so on.
Similarly, we collapse each group of $K$ right nodes
into a single node: $v_{1},\dots ,v_{K}$ into one node,
$v_{K+1},\dots ,v_{2K}$ into another node, and so on.
The edges between any pair of nodes in $B$ are inherited from the
ancestors of these nodes in $M$.
This results in a graph $B$ with $n$ left nodes of degree $N$ and
$nN/K$ right nodes of degree $K$.
The process may give multiple edges between nodes of $B$; this is why $B$
is a multigraph.
Two nodes of $M$ will be called
\ki {cluster neighbors} if they are collapsed to the same node of $B$.
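The collapsing construction above is easy to sketch in code. The following Python fragment (with hypothetical small parameters $n=12$, $N=4$, $K=6$) builds the multigraph from a random matching and checks only the resulting degrees:

```python
import random
from collections import Counter

def random_regular_bipartite(n, N, K, rng):
    """Collapse a random matching on Nn + Nn nodes into an (N,K)-regular
    bipartite multigraph with n left nodes and nN/K right nodes."""
    assert (n * N) % K == 0
    right = list(range(N * n))
    rng.shuffle(right)                    # random matching: left i -> right[i]
    # collapse groups of N left nodes and K right nodes into single nodes
    return [(i // N, right[i] // K) for i in range(N * n)]

rng = random.Random(1)
edges = random_regular_bipartite(12, 4, 6, rng)
left_deg = Counter(u for u, v in edges)
right_deg = Counter(v for u, v in edges)
assert all(d == 4 for d in left_deg.values())    # every left node has degree N
assert all(d == 6 for d in right_deg.values())   # every right node has degree K
```

Whether such a random multigraph is also an expander is exactly what the next theorem addresses.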
\begin{tetel}\label {t.random-expander}
Suppose
\begin {equation*}
0 < \alpha \leq e^{\frac {-1}{N/8-1}}\cdot (22K)^{\frac {-1}{1-8/N}} \koz.
\end {equation*}
Then the above random choice gives an $(N, K, \alpha , (7/8)N, n)$-expander
with positive probability.
\end{tetel}
\begin{pelda}\label {x.expander-1}
If $N=48$, $K=60$ then the inequality in the condition
of the theorem becomes
\begin{equation*}
\alpha \leq 1/6785 \koz.
\end{equation*}
\end{pelda}
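The numerical claim of this example can be reproduced directly from the bound of Theorem~\ref{t.random-expander}:

```python
import math

# The bound of Theorem t.random-expander for N = 48, K = 60.
N, K = 48, 60
alpha_max = math.exp(-1 / (N / 8 - 1)) * (22 * K) ** (-1 / (1 - 8 / N))
assert abs(1 / alpha_max - 6785) < 5      # alpha_max is about 1/6785
```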
\begin{biz}
Let $E$ be a set of size $\alpha n$ in the left set of $B$.
We will estimate the probability that $E$ has too few neighbors.
In the above choice of the graph $B$ we might as well start with
assigning edges to the nodes of $E$, in some fixed order of the $N|E|$
nodes of the preimage of $E$ in $M$.
There are $N|E|$ edges to assign.
Let us call a node of the right set of $M$ \ki {occupied} if it
has a cluster neighbor already reached by an earlier edge.
Let $X_{i}$ be a random variable that is 1 if the $i$th edge goes to an
occupied node and 0 otherwise.
There are
\begin {equation*}
Nn-i+1 \geq Nn-N\alpha n= Nn(1-\alpha )
\end {equation*}
choices for the $i$th edge and at most
$KN|E|$ of these are occupied.
Therefore
\begin {equation*}
\mathbf {P}\mskip 1mu\mathopen [\,X_{i}=1 \mid X_{1},\dots ,X_{i-1}\,\mathclose ]
\leq \frac {KN|E|}{Nn(1-\alpha )} = \frac {K\alpha }{1-\alpha } =: p \koz.
\end {equation*}
Using the large deviations theorem in the generalization given in
Exercise~\ref {xrc.dependent}, we have, for $f>0$:
\begin {equation*}
\mathbf {P}\mskip 1mu\mathopen [\,\sum _{i=1}^{N\alpha n} X_{i} \geq fN\alpha n\,\mathclose ]
\leq e^{-N\alpha nD(f, p)} \leq {\left ( \frac {e p}{f}\right )}^{fN\alpha n} \koz.
\end {equation*}
Now, the number of different neighbors of $E$ is $N\alpha n-\sum _{i} X_{i}$,
hence
\begin {equation*}
\mathbf {P}\mskip 1mu\mathopen [\,|\mathrm {Nb}(E)| \leq N\alpha n(1-f)\,\mathclose ]
\leq {\left ( \frac {e p}{f}\right )}^{fN\alpha n}
= {\left ( \frac {eK\alpha }{f(1-\alpha )}\right )}^{fN\alpha n} \koz.
\end {equation*}
Let us now multiply this with the number
\begin {equation*}
\sum _{i\leq \alpha n}\binom {n}{i} \leq (e/\alpha )^{\alpha n}
\end {equation*}
of sets $E$ of size $\leq \alpha n$:
\begin {align*}
{\left ( \frac {e}{\alpha }\right )}^{\alpha n}{\left ( \frac {eK\alpha }{f(1-\alpha )}\right )}^{fN\alpha n}
&= {\left ( \alpha ^{fN-1}e{\left ( \frac {eK}{f(1-\alpha )}\right )}^{fN}\right )}^{\alpha n}
\leq {\left ( \alpha ^{fN-1}e{\left ( \frac {eK}{0.99f}\right )}^{fN}\right )}^{\alpha n} \koz,
\end {align*}
where in the last step we assumed $\alpha \leq 0.01$.
This is $<1$ if
\begin {equation*}
\alpha \leq e^{\frac {-1}{fN-1}}{\left ( \frac {eK}{0.99f}\right )}^{\frac {-1}{1-1/(fN)}} \koz.
\end {equation*}
Substituting $f=1/8$ gives the formula of the theorem.
\end {biz}
\begin{biz}[Proof of Theorem~\protect \ref {t.ec-matrix}]
Theorem~\ref {t.expander-code} shows how to get a refresher from an
expander, and
Theorem~\ref {t.random-expander} shows the existence of
expanders for certain parameters.
Example~\ref {x.expander-1} shows that the parameters can be chosen as
needed for the refreshers.
\end{biz}
\vspace{-4mm}
\begin{gyak}
\ujgyak
Prove Proposition~\ref{p.linalg}.
\ujgyak\label{xrc.no-coda}
Apply the ideas of the proof of Theorem~\ref{t.no-decoding}
to the proof of Theorem~\ref{t.rel-bool}, showing that the ``coda''
circuit is not needed: each wire of the output cable carries the correct
value with high probability.
\end {gyak}
\begin{fld}
\ujfld{Critical value}{
Consider\label{xrc.crit} a circuit $\mathcal{M}_{k}$ like in Exercise~\ref{xrc.fooling-maj},
assuming that each gate fails with probability $\le \epsilon$ independently
of all the others and of the input.
Assume that the input vector is all 0, and let $p_{k}(\epsilon)$ be the probability
that the circuit outputs a 1. Show that there is
a value $\epsilon_{0}<1/2$ with the property that for all $\epsilon < \epsilon_{0}$ we
have $\lim _{k\to \infty} p_{k}(\epsilon)=0$, and for $\epsilon_{0}<\epsilon \le 1/2$,
we have $\lim _{k \to \infty} p_{k}(\epsilon)=1/2$.
Estimate also the speed of convergence in both cases.}
\ujfld{Regular compressor}{
We defined\label{xrc.reg-compr} a compressor\felindex{compressor} as a $d$-halfregular bipartite multigraph.
Let us call a compressor \ki{regular} if it is a
$d$-regular multigraph (the input nodes also have degree $d$).
Prove a theorem similar to Theorem~\ref{t.compr}:
for each $\gamma < 1$ there is an integer $d > 1$ and an $\alpha>0$
such that for all integer $k>0$ there is a regular
$(\d, \alpha,\gamma, k)$-compressor.
\textit{Hint.} Choose a random $d$-regular bipartite multigraph by the following process:
(1. Replace each vertex by a group of $d$ vertices.
2. Choose a random complete matching between the new input and output vertices.
3. Merge each group of $d$ vertices into one vertex again.)
Prove that the probability, over this choice, that a
$d$-regular multigraph is a not a compressor is small. For this, express the probability
with the help of factorials and estimate the factorials using Stirling's\felindex{Stirling's formula} formula.}
\ujfld{Two-way expander}{
Recall the definition of expanders.\gyakindex{expander}
Call a $(d, \alpha, \lg, k)$-expander \ki{regular} if it is a
$d$-regular multigraph (the input nodes also have
degree $d$).
We will call this multigraph a \ki{two-way expander} if it is an
expander in both directions: from $A$ to $B$ and from $B$ to $A$.
Prove a theorem similar to the one in Problem~\ref{xrc.reg-compr}:
for all $\lg < d$ there is an $\alpha>0$
such that for all integers $k>0$ there is a two-way regular
$(d, \alpha, \lg, k)$-expander.}
\ujfld{Restoring organ from 3-way voting}{
The proof\label{xrc.many-layers} of Theorem~\ref{t.compr} did not guarantee a
$(\d, \alpha, \gamma, k)$-compressor with any $\gamma <1/2$, $\d<7$.
If we only want to use 3-way majority gates, consider the following
construction.
First create a 3-halfregular bipartite graph $G$ with inputs
$u_{1},\dots,u_{k}$ and
outputs $v_{1},\dots,v_{3k}$, with a 3-input majority
gate in each $v_{i}$.
Then create new nodes $w_{1},\dots,w_{k}$, with a 3-input majority
gate in each $w_{j}$.
The gate of $w_{1}$ computes the majority of $v_{1},v_{2},v_{3}$,
the gate of $w_{2}$ computes the majority of $v_{4},v_{5},v_{6}$, and so on.
Calculate whether a random choice of the graph $G$
will turn the circuit with inputs $(u_{1},\dots,u_{k})$ and outputs
$(w_{1},\dots,w_{k})$ into a restoring organ.
Then consider three stages instead of two, where $G$ has $9k$ outputs
and see what is gained.}
\ujfld{Restoring organ from NOR gates}{
The majority gate is not the only gate capable of strengthening the majority.
Recall the NOR gate introduced in Exercise~\ref{xrc.nor}, and form
$\mathrm{NOR}_{2}(x_{1},x_{2},x_{3},x_{4}) =
(x_{1}\mathbin {\mathrm{NOR}} x_{2})\mathbin {\mathrm{NOR}}(x_{3}\mathbin {\mathrm{NOR}} x_{4})$.
Show that a construction similar to Problem~\ref{xrc.many-layers} can be
carried out with $\mathrm{NOR}_{2}$ used in place of 3-way majority gates.}
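As a sanity check (our own addition, not part of the exercise), one can verify exhaustively that $\mathrm{NOR}_{2}$ computes $(x_{1}\vee x_{2})\wedge(x_{3}\vee x_{4})$. A short calculation then shows that on four independent inputs that are 1 with probability $p$, the output is 1 with probability $p^{2}(2-p)^{2}$, whose nontrivial fixed point sits at $p=(3-\sqrt{5})/2\approx 0.38$ rather than at $1/2$; this asymmetry is worth keeping in mind when adapting the construction.

```python
from itertools import product

def nor(a, b):
    """2-input NOR gate."""
    return 1 - (a | b)

def nor2(x1, x2, x3, x4):
    """NOR_2(x1,x2,x3,x4) = (x1 NOR x2) NOR (x3 NOR x4)."""
    return nor(nor(x1, x2), nor(x3, x4))

# Exhaustive check: NOR_2 equals (x1 OR x2) AND (x3 OR x4),
# so it reproduces a unanimous input exactly.
for x in product((0, 1), repeat=4):
    assert nor2(*x) == (x[0] | x[1]) & (x[2] | x[3])
assert nor2(0, 0, 0, 0) == 0 and nor2(1, 1, 1, 1) == 1
```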
\ujfld{More randomness, smaller restoring organs}{
Taking\label{xrc.dobr-restoring} the notation of Exercise~\ref{xrc.input-indep-errors},
suppose, as there, that the random variables
$F_{v}$ are independent of each other, and their
distribution does not depend on the Boolean input vector.
Apply the idea of Exercise~\ref{xrc.dobr-endgame} to the construction of
each restoring organ.
Namely, construct a different restoring organ for each
position: the choice depends on the circuit preceding this position.
Show that in this case, our error estimates can be significantly improved.
The improvement comes, just as in Exercise~\ref{xrc.dobr-endgame},
since now we do not have to multiply the error probability by
the number of all possible sets of size $\le\alpha k$ of tainted wires.
Since we know the distribution of this set, we can average over it.}
\ujfld{Near-sorting with expanders}{
In this problem,\label{xrc.AKS-near-sort} we show that expanders can be used for ``near-sorting''.
Let $G$ be a regular two-way
$(d, \alpha,\lg, k)$-expander, whose two parts of size $k$
are $A$ and $B$.
According to a theorem of Kőnig, (the edge-set of)
every $d$-regular bipartite multigraph
is the disjoint union of (the edge-sets of) $d$ complete matchings
$M_{1},\dots,M_{d}$.
To such an expander, we assign a Boolean circuit of depth $d$ as follows.
The circuit's nodes are subdivided into levels $i=0,1,\dots,d$.
On level $i$ we have two disjoint sets $A_{i},B_{i}$ of size $k$ of nodes
$a_{ij}$, $b_{ij}$ ($j=1,\dots,k$).
The Boolean value on $a_{ij},b_{ij}$ will be $x_{ij}$ and $y_{ij}$
respectively.
Denote the vector of $2k$ values at stage $i$ by
$\boldsymbol{z}_{i} = (x_{i1},\dots,x_{ik},y_{i1},\dots,y_{ik})$.
If $(p,q)$ is an edge in the matching $M_{i},$
then we put an $\wedge$ gate into $a_{ip}$, and a $\vee$ gate into
$b_{iq}$:
\begin{equation*}
x_{ip} = x_{(i-1)p}\wedge y_{(i-1)q}, \quad
y_{iq} = x_{(i-1)p}\vee y_{(i-1)q} \koz .
\end{equation*}
This network is trying to ``sort''
the 0's to $A_{i}$ and the 1's to $B_{i}$ in $d$ stages.
More generally, the values in the vectors $\boldsymbol{z}_{i}$
could be arbitrary numbers.
Then if $x\wedge y$ still means $\min(x,y)$ and
$x\vee y$ means $\max(x,y)$ then each vector $\boldsymbol{z}_{i}$
is a permutation of the vector $\boldsymbol{z}_{0}.$
Let $\mathbf{G}=(1+\lambda)\alpha.$
Prove that $\boldsymbol{z}_{d}$ is $\mathbf{G}$-\ki{sorted} in the sense that
for all $m,$ at least $\mathbf{G}m$ among the $m$ smallest values
of $\boldsymbol{z}_{d}$ are in its left half and at least $\mathbf{G}m$ among the $m$
largest values are in its right half.}
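To see the mechanics, here is a small Python simulation of the comparator network (our own addition). The matchings are chosen uniformly at random rather than taken from an expander, an assumption made for illustration only; it already suffices to check that each $\boldsymbol{z}_{i}$ is a permutation of $\boldsymbol{z}_{0}$ and that small values migrate to the $A$ side.

```python
import random

def stage(x, y, matching):
    """One level of the network: the matching edge (p, q) puts
    min(x[p], y[q]) into a_p and max(x[p], y[q]) into b_q
    (AND = min and OR = max on arbitrary numbers)."""
    nx, ny = x[:], y[:]
    for p, q in matching:
        nx[p], ny[q] = min(x[p], y[q]), max(x[p], y[q])
    return nx, ny

rng = random.Random(1)
k, d = 8, 5
x = [rng.random() for _ in range(k)]
y = [rng.random() for _ in range(k)]
z0 = sorted(x + y)
for _ in range(d):
    left = list(range(k))
    rng.shuffle(left)
    matching = list(zip(left, range(k)))  # a random perfect matching
    x, y = stage(x, y, matching)
# Each level only swaps values pairwise, so z_d is a permutation of z_0 ...
assert sorted(x + y) == z0
# ... and after any stage every x entry is the min of its matched pair,
# so the small values accumulate on the A side.
assert sum(x) <= sum(y)
```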
\ujfld{Restoring organ from near-sorters}{
Develop\label{xrc.AKS-restoring} a new restoring organ using expanders, as follows.
First, split each wire of the input cable $A$,
to get two sets $A'_{0},B'_{0}$.
Attach the $\mathbf{G}$-sorter of Problem~\ref{xrc.AKS-near-sort}, getting
outputs $A'_{d},B'_{d}$.
Now split the wires of $B'_{d}$ into two sets $A''_{0},B''_{0}$.
Attach the $\mathbf{G}$-sorter again, getting
outputs $A''_{d},B''_{d}$.
Keep only $B=A''_{d}$ for the output cable.
Show that the Boolean vector circuit leading from $A$ to $B$ can be used as
a restoring organ.}
\end{fld}
\begin{fejmegj}
The large deviation theorem (Theorem~\ref{t.large-dev}), or theorems
similar to it, are sometimes attributed to Chernoff or Bernstein.
One of its frequently used variants is given in
Exercise~\ref{xrc.Chernoff}.
The problem of reliable computation with unreliable components was
addressed by John von Neumann in~\cite{Neu56} on the model of logic circuits.
A complete proof of the result of that paper (with a different restoring
organ) appeared first in the paper \cite{DobrOrtyUp77} of R.~L.~Dobrushin and S.~I.~Ortyukov.
Our presentation relied on parts of the paper \cite{Pip85} of N.~Pippenger.
The lower-bound result of Dobrushin and Ortyukov in the
paper~\cite{DobrOrtyLo77} (corrected
in~\cite{Pip91}, \cite{ReischSchm91} and \cite{GacsGal94}), shows
that redundancy of $\log n$ is unavoidable for a
general reliable computation whose complexity is $n$.
However, this lower bound only shows the
necessity of putting the input into a redundantly encoded form
(otherwise critical information may be lost in the first step).
As shown in~\cite{Pip85},
for many important function classes, linear redundancy is achievable.
It seems natural to separate the cost of the initial encoding:
it might be possible to perform
the rest of the computation with much less redundancy.
An important step in this direction has been made by D.~Spielman in the
paper \cite{Spi96}\nevindex{Spielman, Daniel A.} in (essentially) the clocked-circuit model.
Spielman takes a parallel computation with time $t$ running on $w$
elementary components and makes it reliable using only
$(\log w)^{c}$ times more processors and running it $(\log w)^{c}$ times longer.
The failure probability will be $t\exp(-w^{1/4})$.
This is small as long as $t$ is not much larger than $\exp(w^{1/4})$.
So, the redundancy is bounded by some power of the logarithm of the \emph{space
requirement}; the time requirement does not enter explicitly.
For Boolean circuits, time and space complexity are not defined separately.
The size of the circuit is analogous to the quantity obtained in other
models by taking the product of space and time complexity.
Questions more complex than Problem~\ref{xrc.crit} have been studied
in~\cite{PippMaj89}. The method of Problem~\ref{xrc.reg-compr}, for generating random
$d$-regular multigraphs is analyzed for example
in~\cite{BenderCanfield78}\nevindex{Bender, E.}\nevindex{Canfield, R.}.
It is much harder to generate simple regular graphs (not multigraphs) uniformly.
See for example~\cite{KimVu03}.\nevindex{Kim, J. H.}\nevindex{Vu, V. H.}
The result of Exercise~\ref{xrc.large-bool} is due to C.~Shannon,
see~\cite{ShannonCirc49}.\nevindex{Shannon, Claude Elwood (1916--2001)}
The asymptotically best circuit size for the worst functions
was found by Lupanov in~\cite{LupanovCircUB58}.\nevindex{Lupanov, Oleg Borisovitsch}
Exercise~\ref{xrc.iter-maj-converg} is based
on~\cite {DobrOrtyUp77},\nevindex{Dobrushin, Roland Lvovitsch (1929--1995)}
and Exercise~\ref{xrc.dobr-ort-lb} is based on~\cite{DobrOrtyLo77}\nevindex{Ortyukov, S. I.}
(and its corrections).
Problem~\ref{xrc.AKS-near-sort} is based on the starting idea of
the $\lg n$ depth sorting networks in~\cite{AKMSort83}.
For storage in Boolean circuits
we partly relied on A.~V.~Kuznietsov's paper~\cite{Kuz73}
(the main theorem, on the existence of refreshers, is from M.~Pinsker).
Low density parity check codes were introduced by R.~G.~Gallager\nevindex{Gallager, Robert G.} in the
book~\cite{Gal63}, and their
use in reliable storage was first suggested by M.~G.~Taylor\nevindex{Taylor, M. G.} in the
paper~\cite{Tay68}.
New, constructive versions of these codes were developed by
M.~Sipser\nevindex{Sipser, Michael} and D.~Spielman\nevindex{Spielman, Daniel A.}
in the paper \cite{Spi95}, with superfast coding and decoding.
Expanders, invented by Pinsker in~\cite{Pinsker73}, have been used
extensively in theoretical computer science: see
for example~\cite{MotwaniRa95}\nevindex{Motwani, Rajeev}\nevindex{Raghavan, P.} for some more detail.
This book also gives references on the construction of graphs with large
eigenvalue gap. Exercise~\ref{xrc.randomized-maj} and Problem~\ref{xrc.dobr-restoring} are
based on~\cite{DobrOrtyUp77}.
The use of expanders in the role of refreshers was suggested by Pippenger
(private communication): our exposition follows Sipser and
Spielman in~\cite{SipserSpi96}.
Random expanders were found, for example, by Pinsker.
The needed expansion rate ($>3/4$ times the left degree) is larger than
what can be deduced from the size of the eigenvalue gap.
As shown in~\cite{Pinsker73} (see the proof in
Theorem~\ref{t.random-expander}) random expanders have the needed expansion rate.
Lately, constructive expanders with nearly maximal expansion rate were
announced by Capalbo, Reingold, Vadhan and Wigderson\nevindex{Wigderson, Avi}
in~\cite{Capalbo02}.
Reliable computation is also possible in a model of parallel computation
that is much more regular than logic circuits: in cellular automata.
We cannot present those results here: see for example the papers
\cite{Gacs01,GacsReif3dim88}\nevindex{Gács, Péter}.\index{reliable|)}
\end{fejmegj}
\newpage