Automata Theory

In Automata Theory, an Automaton is
a self-propelled computing device that follows predetermined (self-contained) instructions automatically on input.

As the point of Automata Theory is to model real-life computation machines, their formal definitions vary by author according to what exactly the author is trying to model.
Below is an overview of a basic, generic definition of an automaton which then specializes to those generally used to study Formal Languages & Grammar.

An Automaton is a 5-tuple M = \braket{\Sigma, \Gamma, Q, \delta, \lambda}:

  - \Sigma: input alphabet (a finite set of symbols)
  - \Gamma: output alphabet
  - Q: set of states
  - \delta : Q \times \Sigma \to Q: transition function
  - \lambda : Q \times \Sigma \to \Gamma: output function

An automaton is finite when Q is finite.
\lambda and \Gamma are “optional”; an automaton is not required to “output symbols”.

Automata can be nicely visualized (and represented) as transition diagrams – directed graphs whose vertices are states and edges are transitions (labelled with the necessary symbol(s) to “traverse” them).

Input

An input (word/string) w is a string of symbols a_1 a_2 a_3 \dots a_n with each a_i \in \Sigma. In Kleene Star notation, w \in \Sigma^\ast. (Recall a language L \subseteq \Sigma^\ast.)
An automaton reads (consumes) an input word one symbol at a time.

Note that \omega-languages are not considered here, and as such the inputs to all automata are finite in length.

Run

A run is a sequence of states q_0 q_1 q_2 \dots q_n where each q_i \in Q and q_{i+1} = \delta(q_i, a_{i+1}).

I.e., an automaton starts at state q_0, then for each input symbol, advances one state according to the transition function \delta. After reading the entire input word, the automaton’s run ends on state q_n, the final state.

Should the automaton have a \lambda, it also emits an output symbol b \in \Gamma for each transition, where b_{i+1} = \lambda(q_i, a_{i+1}).

The inductive extension of \delta into \bar\delta : Q \times \Sigma^\ast \to Q maps entire input words to final states.

\bar\delta(q, w) \coloneqq \begin{cases} q & w = \varepsilon \\ \delta\big(\bar\delta(q, u), a\big) & w = ua,\ a \in \Sigma \end{cases}

where \varepsilon is the empty string.
\bar\lambda can be constructed in the same way.

This abstracts away the “process of computation” and models automata as input-output functions.
For automata that do not have a \lambda (i.e., that do not output symbols), their “output” is taken as q_n, the final state.
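The inductive definition of \bar\delta translates directly into code. A minimal sketch, assuming a hypothetical two-state transition table (not from these notes):

```python
# Hypothetical DFA-style transition table: 'a' toggles q0 <-> q1,
# 'b' is a self-loop on either state.
delta = {
    ("q0", "a"): "q1",
    ("q0", "b"): "q0",
    ("q1", "a"): "q0",
    ("q1", "b"): "q1",
}

def delta_bar(q, w):
    """Inductive extension of delta: maps (state, whole word) -> final state,
    mirroring the recursive definition above."""
    if w == "":               # base case: empty word stays put
        return q
    u, a = w[:-1], w[-1]      # peel off the last symbol: w = ua, a in Sigma
    return delta[(delta_bar(q, u), a)]
```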

Acceptor

To ship the two lovebirds, automata and Formal Languages, we equip a (\lambda- and \Gamma-less) automaton with

  - an initial state q_0 \in Q and
  - a set of accepting (final) states F \subseteq Q

to form an acceptor (the variant of automata used to study formal languages, and everything discussed hereafter).

Observe that acceptors are boolean-output functions.


Accepting Word
A word w \in \Sigma^\ast is an accepting word for acceptor M if \bar\delta^M(q_0, w) \in F^M.
Recognized Language
The language recognized by (or the language of) an acceptor is the set of all its accepting words. I.e.,
M recognizes L \subseteq \Sigma^\ast iff L = \big\{ w \in (\Sigma^M)^\ast \mid \bar\delta^M(q_0^M, w) \in F^M \big\}.
I.e., M will tell you, for every w \in \Sigma^\ast, whether w \in L – whether w is a sentence of L.
Recognizable Languages
The (class of) recognizable languages for a class of automata is, well, the set of languages each of which is recognized by some automaton of said class.
(See below for two automata classes: Finite and Pushdown.)
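To make the acceptor-as-boolean-function view concrete, here is a minimal sketch; the DFA (an assumed example accepting words over {a, b} that end in b) is not from the notes:

```python
# Hypothetical acceptor: two states tracking whether the last symbol was 'b'.
delta = {
    ("nb", "a"): "nb", ("nb", "b"): "b",
    ("b",  "a"): "nb", ("b",  "b"): "b",
}
q0, F = "nb", {"b"}

def accepts(w):
    q = q0
    for a in w:               # consume the input one symbol at a time
        q = delta[(q, a)]
    return q in F             # boolean output: did the run end in F?
```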

In addition, automata are commonly modeled with forms of memory (e.g., pushdown automata, below) and non-functional transitions (one-to-many, aka. nondeterminism).
Nondeterministic automata can be in more than one state at once, due to transitions with multiple output states; i.e., \delta is a binary relation, and no longer a function.

Note that nondeterminism here solely refers to the ability for an automaton to be in more than one state at once; it does not mean the automaton produces random results – whether a given word is accepted is still fully determined by the input.

| Automaton Class | Recognizable Languages |
| --- | --- |
| Deterministic/Nondeterministic Finite State Machine (FSM) | Regular (Type-3) |
| Deterministic Pushdown Automaton (DPDA) | (deterministic) Context-Free |
| Pushdown Automaton (NPDA) | Context-Free (Type-2) |
| Linear-bounded Automaton (LBA) | Context-Sensitive (Type-1) |
| Turing Machine (TM) | Recursively Enumerable (Type-0) |

Finite Automata

Aka. Finite State Machine (FSM).

A Finite Automaton (FA) is a 5-tuple \braket{Q, \Sigma, \delta, q_0, F} (see acceptor above):

  - Q: finite set of states
  - \Sigma: input alphabet
  - \delta : Q \times \Sigma \to Q: transition function
  - q_0 \in Q: initial state
  - F \subseteq Q: set of accepting states

Note that an FA has no memory / auxiliary storage. Transitions depend purely on the current state and input symbol.

Nondeterministic FA (NFA) and Deterministic FA (DFA) are equivalent (always inter-convertible).

Recognizes all Regular (Type-3) Languages.
L is regular iff there exists a DFA whose language is L.

Completeness:
A DFA is complete if \delta is total.
Technically all DFAs should be complete, implying, in practice, that we explicitly define “dead”/“trash” states… but if not, undefined transitions are assumed to denote failure (immediate rejection), or the transition into an implicit non-accepting dead state.
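The implicit-dead-state convention can be sketched like so; the deliberately partial transition table (a DFA for (ab)*) is a hypothetical example:

```python
# Deliberately partial table: only the transitions of (ab)* are defined.
delta = {("q0", "a"): "q1", ("q1", "b"): "q0"}

def accepts_partial(w):
    q = "q0"
    for a in w:
        q = delta.get((q, a))    # undefined transition?
        if q is None:
            return False         # -> implicit dead state: immediate rejection
    return q == "q0"             # q0 is the only accepting state
```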

Nondeterministic Finite Automaton

\bm\varepsilon-Closure
As we now have \varepsilon-transitions, for any current state, all other states accessible via \varepsilon-transition(s) are also current-states.
The set of \varepsilon-reachable states from a given state (which includes itself) is its \varepsilon-closure.
I.e., the transitive closure of \xrightarrow{\varepsilon}.

NFA \bm\to DFA
For NFA N = \braket{Q, \Sigma, \delta, q_0, F}, its equivalent DFA is M = \braket{Q', \Sigma, \delta', q_0', F'}:

  - Q' = \mathcal{P}(Q) (each DFA state is a set of NFA states)
  - \delta'(R, a) = E\big(\bigcup_{r \in R} \delta(r, a)\big)
  - q_0' = E(\{q_0\})
  - F' = \{ R \in Q' \mid R \cap F \ne \varnothing \}

where E : \mathcal{P}(Q) \to \mathcal{P}(Q) is the union of the \varepsilon-closures for each of the states in its input.
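A sketch of the subset construction with \varepsilon-closure; the dict-based representation and the example NFA (accepting words ending in “ab”) are assumptions for illustration:

```python
from itertools import chain

# Hypothetical NFA: state -> symbol ('' = epsilon) -> set of successor states.
nfa_delta = {
    "q0": {"a": {"q0", "q1"}, "b": {"q0"}},
    "q1": {"b": {"q2"}},
    "q2": {},
}
nfa_F = {"q2"}

def eps_closure(states):
    """E(.): all states reachable via epsilon-transitions (incl. themselves)."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in nfa_delta.get(q, {}).get("", set()) - seen:
            seen.add(r)
            stack.append(r)
    return frozenset(seen)

def nfa_to_dfa(start, alphabet):
    """Build only the reachable part of the subset-construction DFA."""
    q0 = eps_closure({start})
    dfa, todo = {}, [q0]
    while todo:
        R = todo.pop()
        if R in dfa:
            continue
        dfa[R] = {}
        for a in alphabet:
            S = eps_closure(set(chain.from_iterable(
                nfa_delta.get(q, {}).get(a, set()) for q in R)))
            dfa[R][a] = S
            todo.append(S)
    F = {R for R in dfa if R & nfa_F}   # accept if any NFA state accepts
    return q0, dfa, F
```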

Closure

Observe that \varnothing \circ L = \varnothing and \varnothing^\ast = \{\varepsilon\}.

For (complete) DFA M_1, \bar M_1 (complement) has:

  - \bar F = Q_1 \setminus F_1 (everything else unchanged)

Observe that the complement of a DFA is itself with all accepting and non-accepting states swapped.

For DFAs M_1 and M_2, M_1 \cup M_2 (union w/ M_2) has:

  - Q = Q_1 \times Q_2
  - \delta\big((q_1, q_2), a\big) = \big(\delta_1(q_1, a), \delta_2(q_2, a)\big)
  - q_0 = (q_0^1, q_0^2)
  - F = (F_1 \times Q_2) \cup (Q_1 \times F_2)

Observe that this is just an NFA (with two possible simultaneous states, one in M_1, other in M_2) converted to a DFA.

For DFAs M_1 and M_2, M_1 \circ M_2 (concatenation w/ M_2) has (as an NFA):

  - Q = Q_1 \cup Q_2
  - \delta = \delta_1 \cup \delta_2, plus \varepsilon-transitions from each state in F_1 to q_0^2
  - q_0 = q_0^1
  - F = F_2

Observe that the only nontrivial change is adding \varepsilon-transitions from M_1's accepting states to M_2's initial state (and making F_1 no longer accept).

For DFA M_1, M_1^\ast (Kleene star) has:

too lazy to write out formally, but:

  1. add \varepsilon-transitions from all accepting states to q_0
  2. to also accept \varepsilon (as required by ^\ast), introduce a new state as the new q_0, which is an accepting state and has an \varepsilon-transition to the old q_0.

Pumping Lemma

All sufficiently long sentences (strings) of a regular language can be pumped – have a middle substring repeated (self-concat) an arbitrary number of times while still remaining within the language.
“Sufficiently long” is defined by a length threshold p.

Finite languages vacuously satisfy the lemma (take p greater than the longest sentence).

\forall L \in \mathrm{RL} : \exists p \in \Z^+ : \forall w \in L, |w| \ge p : \exists xyz = w, |y| > 0 : xy^\ast z \subseteq L

Also, |xy| \le p.

Note that the pumping lemma is a necessary but not sufficient condition for regularity. (Hence it alone can only be used to disprove, not prove.)

Alternatively, as three properties:

  1. |y| \ge 1
  2. |xy| \le p
  3. \forall n \ge 0 : xy^n z \in L

For L \subseteq \Sigma^\ast,
\forall p : \exists s, |s| \ge p : \big( \forall xyz = s : \lnot P(x, y, z) \big) \implies L \not\in \mathrm{RL} \ .

where p \in \Z^+, s \in L, and P is the pumping lemma (the three properties).
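A classic worked application (standard textbook example, not from the notes above): take L = \{ a^n b^n \mid n \ge 0 \}. For any p, pick s = a^p b^p \in L. Any split xyz = s with |xy| \le p and |y| \ge 1 places y entirely within the leading a's, so xy^2z = a^{p+|y|} b^p \not\in L. Every split violates property 3, hence L \not\in \mathrm{RL}.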

Key Limitations

Regular Expressions

Note that various “regex” languages used in programming have long exceeded the capabilities of a (formal) Regular Grammar. Here we speak of a strictly regular regex.

\begin{align*} \mathrm{regex} &\coloneqq a \mid \varepsilon \mid \varnothing \qquad a \in \Sigma \\ \mathrm{regex} &\coloneqq (\mathrm{regex} \cup \mathrm{regex}) \mid (\mathrm{regex} \circ \mathrm{regex}) \mid \mathrm{regex}^\ast \end{align*}

In CS-style notation, \cup (union) is denoted by \mid and \circ (concatenation) by juxtaposition. Also, there’s commonly + for “one or more” – i.e., a^+ \coloneqq aa^\ast.
Then, operator precedence is 1. parentheses (), 2. star ^\ast, 3. concat., 4. union \mid.

Generalized NFA (GNFA)
NFAs where transitions are regexes

just for the convenience of writing (and useful in manual DFA \to Regex conversion)

Regex \bm\to NFA

Structural induction

Base cases
a is trivially a DFA (a single a-transition to an accepting state)
\varepsilon is trivially a DFA (the only (initial) state is an accepting state)
\varnothing is trivially a DFA (one with no accepting states)
Induction Hypotheses
R_1 and R_2 are (arbitrary) regexes with equivalent DFAs
Inductive Steps
R_1 \cup R_2: union closure proof!!!
R_1 \circ R_2: concat closure proof!!!
R_1^\ast: star closure proof!!!
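The three inductive steps are exactly Thompson's construction. A compact sketch, where the fragment representation (start, accept, transition dict; "" stands for \varepsilon) is an assumption for illustration:

```python
import itertools

_ids = itertools.count()  # fresh state names

def _new():
    return next(_ids)

def lit(a):
    """Base case: a single-symbol (or epsilon, a == "") fragment."""
    s, f = _new(), _new()
    return s, f, {s: {a: {f}}}

def union(n1, n2):
    """R1 | R2: new start/accept joined by epsilon-transitions."""
    s1, f1, d1 = n1
    s2, f2, d2 = n2
    s, f = _new(), _new()
    d = {**d1, **d2}
    d.setdefault(s, {})[""] = {s1, s2}
    d.setdefault(f1, {})[""] = {f}
    d.setdefault(f2, {})[""] = {f}
    return s, f, d

def concat(n1, n2):
    """R1 R2: epsilon from M1's accept to M2's start."""
    s1, f1, d1 = n1
    s2, f2, d2 = n2
    d = {**d1, **d2}
    d.setdefault(f1, {}).setdefault("", set()).add(s2)
    return s1, f2, d

def star(n1):
    """R1*: loop back, plus a new accepting start (to accept epsilon)."""
    s1, f1, d1 = n1
    s, f = _new(), _new()
    d = dict(d1)
    d.setdefault(s, {})[""] = {s1, f}
    d.setdefault(f1, {})[""] = {s1, f}
    return s, f, d

def accepts(nfa, w):
    """Simulate the fragment as an NFA (with epsilon-closure)."""
    s, f, d = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in d.get(q, {}).get("", set()) - seen:
                seen.add(r)
                stack.append(r)
        return seen

    cur = closure({s})
    for a in w:
        cur = closure({r for q in cur for r in d.get(q, {}).get(a, set())})
    return f in cur
```

For example, `concat(concat(star(union(lit("a"), lit("b"))), lit("a")), lit("b"))` builds the NFA for (a \cup b)^\ast ab.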

DFA \bm\to Regex
See notes on how to do the regex thing.

Pushdown Automata

Nondeterministic PDAs (NPDA) are more powerful than Deterministic PDAs (DPDA).
NPDAs can recognize all Context-Free (Type-2) Languages, while DPDAs recognize only a subset.
Buuut we won’t talk about DPDAs :) .

A (nondeterministic) Pushdown Automaton ((N)PDA) is a 6-tuple \braket{Q, \Sigma, \Gamma, \delta, q_0, F}:

  - Q: finite set of states
  - \Sigma: input alphabet
  - \Gamma: stack alphabet
  - \delta : Q \times \Sigma_\varepsilon \times \Gamma_\varepsilon \to \mathcal{P}(Q \times \Gamma_\varepsilon): transition function
  - q_0 \in Q: initial state
  - F \subseteq Q: set of accepting states

Note that PDAs are FAs equipped with auxiliary storage in the form of a (LIFO) stack.
…And we’re smushing \lambda's responsibility of outputting \Gamma symbols into \delta.1

In addition to being in a set of states, every point in a run also has an associated stack along with its contents. Every transition depends on the top of the stack and manipulates it.

For DFAs, a point in a run is just a state (and remaining input).
For DPDAs, a point in a run is a state, stack, remaining input triple.
The nondeterministic variants have multiple nonlinear “sub-runs” per run.

A \delta(\dots, \dots, a) \to (\dots, b) transition replaces the top of the stack (which must be a) with b.

\varepsilon's are also allowed for \Gamma in both \delta's input and output, where \varepsilon \to \gamma pushes \gamma onto the stack, \gamma \to \varepsilon pops (\gamma from) the stack, and \varepsilon \to \varepsilon leaves the stack unchanged.
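A sketch of these stack operations on the classic example L = \{ a^n b^n \mid n \ge 1 \}; the example language and state names are assumptions, not from the notes:

```python
def pda_accepts(w):
    """Push a marker per 'a' (eps -> A), pop one per 'b' (A -> eps)."""
    stack = ["$"]                 # '$' = bottom-of-stack marker
    state = "push"
    for c in w:
        if state == "push" and c == "a":
            stack.append("A")     # eps -> A : push
        elif c == "b" and stack[-1] == "A":
            state = "pop"         # once we see a 'b', only 'b's may follow
            stack.pop()           # A -> eps : pop
        else:
            return False          # no transition defined: reject
    # accept iff we saw at least one 'b' and the counts balanced out
    return state == "pop" and stack == ["$"]
```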

Pumping Lemma

\forall L \in \mathrm{CFL} : \exists p \in \Z^+ : \forall s \in L, |s| \ge p : \exists uvxyz = s, |vy| > 0 : \forall n \in \N : uv^n xy^n z \in L

and |vxy| \le p.

Again, as three properties:

  1. |vy| \ge 1
  2. |vxy| \le p
  3. \forall n \in \N : uv^n xy^n z \in L

Branching factor b is the max number of symbols on the RHS of a rule.
The pumping length of L is p = b^{|N| + 1} (N is the set of non-terminals in the grammar).

p is a min length that guarantees a repeated non-terminal R (in the parse tree)
p is an upper bound on the number of terminals generated by the upper R.
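A standard worked example (assumed, not from the notes): L = \{ a^n b^n c^n \mid n \ge 0 \} is not context-free. For any p, take s = a^p b^p c^p. Since |vxy| \le p, the window vxy touches at most two of the three symbol blocks, so pumping up (n = 2) increases at most two of the symbol counts and unbalances the untouched block – property 3 fails for every split.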

Turing Machine

I typo this as turning machine way too often.

A Turing Machine (TM) is a 7-tuple \braket{Q, \Sigma, \Gamma, \delta, q_0, q_{\text{accept}}, q_{\text{reject}}}:

  - Q: finite set of states
  - \Sigma: input alphabet (not containing the blank symbol \sqcup)
  - \Gamma: tape alphabet (\Sigma \subseteq \Gamma, \sqcup \in \Gamma)
  - \delta : Q \times \Gamma \to Q \times \Gamma \times \{L, R\}: transition function
  - q_0, q_{\text{accept}}, q_{\text{reject}} \in Q: initial, accepting, and rejecting states

Storage (tape) is infinite, like the stack of PDAs.
Except here the input is placed on the tape, and we can read/write arbitrary locations.

Configuration
A configuration is a point in a run which now includes (in addition to the state)
contents of tape and
location of read head.
I.e., a configuration C = w_L q_i w_R has

  - tape contents w_L w_R (followed by blanks),
  - current state q_i, and
  - the read head on the first symbol of w_R.

TM M = \braket{Q, \Sigma, \Gamma, \delta, q_0, q_{\text{accept}}, q_{\text{reject}}} accepts input string s if there exists a run C_1 C_2 \dots C_n where

  1. C_1 = q_0 s (start by placing the head on the first symbol of the input)
  2. C_i \to C_{i+1} (C_i yields C_{i+1})
    \delta(q_i, \mathrm{x}) = (q_j, \mathrm{y}, \mathrm{L}) \implies a q_i \mathrm{x} b \to q_j a \mathrm{y} b
    \delta(q_i, \mathrm{x}) = (q_j, \mathrm{y}, \mathrm{R}) \implies a q_i \mathrm{x} b \to a \mathrm{y} q_j b
  3. C_n = \dots q_{\text{accept}} \dots
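The yields relation can be sketched as a tiny simulator. The example machine (accepting inputs that contain a '1', with '_' standing in for the blank) is an assumption for illustration:

```python
# Hypothetical TM: scan right; accept on the first '1', reject at the blank.
delta = {
    ("q0", "0"): ("q0", "0", "R"),    # keep scanning right
    ("q0", "1"): ("qacc", "1", "R"),  # found a '1': accept
    ("q0", "_"): ("qrej", "_", "R"),  # hit the blank: reject
}

def tm_accepts(s, max_steps=1000):
    tape, head, q = list(s) + ["_"], 0, "q0"
    for _ in range(max_steps):
        if q in ("qacc", "qrej"):          # halting configurations
            return q == "qacc"
        q, sym, move = delta[(q, tape[head])]
        tape[head] = sym                   # write, then move the head
        head = head + 1 if move == "R" else max(head - 1, 0)
        if head >= len(tape):
            tape.append("_")               # tape is infinite to the right
    raise RuntimeError("no halt within step bound")
```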

Recognition & Decidability
A TM recognizes a language L if it accepts exactly the strings s \in L (each in finite steps); on s \not\in L it may reject or loop forever.
A TM decides a language L if it recognizes L and rejects all s \not\in L (in finite steps).

Recognition: partial function returning true for acceptance.
Decidability: total function always returning true/false.
Co-Turing-Recognizable: recognizes \bar L (rejects all non-L in finite steps).

Turing-decidable languages are a strict subset of recursively enumerable (Type-0) languages.

Some decidable languages:
\begin{align*} L &= \{ a^i b^j c^k \mid i \times j = k \land i, j, k \ge 1 \} && \text{(arbitrary arithmetic)} \\ L &= \{ ww \mid w \in \Sigma^\ast \} && \text{((sub)string matching)} \\ L &= \{ \braket{D, w} \mid D \text{ is a DFA that accepts } w \} && \text{(DFA Acceptance)} \\ L &= \{ \braket{D} \mid D \text{ is a DFA} \land L(D) = \varnothing \} && \text{(DFA emptiness)} \\ L &= \{ \braket{G, w} \mid G \text{ is a CFG that generates } w \} && \text{(CFG Acceptance)} \end{align*}

Chomsky Normal Form (CNF)

Used to make the last (CFG) example above decidable.

Any CFG can be written in Chomsky Normal Form. In CNF, every derivation of a string w, |w| = n, requires exactly 2n - 1 steps (rule applications).

Specifically, the n characters are produced by n (corresponding) non-terminals, which are expanded from S in n - 1 steps. Then, n more derivations convert the non-terminal string into w.
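As a quick sanity check with an assumed toy grammar (not from the notes): take the CNF grammar S \to AB, A \to a, B \to b and derive w = ab (n = 2). The derivation S \Rightarrow AB \Rightarrow aB \Rightarrow ab takes 3 = 2 \cdot 2 - 1 steps: n - 1 = 1 branching step (the A \to BC form) and n = 2 terminal substitutions (the A \to a form).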

Constraints on Production Rules:
S (start symbol) cannot appear on the RHS of any rule.
\varepsilon may appear only as the rule S \to \varepsilon.
Then, all other rules are in one of two forms:
A \to a
A \to BC
where A, B, C \in N (non-terminals) and a \in \Sigma (terminals).

Note that this forces the expansion to be a binary tree, hence the exact bounding on derivation steps shown above.

CFG \bm\to CNF

  1. eliminate RHS S's (A \to S\dots)
    if any, introduce a new start symbol to proxy the old one
  2. eliminate \varepsilon-Productions (A \to \varepsilon \ (\mid \dots))
    by collapsing into all super-rules (rules that produce A, like B \to A\dots)
  3. eliminate Unit Productions (A \to B)
    by expanding the RHS B
  4. substitute Terminals
    introduce new non-terminals to proxy each terminal
    e.g., for A \to aB, introduce C \to a to write A \to CB
  5. chop down Long Productions (more than two symbols on the RHS)
    e.g., write A \to BBB as A \to CB, C \to BB.
    (this reduces the RHS by 1 symbol at a time, so it works for all lengths)

This algorithm finishes in finite steps (trust me bro).

Other Computation Theory Stuffs

Universal Turing Machine: we can write a TM that simulates any other TM.
(They must share a finite alphabet)

L = \{ \braket{M, w} \mid M \text{ is a TM that accepts } w \}, Turing Acceptance, is undecidable. (Because of the halting problem.)

Proof of TM Acceptance Undecidability (by contradiction):
Assume some TM H decides L.
Let a TM D take in any TM N as the string \braket{N}, and
accept if [N rejects \braket{N}];
reject if [N accepts \braket{N}].

I.e., D reverses the self-acceptance of any given input.
D can do this using H; it feeds it \braket{N, \braket{N}} (and flips the result).

Consider when D is given w = \braket{D} as input:
if H accepts \braket{D, w}, it means D accepts w… but D will flip H's result and reject w… oops.
\therefore No TM decides L.

And from this it is obvious that \bar L is not Turing-recognizable (as it is only co-Turing-recognizable).


  1. because this uni course is oddballs.
    I guess it makes some sense as the stack is not exactly an output string of the automata, and that state transitions also depend on the (top of) the stack. ↩︎