Automata Theory

In Automata Theory, an Automaton is
a self-propelled computing device that follows predetermined (self-contained) instructions automatically on input.

As the point of Automata Theory is to model real-life computation machines, their formal definitions vary by author according to what exactly the author is trying to model.
Below is an overview of a basic, generic definition of an automaton which then specializes to those generally used to study Formal Languages & Grammar.

An Automaton is a 5-tuple M = \braket{\Sigma, \Gamma, Q, \delta, \lambda}:

  - \Sigma: input alphabet (a finite set of symbols)
  - \Gamma: output alphabet
  - Q: set of states
  - \delta : Q \times \Sigma \to Q: transition function
  - \lambda : Q \times \Sigma \to \Gamma: output function

An automaton is finite when Q is finite.
\lambda and \Gamma are “optional”; an automaton is not required to “output symbols”.

Automata can be nicely visualized (and represented) as transition diagrams – directed graphs whose vertices are states and edges are transitions (labelled with the necessary symbol(s) to “traverse” them).

Input

An input (word/string) w is a string of symbols a_1 a_2 a_3 \dots a_n with each a_i \in \Sigma. In Kleene Star notation, w \in \Sigma^\ast. (Recall a language L \subseteq \Sigma^\ast.)
An automaton reads (consumes) an input word one symbol at a time.

Note that \omega-languages are not considered here, and as such the inputs to all automata are finite in length.

Run

A run is a sequence of states q_0 q_1 q_2 \dots q_n where each q_i \in Q and q_{i+1} = \delta(q_i, a_{i+1}).

I.e., an automaton starts at state q_0, then for each input symbol, advances one state according to the transition function \delta. After reading the entire input word, the automaton’s run ends on state q_n, the final state.

Should the automaton have a \lambda, it also emits an output symbol b \in \Gamma for each transition, where b_{i+1} = \lambda(q_i, a_{i+1}).

The inductive extension of \delta into \bar\delta : Q \times \Sigma^\ast \to Q maps entire input words to final states.

\bar\delta(q, w) \coloneqq \begin{cases} q & w = \varepsilon \\ \delta\big(\bar\delta(q, u), a\big) & w = ua,\ a \in \Sigma \end{cases}

where \varepsilon is the empty string.
\bar\lambda can be constructed in the same way.

This abstracts away the “process of computation” and models automata as input-output functions.
For automata that do not have a \lambda (i.e., that do not output symbols), their “output” is taken as q_n, the final state.
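The inductive definition of \bar\delta translates directly into code. A minimal sketch, assuming a hypothetical two-state transition table (not from these notes):

```python
# Hypothetical DFA-style transition table: 'a' toggles q0 <-> q1,
# 'b' is a self-loop on either state.
delta = {
    ("q0", "a"): "q1",
    ("q0", "b"): "q0",
    ("q1", "a"): "q0",
    ("q1", "b"): "q1",
}

def delta_bar(q, w):
    """Inductive extension of delta: maps (state, whole word) -> final state,
    mirroring the recursive definition above."""
    if w == "":               # base case: empty word stays put
        return q
    u, a = w[:-1], w[-1]      # peel off the last symbol: w = ua, a in Sigma
    return delta[(delta_bar(q, u), a)]
```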

Acceptor

To ship the two lovebirds, automata and Formal Languages, we equip a (\lambda- and \Gamma-less) automaton with

  - an initial state q_0 \in Q and
  - a set of accepting (final) states F \subseteq Q

to form an acceptor (the variant of automata used to study formal languages, and everything discussed hereafter).

Observe that acceptors are boolean-output functions.


Accepting Word
A word w \in \Sigma^\ast is an accepting word for acceptor M if \bar\delta^M(q_0, w) \in F^M.
Recognized Language
The language recognized by (or the language of) an acceptor is the set of all its accepting words. I.e.,
M recognizes L \subseteq \Sigma^\ast iff L = \big\{ w \in (\Sigma^M)^\ast \mid \bar\delta^M(q_0^M, w) \in F^M \big\}.
I.e., M will tell you, for every w \in \Sigma^\ast, whether w \in L – whether w is a sentence of L.
Recognizable Languages
The (class of) recognizable languages for a class of automata is, well, the set of languages each of which is recognized by some automaton of said class.
(See below for two automata classes: Finite and Pushdown.)
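To make the acceptor-as-boolean-function view concrete, here is a minimal sketch; the DFA (an assumed example accepting words over {a, b} that end in b) is not from the notes:

```python
# Hypothetical acceptor: two states tracking whether the last symbol was 'b'.
delta = {
    ("nb", "a"): "nb", ("nb", "b"): "b",
    ("b",  "a"): "nb", ("b",  "b"): "b",
}
q0, F = "nb", {"b"}

def accepts(w):
    q = q0
    for a in w:               # consume the input one symbol at a time
        q = delta[(q, a)]
    return q in F             # boolean output: did the run end in F?
```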

In addition, automata are commonly modeled with forms of memory (e.g., pushdown automata, below) and non-functional transitions (one-to-many, aka. nondeterminism).
Nondeterministic automata can be in more than one state at once, due to transitions with multiple output states; i.e., \delta is a binary relation, and no longer a function.

Note that nondeterminism here solely refers to the ability for an automaton to be in more than one state at once; it does not mean the automaton produces random results – whether a given word is accepted is still fully determined by the input.

| Automaton Class | Recognizable Languages |
| --- | --- |
| Deterministic/Nondeterministic Finite State Machine (FSM) | Regular (Type-3) |
| Deterministic Pushdown Automaton (DPDA) | (deterministic) Context-Free |
| Pushdown Automaton (NPDA) | Context-Free (Type-2) |
| Linear-bounded Automaton (LBA) | Context-Sensitive (Type-1) |
| Turing Machine (TM) | Recursively Enumerable (Type-0) |

Finite Automata

Aka. Finite State Machine (FSM).

A Finite Automaton (FA) is a 5-tuple \braket{Q, \Sigma, \delta, q_0, F} (see acceptor above):

  - Q: finite set of states
  - \Sigma: input alphabet
  - \delta : Q \times \Sigma \to Q: transition function
  - q_0 \in Q: initial state
  - F \subseteq Q: set of accepting states

Note that an FA has no memory / auxiliary storage. Transitions depend purely on the current state and input symbol.

Nondeterministic FA (NFA) and Deterministic FA (DFA) are equivalent (always inter-convertible).

Recognizes all Regular (Type-3) Languages.
L is regular iff there exists a DFA whose language is L.

Completeness:
A DFA is complete if \delta is total.
Technically all DFAs should be complete, implying, in practice, that we explicitly define “dead”/“trash” states… but if not, undefined transitions are assumed to denote failure (immediate rejection), or the transition into an implicit non-accepting dead state.
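The implicit-dead-state convention can be sketched like so; the deliberately partial transition table (a DFA for (ab)*) is a hypothetical example:

```python
# Deliberately partial table: only the transitions of (ab)* are defined.
delta = {("q0", "a"): "q1", ("q1", "b"): "q0"}

def accepts_partial(w):
    q = "q0"
    for a in w:
        q = delta.get((q, a))    # undefined transition?
        if q is None:
            return False         # -> implicit dead state: immediate rejection
    return q == "q0"             # q0 is the only accepting state
```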

Nondeterministic Finite Automaton

\bm\varepsilon-Closure
As we now have \varepsilon-transitions, for any current state, all other states accessible via \varepsilon-transition(s) are also current-states.
The set of \varepsilon-reachable states from a given state (which includes itself) is its \varepsilon-closure.
I.e., the transitive closure of \xrightarrow{\varepsilon}.

NFA \bm\to DFA
For NFA N = \braket{Q, \Sigma, \delta, q_0, F}, its equivalent DFA is M = \braket{Q', \Sigma, \delta', q_0', F'}:

  - Q' = \mathcal{P}(Q) (each DFA state is a set of NFA states)
  - \delta'(R, a) = E\big(\bigcup_{r \in R} \delta(r, a)\big)
  - q_0' = E(\{q_0\})
  - F' = \{ R \in Q' \mid R \cap F \ne \varnothing \}

where E : \mathcal{P}(Q) \to \mathcal{P}(Q) is the union of the \varepsilon-closures for each of the states in its input.
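A sketch of the subset construction with \varepsilon-closure; the dict-based representation and the example NFA (accepting words ending in “ab”) are assumptions for illustration:

```python
from itertools import chain

# Hypothetical NFA: state -> symbol ('' = epsilon) -> set of successor states.
nfa_delta = {
    "q0": {"a": {"q0", "q1"}, "b": {"q0"}},
    "q1": {"b": {"q2"}},
    "q2": {},
}
nfa_F = {"q2"}

def eps_closure(states):
    """E(.): all states reachable via epsilon-transitions (incl. themselves)."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in nfa_delta.get(q, {}).get("", set()) - seen:
            seen.add(r)
            stack.append(r)
    return frozenset(seen)

def nfa_to_dfa(start, alphabet):
    """Build only the reachable part of the subset-construction DFA."""
    q0 = eps_closure({start})
    dfa, todo = {}, [q0]
    while todo:
        R = todo.pop()
        if R in dfa:
            continue
        dfa[R] = {}
        for a in alphabet:
            S = eps_closure(set(chain.from_iterable(
                nfa_delta.get(q, {}).get(a, set()) for q in R)))
            dfa[R][a] = S
            todo.append(S)
    F = {R for R in dfa if R & nfa_F}   # accept if any NFA state accepts
    return q0, dfa, F
```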

Closure

Observe that \varnothing \circ L = \varnothing and \varnothing^\ast = \{\varepsilon\}.

For (complete) DFA M_1, \bar M_1 (complement) has:

  - \bar F = Q_1 \setminus F_1 (everything else unchanged)

Observe that the complement of a DFA is itself with all accepting and non-accepting states swapped.

For DFAs M_1 and M_2, M_1 \cup M_2 (union w/ M_2) has:

  - Q = Q_1 \times Q_2
  - \delta\big((q_1, q_2), a\big) = \big(\delta_1(q_1, a), \delta_2(q_2, a)\big)
  - q_0 = (q_0^1, q_0^2)
  - F = (F_1 \times Q_2) \cup (Q_1 \times F_2)

Observe that this is just an NFA (with two possible simultaneous states, one in M_1, other in M_2) converted to a DFA.

For DFAs M_1 and M_2, M_1 \circ M_2 (concatenation w/ M_2) has (as an NFA):

  - Q = Q_1 \cup Q_2
  - \delta = \delta_1 \cup \delta_2, plus \varepsilon-transitions from each state in F_1 to q_0^2
  - q_0 = q_0^1
  - F = F_2

Observe that the only nontrivial change is adding \varepsilon-transitions from M_1's accepting states to M_2's initial state (and making F_1 no longer accept).

For DFA M_1, M_1^\ast (Kleene star) has:

too lazy to write out formally, but:

  1. add \varepsilon-transitions from all accepting states to q_0
  2. to also accept \varepsilon (as required by ^\ast), introduce a new state as the new q_0, which is an accepting state and has an \varepsilon-transition to the old q_0.

Pumping Lemma

All sufficiently long sentences (strings) of a regular language can be pumped – have a middle substring repeated (self-concat) an arbitrary number of times while still remaining within the language.
“Sufficiently long” is defined by a length threshold p.

Finite languages vacuously satisfy the lemma (take p greater than the longest sentence).

\forall L \in \mathrm{RL} : \exists p \in \Z^+ : \forall w \in L, |w| \ge p : \exists xyz = w, |y| > 0 : xy^\ast z \subseteq L

Also, |xy| \le p.

Note that the pumping lemma is a necessary but not sufficient condition for regularity. (Hence it alone can only be used to disprove, not prove.)

Alternatively, as three properties:

  1. |y| \ge 1
  2. |xy| \le p
  3. \forall n \ge 0 : xy^n z \in L

For L \subseteq \Sigma^\ast,
\forall p : \exists s, |s| \ge p : \big( \forall xyz = s : \lnot P(x, y, z) \big) \implies L \not\in \mathrm{RL} \ .

where p \in \Z^+, s \in L, and P is the pumping lemma (the three properties).
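A classic worked application (standard textbook example, not from the notes above): take L = \{ a^n b^n \mid n \ge 0 \}. For any p, pick s = a^p b^p \in L. Any split xyz = s with |xy| \le p and |y| \ge 1 places y entirely within the leading a's, so xy^2z = a^{p+|y|} b^p \not\in L. Every split violates property 3, hence L \not\in \mathrm{RL}.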

Key Limitations

Regular Expressions

Note that various “regex” languages used in programming have long exceeded the capabilities of a (formal) Regular Grammar. Here we speak of a strictly regular regex.

\begin{align*} \mathrm{regex} &\coloneqq a \mid \varepsilon \mid \varnothing \qquad a \in \Sigma \\ \mathrm{regex} &\coloneqq (\mathrm{regex} \cup \mathrm{regex}) \mid (\mathrm{regex} \circ \mathrm{regex}) \mid \mathrm{regex}^\ast \end{align*}

In CS-style notation, \cup (union) is denoted by \mid and \circ (concatenation) by juxtaposition. Also, there’s commonly + for “one or more” – i.e., a^+ \coloneqq aa^\ast.
Then, operator precedence is 1. parentheses (), 2. star ^\ast, 3. concat., 4. union \mid.

Generalized NFA (GNFA)
NFAs where transitions are regexes

just for the convenience of writing (and useful in manual DFA \to Regex conversion)

Regex \bm\to NFA

Structural induction

Base cases
a is trivially a DFA (a single a-transition to an accepting state)
\varepsilon is trivially a DFA (the only (initial) state is an accepting state)
\varnothing is trivially a DFA (one with no accepting states)
Induction Hypotheses
R_1 and R_2 are (arbitrary) regexes with equivalent DFAs
Inductive Steps
R_1 \cup R_2: union closure proof!!!
R_1 \circ R_2: concat closure proof!!!
R_1^\ast: star closure proof!!!
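The three inductive steps are exactly Thompson's construction. A compact sketch, where the fragment representation (start, accept, transition dict; "" stands for \varepsilon) is an assumption for illustration:

```python
import itertools

_ids = itertools.count()  # fresh state names

def _new():
    return next(_ids)

def lit(a):
    """Base case: a single-symbol (or epsilon, a == "") fragment."""
    s, f = _new(), _new()
    return s, f, {s: {a: {f}}}

def union(n1, n2):
    """R1 | R2: new start/accept joined by epsilon-transitions."""
    s1, f1, d1 = n1
    s2, f2, d2 = n2
    s, f = _new(), _new()
    d = {**d1, **d2}
    d.setdefault(s, {})[""] = {s1, s2}
    d.setdefault(f1, {})[""] = {f}
    d.setdefault(f2, {})[""] = {f}
    return s, f, d

def concat(n1, n2):
    """R1 R2: epsilon from M1's accept to M2's start."""
    s1, f1, d1 = n1
    s2, f2, d2 = n2
    d = {**d1, **d2}
    d.setdefault(f1, {}).setdefault("", set()).add(s2)
    return s1, f2, d

def star(n1):
    """R1*: loop back, plus a new accepting start (to accept epsilon)."""
    s1, f1, d1 = n1
    s, f = _new(), _new()
    d = dict(d1)
    d.setdefault(s, {})[""] = {s1, f}
    d.setdefault(f1, {})[""] = {s1, f}
    return s, f, d

def accepts(nfa, w):
    """Simulate the fragment as an NFA (with epsilon-closure)."""
    s, f, d = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in d.get(q, {}).get("", set()) - seen:
                seen.add(r)
                stack.append(r)
        return seen

    cur = closure({s})
    for a in w:
        cur = closure({r for q in cur for r in d.get(q, {}).get(a, set())})
    return f in cur
```

For example, `concat(concat(star(union(lit("a"), lit("b"))), lit("a")), lit("b"))` builds the NFA for (a \cup b)^\ast ab.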

DFA \bm\to Regex
See notes on how to do the regex thing.

Pushdown Automata

Nondeterministic PDAs (NPDA) are more powerful than Deterministic PDAs (DPDA).
NPDAs can recognize all Context-Free (Type-2) Languages, while DPDAs recognize only a subset.
Buuut we won’t talk about DPDAs :) .

A (nondeterministic) Pushdown Automaton ((N)PDA) is a 6-tuple \braket{Q, \Sigma, \Gamma, \delta, q_0, F}:

  - Q: finite set of states
  - \Sigma: input alphabet
  - \Gamma: stack alphabet
  - \delta : Q \times \Sigma_\varepsilon \times \Gamma_\varepsilon \to \mathcal{P}(Q \times \Gamma_\varepsilon): transition function
  - q_0 \in Q: initial state
  - F \subseteq Q: set of accepting states

Note that PDAs are FAs equipped with auxiliary storage in the form of a (LIFO) stack.
…And we’re smushing \lambda's responsibility of outputting \Gamma symbols into \delta.1

In addition to being in a set of states, every point in a run also has an associated stack along with its contents. Every transition depends on the top of the stack and manipulates it.

For DFAs, a point in a run is just a state (and remaining input).
For DPDAs, a point in a run is a state, stack, remaining input triple.
The nondeterministic variants have multiple nonlinear “sub-runs” per run.

A \delta(\dots, \dots, a) \to (\dots, b) transition replaces the top of the stack (which must be a) with b.

\varepsilon's are also allowed for \Gamma in both \delta's input and output, where \varepsilon \to \gamma pushes \gamma onto the stack, \gamma \to \varepsilon pops (\gamma from) the stack, and \varepsilon \to \varepsilon leaves the stack unchanged.
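A sketch of these stack operations on the classic example L = \{ a^n b^n \mid n \ge 1 \}; the example language and state names are assumptions, not from the notes:

```python
def pda_accepts(w):
    """Push a marker per 'a' (eps -> A), pop one per 'b' (A -> eps)."""
    stack = ["$"]                 # '$' = bottom-of-stack marker
    state = "push"
    for c in w:
        if state == "push" and c == "a":
            stack.append("A")     # eps -> A : push
        elif c == "b" and stack[-1] == "A":
            state = "pop"         # once we see a 'b', only 'b's may follow
            stack.pop()           # A -> eps : pop
        else:
            return False          # no transition defined: reject
    # accept iff we saw at least one 'b' and the counts balanced out
    return state == "pop" and stack == ["$"]
```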

Pumping Lemma

\forall L \in \mathrm{CFL} : \exists p \in \Z^+ : \forall s \in L, |s| \ge p : \exists uvxyz = s, |vy| > 0 : \forall n \in \N : uv^n xy^n z \in L

and |vxy| \le p.

Again, as three properties:

  1. |vy| \ge 1
  2. |vxy| \le p
  3. \forall n \in \N : uv^n xy^n z \in L

Branching factor b is the max number of symbols on the RHS of a rule.
The pumping length of L is p = b^{|N| + 1} (N is the set of non-terminals in the grammar).

p is a min length that guarantees a repeated non-terminal R (in the parse tree)
p is an upper bound on the number of terminals generated by the upper R.
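A standard worked example (assumed, not from the notes): L = \{ a^n b^n c^n \mid n \ge 0 \} is not context-free. For any p, take s = a^p b^p c^p. Since |vxy| \le p, the window vxy touches at most two of the three symbol blocks, so pumping up (n = 2) increases at most two of the symbol counts and unbalances the untouched block – property 3 fails for every split.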

Turing Machine

I typo this as turning machine way too often.

A Turing Machine (TM) is a 7-tuple \braket{Q, \Sigma, \Gamma, \delta, q_0, q_{\text{accept}}, q_{\text{reject}}}:

  - Q: finite set of states
  - \Sigma: input alphabet (not containing the blank symbol \sqcup)
  - \Gamma: tape alphabet (\Sigma \subseteq \Gamma, \sqcup \in \Gamma)
  - \delta : Q \times \Gamma \to Q \times \Gamma \times \{L, R\}: transition function
  - q_0, q_{\text{accept}}, q_{\text{reject}} \in Q: initial, accepting, and rejecting states

Storage (tape) is infinite, like the stack of PDAs.
Except here the input is placed on the tape, and we can read/write arbitrary locations.

Configuration
A configuration is a point in a run which now includes (in addition to the state)
contents of tape and
location of read head.
I.e., a configuration C = w_L q_i w_R has

  - tape contents w_L w_R (followed by blanks),
  - current state q_i, and
  - the read head on the first symbol of w_R.

TM M = \braket{Q, \Sigma, \Gamma, \delta, q_0, q_{\text{accept}}, q_{\text{reject}}} accepts input string s if there exists a run C_1 C_2 \dots C_n where

  1. C_1 = q_0 s (start by placing the head on the first symbol of the input)
  2. C_i \to C_{i+1} (C_i yields C_{i+1})
    \delta(q_i, \mathrm{x}) = (q_j, \mathrm{y}, \mathrm{L}) \implies a q_i \mathrm{x} b \to q_j a \mathrm{y} b
    \delta(q_i, \mathrm{x}) = (q_j, \mathrm{y}, \mathrm{R}) \implies a q_i \mathrm{x} b \to a \mathrm{y} q_j b
  3. C_n = \dots q_{\text{accept}} \dots
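The yields relation can be sketched as a tiny simulator. The example machine (accepting inputs that contain a '1', with '_' standing in for the blank) is an assumption for illustration:

```python
# Hypothetical TM: scan right; accept on the first '1', reject at the blank.
delta = {
    ("q0", "0"): ("q0", "0", "R"),    # keep scanning right
    ("q0", "1"): ("qacc", "1", "R"),  # found a '1': accept
    ("q0", "_"): ("qrej", "_", "R"),  # hit the blank: reject
}

def tm_accepts(s, max_steps=1000):
    tape, head, q = list(s) + ["_"], 0, "q0"
    for _ in range(max_steps):
        if q in ("qacc", "qrej"):          # halting configurations
            return q == "qacc"
        q, sym, move = delta[(q, tape[head])]
        tape[head] = sym                   # write, then move the head
        head = head + 1 if move == "R" else max(head - 1, 0)
        if head >= len(tape):
            tape.append("_")               # tape is infinite to the right
    raise RuntimeError("no halt within step bound")
```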

Recognition & Decidability
A TM recognizes a language L if it accepts exactly the strings s \in L (each in finite steps); on s \not\in L it may reject or loop forever.
A TM decides a language L if it recognizes L and rejects all s \not\in L (in finite steps).

Recognition: partial function returning true for acceptance.
Decidability: total function always returning true/false.
Co-Turing-Recognizable: recognizes \bar L (rejects all non-L in finite steps).

Turing-decidable languages are a strict subset of recursively enumerable (Type-0) languages.

Some decidable languages:
\begin{align*} L &= \{ a^i b^j c^k \mid i \times j = k \land i, j, k \ge 1 \} && \text{(arbitrary arithmetic)} \\ L &= \{ ww \mid w \in \Sigma^\ast \} && \text{((sub)string matching)} \\ L &= \{ \braket{D, w} \mid D \text{ is a DFA that accepts } w \} && \text{(DFA Acceptance)} \\ L &= \{ \braket{D} \mid D \text{ is a DFA} \land L(D) = \varnothing \} && \text{(DFA emptiness)} \\ L &= \{ \braket{G, w} \mid G \text{ is a CFG that generates } w \} && \text{(CFG Acceptance)} \end{align*}

Chomsky Normal Form (CNF)

Used to make the last (CFG) example above decidable.

Any CFG can be written in Chomsky Normal Form. In CNF, every derivation of a string w, |w| = n, requires exactly 2n - 1 steps (rule applications).

Specifically, the n characters are produced by n (corresponding) non-terminals, which are expanded from S in n - 1 steps. Then, n more derivations convert the non-terminal string into w.
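As a quick sanity check with an assumed toy grammar (not from the notes): take the CNF grammar S \to AB, A \to a, B \to b and derive w = ab (n = 2). The derivation S \Rightarrow AB \Rightarrow aB \Rightarrow ab takes 3 = 2 \cdot 2 - 1 steps: n - 1 = 1 branching step (the A \to BC form) and n = 2 terminal substitutions (the A \to a form).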

Constraints on Production Rules:
S (start symbol) cannot appear on the RHS of any rule.
\varepsilon may appear only as the rule S \to \varepsilon.
Then, all other rules are in one of two forms:
A \to a
A \to BC
where A, B, C \in N (non-terminals) and a \in \Sigma (terminals).

Note that this forces the expansion to be a binary tree, hence the exact bounding on derivation steps shown above.

CFG \bm\to CNF

  1. eliminate RHS S's (A \to S\dots)
    if any, introduce a new start symbol to proxy the old one
  2. eliminate \varepsilon-Productions (A \to \varepsilon \ (\mid \dots))
    by collapsing into all super-rules (rules that produce A, like B \to A\dots)
  3. eliminate Unit Productions (A \to B)
    by expanding the RHS B
  4. substitute Terminals
    introduce new non-terminals to proxy each terminal
    e.g., for A \to aB, introduce C \to a to write A \to CB
  5. chop down Long Productions (more than two symbols on the RHS)
    e.g., write A \to BBB as A \to CB, C \to BB.
    (this reduces the RHS by 1 symbol at a time, so it works for all lengths)

This algorithm finishes in finite steps (trust me bro).

Other Computation Theory Stuffs

Universal Turing Machine: we can write a TM that simulates any other TM.
(They must share a finite alphabet)

L = \{ \braket{M, w} \mid M \text{ is a TM that accepts } w \}, Turing Acceptance, is undecidable. (Because of the halting problem.)

Proof of TM Acceptance Undecidability (by contradiction):
Assume some TM H decides L.
Let a TM D take in any TM N as the string \braket{N}, and
accept if [N rejects \braket{N}];
reject if [N accepts \braket{N}].

I.e., D reverses the self-acceptance of any given input.
D can do this using H; it feeds it \braket{N, \braket{N}} (and flips the result).

Consider when D is given w = \braket{D} as input:
if H accepts \braket{D, w}, it means D accepts w… but D will flip H's result and reject w… oops.
\therefore No TM decides L.

And from this it is obvious that \bar L is not Turing-recognizable (as it is only co-Turing-recognizable).


  1. because this uni course is oddballs.
    I guess it makes some sense as the stack is not exactly an output string of the automata, and that state transitions also depend on the (top of) the stack. ↩︎