start state has no in-arrows
add new state state with ε-transition to old
singular accept state and has no out-arrows
add new accept state with ε-transitions from all olds
Then, eliminate non-accept/start states one by one, until only start and accept are left.
Eliminating a state:
For all pairs of in-arrows and out-arrows, add a new equivalent “bypass” arrow – the concatenation of the in and the out.
If the state has a self-loop, then sandwich that (with a Kleene star) in the concatenation.
Observe that the number of new arrows to add is #ins × #outs, so eliminate states with the least in×out first!!!
That’s it.
Example
Consider the following 6-state state diagram.
Q2 has 2 ins and 2 outs; to eliminate it would require 4 additional bypass arrows.
Meanwhile, Q0 has only 2×1 = 2, so let’s eliminate Q0 first.
The two in-out pairs are startεQ0bQ1 Q2aQ0bQ1
We therefore add the bypass arrows startbQ1 Q2abQ1
And we can now safely killQ0 (and its attached arrows).
We now have
Oh no! we have two parallel arrows from Q2 to Q1, b and ab?!
Well, they can be union’d into b∣ab:
Now we repeat the process. Q3 (or Q2) now has the least in×out, so let’s eliminate it.
The 2 in-out pairs: Q1aQ3aQ2 Q1aQ3εend
Turns into: Q1aaQ2 Q1aend
So,
And union’ing b and aa,
Now, eliminate Q2.
It only has 1 in-out pair: Q1b∣aaQ2b∣abQ1
So we add Q1(b∣aa)(b∣ab)Q1
(remember to add parentheses)
Only Q1 is left 👀.
It has 1 in and 1 out, but has a self-loop! startb(b∣aa)(b∣ab)↻Q1aend
We do the same as before, but sandwich the self-loop along a kleene star: startb((b∣aa)(b∣ab))∗aend
Which gives
…and b((b∣aa)(b∣ab))∗a is precisely the equivalent regex!