For example:
[See specifics of the pipeline in Table 3 of the linked paper]
* There are 181 million ish essentially different Turing Machines with 5 states, first these were enumerated exhaustively
* Then, each machine was run for about 100 million steps. Of the 181 million, about 25% of them halt within this memany step, including the Champion, which ran for 47,176,870 steps before halting.
* This leaves 140 million machines which run for a long time.
So the question is: do those TMs run FOREVER, or have we just not run them long enough?
The goal of the BB challenge project was to answer this question. There is no universal algorithm that works on all TMs, so instead a series of (semi-)deciders were built. Each decider takes a TM, and (based on some proven heuristic) classifies it as either "definitely runs forever" or "maybe halts".
Four deciders ended up being used:
* Loops: run the TM for a while, and if it re-enters a previously-seen configuration, it definitely has to loop forever. Around 90% of machines do this or halt, so covers most.
6.01 million TMs remain.
* NGram CPS: abstractly simulates each TM, tracking a set of binary "n-grams" that are allowed to appear on each side. Computes an over-approximation of reachable states. If none of those abstract states enter the halting transition, then the original machine cannot halt either.
Covers 6.005 million TMs. Around 7000 TMs remain.
* RepWL: Attempts to derive counting rules that describe TM configurations. The NGram model can't "count", so this catches many machines whose patterns depend on parity. Covers 6557 TMs.
There are only a few hundred TMs left.
* FAR: Attempt to describe each TM's state as a regex/FSM.
* WFAR: like FAR but adds weighted edges, which allows some non-regular languages (like matching parentheses) to be described
* Sporadics: around 13 machines had complicated behavior that none of the previous deciders worked for. So hand-written proofs (later translated into Rocq) were written for these machines.
All of the deciders were eventually written in Rocq, which means that they are coupled with a formally-verified proof that they actually work as intended ("if this function returns True, then the corresponding mathematical TM must actually halt").
Hence, all 5-states TMs have been formally classified as halting or non-halting. The longest running halter is therefore the champion- it was already suspected to be the champion, but this proves that there wasn't any longer-running 5-state TM.
In legacy bbchallenge's seed database, machines were run for exactly 47,176,870 steps, and we were left with about 80M nonhalting candidates. Discrepancy comes from Coq-BB5 not excluding 0RB and other minor factors.
Also, "There are 181 million ish essentially different Turing Machines with 5 states", it is important to note that we end up with this number only after the proof is completed, knowing this number is as hard as solving BB(5).
ape4•1h ago
olmo23•1h ago
PartiallyTyped•1h ago
arethuza•1h ago
https://scottaaronson.blog/?p=8972
IsTom•1h ago
You need proofs of nontermination for machines that don't halt. This isn't possible to bruteforce.
QuadmasterXLII•53m ago
The most naive algorithm is to use the assistant to check if each length 1 coq program can prove halting with computation limited to 1 second, then check each length 2 coq program running for 2 seconds, etc till the proofs in the arxiv paper are run for more than their runtime.
schoen•2m ago
... even though your actual method of discovering the programs in question was usually not purely exhaustive search (though it may have included some significant computer search components).
More precisely, we could say that if mathematicians are working in a formal system, they can't find any results that a computer with "sufficiently large" memory and runtime couldn't also find. Yet currently, human mathematicians are often more computationally efficient in practice than computer mathematicians, and the human mathematicians often find results that bounded computer mathematicians can't. This could very well change in the future!
Like it was somewhat clear in principle that a naive tree search algorithm in chess should be able to beat any human player, given "sufficiently large" memory and runtime (e.g. to exhaustively check 30 or 40 moves ahead or something). However, real humans were at least occasionally able to beat top computer programs at chess until about 2005. (This analogy isn't perfect because proof correctness or incorrectness within a formal system is clear, while relative strength in chess is hard to be absolutely sure of.)
gus_massa•6m ago
They solved a lot of the machines with something like that, and some with more advanced methods, and "13 Sporadic Machines" that don't halt were solved with a hand coded proof.
CaptainNegative•50m ago
karmakurtisaani•35m ago
karmakaze•1h ago
For BB(5) the proof of its value is an indirect computation. The verification process involved both computation (running many machines) and proofs (showing others run forever or halt earlier). The exhaustiveness of crowdsourced proofs was a tour de force.
arethuza•1h ago
bc569a80a344f9c•1h ago
> Structure of the proof. The proof of our main result, Theorem 1.1, is given in Section 6. The structure of the proof is as follows: machines are enumerated arborescently in Tree Normal Form (TNF) [9] – which drastically reduces the search space’s size: from 16,679,880,978,201 5-state machines to “only” 181,385,789; see Section 3. Each enumerated machine is fed through a pipeline of proof techniques, mostly consisting of deciders, which are algorithms trying to decide whether the machine halts or not. Because of the uncomputability of the halting problem, there is no universal decider and all the craft resides in creating deciders able to decide large families of machines in reasonable time. Almost all of our deciders are instances of an abstract interpretation framework that we call Closed Tape Language (CTL), which consists in approximating the set of configurations visited by a Turing machine with a more convenient superset, one that contains no halting configurations and is closed under Turing machine transitions (see Section 4.2). The S(5) pipeline is given in Table 3 – see Table 4 for S(2,4). All the deciders in this work were crafted by The bbchallenge Collaboration; see Section 4. In the case of 5-state machines, 13 Sporadic Machines were not solved by deciders and required individual proofs of nonhalting, see Section 5.
So, they figured out how to massively reduce the search space, wrote some generic deciders that were able to prove whether large amounts of the remaining search spaces would halt or not, and then had to manually solve the remaining 13 machines that the generic deciders couldn't reason about.
ufo•1h ago
lairv•56m ago
jjgreen•39m ago
Xcelerate•37m ago
Second, you have to consider what's feasible in finite time. You can enumerate machines and also enumerate proofs, but any concrete strategy has limits. In the case of BB(5), the authors did not use naive brute force. They exhaustively enumerated the 5-state machines (after symmetry reductions), applied a collection of certified deciders to prove halting/non-halting behavior for almost all of them, and then provided manual proofs (also formalized) for some holdout machines.