Computing, Fast and Slow
The formal notion of computation is one of the most powerful ideas humanity has discovered. The notion itself is simple: there’s a finite alphabet of symbols, a (deterministic) transition function that dictates how those symbols are manipulated, and that’s it. Nearly all of the information processing that supports modern human civilization rests on that idea.
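To make that concrete, here is a minimal sketch of such a machine in Python: a finite alphabet, a deterministic transition table, and nothing else. The state names and the binary-increment example are illustrative choices, not a canonical formulation.

```python
def run_turing_machine(transitions, tape_str, start_state, halt_state, blank="_"):
    """Run a deterministic Turing machine until it reaches halt_state."""
    tape = dict(enumerate(tape_str))            # sparse tape: position -> symbol
    head, state = 0, start_state
    while state != halt_state:
        symbol = tape.get(head, blank)
        write, move, state = transitions[(state, symbol)]
        tape[head] = write                      # manipulate one symbol...
        head += move                            # ...and move the head; that's it
    return "".join(tape[i] for i in sorted(tape)).strip(blank)

# Transition table for binary increment: (state, read) -> (write, move, next state).
# Scan right to the end of the input, then carry leftward, flipping 1 -> 0
# until a 0 (or a blank) absorbs the carry.
INCREMENT = {
    ("right", "0"): ("0", +1, "right"),
    ("right", "1"): ("1", +1, "right"),
    ("right", "_"): ("_", -1, "carry"),
    ("carry", "1"): ("0", -1, "carry"),
    ("carry", "0"): ("1", 0, "done"),
    ("carry", "_"): ("1", 0, "done"),
}

print(run_turing_machine(INCREMENT, "1011", "right", "done"))  # 1011 (11) -> 1100 (12)
```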
It has taken a lot of infrastructure and time to get from that powerful idea to a productively powerful idea. A pen, paper, and a very detail-oriented person are as much a computer as a MacBook. But it is very expensive for a human brain to simulate a Turing machine. At best we can follow simple procedures with a bunch of guard rails, and even a simple procedure like long division takes a human a long time. We, like LLMs, are heuristic machines.
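As a measure of how little machinery such a procedure actually needs, here is paper-style long division as a short Python sketch (the example numbers are arbitrary; a real program would just call divmod):

```python
def long_division(dividend: int, divisor: int) -> tuple[int, int]:
    """Digit-by-digit long division, the way it is done on paper."""
    quotient, remainder = 0, 0
    for digit in str(dividend):                 # "bring down" one digit at a time
        remainder = remainder * 10 + int(digit)
        q = remainder // divisor                # how many times does the divisor fit?
        remainder -= q * divisor                # subtract, carry the remainder forward
        quotient = quotient * 10 + q
    return quotient, remainder

print(long_division(7391, 6))  # (1231, 5), i.e. 7391 = 6 * 1231 + 5
```

Every step is trivial and mechanical, which is exactly why it is cheap for a machine and slow, error-prone work for us.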
Nearly all of the symbolic information processing that happens in the human world happens on mechanical computers, built up over generations of increasingly powerful computational procedures.
Ancient carpenters used tools that were essentially analogue computers, reading values out of hard-coded lookup tables. Those hard-coded values were originally derived from builders’ empirical observations of which ratios worked well together, from basic mathematical operations, or from plain trial and error. Many civilizations independently developed forms of geometry, which in time codified more formal procedures that could be used to derive many different kinds of values.
When you ask an LLM to add 5 + 9, it does not perform a general-purpose algorithm to compute that result. It follows a reinforced path to it. You may get it closer to performing a general-purpose algorithm for addition by prefixing the prompt with “let’s think step by step”, because that prefix conditions the reinforced paths to more closely follow the pattern of the underlying algorithm. The LLM’s entire exogenous information interface is its context prefix, so if it is to have something akin to what we think of as thoughts, those symbols must be written to the context prefix.
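A minimal sketch of why that is, assuming the standard autoregressive decoding loop (the toy_model stand-in is hypothetical; a real LLM’s distribution is conditioned on the whole context):

```python
import random

def sample(distribution):
    """Draw a token from a list of (token, probability) pairs."""
    tokens, weights = zip(*distribution)
    return random.choices(tokens, weights=weights)[0]

def generate(next_token_distribution, prompt_tokens, max_new_tokens, stop_token="<eos>"):
    context = list(prompt_tokens)               # the model's entire exogenous interface
    for _ in range(max_new_tokens):
        token = sample(next_token_distribution(context))
        if token == stop_token:
            break
        context.append(token)                   # a "thought" exists only once written here
    return context

# Hypothetical stand-in model: emits "step" once, then stops.
def toy_model(context):
    return [("step", 1.0)] if context[-1] != "step" else [("<eos>", 1.0)]

print(generate(toy_model, ["let's", "think"], max_new_tokens=5))
```

Every intermediate token the model emits is fed back through the same context; “let’s think step by step” works by steering which symbols get written there, not by invoking a different machine.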
General-purpose machines are never the most efficient machines for specific tasks. This should be obvious, but it is somehow often missed. It’s not just that we are generally intelligent; it is that we are effectively intelligent, i.e. in some sense maximizing intelligence per unit of resource.
We built mechanical computers not because we couldn’t, in principle, do the desired computations as humans, but because we are incredibly ineffective at formal computation. We are error-prone, slow to modify bits, and severely limited in both memory capacity and memory durability.
Effective Intelligence changes how we think about a lot of things. For one, benchmarking. To pass MMLU, it is most efficient to go online and find the answers to MMLU (Brad Gilbert, Winning Ugly). It is not exactly obvious, however, how to come to that conclusion on your own. The true measure of intelligence is whether something will Win Ugly.
References
Brad Gilbert, Winning Ugly
(1993) https://www.goodreads.com/book/show/7540.Winning_Ugly
In the most simplistic way, winning ugly is about figuring out how to win. In a lot of tennis lessons, you’re learning how to hit the ball a little better. But that’s not about competing.
A player must engage the mind and determine which disparate tools will work best on that particular day versus the specific opponent.
Daniel Kahneman, Thinking, Fast and Slow
(2011) https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow
System 1: Fast, automatic, frequent, emotional, stereotypic, unconscious.
System 2: Slow, effortful, infrequent, logical, calculating, conscious.
Rich Sutton, The Bitter Lesson
(Mar 2019) http://www.incompleteideas.net/IncIdeas/BitterLesson.html
We should stop trying to find simple ways to think about the contents of minds. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity.