Alright, well. This took me way longer than I hoped. Moving into a new house and subsequently having to tear out a bathroom floor in said house did not help. Nevertheless, here we are.
I will note, I have finally decided to enable paid subscriptions. This series will continue to be FREE, as will all of my articles. But my novels and a couple extras can be read behind the paywall… or you can just chip in for the sake of supporting what I’m doing here. I won’t stop you.
Anyway, please enjoy and hopefully you learn something…

Last time we covered:
The physical principles computers operate on
The abstract processes we try to model using them
The subtle but significant differences between those
And the problem that difference creates when attempting to simulate human reasoning
That is, to build an AGI
This time around, we’ll talk about one of the commonly proposed strategies to get around this problem: approximating human reasoning. This is probably the most relevant strategy in the field of AGI, as it’s the most fully developed and pursued.
It’s a relatively practical and realistic strategy, as well. In no small part because it accepts that there will be limits from the outset.
It goes something like this…
If it walks like a duck, and quacks like a duck, then… “close enough”—it’s a duck
While laughably naïve in some contexts—much to the disappointment of many 3-year-olds—this approach does actually have numerous practical applications across engineering. The screen you’re looking at now is arguably one of them.
Let’s run an exercise if you have a low-resolution or very large display handy (so not a smartphone): turn it on to a white screen. Now get your eyes as close to it as you can while still being able to focus… (You may need a magnifying glass even on a large display. Pixels are just so darn small these days.)
…And you probably already know what this is about and skipped my demonstration. Yes: the screen is not actually as white as it appears when you’re at any normal viewing position. It’s actually red, green, and blue. And those red, green, and blue lights output just a single wavelength of light each.
That’s right: with just three fixed wavelengths of light within the approximately 380-750nm spectrum we can see, we can artificially synthesize the appearance of nearly every color in that spectrum. And that’s because, as far as the sense organ called “our eyes” is concerned, all of the necessary conditions to “see white” were met by our display’s lights. The correct cones of our eyes were stimulated in the correct proportions. If you compared it to a screen with a “real” white light, there would be no discernible difference—assuming correct color balancing.
Our “fake” white is “real” as far as the limits of our eyes detect.
There are some display technologies out there that are actually RGBW, but that white subpixel is mainly used to increase the overall brightness.
What this approximation is taking advantage of to work its magic, is a “gap” in our perceptions. For electronic displays, that gap lies in the natural limits of human visual acuity—or the maximum visual “resolution” we can perceive, if you will. There are only so many cone cells in the human eye, and only so much sensory processing the brain wants to do, and yet there’s so much light to see.
This “overload” creates at least two exploits our technology can take advantage of—and boy do we. We exploit the fact that we have a limited number of different types of cone cells in the eye (short, medium, long) to manually reproduce the combined response the real wavelength would give. We also exploit the brain’s limited processing rate—the gap of time between physical stimulus and mental imaging—to blend static images together and create the illusion of movement.
All of these tricks, and a few more, I’m sure [I hope] we learned by middle school. So, that’s neat. But do similar tricks exist for our ability to reason?
Quite a few, yes.
Applying the principle to human reasoning
First and foremost, we are beings of limited knowledge, so there’s always a gap in what we know that can be taken advantage of to create an illusion. However, there are also gaps in our reasoning over things we do know. And specifically because we know them well.
Thinking is kinda hard, after all. It would save a lot of time and energy if we did as little of it as possible. Thus, the brain usually tries to do just that: as little as possible. Among the ways it does this is by “skipping” steps when thinking about things, because it detects that we’re treading familiar territory in our thoughts and we should probably hurry along to the important bits instead of re-solving already-solved problems.
For instance, “fire” is an important concept for us humans. It is both life giving and life taking. Empowering when tamed, and destructive when wild. When it’s around, we typically need to figure out which case we’re dealing with—and the faster we do so, the better.
Luckily, fire also generally comes in pretty distinct shades of glowing yellow or orange. Which is great for our brain, as it then becomes simple to have a quick response system by wiring the visual perception of glowing yellow/orange to the idea of “fire” and thus “urgent attention required.”
It’s not certain that this is exactly how this particular set of instincts works, but you can read about Pavlovian behavior if you want more rigorous examples of this kind of cognitive associative response… I just really wanted to talk about fire.
This is a convenient little shortcut, but this kind of “efficiency” also creates a gap and thus a sort of “cognitive resolution” if you will. The brain now “assumes” by default that a certain kind of light means fire, but that’s not actually guaranteed to be true. It could just be a bright orange sign, but the mere presence of the color will have already turned our head before we’re conscious of the truth. At least until we’ve seen that sign a few times and given our mind a chance to learn the more subtle distinctions.
With all of these gaps lying around in how humans think, it then seems quite possible to “approximate” an AGI the same way we approximate the color yellow on a screen. While humans could theoretically think in an immense amount of detail constantly, the practical reality is that we don’t—that would take a very large amount of nutrition for little payoff.
Defining our target
For AGI, then, we need only identify where we humans stop “paying attention” to the details of the logic and design a machine that can “reason” at least well enough to cover over those gaps. To do that, though, we need to establish a basis for what human reasoning is; otherwise we won’t know where the gaps are. To start, it’s classically understood that there are at least two major forms of reasoning which humans employ:
Deductive reasoning: following a set of known rules and facts to a guaranteed conclusion.
The train left the station at 8:45am
I arrived at the station at 8:46am
Therefore, I missed the train (hypothetical example)
Inductive reasoning: following a pattern of observations to a probabilistic conclusion.
My uncle’s cat likes bread
I left my Armenian sweet bread on the counter
Therefore, my uncle’s cat will [probably] eat my bread (real world example)
Some people include “abductive” reasoning as a distinct form, but to my understanding the only significant difference is in how high the probability is. If that’s not the case, I would love to hear a better explanation.
A standard computer program could be argued to be a close model of deductive reasoning, but since its rules are explicitly programmed, it can’t really be considered “the machine” reasoning so much as “the programmer” reasoning remotely. Meanwhile, our current Machine Learning systems—the basis of Generative AI—model inductive reasoning.
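To see the contrast, here’s the train example from earlier handled the “standard program” way, as a minimal sketch in Python (the variable names are my own). Every rule and fact is supplied up front by the programmer; the machine just follows them:

```python
from datetime import time

# The rule and the facts are all hard-coded by the programmer
train_departure = time(8, 45)
my_arrival = time(8, 46)

# Deduction: if the facts and the rule hold, the conclusion is guaranteed
if my_arrival > train_departure:
    print("I missed the train.")
else:
    print("I caught the train.")
```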
And the way those Machine Learning systems do that is through statistics.
How we made a computer “guess”
The “statistical modelling” that our current crop of AI systems is built around can be roughly explained as a set of “descriptors” (think “adjectives”), a bunch of “things” they can describe (words, pictures, etc), and then numerical measurements saying how well each adjective applies to each “thing.”
For example, let’s imagine we have 3 “things” to assess: a purple rubber ball, a red dragon, and The Hulk. Now, for each one of those, we assign a number between 1 and 10 for how “red” the thing is, and another for how “angry” it is. We might then score the objects as follows:
Ball: Red = 5, Angry = 2
Dragon: Red = 10, Angry = 5
The Hulk: Red = 1, Angry = 10
All of these scores are our “datapoints” from which we can then answer questions about the collection of things. For instance, we can determine the average redness of them, which is ~5.33.
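If it helps to see it as data, here’s a minimal sketch of those scores in Python (the structure is my own illustration, not how any real system stores things):

```python
# Toy "statistical model": each thing gets a 1-10 score per descriptor
model = {
    "ball":     {"red": 5,  "angry": 2},
    "dragon":   {"red": 10, "angry": 5},
    "The Hulk": {"red": 1,  "angry": 10},
}

# Average "redness" of the collection: (5 + 10 + 1) / 3
avg_red = sum(scores["red"] for scores in model.values()) / len(model)
print(round(avg_red, 2))  # 5.33
```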
And then someone can take that “redness” statistic, and without knowing anything about the objects it was derived from, logically conclude that they are all purple… But bad statistical analysis is a conversation for another time…
This scoring process is (roughly) the kind of thing we’re talking about when we refer to an AI’s “statistical model.” Now, let’s take this model and use it to generate an ending of a sentence. For example, this sentence:
“I was beset by a great, red, and angry ______.”
A way to do this would be to look at our datapoints and see what we “know about.” Namely, the “Redness” of things and the “Angry-ness” of things. Then, we can look to see if the sentence has any references to “Redness” and “Angryness” or their opposites…
And hey, look at that. Both are explicitly mentioned. We could then conclude through very uncontroversial math that, of the “things” we know and the descriptors we can assess, the closest match for the sentence is “dragon.” So, our finalized sentence becomes:
“I was beset by a great, red, and angry dragon.”
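Here’s a rough sketch of that “very uncontroversial math,” reusing the toy scores from above. The prompt scores are my own invention, standing in for “red and angry are both strongly present in the sentence”:

```python
model = {"ball":     {"red": 5,  "angry": 2},
         "dragon":   {"red": 10, "angry": 5},
         "The Hulk": {"red": 1,  "angry": 10}}

# The sentence mentions "red" and "angry" explicitly, so aim high on both
prompt = {"red": 9, "angry": 9}

def distance(scores, target):
    """How far a thing's scores sit from what the sentence asks for."""
    return sum((scores[d] - target[d]) ** 2 for d in target) ** 0.5

best = min(model, key=lambda thing: distance(model[thing], prompt))
print(best)  # dragon: the closest match on both descriptors
```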
Now… imagine the incoming sentence was instead just:
“I was beset by a great and angry ______.”
If we used the same math, the answer would definitively change to “The Hulk,” as the Hulk is higher on the one metric we have: Angryness. However, looking at it as humans, we can actually see that “dragon” is still a viable answer. And if you look at that word “beset” in the example, and note its more rustic, medieval-y tone (things our statistical model isn’t aware of), it would seem that “dragon” might still be the better answer, even if our math doesn’t reflect that.
So, how do we pick between “dragon” and “The Hulk” in a way that could compensate for what we don’t know?
Well, instead of just picking the closest match, we could pick randomly between all of the options that are above some threshold of “relevance.” Let’s say, at least “half angry,” or a “5” in Angry-ness.
That could work, but we still have to honor the hard evidence we have, so we should also give the “things” with closer scores higher chances. Between our two choices, then, we could give “The Hulk” a 55% chance, since it’s the closest from what we know in our model, and “dragon” a 45%.
Dragons being quite angry creatures, but not quite as angry as The Hulk
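A minimal sketch of that tweak, with the 55/45 weighting taken straight from the example above rather than derived from anything deeper:

```python
import random

angry_scores = {"ball": 2, "dragon": 5, "The Hulk": 10}  # from the toy model above

# Keep only things that are at least "half angry" (a score of 5 or more)
candidates = [thing for thing, angry in angry_scores.items() if angry >= 5]

# Weight the survivors so the closer match (The Hulk) is favored, per the 55/45 split
weights = [0.55 if thing == "The Hulk" else 0.45 for thing in candidates]
pick = random.choices(candidates, weights=weights)[0]
print(pick)  # "The Hulk" about 55% of the time, "dragon" about 45%
```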
Now there’s once again a pretty good chance we could end the line with “dragon,” which we as humans can guess is likely the better fit. And on the other hand, we’re not certain “dragon” was the best answer anyway. The sentence could be for a story where the Hulk was sent back in time to medieval England.
But, again, our statistical model can only objectively evaluate for “Redness” and “Angryness.”
So, let’s scale up.
Now imagine that, instead of knowing about just 3 things and 2 descriptors of them, our system had run this kind of analysis with a volume of information comparable to the Library of Congress. Analysts have speculated that GPT-4 was trained on 1 petabyte of data (1000 terabytes), whereas the LoC had a total of 3 petabytes in 2012, which included all of the backups and metadata.
While I can’t find any firm basis for the claim that GPT-4’s training data size was 1 petabyte, that value almost seems too low given how often OpenAI asks for more, and who they're asking. Certainly after two years of continued development and several high profile corporate partnerships since GPT-4’s launch, OpenAI has undoubtedly acquired multiple petabytes of information to work with—whether or not it all makes its way into a given version of the model.
It should start to seem reasonable, then, how that statistics-based “guessing” method described above could start to make some really intricate and human-like sentences. If reading a handful of good books in a genre lets us predict what’s going to happen next in other books in that genre, then any system that has read everything should at least be able to write a short story in a given genre just based on the statistically likely next words.
And the nice thing is, this process is based on deterministic, number-based rules. Perfect for a computer.
It is, however, still a noticeably limited model of inductive reasoning
In one way, you could say that makes it a perfect model of inductive reasoning, since induction is, by nature, imperfect in its conclusions. If you stretch the logic far enough, you could even say the lack of certainty could be called a feature, not a bug. And, since we can [theoretically] scale a machine system arbitrarily—unlike poor little humans, stuck in their fleshy shells (as the robot overlords would say)—this statistics driven form of reasoning could be implemented to a degree of certainty a human cannot attain by following the same logic.
At least, for the things which statistics can know about…
However effective it may get in its domain, GenAI’s version of inductive reasoning is still fundamentally limited compared to human inductive reasoning.
For one, its statistical model is “static”—it doesn’t change. The datapoints within it certainly can, but the rules it reasons by are set by the programmer at the outset, and can’t adapt to further information about the world.
Whatever flaws were in the programmer’s statistics will be in the AI’s logic, no matter how much data it is trained on. It cannot reflect to analyze—let alone adjust—its own process.
Additionally, it is incapable of basing its conclusions on a “real” understanding of a given subject or its own prior conclusions. By which I mean it has (at least) the following specific limitations:
It can’t combine its process with deductive reasoning.
It requires a certain (high) degree of scale for both hardware and data before it works at all.
It can’t take advantage of more elusively defined—but very real—skills of human reasoning, like “intuition.”
Limit #1 means it can’t reliably extrapolate the information it has, either to narrow down the set of possible conclusions from its observations, or to generate additional “suppositionary” conclusions.
To try and put it simply, it can never properly “check itself.” It can try, but every attempt comes with the possibility of adding a new error. And when making a second conclusion based on its first conclusion, it not only has the initial error rate of its statistical model, but if that first conclusion was wrong, that will feed in as a new bad datapoint in the model, increasing the error rate.
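As a rough, back-of-the-envelope illustration (assuming, purely for the sake of the arithmetic, that each chained guess is right about 90% of the time and that the errors are independent, which is generous given the feedback problem just described):

```python
per_step_accuracy = 0.9  # assumed, purely illustrative

# The chance that *every* conclusion in a chain is right shrinks with each step
for steps in (1, 2, 5, 10):
    print(f"{steps:2d} chained conclusions -> {per_step_accuracy ** steps:.0%} chance all hold")

# 1 -> 90%, 2 -> 81%, 5 -> 59%, 10 -> 35%
```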
Humans can also fall into this kind of negative feedback loop of incorrect conclusions, of course, but we also have the tools needed to escape it. The primary one is deductive reasoning, with which we can determine whether a conclusion necessarily contradicts our starting assumptions and should be discarded as a possibility.
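To make that concrete, here’s a toy sketch (entirely invented for illustration, not anyone’s actual algorithm) of the kind of hard filter deduction gives us: a candidate conclusion that contradicts a premise gets thrown out entirely, rather than merely down-weighted:

```python
premises = {"gravity_exists": True}

candidate_conclusions = [
    {"objects_attract": True,  "gravity_exists": True},
    {"objects_attract": False, "gravity_exists": False},  # contradicts a premise
]

def consistent(conclusion, premises):
    """A conclusion survives only if it agrees with every premise it touches."""
    return all(conclusion.get(key, value) == value for key, value in premises.items())

survivors = [c for c in candidate_conclusions if consistent(c, premises)]
print(survivors)  # only the first candidate remains; the other is ruled out, full stop
```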
GenAI cannot make such definitive conclusions, as there is no concept of “definitive” in its reasoning. Not just objectively (which is still a point of debate for humans) but also relatively. Even the staunchest objector to the idea of “objective truth” accepts that some conclusions necessarily lead to others.
i.e. “if gravity exists, then two objects in a vacuum will move towards one another.” That kind of thing.
A Generative AI can sound certain when it makes this claim, but it never truly is. Not without a developer’s manual intervention.
In practical terms, this makes the system pretty unreliable at “theorizing.” Which is a fundamentally important skill when competing with human reasoning, as theorizing is our first line of protection when stepping out into the unknown.
Say… when travelling to the moon for the first time in a giant metal tube full of exploding liquids. In that kind of scenario in particular, it’s very important to have as much deductive reasoning as you can manage.
Limit #2 is more of a practical concern, but no less important. If we want a thing to exist, it must be buildable in practice. On one hand, the last five years have demonstrated that a version of the concept is practical—in the most literal sense.
On the other hand, this last year has shown that we’re approaching a few different ceilings that would hinder us from going any further up this mountain, both in terms of producing the hardware and acquiring the data required to scale further.
More on that later…
Limit #3 (but perhaps not the last): it really can’t be overstated how powerful skills like intuition can be. Even if it’s rather difficult to leverage those skills consistently in our own reasoning.
I’m sure many arguments could be made that the nature of our “unique” cognitive skills may not be as “mysterious” as we often believe them to be. Nevertheless, results speak for themselves on this. The best of human insight has discerned truths with several orders of magnitude greater resource and information efficiency than the best of our machine learning systems. And per the point on scaling, that efficiency means a lot.
With all that said…
We can work with these limits. Perfect efficiency isn’t the goal: effective results are. Like with the RGB lights, it doesn’t matter if our computer “intelligence” is fake or expensive, as long as it can hide its flaws in all of the gaps in our own reasoning… and as long as it’s not too expensive.
Unfortunately, those two points may actually be what hold computers back…
I wanted to cover this whole topic in one go, but this is getting too long for my tastes and I don’t want to rush my explanations, so I’m going to split this topic in two, with the second part scheduled for tomorrow.
In any case, thank you to all who have read this far. If you have any thoughts or comments on the matter or the way I’m talking about it, I would love to hear from you in the comments!
Sharing the Love
This time around, I want to shout out the folks behind the DOD-Powerpoint Core Sci-fi Anthology “Waybound.” If character-driven tales of galactic warfare and astropolitics—served with a side of mid-century-military-industrial inspired art, and an occasional cup of satire—is your definition of a hearty sci-fi meal, it’s worth checking out. I mean, just look at this sick poster they put together for this feature…
They got a fancy intro website here… if you’re on Desktop. They’re working on mobile support for that. Either way, you can find all of their stories on their Substack:
You can continue on to the next article here…