A new natural language AI model launched by OpenAI, GPT-3, has been making waves in the artificial intelligence community. GPT-3 is a transformer model (simplifying greatly, a neural network approach with modifications for better performance on text) trained on one specific task: predicting the next word, given all previous words within some text. This simple goal means that GPT-3 can be trained on any text (no labeling required), and the model has certainly made use of this fact, with a training dataset of 499B tokens (for context, the entirety of Wikipedia is 3B tokens – and is included in the training set, along with 429B tokens from the web and 67B from books). This massive training set is used to optimize values for a correspondingly massive set of 175B parameters, more than 10x larger than any previous language model. The upgrade has paid off, with GPT-3 demonstrating proficiency in storytelling, poetry, coding, and more (though some areas still require work). This progress is certainly significant, and has been hyped as such. However, there are inherent limitations to the GPT approach, and these limitations are often skipped over, especially by those heralding GPT-3 and its successors as the start of human-level (and eventually superhuman) artificial general intelligence. The primary issue is one of domain: GPT-3’s domain of natural language is insufficient for general intelligence in the natural world. OpenAI called out this limitation with the release of the first GPT (see below); this post aims to drive home just how significant it is.
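Before digging into that limitation, it helps to make the training setup concrete. The sketch below is a minimal illustration in plain Python (not OpenAI’s actual pipeline; real models operate on subword tokens rather than whole words, and the sentence is just an example) of how next-word prediction turns raw text into training examples with no human labeling – each position’s “label” is simply the word that actually comes next:

```python
# Minimal sketch of the next-word-prediction objective: the training "labels"
# come for free from the text itself.

text = "Mary was out walking her dog when she dropped the leash"
tokens = text.split()  # real models use subword tokens, not whole words

training_pairs = []
for i in range(1, len(tokens)):
    context = tokens[:i]   # all previous words
    target = tokens[i]     # the word the model must learn to predict
    training_pairs.append((context, target))

for context, target in training_pairs[:3]:
    print(context, "->", target)
# ['Mary'] -> was
# ['Mary', 'was'] -> out
# ['Mary', 'was', 'out'] -> walking
```

Scale the text up to hundreds of billions of tokens and the model up to 175B parameters, and you have the GPT-3 recipe.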
Intelligence is a domain-specific attribute; there can be no concept of intelligence without a domain in which that intelligence is relevant. Intelligence within a domain requires some understanding of the patterns or regularities occurring in that domain. Taking chess as an example, this understanding could be that losing pieces tends to lead to defeat, or that connected passed pawns tend to make for a winning game, or that e4 works well as a first move. Computers exemplify the domain-specificity of intelligence, with examples like SHRDLU (domain of block stacking), AlphaZero (domain of chess), and VGG-19 (domain of image classification), among many others.
Human intelligence may at first seem like a counterexample; however, although our intelligence appears domain-agnostic, this is only because our intelligence operates within the wide domain of the natural world. We can recognize patterns in chess, but this recognition is built up from a much deeper level, with more general patterns of matter (objects and their dynamics) creating the foundation. The natural world subsumes all the more limited domains of the computer examples above, allowing humans to achieve proficiency across the board (building blocks, chess, and images all exist within the natural world). This difference in domain is the reason artificial general intelligence is so difficult; we’re good at programming computers in very limited, mathematical domains (like chess), but the natural world does not easily lend itself to the same approach.
GPT-3 is impressive because the domain of natural language is far wider than the previous domains we’ve conquered with AI; while it’s governed by syntactical rules, its semantics are as flexible and open-ended as the natural world it seeks to describe. GPT-3 has taken this wide domain and made sense of it, recognizing the deep patterns in how words are used together to craft stories, communications, code, and more. However, while the domain of natural language is wider than the more limited mathematical domains of previous AI efforts, it’s still far narrower than the natural world.
For GPT-3, the word “dog” is a token; it can follow certain patterns in relation to other tokens in a text sequence (and the 175B parameters of the model demonstrate an impressive capability to recognize these patterns), but it can’t denote anything further than those token relationships. There’s no deep concept of dog, only a deep concept of the token “dog” and its role in the domain of words. A human, on the other hand, builds up a concept of dog based on the instantiations of dogs in the natural world; we develop an understanding of the patterns of photons that denote a “dog” object, the patterns of object dynamics that the dog object exhibits, and the manner in which the dog object responds to our actions. We also develop an understanding of the token “dog”, but for us, that token is used primarily as a label for our deeper conception.
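To see how thin that token-level “understanding” is, consider a toy stand-in for a language model: simple counts of which words follow “dog” in a small corpus. A 175B-parameter transformer captures vastly richer patterns than this, but the patterns are of the same kind – relationships among tokens, nothing more (the corpus and counts below are made up for illustration):

```python
# Toy stand-in for a language model's "knowledge" of the token "dog":
# distributional statistics over neighboring tokens, nothing more.

from collections import Counter, defaultdict

corpus = (
    "the dog ran away . the dog chased the ball . "
    "she walked her dog on a leash . the dog barked at the mailman ."
).split()

# Count which token tends to follow each token.
next_token_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_token_counts[prev][nxt] += 1

print(next_token_counts["dog"].most_common())
# [('ran', 1), ('chased', 1), ('on', 1), ('barked', 1)]
# The counts relate "dog" to other tokens; they say nothing about fur, four
# legs, or what a real dog does when a leash hits the ground.
```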
This difference in domain can be illustrated by the way in which GPT-3 and a human respond to a text prompt asking them to continue the story. For this example, we’ll use the prompt: “Mary was out walking her dog when she dropped the leash.”
Feed GPT-3 the prompt, and it will likely continue with “The dog ran away”, but it will do so because “run away” is statistically the most likely action sequence following the tokens “dog”, “dropped”, and “leash”. Ask a human to continue that prompt, and while you’ll likely get a similar answer, the response will instead be rooted in deep concepts based on the natural world. The human will create a mental image of Mary, the dog, and their respective attributes, and will recognize the likelihood of a quick escape. As seen here, this difference in pathway matters little when simply trying to extend a sequence of text in a human-passable way, but it makes all the difference when trying to develop original, impactful ideas.
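For readers who want to try the comparison themselves, here is roughly what querying GPT-3 for a continuation looked like through OpenAI’s completions API (the engine name, client version, and the continuation shown are illustrative assumptions, not a transcript of an actual run):

```python
# Rough sketch of prompting GPT-3 via the original `openai` Python client;
# exact engine names and client interfaces have changed over time.

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

prompt = "Mary was out walking her dog when she dropped the leash."

response = openai.Completion.create(
    engine="davinci",   # the base GPT-3 model at launch
    prompt=prompt,
    max_tokens=20,
    temperature=0.7,    # some randomness, so continuations vary between calls
)

print(response.choices[0].text.strip())
# e.g. "The dog ran away." – chosen because it is statistically likely given
# the prompt tokens, not because the model pictures Mary or the dog.
```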
To further illustrate this point, let’s consider a deeply intelligent act – the invention of the computer. Charles Babbage originated the concept of a computer back in the 1800s, devising a system of gears and drivers that could perform specified calculations and store the outputs. How did Babbage come up with such an invention? While we can’t know for certain, we can imagine that his deep (natural-world-based) concepts of orderly, sequential calculation and mechanical engineering overlapped in some abstract way, leading to the idea of a mechanical system representing calculations. Additionally, we can say for certain that the discovery had nothing to do with the patterns of relation demonstrated by the words “calculation”, “mechanical”, and “computer”; this domain of tokens offered no hints. Place GPT-3 (or a GPT model of any size) back in time to Babbage’s day (and train it on corresponding text sequences of the time), and the system has no hope of inventing the computer – the requisite patterns to recognize simply lie outside its domain. Inventing a computer is an extreme example, but this limitation on GPT capabilities applies equally to more standard demonstrations of intelligence. The model can fake its way through text prompts and discourse, but ask it the right questions (ones rooted more deeply in the natural world) and the facade falls apart.
While this post has stressed the limitations of current AI, there’s no reason to believe that computers are inherently incapable of achieving general intelligence – getting there will just require systems that deal in the domain of the natural world. Our successes in narrower domains are certainly meaningful, and offer critical insights about how to structure systems; but to take the next step, we must begin shifting the domain of AI to the same domain we inhabit.
Author’s Note: If you enjoyed this post, please consider subscribing to be notified of new posts 🙂
Although the referenced article is sloppily written (e.g., “GPT-3 and it’s successors” as well as “it’s semantics are as flexible”), it’s full of such pearls of wisdom as: “GPT-3’s domain of natural language is insufficient for general intelligence in the natural world.” “Intelligence is a domain specific attribute; there can be no concept of intelligence without a domain in which that intelligence is relevant.” “Human intelligence may at first seem like a counterexample; however, although our intelligence appears domain-agnostic, this is only because our intelligence operates within the wide domain of the natural world.” “There’s no deep concept of dog, only a deep…”
Thank you for the generous review! I’ve updated the referenced spelling errors, appreciate you pointing those out 🙂
“How the Mind Works” looks like a good read – will check out when I have some time.