Although GPT-3 was released ages ago (in AI time), it continues to generate interesting conversations, particularly with regard to the path toward artificial general intelligence. Building on a discussion among others in the field (centered on the potential upside of scaling deep-learning models), Scott Aaronson (a quantum computing expert who writes Shtetl-Optimized) and Steven Pinker (a prominent cognitive psychologist) recently exchanged a series of essays sharing their thinking on what we can learn from GPT-3 and how we might progress toward AGI. While both are at the top of their respective fields, their essays reveal the thorny nature of concepts such as “intelligence” and “goal”, as the two seem to talk past each other and miss the core of the issue.
Pinker leads off by asking the question of whether scaling up deep-learning models will be sufficient for AGI. He answers it (or really, doesn’t) by stating that the question is ill-conceived because intelligence cannot be defined without specifying the goal, meaning there’s no such thing as general intelligence, only “better and better gadgets” (with single, narrow goals). Pinker uses basketball as an example: “Specifying the goal is critical to any definition of intelligence: a given strategy in basketball will be intelligent if you’re trying to win a game and stupid if you’re trying to throw it. So is the environment: a given strategy can be smart under NBA rules and stupid under college rules.”
Aaronson follows up by attempting to pin down an example of artificial general intelligence. He lays out a scenario where an AI system emulates Einstein’s brain, only faster, and uses it as a proof-of-principle for general (superintelligent) AI. Pinker responds with some examples of domains where Einstein may not have been particularly “intelligent” (quantum physics and politics) and concludes by reiterating his point that defining intelligence requires a specific criterion (which he claims Aaronson attempted to avoid by using an example rather than a definition).
The claim Pinker makes in the exchange reminds me of the “no free lunch” theorem, which says that, averaged across all possible problems, the performance of any two algorithms (or intelligent systems) is equivalent. The spirit of the theorem is that, when considering the infinite class of possible problems (not limited to our universe), there’s no way to beneficially leverage information from past experience, as there will be both a problem (or universe) where the insights are correct and one where they are incorrect. For example, given the problem of winning at basketball, a “win basketball” algorithm will succeed and a “lose basketball” algorithm will fail, but the reverse is true for the problem of losing at basketball (and both algorithms will fare equally poorly on all problems not involving basketball).
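To make the flavor of the theorem concrete, here is a minimal toy sketch of my own (nothing from the exchange itself; the setup and names are hypothetical): averaging prediction accuracy over every possible labeling of a handful of unseen points, any fixed predictor, clever or not, lands at exactly 50%.

```python
import itertools

# Toy "no free lunch" demonstration: the "unseen" inputs are 4 points, and a
# "problem" is any possible assignment of binary labels to those points.
unseen_points = range(4)
all_problems = list(itertools.product([0, 1], repeat=len(unseen_points)))  # 16 labelings

def accuracy(predictor, true_labels):
    """Fraction of the unseen points the predictor labels correctly."""
    guesses = [predictor(x) for x in unseen_points]
    return sum(g == y for g, y in zip(guesses, true_labels)) / len(true_labels)

# Two very different "algorithms": one always guesses 1, one alternates 0/1.
predictors = {"always_one": lambda x: 1, "alternate": lambda x: x % 2}

for name, predictor in predictors.items():
    avg = sum(accuracy(predictor, labels) for labels in all_problems) / len(all_problems)
    print(name, avg)  # both print 0.5: averaged over every problem, no predictor wins
```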
The issue with the “no free lunch” theorem is that our universe does not consist of all possible problems, but only of a specific subset of them. Though the theorem might show that an algorithm which predicts repulsion between two electrons (because it has always occurred previously) performs no better than one which predicts their attraction when evaluated on all possible problems, the former will perform far better on the subset of problems encountered in this universe. While no algorithm can improve on all possible problems by “latching onto” regularities relevant to only a subset, the narrower scope of this particular universe (which does not contain every possibility) leaves genuine gains on the table. Electrons repel one another, matter tends to clump together into “objects”, and making a basket results in points for the scoring team – regularities (“free lunches”) can be identified and exploited across all hierarchies of our universe.
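As a follow-up to the toy above (again my own illustration, not anything from the essays): once we restrict attention to a structured subset of problems, say, labelings that are constant across all points, a learner that latches onto that regularity does beat the all-problems average.

```python
import itertools

# The same toy, but restricted to a structured subset of problems: labelings
# of the 4 points that are constant (all 0s or all 1s), standing in for a
# universe that contains only a narrow slice of all conceivable problems.
points = range(4)
all_problems = list(itertools.product([0, 1], repeat=len(points)))
regular_problems = [labels for labels in all_problems if len(set(labels)) == 1]

def exploit_regularity(labels):
    """A learner that uses the regularity: peek at one point's label
    and predict that every other point matches it."""
    guess = labels[0]
    return sum(guess == y for y in labels) / len(labels)

avg = sum(exploit_regularity(labels) for labels in regular_problems) / len(regular_problems)
print(avg)  # 1.0: a "free lunch" that exists only because the problem set has structure
```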
While Pinker does not directly reference the “no free lunch” theorem, my interpretation is that he is advocating for its applicability in this universe. In his view, the problems we can encounter here are mostly independent, and so intelligence can only be evaluated for particular problems (or sets of problems), rather than for the entire class of problems as a whole. Essentially, he’s claiming there’s no way for an algorithm / system to do better than chance for all potential problems within our universe – any manifestation of intelligence must be limited to a particular problem or subset of problems.
This view neglects the fact that, as highlighted above, the possible problems of this universe (at least the ones we care about) do seem to be deeply interrelated. Extending Pinker’s example, algorithms are not limited to either being good at winning or losing basketball – an algorithm could succeed at both (with both NBA and college rules) if its learning encompasses basketball as a whole, rather than just a particular strategy. Likewise, if the algorithm’s learning instead encompasses the general regularities of the world, it will (assuming a sufficiently robust model) be able to drive a car, translate languages, empty dishwashers, predict protein folding patterns, and even win or lose at basketball. The world contains regularities and patterns across all its hierarchies, and it is possible for specific algorithms (e.g., the brain’s algorithms) to be better at capturing them (at the cost of performing worse on classes of problems not relevant to our universe). Certainly, the relative competence for specific classes of regularities can vary, but not to such a great degree that the concept of general intelligence is invalidated. Einstein may not have been a transcendent genius in the domains of quantum physics and politics, but he was still able to deeply grasp the relevant concepts, and an Einstein-like AI system would have competency on all problems we care about.
When I read Aaronson’s posting of the exchange, I felt he was clearly in the right, but I had trouble forming the exact argument against Pinker, and it seemed Aaronson did as well.
Thanks for trying to lay it out yourself.