AI Trends in 2022

With 2022 coming to a close, it seems a good time to reflect on the advances made in AI over this last year. The most high-profile ones have come in the domains of image generation (DALL-E 2, Lensa, Imagen) and dialogue (ChatGPT), and their impacts are starting to have broad reach. AI-generated pictures are now common on social media (as is pushback from graphic designers), and a significant portion of people I know have tried ChatGPT, with many feeling it could have commercial applications (as does OpenAI, with its $1B revenue projection for 2024). In addition to these highly public advancements, some other, more specialized tasks have also fallen, with quantitative reasoning (Minerva), matrix multiplication (AlphaTensor), and programming (AlphaCode) all reaching nearly human-level performance.

While the advancements have spanned a variety of domains, the common factor across all of them is scaling. Increasing model size has continued to drive better performance, and with some of the largest companies in the world now joining in, the trend of increasing scale seems poised to continue, at least until there are signs of performance leveling out. 
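
As a rough illustration of what “leveling out” could look like, the sketch below evaluates a power-law scaling curve of the kind reported in the scaling-law literature, where loss falls as a power of parameter count down to an irreducible floor. The constants here are invented for illustration, not measured values.

```python
# Illustrative power-law scaling curve: loss(N) = a * N**(-alpha) + L_inf,
# where N is parameter count and L_inf is an irreducible loss floor.
# All constants below are hypothetical, chosen only to show the shape of the curve.
a, alpha, L_inf = 12.0, 0.08, 1.7

def predicted_loss(n_params: float) -> float:
    return a * n_params ** (-alpha) + L_inf

for n in [1e8, 1e9, 1e10, 1e11, 1e12]:
    print(f"{n:.0e} params -> predicted loss ~ {predicted_loss(n):.2f}")
# Each 10x increase in parameters buys a smaller and smaller improvement.
```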

However, while scaling has driven certain kinds of improvements, in other areas these models retain the core gaps of their predecessors. The beauty of the current AI paradigm is that performance can be improved simply by throwing more examples at the models, but a side effect is that the approach requires the task to be framed as a simple, fixed mapping from inputs to outputs, which potentially sets a ceiling on performance.

For example, GPT models are trained to predict the next word in a sequence of text because that allows for easy scaling; feeding in more text gives more “next word” examples to learn from. However, it also means the model is not actually learning to “be intelligent” or “answer the user’s question” or “provide accurate information” (and can’t be trained to do so directly, as these capabilities do not easily fit the input/output constraints). To alleviate these issues, ChatGPT has been further trained on human feedback (supplementing its initial “next word” training), but the gap in understanding can still be observed, even on simple questions.
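
To make that training setup concrete, here is a minimal sketch (using PyTorch and a toy stand-in for a GPT-style model, not OpenAI’s actual code) of the next-word objective: the target sequence is just the input shifted by one token, and the loss says nothing about truthfulness or helpfulness.

```python
import torch
import torch.nn as nn

# Toy stand-in for a GPT-style model: token embeddings, one causally masked
# Transformer layer, and a projection back to vocabulary logits.
vocab_size, d_model = 100, 32
embed = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
to_logits = nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (1, 16))   # one "sentence" of 16 token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # target = input shifted by one

mask = nn.Transformer.generate_square_subsequent_mask(inputs.shape[1])
hidden = layer(embed(inputs), src_mask=mask)
logits = to_logits(hidden)                       # (1, 15, vocab_size)

# Cross-entropy against the next token at every position -- this is the entire
# objective; more text simply means more (input, next word) pairs like these.
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
print(loss.item())
```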

ChatGPT can be astonishingly good with more open-ended prompts, where the only requirement is that the words all mesh together cohesively (as that’s what it’s trained to do), but precise questions that require something like an internal model of the world still often stump it.

The same types of issues can be seen with image generation models, which are limited in their ability to extrapolate from learned concepts. For example, DALL-E 2 struggles to generate an image of a chair with only two legs.

The training data for DALL-E 2 likely contained many examples of labeled chairs, and so the model learned a robust correlation between the word and the image type, one too strong to be overridden by the modifier “missing two legs” (as few chairs missing two legs would have been included in the training data). 
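
A toy calculation shows why the modifier has so little pull: if nearly every captioned “chair” in the training set has four legs and captions mentioning missing legs are vanishingly rare, the learned association overwhelmingly favors the standard chair. The counts below are invented purely for illustration (DALL-E 2’s training data is not public).

```python
# Hypothetical caption counts, invented for illustration only.
chair_captions = 1_000_000
chair_captions_four_legs = 995_000
chair_captions_missing_legs = 50

p_four_legs = chair_captions_four_legs / chair_captions
modifier_rate = chair_captions_missing_legs / chair_captions

print(f"P(four legs | 'chair') learned from data ~ {p_four_legs:.3f}")
print(f"Share of 'chair' captions mentioning missing legs ~ {modifier_rate:.5f}")
# With the modifier this rare, the "chair -> four legs" association dominates,
# and prompts like "a chair missing two legs" tend to be overridden.
```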

The existence of these gaps does not mean these types of models cannot be useful – DALL-E 2 does not need to be able to show a chair with two legs to generate a desirable profile picture, and ChatGPT doesn’t need to solve a word problem to provide better search results. However, the gaps do mean that another step (beyond scaling) may be required for transformative AI. 

We can look toward the human brain for inspiration on what form these steps might take. 

Unlike “blank slate” AI models, our brain structure is heterogeneous, with different modules customized to specific domains (like the cerebellum for muscle control and the fusiform face area for face recognition). The architecture of our brains (and those of other organisms) has evolved to reflect and better capture the regularities of the world, resulting in more robust intelligence. Mimicking this approach is far from straightforward, as we don’t have the luxury of millions of years of evolution, but it will be interesting to see what progress can be made once scaling has run its course (which may be soon, as the largest models are now being trained on significant portions of available data). In the meantime, I expect to see the types of gaps reviewed above continue to limit the applications of these models.

Comments
Jackson Jules
1 year ago

When thinking about large language models, I find it helpful to consider three distinct (but related) questions: In the limit of infinite internet data and infinite compute, could a LLM achieve human-level intelligence? Given infinite compute and the current amount of data stored on the internet, could a LLM achieve human-level intelligence? Is the first AGI likely to be a LLM (or some close modification to a LLM)? My stances on these questions are (1) Yes (2) Probably, but it would take a lot of compute (3) No. While I think scaling is important, I think multimodality will be crucial…

Meanderingmoose
1 year ago
Reply to Jackson Jules

I like that sequencing of questions. I’m slightly more bearish than you, as my answers are:

  1. Potentially yes, but it depends on the distribution of internet data available
  2. Unlikely
  3. No

I’m also aligned with you on the importance of multimodality – it seems the groundings of “common sense” largely lie outside the domain of text. However, I’m still not sure that the simple “throw massive amounts of data at transformers” approach will be sufficient to advance to the level of AGI, and think we may need further advances (which could be driven by better understanding of the brain).
