Thinking About Thinking Machines

A number of other posts so far have touched on what it is that brains do – and for the most part, it’s been summarized as “creating a model of the world”. By this, we’ve meant that certain patterns of neural activity can be understood as representing or standing for some observed pattern of material activity, allowing these material activities to be incorporated into the decisions of the organism. A bacterium responds to sensed environmental changes directly, through a series of chemical reactions induced by the change in environment; a human responds indirectly, by first perceiving the environmental change (i.e. going from photons hitting rods and cones to a set of activated concepts in the brain) and then making a decision about how to act based on the known qualities of these concepts (e.g. a bear appears in your field of vision, you process that it’s a bear, and then determine the best course of action is to run away, as bears are associated with danger). Examining the brain and its capabilities at this level is interesting, but it leaves open some glaring questions about how this behavior (i.e. the brain’s behavior – not the organism’s) is accomplished. How are concepts encoded in the brain? How are they learned, and how does this learning process change over time? What capabilities are innate, and which ones arise from experience? What is different about the process in humans that gives us such a powerful ability to understand our world (at least compared to the other organisms found here on earth)?

These are hard questions, ones which we don’t yet have complete answers to – but they’re also the same questions which, given complete enough answers, will likely allow us to create human-level (and super-human-level) artificial intelligence, changing the world as we know it. As we don’t know the answers, this post is aimed more at providing some structure for thinking about the problems described above. The high-level structure proposed takes a computational approach (a “complete enough” answer really means a computationally implementable one), breaking the problem up into three sub-problems (these sub-problems are all intertwined in different ways, but are likely most effectively studied separately):

  1. What is the initial structure? (i.e. what is coded for directly by genes, and how detailed / structured is it)
  2. What is the update algorithm? (i.e. what gives us the ability to learn new capabilities / associations based on experience)
  3. How does this update algorithm change over time? (e.g. the increased ability to learn language at a young age)

These three questions, answered at a deep enough granularity, provide the type of detail necessary to actually implement the described system. It may be helpful to start by describing two simpler examples at this level – a machine learning algorithm and a simple worm nervous system – to give you a sense of how this level of description allows for implementation.  

Machine Learning Algorithm (VGG16, Designed for Image Recognition)

  1. The initial structure of the VGG16 algorithm is a number of layers of connected “neurons” with randomized weights between the neurons of each layer. The first layer is designed to take in an input image and to use the pixel values to calculate the activation level of each of its neurons (which then act as the input to the second layer, etc.). The setup is one-directional, with the input feeding only layer 1, layer 1 feeding only layer 2, etc., and all functions from one layer to the next are differentiable (typically sigmoid, tanh, or ReLU functions). The output of the algorithm is one of 1,000 possible labels (indexed 0 through 999), each representing a particular image category (e.g. horse, car, etc.). As the weights associated with all these functions are randomized to start, there’s no “intelligence” in these connection weights in the initial structure, and feeding in any input image at the start will result in a random output. Below, you can see a visualization of the VGG16 algorithm setup.
  2. The update algorithm takes advantage of the fact that the entire algorithm is differentiable. Some number of labeled example pictures are fed in, and for each, the value of the error function is recorded (to start, the algorithm will get nearly every image wrong, as its connections are all randomized). After some set of pictures has been fed through (generally between 50 and 1,000), the partial derivative of the error with respect to each weight in the network is computed. This gradient gives the direction in which to move each weight so as to get closer to the right labels (based on the sampled set) – note that it doesn’t give the optimal value of the weights, just the direction in which to move them. All weights are then moved a set step size against their gradients, and another set is run through the algorithm to allow for further adjustment (a minimal sketch of this loop appears after this list). The nature of this process requires a huge number of sample images to effectively train an algorithm, as there must be enough cases of each image type to allow the algorithm to “get a sense” of the concept (e.g. just adjusting the weights based on a couple of images of a dog would not position the algorithm well to identify other pictures of dogs not in the training set).
  3. As training progresses, the step the algorithm takes based on the weight gradient gets progressively smaller. This allows the algorithm to “home in” on a close-to-optimal solution which incorporates the “knowledge” from all example sets (moving too much based on later sets would in some ways discard information gained from earlier sets).
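
To make this concrete, below is a minimal sketch (in PyTorch) of the kind of training loop just described: instantiate VGG16 with random initial weights, feed through batches of labeled examples, compute the partial derivative of the error with respect to every weight, and move each weight a small step against its gradient, shrinking the step size over time. This is an illustrative sketch, not the original VGG16 training recipe – the batch size, learning rate, decay schedule, and the random stand-in “images” are all assumptions made for demonstration.

```python
import torch
from torch import nn, optim
from torchvision.models import vgg16

# 1. Initial structure: VGG16's stack of layers, with randomized connection weights.
model = vgg16(weights=None)                      # 1,000 output classes, no pretraining

# 2. Update algorithm: gradient descent on a differentiable error function.
error_fn = nn.CrossEntropyLoss()                 # measures how wrong the predicted labels are
optimizer = optim.SGD(model.parameters(), lr=0.01)

# 3. Change over time: shrink the step size as training progresses.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

for step in range(20):                           # in practice, many thousands of batches
    # A batch of labeled examples (random stand-ins for real images here).
    images = torch.randn(8, 3, 224, 224)
    labels = torch.randint(0, 1000, (8,))

    optimizer.zero_grad()
    error = error_fn(model(images), labels)      # record the error on this batch
    error.backward()                             # partial derivative of the error w.r.t. every weight
    optimizer.step()                             # move each weight a small step downhill
    scheduler.step()                             # progressively smaller steps over time
```

Note that optimizer.step() only moves each weight by a small step in the direction that locally reduces the error – it never jumps to the optimal weights – and the scheduler is what makes those steps shrink over time, matching point 3 above.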

When looking at the three parts of this algorithm, it may seem at first glance like the initial structure would be the hardest to figure out (assuming no prior knowledge), due to the complexity of the different types of layers. However, it turns out that for these types of algorithms, the exact initial structure is not important, as there’s no “intelligence” embedded in these structures – you could add a few extra layers, or take away some, and the end behavior would be similar. The “intelligence” of this algorithm is much more dependent on the update algorithm (and to a lesser degree, on how it changes over time). The update algorithm is what allows VGG16 to incorporate the distinct characteristics of the different types of pictures. Thinking about this algorithm from a reverse-engineering perspective (similar to what we’re trying to do with the human brain), the critical part to understand would be the nature of this update algorithm; once you had that figured out, the rest of the structure (and a subsequent simulation of the system as a whole) would fall quickly into place.

Simple Worm Nervous System (C. elegans)

  1. The worm’s nervous system consists of 302 neurons (these neurons make up about ⅓ of the worm’s total cell count), with 68 sensory neurons (which detect signals from the worm’s environment and send them to the interneurons), 121 interneurons (which act as the intermediary between the sensory and motor neurons, and also have myriad connections amongst themselves), and 113 motor neurons (which directly control the worm’s muscles) – and with ~7,000 synapses between neurons. The number of neurons and the connections between them are relatively consistent across different worms, and in contrast to VGG16, this initial structure encodes significant intelligence for surviving in the world. The worm’s nervous system does not require any experience to instruct the worm on how to eat, hide, mate, etc. – the worm “comes out of the box” with these capabilities, based on the initial connections between its neurons.
  2. The “neurons that fire together, wire together” principle offers a higher-level view of the update algorithm of the nervous system – although the exact degree to which concurrent firing leads to stronger connections is not yet defined (i.e. we don’t yet have a true computational update algorithm). Essentially, updates to the nervous system allow experiences happening concurrently to be associated with each other – as when the worm learns to associate certain chemicals with food through repeated exposure to those chemicals in the presence of food (a toy version of such a rule is sketched after this list). The update algorithm can only accommodate minor tweaks (i.e. associating chemicals with food), as for the most part the neurons, their connections, and the behavioral patterns they drive are set from the start of the worm’s life.
  3. The update algorithm of the worm’s nervous system does not seem to change significantly with time – which makes sense given its more limited impact on the worm’s behavior.
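
Even without the true rule, the flavor of such an update can be sketched computationally. The toy example below is a generic rate-based Hebbian update with a decay term – it is not a model of C. elegans, and the learning rate, decay rate, and the simplified “chemical” and “food” signals are assumptions made purely for illustration.

```python
# A single connection weight from a chemical-sensing neuron to a food-response neuron.
w = 0.0                 # initial association: none
learning_rate = 0.05    # how much one concurrent firing strengthens the connection
decay = 0.01            # slow weakening when the two neurons are not co-active

weights_over_time = []
for trial in range(200):
    chemical = 1.0                         # the chemical cue is present on every trial
    food = 1.0 if trial < 150 else 0.0     # food accompanies it only for the first 150 trials

    pre = chemical                         # presynaptic activity (sensing the chemical)
    post = food                            # postsynaptic activity (driven here by food)

    # Hebbian update: grow the weight when pre and post are active together, decay it otherwise.
    w += learning_rate * pre * post - decay * w
    weights_over_time.append(w)

print(f"association after pairing: {weights_over_time[149]:.2f}")
print(f"association after food is removed: {weights_over_time[-1]:.2f}")
```

The point of the sketch is only that a single, broadly applied rule – strengthen a connection when its two neurons are active together, let it weaken otherwise – is enough to produce the kind of minor, experience-driven tweak described above.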

The worm’s nervous system is obviously very different from VGG16 – almost all of its behavior is specified in the initial structure, with the update algorithm playing only a limited role in the system’s intelligence. What does this mean with regard to reverse-engineering a worm’s brain? Well, although we’ve been referring to a pre-experience worm’s brain as the initial structure, this view is only accurate when considering the life of a single worm – in reality, this initial structure has been crafted over innumerable generations by evolution. In this crafting, a wide variety of different neural circuits have arisen to guide the worm through different situations (much as many different types of organisms have arisen to fill different environmental niches) – and this means understanding its function is much less straightforward than for VGG16. For the worm, there’s limited simplification when it comes to understanding how it does the things it does, as each of its behaviors (eating, hiding, mating, etc.) is specified in a different, distinct way in its nervous system; for VGG16, on the other hand, we only need to know how the update algorithm functions to specify (and implement) all facets of its behavior (i.e. its behavior is emergent, not directly coded for). The fact that VGG16 is easier to describe is no surprise – of course the living creature is more complex than the human-created algorithm. However, taking this view does provide some useful context for examining the most complex artifact we know of – the human brain.

While the key to the intelligence of each of the systems we just looked at could be summarized by answering just one of the three sub-problems, understanding the human brain requires answering all three. There is certainly significant complexity to the initial structure, with specialized areas like the hippocampus (memory), Wernicke’s area (understanding speech), and Broca’s area (producing speech) all wired in different ways – however, these areas are not fully functioning “out of the box”. As the brain has experiences and the update algorithm runs its course, these areas acquire their full level of functionality, but it’s still an open question how well other areas could acquire the same functionality if these areas weren’t present in the initial structure (i.e. Wernicke’s area may be the optimal place in our brains for an understanding of speech to develop, but if it were damaged in an infant, how well could that infant still learn to understand speech?). In the image below, you can see the difference in synapse count between a newborn and an adult – the update algorithm clearly plays a major role in endowing the system with intelligence (as does the way in which the algorithm changes over time). Our brains aren’t like VGG16, nor like the worm’s nervous system – we have significant intelligence residing in our brains’ structure at birth, and we leverage this structure, together with a robust update algorithm (and effective changes to the algorithm over time), to acquire myriad capabilities for interacting with the world.

The big question as we seek to understand the brain is the degree to which our capabilities are dependent on our brain’s initial structure vs. the degree to which they’re dependent on the update algorithm. As we saw with the worm, understanding the initial structure is hard – an accurate description requires direct interpretation of the wirings and their impact on behavior. There aren’t any shortcuts – you need to get down to the level of individual neurons and their connections and see how information flows. Understanding an update algorithm, on the other hand, can be much simpler, since it’s applied in a consistent manner across the board, and the desired behavior emerges from it (rather than being coded for directly by it). To understand VGG16, we don’t need to look at the individual neurons – all we need to do is understand how the broadly applied algorithm gives rise to certain types of emergent behavior. This is not to say that the update algorithm for our brains will be simple or uniform (it’s unlikely to be expressible quite as succinctly as “neurons that fire together, wire together”), but it will have some level of regularity that will allow for easier expression than genetically specified wirings. Understanding the update algorithm will be hard, but will still be far easier than developing a complete understanding of the initial structure – if we do have the initial structure to thank for our capabilities, it may be a very long road indeed to any sort of real artificial intelligence!

Author’s Note: If you enjoyed this post, please consider subscribing to be notified of new posts 🙂

Comments
Paul Topping
4 years ago

“if we do have the initial structure to thank for our capabilities, it may be a very long road indeed to any sort of real artificial intelligence!” I’m afraid that this is the case and it is inescapable. The reason brains work better than ANNs is that they take advantage of millions of years of evolution determining what in our environment is important for a creature to be able to perceive. This knowledge is embedded in our brains virtually at birth at every single level: from visual pixel processing to high-level cogitation. Childhood experience and growth are providing just the…

Paul Topping
4 years ago

I agree. There’s no reason to give up. As you suggest, we may be able to discover some general rules that get us a lot of the way there. I also feel like we can do a lot of interesting and useful work without having to duplicate all the intricacies of our brains, much like we created flying machines without duplicating birds. For example, I would love to have an AI-based internet research assistant. I would need to be able to converse with it, and have it ask me questions when it needs clarity or gets stuck. It would need…

Paul Topping
4 years ago

There are too many working in AGI that claim victory when they really have achieved very little. I don’t want to be one of those. My project has not yet reached the stage where I’m ready to tackle common sense anyway, though that is coming soon. I have a lot of ideas that I’ve gathered in many years working in software. One central idea is that understanding natural language is a good place to start, not only because it is so important that we be able to communicate with our AGI, but if we understand how the brain processes it,…
