The Power of Annealing

Author’s Note: For any French readers, this post has recently been translated by Azurisme!

Metals are a unique form of matter, especially with regard to their behavior under heating and cooling. While metals are described by the material they’re made of (e.g. copper or iron), their properties are largely determined by the arrangement of their atoms, and this arrangement can vary significantly.

The arrangement is semi-permanent, but it can be manipulated through certain heating and cooling techniques. One especially interesting technique is “annealing”, where the metal is first heated and then slowly cooled (if you’re interested in this technique or the many others, this video provides a great introduction). Through this process, the metal becomes softer and more ductile: the heating provides the energy the atoms need to rearrange, and the slow cooling gives them the time to settle into larger grains (see below).

Computer programmers have taken note of these properties and ported them into their own field, resulting in an algorithm known as “simulated annealing”. The algorithm is used to find an approximate global optimum for a specified problem, and works, like metallurgic annealing, by gradually reducing the “temperature” (for the algorithm, “temperature” determines the likelihood of moving in the direction of a worse solution). We can better understand how the algorithm works by graphically representing the solution space for a problem (the picture below represents a 1-dimensional space, but we can extend the idea to arbitrarily complex N-dimensional spaces).

The algorithm starts from a random input value (a random place on the above line), then at each iteration selects a random nearby point (in this version, the definition of “nearby” changes with the temperature; at higher temperatures, farther points can be chosen). If the randomly selected point is higher than the current point, the algorithm “moves” to that point; if it is lower, the algorithm may or may not “move”, depending on the temperature (the algorithm is more likely to “move” at higher temperatures) and, in the usual formulation, on how much lower the new point is. This strategy tends to converge to an approximate global optimum – one way to think about it is that the early iterations (at high temperature) identify the right “hill” to “climb” by searching the entire solution space, and the later iterations (at low temperature) “climb” to the top of that “hill”. The image below (from Wikipedia) shows an example of the algorithm working to solve an instance of the 1-dimensional problem laid out above.
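The steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name `simulated_annealing`, the geometric cooling schedule, the exponential acceptance rule (a common choice, often called the Metropolis criterion), the example landscape `bumpy`, and all parameter defaults are illustrative and do not come from this post.

```python
import math
import random


def simulated_annealing(f, lower, upper, steps=5000,
                        t_start=1.0, t_end=1e-3, seed=0):
    """Search for a maximum of f on [lower, upper]; returns (best_x, best_value).

    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = rng.uniform(lower, upper)  # start from a random input value
    fx = f(x)
    best_x, best_fx = x, fx

    for i in range(steps):
        # Geometric cooling from t_start down to t_end (one common schedule).
        t = t_start * (t_end / t_start) ** (i / max(steps - 1, 1))

        # "Nearby" shrinks with temperature, but never below 1% of the half-range.
        radius = 0.5 * (upper - lower) * max(t / t_start, 0.01)
        candidate = x + rng.uniform(-radius, radius)
        candidate = min(upper, max(lower, candidate))  # stay inside the bounds
        fc = f(candidate)

        # Always move uphill; move downhill with probability exp(delta / t),
        # which shrinks as the temperature drops or the drop gets larger.
        delta = fc - fx
        if delta >= 0 or rng.random() < math.exp(delta / t):
            x, fx = candidate, fc
            if fx > best_fx:
                best_x, best_fx = x, fx

    return best_x, best_fx


# A bumpy 1-dimensional landscape with several local maxima (hypothetical example).
def bumpy(x):
    return math.sin(3 * x) + 0.5 * math.cos(7 * x) - 0.1 * x * x


best_x, best_value = simulated_annealing(bumpy, -5.0, 5.0)
print(f"best x = {best_x:.3f}, f(x) = {best_value:.3f}")
```

Remembering the best point seen so far is a convenience added in this sketch. The result is still only an approximate optimum, and a different seed or cooling schedule can end up on a different “hill”.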

Note that although the hill climbing image shows the whole solution space, that space is not visible when we are actually trying to solve one of these problems. If we knew the layout of the solution space, we could simply select the maximum. For simple problems (e.g. y = 1 – |x|) we can do exactly that, but for more complex problems (e.g. the traveling salesman problem, as shown below) the space is far too large to map out, and it becomes clear that we need a different approach (like simulated annealing).
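To get a feel for why the solution space of a complex problem cannot simply be laid out and inspected, we can count the candidate solutions for the traveling salesman problem. The sketch below assumes the symmetric version of the problem (the distance from A to B equals the distance from B to A) and a fixed starting city, which gives (n - 1)! / 2 distinct round trips; the function name `count_tours` is hypothetical.

```python
import math


def count_tours(n_cities):
    """Number of distinct round trips through n_cities.

    Assumes symmetric distances and a fixed starting city, so a tour and
    its reverse count as the same tour.
    """
    if n_cities < 3:
        return 1  # with one or two cities there is only one possible trip
    return math.factorial(n_cities - 1) // 2


for n in (5, 10, 20):
    print(n, count_tours(n))
# 5 cities  -> 12 tours
# 10 cities -> 181440 tours
# 20 cities -> 60822550204416000 tours
```

Even at 20 cities, checking every tour is impractical, which is why a method that samples the space rather than enumerating it becomes attractive.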

As we’ve seen with both metals and programming, the process of annealing – of slowly reducing the energy state of a system – has some interesting properties. It allows a certain type of organization to happen, with the system falling into “low energy” configurations (ductile and malleable configurations in the case of metals, and optimal solution configurations in the case of algorithms) that are “better” than those which could be reached at either high or low temperatures alone. Essentially, annealing takes the best of each range of the temperature spectrum, leveraging high temperatures to solve the broader parts of the problem (in a more inexact way), and low temperatures to drive toward a more exact solution (by focusing on the particular area identified while at high temperature). Viewed in this way, the concept of annealing spreads far beyond metals and mathematics, with two powerful examples in our brain’s formation (at an individual level) and evolution (at a group level).

Looking at an individual brain, we can see the different speeds of development without even going to the level of neurons. Newborns quickly gain understanding of the world around them, and toddlers pick up language with ease. During these periods of our lives, the brain can be viewed as operating in a “high temperature” state; each experience plays a significant role in how the brain updates, and new concepts are more easily acquired. As we grow older, our brains “harden”, or move to a “lower temperature” state; the concepts we acquired as children shape our general world understanding, and while we can tweak these concepts, the changes happen far more slowly than at a younger age. This pattern of formation, in the same way as metallurgic and mathematical annealing, allows the brain to reach particularly “low energy” states, where here, “low energy” can be understood as “low prediction error about the world”. By first forming general concepts as children (e.g. “living things”, “inanimate things”, “food”, etc. – though at the youngest ages we don’t yet have words attached to the concepts themselves), our brains cement a foundation from which more complex parts of the world can be understood. 

When looking at the brain through this lens, it’s interesting to consider the effects of psychoactive drugs, especially those like LSD. Many people describe these drugs as “making them feel like children again”, with the world feeling more magical and open-ended. By the time we reach adulthood, our brains have reached a fairly “low temperature”, with strong priors about the world. One hypothesis about the effects of these drugs is that they serve to reduce the strength of these priors, essentially preventing the brain from imposing much of its built-up knowledge onto current experience, and instead experiencing things “unfiltered” and without priors. Said differently, these drugs “raise the temperature”.

Looking at the brain from the neural level provides additional support for the annealing analogy. At birth, nearly all of our neurons are present, but only a limited number of synapses (connections between neurons) have formed – roughly 1/6 of the total number in an adult brain. Over the next few years, synapses form rapidly, and by age two a toddler has significantly more synapses than an adult (estimates range from 2x to 10x as many). From that point forward, the formation of synapses is more limited, with pruning instead being the primary activity. Again, we can see the role “high temperature” plays in forming synapses somewhat indiscriminately, and the role “low temperature” plays in building off that foundation and settling into more exact representations of the world.

Jumping to the evolutionary view, the analogy is a bit looser, but still useful. The initial evolution of nervous systems required significant leaps of mutation, with neurons (or something like them) evolving first, then forming into “nets”, then eventually centralizing in brains. We can view these early leaps as happening at a “high temperature”, over time settling into the general structure of the brain (which is shared across a vast number of species) as the temperature “cooled”. Just as the early phases of simulated annealing serve to identify the “best” hill to then “climb”, the early phases of brain evolution served to identify the “best” foundational structure (the analogy is stretched a bit here, as evolution is in no way forward-looking – but the general idea still feels useful). With this structure as a foundation, smaller-scale mutations began to play a larger role in moving things forward. Our brains share much in common with those of other mammals, as only small tweaks were required to get from theirs to ours – at the very least, far smaller changes than were required to get from early neural nets to brains.

Hopefully, these examples have helped highlight the role annealing can play in the development of complex systems; there seems to be something special about the process. Through annealing, systems get the “best of both worlds” – while temperatures are high, they can take large leaps through the solution space, searching for a promising area to settle on, and when temperatures are low, they can explore that promising area in full. This analogy may be useful as we look to build artificially intelligent systems. Right now, we only imbue our AI systems with a small degree of annealing – parameters are updated more slowly over time, but the systems themselves are static, with a fixed structure and a constant number of parameters to update. Perhaps we could look toward our brains for inspiration, and try to mimic the explosive synaptic growth that happens in infancy; or even more ambitiously, perhaps we could look toward evolution, and glean insights from its refinement of nervous system structure over hundreds of millions of years. In any event, it seems annealing will be a useful concept to think about as we move forward.
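As a small illustration of what “parameters are updated more slowly over time” can look like in practice, here is a sketch of an exponentially decaying step size (often called a learning rate). The function name `decayed_learning_rate` and the default values are assumptions chosen for illustration, and real systems use many different schedules.

```python
def decayed_learning_rate(step, initial_rate=0.1, decay=0.99):
    """Step size after `step` updates under exponential decay (illustrative)."""
    return initial_rate * decay ** step


# Early updates are large ("hot"); later updates are small ("cool").
for step in (0, 100, 500):
    print(step, decayed_learning_rate(step))
```

Only the size of each update cools here; the structure being updated stays fixed, which is the limitation described above.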


4 Comments


Martin
3 years ago

Great post!

How do you feel about Michael Johnson’s article on neural annealing? I think it’s the most important of comprehensive takes written so far on this topic.
