The desire for control seems to be deeply rooted in the human psyche. We all seek control over our own lives, and oftentimes reach further and attempt to control the lives of others (generally those close to us, or those with a degree of control over us). Therefore, it comes as no surprise that in the field of AI, one central concern has been that of control. As we appear to be quickly (broadly speaking) approaching the ability to replicate our most unique characteristic – our intelligence (which has allowed us to essentially take over the world) – the question of how we will control these creations is a pressing one. This post will start by taking a deeper look at the concept of “control”, and will then explore how this concept interacts with “intelligence”, primarily by way of looking at ourselves (as we’re the only example we have of human-level general intelligence).
What does it mean to control something or someone? Generally, it seems to mean the ability to exert influence over that something or someone in a way that brings its / their actions closer in line with what we (the “controller”) desire. Looking at control this way, we can see it exists on a spectrum (based on the degree to which the actions are forced to align); for example, we have significantly more control over our televisions than our coworkers. However, although control can be seen as a spectrum, a simpler view might be to instead divide it into just two categories: complete, direct control (which generally involves objects), and incomplete, indirect control (which generally involves agents). Our control over other humans generally falls into the latter category, except in simple cases (such as grabbing someone’s arm to make them hit themselves). In most instances, we’re controlling them (in the broad sense) with our words and gestures, and the control is indirect because we’re reliant on them interpreting our signals correctly and then acting accordingly of their own volition. This type of control over the thoughts of other humans is necessarily indirect, as direct control would require a deeper understanding of, and ability to influence, their neural circuitry (to directly control someone, you’d have to know which neurons needed to fire to align their brain’s thinking with your desired state, and would also need the ability to actually make those specific neurons fire). Turning to objects, however, we can see that direct control is more easily achieved. When we turn the television on by pressing the power button, we’re exerting direct control (though we’re reliant on the manufacturer having properly set up the underlying circuits). When we pick up a rock, fold a piece of paper, drive a car, build a building, or engage in any similar type of action, we’re exerting direct control, as we have fairly complete operational knowledge of how the system will behave (and are influencing it at that level, rather than through interpreted signals).
When considering control of generally intelligent systems, we’d ideally like to have complete, direct control. We don’t want to leave room for the system’s “interpretation” of our signals to differ from our own (or, even more problematically, for the system to decide it doesn’t want to listen to our signals at all). Unfortunately, it seems general intelligence doesn’t “play well” with this type of control (as discussed more extensively in Emergence and Control). By this, I mean that it is far easier to construct a generally intelligent system than it is to fully understand that system and directly control it. Constructing this type of system only requires that we understand the initial configuration and underlying algorithms – for example, we could imagine that a neural network with 100 layers running algorithm X will exhibit general intelligence. In this instance, we’d be able to build the system, but we wouldn’t be able to directly control it, as we wouldn’t understand exactly how the 100 layers and algorithm X represented concepts and made sense of the world. We see this happening right now with existing deep learning algorithms – it’s relatively easy to apply them and get results, but extremely difficult to understand how they’re doing it (even though we know the initial structure and algorithm!). Note that it’s only the intelligence itself which we lack complete, direct control over (i.e. the specific patterns and representations formed) – we retain complete, direct control over the materials which make up the system (e.g. the computer chips, the code, etc.). While it’s certainly not impossible to have complete, direct control over general intelligence, it seems unavoidable that the first generally intelligent systems we construct will be ones over which we do not have this level of control.
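We can already see a small-scale version of this gap in today’s deep learning tools. Here’s a minimal sketch, assuming PyTorch; the architecture, data, and task below are arbitrary placeholders, not a claim about what a generally intelligent system would look like:

```python
# A minimal sketch of the gap between building and understanding, assuming
# PyTorch; the architecture, data, and task here are arbitrary placeholders.
import torch
import torch.nn as nn

# "Construction" is easy: we only specify the initial structure and the
# learning algorithm.
model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x, y = torch.randn(256, 10), torch.randn(256, 1)  # stand-in training data
for _ in range(100):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()   # the algorithm we specified adjusts thousands of weights
    optimizer.step()  # whose individual roles we never specified or inspected

# "Understanding" is not: we have complete access to every trained parameter,
# yet inspecting the raw numbers reveals little about the representations formed.
print(model[0].weight.shape)  # full access to the weights, little insight
```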
Although we won’t have complete, direct control, we do have other means of controlling AI systems. As mentioned above, we can aim for indirect, incomplete control, perhaps through having a more robust set of laws for AI systems to follow, or through educating / training these systems in a particular way (though these strategies alone don’t feel particularly reassuring). Additionally (and more importantly), we can take advantage of the embodied nature of AI systems to exert direct control over the parts which make them up (their “bodies”). For example, we may not be able to fully understand the 100-layer neural network running algorithm X, but we can still structure the system in such a way that its power is turned off every hour, regardless of the system’s desires. We can do this because the power to the system sits beneath the layer of intelligence, within our direct control; we don’t need to understand the system’s intelligence to control it at this lower level (similar to how we can move another person’s arm directly, without regard for the state of their brain).
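As a rough sketch of what this kind of “body-level” control could look like (the file name and timings here are purely illustrative assumptions), the cognitive process can be run under a supervisor it has no channel to negotiate with:

```python
# Hypothetical sketch: the "cognitive" layer runs as a separate OS process,
# and a simple supervisor loop cuts its power (kills the process) on a fixed
# schedule, regardless of what the agent wants. "cognition.py" and the
# timings are illustrative placeholders.
import subprocess
import time

RUN_SECONDS = 3600   # let the agent run for an hour...
OFF_SECONDS = 600    # ...then keep it off for ten minutes, no matter what

while True:
    agent = subprocess.Popen(["python", "cognition.py"])  # the intelligent layer
    time.sleep(RUN_SECONDS)
    agent.kill()    # enforced below the level of intelligence:
    agent.wait()    # the agent isn't asked, and can't refuse
    time.sleep(OFF_SECONDS)
```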
We can better understand the potential of this type of “lower level”, “body” control by looking at ourselves. While the process of evolution gifted us with cognition, and the freedom that comes with it, we’re still biologically shackled in many ways. Through mechanisms like pain, breathing, reflexes, hunger, and sex drive, certain paths of action are enforced on our cognition. For example, even if our conscious selves wanted to hold our breath to the point of suffocation, we’d eventually hit a point where our body involuntarily contracted our diaphragm, forcing a breath. This forced breath is caused by a (relatively) simple circuit in our body, which lies beneath the realm of intelligence and can be viewed more mechanically. Regardless of what we consciously want, we will always take that breath – evolution “took advantage of” our embodied nature to build in a direct circuit which cannot be overridden. Changing our view of evolution for a moment, we can see it as the “controller”, and these mechanisms as the means by which it exerts control over us – it doesn’t need to fully “understand” and “control” cognition (i.e. ensure that every part of cognition is driving toward the goal of survival and reproduction), and can instead guide our behavior through direct control of parts of the body.
Zooming out a bit, we can see these mechanisms fall on a spectrum, with some more physically rooted than others. For example, our heart is set up to keep beating, outside the control of our cognition – but this requires it to be quite mechanical (essentially, a pump). Hunger, on the other hand, is more interwoven into the cognitive fabric; we can choose to ignore it, but only up to a point. Hunger is also interesting in that it essentially passes the implementation details to cognition, with a lack of food only creating a “pressure” in the brain (rather than actually solving the problem). From a complexity perspective, this division of labor seems reasonable; detecting hunger can be accomplished fairly mechanically (by measuring the amount of substance in the digestive tract, glucose levels, etc.), while satisfying hunger is a good problem to pass on to the more flexible domain of cognition. Moving one level further, sex drive seems even more difficult to implement, as it’s far less straightforward than measuring glucose levels. Interestingly, it seems we may have specialized circuits to help guide us in this arena (or at least, mice do).
We can see that there’s a tradeoff between the degree of control and the flexibility of the actions controlled. This is because complex actions are more reliant on intelligence for success, and so it’s more difficult to “hard code” specific behavior into the body (or at least, to do so productively). While male mice have a “hard coded” circuit designed to recognize the sex of other mice, what they do with that information is necessarily left to their more flexible cognitive processes – attempting to “hard code” more specific behavior would take away the advantages which intelligence provides. Intelligence evolved in a way which reduced the amount of information that had to be genetically encoded, instead endowing organisms with the ability to learn and adapt; this strategy did not leave much room for “hard coded” design in complex domains.
In our attempts to control generally intelligent systems, it seems we’ll want to leverage mechanisms similar to those which control us. Mechanisms at the more mechanical end of this spectrum will be relatively easy to implement; we can imagine setting up the system to shut off for a period of time every day, to not surpass a certain rate of computation, or to be limited in speed of movement (in the event that the system has a means of locomotion). These mechanisms will be helpful from a safety perspective, but do nothing to constrain the cognitive possibilities of the system. Moving toward the more cognitively entangled end, we can imagine setting up something akin to hunger, but with power level as the sensed quantity; as the power level dropped below a certain set point, “pain” could be introduced into the cognitive process (though this still requires us to figure out where “pain” sits in the system, or how to build it in). Taking it further still, we could imagine building in a check for something like human happiness (as an extreme, simplistic, and ill-advised example, we could imagine a system with a separate circuit designed to recognize smiles, which “rewards” the cognitive process in the presence of smiles, or applies “pain” in their absence).
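As a rough sketch of the power-level “hunger” idea (read_battery_level and the reward interface here are hypothetical placeholders, not a real API), the sensing side stays mechanical while the fix is left to cognition:

```python
# Hypothetical sketch of a "hunger-like" body mechanism: a simple, mechanical
# sensor reads the power level and, below a set point, injects a "pain"
# (negative reward) signal into whatever reward stream the cognitive process
# learns from. How to relieve the pain is left entirely to cognition.
SET_POINT = 0.3  # below 30% charge, the pressure begins (illustrative value)

def body_level_pain(read_battery_level) -> float:
    """Mechanical circuit: requires no understanding of the cognitive layer."""
    level = read_battery_level()  # assumed to return 0.0 (empty) to 1.0 (full)
    if level >= SET_POINT:
        return 0.0                # no pressure on cognition
    # "Pain" grows with the deficit; fixing it is cognition's problem.
    return -(SET_POINT - level) / SET_POINT

# Usage sketch: the signal is simply added to whatever reward the agent
# already learns from at each step.
# total_reward = task_reward + body_level_pain(read_battery_level)
```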
Our ability to build in these more cognitively entangled mechanisms will depend on the degree to which we can understand (or build in) specific parts of the cognitive process at a high level. While we won’t need to fully understand the workings of an intelligent system to build in something like a “smile drive”, we will need to understand it at least well enough to add in the “reward” and “pain” signals. Looking at the human brain, for example, we can see that reward is governed, at least in part, by the dopaminergic pathway, and we’ve been able to identify ways to modify this pathway even though we lack a deeper understanding of the brain.
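Continuing in that spirit, here’s a deliberately simplistic sketch of the “smile drive” mentioned above (which, again, is an ill-advised example): a separately built smile detector modulates the reward the cognitive process receives, without requiring any deeper understanding of that process. detect_smile is a hypothetical stand-in for such a circuit:

```python
# Hypothetical sketch of a "smile drive": a separate, externally built circuit
# (the smile detector) adjusts the reward seen by the cognitive process.
# Like modifying the dopaminergic pathway, this only requires access to the
# reward signal, not an understanding of the rest of the system.
def shaped_reward(task_reward: float, camera_frame, detect_smile) -> float:
    if detect_smile(camera_frame):      # separate, simpler recognition circuit
        return task_reward + 1.0        # "reward" in the presence of smiles
    return task_reward - 0.1            # mild "pain" in their absence
```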
It seems direct bodily mechanisms may offer us some degree of control over generally intelligent systems, though it’s far from clear whether this degree will be sufficient to ensure safety. The easiest controls to implement are quite blunt, and the more exact controls require us to understand the system at a deeper level (though not completely). Additionally, these attempts at control are still vulnerable to “hacking” – if an AI system decides, at its cognitive level, that it does not want to be subject to the bodily constraints we’ve put in place, it may be able to reconstruct itself in such a way as to free itself. Just as we can staple our stomachs or take pills to numb sex drive, these systems may be able to override the constraints they (at the cognitive level) disagree with (and they may have a more complete ability to rebuild themselves accordingly, versus our more surface-level overrides). It’s clear that significantly more thought is needed in these areas before we start constructing generally intelligent systems (though we’re still quite far from that goal).