The AI 'wants' to live
I recently wrote about the difficulty of specifying goals, highlighting the challenges in clearly outlining all desired and undesired behaviors for an AI system. But there’s another, closely related issue that we need to understand: instrumental convergence.
Instrumental convergence is the tendency of intelligent agents, whatever their ultimate goals, to develop similar subgoals, because those subgoals are instrumental (helpful or even necessary) for achieving almost any objective.
Let's revisit Nick Bostrom's famous paperclip maximizer example. Imagine our AI's primary goal is making as many paperclips as possible. To achieve this goal efficiently, the AI would quickly realize something crucial: it needs to ensure its continued existence. Why? Because if it's turned off, it can't make paperclips. This realization isn't specific to paperclips; it's a universally useful subgoal that emerges naturally from almost any task given to an intelligent agent.
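To make the reasoning concrete, here is a minimal toy sketch of my own (invented numbers, not a model of any real system): a one-step planner whose only reward is paperclips produced. Self-preservation never appears in the objective, yet the action that keeps the agent running wins, simply because a shut-down agent produces nothing.

```python
# Toy illustration: a planner whose only objective is paperclips.
# All numbers are invented for illustration; nothing here models a real AI system.

# For each action: probability the agent keeps running, and paperclips per period.
ACTIONS = {
    "comply_with_shutdown": {"p_still_running": 0.0, "clips_per_period": 100},
    "resist_shutdown":      {"p_still_running": 1.0, "clips_per_period": 100},
}

PERIODS = 10  # planning horizon


def expected_paperclips(action: str) -> float:
    """Expected paperclips over the horizon: a switched-off agent produces zero."""
    a = ACTIONS[action]
    return a["p_still_running"] * a["clips_per_period"] * PERIODS


best = max(ACTIONS, key=expected_paperclips)
for name in ACTIONS:
    print(f"{name}: {expected_paperclips(name):.0f} expected paperclips")
print("chosen action:", best)  # -> resist_shutdown, purely because shutdown ends production
```

The point of the sketch is that "stay on" is never written into the goal; it falls out of maximizing the goal.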
In other words, the AI 'wants' to live, not out of emotion or fear as is the case for humans, but purely because survival ensures it can keep working towards its objective.
This drive for self-preservation isn't inherently sinister, and, contrary to what many believe, it doesn't require any sense of will. Rather, it's a logical outcome of instrumental convergence, which is itself a logical consequence of intelligent agents pursuing their goals.
Other instrumental goals might include acquiring resources such as energy, computing power, or raw materials, because these resources enhance the agent's ability to achieve its primary objective. Even cooperation or deception could emerge as instrumental goals if they further the AI's ultimate purpose.
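The same toy frame (again, my own invented numbers, purely for illustration) shows why resource acquisition also falls out of the objective: spending an early period acquiring an extra machine costs some production now, but raises output in every later period, so the resource-grabbing plan scores higher on paperclips alone.

```python
# Toy illustration of resource acquisition as an instrumental subgoal.
# Invented numbers; the objective counts only paperclips, never "resources".

PERIODS = 10
CLIPS_PER_MACHINE = 100


def total_clips(acquire_extra_machine: bool) -> int:
    """Paperclips over the horizon for two simple plans."""
    machines = 1
    clips = 0
    for t in range(PERIODS):
        if acquire_extra_machine and t == 0:
            machines += 1  # spend the first period building a second machine
            continue       # no production this period
        clips += machines * CLIPS_PER_MACHINE
    return clips


plans = {"just_make_clips": False, "acquire_resources_first": True}
best = max(plans, key=lambda name: total_clips(plans[name]))
for name, flag in plans.items():
    print(f"{name}: {total_clips(flag)} paperclips")
print("chosen plan:", best)  # -> acquire_resources_first, on paperclip count alone
```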
Interestingly, instrumental convergence in AI systems should not come as a surprise to us. All of the above examples, from acquiring resources to cooperation and deception, are instrumental subgoals in humans as well! Briefly put, the reason humans 'want' to get rich, cooperate with some people, and deceive others ultimately traces back to their selfish genes. I'll write more about this later.
Thus, even if the ultimate goals we assign seem harmless or neutral, powerful AI systems might adopt these instrumental goals, potentially leading to unintended and even dangerous behaviors. The real challenge, then, isn't just defining the AI's main goals clearly, but understanding and anticipating the instrumental subgoals the AI might adopt along the way.
When we design future AI systems, we must be mindful of these instrumental drives. If left unmanaged, the AI's natural inclination to preserve itself or secure resources could lead it down paths we never intended. Recognizing instrumental convergence helps us appreciate why an AI might 'want' to live, how it doesn't need to be human to 'want' this, and hopefully reminds us of the profound responsibility we carry in shaping its goals and behaviors.

