Will malign AI agents control the future?

This page is under active work and may be updated soon.

The balance of evidence suggests a substantial risk of malign AI agents controlling the future, though none of the arguments that we know of appears to be strongly compelling.


This appears to be the most discussed AI extinction scenario. In it:

  1. AI systems are created which a) have goals, and b) are each more capable than a human at many economically valuable tasks, including strategic decision making.
  2. These AI systems' superior performance allows them to take control of the future, for instance through accruing social and economic power, or through immediately devising a plan for destroying humanity.
  3. The AI systems do not want the same things as humans, so will bring about a future that humans would disprefer.

This scenario includes sub-scenarios where the above process happens fast or slow, or involves different kinds of agents, or different specific routes, etc.


Arguments that this scenario will occur include:

  • AI developments will produce powerful agents with undesirable goals

    (Main article: Argument for AI X-risk from competent malign agents)

    Summary: At least some advanced AI systems will probably be 'goal-oriented', a powerful force in the world, and their goals will probably be bad by human lights. Powerful goal-oriented agents tend to achieve their goals.

    Apparent status: This seems to us the most suggestive argument, though not watertight. It is prima facie plausible, but the destruction of everything is a very implausible event, so the burden of proof is high.
  • AI will replace humans as most intelligent 'species'

    (Main article: Argument for AI x-risk from most intelligent species)

    Summary: Humans' dominance over other species in controlling the world is due primarily to our superior cognitive abilities. If another 'species' with better cognitive abilities appeared, we should then expect humans to lose control over the future, and therefore expect the future to lose its value.

    Apparent status: Somewhat suggestive, though it doesn't appear to be valid, since intelligence in animals doesn't appear to generally relate to dominance. It may be possible to construct a valid version.
  • AI agents will cause humans to 'lose control'

    Summary: AI will ultimately be much faster and more competent than humans, so either a) AI must make most decisions, because waiting for humans will be so costly, or b) AI will make decisions whenever it wants, since humans will be relatively powerless due to their intellectual inferiority. Losing control of the future isn't necessarily bad, but is prima facie a very bad sign.

    Apparent status: Suggestive, but as stated does not appear to be valid. For instance, humans do not generally seem to become disempowered by possession of software that is far superior to them.
  • Argument for loss of control from extreme speed

    Summary: Advancing AI will tend to produce very rapid changes, either because of feedback loops in automation of automation processes, or because automation tends to be faster than the human activity it replaces. Faster change reduces human ability to steer a situation, e.g. by reviewing and understanding it, responding to problems as they appear, and preparing. In the extreme, the pace of socially relevant events could become so fast as to exclude human participation.

    Apparent status: Heuristically suggestive; however, the burden of proof should arguably be high for an implausible event such as the destruction of humanity. This argument also seems to support concern about a wide range of technologies, which may be correct.

In light of these arguments, this scenario seems to us plausible but not guaranteed. Its likelihood appears to depend strongly on one's prior probability that an arbitrary risk is sufficient to destroy the world.

arguments_for_ai_risk/is_ai_an_existential_threat_to_humanity/will_malign_ai_agents_control_the_future/start.txt · Last modified: 2023/02/12 04:43 by katjagrace