Incentives to create AI systems known to pose extinction risks
Published 06 August, 2022
Economic incentives to deploy AI systems seem unlikely to be reliably eliminated by knowledge that those AI systems pose an existential risk.
Reasons for people with normal values to be incentivized to bring about human extinction
One might reason that if advanced AI systems had such malign preferences as to pose a substantial existential risk to humanity, then almost nobody would be motivated to deploy them. This reasoning fails for two reasons: a) the personal cost of incurring such risks can be small relative to the benefits, even for a person who cares unusually much about the future of humanity, and b) coordination problems reduce the counterfactual downside of taking such risks further still.
a) Externalities: the personal costs of incurring extinction risks can be small
A person might strongly disprefer human extinction, and yet want to take an action which contributes to existential risk if:
the action incurs only a risk of extinction rather than a certainty, or the risk lies in the distant future, so that taking the risk does not negate the other benefits accruing from the action
the person does not value the survival of humanity (or a slightly higher chance of the survival of humanity) radically more than their other interests
using the AI system would materially benefit the person
A different way of describing this issue is as an externality. Even if people disprefer causing human extinction, most of the costs of human extinction fall on others, so any particular person making a choice that risks humanity for private gain will take more risk than is socially optimal.
For example, a person faces the choice of using an AI lawyer system for $100, or a human lawyer for $10,000. They believe that the AI lawyer system is poorly motivated and agentic, and that the movement of resources to such systems is gradually disempowering humanity, which they care about. Nonetheless, their action contributes only a small amount to this problem, and they are not willing to pay nearly ten thousand dollars more to avoid that harm.
As another example, a person faces the choice of deploying the largest scale model to date, or trying to call off the project. They believe that at some scale, a model will become an existential threat to humanity. However, they are very unsure at what scale, and estimate that the model in front of them has only a 1% chance of being the dangerous one. They value the future of humanity a lot, but not ten times more than their career, and calling off the project would be a huge hit to that career, in exchange for avoiding only a 1% risk to the future of humanity.
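The arithmetic behind the second example can be made explicit with a minimal sketch. It assumes a simple expected-value model that the example only implies: the person compares the private benefit of deploying against the probability of catastrophe multiplied by how much they value the future of humanity. The function names and the choice of units (career value = 1) are illustrative assumptions, not part of the original argument.

```python
def expected_loss(p_catastrophe: float, value_of_humanity: float) -> float:
    """Expected loss, in the person's own value units, from taking the risk."""
    return p_catastrophe * value_of_humanity


def prefers_to_deploy(p_catastrophe: float,
                      value_of_humanity: float,
                      private_benefit: float) -> bool:
    """Assumed decision rule: deploy when the private benefit exceeds the expected loss."""
    return private_benefit > expected_loss(p_catastrophe, value_of_humanity)


def break_even_ratio(p_catastrophe: float) -> float:
    """How many times more than the private benefit the person must value
    humanity before calling off the project becomes the preferred choice."""
    return 1.0 / p_catastrophe


# Units: the person's career is worth 1. All numbers are illustrative.
career = 1.0
p = 0.01  # the 1% chance from the example

# Someone who values humanity ten times more than their career still deploys.
deploys_at_10x = prefers_to_deploy(p, 10 * career, career)

# Only a valuation above the break-even ratio reverses the choice.
deploys_at_200x = prefers_to_deploy(p, 200 * career, career)
```

Under these assumptions, a 1% risk only outweighs the private benefit for someone who values the future of humanity roughly a hundred times more than what they would give up, so the person in the example, who values it less than ten times more, is well inside the range where deploying looks privately worthwhile.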
b) Coordination problems
Coordination problems can make the above situation more common. Suppose a person believes that if they refrain from an action that imposes a cost on others, someone else will take the same action and the cost will be incurred anyway. Then the real downside of taking the action themselves is even smaller.
If many people are independently choosing whether to use dangerously misaligned AI systems, and all believe that such systems will be used enough to destroy humanity in the long run regardless of their own choice, then even people who wouldn’t have wanted to deploy such systems if they were the sole decision-maker have reason to deploy them.
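The effect of this belief on an individual's decision can be sketched as follows. This is a toy model under stated assumptions, not a claim about how anyone actually reasons: it assumes that if others deploy anyway, the same risk is incurred regardless of the person's own choice, so only the fraction of risk they could actually prevent counts against deploying. The parameter names and numbers are hypothetical.

```python
def counterfactual_risk(p_catastrophe_if_deployed: float,
                        p_others_deploy_anyway: float) -> float:
    """Risk that the person's own deployment adds, given their belief that
    others would deploy (and incur the same risk) even if they abstained."""
    return p_catastrophe_if_deployed * (1.0 - p_others_deploy_anyway)


def has_reason_to_deploy(p_catastrophe_if_deployed: float,
                         p_others_deploy_anyway: float,
                         value_of_humanity: float,
                         private_benefit: float) -> bool:
    """Assumed decision rule: deploy when the private benefit exceeds the
    expected loss attributable to the person's own choice."""
    added_risk = counterfactual_risk(p_catastrophe_if_deployed,
                                     p_others_deploy_anyway)
    return private_benefit > added_risk * value_of_humanity


# Illustrative numbers: a 1% risk, humanity valued at 200x the private benefit.
p, value, benefit = 0.01, 200.0, 1.0

# As sole decision-maker (nobody else would deploy), the person abstains.
sole_decider = has_reason_to_deploy(p, 0.0, value, benefit)

# Believing others are 90% likely to deploy anyway, the same person deploys.
crowded_field = has_reason_to_deploy(p, 0.9, value, benefit)
```

In this sketch, the person's values are identical in both cases. Only their belief about what others will do changes, and that alone is enough to flip the decision.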
Situations where extinction risk is worth incurring for individuals seem likely to be common in a world where advanced AI systems in fact pose extinction risk. Reasons to expect this include:
In a large class of scenarios, AI x-risk is anticipated to take decades or more to materialize after the decisions that brought about the risk.
People commonly weight benefits to strangers in the future substantially lower than immediate benefits to themselves.
Many people claim not to care intrinsically about the long-run existence of humanity.
AI systems with objectives that roughly match people’s short-term goals seem likely to be beneficial for those goals in the short term, while potentially costly to society at large in the long term. For example, an AI system which aggressively optimizes for mining coal may be economically useful to the humans running a coal-mining company in the short term, but harmful to those and other humans in the long run, if it gains the resources and control to mine coal to a destructive degree.
Primary author: Katja Grace