Quantitative Estimates of AI Risk

This page is in an early draft. It is very incomplete and may contain errors.

Some people working in AI Safety have published quantitative estimates of how likely they think it is that AI will pose an existential threat.

Background

Many thinkers believe advanced artificial intelligence (AI) poses a large threat to humanity's long-term survival or flourishing. Here we review their quantitative estimates.

For quotes from specific prominent people working on AI, see this page. For expert surveys about AI risk, see this page, and for public surveys about AI risk, see this page.

This page draws heavily from this database made by Michael Aird at Convergence Analysis.

Quantitative Estimates

The table below includes estimates from individuals working in AI Safety of how likely very bad outcomes due to AI are.

Many of the individuals expressed Knightian uncertainty when making their estimates, saying that their probability varies from day to day, that the estimate is still in development, or that it is a very quick-and-dirty estimate. People who have explicitly said something like this include Katja Grace, Joseph Carlsmith, Peter Wildeford, Nate Soares, and Paul Christiano, among others. These estimates should not be treated as definitive statements of these individuals' beliefs, but rather as glimpses of their thinking at that moment.

Each estimate includes:

  • The person making the estimate.
  • The year the estimate was made.
  • What exactly is being estimated. Different people have different explanations of what “very bad” looks like, and some people have given multiple conditional estimates.
  • The estimate the individual gives for the probability that AI development causes a very bad outcome.
  • The source for this estimate.
  • Whether this is the person's most recent public estimate that we are aware of and whether this is this person's best guess as opposed to a conditional estimate.

The estimates are in no particular order. The table can be sorted by clicking at the top of each column.

| Estimator | Date | What is Estimated? | Probability | Source | Most Recent? |
|---|---|---|---|---|---|
| Katja Grace | 2023 | Bad future because AI agents with bad goals control cognitive labor | 0.19 | Will AI end everything? A guide to guessing | Yes |
| Joseph Carlsmith | 2021 | Existential catastrophe by 2070 from advanced, planning, strategic AI | 0.05 | Is Power-Seeking AI an Existential Risk? | No |
| Joseph Carlsmith | 2022 | Existential catastrophe by 2070 from advanced, planning, strategic AI | 0.1+ (Greater than 10%) | Update to: Is Power-Seeking AI an Existential Risk? | Yes |
| Nate Soares | 2021 | Existential catastrophe by 2070 from advanced, planning, strategic AI | 0.77 | Comments on Carlsmith's "Is power-seeking AI an existential risk?" | Yes |
| Toby Ord | 2020 | Existential catastrophe by 2120 as a result of unaligned AI | 0.1 | The Precipice | Yes |
| Toby Ord | 2020 | Humanity does not survive and is in charge of our future, if something that is more intelligent than us is created this century | 0.2 | Toby Ord on the precipice and humanity's potential futures | No |
| Eliezer Yudkowsky | 2022 | AGI "killing literally everyone" | ~1 | AGI Ruin: A List of Lethalities | Yes |
| Rohin Shah | 2019 | Things with AI do not go well, without additional intervention by long-termist community doing safety research | 0.1 | Conversation with Rohin Shah | No |
| Paul Christiano | 2019 | How much worse the future is in expectation by virtue of our failure to align AI | 0.1 | Conversation with Paul Christiano | Yes |
| Peter Wildeford | 2023 | X-risk, including several scenarios | 0.22 | Slack channel & private conversation | Yes |
| Adam Gleave | 2019 | Chance that AI does cause a significant risk of harm, without intervention from AI safety efforts | 0.6 - 0.7 | Conversation with Adam Gleave | No |
| Adam Gleave | 2019 | Chance that AI does cause a significant risk of harm, with median AI safety efforts | 0.3 - 0.4 | Conversation with Adam Gleave | Yes |
| Adam Gleave | 2019 | Chance that AI does cause a significant risk of harm, with best case AI safety efforts | 0.1 - 0.2 | Conversation with Adam Gleave | No |
| Rohin Shah | 2020 | Probability of AI-induced existential risk | 0.05 | AI Alignment Podcast: An Overview of Technical AI Alignment in 2018 and 2019 with Buck Shlegeris and Rohin Shah | Yes |
| Buck Shlegeris | 2020 | Probability of AI-induced existential risk | 0.5 | AI Alignment Podcast: An Overview of Technical AI Alignment in 2018 and 2019 with Buck Shlegeris and Rohin Shah | No |
| James Fodor | 2020 | Unaligned AI usurps and establishes permanent dominance over humanity | 0.0005 | Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics | Yes |
| Buck Shlegeris | 2023 | Likelihood of AI coup | 0.25 | The current alignment plan, and how we might improve it | Yes |
| Stuart Armstrong | 2014 | Probability of humanity's non-survival in the context of artificial superintelligence | 0.33 - 0.5 | The future is going to be wonderful if we don't get whacked | No |
| Stuart Armstrong | 2020 | Whether AGI could threaten humanity's survival or permanently curtail its potential | 0.05 - 0.3 | Is AI an existential threat? We don't know, and we should work on it | Yes |
| Rohin Shah | 2021 | Chance of human extinction if literally no one tries to address problems with AI | 0.33 - 0.7 | Rohin Shah on the State of AGI Safety Research in 2021 | No |
| Eli Lifland | 2022 | Misaligned takeover this century | 0.35 | My take on What We Owe the Future | Yes |
| Katja Grace | 2022 | AI destroys the world | 0.07 | Katja Grace on Slowing Down AI and Surveys | No |
| Andrew Critch | 2023 | Humanity not surviving the next 50 years | 0.8 | My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI | Yes |
| Andrew Critch | 2023 | Humanity not surviving the next 50 years, without a major international regulatory effort to control how AI is used | 0.9+ | My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI | No |
| Scott Aaronson | 2023 | The generative AI race, which started in earnest around 2016 or 2017 with the founding of OpenAI, to play a central causal role in the extinction of humanity | 0.02 | Why am I not terrified of AI? | Yes |

Framings

Different people use different framings to arrive at their estimates of AI risk. The most common framing seems to be to describe a model of what the risk from advanced AI looks like, assign probabilities to the components of that model, and then calculate the existential risk from AI on that basis. Another framing is to describe various scenarios for the future of AI, assign a probability to each, and then add together the probabilities of the relevant scenarios to get the total existential risk from AI. Some people also give a probability without describing the framing they used to arrive at it.

Below is an example of each of these two framings, due to Joseph Carlsmith and Peter Wildeford, respectively. Both individuals have updated their estimates since publishing their framing, so neither probability breakdown reflects the author's most recent estimate of AI risk. They are included to show how these framings work.

Model

One example of using a model to calculate the existential risk from AI is due to Joseph Carlsmith. He calculates AI risk by 2070 by breaking it down into the following premises, each conditional on the ones before it:

  1. It will become possible and financially feasible to build APS [advanced, planning, strategic] systems. 65%
  2. There will be strong incentives to build APS systems | (1). 80%
  3. It will be much harder to develop APS systems that would be practically PS-aligned [power-seeking] if deployed, than to develop APS systems that would be practically PS-misaligned if deployed (even if relevant decision-makers don’t know this), but which are at least superficially attractive to deploy anyway | (1)–(2). 40%
  4. Some deployed APS systems will be exposed to inputs where they seek power in misaligned and high-impact ways (say, collectively causing >$1 trillion 2021-dollars of damage) | (1)–(3). 65%
  5. Some of this misaligned power-seeking will scale (in aggregate) to the point of permanently disempowering ~all of humanity | (1)–(4). 40%
  6. This will constitute an existential catastrophe | (1)–(5). 95%

The total AI risk is the product of the probabilities for each part of the model, which comes to roughly 5%.
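
As a minimal sketch of this calculation, the Python snippet below multiplies the six probabilities listed above. The variable and function names are illustrative assumptions and are not taken from Carlsmith's report.

```python
import math

# Conditional probabilities for premises (1)-(6), copied from the list above.
# The names used here are illustrative, not Carlsmith's own.
carlsmith_premises = [0.65, 0.80, 0.40, 0.65, 0.40, 0.95]


def chain_probability(conditional_probabilities):
    """Return the product of a chain of conditional probabilities."""
    return math.prod(conditional_probabilities)


total_risk = chain_probability(carlsmith_premises)
print(f"Total risk by 2070: {total_risk:.1%}")  # prints "Total risk by 2070: 5.1%"
```

The result agrees with the 0.05 listed for Carlsmith's 2021 estimate in the table.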

This same model has been used by Nate Soares and Eli Lifland to calculate their estimates of AI risk. Several other people in the table have also used similar models.

Scenarios

One example of describing possible scenarios to calculate the x-risk from AI is due to Peter Wildeford. He calculates x-risk from AI by 2100 by breaking it down in the following way:

  1. We get aligned and very capable AI systems, but we do not end the time of perils. We however still survive to the end of the century. 43%
  2. Very capable AI systems are not developed this century. We survive until the end of the century. 22%
  3. An unaligned rogue AI system creates a singleton. 13%
  4. We get aligned and very capable AI systems that decisively end the time of perils. 10%
  5. We get very capable AI systems that are broadly aligned with human values, but we create an existential risk anyways via abuse of nonhuman animals and/or digital minds. 7%
  6. Some subset of humanity intentionally takes over the world via a very capable AI system. 4%
  7. An unaligned rogue AI system causes complete human extinction. 1%
  8. Some subset of humanity intentionally takes over the world via a very capable AI system and in the process ends up causing complete human extinction. 0.2%
  9. Human extinction arises through a nuclear war that is started by accident. 0.1%
  10. Human extinction arises through a nuclear war that is started intentionally. 0.1%
  11. Some unknown unknown thing causes an existential risk. 0.05%
  12. Nanotech causes an existential risk. 0.04%
  13. An engineered pandemic causes an existential risk. 0.01%

The total AI risk is the sum of scenarios (3), (5), (6), (7), and (8), which comes to 25.2%.
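
The scenario framing can be sketched in the same way. The snippet below is only an illustration: the dictionary layout and names are assumptions, and the percentages are copied from the list above.

```python
# Scenario probabilities in percent, keyed by the numbering in the list above.
# The data layout and names are illustrative, not Wildeford's own.
scenario_percent = {
    1: 43, 2: 22, 3: 13, 4: 10, 5: 7, 6: 4, 7: 1,
    8: 0.2, 9: 0.1, 10: 0.1, 11: 0.05, 12: 0.04, 13: 0.01,
}

# Scenarios counted as AI risk, per the text: (3), (5), (6), (7), and (8).
AI_RISK_SCENARIOS = (3, 5, 6, 7, 8)

ai_risk_percent = sum(scenario_percent[n] for n in AI_RISK_SCENARIOS)
all_scenarios_percent = sum(scenario_percent.values())

print(f"AI risk: {ai_risk_percent:.1f}%")
print(f"All scenarios: {all_scenarios_percent:.1f}%")
```

The AI-risk total is higher than the 0.22 given for Wildeford in the table, which is consistent with the note that he has updated his estimate since publishing this breakdown. The thirteen listed percentages add up to 100.5% rather than exactly 100%, presumably because of rounding in the individual figures.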

Primary author: Jeffrey Heninger

arguments_for_ai_risk/quantitative_estimates_of_ai_risk.txt · Last modified: 2023/12/01 18:15 by harlanstewart