====== Argument for AI x-risk from effective malign agents ======
  
//This page is incomplete, under active work and may be updated soon.//
  - **Inaction**: no further special action will be taken to mitigate existential risk from superhuman AI systems. (This argument is about the default scenario without such efforts, because it is intended to inform decisions about applying these efforts, not because such efforts are unlikely.)
  
==== I. If superhuman AI is developed, then at least some superhuman AI systems are likely to be goal-directed ====
  
//(Main article: [[arguments_for_ai_risk:is_ai_an_existential_threat_to_humanity:will_malign_ai_agents_control_the_future:argument_for_ai_x-risk_from_competent_malign_agents:will_advanced_ai_be_agentic:start|Will advanced AI be agentic?]])//
  
[[clarifying_concepts:agency|That is]], some such systems will systematically choose actions that increase the probability of some states of the world over others((‘Goal-directed’ suggests pursuit of one particular outcome, but note that the term here also refers to having preferences over every choice of states of affairs, and acting to increase the chances of higher-ranked outcomes, without a particular focus on the top-ranked one. \\ \\ This is intended to be weaker than the claim that such systems will be ‘agents’ with consistent utility functions, for instance also including systems such as humans, who appear to be inconsistent (for example, see the [[https://en.wikipedia.org/wiki/Allais_paradox|Allais Paradox]]) but still systematically bring about certain outcomes on net, across a range of situations.)). Reasons to expect that some superhuman AI systems will be goal-directed include:
  
  - **Some goal-directed behavior is likely to be [[arguments_for_ai_risk:is_ai_an_existential_threat_to_humanity:will_malign_ai_agents_control_the_future:argument_for_ai_x-risk_from_competent_malign_agents:will_advanced_ai_be_agentic:how_large_are_economic_incentives_for_agentic_ai:start|economically valuable to create]]** (i.e. also not replaceable using only non-goal-directed systems). This appears to be true even for [[arguments_for_ai_risk:incentives_to_create_ai_systems_known_to_pose_extinction_risks|apparently x-risky systems]], and will likely [[arguments_for_ai_risk:is_ai_an_existential_threat_to_humanity:will_dangerous_ai_systems_appear_safe|appear true]] more often than it is.
If I, II, and III are true, we thus have: assuming humanity develops superhuman AI systems, then by default some such systems will have goals, those goals will be extinction-level bad, and will likely be achieved. Thus if superhuman AI systems are developed, the future will likely be extinction-level bad by default.
  
It is not clear how likely these premises are. Each appears to have substantial probability, so their conjunction, and thus the conclusion, appears to have non-negligible probability.
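As a rough illustration of how the premise probabilities combine (the numbers below are placeholders chosen for easy arithmetic, not estimates made on this page, and the premises are treated as independent for simplicity): if each of I, II, and III were judged 50% likely, and the conclusion 80% likely given all three, then

\[
P(\text{conclusion}) \;\ge\; P(\text{I})\,P(\text{II})\,P(\text{III}) \cdot P(\text{conclusion} \mid \text{I},\text{II},\text{III}) \;=\; 0.5^{3} \times 0.8 \;=\; 0.1 ,
\]

i.e. on the order of 10%, which is non-negligible even though no single premise is certain.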
===== Counterarguments and open questions =====
  
  
If this argument is successful, then in conjunction with the view that [[will_superhuman_ai_be_created:start|superhuman AI will be developed]], it implies that humanity faces a large risk from artificial intelligence. This is evidence that it is a problem worthy of receiving resources, though this depends on the tractability of improving the situation (which depends in turn on the [[ai_timelines:start|timing of the problem]]), and on what other problems exist, none of which we have addressed here.

===== Primary author =====

Katja Grace