
Argument for AI x-risk from variance in human values

This page is incomplete, under active work, and may be updated soon.

The argument for AI x-risk from variance in human values holds that advanced artificial intelligence poses an existential risk to humanity by empowering specific humans, many of whom, if sufficiently empowered, would bring about outcomes that most other humans would consider catastrophic.

Details

Argument

  1. People who broadly agree on good outcomes within the current world may, given much more power, want outcomes that one another would consider catastrophic. For example, a utilitarian and a Christian might both work to reduce poverty now, but with much more control, the utilitarian might replace humans with efficient pleasure-producing systems that have no knowledge of the real world, the Christian might dedicate most resources to glorifying God, and each might consider the other's future a radical loss.
  2. AI may empower some humans or human groups to bring about futures closer to what they desire.
  3. By 1, such futures may be catastrophic according to the values of most other humans.

Counterarguments

  • It is not clear that this disagreement would persist: on extensive reflection, humans might substantially agree on good outcomes.

Discussion of this argument elsewhere

Joe Carlsmith (2024):

Different human value systems are similar, and reasonably aligned with each other, within a limited distribution of familiar cases, partly because they were crafted in order to capture the same intuitive data-points. But systematize them and amp them up to foom, and they decorrelate hard. Cf, too, the classical utilitarians and the negative utilitarians. On the one hand, oh-so-similar – not just in having human bodies, genes, cognitive architectures, etc, but in many more specific ways (thinking styles, blogging communities, etc). And yet, and yet – amp them up to foom, and they seek such different extremes (the one, Bliss; and the other, Nothingness).

Katja Grace (2022):

Utility maximization does seem to quickly engender an interest in controlling literally everything, at least for many utility functions one might have. If you want things to go a certain way, then you have reason to control anything which gives you any leverage over that, i.e. potentially all resources in the universe (i.e. agents have ‘convergent instrumental goals’). This is in serious conflict with anyone else with resource-sensitive goals, even if prima facie those goals didn’t look particularly opposed. For instance, a person who wants all things to be red and another person who wants all things to be cubes may not seem to be at odds, given that all things could be red cubes. However if these projects might each fail for lack of energy, then they are probably at odds.

Scott Alexander (2018):

The morality of Mediocristan is mostly uncontroversial. It doesn’t matter what moral system you use, because all moral systems were trained on the same set of Mediocristani data and give mostly the same results in this area. Stealing from the poor is bad. Donating to charity is good. A lot of what we mean when we say a moral system sounds plausible is that it best fits our Mediocristani data that we all agree upon…
The further we go toward the tails, the more extreme the divergences become. Utilitarianism agrees that we should give to charity and shouldn’t steal from the poor, because Utility, but take it far enough to the tails and we should tile the universe with rats on heroin. Religious morality agrees that we should give to charity and shouldn’t steal from the poor, because God, but take it far enough to the tails and we should spend all our time in giant cubes made of semiprecious stones singing songs of praise. Deontology agrees that we should give to charity and shouldn’t steal from the poor, because Rules, but take it far enough to the tails and we all have to be libertarians.

Contributors

Primary author: Katja Grace

Other authors: Nathan Young, Josh Hart

Suggested citation:

Grace, K., Young, N., & Hart, J. (2024), Argument for AI x-risk from variance in human values, AI Impacts Wiki, https://wiki.aiimpacts.org/arguments_for_ai_risk/list_of_arguments_that_ai_poses_an_xrisk/argument_for_ai_x-risk_from_variance_in_human_values