Interview on the strength of the evidence for AI risk claims with an anonymous AI alignment researcher

About anonymous AI alignment researcher

This researcher is an associate professor in computer science and works on the alignment team at a major AI lab.

They think there’s a 25% chance that AI causes human extinction by 2100, but an 85% chance that AI causes permanent human disempowerment by 2100.1)

Anonymous AI alignment researcher’s overall assessment of the evidence

  • The background model that leads this researcher to believe AI risk is so high is:2)
    • Change is inevitable and will cause our descendants to be bizarre and alien to us.
    • AI will accelerate the rate of change.
  • This researcher thinks that there are two stable long-run states: runaway competition leading to extinction, or totalitarianism leading to value lock-in.3)
    • They also think we should expect values to get worse (relative to our current values) rather than to continue to improve.4)
  • Empirical evidence about AI capabilities is not very important to this researcher’s beliefs about AI risk.5)
  • This researcher thinks that the evidence for AI risk is weaker and more speculative than would usually be the case to motivate expensive policy interventions.6)
  • Reasons that this researcher is convinced of AI risk anyway include:7)
    • A track record of being well-calibrated
    • The social proof of accomplished people believing these claims
    • The weakness of counterarguments

Anonymous AI alignment researcher’s assessment of the evidence on particular AI capabilities

  • This researcher thinks that only small capability improvements (combined with a continuing drop in the price of compute) are required to cause a large economic shock.8)
    • AI systems can already be copied cheaply and learn in a distributed way.9)
  • This researcher is 95% confident that AI systems will improve in planning sufficiently to seriously marginalize humans.10)
    • Many similar problems have already been solved.
    • This researcher expects the compute used to increase by a factor of a hundred or a thousand over the next few years.
    • So even without any algorithmic improvements, this researcher is 80% confident that these planning abilities will be achieved.
  • This researcher thinks that the level of robotics required for humans to be seriously marginalized could take another 10 or 20 years to mature.11)

Notes

1) Written communication after the interview.
2) “I’ll frame it as the inevitability of change… It’s something that would happen in the long run, even if we never invented machine intelligence, we would just gradually change into something that would be really alien and unrecognizable. And then I think the thing that’s going to change is mainly the rate of change… The difficulty of achieving value lock in is the other part of this.” [5:35]
“If we allow competitive dynamics to play out, some sort of evolution or selection is going to mean that our descendants are going to be bizarre and alien… So the new thing from AI is a) something that we wouldn’t even count as human right off the bat probably, and b) just the rate of change being faster.” [9:51]
3) “There are basically two stable states in the long run. One is runaway competition where we all become von Neumann probes or something like that… And the other one is value lock in, like a permanent totalitarian North Korean/Amish state that has some values that it can protect, but we have to give up a lot of our existing ability to grow and adapt and change and things that we consider make us human in order to achieve that. So basically there has to be some sort of threading the needle that happens over the course of the next hundred years or so for us to still be around in some sort of meaningful form.” [5:35]
4) “I think we can all agree that if you look back at past values then we gradually see them seeming to improve up until today… This causes people to think, obviously if we extrapolate it would be really weird if there was this hard inflection point where they get better and better and better until today and then they immediately start going down. But I actually think that’s what we should expect and that's basically what everybody in all of history has experienced from their point of view. By definition, if there’s something you value it’s something you don’t want to change.” [33:30]
5) “[Hadshar] Empirical details about capabilities that AI systems have now don’t sound very important to your world view.
[Researcher] Exactly.” [30:08]
6) “The main best objection I get from really smart people on this is that most of the evidence is of a weaker or more speculative form than what we are used to using to evaluate policies, at least really expensive policies like the ones AI doomers are advocating. They basically say, if I believed you based on these sorts of arguments, I would also have to believe lots of other people saying crazy sounding things. And I think they’re right that this is actually a weaker form of evidence that’s easier to spoof.” [36:07]
7) “[Hadshar] What is it that most gives you confidence that you’re not being fooled by incorrect arguments?
[Researcher] … Part of it is that I think I’ve been pretty well calibrated for most of my life about a lot of stuff… The social proof of the other people in this community being very accomplished… So that’s another sanity check. Also the quality of the dismissals is just so poor… They’re not even meeting their normal bar of does this argument hold water to simple objections.” [38:07]
8) “Even if they only became a little bit more capable in their current form and the cost of compute continues to drop, there'll be some big economic shock and the value of most white collar labor will drop.” [10:30]
9) “The important abilities that they already have that humans don’t is the ability to have copies made of them extremely cheaply and to learn in a more distributed way.” [10:30]
10) “[Researcher] I don’t really expect most humans to really start to be marginalized in a serious way until we have actual robots and machines which are a little bit better at planning and a little bit more flexible at online learning and stuff like that.
[Hadshar] And how confident do you feel in things like the improvement in planning that you would need to see for this to happen?
[Researcher] 95%… In principle it could just take us a really long time to figure out, but that would be really surprising to me.
[Hadshar] And why would it be surprising?
[Researcher] … Because of the huge number of related problems that have already fallen or partially fallen just by following the big blob of compute engineering approach. For any one of these skills that we’re talking about we already do have systems which can kind of do it in a limited way… we’re going to dial up the amount of compute being used by a factor of a hundred or a thousand over the next few years. That would already get me to 80%, without any algorithmic improvements.” [10:30]
11) “I don’t really expect most humans to really start to be marginalized in a serious way until we have actual robots and machines which are a little bit better at planning and a little bit more flexible at online learning and stuff like that… I threw in robotics, which I do think could easily take another 10 or 20 years to be really mature.” [10:30]
arguments_for_ai_risk/is_ai_an_existential_threat_to_humanity/interviews_on_the_strength_of_the_evidence_for_ai_risk_claims/summary_of_an_interview_on_the_strength_of_the_evidence_for_ai_risk_claims_with_anonymous_ai_alignment_researcher.txt · Last modified: 2023/10/12 11:12 by rosehadshar