<?xml version="1.0" encoding="utf-8"?>
<!-- generator="FeedCreator 1.8" -->
<?xml-stylesheet href="https://wiki.aiimpacts.org/lib/exe/css.php?s=feed" type="text/css"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>AI Impacts Wiki conversation_notes</title>
    <subtitle></subtitle>
    <link rel="alternate" type="text/html" href="https://wiki.aiimpacts.org/"/>
    <id>https://wiki.aiimpacts.org/</id>
    <updated>2026-05-02T04:19:48+00:00</updated>
    <generator>FeedCreator 1.8 (info@mypapit.net)</generator>
    <link rel="self" type="application/atom+xml" href="https://wiki.aiimpacts.org/feed.php" />
    <entry>
        <title>Conversation with Adam Gleave</title>
        <link rel="alternate" type="text/html" href="https://wiki.aiimpacts.org/conversation_notes/conversation_with_adam_gleave?rev=1663745861&amp;do=diff"/>
        <published>2022-09-21T07:37:41+00:00</published>
        <updated>2022-09-21T07:37:41+00:00</updated>
        <id>https://wiki.aiimpacts.org/conversation_notes/conversation_with_adam_gleave?rev=1663745861&amp;do=diff</id>
        <author>
            <name>Anonymous</name>
            <email>anonymous@undisclosed.example.com</email>
        </author>
        <category  term="conversation_notes" />
        <content>&lt;pre&gt;
@@ -1 +1,462 @@
+ ====== Conversation with Adam Gleave ======
+ 
+ // Published 23 December, 2019 //
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;AI Impacts talked to AI safety researcher Adam Gleave about his views on AI risk. With his permission, we have transcribed this interview.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ 
+ ===== Participants =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;a href=&amp;quot;https://gleave.me/&amp;quot;&amp;gt;Adam Gleave&amp;lt;/a&amp;gt; – PhD student at the Center for Human-Compatible AI, UC Berkeley
+                 &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Asya Bergal – AI Impacts&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;a href=&amp;quot;http://robertlong.online/&amp;quot;&amp;gt;Robert Long&amp;lt;/a&amp;gt; – AI Impacts
+                 &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ===== Summary =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;We spoke with Adam Gleave on August 27, 2019. Here is a brief summary of that conversation:&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Gleave gives a number of reasons why it’s worth working on AI safety:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;It seems like the AI research community currently isn’t paying enough attention to building safe, reliable systems.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;There are several unsolved technical problems that could plausibly occur in AI systems without much advance notice.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;A few additional people working on safety may be extremely high leverage, especially if they can push the rest of the AI research community to pay more attention to important problems.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Gleave thinks there’s a ~10% chance that AI safety is very hard in the way that MIRI would argue, a ~20-30% chance that AI safety will almost certainly be solved by default, and a remaining ~60-70% chance that what we’re working on actually has some impact.
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Here are the reasons for Gleave’s beliefs, weighted by how much they factor into his holistic viewpoint:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;40%: The traditional arguments for risks from AI are unconvincing:
+                           &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Traditional arguments often make an unexplained leap from having superintelligent AIs to superintelligent AIs being catastrophically bad.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;It’s unlikely that AI systems not designed from mathematical principles are going to inherently be unsafe.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;They’re long chains of heuristic reasoning, with little empirical validation.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Outside view: most fears about technology have been misplaced.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;20%: The AI research community will solve the AI safety problem naturally.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;20%: AI researchers will be more interested in AI safety when the problems are nearer.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;10%: The hard, MIRI version of the AI safety problem is not very compelling.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;10%: AI safety problems that seem hard now will be easier to solve once we have more sophisticated ML.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Fast takeoff defined as “GDP will double in 6 months before it doubles in 24 months” is plausible, though Gleave still leans towards slow takeoff.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Gleave thinks discontinuous progress in AI is extremely unlikely:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;There is unlikely to be a sudden important insight dropping into place, since AI has empirically progressed more by the accumulation of lots of bags of tricks and compute.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;There isn’t going to be a sudden influx of compute in the near future, since well-funded organizations are currently already spending billions of dollars to optimize it.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;If we train impressive systems, we will likely train other systems beforehand that are almost as capable.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Given discontinuous progress, the most likely story is that we combine many narrow AI systems in a way where the integrated whole is much more capable than if we only had half of them.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Gleave guesses a ~10-20% chance that AGI technology will only be a small difference away from current techniques, and a ~50% chance that AGI technology will be easily comprehensible to current AI researchers:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;There are fairly serious roadblocks in current techniques right now, e.g. memory, transfer learning, Sim2Real, sample inefficiency.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Deep learning is slowing down compared to 2012 – 2013:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Much of the new progress is going to different domains, e.g. deep RL instead of supervised deep learning.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Computationally expensive algorithms will likely hit limits without new insights.
+                           &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Though it seems possible that in fact progress will come from more computationally efficient algorithms.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Outside view, we’ve had lots of different techniques for AI over time, so it would be surprising if the current one were the right one for AGI.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Pushing more towards current techniques getting to AGI: from an economic point of view, there is a lot of money going into companies whose mission is to build AGI.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Conditional on advanced AI technology being created, Gleave gives a 60-70% chance that it will pose a significant risk of harm without additional safety efforts.
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Gleave thinks that in the best case, we drive it down to 10 – 20%; in the median case, we drive it down to 30 – 40%. A lot of his uncertainty comes from how difficult the problem is.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Gleave thinks he could see evidence that could push him in either direction in terms of how likely AI is to be safe:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Evidence that would cause Gleave to think AI is less likely to be safe:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Evidence that thorny but speculative technical problems, like inner optimizers, exist.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Seeing more arms race dynamics, e.g. between the U.S. and China.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Seeing major catastrophes involving AI, though they would also cause people to pay more attention to risks from AI.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hearing more solid arguments for AI risk.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Evidence that would cause Gleave to think AI is more likely to be safe:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Seeing AI researchers spontaneously focus on relevant problems would make Gleave think that AI is less risky.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Getting evidence that AGI was going to take longer to develop.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Gleave is concerned that he doesn’t understand why members of the safety community come to widely different conclusions when it comes to AI safety.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Gleave thinks a potentially important question is the extent to which we can successfully influence field building within AI safety.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;This transcript has been lightly edited for concision and clarity.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ===== Transcript =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; We have a bunch of questions, sort of around the issue of– basically, we’ve been talking to people who are more optimistic than a lot of people in the community about AI. The proposition we’ve been asking people to explain their reasoning about is, ‘Is it valuable for people to be expending significant effort doing work that purports to reduce the risk from advanced artificial intelligence?’ To start with, I’d be curious for you to give a brief summary of what your take on that question is, and what your reasoning is.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; Yeah, sure. The short answer is, yes, I think it’s worth people spending a lot of effort on this; at the margin, it’s still in absolute terms quite a small number of people. Obviously it depends a bit whether you’re talking about diverting resources of people who are already really dedicated to having a high impact, versus having your median AI researchers work more on safety-related things. Maybe you think the median AI researcher isn’t trying to optimize for impact anyway, so the opportunity cost might be lower. The case I see for reducing the risk of AI is maybe weaker than the one some people in the community see, but I think it’s still overall very strong.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;The goal of AI as a field is still to build artificial general intelligence, or human-level AI. If we’re successful in that, it does seem like it’s going to be an extremely transformative technology. There doesn’t seem to be any roadblock that would prevent us from eventually reaching that goal. The path to that, the timeline is quite murky, but that alone seems like a pretty strong signal for ‘oh, there should be some people looking at this and being aware of what’s going on.’&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then, if I look at the state of the art in AI, there are a number of somewhat worrying trends. We seem to be quite good at getting very powerful superhuman systems in narrow domains when we can specify the objective that we want quite precisely. But systems like AlphaStar, AlphaGo, and OpenAI Five are very much lacking in robustness, so you have some quite surprising failure modes. Mostly we see adversarial examples in image classifiers, but some of these RL systems also have somewhat surprising failure modes. This seems to me like an area the AI research community isn’t paying much attention to, and I feel like it’s almost gotten obsessed with producing flashy results rather than necessarily doing good rigorous science and engineering. That seems like quite a worrying trend if you extrapolate it out, because some other engineering disciplines are much more focused on building reliable systems, so I trust them more to get that right by default.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Even in something like aeronautical engineering where safety standards are very high, there are still accidents in initial systems. But because we don’t even have that focus, it doesn’t seem like the AI research community is going to put that much focus on building safe, reliable systems until they’re facing really strong external or commercial pressures to do so. Autonomous vehicles do have a reasonably good safety track record, but that’s somewhere where it’s very obvious what the risks are. So that’s kinda the sociological argument, I guess, for why I don’t think that the AI research community is going to solve all of the safety problems as far ahead of time as I would like.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then, there’s also a lot of very thorny technical problems that do seem like they’re going to need to be solved at some point before AGI. How do we get some information about what humans actually want? I’m a bit hesitant to use this phrase ‘value learning’ because you could plausibly do this just by imitation learning as well. But there needs to be some way of getting information from humans into the system, you can’t just derive it from first principles, we still don’t have a good way of doing that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;There are lots of more speculative problems, e.g. inner optimizers. I’m not sure if these problems are necessarily going to be real or cause issues, but it’s not something we’ve ruled in or out. So there are enough plausible technical problems, which we’re not necessarily going to get much advance notice of, that it seems worrying to just charge ahead without looking into them.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then to caveat all this, I do think the AI community does care about producing useful technology. We’ve already seen some backlashes against autonomous weapons. People do want to do good science. And when the issues are obvious, there’s going to be a huge amount of focus on them. And it also seems like some of the problems might not actually be that hard to solve. So I am reasonably optimistic that in the default case where there’s no real safety community, things will still work out okay, but it also seems like the risk is large enough that just having a few people working on it can be extremely high leverage, especially if you can push the rest of the AI research community to pay a bit more attention to these problems.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Does that answer that question?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, it totally does.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Could you say a little bit more about why you think you might be more optimistic than other people in the safety community?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; Yeah, I guess one big reason is that I’m still not fully convinced by a lot of the arguments for risks from AI. I think they are compelling heuristic arguments, meaning it’s worth me working on this, but it’s not compelling enough for me to think ‘oh, this is definitely a watertight case’.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think the common area where I just don’t really follow the arguments is when you say, ‘oh, you have this superintelligent AI’. Let’s suppose we get to that, that’s already kind of a big leap of faith. And then if it’s not aligned, humans will die. It seems like there’s just a bit of a jump here that no one’s really filled in.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;In particular it seems like sure, if you have something sufficiently capable, both in terms of intelligence and also access to other resources, it could destroy humanity. But it doesn’t just have to be smarter than an individual human, it has to be smarter than all of humanity potentially trying to work to combat this. And humanity will have a lot of inside knowledge about how this AI system works. And it’s also starting from a potentially weakened position in that it doesn’t already have legal protection, property ownership, all these other things.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I can certainly imagine there being scenarios unfolding where this is a problem, so maybe you actually give an AI system a lot of power, or it just becomes so, so much more capable than humans that it really is able to outsmart all of us, or it might just be quite easy to kill everyone. Maybe civilization is just much more fragile than we think. Maybe there are some quite easy bio x-risks or nanotech that you could reason about from first principles. If it turned out that a malevolent but very smart human could kill all of humanity, then I would be more worried about the AI problem, but then maybe we should also be working on the human x-risk problem. So that’s one area that I’m a bit skeptical about, though maybe fleshing that argument out more is bad for info-hazard reasons.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then the other thing is I guess I feel like there’s a distribution of how difficult the AI safety problem is going to be. So there’s one world where anything that is not designed from mathematical principles is just going to be unsafe– there are going to be failure modes we haven’t considered, these failure modes are only going to arise when the system is smart enough to hurt you, and the system is going to be actively trying to deceive you. So this is I think, maybe a bit of a caricature, but I think this is roughly MIRI’s viewpoint. I think this is a productive viewpoint to inhabit when you’re trying to identify problems, but I think it’s probably not the world we actually live in. If you can solve that version, great, but it seems like a lot of the failure modes that are going to occur with advanced AI systems you’re going to see signs of earlier, especially if you’re actually looking out for them.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I don’t see much reason for AI progress to be discontinuous in particular. So there’s a lot of empirical records you could bring to bear on this, and it also seems like a lot of commercially valuable interesting research applications are going to require solving some of these problems. You’ve already seen this with value learning, that people are beginning to realize that there’s a limitation to what we can just write a reward function down for, and there’s been a lot more focus on imitation learning recently. Obviously people are solving much narrower versions of what the safety community cares about, but as AI progresses, they’re going to work on broader and broader versions of these problems.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I guess the general skepticism I have with the arguments, is, a lot of them take the form of ‘oh, there’s this problem that we need to solve and we have no idea how to solve it,’ but forget that we only need to solve that problem once we have all this other treasure trove of AI techniques that we can bring to bear on the problem. It seems plausible that this very strong unsupervised learning is going to do a lot of heavy lifting for us, maybe it’s going to give us a human ontology, it’s going to give us quite a good inductive bias for learning values, and so on. So there’s just a lot of things that might seem a lot stickier than they actually are in practice.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then, I also have optimism that yes, the AI research community is going to try to solve these problems. It’s not like people are just completely uninterested in whether their systems cause harm, it’s just that right now, it seems to a lot of people very premature to work on this. There’s a sense of ‘how much good can we do now, when nearer to the time there are naturally going to be hundreds of times more people working on the problem?’. I think there is still valuable work you can do now, in laying the foundations of the field, but that maybe gives me a bit of a different perspective in terms of thinking, ‘What can we do that’s going to be useful to people in the future, who are going to be aware of this problem?’ versus ‘How can I solve all the problems now, and build a separate AI safety community?’.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I guess there’s also the outside view of just, people have been worried about a lot of new technology in the past, and most of the time it’s worked out fine. I’m not that compelled by this. I think there are real reasons to think that AI is going to be quite different. I guess there’s also just the outside view of, if you don’t know how hard a problem is, you should put a probability distribution over it and have quite a lot of uncertainty, and right now we don’t have that much information about how hard the AI safety problem is. Some problems seem to be pretty tractable, some problems seem to be intractable, but we don’t know if they actually need to be solved or not. &amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;So, decent chance– I think I put a reasonable probability, like 10% probability, on the hard-mode MIRI version of the world being true. In which case, I think there’s probably nothing we can do. And I also put a significant probability, 20-30%, on AI safety basically not needing to be solved, we’ll just solve it by default unless we’re completely careless. And then there’s this big chunk of probability mass in the middle where maybe what we’re working on will actually have an impact, and obviously it’s hard to know whether at the margin, you’re going to be changing the outcome.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I’m curious– I think a lot of people we’ve talked to, some people have said somewhat similar things to what you said. And I think there’s two classic axes on which peoples’ opinions differ. One is this slow takeoff, fast takeoff proposition. The other is whether they think something that looks like current methods is likely to lead to AGI. I’m curious on your take on both those questions.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; Yeah, sure. So, for slow vs. fast takeoff, I feel like I need to define the terms, because people use them in slightly different ways. I don’t expect there to be a discontinuity, in the sense of, we just see this sudden jump. But I wouldn’t be that surprised if there was exponential growth and quite a high growth rate. I think Paul defines fast takeoff as, GDP will double in six months before it doubles in 24 months. I’m probably mangling that but it was something like that. That scenario of fast takeoff seems plausible to me. I probably am still leaning slightly more towards the slow takeoff scenario, but fast takeoff in the sense of very fast exponential growth seems plausible.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think a lot of the case for the discontinuous progress argument falls on there being a sudden insight that dropped into place, and it doesn’t seem to me like that’s what’s happening in AI, it’s more just an accumulation of lots of bags of tricks and a lot of compute. I also don’t see there being bags of compute falling out of the sky. Maybe if there was another AI winter, leading to a hardware overhang, then you might see sudden progress when AI gets funding again. But right now a lot of very well-funded organizations are spending billions of dollars on compute, including developing new application-specific integrated circuits for AI, so we’re going to be very close to the physical limits there anyway.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Probably the strongest case I see for discontinuities is the discontinuities you see when you’re training systems. But I just don’t think that’s going to be strong enough, because you’ll have trained other systems beforehand that are almost as capable. I guess we do sometimes see cases where one technique lets you solve a new class of problems.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Maybe you could see something where you get increasingly capable narrow systems, and there’s not a discontinuity overall, you already had very strong narrow AI. But eventually you just have so many narrow AI systems that they can basically do everything, and maybe you get to a stage where the integrated whole of those is much stronger than if you just had half of them, let’s say. I guess this is sort of the comprehensive AI services model. But again that seems a bit unlikely to me, because most of the time you can probably outsource some other chunks to humans if you really needed to. But yeah, I think it’s a bit more plausible than some of the other stories.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then, in terms of whether I think current techniques are likely to get us to human-level AI– I guess I put significant probability mass on that, depending on how narrowly you define it. One fuzzy definition is that a PhD thesis describing AGI would be something that a typical AI researcher today could read and understand without too much work. Under this definition I’d assign 40 – 50%. And that could still include introducing quite a lot of new techniques, right, but just– I mean plausibly I think something based on deep learning, deep RL, you could describe to someone in the 1970s in a PhD thesis and they’d still understand it. But that just shows you it’s not that much real theory that was developed; it was applying some pretty simple algorithms and a lot of compute in the right way, which implies no huge new theoretical insights.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But if we’re defining it more narrowly, only allowing small variants of current techniques, I think that’s much less likely to lead to AGI: around 10-20%. I think that case is almost synonymous with the argument that you just need more compute, because it seems like there are so many things right now that we really cannot do: we still don’t have great solutions to memory, we still can’t really do transfer learning, Sim2Real just barely works sometimes. We’re still extremely sample inefficient. It just feels like all of those problems are going to require quite a lot of research in themselves. I can’t see there being one simple trick that would solve all of them. But maybe current algorithms, if you gave them 10,000x compute, would do a lot better on these; that is somewhat plausible.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And yeah, I do put fairly significant probability, 50%, on it being something that is kind of radically different. And I guess there’s a couple of reasons for that. One is, just trying to extrapolate progress forward, it does seem like there are some fairly serious roadblocks. Deep learning is slowing down in terms of, it’s not hitting as many big achievements as it was in the past. And also just AI has had many kinds of fads over time, right. We’ve had good old-fashioned AI, symbolic AI, we had expert systems, we had Bayesianism. It would be sort of surprising that the current method is the right one.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I don’t find the fact that people are focusing on these techniques to be particularly strong evidence that these systems are going to lead us to AGI. First, many researchers are not focused on AGI, and you can probably get useful applications out of current techniques. Second, AI research seems like it can be quite fashion-driven. Obviously, there are organizations whose mission is to build AGI who are working within the current paradigm. And I think it is probably still the best bet, of the things that we know, but I still think it’s a bet that’s reasonably unlikely to pay off.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Does that answer your question?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Just on that last bit, you said– I might just be mixing up the different definitions you had and your different credences in those– but in the end there you said that’s a bet that you think is reasonably unlikely to pay off, but you’d also said 50% that it’s something radically different, so– I think I was just confused about which ones you meant.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; Right. So, I guess these definitions are all quite fuzzy, but I was saying 10-20% that something that is only a small difference away from current techniques would build AGI, and 50% that AGI was going to be comprehensible to us. I guess the distinction I’m trying to draw is: the narrow one, to which I give 10-20% credence, is that we basically already have the right algorithms and we just need a few tricks and more compute. And the other, more expansive definition, to which I give 40-50% credence, allows for completely different algorithms, but excludes any deep theoretical insight akin to a whole new field of mathematics. So we might not be using backpropagation any longer, we might not be using gradient descent, but it’ll be something similar — like the difference between gradient descent and evolutionary algorithms.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;There’s a separate question of, if you’re trying to build AGI right now, where should you be investing your resources? Should you be trying to come up with a completely new novel theory, or should you be trying to scale up current techniques? And I think it’s plausible that you should just be trying to scale up techniques and figure out if we can push them forward, because trying to come up with a completely new way of doing AI is also very challenging, right. It’s not really a sort of insight you can force.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; You kind of covered this earlier– and maybe you even said the exact number, so I’m sorry if this is a repeat. But one thing we’ve been asking people is the credence that without additional intervention– so imagining a world where EA wasn’t pushing for AI safety, and there wasn’t this separate AI safety movement outside of the AI research community, imagining that world. In that world, what is the chance that advanced artificial intelligence poses a significant risk of harm?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; The chance it does cause a significant risk of harm?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, that’s right.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; Conditional on advanced artificial intelligence being created, I think 60, 70%. I have a much harder time giving an unconditional probability, because there are other things that could cause humanity to stop developing AI. Is a conditional probability good enough, or do you want me to give an unconditional one?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; No, I think the conditional one is what we’re looking for.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Do you have a hunch about how much we can expect dedicated efforts to drive down that probability? That is, the EA-focused AI safety efforts.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; I think the best case is, you drive it down to 10 – 20%. I’m kind of picturing a lot of this uncertainty coming from just, how hard is the problem technically? And if we do inhabit this really hard version where you have to solve all of the problems perfectly and you have to have a formally verified AI system, I just don’t think we’re going to do that in time. You’d have to solve a very hard coordination problem to stop people developing AI without those safety checks. It seems like a very expensive process, developing safe AI.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I guess the median case, where the AI safety community just sort of grows at its current pace, I think maybe that gets it down to 30 – 40%? But I have a lot of uncertainty in these numbers.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Another question, going back to original statements for why you believe this– do you think there’s plausible concrete evidence that we could get or are likely to get that would change your views on this one direction or the other?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; Yeah, so, seeing evidence of some of the more thorny but currently quite speculative technical problems, like inner optimizers, would make me update towards, ‘oh, this is just a really hard technical problem, and unless we really work hard on this, the default outcome is definitely going to be bad’. Right now, no one’s demonstrated an inner optimizer existing, it’s just a sort of theoretical problem. This is a bit of an unfair thing to ask in some sense, in that the whole reason that people are worried about this is that it’s only a problem with very advanced AI systems. Maybe I’m asking for evidence that can’t be provided. But relative to many other people, I am unconvinced by heuristic arguments appealing just to mathematical intuitions. I’m much more convinced either by very solid theoretical arguments that are proof-based, or by empirical evidence.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Another thing that would update me in a positive direction, as in AI seems less risky, would be seeing more AI researchers spontaneously focus on some relevant problems. I guess this is a bit of a tangent, but people tend to conceive of the AI safety community as people who would identify as AI safety researchers. But I think the vast majority of AI safety research work is being done by people who have never heard of AI safety, but have been working on related problems. This is useful to me all of the time. I think where we could plausibly end up having a lot more of this work happening without AI safety ever really becoming a thing is people realizing ‘oh, I want my robot to do this thing and I have a really hard time making it do that, let’s come up with a new imitation learning technique’.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But yeah, other things that could update me positively… I guess, AI seeming like a harder problem, as in, it seems like general artificial intelligence is further away, would probably update me in a positive direction. It’s not obvious, but I think all else being equal, longer timelines generally give us more time to diagnose problems. And also it seems like the current set of AI techniques — deep learning and very data-driven approaches — are particularly difficult to analyze or prove anything about, so some other paradigm is probably going to be better, if possible.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Other things that would make me scared would be more arms race dynamics. It’s been very sad to me to see the China – U.S. arms race dynamics around AI, especially since it doesn’t even seem like there is much direct competition, but that meme is still being pushed for political reasons.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Any actual major catastrophes involving AI would make me think it’s more risky, although it would also make people pay more attention to AI risk, so I guess it’s not obvious what direction it would push overall. But it certainly would make me think that there’s a bit more technical risk.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I’m trying to think if there’s anything else that would make me more pessimistic. I guess just more solid arguments for AI risk, because a lot of my skepticism is coming from the fact that there’s just this very unlikely-sounding set of ideas, and there are just heuristic arguments that I’m convinced enough by to work on the problem, but not convinced enough to say this is definitely going to happen. And if there was a way to patch some of the holes in those arguments, then I probably would be more convinced as well.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Can I ask you a little bit more about evidence for or against AGI being a certain distance away? You mentioned that as evidence that would change your mind. What sort of evidence do you have in mind?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; Sure, so I guess a lot of the short timelines scenarios basically are coming from current ML techniques scaling to AGI, with just a bit more compute. So, watching for if those milestones are being achieved at the rate I was expecting, or slower.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;This is a little bit hard to crystallize, but I would say right now it seems like the rate of progress is slowing down compared to something like 2012, 2013. And I think a lot of the more interesting progress has come, I guess, from going to different domains. So we’ve seen maybe a little bit more progress happening in deep RL compared to supervised deep learning. And the optimistic thing is to say, well, that’s because we’ve solved supervised learning, but we haven’t really. We’ve got superhuman performance on ImageNet, but not on real images that you just take on your mobile phone. And it’s still very sample inefficient, we can’t do few-shot learning well. Sometimes it seems like there’s a lack of interest on the part of the research community in solving some of these problems. I think it’s partly because no one has a solid angle of attack on solving these problems.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Similarly, while some of the recent progress in deep RL has been very exciting, it seems to have some limits. For example, AlphaStar and OpenAI Five both involved scaling up self-play and population-based training. These were hugely computationally expensive, and that was where a lot of the scaling was coming from. So while there have been algorithmic improvements, I don’t see how you get this working in much more complicated environments without either huge additional compute or some major insights. These are things that are pushing me towards thinking deep learning will not continue to scale, and therefore very short timelines are unlikely.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Something that would update me towards shorter timelines would be if something that I thought was impossible turned out to be very easy. So OpenAI Five did update me positively, because I just didn’t think PPO was going to work well in Dota and it turns out that it does if you have enough compute. I don’t think it updated me that strongly towards short timelines, because it did need a lot of compute, and if you scale it to a more complex game you’re going to have exponential scaling. But it did make me think, well, maybe there isn’t a deep insight required, maybe this is going to be much more about finding more computationally efficient algorithms rather than lots of novel insights.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I guess there are also economic factors– I mention them mostly because I often see people neglecting them. One thing that makes me bullish on short timelines is that there are some very well-resourced companies whose mission is to build AGI. OpenAI just raised a billion, DeepMind is spending considerable resources. As long as this continues, it’s going to be a real accelerator. But that could go away: if AI doesn’t start making people money, I expect another AI winter.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; One thing we’re asking people, and again I think you’ve actually already given us a pretty good sense of this, is just a relative weighting of different considerations. And as I say that, you actually have already been tagging this. But just to briefly review what I’ve scrawled down, a lot of different considerations in your relative optimism are: cases for AI as an x-risk being not as watertight as you’d like them; not being sold on arguments for failure modes being the default and really hard; the idea that these problems might become easier to solve the closer we get to AGI, when we have more powerful techniques; and then the general hope that people will try to solve them as we get closer to AI. Yeah, I think those were at least some of the main considerations I got. How strong relatively are those considerations in your reasoning?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; I’m going to quote numbers that may not add up to 100, so we’ll have to normalize them at the end. I think the skepticism surrounding AI x-risk arguments is probably the strongest consideration, so I would put maybe 40% of my weight on that. This is because the outside view is quite strong to me, so if you talk about this very big problem that there’s not much concrete evidence for, then I’m going to be reasonably optimistic that actually we’re wrong and there isn’t a big problem.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;The second most important thing to me is the AI research community solving this naturally. We’re already seeing signs of a set of people beginning to work on related problems, and I see this continuing. So I’m putting 20% of my weight on that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then, the hard version of AI safety not seeming very likely to me, I think that’s 10% of the weight. This seems reasonably important if I buy into the AI safety argument in general, because that makes a big difference in terms of how tractable these problems are. What were the other considerations you listed?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Two of them might be so related that you already covered them, but I had distinguished between the problems getting easier the closer we get, and people working more on them the closer we get.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; Yeah, that makes sense. I think I don’t put that much weight on the problems getting easier. Or I don’t directly put weight on it, maybe it’s just rolled into my skepticism surrounding AI safety arguments, because I’m going to naturally find an argument a bit uncompelling if you say ‘we don’t know how to properly model human preferences’. I’m going to say, ‘Well, we don’t know how to properly do lots of things humans can do right now’. So everything needs to be relative to our capabilities. Whereas I find arguments of the form ‘we can solve problems that humans can’t solve, but only when we know how to specify what those problems are’ more compelling; that’s talking about a relative strength between the ability to optimize vs. the ability to specify objectives. Obviously that’s not the only AI safety problem, but it’s a problem.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;So yeah, I think I’m putting a lot of the weight on people paying more attention to these problems over time, so that’s probably actually 15 – 20% of my weight. And then I’ll put 5% on the problems getting easier and then some residual probability mass on things I haven’t thought about or haven’t mentioned in this conversation.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Is there anything you wish we had asked that you would like to talk about?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; I guess, I don’t know if this is really useful, but I do wish I had a better sense of what other people in the safety community and outside of it actually thought and why they were working on it, so I really appreciate you guys doing these interviews because it’s useful to me as well. I am generally a bit concerned about lots of people coming to lots of different conclusions regarding how pessimistic we should be, regarding timelines, regarding the right research agenda. &amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think disagreement can be healthy because it’s good to explore different areas. The ideal thing would be for us to all converge to some common probability distribution and we decide we’re going to work on different areas. But it’s very hard psychologically to do this, to say, ‘okay, I’m going to be the person working on this area that I think isn’t very promising because at the margin it’s good’– people don’t work like that. It’s better if people think, ‘oh, I am working on the best thing, under my beliefs’. So having some diversity of beliefs is good. But it bothers me that I don’t know why people have come to different conclusions to me. If I understood why they disagree, I’d be happier at least.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I’m trying to think if there’s anything else that’s relevant… yeah, so I guess another thing, and this is just a question for you guys to maybe think about, is that I’m still unsure about how valuable field-building is, and in particular, to what extent AI safety researchers should be working on this. It seems like a lot of the reasons why I was optimistic assume that the AI research community is going to solve some of these problems naturally. A natural follow-up to that is to ask whether we should be doing something to encourage this to happen, like writing more position papers, or just training up more grad students. Should we be trying to actively push for this rather than just relying on people to organically develop an interest in this research area? And I don’t know whether you can actually change research directions in this way, because it’s very far outside my area of expertise, but I’d love someone to study it.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
  

&lt;/pre&gt;</content>
        <summary>&lt;pre&gt;
@@ -1 +1,462 @@
+ ====== Conversation with Adam Gleave ======
+ 
+ // Published 23 December, 2019 //
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;AI Impacts talked to AI safety researcher Adam Gleave about his views on AI risk. With his permission, we have transcribed this interview.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ 
+ ===== Participants =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;a href=&amp;quot;https://gleave.me/&amp;quot;&amp;gt;Adam Gleave&amp;lt;/a&amp;gt; – PhD student at the Center for Human-Compatible AI, UC Berkeley
+                 &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Asya Bergal – AI Impacts&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;a href=&amp;quot;http://robertlong.online/&amp;quot;&amp;gt;Robert Long&amp;lt;/a&amp;gt; – AI Impacts
+                 &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ===== Summary =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;We spoke with Adam Gleave on August 27, 2019. Here is a brief summary of that conversation:&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Gleave gives a number of reasons why it’s worth working on AI safety:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;It seems like the AI research community currently isn’t paying enough attention to building safe, reliable systems.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;There are several unsolved technical problems that could plausibly occur in AI systems without much advance notice.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;A few additional people working on safety may be extremely high leverage, especially if they can push the rest of the AI research community to pay more attention to important problems.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Gleave thinks there’s a ~10% chance that AI safety is very hard in the way that MIRI would argue, a ~20-30% chance that AI safety will almost certainly be solved by default, and a remaining ~60-70% chance that what we’re working on actually has some impact.
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Here are the reasons for Gleave’s beliefs, weighted by how much they factor into his holistic viewpoint:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;40%: The traditional arguments for risks from AI are unconvincing:
+                           &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Traditional arguments often make an unexplained leap from having superintelligent AIs to superintelligent AIs being catastrophically bad.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;It’s unlikely that AI systems not designed from mathematical principles are going to inherently be unsafe.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;They’re long chains of heuristic reasoning, with little empirical validation.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Outside view: most fears about technology have been misplaced.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;20%: The AI research community will solve the AI safety problem naturally.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;20%: AI researchers will be more interested in AI safety when the problems are nearer.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;10%: The hard, MIRI version of the AI safety problem is not very compelling.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;10%: AI safety problems that seem hard now will be easier to solve once we have more sophisticated ML.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Fast takeoff defined as “GDP will double in 6 months before it doubles in 24 months” is plausible, though Gleave still leans towards slow takeoff.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Gleave thinks discontinuous progress in AI is extremely unlikely:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;There is unlikely to be a sudden important insight dropped into place, since AI has empirically progressed more by accumulation of lots of bags and tricks and compute.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;There isn’t going to be a sudden influx of compute in the near future, since well-funded organizations are currently already spending billions of dollars to optimize it.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;If we train impressive systems, we will likely train other systems beforehand that are almost as capable.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Given discontinuous progress, the most likely story is that we combine many narrow AI systems in a way where the integrated whole is much better than half of them.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Gleave guesses a ~10-20% chance that AGI technology will only be a small difference away from current techniques, and a ~50% chance that AGI technology will be easily comprehensible to current AI researchers:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;There are fairly serious roadblocks in current techniques right now, e.g. memory, transfer learning, Sim2Real, sample inefficiency.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Deep learning is slowing down compared to 2012 – 2013:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Much of the new progress is going to different domains, e.g. deep RL instead of supervised deep learning.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Computationally expensive algorithms will likely hit limits without new insights.
+                           &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Though it seems possible that in fact progress will come from more computationally efficient algorithms.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Outside view, we’ve had lots of different techniques for AI over time, so it would be surprising is the current one is the right one for AGI.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Pushing more towards current techniques getting to AGI, from an economic point of view, there is a lot of money going into companies whose current mission is to build AGI.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Conditional on advanced AI technology being created, Gleave gives a 60-70% chance that it will pose a significant risk of harm without additional safety efforts.
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Gleave thinks that best case, we drive it down to 20 – 10%, median case, we drive it down to 40 – 30%. A lot of his uncertainty comes from how difficult the problem is.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Gleave thinks he could see evidence that could push him in either direction in terms of how likely AI is to be safe:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Evidence that would cause Gleave to think AI is less likely to be safe:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Evidence that thorny but speculative technical problems, like inner optimizers, exist.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Seeing more arms race dynamics, e.g. between U.S. and China.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Seeing major catastrophes involving AI, though they would also cause people to pay more attention to risks from AI.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hearing more solid arguments for AI risk.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Evidence that would cause Gleave to think AI is more likely to be safe:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Seeing AI researchers spontaneously focus on relevant problems would make Gleave think that AI is less risky.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Getting evidence that AGI was going to take longer to develop.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Gleave is concerned that he doesn’t understand why members of the safety community come to widely different conclusions when it comes to AI safety.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Gleave thinks a potentially important question is the extent to which we can successfully influence field building within AI safety.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;This transcript has been lightly edited for concision and clarity.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ===== Transcript =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; We have a bunch of questions, sort of around the issue of– basically, we’ve been talking to people who are more optimistic than a lot of people in the community about AI. The proposition we’ve been asking people to explain their reasoning about is, ‘Is it valuable for people to be expending significant effort doing work that purports to reduce the risk from advanced artificial intelligence?’ To start with, I’d be curious for you to give a brief summary of what your take on that question is, and what your reasoning is.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; Yeah, sure. The short answer is, yes, I think it’s worth people spending a lot of effort on this, at the margins, it’s still in absolute terms quite a small number. Obviously it depends a bit whether you’re talking about diverting resources of people who are already really dedicated to having a high impact, versus having your median AI researchers work more on safety related things. Maybe you think the median AI researcher isn’t trying to optimize for impact anyway, so the opportunity cost might be lower. The case I see from reducing the risk of AI is maybe weaker than some people in the community, but I think it’s still overall very strong.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;The goal of AI as a field is still to build artificial general intelligence, or human-level AI. If we’re successful in that, it does seem like it’s going to be an extremely transformative technology. There doesn’t seem to be any roadblock that would prevent us from eventually reaching that goal. The path to that, the timeline is quite murky, but that alone seems like a pretty strong signal for ‘oh, there should be some people looking at this and being aware of what’s going on.’&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then, if I look at the state of the art in AI, there’s a number of somewhat worrying trends. We seem to be quite good at getting very powerful superhuman systems in narrow domains when we can specify the objective that we want quite precisely. So AlphaStar, AlphaGo, OpenAI Five, these systems are very much lacking in robustness, so you have some quite surprising failure modes. Mostly we see adversarial examples in image classifiers, but some of these RL systems also have somewhat surprising failure modes. This seems to me like an area the AI research community isn’t paying much attention to, and I feel like it’s almost gotten obsessed with producing flashy results rather than necessarily doing good rigorous science and engineering. That seems like quite a worrying trend if you extrapolate it out, because some other engineering disciplines are much more focused on building reliable systems, so I more trust them to get that right by default.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Even in something like aeronautical engineering where safety standards are very high, there are still accidents in initial systems. But because we don’t even have that focus, it doesn’t seem like the AI research community is going to put that much focus on building safe, reliable systems until they’re facing really strong external or commercial pressures to do so. Autonomous vehicles do have a reasonably good safety track record, but that’s somewhere where it’s very obvious what the risks are. So that’s kinda the sociological argument, I guess, for why I don’t think that the AI research community is going to solve all of the safety problems as far ahead of time as I would like.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then, there’s also a lot of very thorny technical problems that do seem like they’re going to need to be solved at some point before AGI. How do we get some information about what humans actually want? I’m a bit hesitant to use this phrase ‘value learning’ because you could plausibly do this just by imitation learning as well. But there needs to be some way of getting information from humans into the system, you can’t just derive it from first principles, we still don’t have a good way of doing that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;There’s lots of more speculative problems, e.g. inner optimizers. I’m not sure if these problems are necessarily going to be real or cause issues, but it’s not something that we– we’ve not ruled it in or out. So there’s enough plausible technical problems that could occur and we’re not necessarily going to get that much advance notice of, that it seems worrying to just charge ahead without looking into this.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then to caveat all this, I do think the AI community does care about producing useful technology. We’ve already seen some backlashes against autonomous weapons. People do want to do good science. And when the issues are obvious, there’s going to be a huge amount of focus on them. And it also seems like some of the problems might not actually be that hard to solve. So I am reasonably optimistic that in the default case of there’s no safety community really, things will still work out okay, but it also seems like the risk is large enough that just having a few people working on it can be extremely high leverage, especially if you can push the rest of the AI research community to pay a bit more attention to these problems.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Does that answer that question?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, it totally does.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Could you say a little bit more about why you think you might be more optimistic than other people in the safety community?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; Yeah, I guess one big reason is that I’m still not fully convinced by a lot of the arguments for risks from AI. I think they are compelling heuristic arguments, meaning it’s worth me working on this, but it’s not compelling enough for me to think ‘oh, this is definitely a watertight case’.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think the common area where I just don’t really follow the arguments is when you say, ‘oh, you have this superintelligent AI’. Let’s suppose we get to that, that’s already kind of a big leap of faith. And then if it’s not aligned, humans will die. It seems like there’s just a bit of a jump here that no one’s really filled in.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;In particular it seems like sure, if you have something sufficiently capable, both in terms of intelligence and also access to other resources, it could destroy humanity. But it doesn’t just have to be smarter than an individual human, it has to be smarter than all of humanity potentially trying to work to combat this. And humanity will have a lot of inside knowledge about how this AI system works. And it’s also starting from a potentially weakened position in that it doesn’t already have legal protection, property ownership, all these other things.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I can certainly imagine there being scenarios unfolding where this is a problem, so maybe you actually give an AI system a lot of power, or it just becomes so, so much more capable than humans that it really is able to outsmart all of us, or it might just be quite easy to kill everyone. Maybe civilization is just much more fragile than we think. Maybe there are some quite easy bio ex-risks or nanotech that you could reason about from first principles. If it turned out that a malevolent but very smart human could kill all of humanity, then I would be more worried about the AI problem, but then maybe we should also be working on the human x-risk problem. So that’s one area that I’m a bit skeptical about, though maybe flushing that argument out more is bad for info-hazard reasons. &amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then the other thing is I guess I feel like there’s a distribution of how difficult the AI safety problem is going to be. So there’s one world where anything that is not designed from mathematical principles is just going to be unsafe– there are going to be failure modes we haven’t considered, these failure modes are only going to arise when the system is smart enough to hurt you, and the system is going to be actively trying to deceive you. So this is I think, maybe a bit of a caricature, but I think this is roughly MIRI’s viewpoint. I think this is a productive viewpoint to inhabit when you’re trying to identify problems, but I think it’s probably not the world we actually live in. If you can solve that version, great, but it seems like a lot of the failure modes that are going to occur with advanced AI systems you’re going to see signs of earlier, especially if you’re actually looking out for them.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I don’t see much reason for AI progress to be discontinuous in particular. So there’s a lot of empirical records you could bring to bear on this, and it also seems like a lot of commercially valuable interesting research applications are going to require solving some of these problems. You’ve already seen this with value learning, that people are beginning to realize that there’s a limitation to what we can just write a reward function down for, and there’s been a lot more focus on imitation learning recently. Obviously people are solving much narrower versions of what the safety community cares about, but as AI progresses, they’re going to work on broader and broader versions of these problems.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I guess the general skepticism I have with the arguments, is, a lot of them take the form of ‘oh, there’s this problem that we need to solve and we have no idea how to solve it,’ but forget that we only need to solve that problem once we have all this other treasure trove of AI techniques that we can bring to bear on the problem. It seems plausible that this very strong unsupervised learning is going to do a lot of heavy lifting for us, maybe it’s going to give us a human ontology, it’s going to give us quite a good inductive bias for learning values, and so on. So there’s just a lot of things that might seem a lot stickier than they actually are in practice.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then, I also have optimism that yes, the AI research community is going to try to solve these problems. It’s not like people are just completely disinterested in whether their systems cause harm, it’s just that right now, it seems to a lot of people very premature to work on this. There’s a sense of ‘how much good can we do now, where nearer to the time there’s going to just be naturally 100s of times more people working on the problem?’. I think there is still value you can do now, in laying the foundations of the field, but that maybe gives me a bit of a different perspective in terms of thinking, ‘What can we do that’s going to be useful to people in the future, who are going to be aware of this problem?’ versus ‘How can I solve all the problems now, and build a separate AI safety community?’.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I guess there’s also the outside view of just, people have been worried about a lot of new technology in the past, and most of the time it’s worked out fine. I’m not that compelled by this. I think there are real reasons to think that AI is going to be quite different. I guess there’s also just the outside view of, if you don’t know how hard a problem is, you should put a probability distribution over it and have quite a lot of uncertainty, and right now we don’t have that much information about how hard the AI safety problem is. Some problems seem to be pretty tractable, some problems seem to be intractable, but we don’t know if they actually need to be solved or not. &amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;So, decent chance– I think I put a reasonable probability, like 10% probability, on the hard-mode MIRI version of the world being true. In which case, I think there’s probably nothing we can do. And I also put a significant probability, 20-30%, on AI safety basically not needing to be solved, we’ll just solve it by default unless we’re completely completely careless. And then there’s this big chunk of probability mass in the middle where maybe what we’re working on will actually have an impact, and obviously it’s hard to know whether at the margin, you’re going to be changing the outcome.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I’m curious– I think a lot of people we’ve talked to, some people have said somewhat similar things to what you said. And I think there’s two classic axes on which peoples’ opinions differ. One is this slow takeoff, fast takeoff proposition. The other is whether they think something that looks like current methods is likely to lead to AGI. I’m curious on your take on both those questions.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; Yeah, sure. So, for slow vs. fast takeoff, I feel like I need to define the terms for people who use them in slightly different ways. I don’t expect there to be a discontinuity, in the sense of, we just see this sudden jump. But I wouldn’t be that surprised if there was exponential growth and quite a high growth rate. I think Paul defines fast takeoff as, GDP will double in six months before it doubles in 24 months. I’m probably mangling that but it was something like that. I think that scenario of fast takeoff seems plausible to me. I probably am still leaning slightly more towards the slow takeoff scenario, but it seems like fast takeoff will be plausible in terms of very fast exponential growth.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think a lot of the case for the discontinuous progress argument falls on there being sudden insight that dropped into place, and it doesn’t seem to me like that’s what’s happening in AI, it’s more just a cumulation of lots of bags of tricks and a lot of compute. I also don’t see there being bags of compute falling out of the sky. Maybe if there was another AI winter, leading to a hardware overhang, then you might see sudden progress when AI gets funding again. But right now a lot of very well-funded organizations are spending billions of dollars on compute, including developing new application-specific integrated circuits for AI, so we’re going to be very close to the physical limits there anyway. &amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Probably the strongest case I see for discontinuities are the discontinuities you see when you’re training systems. But I just don’t think that’s going to be strong enough, because you’ll train other systems before that’ll be almost as capable. I guess we do see sometimes cases where one technique lets you solve a new class of problems.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Maybe you could see something where you get increasingly capable narrow systems, and there’s not a discontinuity overall, you already had very strong narrow AI. But eventually you just have so many narrow AI systems that they can basically do everything, and maybe you get to a stage where the integrated whole of those is much stronger than if you just had half of them, let’s say. I guess this is sort of the comprehensive AI services model. But again that seems a bit unlikely to me, because most of the time you can probably outsource some other chunks to humans if you really needed to. But yeah, I think it’s a bit more plausible than some of the other stories.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then, in terms of whether I think current techniques are likely to get us to human-level AI– I guess I put significant probability mass on that depending on how narrowly you define it. One fuzzy definition is that a PhD thesis describing AGI being something that a typical AI researcher today could read and understand without too much work. Under this definition I’d assign 40 – 50%. And that could still include introducing quite a lot of new techniques, right, but just– I mean plausibly I think something based on deep learning, deep RL, you could describe to someone in the 1970s in a PhD thesis and they’d still understand it. But it’s just showing you, it’s not that much real theory that was developed, it was applying some pretty simple algorithms and a lot of compute in the right way. Which implies no huge new theoretical insights.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But if we’re defining it more narrowly, only allowing small variants of current techniques, I think that’s much less likely to lead to AGI: around 10-20%. I think that case is almost synonymous with the argument that you just need more compute, because it seems like there are so many things right now that we really cannot do: we still don’t have great solutions to memory, we still can’t really do transfer learning, Sim2Real just barely works sometimes. We’re still extremely sample inefficient. It just feels like all of those problems are going to require quite a lot of research in themselves. I can’t see there being one simple trick that would solve all of them. But maybe, current algorithms if you gave them 10000x compute would do a lot better on these, that is somewhat plausible.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And yeah, I do put fairly significant probability, 50%, on it being something that is kind of radically different. And I guess there’s a couple of reasons for that. One is, just trying to extrapolate progress forward, it does seem like there are some fairly serious roadblocks. Deep learning is slowing down in terms of, it’s not hitting as many big achievements as it was in the past. And also just AI has had many kinds of fads over time, right. We’ve had good old-fashioned AI, symbolic AI, we had expert systems, we had Bayesianism. It would be sort of surprising that the current method is the right one.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I don’t find that people are focusing on these techniques is necessarily particularly strong evidence that these systems are going to lead us to AGI. First, many researchers are not focused on AGI, and you can probably get useful applications out of current techniques. Second, AI research seems like it can be quite fashion driven. Obviously, there are organizations whose mission is to build AGI who are working within the current paradigm. And I think it is probably still the best bet, of the things that we know, but I still think it’s a bet that’s reasonably unlikely to pay off.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Does that answer your question?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Just on that last bit, you said– I might just be mixing up the different definitions you had and your different credences in those– but in the end there you said that’s a bet that you think is reasonably unlikely to pay off, but you’d also said 50% that it’s something radically different, so how– I think I was just confusing which ones you were on.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; Right. So, I guess these definitions are all quite fuzzy, but I was saying 10-20% that something that is only a small difference away from current techniques would build AGI, and 50% that AGI was going to be comprehensible to us. I guess the distinction I’m trying to draw is the narrow one, which I give 10-20% credence, is we basically already have the right algorithms and we just need a few tricks and more compute. And the other more expansive definition, which I give 40-50% credence to, is allows for completely different algorithms, but excludes any deep theoretical insight akin to a whole new field of mathematics. So we might not be using back propagation any longer, we might not be using gradient descent, but it’ll be something similar — like the difference between gradient descent and evolutionary algorithms.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;There’s a separate question of, if you’re trying to build AGI right now, where should you be investing your resources? Should you be trying to come up with a completely new novel theory, or should you be trying to scale up current techniques? And I think it’s plausible that you should just be trying to scale up techniques and figure out if we can push them forward, because trying to come up with a completely new way of doing AI is also very challenging, right. It’s not really a sort of insight you can force.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; You kind of covered this earlier– and maybe you even said the exact number, so I’m sorry if this is a repeat. But one thing we’ve been asking people is the credence that without additional intervention– so imagining a world where EA wasn’t pushing for AI safety, and there wasn’t this separate AI safety movement outside of the AI research community, imagining that world. In that world, what is the chance that advanced artificial intelligence poses a significant risk of harm?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; The chance it does cause a significant risk of harm?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, that’s right.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; Conditional on advanced artificial intelligence being created, I think 60, 70%. I have a much harder time giving an unconditional probability, because there are other things that could cause humanity to stop developing AI. Is a conditional probability good enough, or do you want me to give an unconditional one?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; No, I think the conditional one is what we’re looking for.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Do you have a hunch about how much we can expect dedicated efforts to drive down that probability? That is, the EA-focused AI safety efforts.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; I think the best case is, you drive it down to 20 – 10%. I’m kind of picturing a lot of this uncertainty coming from just, how hard is the problem technically? And if we do inhabit this really hard version where you have to solve all of the problems perfectly and you have to have a formally verified AI system, I just don’t think we’re going to do that in time. You’d have to solve a very hard coordination problem to stop people developing AI without those safety checks. It seems like a very expensive process, developing safe AI.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I guess the median case, where the AI safety community just sort of grows at its current pace, I think maybe that gets it down to 40 – 30%? But I have a lot of uncertainty in these numbers.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Another question, going back to original statements for why you believe this– do you think there’s plausible concrete evidence that we could get or are likely to get that would change your views on this one direction or the other?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; Yeah, so, seeing evidence of some of the more thorny but currently quite speculative technical problems, like inner optimizers, would make me update towards, ‘oh, this is just a really hard technical problem, and unless we really work hard on this, the default outcome is definitely going to be bad’. Right now, no one’s demonstrated an inner optimizer existing, it’s just a sort of theoretical problem. This is a bit of an unfair thing to ask in some sense, in that the whole reason that people are worried about this is that it’s only a problem with very advanced AI systems. Maybe I’m asking for evidence that can’t be provided. But relative to many other people, I am unconvinced by heuristic arguments appealing just to mathematical intuitions. I’m much more convinced either by very solid theoretical arguments that are proof-based, or by empirical evidence.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Another thing that would update me in a positive direction, as in AI seems less risky, would be seeing more AI researchers spontaneously focus on some relevant problems. There’s already, I guess this is a bit of a tangent, but I think maybe– people tend to conceive as the AI safety community as people who would identify as AI safety researchers. But I think the vast majority of AI safety research work is happening by people who have never heard of AI safety, but they have been working on related problems. This is useful to me all of the time. I think where we could plausibly end up having a lot more of this work happening without AI safety ever really becoming a thing is people realizing ‘oh, I want my robot to do this thing and I have a really hard time making it do that, let’s come up with a new imitation learning technique’.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But yeah, other things that could update me positively… I guess, AI seeming like a harder problem, as in, it seems like AI, general artificial intelligence is further away, that would probably update me in a positive direction. It’s not obvious but I think generally all else being equal, longer timelines is going to generally have more time to diagnose problems. And also it seems like the current set of AI techniques — deep learning and very data-driven approaches — are particularly difficult to analyze or prove anything about, so some other paradigm is probably going to be better, if possible.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Other things that would make me scared would be more arms race dynamics. It’s been very sad to me what we’re seeing with China – U.S. arms race dynamics around AI, especially since it doesn’t even seem like there is much direct competition, but that meme is still being pushed for political reasons. &amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Any actual major catastrophes involving AI would make me think it’s more risky, although it would also make people pay more attention to AI risk, so I guess it’s not obvious what direction it would push overall. But it certainly would make me think that there’s a bit more technical risk.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I’m trying to think if there’s anything else that would make more pessimistic. I guess just more solid arguments for AI safety, because a lot of my skepticism is coming from there’s just this very unlikely sounding set of ideas, and there are just heuristic arguments that I’m convinced enough by to work on the problem, but not convinced by enough to say, this is definitely going to happen. And if there was a way to patch some of the holes in those arguments, then I probably would be more convinced as well.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Can I ask you a little bit more about evidence for or against AGI being a certain distance away? You mentioned that as evidence that would change your mind. What sort of evidence do you have in mind?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; Sure, so I guess a lot of the short timelines scenarios basically are coming from current ML techniques scaling to AGI, with just a bit more compute. So, watching for if those milestones are being achieved at the rate I was expecting, or slower.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;This is a little bit hard to crystallize, but I would say right now it seems like the rate of progress is slowing down compared to something like 2012, 2013. And interestingly, I think a lot of the more interesting progress has come from, I guess, from going to different domains. So we’ve seen maybe a little bit more progress happening in deep RL compared to supervised deep learning. And the optimistic thing is to say, well, that’s because we’ve solved supervised learning, but we haven’t really. We’ve got superhuman performance on ImageNet, but not on real images that you just take on your mobile phone. And it’s still very sample inefficient, we can’t do few-shot learning well. Sometimes it seems like there’s a lack of interest on the part of the research community in solving some of these problems. I think it’s partly because no one has a solid angle of attack on solving these problems.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Similarly, while some of the recent progress in deep RL has been very exciting, it seems to have some limits. For example, AlphaStar and OpenAI Five both involved scaling up self-play and population based training. These were hugely computationally expensive, and that was where a lot of the scaling was coming from. So while there have been algorithmic improvements, I don’t see how you get this working in much more complicated environments without either huge additional compute or some major insights. These are things that are pushing me towards thinking deep learning will not continue to scale, and therefore very short timelines are unlikely.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Something that would update me towards shorter timelines would be if  something that I thought was impossible turns out to be very easy. So OpenAI Five did update me positively, because I just didn’t think PPO was going to work well in Dota and it turns out that it does if you have enough compute. I don’t think it updated me that strongly towards short timelines, because it did need a lot of compute, and if you scale it to a more complex game you’re going to have exponential scaling. But it did make me think, well, maybe there isn’t a deep insight required, maybe this is going to be much more about finding more computationally efficient algorithms rather than lots of novel insights.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I guess there’s also sort of economic factors– I mention mostly because I often see people neglecting them. One thing that makes me bullish on short timelines is that, there’s some very well-resourced companies whose mission is to build AGI. OpenAI just raised a billion, DeepMind is spending considerable resources. As long as this continues, it’s going to be a real accelerator. But that could go away: if AI doesn’t start making people money, I expect another AI winter.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; One thing we’re asking people, and again I think you’ve actually already given us a pretty good sense of this, is just a relative weighting of different considerations. And as I say that, you actually have already been tagging this. But just to half review, from what I’ve scrawled down. A lot of different considerations in your relative optimism are: cases for AI as an x-risk being not as watertight as you’d like them, arguments for failure modes being the default and really hard, not being sold on those arguments, ideas that these problems might become easier to solve the closer we get to AGI when we have more powerful techniques, and then the general hope that people will try to solve them as we get closer to AI. Yeah, I think those were at least some of the main considerations I got. How strong relatively are those considerations in your reasoning?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; I’m going to quote numbers that may not add up to 100, so we’ll have to normalize it at the end. I think the skepticism surrounding AI x-risk arguments is probably the strongest consideration, so I would put maybe 40% of my weight on that. This is because the outside view is quite strong to me, so if you talk about this very big problem that there’s not much concrete evidence for, then I’m going to be reasonably optimistic that actually we’re wrong and there isn’t a big problem.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;The second most important thing to me is the AI research community solving this naturally. We’re already seeing signs of a set of people beginning to work on related problems, and I see this continuing. So I’m putting 20% of my weight on that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then, the hard version of AI safety not seeming very likely to me, I think that’s 10% of the weight. This seems reasonably important if I buy into the AI safety argument in general, because that makes a big difference in terms of how tractable these problems are. What were the other considerations you listed?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Two of them might be so related that you already covered them, but I had distinguished between the problems getting easier the closer we get, and people working more on them the closer we get.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; Yeah, that makes sense. I think I don’t put that much weight on the problems getting easier. Or I don’t directly put weight on it, maybe it’s just rolled into my skepticism surrounding AI safety arguments, because I’m going to naturally find an argument a bit uncompelling if you say ‘we don’t know how to properly model human preferences’. I’m going to say, ‘Well, we don’t know how to properly do lots of things humans can do right now’. So everything needs to be relative to our capabilities. Whereas I find arguments of the form ‘we can solve problems that humans can’t solve, but only when we know how to specify what those problems are’, that seems more compelling, that’s talking about a relative strength between ability to optimize vs. ability to specify objectives. Obviously that’s not the only AI safety problem, but it’s a problem.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;So yeah, I think I’m putting a lot of the weight on people paying more attention to these problems over time, so that’s probably actually 15 – 20% of my weight. And then I’ll put 5% on the problems getting easier and then some residual probability mass on things I haven’t thought about or haven’t mentioned in this conversation.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Is there anything you wish we had asked that you would like to talk about?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Adam Gleave:&amp;lt;/strong&amp;gt; I guess, I don’t know if this is really useful, but I do wish I had a better sense of what other people in the safety community and outside of it actually thought and why they were working on it, so I really appreciate you guys doing these interviews because it’s useful to me as well. I am generally a bit concerned about lots of people coming to lots of different conclusions regarding how pessimistic we should be, regarding timelines, regarding the right research agenda. &amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think disagreement can be healthy because it’s good to explore different areas. The ideal thing would be for us to all converge to some common probability distribution and we decide we’re going to work on different areas. But it’s very hard psychologically to do this, to say, ‘okay, I’m going to be the person working on this area that I think isn’t very promising because at the margin it’s good’– people don’t work like that. It’s better if people think, ‘oh, I am working on the best thing, under my beliefs’. So having some diversity of beliefs is good. But it bothers me that I don’t know why people have come to different conclusions to me. If I understood why they disagree, I’d be happier at least.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I’m trying to think if there’s anything else that’s relevant… yeah, so I guess another, this is merely just a question for you guys to maybe think about, is, I’m still unsure about how valuable field-building should be. And in particular, to what extent AI safety researchers should be working on this. It seems like a lot of reasons why I was optimistic assume the the AI research community is going to solve some of these problems naturally. A natural follow up to that is to ask whether we should be doing something to encourage this to happen, like writing more position papers, or just training up more grad students? Should we be trying to actively push for this rather than just relying on people to organically develop an interest in this research area? And I don’t know whether you can actually change research directions in this way, because it’s very far outside my area of expertise, but I’d love someone to study it.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
  

&lt;/pre&gt;</summary>
    </entry>
    <entry>
        <title>Conversation with Ernie Davis</title>
        <link rel="alternate" type="text/html" href="https://wiki.aiimpacts.org/conversation_notes/conversation_with_ernie_davis?rev=1663745861&amp;do=diff"/>
        <published>2022-09-21T07:37:41+00:00</published>
        <updated>2022-09-21T07:37:41+00:00</updated>
        <id>https://wiki.aiimpacts.org/conversation_notes/conversation_with_ernie_davis?rev=1663745861&amp;do=diff</id>
        <author>
            <name>Anonymous</name>
            <email>anonymous@undisclosed.example.com</email>
        </author>
        <category  term="conversation_notes" />
        <content>&lt;pre&gt;
@@ -1 +1,343 @@
+ ====== Conversation with Ernie Davis ======
+ 
+ // Published 23 August, 2019; last updated 27 May, 2020 //
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;AI Impacts spoke with computer scientist Ernie Davis about his views of AI risk. With his permission, we have transcribed this interview.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ 
+ ==== Participants ====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;a href=&amp;quot;https://cs.nyu.edu/davise/&amp;quot; rel=&amp;quot;noopener noreferrer&amp;quot; target=&amp;quot;_blank&amp;quot;&amp;gt;&amp;lt;strong&amp;gt;Ernest Davis&amp;lt;/strong&amp;gt;&amp;lt;/a&amp;gt; – professor of computer science at the Courant Institute of Mathematical Science, New York University
+                 &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;a href=&amp;quot;http://robertlong.online/&amp;quot; rel=&amp;quot;noopener noreferrer&amp;quot; target=&amp;quot;_blank&amp;quot;&amp;gt;&amp;lt;strong&amp;gt;Robert Long&amp;lt;/strong&amp;gt;&amp;lt;/a&amp;gt; – AI Impacts
+                 &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ==== Summary ====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;We spoke over the phone with Ernie Davis on August 9, 2019. Some of the topics we covered were:&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;What Davis considers to be the most urgent risks from AI&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Davis’s disagreements with Nick Bostrom, Eliezer Yudkowsky, and Stuart Russell&amp;lt;/span&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;The relationship between greater intelligence and greater power&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;How difficult it is to design a system that can be turned off&amp;lt;/span&amp;gt;&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;How difficult it would be to encode safe ethical principles in an AI system&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Davis’s evaluation of the &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;likelihood that advanced, autonomous AI will be a major problem within the next two hundred years; and what evidence would change his mind&amp;lt;/span&amp;gt;&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Challenges and progress towards human-level AI&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;This transcript has been lightly edited for concision and clarity.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ==== Transcript ====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;You’re one of the few people, I think, who is an expert in AI, and is not necessarily embedded in the AI Safety community, but you have engaged substantially with arguments from that community. I’m thinking especially of your&amp;lt;/span&amp;gt; &amp;lt;a href=&amp;quot;https://cs.nyu.edu/davise/papers/Bostrom.pdf&amp;quot; rel=&amp;quot;noopener noreferrer&amp;quot; target=&amp;quot;_blank&amp;quot;&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;review&amp;lt;/span&amp;gt;&amp;lt;/a&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;of&amp;lt;/span&amp;gt; &amp;lt;i&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Superintelligence&amp;lt;/span&amp;gt;&amp;lt;/i&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;.&amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-1-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#easy-footnote-bottom-1-1955&amp;quot; title=&amp;#039;Davis, Ernest. &amp;amp;amp;#8220;&amp;amp;lt;a href=&amp;quot;https://cs.nyu.edu/davise/papers/Bostrom.pdf&amp;quot;&amp;amp;gt;Ethical guidelines for a superintelligence&amp;amp;lt;/a&amp;amp;gt;.&amp;amp;amp;#8221; Artificial Intelligence 220 (2015): 121-124.&amp;#039;&amp;gt;&amp;lt;sup&amp;gt;1&amp;lt;/sup&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt; &amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-2-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#easy-footnote-bottom-2-1955&amp;quot; title=&amp;#039;&amp;amp;lt;/span&amp;amp;gt;&amp;amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;amp;gt;Bostrom, Nick. &amp;amp;lt;a href=&amp;quot;https://www.amazon.com/Superintelligence-Dangers-Strategies-Nick-Bostrom/dp/1501227742&amp;quot;&amp;amp;gt;&amp;amp;lt;em&amp;amp;gt;Superintelligence: Paths, Dangers, Strategies&amp;amp;lt;/em&amp;amp;gt;&amp;amp;lt;/a&amp;amp;gt;. Oxford University Press (2014).&amp;amp;lt;/span&amp;amp;gt;&amp;amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;amp;gt;&amp;#039;&amp;gt;&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;I was hoping we could talk a little bit more about your views on AI safety work. There’s a particular proposition that we’re trying to get people’s opinions on. The question is: Is it valuable for people to be expending significant effort doing work that purports to reduce the risk from advanced artificial intelligence? I’ve read some of your work; I can guess some of your views. But I was wondering: what would you say is your answer to that question, whether this kind of work is valuable to do now?&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Well, there are a number of parts to the answer. In terms of short-term—and “short” being not very short—short-term risks from computer technology generally, this is very low priority. The risks from cyber crime, cyber terrorism, somebody taking hold of the insecurity of the internet of things and so on—that in particular is one of my bugaboos—are, I think, an awful lot more urgent. So, as far as urgency goes, I certainly don’t see that this is especially urgent work.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Now, some of the approaches that are being taken to long-term AI safety seem to me extremely far-fetched. The fears of people like Bostrom and Yudkowsky and, to a lesser extent, Stuart Russell seem to me misdirected, and the approaches they are proposing are also misdirected. I have a&amp;lt;/span&amp;gt; &amp;lt;a href=&amp;quot;https://www.amazon.com/Rebooting-AI-Building-Artificial-Intelligence-ebook/dp/B07MYLGQLB&amp;quot; rel=&amp;quot;noopener noreferrer&amp;quot; target=&amp;quot;_blank&amp;quot;&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;book&amp;lt;/span&amp;gt;&amp;lt;/a&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;with Gary Marcus which is coming out in September, and we have a chapter called ‘Trust’ which gives our opinions—which are pretty much convergent—at length. I can send you that chapter.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Yes, I’d certainly be interested in that.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;So, the kinds of things that Russell is proposing—Russell also has a&amp;lt;/span&amp;gt; &amp;lt;a href=&amp;quot;https://www.amazon.com/Human-Compatible-Artificial-Intelligence-Problem/dp/0525558616/ref=sr_1_2?keywords=Stuart+Russell&amp;amp;amp;qid=1565996574&amp;amp;amp;s=books&amp;amp;amp;sr=1-2&amp;quot; rel=&amp;quot;noopener noreferrer&amp;quot; target=&amp;quot;_blank&amp;quot;&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;book&amp;lt;/span&amp;gt;&amp;lt;/a&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;coming out in October, in which he is developing ideas that he’s already published about: the way to have safe AI is to have them be unsure about what the human goals are.&amp;lt;/span&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-3-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#easy-footnote-bottom-3-1955&amp;quot; title=&amp;#039;see, for example, Russell, Stuart. &amp;amp;amp;#8220;&amp;amp;lt;a href=&amp;quot;https://people.eecs.berkeley.edu/~russell/papers/russell-bbvabook17-pbai.pdf&amp;quot;&amp;amp;gt;Provably beneficial artificial intelligence&amp;amp;lt;/a&amp;amp;gt;.&amp;amp;amp;#8221; Exponential Life, The Next Step (2017).&amp;#039;&amp;gt;&amp;lt;sup&amp;gt;3&amp;lt;/sup&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt; And Yudkowsky develops similar ideas in his work, engages with them, and tries to measure their success. This all seems to me too clever by half. And I don’t think it’s addressing what the real problems are going to be.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;My feeling is that the problem of AIs doing the wrong thing is a very large one—you know, just by sheer inadvertence and incompetent design. And the solution there, more or less, is to design them well and build in safety features of the kinds that one has throughout engineering. Whenever one is doing an engineering project, one builds in—one designs for failure. And one has to do that with AI as well. The danger of AI being abused by bad human actors is a very serious danger. And that has to be addressed politically, like all problems involving bad human actors.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;And then there are directions in AI where I think it’s foolish to go. For instance it would be very foolish to build—it’s not currently technically feasible, but if it were, and it may at some point become technically feasible—to build robots that can reproduce themselves cheaply. And that’s foolish, but it’s foolish for exactly the same reason that you want to be careful about introducing new species. It’s why Australia got into trouble with the rabbits, namely: if you have a device that can reproduce itself and it has no predators, then it will reproduce itself and it gets to be a nuisance.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;And that’s almost separate. A device doesn’t have to be superintelligent to do that; in fact, superintelligence probably just makes that harder, because a superintelligent device is harder to build. A self-replicating device might be quite easy to build on the cheap. It won’t survive as well as a superintelligent one, but if it can reproduce itself fast enough that doesn’t matter. So that kind of thing, you want to avoid.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;There’s a question which we almost entirely avoided in our book, which people ask all the time, which is, at what point do machines become conscious. And my answer to that—I’m not necessarily speaking for Gary—is that you want to avoid building machines which you have any reason to suspect are conscious. Because once they become conscious, they simply raise a whole collection of ethical issues: “is it ethical to turn them off?” is the first one, and “what are your responsibilities toward the thing?”. And so you want to continue to have programs which, like current programs, one can think of purely as tools which we can use, which it is ethical to use as we choose.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;So that’s a thing to be avoided, it seems to me, in AI research. And whether people are wise enough to avoid that, I don’t know. I would hope so. So in some ways I’m more conservative than a lot of people in the AI safety world—in the sense that they assume that self-replicating robots will be a thing and that self-aware robots will be a thing and the object is to design them safely. My feeling is that research shouldn’t go there at all.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;I’d just like to dig in on a few more of those claims in particular. I’d like to hear a little bit more about what you think the crux of your disagreement is with people like Yudkowsky and Russell and Bostrom. Maybe you can pick one, because they all have different views. So, you said that you feel that their fears are far-fetched and that their approaches are far-fetched as well. Can you just say a little bit more about why you think that? A few parts: what you think is the core fear or prediction that their work is predicated on, and why you don’t share that fear or prediction.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;&amp;lt;strong&amp;gt;Ernie Davis:&amp;lt;/strong&amp;gt; Both Bostrom very much, and Yudkowsky very much, and Russell to some extent, have this idea that if you’re smart enough you get to be God. And that just isn’t correct. The idea that a smart enough machine can do whatever it wants—there’s a really good&amp;lt;/span&amp;gt; &amp;lt;a href=&amp;quot;https://www.popsci.com/robot-uprising-enlightenment-now&amp;quot; rel=&amp;quot;noopener noreferrer&amp;quot; target=&amp;quot;_blank&amp;quot;&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;essay&amp;lt;/span&amp;gt;&amp;lt;/a&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;by Steve Pinker, by the way, have you seen it?&amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-4-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#easy-footnote-bottom-4-1955&amp;quot; title=&amp;#039;Pinker, Steven. &amp;amp;amp;#8220;&amp;amp;lt;a href=&amp;quot;https://www.popsci.com/robot-uprising-enlightenment-now&amp;quot;&amp;amp;gt;We’re Told to Fear Robots. But Why Do We Think They’ll Turn on Us?&amp;amp;lt;/a&amp;amp;gt;&amp;amp;amp;#8221; Popular Science 13 (2018).&amp;#039;&amp;gt;&amp;lt;sup&amp;gt;4&amp;lt;/sup&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;I’ve heard of it but have not read it.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;I’ll send you the link. A couple of good essays by Pinker, I think. So, it’s not the case that once superintelligence is reached, times become messianic if the machines are benevolent and dystopian if they’re not. They’re devices. They are limited in what they can do. And the other thing is that we are here first, and we should be able to design them in such a way that they’re safe. It is not really all that difficult to design an AI or a robot which you can turn off and which cannot block you from turning it off.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;And it seems to me a mistake to believe otherwise. With two caveats. One is that, if you embed it in a situation where it’s very costly to turn off—it’s controlling the power grid and the power grid won’t work if you turn it off, then you’re in trouble. And secondly, if you have malicious actors who are deliberately designing, building devices which can’t be turned off. It’s not that it’s impossible to build an intelligent machine that is very dangerous.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;But that doesn’t require superintelligence. That’s possible with very limited intelligence, and the more intelligent, to some extent, the harder it is. But again that’s a different problem. It doesn’t become a qualitatively different problem once the thing has exceeded some predefined level of intelligence.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;You might be even more familiar with these arguments than I am—in fact I can’t really recite them off the top of my head—but I suppose Bostrom and Yudkowsky, and maybe Russell too, do talk about this at length. And I guess they’re always like, Well, you might think you have thought of a good failsafe for ensuring these things won’t get un-turn-offable. But, so they say, you’re probably underestimating just how weird things can get once you have superintelligence.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;I suppose maybe that’s precisely what you’re disagreeing with: maybe they’re overestimating how weird and difficult things get once things are above human level. Why do you think you and they have such different hunches, or intuitions, about how weird things can get?&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;I don’t know, I think they’re being unrealistic. If you take a 2019 genius and you put him into a Neolithic village, they can kill him no matter how intelligent he is, and how much he knows and so on. &amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; I’ve been trying to trace the disagreements here and I think a lot of it does just maybe come down to people’s intuitions about what a very smart person can do if put in a situation where they are far smarter than other people. I think this actually comes up in someone who &amp;lt;a href=&amp;quot;https://intelligence.org/2015/02/06/davis-ai-capability-motivation/&amp;quot; rel=&amp;quot;noopener noreferrer&amp;quot; target=&amp;quot;_blank&amp;quot;&amp;gt;responded&amp;lt;/a&amp;gt; to your review.&amp;lt;/span&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;They claim&amp;lt;/span&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;, “I think if I went back to the time of the Romans I could probably accrue a lot of power just by knowing things that they did not know.”&amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-5-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#easy-footnote-bottom-5-1955&amp;quot; title=&amp;#039;This is not an accurate paraphrase because the review in question stipulates that the human could take back “all the 21st-century knowledge and technologies they wanted”. The passage is: “If we sent a human a thousand years into the past, equipped with all the 21st-century knowledge and technologies they wanted, they could conceivably achieve dominant levels of wealth and power in that time period.”—Bensinger, Rob. “&amp;amp;lt;a href=&amp;quot;https://intelligence.org/2015/02/06/davis-ai-capability-motivation/&amp;quot;&amp;amp;gt;Davis on AI Capability and Motivation&amp;amp;lt;/a&amp;amp;gt;.” Accessed August 23, 2019. https://intelligence.org/2015/02/06/davis-ai-capability-motivation/.&amp;#039;&amp;gt;&amp;lt;sup&amp;gt;5&amp;lt;/sup&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;&amp;lt;strong&amp;gt;Ernie Davis:&amp;lt;/strong&amp;gt; I missed that, or I forgot that or something.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Trying to locate the crux of the disagreement: one key disagreement is what the relationship is between greater intellectual capacity and greater physical power and control over the world. Does that seem safe to say, that that’s one thing you disagree with them about?&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;I think so, yes. That’s one point of disagreement. A second point of disagreement is the difficulty of—the point which we make in the book at some length is that, if you’re going to have an intelligence that’s in any way comparable to human, you’re going to have to build in common sense. It’s going to have to have a large degree of commonsense understanding. And once an AI has common sense it will realize that there’s no point in turning the world into paperclips, and that there’s no point in committing mass murder to go fetch the milk—Russell’s example—and so on. My feeling is that one can largely incorporate a moral sense, when it becomes necessary; you can incorporate moral rules into your robots.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;And one of the people who criticized my Bostrom paper said, well, philosophers haven’t solved the problems of ethics in 2,000 years; how do you think we’re going to solve them? And my feeling is we don’t have to come up with the ultimate solution to ethical problems. You just have to make sure that the machines understand ethics to a degree that they don’t do spectacularly foolish and evil things. And that seems to me doable.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Another point of disagreement with Bostrom in particular, and I think also Yudkowsky, is that they have the idea that ethical senses evolve—which is certainly true—and that a superintelligence, if well-designed, can be designed in such a way that it will itself evolve toward a superior ethical sense. And that this is the thing to do. Bostrom goes into this at considerable length: somehow, give it guidance toward an ethical sense which is beyond anything that we currently understand. That seems to me not very doable, but it would be a really bad thing to do if we could do it, because this super ethics might decide that the best thing to do is to exterminate the human population. And in some super-ethical sense that might be true, but we don’t want it to happen. So the belief in the super ethics—I have no belief, I have no faith in the super ethics, and I have even less faith that there’s some way of designing an AI so that as it grows superintelligent it will achieve super ethics in a comfortable way. So this all seems to me pie in the sky.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;So the key points of disagreement we have so far are the relationship between intelligence and power; and the second thing is, how hard is what we might call the safety problem. And it sounds like even if you became more worried about very powerful AIs, you think it would not require substantial research and effort and money (as some people think) to make them relatively safe?&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Where I would put the effort in is into thinking about, from a legal regulatory perspective, what we want to do. That’s not an easy question.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;The problem at the moment, the most urgent question, is the problem of fake news. We object to having bots spreading fake news. It’s not clear what the best way of preventing that is without infringing on free speech. So that’s a hard problem. And that is, I think, very well worth thinking about. But that’s of course a very different problem. The problems of security at the practical level—making sure that an adversary can’t take control of all the cars that are connected to the Internet and start using them as weapons—is, I think, a very pressing problem. But again that has nothing much to do with the AI safety projects that are underway.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Kind of a broad question—I was curious to hear what you make of the mainstream AI safety efforts that are now occurring. My rough sense is that since your review and since&amp;lt;/span&amp;gt; &amp;lt;i&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Superintelligence&amp;lt;/span&amp;gt;&amp;lt;/i&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;, AI safety has really gained respectability, and now there are AI safety teams at places like DeepMind and OpenAI. And not only do they work on the near-term stuff which you talk about, but they are run by people who are very concerned about the long term. What do you make of that trend?&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;The thing is, I haven’t followed their work very closely, to tell you the truth. So I certainly don’t want to criticize it very specifically. There are smart and well-intentioned people on these teams, and I don’t doubt that a lot of what they’re doing is good work. &amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;The work I’m most enthusiastic about in that direction is on problems that are fairly near term. And autonomous weapons are also a pretty urgent problem, and require political action. So the more that can be done about keeping those under control the better.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Do you think your views on what it will take before we ever get to human-level or more advanced AI drive a lot of your opinions as well? For example, your&amp;lt;/span&amp;gt; &amp;lt;a href=&amp;quot;https://cs.nyu.edu/davise/research.html&amp;quot; rel=&amp;quot;noopener noreferrer&amp;quot; target=&amp;quot;_blank&amp;quot;&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;own work&amp;lt;/span&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt; on common sense and how hard of a problem that can be?&amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-6-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#easy-footnote-bottom-6-1955&amp;quot; title=&amp;#039;Davis, Ernest. &amp;amp;amp;#8220;&amp;amp;lt;a href=&amp;quot;https://ubiquity.acm.org/article.cfm?id=2667640&amp;quot;&amp;amp;gt;The Singularity and the State of the Art in Artificial Intelligence: The technological singularity&amp;amp;lt;/a&amp;amp;gt;.&amp;amp;amp;#8221; Ubiquity 2014, no. October (2014): 2.&amp;#039;&amp;gt;&amp;lt;sup&amp;gt;6&amp;lt;/sup&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt; &amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-7-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#easy-footnote-bottom-7-1955&amp;quot; title=&amp;#039;Davis, Ernest, and Gary Marcus. &amp;amp;amp;#8220;&amp;amp;lt;a href=&amp;quot;https://cs.nyu.edu/davise/papers/CommonsenseFinal.pdf&amp;quot;&amp;amp;gt;Commonsense reasoning and commonsense knowledge in artificial intelligence&amp;amp;lt;/a&amp;amp;gt;.&amp;amp;amp;#8221; Commun. ACM 58, no. 9 (2015): 92-103.&amp;#039;&amp;gt;&amp;lt;sup&amp;gt;7&amp;lt;/sup&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Yeah sure, certainly it informs my views. It affects the question of urgency and it affects the question of what the actual problems are likely to be.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;What would you say is your credence, your evaluation of the likelihood, that without significant additional effort, advanced AI poses a significant risk of harm?&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Well, the problem is that without more work on artificial intelligence, artificial intelligence poses no risk. And the distinction between work on AI, and work on AI safety—work on AI is an aspect of work on AI safety. So I’m not sure it’s a well-defined question.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;But that’s a bit of a debate. What we mean is, if we get rid of all the AI safety institutes, and don’t worry about the regulation, and just let the powers that be do whatever they want to do, will advanced AI be a significant threat? There is certainly a sufficiently significant probability of that, but almost all of that probability has to do with its misuse by bad actors.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;The probability that AI will autonomously become a major threat I put at very small. The probability that people will start deploying AI in a destructive way and causing serious harm, to some extent or other, is fairly large. The probability that autonomous AI is going to be one of our major problems within the next two hundred years I think is less than one in a hundred.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Ah, good. Thank you for parsing that question. It’s that last bit that I’m curious about. And what do you think are the key things that go into that low probability? It seems like there’s two parts: odds of it being a problem if it arises, and odds of it arising. I guess what I’m trying to get at is—again, uncertainty in all of this—but do you have hunches or ‘AI timelines’ as people call them, about how far away we are from human level intelligence being a real possibility?&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;I’d be surprised—well, I will not be surprised, because I will be dead—but I would be surprised if AI reached human levels of capacity across the board within the next 50 years.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;I suspect a lot of this is also found in your written work. But could you say briefly what you think are the things standing in the way, standing in between where we’re at now in our understanding of AI, and getting to that—where the major barriers or confusions or new discoveries to be made are?&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Major barriers—well, there are many barriers. We don’t know how to give computers basic commonsense understanding of the world. We don’t know how to represent the meaning of either language or what the computer can see through vision. We don’t have a good theory of learning. Those, I think, are the main problems that I see and I don’t see that the current direction of work in AI is particularly aimed at those problems.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;And I don’t think the field is likely to solve those problems without a major turnaround. And the problems, I think, are very hard. And even after the field has turned around I think it will take decades before they’re solved.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;I suspect a lot of this might be what the book is about. But can you say what you think that turnaround is, or how you would characterize the current direction? I take it you mean something like deep learning and reinforcement learning?&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Deep learning, end-to-end learning, is what I mean by the current direction. It is very much the current direction. And the turnaround, in one sentence, is that one has to engage with the problems of meaning, and with the problems of common sense knowledge.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Can you think of plausible concrete evidence that would change your views one way or the other? Specifically, on these issues of the problem of safety, and what, if any, work should be done.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Well sure, I mean, if, on the one hand, progress toward understanding in a broad sense—if there’s startling progress on the problem of understanding then my timeline changes obviously, and that makes the problem harder.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;And if it turned out—this is an empirical question—if it turned out that certain types of AI systems inherently turned toward single-minded pursuit of malevolence or toward their own purposes and so on, that would also change my views. It seems to me wildly unlikely, but it’s not unimaginable.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Or of course, if in a social sense—if people start uncontrollably developing these things. I mean it always amazes me the amount of sheer malice in the cyber world, the number of people who are willing to hack systems and develop bugs for no reason. The people who are doing it to make money are one thing; I can understand them. The people who do it simply out of the challenge and out of the spirit of mischief-making—I’m surprised that there are so many.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Can I ask a little bit more about what progress towards understanding looks like? What sort of tasks or behaviors? What does the arXiv paper that demonstrates that look like? What’s it called, and what is the program doing, where you’re like, “Wow, this is a huge stride.”&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;&amp;lt;strong&amp;gt;Ernie Davis:&amp;lt;/strong&amp;gt; I have a &amp;lt;a href=&amp;quot;https://cs.nyu.edu/davise/papers/squabu.pdf&amp;quot;&amp;gt;paper&amp;lt;/a&amp;gt; called “&amp;lt;/span&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;How to write science questions that are easy for people and hard for computers&amp;lt;/span&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;.”&amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-8-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#easy-footnote-bottom-8-1955&amp;quot; title=&amp;#039;Davis, Ernest. &amp;amp;amp;#8220;&amp;amp;lt;a href=&amp;quot;https://cs.nyu.edu/davise/papers/squabu.pdf&amp;quot;&amp;amp;gt;How to write science questions that are easy for people and hard for computers&amp;amp;lt;/a&amp;amp;gt;.&amp;amp;amp;#8221; AI Magazine 37, no. 1 (2016): 13-22.&amp;#039;&amp;gt;&amp;lt;sup&amp;gt;8&amp;lt;/sup&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/span&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;So once you get a response paper to that which says, “My system answers all the questions in this dataset which are easy for people and hard for computers,” that would be impressive. If you have a program that can read basic narrative text and answer questions about it, or watch a video or a film and answer questions about it—that would be impressive.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ==== Notes ====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ol class=&amp;quot;easy-footnotes-wrapper&amp;quot;&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-bottom-1-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Davis, Ernest. “&amp;lt;a href=&amp;quot;https://cs.nyu.edu/davise/papers/Bostrom.pdf&amp;quot;&amp;gt;Ethical guidelines for a superintelligence&amp;lt;/a&amp;gt;.” Artificial Intelligence 220 (2015): 121-124.&amp;lt;a class=&amp;quot;easy-footnote-to-top&amp;quot; href=&amp;quot;#easy-footnote-1-1955&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-bottom-2-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400&amp;quot;&amp;gt;Bostrom, Nick. &amp;lt;a href=&amp;quot;https://www.amazon.com/Superintelligence-Dangers-Strategies-Nick-Bostrom/dp/1501227742&amp;quot;&amp;gt;&amp;lt;em&amp;gt;Superintelligence: Paths, Dangers, Strategies&amp;lt;/em&amp;gt;&amp;lt;/a&amp;gt;. Oxford University Press (2014).&amp;lt;/span&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;easy-footnote-to-top&amp;quot; href=&amp;quot;#easy-footnote-2-1955&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400&amp;quot;&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-bottom-3-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;see, for example, Russell, Stuart. “&amp;lt;a href=&amp;quot;https://people.eecs.berkeley.edu/~russell/papers/russell-bbvabook17-pbai.pdf&amp;quot;&amp;gt;Provably beneficial artificial intelligence&amp;lt;/a&amp;gt;.” Exponential Life, The Next Step (2017).&amp;lt;a class=&amp;quot;easy-footnote-to-top&amp;quot; href=&amp;quot;#easy-footnote-3-1955&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-bottom-4-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Pinker, Steven. “&amp;lt;a href=&amp;quot;https://www.popsci.com/robot-uprising-enlightenment-now&amp;quot;&amp;gt;We’re Told to Fear Robots. But Why Do We Think They’ll Turn on Us?&amp;lt;/a&amp;gt;” Popular Science 13 (2018).&amp;lt;a class=&amp;quot;easy-footnote-to-top&amp;quot; href=&amp;quot;#easy-footnote-4-1955&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-bottom-5-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;This is not an accurate paraphrase because the review in question stipulates that the human could take back “all the 21st-century knowledge and technologies they wanted”. The passage is: “If we sent a human a thousand years into the past, equipped with all the 21st-century knowledge and technologies they wanted, they could conceivably achieve dominant levels of wealth and power in that time period.”—Bensinger, Rob. “&amp;lt;a href=&amp;quot;https://intelligence.org/2015/02/06/davis-ai-capability-motivation/&amp;quot;&amp;gt;Davis on AI Capability and Motivation&amp;lt;/a&amp;gt;.” Accessed August 23, 2019. https://intelligence.org/2015/02/06/davis-ai-capability-motivation/.&amp;lt;a class=&amp;quot;easy-footnote-to-top&amp;quot; href=&amp;quot;#easy-footnote-5-1955&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-bottom-6-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Davis, Ernest. “&amp;lt;a href=&amp;quot;https://ubiquity.acm.org/article.cfm?id=2667640&amp;quot;&amp;gt;The Singularity and the State of the Art in Artificial Intelligence: The technological singularity&amp;lt;/a&amp;gt;.” Ubiquity 2014, no. October (2014): 2.&amp;lt;a class=&amp;quot;easy-footnote-to-top&amp;quot; href=&amp;quot;#easy-footnote-6-1955&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-bottom-7-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Davis, Ernest, and Gary Marcus. “&amp;lt;a href=&amp;quot;https://cs.nyu.edu/davise/papers/CommonsenseFinal.pdf&amp;quot;&amp;gt;Commonsense reasoning and commonsense knowledge in artificial intelligence&amp;lt;/a&amp;gt;.” Commun. ACM 58, no. 9 (2015): 92-103.&amp;lt;a class=&amp;quot;easy-footnote-to-top&amp;quot; href=&amp;quot;#easy-footnote-7-1955&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-bottom-8-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Davis, Ernest. “&amp;lt;a href=&amp;quot;https://cs.nyu.edu/davise/papers/squabu.pdf&amp;quot;&amp;gt;How to write science questions that are easy for people and hard for computers&amp;lt;/a&amp;gt;.” AI Magazine 37, no. 1 (2016): 13-22.&amp;lt;a class=&amp;quot;easy-footnote-to-top&amp;quot; href=&amp;quot;#easy-footnote-8-1955&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ol&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
  

&lt;/pre&gt;</content>
        <summary>&lt;pre&gt;
@@ -1 +1,343 @@
+ ====== Conversation with Ernie Davis ======
+ 
+ // Published 23 August, 2019; last updated 27 May, 2020 //
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;AI Impacts spoke with computer scientist Ernie Davis about his views of AI risk. With his permission, we have transcribed this interview.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ 
+ ==== Participants ====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;a href=&amp;quot;https://cs.nyu.edu/davise/&amp;quot; rel=&amp;quot;noopener noreferrer&amp;quot; target=&amp;quot;_blank&amp;quot;&amp;gt;&amp;lt;strong&amp;gt;Ernest Davis&amp;lt;/strong&amp;gt;&amp;lt;/a&amp;gt; – professor of computer science at the Courant Institute of Mathematical Science, New York University
+                 &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;a href=&amp;quot;http://robertlong.online/&amp;quot; rel=&amp;quot;noopener noreferrer&amp;quot; target=&amp;quot;_blank&amp;quot;&amp;gt;&amp;lt;strong&amp;gt;Robert Long&amp;lt;/strong&amp;gt;&amp;lt;/a&amp;gt; – AI Impacts
+                 &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ==== Summary ====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;We spoke over the phone with Ernie Davis on August 9, 2019. Some of the topics we covered were:&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;What Davis considers to be the most urgent risks from AI&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Davis’s disagreements with Nick Bostrom, Eliezer Yudkowsky, and Stuart Russell&amp;lt;/span&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;The relationship between greater intelligence and greater power&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;How difficult it is to design a system that can be turned off&amp;lt;/span&amp;gt;&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;How difficult it would be to encode safe ethical principles in an AI system&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Davis’s evaluation of the &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;likelihood that advanced, autonomous AI will be a major problem within the next two hundred years; and what evidence would change his mind&amp;lt;/span&amp;gt;&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Challenges and progress towards human-level AI&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;This transcript has been lightly edited for concision and clarity.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ==== Transcript ====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;You’re one of the few people, I think, who is an expert in AI, and is not necessarily embedded in the AI Safety community, but you have engaged substantially with arguments from that community. I’m thinking especially of your&amp;lt;/span&amp;gt; &amp;lt;a href=&amp;quot;https://cs.nyu.edu/davise/papers/Bostrom.pdf&amp;quot; rel=&amp;quot;noopener noreferrer&amp;quot; target=&amp;quot;_blank&amp;quot;&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;review&amp;lt;/span&amp;gt;&amp;lt;/a&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;of&amp;lt;/span&amp;gt; &amp;lt;i&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Superintelligence&amp;lt;/span&amp;gt;&amp;lt;/i&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;.&amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-1-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#easy-footnote-bottom-1-1955&amp;quot; title=&amp;#039;Davis, Ernest. &amp;amp;amp;#8220;&amp;amp;lt;a href=&amp;quot;https://cs.nyu.edu/davise/papers/Bostrom.pdf&amp;quot;&amp;amp;gt;Ethical guidelines for a superintelligence&amp;amp;lt;/a&amp;amp;gt;.&amp;amp;amp;#8221; Artificial Intelligence 220 (2015): 121-124.&amp;#039;&amp;gt;&amp;lt;sup&amp;gt;1&amp;lt;/sup&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt; &amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-2-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#easy-footnote-bottom-2-1955&amp;quot; title=&amp;#039;&amp;amp;lt;/span&amp;amp;gt;&amp;amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;amp;gt;Bostrom, Nick. &amp;amp;lt;a href=&amp;quot;https://www.amazon.com/Superintelligence-Dangers-Strategies-Nick-Bostrom/dp/1501227742&amp;quot;&amp;amp;gt;&amp;amp;lt;em&amp;amp;gt;Superintelligence: Paths, Dangers, Strategies&amp;amp;lt;/em&amp;amp;gt;&amp;amp;lt;/a&amp;amp;gt;. Oxford University Press (2014).&amp;amp;lt;/span&amp;amp;gt;&amp;amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;amp;gt;&amp;#039;&amp;gt;&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;I was hoping we could talk a little bit more about your views on AI safety work. There’s a particular proposition that we’re trying to get people’s opinions on. The question is: Is it valuable for people to be expending significant effort doing work that purports to reduce the risk from advanced artificial intelligence? I’ve read some of your work; I can guess some of your views. But I was wondering: what would you say is your answer to that question, whether this kind of work is valuable to do now?&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Well, a number of parts to the answer. In terms of short term—and “short” being not very short—short term risks from computer technology generally, this is very low priority. The risks from cyber crime, cyber terrorism, somebody taking hold of the insecurity of the internet of things and so on—that in particular is one of my bugaboos—are, I think, an awful lot more urgent. So there’s urgency; I certainly don’t see that this is especially urgent work. &amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Now, some of the approaches are being taken to long term AI safety seem to me extremely far fetched. On the one hand the fears of people like Bostrom and Yudkowsky and to a lesser extent Stuart Russell—seem to me misdirected and the approaches they are proposing are also misdirected. I have a&amp;lt;/span&amp;gt; &amp;lt;a href=&amp;quot;https://www.amazon.com/Rebooting-AI-Building-Artificial-Intelligence-ebook/dp/B07MYLGQLB&amp;quot; rel=&amp;quot;noopener noreferrer&amp;quot; target=&amp;quot;_blank&amp;quot;&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;book&amp;lt;/span&amp;gt;&amp;lt;/a&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;with Gary Marcus which is coming out in September, and we have a chapter which is called ‘Trust’ which gives our opinions—which are pretty much convergent—at length. I can send you that chapter. &amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Yes, I’d certainly be interested in that.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;So, the kinds of things that Russell is proposing—Russell also has a&amp;lt;/span&amp;gt; &amp;lt;a href=&amp;quot;https://www.amazon.com/Human-Compatible-Artificial-Intelligence-Problem/dp/0525558616/ref=sr_1_2?keywords=Stuart+Russell&amp;amp;amp;qid=1565996574&amp;amp;amp;s=books&amp;amp;amp;sr=1-2&amp;quot; rel=&amp;quot;noopener noreferrer&amp;quot; target=&amp;quot;_blank&amp;quot;&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;book&amp;lt;/span&amp;gt;&amp;lt;/a&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;coming out in October, he is developing ideas that he’s already published about: the way to have safe AI is to have them be unsure about what the human goals are.&amp;lt;/span&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-3-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#easy-footnote-bottom-3-1955&amp;quot; title=&amp;#039;see, for example, Russell, Stuart. &amp;amp;amp;#8220;&amp;amp;lt;a href=&amp;quot;https://people.eecs.berkeley.edu/~russell/papers/russell-bbvabook17-pbai.pdf&amp;quot;&amp;amp;gt;Provably beneficial artificial intelligence&amp;amp;lt;/a&amp;amp;gt;.&amp;amp;amp;#8221; Exponential Life, The Next Step (2017).&amp;#039;&amp;gt;&amp;lt;sup&amp;gt;3&amp;lt;/sup&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt; And Yudkowsky develops similar ideas in his work, engages with them, and tries to measure their success. This all seems to me too clever by half. And I don’t think it’s addressing what the real problems are going to be.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;My feeling is that the problem of AIs doing the wrong thing is a very large one—you know, just by sheer inadvertence and incompetent design. And the solution there, more or less, is to design them well and build in safety features of the kinds that one has in engineering, one has throughout engineering. Whenever one is doing an engineering project, one builds in—one designs for failure. And one has to do that with AI as well. The danger of AI being abused by bad human actors is a very serious danger. And that has to be addressed politically, like all problems involving bad human actors. &amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;And then there are directions in AI where I think it’s foolish to go. For instance it would be very foolish to build—it’s not currently technically feasible, but if it were, and it may at some point become technically feasible—to build robots that can reproduce themselves cheaply. And that’s foolish, but it’s foolish for exactly the same reason that you want to be careful about introducing new species. It’s why Australia got into trouble with the rabbits, namely: if you have a device that can reproduce itself and it has no predators, then it will reproduce itself and it gets to be a nuisance.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;And that’s almost separate. A device doesn’t have to be superintelligent to do that, in fact superintelligence probably just makes that harder because a superintelligent device is harder to build; a self replicating device might be quite easy to build on the cheap. It won’t survive as well as a superintelligent one, but if it can reproduce itself fast enough that doesn’t matter. So that kind of thing, you want to avoid.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;There’s a question which we almost entirely avoided in our book, which people always ask all the time, which is, at what point do machines become conscious. And my answer to that—I’m not necessarily speaking for Gary—my answer to that is that you want to avoid building machines which you have any reason to suspect are conscious. Because once they become conscious, they simply raise a whole collection of ethical issues like—”is it ethical turn them off?”, is the first one, and “what are your responsibilities toward the thing?”. And so you want to continue to have programs which, like current programs, one can think of purely as tools which we can use, which it is ethical to use as we choose.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;So that’s a thing to be avoided, it seems to me, in AI research. And whether people are wise enough to avoid that, I don’t know. I would hope so. So in some ways I’m more conservative than a lot of people in the AI safety world—in the sense that they assume that self replicating robots will be a thing and that self-aware robots will be a thing and the object is to design them safely. My feeling is that research shouldn’t go there at all.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;I’d just like to dig in on a few more of those claims in particular. I would just like to hear a little bit more about what you think the crux of your disagreement is with people like Yudkowsky and Russell and Bostrom. Maybe you can pick one because they all have different views. So, you said that you feel that their fears are far-fetched and that their approaches are far-fetched as well. Can you just say a little bit about more about why you think that? A few parts: what you think is the core fear or prediction that their work is predicated on, and why you don’t share that fear or prediction.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;&amp;lt;strong&amp;gt;Ernie Davis:&amp;lt;/strong&amp;gt; Both Bostrom very much, and Yudkowsky very much, and Russell to some extent, have this idea that if you’re smart enough you get to be God. And that just isn’t correct. The idea that a smart enough machine can do whatever it wants—there’s a really good&amp;lt;/span&amp;gt; &amp;lt;a href=&amp;quot;https://www.popsci.com/robot-uprising-enlightenment-now&amp;quot; rel=&amp;quot;noopener noreferrer&amp;quot; target=&amp;quot;_blank&amp;quot;&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;essay&amp;lt;/span&amp;gt;&amp;lt;/a&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;by Steve Pinker, by the way, have you seen it?&amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-4-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#easy-footnote-bottom-4-1955&amp;quot; title=&amp;#039;Pinker, Steven. &amp;amp;amp;#8220;&amp;amp;lt;a href=&amp;quot;https://www.popsci.com/robot-uprising-enlightenment-now&amp;quot;&amp;amp;gt;We’re Told to Fear Robots. But Why Do We Think They’ll Turn on Us?&amp;amp;lt;/a&amp;amp;gt;&amp;amp;amp;#8221; Popular Science 13 (2018).&amp;#039;&amp;gt;&amp;lt;sup&amp;gt;4&amp;lt;/sup&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;I’ve heard of it but have not read it.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;I’ll send you the link. A couple of good essays by Pinker, I think. So, it’s not the case that once superintelligence is reached, then times become messianic if they’re benevolent and dystopian if they’re not. They’re devices. They are limited in what they can do. And the other thing is that we are here first, and we should be able to design them in such a way that they’re safe. It is not really all that difficult to design an AI or a robot which you can turn off and which cannot block you from turning it off.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;And it seems to me a mistake to believe otherwise. With two caveats. One is that, if you embed it in a situation where it’s very costly to turn off—it’s controlling the power grid and the power grid won’t work if you turn it off, then you’re in trouble. And secondly, if you have malicious actors who are deliberately designing, building devices which can’t be turned off. It’s not that it’s impossible to build an intelligent machine that is very dangerous.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;But that doesn’t require superintelligence. That’s possible with very limited intelligence, and the more intelligent, to some extent, the harder it is. But again that’s a different problem. It doesn’t become a qualitatively different problem once the thing has exceeded some predefined level of intelligence.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;You might be even more familiar with these arguments than I am—in fact I can’t really recite them off the top of my head—but I suppose Bostrom and Yudkowsky, and maybe Russell too, do talk about this at length. And I guess they’re always like, Well, you might think you have thought of a good failsafe for ensuring these things won’t get un-turn-offable. But, so they say, you’re probably underestimating just how weird things can get once you have superintelligence. &amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;I suppose maybe that’s precisely what you’re disagreeing with: maybe they’re overestimating how weird and difficult things get once things are above human level. Why do you think you and they have such different hunches, or intuitions, about how weird things can get?&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;I don’t know, I think they’re being unrealistic. If you take a 2019 genius and put him into a Neolithic village, the villagers can kill him no matter how intelligent he is and how much he knows. &amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; I’ve been trying to trace the disagreements here and I think a lot of it does just maybe come down to people’s intuitions about what a very smart person can do if put in a situation where they are far smarter than other people. I think this actually comes up in someone who &amp;lt;a href=&amp;quot;https://intelligence.org/2015/02/06/davis-ai-capability-motivation/&amp;quot; rel=&amp;quot;noopener noreferrer&amp;quot; target=&amp;quot;_blank&amp;quot;&amp;gt;responded&amp;lt;/a&amp;gt; to your review.&amp;lt;/span&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;They claim&amp;lt;/span&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;, “I think if I went back to the time of the Romans I could probably accrue a lot of power just by knowing things that they did not know.”&amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-5-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#easy-footnote-bottom-5-1955&amp;quot; title=&amp;#039;This is not an accurate paraphrase because the review in question stipulates that the human could take back “all the 21st-century knowledge and technologies they wanted”. The passage is: “If we sent a human a thousand years into the past, equipped with all the 21st-century knowledge and technologies they wanted, they could conceivably achieve dominant levels of wealth and power in that time period.”—Bensinger, Rob. “&amp;amp;lt;a href=&amp;quot;https://intelligence.org/2015/02/06/davis-ai-capability-motivation/&amp;quot;&amp;amp;gt;Davis on AI Capability and Motivation&amp;amp;lt;/a&amp;amp;gt;.” Accessed August 23, 2019. https://intelligence.org/2015/02/06/davis-ai-capability-motivation/.&amp;#039;&amp;gt;&amp;lt;sup&amp;gt;5&amp;lt;/sup&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;&amp;lt;strong&amp;gt;Ernie Davis:&amp;lt;/strong&amp;gt; I missed that, or I forgot that or something.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Trying to locate the crux of the disagreement: one key disagreement is what the relationship is between greater intellectual capacity and greater physical power and control over the world. Does that seem safe to say, that that’s one thing you disagree with them about?&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;I think so, yes. That’s one point of disagreement. A second point of disagreement is the difficulty of—the point which we make in the book at some length is that, if you’re going to have an intelligence that’s in any way comparable to human, you’re going to have to build in common sense. It’s going to have to have a large degree of commonsense understanding. And once an AI has common sense it will realize that there’s no point in turning the world into paperclips, and that there’s no point in committing mass murder to go fetch the milk—Russell’s example—and so on. My feeling is that one can largely incorporate a moral sense, when it becomes necessary; you can incorporate moral rules into your robots.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;And one of the people who criticized my Bostrom paper said, well, philosophers haven’t solved the problems of ethics in 2,000 years, how do you think we’re going to solve them? And my feeling is we don’t have to come up with the ultimate solution to ethical problems. You just have to make sure that they understand it to a degree that they don’t do spectacularly foolish and evil things. And that seems to me doable.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Another point of disagreement with Bostrom in particular, and I think also Yudkowsky, is that they have the idea that ethical senses evolve—which is certainly true—and that a superintelligence, if well-designed, can be designed in such a way that it will itself evolve toward a superior ethical sense. And that this is the thing to do. Bostrom goes into this at considerable length: somehow, give it guidance toward an ethical sense which is beyond anything that we currently understand. That seems to me not very doable, but it would be a really bad thing to do if we could do it, because this super ethics might decide that the best thing to do is to exterminate the human population. And in some super-ethical sense that might be true, but we don’t want it to happen. So the belief in the super ethics—I have no belief, I have no faith in the super ethics, and I have even less faith that there’s some way of designing an AI so that as it grows superintelligent it will achieve super ethics in a comfortable way. So this all seems to me pie in the sky.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;So the key points of disagreement we have so far are the relationship between intelligence and power; and the second thing is, how hard is what we might call the safety problem. And it sounds like even if you became more worried about very powerful AIs, you think it would not require substantial research and effort and money (as some people think) to make them relatively safe?&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Where I would put the effort in is into thinking about, from a legal regulatory perspective, what we want to do. That’s not an easy question.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;The problem at the moment, the most urgent question, is the problem of fake news. We object to having bots spreading fake news. It’s not clear what the best way of preventing that is without infringing on free speech. So that’s a hard problem. And that is, I think, very well worth thinking about. But that’s of course a very different problem. The problem of security at the practical level—making sure that an adversary can’t take control of all the cars that are connected to the Internet and start using them as weapons—is, I think, a very pressing problem. But again that has nothing much to do with the AI safety projects that are underway.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Kind of a broad question—I was curious to hear what you make of the mainstream AI safety efforts that are now occurring. My rough sense is that since your review and since&amp;lt;/span&amp;gt; &amp;lt;i&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Superintelligence&amp;lt;/span&amp;gt;&amp;lt;/i&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;, AI safety has really gained respectability, and now there are AI safety teams at places like DeepMind and OpenAI. And not only do they work on the near-term stuff which you talk about, but they are run by people who are very concerned about the long term. What do you make of that trend?&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;The thing is, I haven’t followed their work very closely, to tell you the truth. So I certainly don’t want to criticize it very specifically. There are smart and well-intentioned people on these teams, and I don’t doubt that a lot of what they’re doing is good work. &amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;The work I’m most enthusiastic about in that direction is on problems that are fairly near term. Autonomous weapons are also a pretty urgent problem, and one that requires political action. So the more that can be done about keeping those under control the better.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Do you think your views on what it will take before we ever get to human-level or more advanced AI, do you think that drives a lot of your opinions as well? For example, your&amp;lt;/span&amp;gt; &amp;lt;a href=&amp;quot;https://cs.nyu.edu/davise/research.html&amp;quot; rel=&amp;quot;noopener noreferrer&amp;quot; target=&amp;quot;_blank&amp;quot;&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;own work&amp;lt;/span&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt; on common sense and how hard of a problem that can be?&amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-6-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#easy-footnote-bottom-6-1955&amp;quot; title=&amp;#039;Davis, Ernest. &amp;amp;amp;#8220;&amp;amp;lt;a href=&amp;quot;https://ubiquity.acm.org/article.cfm?id=2667640&amp;quot;&amp;amp;gt;The Singularity and the State of the Art in Artificial Intelligence: The technological singularity&amp;amp;lt;/a&amp;amp;gt;.&amp;amp;amp;#8221; Ubiquity 2014, no. October (2014): 2.&amp;#039;&amp;gt;&amp;lt;sup&amp;gt;6&amp;lt;/sup&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt; &amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-7-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#easy-footnote-bottom-7-1955&amp;quot; title=&amp;#039;Davis, Ernest, and Gary Marcus. &amp;amp;amp;#8220;&amp;amp;lt;a href=&amp;quot;https://cs.nyu.edu/davise/papers/CommonsenseFinal.pdf&amp;quot;&amp;amp;gt;Commonsense reasoning and commonsense knowledge in artificial intelligence&amp;amp;lt;/a&amp;amp;gt;.&amp;amp;amp;#8221; Commun. ACM 58, no. 9 (2015): 92-103.&amp;#039;&amp;gt;&amp;lt;sup&amp;gt;7&amp;lt;/sup&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Yeah sure, certainly it informs my views. It affects the question of urgency and it affects the question of what the actual problems are likely to be.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;What would you say is your credence, your evaluation of the likelihood, that without significant additional effort, advanced AI poses a significant risk of harm?&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Well, the problem is that without more work on artificial intelligence, artificial intelligence poses no risk. And the distinction between work on AI, and work on AI safety—work on AI is an aspect of work on AI safety. So I’m not sure it’s a well-defined question.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;But that’s a bit of a debate. What we mean is, if we get rid of all the AI safety institutes, and don’t worry about the regulation, and just let the powers that be do whatever they want to do, will advanced AI be a significant threat? There is certainly a significant probability of that, but almost all of that probability has to do with its misuse by bad actors.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;The probability that AI will autonomously become a major threat I put at very small. The probability that people will start deploying AI in a destructive way and causing serious harm, to some extent or other, is fairly large. The probability that autonomous AI is going to be one of our major problems within the next two hundred years I think is less than one in a hundred.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Ah, good. Thank you for parsing that question. It’s that last bit that I’m curious about. And what do you think are the key things that go into that low probability? It seems like there’s two parts: odds of it being a problem if it arises, and odds of it arising. I guess what I’m trying to get at is—again, uncertainty in all of this—but do you have hunches or ‘AI timelines’ as people call them, about how far away we are from human level intelligence being a real possibility?&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;I’d be surprised—well, I will not be surprised, because I will be dead—but I would be surprised if AI reached human levels of capacity across the board within the next 50 years.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;I suspect a lot of this is also found in your written work. But could you say briefly what you think are the things standing in the way, standing in between where we’re at now in our understanding of AI, and getting to that—where the major barriers or confusions or new discoveries to be made are?&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Major barriers—well, there are many barriers. We don’t know how to give computers basic commonsense understanding of the world. We don’t know how to represent the meaning of either language or what the computer can see through vision. We don’t have a good theory of learning. Those, I think, are the main problems that I see and I don’t see that the current direction of work in AI is particularly aimed at those problems.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;And I don’t think it’s likely to solve those problems without a major turnaround. And the problems, I think, are very hard. And even after the field has turned around I think it will take decades before they’re solved.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;I suspect a lot of this might be what the book is about. But can you say what you think that turnaround is, or how you would characterize the current direction? I take it you mean something like deep learning and reinforcement learning?&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Deep learning, end-to-end learning, is what I mean by the current direction. It is very much the current direction. And the turnaround, in one sentence, is that one has to engage with the problems of meaning, and with the problems of common sense knowledge.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Can you think of plausible concrete evidence that would change your views one way or the other? Specifically, on these issues of the problem of safety, and what if any work should be done.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Ernie Davis:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Well sure. If, on the one hand, there’s startling progress on the problem of understanding in a broad sense, then my timeline changes, obviously, and that makes the problem harder.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;And if it turned out—this is an empirical question—that certain types of AI systems inherently turned toward single-minded pursuit of malevolence or toward their own purposes and so on, that would also change my view. That seems to me wildly unlikely, but it’s not unimaginable.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Or of course, in a social sense, if people start uncontrollably developing these things. I mean it always amazes me the amount of sheer malice in the cyber world, the number of people who are willing to hack systems and develop bugs for no reason. The people who are doing it to make money are one thing; I can understand them. The people who do it simply out of the challenge and out of the spirit of mischief-making—I’m surprised that there are so many. &amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;Robert Long:&amp;lt;/b&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;Can I ask a little bit more about what progress towards understanding looks like? What sort of tasks or behaviors? What does the arXiv paper that demonstrates that look like? What’s it called, and what is the program doing, where you’re like, “Wow, this is a huge stride.”&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;&amp;lt;strong&amp;gt;Ernie Davis:&amp;lt;/strong&amp;gt; I have a &amp;lt;a href=&amp;quot;https://cs.nyu.edu/davise/papers/squabu.pdf&amp;quot;&amp;gt;paper&amp;lt;/a&amp;gt; called “&amp;lt;/span&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;How to write science questions that are easy for people and hard for computers&amp;lt;/span&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;.”&amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-8-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#easy-footnote-bottom-8-1955&amp;quot; title=&amp;#039;Davis, Ernest. &amp;amp;amp;#8220;&amp;amp;lt;a href=&amp;quot;https://cs.nyu.edu/davise/papers/squabu.pdf&amp;quot;&amp;amp;gt;How to write science questions that are easy for people and hard for computers&amp;amp;lt;/a&amp;amp;gt;.&amp;amp;amp;#8221; AI magazine 37, no. 1 (2016): 13-22.&amp;#039;&amp;gt;&amp;lt;sup&amp;gt;8&amp;lt;/sup&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/span&amp;gt; &amp;lt;span style=&amp;quot;font-weight: 400;&amp;quot;&amp;gt;So once you get a response paper to that: “My system answers all the questions in this dataset which are easy for people and hard for computers.” That would be impressive. If you have a program that can read basic narrative text and answer questions about it, or watch a video or a film and answer questions about it—that would be impressive.&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ==== Notes ====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ol class=&amp;quot;easy-footnotes-wrapper&amp;quot;&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-bottom-1-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Davis, Ernest. “&amp;lt;a href=&amp;quot;https://cs.nyu.edu/davise/papers/Bostrom.pdf&amp;quot;&amp;gt;Ethical guidelines for a superintelligence&amp;lt;/a&amp;gt;.” Artificial Intelligence 220 (2015): 121-124.&amp;lt;a class=&amp;quot;easy-footnote-to-top&amp;quot; href=&amp;quot;#easy-footnote-1-1955&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-bottom-2-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400&amp;quot;&amp;gt;Bostrom, Nick. &amp;lt;a href=&amp;quot;https://www.amazon.com/Superintelligence-Dangers-Strategies-Nick-Bostrom/dp/1501227742&amp;quot;&amp;gt;&amp;lt;em&amp;gt;Superintelligence: Paths, Dangers, Strategies&amp;lt;/em&amp;gt;&amp;lt;/a&amp;gt;. Oxford University Press (2014).&amp;lt;/span&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;easy-footnote-to-top&amp;quot; href=&amp;quot;#easy-footnote-2-1955&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;&amp;lt;span style=&amp;quot;font-weight: 400&amp;quot;&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-bottom-3-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;see, for example, Russell, Stuart. “&amp;lt;a href=&amp;quot;https://people.eecs.berkeley.edu/~russell/papers/russell-bbvabook17-pbai.pdf&amp;quot;&amp;gt;Provably beneficial artificial intelligence&amp;lt;/a&amp;gt;.” Exponential Life, The Next Step (2017).&amp;lt;a class=&amp;quot;easy-footnote-to-top&amp;quot; href=&amp;quot;#easy-footnote-3-1955&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-bottom-4-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Pinker, Steven. “&amp;lt;a href=&amp;quot;https://www.popsci.com/robot-uprising-enlightenment-now&amp;quot;&amp;gt;We’re Told to Fear Robots. But Why Do We Think They’ll Turn on Us?&amp;lt;/a&amp;gt;” Popular Science 13 (2018).&amp;lt;a class=&amp;quot;easy-footnote-to-top&amp;quot; href=&amp;quot;#easy-footnote-4-1955&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-bottom-5-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;This is not an accurate paraphrase because the review in question stipulates that the human could take back “all the 21st-century knowledge and technologies they wanted”. The passage is: “If we sent a human a thousand years into the past, equipped with all the 21st-century knowledge and technologies they wanted, they could conceivably achieve dominant levels of wealth and power in that time period.”—Bensinger, Rob. “&amp;lt;a href=&amp;quot;https://intelligence.org/2015/02/06/davis-ai-capability-motivation/&amp;quot;&amp;gt;Davis on AI Capability and Motivation&amp;lt;/a&amp;gt;.” Accessed August 23, 2019. https://intelligence.org/2015/02/06/davis-ai-capability-motivation/.&amp;lt;a class=&amp;quot;easy-footnote-to-top&amp;quot; href=&amp;quot;#easy-footnote-5-1955&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-bottom-6-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Davis, Ernest. “&amp;lt;a href=&amp;quot;https://ubiquity.acm.org/article.cfm?id=2667640&amp;quot;&amp;gt;The Singularity and the State of the Art in Artificial Intelligence: The technological singularity&amp;lt;/a&amp;gt;.” Ubiquity 2014, no. October (2014): 2.&amp;lt;a class=&amp;quot;easy-footnote-to-top&amp;quot; href=&amp;quot;#easy-footnote-6-1955&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-bottom-7-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Davis, Ernest, and Gary Marcus. “&amp;lt;a href=&amp;quot;https://cs.nyu.edu/davise/papers/CommonsenseFinal.pdf&amp;quot;&amp;gt;Commonsense reasoning and commonsense knowledge in artificial intelligence&amp;lt;/a&amp;gt;.” Commun. ACM 58, no. 9 (2015): 92-103.&amp;lt;a class=&amp;quot;easy-footnote-to-top&amp;quot; href=&amp;quot;#easy-footnote-7-1955&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-bottom-8-1955&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Davis, Ernest. “&amp;lt;a href=&amp;quot;https://cs.nyu.edu/davise/papers/squabu.pdf&amp;quot;&amp;gt;How to write science questions that are easy for people and hard for computers&amp;lt;/a&amp;gt;.” AI magazine 37, no. 1 (2016): 13-22.&amp;lt;a class=&amp;quot;easy-footnote-to-top&amp;quot; href=&amp;quot;#easy-footnote-8-1955&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ol&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
  

&lt;/pre&gt;</summary>
    </entry>
    <entry>
        <title>Conversation with Paul Christiano</title>
        <link rel="alternate" type="text/html" href="https://wiki.aiimpacts.org/conversation_notes/conversation_with_paul_christiano?rev=1663745861&amp;do=diff"/>
        <published>2022-09-21T07:37:41+00:00</published>
        <updated>2022-09-21T07:37:41+00:00</updated>
        <id>https://wiki.aiimpacts.org/conversation_notes/conversation_with_paul_christiano?rev=1663745861&amp;do=diff</id>
        <author>
            <name>Anonymous</name>
            <email>anonymous@undisclosed.example.com</email>
        </author>
        <category  term="conversation_notes" />
        <content>&lt;pre&gt;
@@ -1 +1,630 @@
+ ====== Conversation with Paul Christiano ======
+ 
+ // Published 11 September, 2019; last updated 12 September, 2019 //
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;AI Impacts talked to AI safety researcher Paul Christiano about his views on AI risk. With his permission, we have transcribed this interview.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ 
+ ===== Participants =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;a href=&amp;quot;https://paulfchristiano.com/&amp;quot;&amp;gt;Paul Christiano&amp;lt;/a&amp;gt; — OpenAI safety team
+                 &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Asya Bergal – AI Impacts&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Ronny Fernandez – AI Impacts&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;a href=&amp;quot;http://robertlong.online/&amp;quot;&amp;gt;Robert Long&amp;lt;/a&amp;gt; – AI Impacts
+                 &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ===== Summary =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;We spoke with Paul Christiano on August 13, 2019. Here is a brief summary of that conversation:&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;AI safety is worth working on because AI poses a large risk and AI safety is neglected, and tractable.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Christiano is more optimistic about the likely social consequences of advanced AI than some others in AI safety, in particular researchers at the Machine Intelligence Research Institute (MIRI), for the following reasons:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;The prior on any given problem reducing the expected value of the future by 10% should be low.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;There are several ‘saving throws’–ways in which, even if one thing turns out badly, something else can turn out well, such that AI is not catastrophic.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Many algorithmic problems are either solvable within 100 years, or provably impossible; this inclines Christiano to think that AI safety problems are reasonably likely to be easy.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;MIRI thinks success is guaranteeing that unaligned intelligences are never created, whereas Christiano just wants to leave the next generation of intelligences in at least as good of a place as humans were when building them. &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;‘Prosaic AI’ that looks like current AI systems will be less hard to align than MIRI thinks: 
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Christiano thinks there’s at least a one-in-three chance that we’ll be able to solve AI safety on paper in advance.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;A common view within ML is that we’ll successfully solve problems as they come up.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Christiano has relatively less confidence in several inside view arguments for high levels of risk:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Building safe AI requires hitting a small target in the space of programs, but building any AI also requires hitting a small target.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Because Christiano thinks that the state of evidence is less clear-cut than MIRI does, Christiano also has a higher probability that people will become more worried in the future. &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Just because we haven’t solved many problems in AI safety yet doesn’t mean they’re intractably hard– many technical problems feel this way and then get solved in 10 years of effort.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Evolution is often used as an analogy to argue that general intelligence (humans with their own goals) becomes dangerously unaligned with the goals of the outer optimizer (evolution selecting for reproductive fitness). But this analogy doesn’t make Christiano feel so pessimistic, e.g. he thinks that if we tried, we could breed animals that are somewhat smarter than humans and are also friendly and docile.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Christiano is optimistic about verification, interpretability, and adversarial training for inner alignment, whereas MIRI is pessimistic.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;MIRI thinks the outer alignment approaches Christiano proposes are just obscuring the core difficulties of alignment, while Christiano is not yet convinced there is a deep core difficulty.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Christiano thinks there are several things that could change his mind and optimism levels, including:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Learning about institutions and observing how they solve problems analogous to AI safety.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Seeing whether AIs become deceptive and how they respond to simple oversight.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Seeing how much progress we make on AI alignment over the coming years.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Christiano is relatively optimistic about his iterated amplification approach:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Christiano cares more about making aligned AIs that are competitive with unaligned AIs, whereas MIRI is more willing to settle for an AI with very narrow capabilities.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Iterated amplification is largely based on learning-based AI systems, though it may work in other cases.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Even if iterated amplification isn’t the answer to AI safety, it’s likely to have subproblems in common with problems that are important in the future.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;There are still many disagreements between Christiano and the Machine Intelligence Research Institute (MIRI) that are messy and haven’t been made precise.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;This transcript has been lightly edited for concision and clarity.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ===== Transcript =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Okay. We are recording. I’m going to ask you a bunch of questions related to something like AI optimism.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I guess the proposition that we’re looking at is something like ‘is it valuable for people to be spending significant effort doing work that purports to reduce the risk from advanced artificial intelligence’? The first question would be to give a short-ish version of the reasoning around that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Around why it’s overall valuable?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah. Or the extent to which you think it’s valuable.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; I don’t know, this seems complicated. I’m acting from some longtermist perspective; I’m like, what can make the world irreversibly worse? There aren’t that many things; we go extinct. It’s hard to go extinct, doesn’t seem that likely.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; We keep forgetting to say this, but we are focusing less on ethical considerations that might affect that. We’ll grant…yeah, with all that in the background….&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Granting long-termism, but then it seems like it depends a lot on what’s the probability? What fraction of our expected future do we lose by virtue of messing up alignment * what’s the elasticity of that to effort / how much effort?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; That’s the stuff we’re curious to see what people think about.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal&amp;lt;/strong&amp;gt;: I also just read your 80K interview, which I think probably covered like a lot of the reasoning about this.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; They probably did. I don’t remember exactly what’s in there, but it was a lot of words.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I don’t know. I’m like, it’s a lot of doom probability. Like maybe I think AI alignment per se is like 10% doominess. That’s a lot. Then it seems like if we understood everything in advance really well, or just having a bunch of people working on now understanding what’s up, could easily reduce that by a big chunk.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; Sorry, what do you mean by 10% doominesss?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; I don’t know, the future is 10% worse than it would otherwise be in expectation by virtue of our failure to align AI. I made up 10%, it’s kind of a random number. I don’t know, it’s less than 50%. It’s more than 10% conditioned on AI soon I think.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; And that’s change in expected value.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Yeah. Anyway, so 10% is a lot. Then I’m like, maybe if we sorted all our shit out and had a bunch of people who knew what was up, and had a good theoretical picture of what was up, and had more info available about whether it was a real problem. Maybe really nailing all that could cut that risk from 10% to 5% and maybe like, you know, there aren’t that many people who work on it, it seems like a marginal person can easily do a thousandth of that 5% change. Now you’re looking at one in 20,000 or something, which is a good deal.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
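+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;A minimal sketch, in Python, of the back-of-the-envelope arithmetic above; the inputs are the illustrative numbers from the conversation, and the variable names are hypothetical labels for them:&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;pre&amp;gt;
+ # Inputs are the made-up illustrative numbers from the conversation, not precise claims.
+ doom = 0.10                 # expected value lost to misaligned AI by default
+ doom_if_solved = 0.05       # residual loss if the field really nails the problem
+ marginal_share = 1 / 1000   # share of that improvement one marginal person adds
+ 
+ # (0.10 - 0.05) * (1 / 1000) is about 5e-05, i.e. one part in 20,000
+ print((doom - doom_if_solved) * marginal_share)
+ &amp;lt;/pre&amp;gt;
+ &amp;lt;/HTML&amp;gt;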
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I think my impression is that that 10% is lower than some large set of people. I don’t know if other people agree with that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Certainly, 10% is lower than lots of people who care about AI risk. I mean it’s worth saying that I have this slightly narrow conception of what the alignment problem is. I’m not including all AI risk in the 10%. I’m not including in some sense most of the things people normally worry about, and just including the ‘we tried to build an AI that was doing what we want but then it wasn’t even trying to do what we want’. I think it’s lower, even after that caveat, than pessimistic people’s. It’s going to be lower than all the MIRI folks’, it’s going to be higher than almost everyone in the world at large, especially after specializing in this problem, which is a problem almost no one cares about, which is precisely how a thousand full-time people for 20 years can reduce the whole risk by half or something.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I’m curious for your statement as to why you think your number is slightly lower than other people.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Yeah, I don’t know if I have a particularly crisp answer. Seems like it’s a more reactive thing of like, what are the arguments that it’s very doomy? A priori you might’ve been like, well, if you’re going to build some AI, you’re probably going to build the AI so it’s trying to do what you want it to do. Probably that’s that. Plus, most things can’t destroy the expected value of the future by 10%. You just can’t have that many things, otherwise there’s not going to be any value left in the end. In particular, if you had 100 such things, then you’d be down to like 1/1000th of your values. 1/10 hundred thousandth? I don’t know, I’m not good at arithmetic.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
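+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;A quick check, in Python, of the multiplication being gestured at here, assuming 100 independent problems that each destroy 10% of the future’s expected value:&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;pre&amp;gt;
+ # 100 independent problems, each destroying 10 percent of expected value,
+ # would leave a fraction 0.9 ** 100 of the original value.
+ print(0.9 ** 100)   # about 2.66e-05, i.e. roughly 1/37,600
+ &amp;lt;/pre&amp;gt;
+ &amp;lt;/HTML&amp;gt;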
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Anyway, that’s a priori: there just aren’t that many things that are that bad, and it seems like people would try to make AI that’s trying to do what they want. Then you’re like, okay, we get to be pessimistic because of some other argument about like, well, we don’t currently know how to build an AI which will do what we want. We’re like, there’s some extrapolation of current techniques on which we’re concerned that we wouldn’t be able to. Or maybe some more conceptual or intuitive argument about why AI is a scary kind of thing, and AIs tend to want to do random shit.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then like, I don’t know, now we get into, how strong is that argument for doominess? Then a major thing that drives it is I am like, reasonable chance there is no problem in fact. Reasonable chance, if there is a problem we can cope with it just by trying. Reasonable chance, even if it will be hard to cope with, we can sort shit out well enough on paper that we really nail it and understand how to resolve it. Reasonable chance, if we don’t solve it the people will just not build AIs that destroy everything they value.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It’s lots of saving throws, you know? And you multiply the saving throws together and things look better. And they interact better than that because– well, in one way worse because it’s correlated: If you’re incompetent, you’re more likely to fail to solve the problem and more likely to fail to coordinate not to destroy the world. In some other sense, it’s better than interacting multiplicatively because weakness in one area compensates for strength in the other. I think there are a bunch of saving throws that could independently make things good, but then in reality you have to have a little bit here and a little bit here and a little bit here, if that makes sense. We have some reasonable understanding on paper that makes the problem easier. The problem wasn’t that bad. We wing it reasonably well and we do a bunch of work and in fact people are just like, ‘Okay, we’re not going to destroy the world given the choice.’ I guess I have this somewhat distinctive last saving throw where I’m like, ‘Even if you have unaligned AI, it’s probably not that bad.’&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That doesn’t do much of the work, but you know you add a bunch of shit like that together.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; That’s a lot of probability mass on a lot of different things. I do feel like my impression is that, on the first step of whether by default things are likely to be okay or things are likely to be good, people make arguments of the form, ‘You have a thing with a goal and it’s so hard to specify. By default, you should assume that the space of possible goals to specify is big, and the one right goal is hard to specify, hard to find.’ Obviously, this is modeling the thing as an agent, which is already an assumption.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Yeah. I mean it’s hard to run or have much confidence in arguments of that form. I think it’s possible to run tight versions of that argument that are suggestive. It’s hard to have much confidence in part because you’re like, look, the space of all programs is very broad, and the space that do your taxes is quite small, and we in fact are doing a lot of selecting from the vast space of programs to find one that does your taxes– so like, you’ve already done a lot of that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then you have to be getting into more detailed arguments about exactly how hard it is to select. I think there’s two kinds of arguments you can make that are different, or which I separate. One is the inner alignment, treacherous-turn argument, where like, we can’t tell the difference between AIs that are doing the right and wrong thing, even if you know what’s right, because blah blah blah. The other is: well, you don’t have this test for ‘was it right’, and so you can’t be selecting for ‘does the right thing’.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;This is a place where the concern is disjunctive, you have like two different things, they’re both sitting in your alignment problem. They can again interact badly. But like, I don’t know, I don’t think you’re going to get to high probabilities from this. I think I would kind of be at like, well I don’t know. Maybe I think it’s more likely than not that there’s a real problem but not like 90%, you know? Like maybe I’m like two to one that there exists a non-trivial problem or something like that. All of the numbers I’m going to give are very made up though. If you asked me a second time you’ll get all different numbers.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; That’s good to know.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Sometimes I anchor on past things I’ve said though, unfortunately.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Okay. Maybe I should give you some fake past Paul numbers.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; You could be like, ‘In that interview, you said that it was 85%’. I’d be like, ‘I think it’s really probably 82%’.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I guess a related question is, is there plausible concrete evidence that you think could be gotten that would update you in one direction or the other significantly?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Yeah. I mean certainly, evidence will roll in once we have more powerful AI systems.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;One can learn… I don’t know very much about any of the relevant institutions, I may know a little bit. So you can imagine easily learning a bunch about them by observing how well they solve analogous problems or learning about their structure, or just learning better about the views of people. That’s the second category.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;We’re going to learn a bunch of shit as we continue thinking about this problem on paper to see like, does it look like we’re going to solve it or not? That kind of thing. It seems like there’s lots of sorts of evidence on lots of fronts, my views are shifting all over the place. That said, the inconsistency between one day and the next is relatively large compared to the actual changes in views from one day to the next.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Could you say a little bit more about the evidence that comes in once more advanced AI starts arriving? Like what sort of things are you looking for that would change your mind on things?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Well you get to see things like, on inner alignment you get to see to what extent do you have the kind of crazy shit that people are concerned about? The first time you observe some crazy shit where your AI is like, ‘I’m going to be nice in order to assure that you think I’m nice so I can stab you in the back later.’ You’re like, ‘Well, I guess that really does happen despite modest effort to prevent it.’ That’s a thing you get. You get to learn in general about how models generalize, like to what extent they tend to do– this is sort of similar to what I just said, but maybe a little bit broader– to what extent are they doing crazy-ish stuff as they generalize?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;You get to learn about how reasonable simple oversight is, and to what extent ML systems acquire knowledge that simple overseers don’t have, which then gets exploited as they optimize in order to produce outcomes that are actually bad. I don’t have a really concise description, but sort of like, to the extent that all these arguments depend on some empirical claims about AI, you get to see those claims tested increasingly.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; So the impression I get from talking to other people who know you, and from reading some of your blog posts, but mostly from others, is that you’re somewhat more optimistic than most people that work in AI alignment. It seems like some people who work on AI alignment think something like, ‘We’ve got to solve some really big problems that we don’t understand at all or there are a bunch of unknown unknowns that we need to figure out.’ Maybe that’s because they have a broader conception of what solving AI alignment is like than you do?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; That seems like it’s likely to be part of it. It does seem like I’m more optimistic than people in general, than people who work in alignment in general. I don’t really know… I don’t understand others’ views that well and I don’t know if they’re that– like, my views aren’t that internally coherent. My suspicion is others’ views are even less internally coherent. Yeah, a lot of it is going to be done by having a narrower conception of the problem.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then a lot of it is going to be done by me just being… in terms of whether we need a lot of work to be done, a lot of it is going to be me being like, I don’t know man, maybe. I don’t really understand where people get the high probability from. I don’t see the arguments that are like, definitely there’s a lot of crazy stuff to go down. It seems like we really just don’t know. I do also think problems tend to be easier. I have more of that prior, especially for problems that make sense on paper. I think that if they’re possible, they tend to be kind of easy. There aren’t that many really hard theorems.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Can you say a little bit more about what you mean by that? That’s not a very good follow-up question; I don’t really know what it would take for me to understand what you mean by that better.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Like most of the time, if I’m like, ‘here’s an algorithms problem’, you can like– if you just generate some random algorithms problems, a lot of them are going to be impossible. Then amongst the ones that are possible, a lot of them are going to be soluble in a year of effort and amongst the rest, a lot of them are going to be soluble in 10 or a hundred years of effort. It’s just kind of rare that you find a problem that’s soluble– by soluble, I don’t just mean soluble by human civilization, I mean like, they are not provably impossible– that takes a huge amount of effort.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It normally… it’s less likely to happen the cleaner the problem is. There just aren’t many very clean algorithmic problems where our society worked on it for 10 years and then we’re like, ‘Oh geez, this still seems really hard.’ Examples are kind of like… factoring is an example of a problem we’ve worked a really long time on. It kind of has the shape, and this is the tendency on these sorts of problems, where there’s just a whole bunch of solutions and we hack away and we’re a bit better and a bit better and a bit better. It’s a very messy landscape, rather than jumping from having no solution to having a solution. It’s even rarer to have things where going from no solution to some solution is really possible but incredibly hard. There were some examples.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; And you think that the problems we face are sufficiently similar?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; I mean, I think this is going more into the like, ‘I don’t know man’, but what I mean when I say ‘I don’t know man’ isn’t, ‘Therefore, there’s an 80% chance that it’s going to be an incredibly difficult problem’, because that’s not what my prior is like. I’m like, reasonable chance it’s not that hard. Some chance it’s really hard. Probably more chance that– if it’s really hard, I think it’s more likely to be because all the clean statements of the problem are impossible. I think as statements get messier, it becomes more plausible that it just takes a lot of effort. The messier a thing is, the less likely it is to be impossible sometimes, but also the more likely it’s just a bunch of stuff you have to do.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; It seems like one disagreement that you have with MIRI folks is that you think prosaic AGI will be easier to align than they do. Does that perception seem right to you?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; I think so. I think they’re probably just like, ‘that seems probably impossible’. That’s related to the previous point.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; If you had found out that prosaic AGI is nearly impossible to align or is impossible to align, how much would that change your-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; It depends exactly what you found out, exactly how you found it out, et cetera. One thing you could be told is that there’s no perfectly scalable mechanism where you can throw in your arbitrarily sophisticated AI and turn the crank and get out an arbitrarily sophisticated aligned AI. That’s a possible outcome. That’s not necessarily that damning because now you’re like okay, fine, you can almost do it basically all the time and whatever.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s a big class of worlds and that would definitely be a thing I would be interested in understanding– how large is that gap actually, if the nice problem was totally impossible? If at the other extreme you just told me, ‘Actually, nothing like this is at all going to work, and it’s definitely going to kill everyone if you build an AI using anything like an extrapolation of existing techniques’, then I’m like, ‘Sounds pretty bad.’ I’m still not as pessimistic as MIRI people.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I’m like, maybe people just won’t destroy the world, you know, it’s hard to say. It’s hard to say what they’ll do. It also depends on the nature of how you came to know this thing. If you came to know it in a way that’s convincing to a reasonably broad group of people, that’s better than if you came to know it and your epistemic state was similar to– I think MIRI people feel more like, it’s already known to be hard, and therefore it’s telling if you can’t convince people it’s hard. Whereas I’m like, I’m not yet convinced it’s hard, so I’m not so surprised that you can’t convince people it’s hard.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then there’s more probability, if it was known to be hard, that we can convince people, and therefore I’m optimistic about outcomes conditioned on knowing it to be hard. I might become almost as pessimistic as MIRI if I thought that the problem was insolubly hard, just going to take forever or whatever, huge gaps aligning prosaic AI, and there would be no better evidence of that than currently exists. Like there’s no way to explain it better to people than MIRI currently can. If you take those two things, I’m maybe getting closer to MIRI’s levels of doom probability. I might still not be quite as doomy as them.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; Why does the ability to explain it matter so much?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Well, a big part of why you don’t expect people to build unaligned AI is they’re like, they don’t want to. The clearer it is and the stronger the case, the more people can potentially do something. In particular, you might get into a regime where you’re doing a bunch of shit by trial and error and trying to wing it. And if you have some really good argument that the winging it is not going to work, then that’s a very different state than if you’re like, ‘Well, winging it doesn’t seem that good. Maybe it’ll fail.’ It’s different to be like, ‘Oh no, here’s an argument. You just can’t… It’s just not going to work.’&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I don’t think we’ll really be in that state, but there’s like a whole spectrum from where we’re at now to that state and I expect to be further along it, if in fact we’re doomed. For example, if I personally would be like, ‘Well, I at least tried the thing that seemed obvious to me to try and now we know that doesn’t work.’ I sort of expect very directly from trying that to learn something about why that failed and what parts of the problem seem difficult.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; Do you have a sense of why MIRI thinks aligning prosaic AI is so hard?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; We haven’t gotten a huge amount of traction on this when we’ve debated it. I think part of their position, especially on the winging it thing, is they’re like– Man, doing things right generally seems a lot harder than just doing them. Probably building an AI in a way that’s good, for some arbitrary notion of good, will be a lot harder than just building an AI at all.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;There’s a theme that comes up frequently trying to hash this out, and it’s not so much about a theoretical argument, it’s just like, look, the theoretical argument establishes that there’s something a little bit hard here. And once you have something a little bit hard and now you have some giant organization, people doing the random shit they’re going to do, and all that chaos, and like, getting things to work takes all these steps, and getting this harder thing to work is going to have some extra steps, and everyone’s going to be doing it. They’re more pessimistic based on those kinds of arguments.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s the thing that comes up a lot. I think probably most of the disagreement is still in the, you know, theoretically, how much– certainly we disagree about like, can this problem just be solved on paper in advance? Where I’m like, reasonable chance, you know? At least a third chance, they’ll just on paper be like, ‘We have nailed it.’ There’s really no tension, no additional engineering effort required. And they’re like, that’s like zero. I don’t know what they think it is. More than zero, but low.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; Do you guys think you’re talking about the same problem exactly?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; I think there we are probably. At that step we are. Just like, is your AI trying to destroy everything? Yes. No. The main place there’s some bleed over–  the main thing that MIRI maybe considers in scope and I don’t is like, if you build an AI, it may someday have to build another AI. And what if the AI it builds wants to destroy everything? Is that our fault or is that the AI’s fault? And I’m more on like, that’s the AI’s fault. That’s not my job. MIRI’s maybe more like not distinguishing those super cleanly, but they would say that’s their job. The distinction is a little bit subtle in general, but-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; I guess I’m not sure why you cashed that out in terms of fault.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; I think for me it’s mostly like: there’s a problem we can hope to resolve. I think there’s two big things. One is like, suppose you don’t resolve that problem. How likely is it that someone else will solve it? Saying it’s someone else’s fault is in part just saying like, ‘Look, there’s this other person who had a reasonable opportunity to solve it and was a lot smarter than us.’ So the work we do is less likely to make the difference between it being soluble or not, because there’s this other smarter person.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then the other thing is like, what should you be aiming for? To the extent there’s a clean problem here which one could hope to solve, or one should bite off as a chunk, what fits in conceptually the same problem versus what’s like– you know, an analogy I sometimes make is, if you build an AI that’s doing important stuff, it might mess up in all sorts of ways. But when you’re asking, ‘Is my AI going to mess up when building a nuclear reactor?’, it’s a thing worth reasoning about as an AI person, but it’s also worth splitting into parts– part of that’s an AI problem, and part of that’s a problem about understanding and managing nuclear waste. Part of that should be done by people reasoning about nuclear waste and part of it should be done by people reasoning about AI.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;This is a little subtle because both of the problems have to do with AI. I would say my relationship with that is similar to like, suppose you told me that at some future point, some smart people might make an AI. There’s just a meta and an object level on which you could hope to help with the problem.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I’m hoping to help with the problem on the object level in the sense that we are going to do research which helps people align AI, and in particular, will help the future AI align the next AI. Because it’s like people. It’s at that level, rather than being like, ‘We’re going to construct a constitution of that AI such that when it builds future AI it will always definitely work’. This is related to like– there’s this old argument about recursive self-improvement. It’s historically figured a lot in people’s discussion of why the problem is hard, but on a naive perspective it’s not obvious why it should, because you do only a small number of large modifications before your systems are sufficiently intelligent relative to you that it seems like your work should be obsolete. Plus like, them having a bunch of detailed knowledge on the ground about what’s going down.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It seems unclear to me how– yeah, this is related to our disagreement– how much you’re happy just deferring to the future people and being like, ‘Hope that they’ll cope’. Maybe they won’t even cope by solving the problem in the same way; they might cope by the crazy AIs that we built reaching the kind of agreement that allows them to not build even crazier AIs, in the same way that we might do that. I think there’s some general frame of, I’m just taking responsibility for less, and more saying, can we leave the future people in a situation that is roughly as good as our situation? And by future people, I mean mostly AIs.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; Right. The two things that you think might explain your relative optimism are something like: maybe we can leave the problem to smarter agents that are humans, and maybe we can leave the problem to smarter agents that are not humans.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Also a lot of disagreement about the problem. Those are certainly two drivers. They’re not exhaustive in the sense that there’s also a huge amount of disagreement about like, ‘How hard is this problem?’ Which is some combination of like, ‘How much do we know about it?’ Where they’re more like, ‘Yeah, we’ve thought about it a bunch and have some views.’ And I’m like, ‘I don’t know, I don’t think I really know shit.’ Then part of it is concretely there’s a bunch of– on the object level, there’s a bunch of arguments about why it would be hard or easy so we don’t reach agreement. We consistently disagree on lots of those points.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; Do you think the goal state for you guys is the same though? If I gave you guys a bunch of AGIs, would you guys agree about which ones are aligned and which ones are not? If you could know all of their behaviors?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; I think at that level we’d probably agree. We don’t agree more broadly about what constitutes a win state or something. They have this more expansive conception– or I guess it’s narrower– that the win state is supposed to do more. They are imagining more that you’ve resolved this whole list of future challenges. I’m more not counting that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;We’ve had this… yeah, I guess I now mostly use intent alignment to refer to this problem where there’s risk of ambiguity… the problem that I used to call AI alignment. There was a long obnoxious back and forth about what the alignment problem should be called. MIRI does use aligned AI to be like, ‘an AI that produces good outcomes when you run it’, which I really object to a lot as a definition of aligned AI. So if they’re using that as their definition of aligned AI, we would probably disagree.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; Shifting terms or whatever… one thing that they’re trying to work on is making an AGI that has a property that is also the property you’re trying to make sure that AGI has.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Yeah, we’re all trying to build an AI that’s trying to do the right thing.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; I guess I’m thinking more specifically, for instance, I’ve heard people at MIRI say something like, they want to build an AGI that I can tell it, ‘Hey, figure out how to copy a strawberry, and don’t mess anything else up too badly.’ Does that seem like the same problem that you’re working on?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; I mean it seems like in particular, you should be able to do that. I think it’s not clear whether that captures all the complexity of the problem. That’s just sort of a question about what solutions end up looking like, whether that turns out to have the same difficulty. &amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;The other things you might think are involved that are difficult are… well, I guess one problem is just how you capture competitiveness. Competitiveness for me is a key desideratum. And it’s maybe easy to elide in that setting, because it just makes a strawberry. Whereas I am like, if you make a strawberry literally as well as anyone else can make a strawberry, it’s just a little weird to talk about. And it’s a little weird to even formalize what competitiveness means in that setting. I think you probably can, but whether or not you do that’s not the most natural or salient aspect of the situation. &amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;So I probably disagree with them about– I’m like, there are probably lots of ways to have agents that make strawberries and are very smart. That’s just another disagreement that’s another function of the same basic, ’How hard is the problem’ disagreement. I would guess relative to me, in part because of being more pessimistic about the problem, MIRI is more willing to settle for an AI that does one thing. And I care more about competitiveness.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Say you just learn that prosaic AI is just not going to be the way we get to AGI. How does that make you feel about the IDA approach versus the MIRI approach?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; So my overall stance when I think about alignment is, there’s a bunch of possible algorithms that you could use, and the game is understanding how to align those algorithms. And it’s kind of a different game. There are a lot of common subproblems between different algorithms you might want to align, but it’s potentially a different game for different algorithms. That’s an important part of the answer. I’m mostly focusing on the ‘align this particular algorithm’ game– I’ll call it learning, but it’s a little bit more specific than learning– where you search over policies to find a policy that works well in practice. If we’re not doing that, then maybe that solution is totally useless, or maybe it has common subproblems with the solution you actually need. That’s one part of the answer.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
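+ 
+ 
+ // As an editorial gloss on the setting Christiano describes (an illustration, not his code): the bare paradigm is to search over policies and keep whichever works best in practice, with no guarantee about why the winner behaves the way it does. //
+ 
+ &amp;lt;code python&amp;gt;
+ import random
+ 
+ def sample_policy(dim=8):
+     # A policy here is just a parameter vector.
+     return [random.gauss(0.0, 1.0) for _ in range(dim)]
+ 
+ def policy_search(evaluate, n_candidates=1000):
+     # The paradigm in question: generate many candidate policies and
+     # keep whichever scores best in practice, knowing nothing about why.
+     candidates = [sample_policy() for _ in range(n_candidates)]
+     return max(candidates, key=evaluate)
+ 
+ # Toy objective: prefer policies whose parameters are near zero.
+ best = policy_search(lambda p: -sum(x * x for x in p))
+ &amp;lt;/code&amp;gt;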
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Another big difference is going to be, timelines views will shift a lot if you’re handed that information. So it will depend exactly on the nature of the update. I don’t have a strong view about whether it makes my timelines shorter or longer overall. Maybe you should bracket that though.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;In terms of returning to the first one of trying to align particular algorithms, I don’t know. I think I probably share some of the MIRI persp– well, no. It feels to me like there’s a lot of common subproblems. Aligning expert systems seems like it would involve a lot of the same reasoning as aligning learners. To the extent that’s true, probably future stuff also will involve a lot of the same subproblems, but I doubt the algorithm will look the same. I also doubt the actual algorithm will look anything like a particular pseudocode we might write down for iterated amplification now.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Does iterated amplification in your mind rely on this thing that searches through policies for the best policy? The way I understand it, it doesn’t feel like it necessarily does.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; So, you use this distillation step. And the reason you want to do amplification, or this short-hop, expensive amplification, is because you interleave it with this distillation step. And I normally imagine the distillation step as being: learn a thing which works well in practice on a reward function defined by the overseer. You could imagine other things also fitting into this framework, but it’s not obvious whether you’d need this step if you didn’t somehow get granted something like the–&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; That you could do the distillation step somehow.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Yeah. It’s unclear what else would– so another example of a thing that could fit in, and this maybe makes it seem more general, is if you had an agent that was just incentivized to make lots of money. Then you could just have your distillation step be like, ‘I randomly check the work of this person, and compensate them based on the work I checked’. That’s a suggestion of how this framework could end up being more general.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
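+ 
+ 
+ // To make the interleaving concrete, here is a toy editorial sketch, not Christiano’s actual scheme: amplification builds a slow overseer out of a simulated human plus the current model, and distillation trains a fast model to match the overseer. Summing a list stands in for the task, and a lookup table stands in for gradient training. //
+ 
+ &amp;lt;code python&amp;gt;
+ def overseer_answer(question, model):
+     # Amplification: a (simulated) human answers a question by
+     # decomposing it and delegating the pieces to the current model.
+     if len(question) == 1:
+         return question[0]             # base case the human does directly
+     mid = len(question) // 2
+     left, right = question[:mid], question[mid:]
+     return model(left) + model(right)  # the human combines sub-answers
+ 
+ def distill(questions, model):
+     # Distillation: train a fast model to agree with the slow overseer.
+     # A lookup table stands in for actual gradient training here.
+     memo = {q: overseer_answer(q, model) for q in questions}
+     return lambda q: memo.get(q, 0)
+ 
+ # Training distribution: all contiguous sub-spans of a small tuple.
+ data = (3, 1, 4, 1, 5, 9, 2, 6)
+ questions = [data[i:j] for i in range(len(data))
+              for j in range(i + 1, len(data) + 1)]
+ 
+ model = lambda q: 0                    # the initial model knows nothing
+ for r in range(4):
+     model = distill(questions, model)  # interleave: amplify, then distill
+     print(r, model(data[:2]), model(data[:4]), model(data))
+ # Each round, the distilled model becomes correct on longer questions;
+ # by the last round model(data) equals sum(data) = 31.
+ &amp;lt;/code&amp;gt;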
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But I mostly do think about it in the context of learning in particular. I think it’s relatively likely to change if you’re not in that setting. Well, I don’t know. I don’t have a strong view. I’m mostly just working in that setting, mostly because it seems reasonably likely, and it seems reasonably likely to have a bunch in common with whatever does appear. That is, learning is likely to play a part in powerful AI even if other techniques also play a part.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Are there other people or resources that you think would be good for us to look at if we were looking at the optimism view?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Before we get to resources or people, I think one of the basic questions is, there’s this perspective which is fairly common in ML, which is like, ‘We’re kind of just going to do a bunch of stuff, and it’ll probably work out’. That’s probably the basic thing to be getting at. How right is that?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;This is the bad view of safety conditioned on– I feel like prosaic AI is in some sense the worst case, about as bad as things could have gotten in terms of alignment. Where, I don’t know, you try a bunch of shit, just a ton of stuff, and a ton of trial and error seems pretty bad. Anyway, this is a random aside, maybe more related to the previous point. But yeah, this is just with alignment. There’s this view in ML that’s relatively common that’s like: we’ll try a bunch of stuff to get the AI to do what we want, and it’ll probably work out. Some problems will come up. We’ll probably solve them. I think that’s probably the most important thing on the optimism vs. pessimism side.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And I don’t know, I mean this has been a project that like, it’s a hard project. I think the current state of affairs is like, the MIRI folk have strong intuitions about things being hard. Essentially no one in… very few people in ML agree with those, or even understand where they’re coming from. And even people in the EA community who have tried a bunch to understand where they’re coming from mostly don’t. Mostly people either end up understanding one side or the other and don’t really feel like they’re able to connect everything. So it’s an intimidating project in that sense. I think the MIRI people are the main proponents of the everything is doomed, the people to talk to on that side. And then in some sense there’s a lot of people on the other side who you can talk to, and the question is just, who can articulate the view most clearly? Or who has most engaged with the MIRI view such that they can speak to it?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; Those are people I would be particularly interested in. If there are people that understand all the MIRI arguments but still have broadly the perspective you’re describing, like some problems will come up, probably we’ll fix them.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; I don’t know good– I don’t have good examples of people for you. I think most people just find the MIRI view kind of incomprehensible, or like, it’s a really complicated thing, even if the MIRI view makes sense on its face. I don’t think people have gotten enough into the weeds. It really rests a lot right now on this fairly complicated cluster of intuitions. I guess on the object level, I think I’ve just engaged a lot more with the MIRI view than most people who mostly take the ‘everything will be okay’ perspective. So I’m happy to talk on the object level and speak more to the arguments. I think it’s a hard thing to get into, but it’s going to be even harder to find other people in ML who have engaged with the view that much.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;They might be able to make other general criticisms of like, here’s why I haven’t really… like it doesn’t seem like a promising kind of view to think about. I think you could find more people who have engaged at that level. I don’t know who I would recommend exactly, but I could think about it. Probably a big question will be who is excited to talk to you about it.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I am curious about your response to MIRI’s object level arguments. Is there a place that exists somewhere?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; There’s some back and forth on the internet. I don’t know if it’s great. There’s some LessWrong posts. Eliezer for example wrote &amp;lt;a href=&amp;quot;https://www.lesswrong.com/posts/S7csET9CgBtpi7sCh/challenges-to-christiano-s-capability-amplification-proposal&amp;quot;&amp;gt;this post&amp;lt;/a&amp;gt; about why things were doomed, why I in particular was doomed. I don’t know if you read that post.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I can also ask you about it now, I just don’t want to take too much of your time if it’s a huge body of things.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; The basic argument would be like, 1) On paper I don’t think we yet have a good reason to feel doomy. And I think there’s some basic research intuition about how much a problem– suppose you poke at a problem a few times, and you’re like ‘Agh, seems hard to make progress’. How much do you infer that the problem’s really hard? And I’m like, not much. As a person who’s poked at a bunch of problems, let me tell you, that often doesn’t work and then you solve it in like 10 years of effort.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;So that’s one thing. That’s a point where I have relatively little sympathy for the MIRI way. That’s one set of arguments: is there a good way to get traction on this problem? Are there clever algorithms? I’m like, I don’t know, I don’t feel like the kind of evidence we’ve seen is the kind of evidence that should be persuasive. As some evidence in that direction, I’d be like, I have not been thinking about this that long. I feel like there have often been things that felt like, or that MIRI would have defended as like, here’s a hard obstruction. Then you think about it and you’re actually like, ‘Here are some things you can do.’ And it may still be an obstruction, but it’s no longer quite so obvious where it is, and there were avenues of attack.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s one thing. The second thing is like, a metaphor that makes me feel good– MIRI talks a lot about the evolution analogy. If I imagine the evolution problem– so if I’m a person, and I’m breeding some animals, I’m breeding some superintelligence. Suppose I wanted to breed an animal modestly smarter than humans that is really docile and friendly. I’m like, I don’t know man, that seems like it might work. That’s where I’m at. I think they are… it’s been a little bit hard to track down this disagreement, and I think this is maybe in a fresher, rawer state than the other stuff, where we haven’t had enough back and forth.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But I’m like, it doesn’t sound necessarily that hard. I just don’t know. I think their position, their position when they’ve written something has been a little bit more like, ‘But you couldn’t breed a thing, that after undergoing radical changes in intelligence or situation would remain friendly’. But then I’m normally like, but it’s not clear why that’s needed? I would really just like to create something slightly superhuman, and it’s going to work with me to breed something that’s slightly smarter still that is friendly.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;We haven’t really been able to get traction on that. I think they have an intuition that maybe there’s some kind of invariance and things become gradually more unraveled as you go on. Whereas I have more intuition that it’s plausible. After this generation, there’s just smarter and smarter people thinking about how to keep everything on the rails. It’s very hard to know.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s the second thing. I have found that really… that feels like it gets to the heart of some intuitions that are very different, and I don’t understand what’s up there. There’s a third category which is like, on the object level, there’s a lot of directions that I’m enthusiastic about where they’re like, ‘That seems obviously doomed’. So you could divide those up into the two problems. There’s the family of problems that are more like the inner alignment problem, and then outer alignment stuff.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;On the inner alignment stuff, I haven’t thought that much about it, but examples of things that I’m optimistic about that they’re super pessimistic about are like, stuff that looks more like verification. Or maybe stepping back even from that, there’s this basic paradigm of adversarial training, where I’m like, it seems close to working. And you could imagine it being like, it’s just a research problem to fill in the gaps. Whereas they’re like, that’s so not the kind of thing that would work. I don’t really know where we’re at with that. I do see there are formal obstructions to adversarial training in particular working. I’m like, I see why this is not yet a solution. For example, you can have this case where there’s a predicate that the model checks, and it’s easy to check but hard to construct examples satisfying it. And then in your adversarial training you can’t ever feed it an example where it’ll fail. So we get into like, is it plausible that you can handle that problem with either 1) Doing something more like verification, where you ask the model to perform well not just on real inputs but on pseudo-inputs. Or like, you ask the attacker just to show how it’s conceivable that the model could do a bad thing in some sense.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
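+ 
+ 
+ // As an editorial illustration of the predicate case (the hash condition and the ‘defect’/‘cooperate’ labels are invented for the example): the model’s bad behavior is gated on a condition that is easy to check but hard to instantiate, so an attacker searching for failing inputs essentially never finds one. //
+ 
+ &amp;lt;code python&amp;gt;
+ import hashlib
+ import random
+ 
+ def hidden_predicate(x):
+     # Easy to check, hard to construct: true only when the hash of the
+     # input starts with four zero bytes (roughly 1 in 4 billion inputs).
+     return hashlib.sha256(x).digest()[:4] == bytes(4)
+ 
+ def model(x):
+     # Toy policy with a blind spot: behaves well everywhere except on
+     # the astronomically rare inputs satisfying the predicate.
+     return 'defect' if hidden_predicate(x) else 'cooperate'
+ 
+ # Adversarial training by search: sample inputs and keep any that
+ # trigger bad behavior. Constructing a satisfying input amounts to
+ # inverting a hash, so the attacker comes up empty and the training
+ # signal says the model is fine.
+ samples = (random.randbytes(32) for _ in range(100_000))
+ failures = [x for x in samples if model(x) == 'defect']
+ print('failures found by search:', len(failures))  # almost surely 0
+ &amp;lt;/code&amp;gt;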
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s one possible approach, where the other would be something more like interpretability, where you say like, ‘Here’s what the model is doing. In addition to its behavior we get this other signal: that the output was depending on this fact, whether its predicate passes, which it shouldn’t have been dependent on.’ The question is, can either of those yield good behavior? I’m like, I don’t know, man. It seems plausible. And they’re like ‘Definitely not.’ And I’m like, ‘Why definitely not?’ And they’re like ‘Well, that’s not getting at the real essence of the problem.’ And I’m like ‘Okay, great, but how did you substantiate this notion of the real essence of the problem? Where is that coming from? Is that coming from a whole bunch of other solutions that looked plausible and failed?’ And their take is kind of like, yes, and I’m like, ‘But none of those– there weren’t actually even any candidate solutions there that failed yet. You’ve got maybe one thing, or like, you showed there exists a problem in some minimal sense.’ This comes back to the first of the three things I listed. But it’s a little bit different in that I think you can just stare at particular things and they’ll be like, ‘Here’s how that particular thing is going to fail.’ And I’m like ‘I don’t know, it seems plausible.’&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s on inner alignment. And there’s maybe some on outer alignment. I feel like they’ve given a lot of ground in the last four years on how doomy things seem on outer alignment. I think they still have some– if we’re talking about amplification, I think the position would still be, ‘Man, why would that agent be aligned? It doesn’t at all seem like it would be aligned.’ That has also been a little bit surprisingly tricky to make progress on. I think it’s similar, where I’m like, yeah, I grant the existence of some problem or some thing which needs to be established, but I don’t grant– I think their position would be like, this hasn’t made progress, or has just pushed around the core difficulty. I’m like, I don’t grant the conception of the core difficulty on which this has just pushed around the core difficulty. I think the disagreement is substantially in that kind of thing: being like, here’s an approach that seems plausible, we don’t have a clear obstruction, but I think that it is doomed for these deep reasons. I have maybe a higher bar for what kind of support the deep reasons need.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I also just think on the merits, they have not really engaged with– and this is partly my responsibility for not having articulated the arguments in a clear enough way– although I think they have not engaged with even the clearest articulation as of two years ago of what the hope was. But that’s probably on me for not having an even clearer articulation than that, and it’s also definitely not up to them to engage with everything. To the extent it’s a moving target, it’s not up to them to engage with the most recent version. Though with ‘most recent version’– the proposal doesn’t really change that much; it’s more that the case for optimism has changed a little bit. It’s mostly the state of argument concerning it that moves, rather than the version of the scheme.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
  

&lt;/pre&gt;</content>
        <summary>&lt;pre&gt;
@@ -1 +1,630 @@
+ ====== Conversation with Paul Christiano ======
+ 
+ // Published 11 September, 2019; last updated 12 September, 2019 //
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;AI Impacts talked to AI safety researcher Paul Christiano about his views on AI risk. With his permission, we have transcribed this interview.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ 
+ ===== Participants =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;a href=&amp;quot;https://paulfchristiano.com/&amp;quot;&amp;gt;Paul Christiano&amp;lt;/a&amp;gt; — OpenAI safety team
+                 &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Asya Bergal – AI Impacts&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Ronny Fernandez – AI Impacts&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;a href=&amp;quot;http://robertlong.online/&amp;quot;&amp;gt;Robert Long&amp;lt;/a&amp;gt; – AI Impacts
+                 &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ===== Summary =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;We spoke with Paul Christiano on August 13, 2019. Here is a brief summary of that conversation:&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;AI safety is worth working on because AI poses a large risk, and AI safety is neglected and tractable.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Christiano is more optimistic about the likely social consequences of advanced AI than some others in AI safety, in particular researchers at the Machine Intelligence Research Institute (MIRI), for the following reasons:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;The prior on any given problem reducing the expected value of the future by 10% should be low.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;There are several ‘saving throws’–ways in which, even if one thing turns out badly, something else can turn out well, such that AI is not catastrophic.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Many algorithmic problems are either solvable within 100 years, or provably impossible; this inclines Christiano to think that AI safety problems are reasonably likely to be easy.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;MIRI thinks success is guaranteeing that unaligned intelligences are never created, whereas Christiano just wants to leave the next generation of intelligences in at least as good a place as humans were when building them.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;‘Prosaic AI’ that looks like current AI systems will be less hard to align than MIRI thinks: 
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Christiano thinks there’s at least a one-in-three chance that we’ll be able to solve AI safety on paper in advance.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;A common view within ML is that we’ll successfully solve problems as they come up.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Christiano has relatively less confidence in several inside view arguments for high levels of risk:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Building safe AI requires hitting a small target in the space of programs, but building any AI also requires hitting a small target.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Because Christiano thinks that the state of evidence is less clear-cut than MIRI does, Christiano also has a higher probability that people will become more worried in the future. &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Just because we haven’t solved many problems in AI safety yet doesn’t mean they’re intractably hard– many technical problems feel this way and then get solved in 10 years of effort.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Evolution is often used as an analogy to argue that general intelligence (humans with their own goals) becomes dangerously unaligned with the goals of the outer optimizer (evolution selecting for reproductive fitness). But this analogy doesn’t make Christiano feel so pessimistic, e.g. he thinks that if we tried, we could breed animals that are somewhat smarter than humans and are also friendly and docile.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Christiano is optimistic about verification, interpretability, and adversarial training for inner alignment, whereas MIRI is pessimistic.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;MIRI thinks the outer alignment approaches Christiano proposes are just obscuring the core difficulties of alignment, while Christiano is not yet convinced there is a deep core difficulty.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Christiano thinks there are several things that could change his mind and optimism levels, including:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Learning about institutions and observing how they solve problems analogous to AI safety.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Seeing whether AIs become deceptive and how they respond to simple oversight.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Seeing how much progress we make on AI alignment over the coming years.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Christiano is relatively optimistic about his iterated amplification approach:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Christiano cares more about making aligned AIs that are competitive with unaligned AIs, whereas MIRI is more willing to settle for an AI with very narrow capabilities.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Iterated amplification is largely based on learning-based AI systems, though it may work in other cases.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Even if iterated amplification isn’t the answer to AI safety, it’s likely to have subproblems in common with problems that are important in the future.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;There are still many disagreements between Christiano and the Machine Intelligence Research Institute (MIRI) that are messy and haven’t been made precise.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;This transcript has been lightly edited for concision and clarity.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ===== Transcript =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Okay. We are recording. I’m going to ask you a bunch of questions related to something like AI optimism.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I guess the proposition that we’re looking at is something like ‘is it valuable for people to be spending significant effort doing work that purports to reduce the risk from advanced artificial intelligence’? The first question would be to give a short-ish version of the reasoning around that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Around why it’s overall valuable?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah. Or the extent to which you think it’s valuable.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; I don’t know, this seems complicated. I’m acting from some longtermist perspective; I’m like, what can make the world irreversibly worse? There aren’t that many things: we go extinct. It’s hard to go extinct, doesn’t seem that likely.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; We keep forgetting to say this, but we are focusing less on ethical considerations that might affect that. We’ll grant…yeah, with all that in the background….&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Granting long-termism, but then it seems like it depends a lot on what’s the probability? What fraction of our expected future do we lose by virtue of messing up alignment * what’s the elasticity of that to effort / how much effort?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; That’s the stuff we’re curious to see what people think about.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal&amp;lt;/strong&amp;gt;: I also just read your 80K interview, which I think probably covered like a lot of the reasoning about this.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; They probably did. I don’t remember exactly what’s in there, but it was a lot of words.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I don’t know. I’m like, it’s a lot of doom probability. Like maybe I think AI alignment per se is like 10% doominess. That’s a lot. Then it seems like if we understood everything in advance really well, or just had a bunch of people working now on understanding what’s up, that could easily reduce that by a big chunk.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; Sorry, what do you mean by 10% doominess?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; I don’t know, the future is 10% worse than it would otherwise be in expectation by virtue of our failure to align AI. I made up 10%, it’s kind of a random number. I don’t know, it’s less than 50%. It’s more than 10% conditioned on AI coming soon, I think.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; And that’s change in expected value.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Yeah. Anyway, so 10% is a lot. Then I’m like, maybe if we sorted all our shit out and had a bunch of people who knew what was up, and had a good theoretical picture of what was up, and had more info available about whether it was a real problem. Maybe really nailing all that could cut that risk from 10% to 5% and maybe like, you know, there aren’t that many people who work on it, it seems like a marginal person can easily do a thousandth of that 5% change. Now you’re looking at one in 20,000 or something, which is a good deal.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
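+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;[Spelling out that arithmetic, using Christiano’s made-up numbers from above:]&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ &amp;lt;code latex&amp;gt;
+ % Risk falls from 10% to 5% if the problem were fully sorted out,
+ % and a marginal person contributes a thousandth of that change:
+ \underbrace{0.05}_{\text{risk reduction}} \times
+ \underbrace{\tfrac{1}{1000}}_{\text{marginal share}}
+ = 5 \times 10^{-5} = \tfrac{1}{20{,}000}
+ &amp;lt;/code&amp;gt;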
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I think my impression is that that 10% is lower than the number some large set of people would give. I don’t know if other people agree with that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Certainly, 10% is lower than lots of people who care about AI risk. I mean, it’s worth saying that I have this slightly narrow conception of what the alignment problem is. I’m not including all AI risk in the 10%. I’m not including, in some sense, most of the things people normally worry about– just the like ‘we tried to build an AI that was doing what we want but then it wasn’t even trying to do what we want’. Even after that caveat, I think it’s lower than pessimistic people’s number. It’s going to be lower than all the MIRI folks, it’s going to be higher than almost everyone in the world at large, especially after specializing in this problem, which is a problem almost no one cares about– which is precisely how a thousand full-time people for 20 years can reduce the whole risk by half or something.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I’m curious for your statement as to why you think your number is slightly lower than other people.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Yeah, I don’t know if I have a particularly crisp answer. Seems like it’s a more reactive thing of like, what are the arguments that it’s very doomy? A priori you might’ve been like, well, if you’re going to build some AI, you’re probably going to build the AI so it’s trying to do what you want it to do. Probably that’s that. Plus, most things can’t destroy the expected value of the future by 10%. You just can’t have that many things, otherwise there’s not going to be any value left in the end. In particular, if you had 100 such things, then you’d be down to like 1/1000th of your values. 1/10 hundred thousandth? I don’t know, I’m not good at arithmetic.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
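+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;[For reference, the arithmetic being waved at: if each of 100 independent risks destroyed 10% of expected value, you would keep]&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ &amp;lt;code latex&amp;gt;
+ % Our calculation, not Christiano's:
+ 0.9^{100} = e^{100 \ln 0.9} \approx e^{-10.5}
+ \approx 2.7 \times 10^{-5} \approx \tfrac{1}{37{,}000}
+ % i.e. about 1/37,000 of the original value.
+ &amp;lt;/code&amp;gt;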
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Anyway, that’s a priori: there just aren’t that many things that are that bad, and it seems like people would try and make AI that’s trying to do what they want. Then you’re like, okay, we get to be pessimistic because of some other argument about like, well, we don’t currently know how to build an AI which will do what we want. We’re like, there’s some extrapolation of current techniques on which we’re concerned that we wouldn’t be able to. Or maybe some more conceptual or intuitive argument about why AI is a scary kind of thing, and AIs tend to want to do random shit.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then like, I don’t know, now we get into, how strong is that argument for doominess? Then a major thing that drives it is I am like, reasonable chance there is no problem in fact. Reasonable chance, if there is a problem, we can cope with it just by trying. Reasonable chance, even if it will be hard to cope with, we can sort shit out well enough on paper that we really nail it and understand how to resolve it. Reasonable chance, if we don’t solve it, people will just not build AIs that destroy everything they value.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It’s lots of saving throws, you know? And you multiply the saving throws together and things look better. And they interact better than that because– well, in one way worse because it’s correlated: If you’re incompetent, you’re more likely to fail to solve the problem and more likely to fail to coordinate not to destroy the world. In some other sense, it’s better than interacting multiplicatively because weakness in one area compensates for strength in the other. I think there are a bunch of saving throws that could independently make things good, but then in reality you have to have a little bit here and a little bit here and a little bit here, if that makes sense. We have some reasonable understanding on paper that makes the problem easier. The problem wasn’t that bad. We wing it reasonably well and we do a bunch of work and in fact people are just like, ‘Okay, we’re not going to destroy the world given the choice.’ I guess I have this somewhat distinctive last saving throw where I’m like, ‘Even if you have unaligned AI, it’s probably not that bad.’&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
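+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;[A minimal numerical sketch of this ‘saving throws’ picture– the probabilities below are illustrative placeholders, not Christiano’s numbers:]&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ &amp;lt;code python&amp;gt;
+ # A toy model of the "saving throws" picture described above.
+ # The probabilities are made-up placeholders, not Christiano's numbers.
+ throws = {
+     "no real problem in fact": 0.3,
+     "cope with it just by trying": 0.3,
+     "nail it on paper in advance": 0.3,
+     "people decline to build world-destroying AI": 0.3,
+     "unaligned AI is not that bad anyway": 0.1,
+ }
+ 
+ p_doom = 1.0
+ for p_saved in throws.values():
+     p_doom *= 1.0 - p_saved  # doom requires every throw to fail
+ 
+ print(f"P(doom) under independence: {p_doom:.3f}")  # 0.7**4 * 0.9 = 0.216
+ # Christiano notes the throws are correlated (incompetence hurts all of
+ # them at once), so treating them as independent understates the risk;
+ # but weakness in one area can also be offset by strength in another.
+ &amp;lt;/code&amp;gt;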
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That doesn’t do much of the work, but you know you add a bunch of shit like that together.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; That’s a lot of probability mass on a lot of different things. I do feel like my impression is that, on the first step of whether by default things are likely to be okay or things are likely to be good, people make arguments of the form, ‘You have a thing with a goal and it’s so hard to specify. By default, you should assume that the space of possible goals to specify is big, and the one right goal is hard to specify, hard to find.’ Obviously, this is modeling the thing as an agent, which is already an assumption.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Yeah. I mean it’s hard to run or have much confidence in arguments of that form. I think it’s possible to run tight versions of that argument that are suggestive. It’s hard to have much confidence in part because you’re like, look, the space of all programs is very broad, and the space of programs that do your taxes is quite small, and we in fact are doing a lot of selecting from the vast space of programs to find one that does your taxes– so like, you’ve already done a lot of that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then you have to be getting into more detailed arguments about exactly how hard is it to select. I think there’s two kinds of arguments you can make that are different, or which I separate. One is the inner alignment treacherous turn argument, where like, we can’t tell the difference between AIs that are doing the right or wrong thing, even if you know what’s right, because blah blah blah. The other is: well, you don’t have this test for ‘was it right’, and so you can’t be selecting for ‘does the right thing’.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;This is a place where the concern is disjunctive, you have like two different things, they’re both sitting in your alignment problem. They can again interact badly. But like, I don’t know, I don’t think you’re going to get to high probabilities from this. I think I would kind of be at like, well I don’t know. Maybe I think it’s more likely than not that there’s a real problem but not like 90%, you know? Like maybe I’m like two to one that there exists a non-trivial problem or something like that. All of the numbers I’m going to give are very made up though. If you asked me a second time you’ll get all different numbers.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; That’s good to know.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Sometimes I anchor on past things I’ve said though, unfortunately.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Okay. Maybe I should give you some fake past Paul numbers.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; You could be like, ‘In that interview, you said that it was 85%’. I’d be like, ‘I think it’s really probably 82%’.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I guess a related question is, is there plausible concrete evidence that you think could be gotten that would update you in one direction or the other significantly?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Yeah. I mean certainly, evidence will roll in once we have more powerful AI systems.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;One can learn… I don’t know very much about any of the relevant institutions, I may know a little bit. So you can imagine easily learning a bunch about them by observing how well they solve analogous problems or learning about their structure, or just learning better about the views of people. That’s the second category.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;We’re going to learn a bunch of shit as we continue thinking about this problem on paper to see like, does it look like we’re going to solve it or not? That kind of thing. It seems like there’s lots of sorts of evidence on lots of fronts, my views are shifting all over the place. That said, the inconsistency between one day and the next is relatively large compared to the actual changes in views from one day to the next.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Could you say a little bit more about evidence from once more advanced AI starts coming in? Like what sort of things you’re looking for that would change your mind on things?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Well you get to see things like, on inner alignment you get to see to what extent do you have the kind of crazy shit that people are concerned about? The first time you observe some crazy shit where your AI is like, ‘I’m going to be nice in order to assure that you think I’m nice so I can stab you in the back later.’ You’re like, ‘Well, I guess that really does happen despite modest effort to prevent it.’ That’s a thing you get. You get to learn in general about how models generalize, like to what extent they tend to do– this is sort of similar to what I just said, but maybe a little bit broader– to what extent are they doing crazy-ish stuff as they generalize?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;You get to learn about how reasonable simple oversight is, and to what extent ML systems acquire knowledge that simple overseers don’t have, which then gets exploited as they optimize in order to produce outcomes that are actually bad. I don’t have a really concise description, but sort of like, to the extent that all these arguments depend on some empirical claims about AI, you get to see those claims tested increasingly.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; So the impression I get from talking to other people who know you, and from reading some of your blog posts, but mostly from others, is that you’re somewhat more optimistic than most people that work in AI alignment. It seems like some people who work on AI alignment think something like, ‘We’ve got to solve some really big problems that we don’t understand at all or there are a bunch of unknown unknowns that we need to figure out.’ Maybe that’s because they have a broader conception of what solving AI alignment is like than you do?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; That seems like it’s likely to be part of it. It does seem like I’m more optimistic than people in general, than people who work in alignment in general. I don’t really know… I don’t understand others’ views that well and I don’t know if they’re that– like, my views aren’t that internally coherent. My suspicion is others’ views are even less internally coherent. Yeah, a lot of it is going to be done by having a narrower conception of the problem.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then a lot of it is going to be done by me just being… in terms of do we need a lot of work to be done, a lot of it is going to be me being like, I don’t know man, maybe. I don’t really understand where people get the high probability from. I don’t see the arguments that are like, definitely there’s a lot of crazy stuff to go down. It seems like we really just don’t know. I do also think problems tend to be easier. I have more of that prior, especially for problems that make sense on paper. If they’re possible, they tend to be kind of easy. There aren’t that many really hard theorems.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Can you say a little bit more of what you mean by that? That’s not a very good follow-up question, I don’t really know what it would take for me to understand what you mean by that better. &amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Like most of the time, if I’m like, ‘here’s an algorithms problem’, you can like– if you just generate some random algorithms problems, a lot of them are going to be impossible. Then amongst the ones that are possible, a lot of them are going to be soluble in a year of effort and amongst the rest, a lot of them are going to be soluble in 10 or a hundred years of effort. It’s just kind of rare that you find a problem that’s soluble– by soluble, I don’t just mean soluble by human civilization, I mean like, they are not provably impossible– that takes a huge amount of effort.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It normally… it’s less likely to happen the cleaner the problem is. There just aren’t many very clean algorithmic problems where our society worked on it for 10 years and then we’re like, ‘Oh geez, this still seems really hard.’ Examples are kind of like… factoring is an example of a problem we’ve worked a really long time on. It kind of has the shape, and this is the tendency on these sorts of problems, where there’s just a whole bunch of solutions and we hack away and we’re a bit better and a bit better and a bit better. It’s a very messy landscape, rather than jumping from having no solution to having a solution. It’s even rarer to have things where going from no solution to some solution is really possible but incredibly hard. There were some examples.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; And you think that the problems we face are sufficiently similar?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; I mean, I think this is going more into the like, ‘I don’t know man’– but what I mean when I say ‘I don’t know man’ isn’t, ‘Therefore, there’s an 80% chance that it’s going to be an incredibly difficult problem’, because that’s not what my prior is like. I’m like, reasonable chance it’s not that hard. Some chance it’s really hard. Probably more chance that– if it’s really hard, I think it’s more likely to be because all the clean statements of the problem are impossible. I think as statements get messier it becomes more plausible that it just takes a lot of effort. The more messy a thing is, the less likely it is to be impossible sometimes, but also the more likely it’s just a bunch of stuff you have to do.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; It seems like one disagreement that you have with MIRI folks is that you think prosaic AGI will be easier to align than they do. Does that perception seem right to you?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; I think so. I think they’re probably just like, ‘that seems probably impossible’. That’s related to the previous point.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; If you had found out that prosaic AGI is nearly impossible to align or is impossible to align, how much would that change your-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; It depends exactly what you found out, exactly how you found it out, et cetera. One thing you could be told is that there’s no perfectly scalable mechanism where you can throw in your arbitrarily sophisticated AI and turn the crank and get out an arbitrarily sophisticated aligned AI. That’s a possible outcome. That’s not necessarily that damning because now you’re like okay, fine, you can almost do it basically all the time and whatever.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s a big class of worlds and that would definitely be a thing I would be interested in understanding– how large is that gap actually, if the nice problem was totally impossible? If at the other extreme you just told me, ‘Actually, nothing like this is at all going to work, and it’s definitely going to kill everyone if you build an AI using anything like an extrapolation of existing techniques’, then I’m like, ‘Sounds pretty bad.’ I’m still not as pessimistic as MIRI people.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I’m like, maybe people just won’t destroy the world, you know, it’s hard to say. It’s hard to say what they’ll do. It also depends on the nature of how you came to know this thing. If you came to know it in a way that’s convincing to a reasonably broad group of people, that’s better than if you came to know it and your epistemic state was similar to– I think MIRI people feel more like, it’s already known to be hard, and therefore it tells you something if you can’t convince people it’s hard. Whereas I’m like, I’m not yet convinced it’s hard, so I’m not so surprised that you can’t convince people it’s hard.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then there’s more probability, if it was known to be hard, that we can convince people, and therefore I’m optimistic about outcomes conditioned on knowing it to be hard. I might become almost as pessimistic as MIRI if I thought that the problem was insolubly hard– just going to take forever or whatever, huge gaps in aligning prosaic AI– and there would be no better evidence of that than currently exists. Like there’s no way to explain it better to people than MIRI currently can. If you take those two things, I’m maybe getting closer to MIRI’s levels of doom probability. I might still not be quite as doomy as them.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; Why does the ability to explain it matter so much?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Well, a big part of why you don’t expect people to build unaligned AI is they’re like, they don’t want to. The clearer it is and the stronger the case, the more people can potentially do something. In particular, you might get into a regime where you’re doing a bunch of shit by trial and error and trying to wing it. And if you have some really good argument that the winging it is not going to work, then that’s a very different state than if you’re like, ‘Well, winging it doesn’t seem that good. Maybe it’ll fail.’ It’s different to be like, ‘Oh no, here’s an argument. You just can’t… It’s just not going to work.’&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I don’t think we’ll really be in that state, but there’s like a whole spectrum from where we’re at now to that state, and I expect to be further along it if in fact we’re doomed. For example, I personally would be like, ‘Well, I at least tried the thing that seemed obvious to me to try, and now we know that doesn’t work.’ I sort of expect very directly from trying that to learn something about why that failed and what parts of the problem seem difficult.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; Do you have a sense of why MIRI thinks aligning prosaic AI is so hard?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; We haven’t gotten a huge amount of traction on this when we’ve debated it. I think part of their position, especially on the winging it thing, is they’re like– Man, doing things right generally seems a lot harder than just doing them. Building an AI that’s good– for some arbitrary notion of good– will probably be a lot harder than just building an AI at all.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;There’s a theme that comes up frequently trying to hash this out, and it’s not so much about a theoretical argument, it’s just like, look, the theoretical argument establishes that there’s something a little bit hard here. And once you have something a little bit hard and now you have some giant organization, people doing the random shit they’re going to do, and all that chaos, and like, getting things to work takes all these steps, and getting this harder thing to work is going to have some extra steps, and everyone’s going to be doing it. They’re more pessimistic based on those kinds of arguments.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s the thing that comes up a lot. I think probably most of the disagreement is still in the, you know, theoretically, how much– certainly we disagree about like, can this problem just be solved on paper in advance? Where I’m like, reasonable chance, you know? At least a third chance, they’ll just on paper be like, ‘We have nailed it.’ There’s really no tension, no additional engineering effort required. And they’re like, that’s like zero. I don’t know what they think it is. More than zero, but low.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; Do you guys think you’re talking about the same problem exactly?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; I think there we are probably. At that step we are. Just like, is your AI trying to destroy everything? Yes. No. The main place there’s some bleed over–  the main thing that MIRI maybe considers in scope and I don’t is like, if you build an AI, it may someday have to build another AI. And what if the AI it builds wants to destroy everything? Is that our fault or is that the AI’s fault? And I’m more on like, that’s the AI’s fault. That’s not my job. MIRI’s maybe more like not distinguishing those super cleanly, but they would say that’s their job. The distinction is a little bit subtle in general, but-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; I guess I’m not sure why you cashed out in terms of fault.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; I think for me it’s mostly like: there’s a problem we can hope to resolve. I think there’s two big things. One is like, suppose you don’t resolve that problem. How likely is it that someone else will solve it? Saying it’s someone else’s fault is in part just saying like, ‘Look, there’s this other person who had a reasonable opportunity to solve it, and who is a lot smarter than us.’ So the work we do is less likely to make the difference between it being soluble or not. Because there’s this other smarter person.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then the other thing is like, what should you be aiming for? To the extent there’s a clean problem here which one could hope to solve, or one should bite off as a chunk, what fits in conceptually the same problem versus what’s like– you know, an analogy I sometimes make is, if you build an AI that’s doing important stuff, it might mess up in all sorts of ways. But when you’re asking, ‘Is my AI going to mess up when building a nuclear reactor?’ It’s a thing worth reasoning about as an AI person, but also it’s worth splitting into like– part of that’s an AI problem, and part of that’s a problem about understanding and managing nuclear waste. Part of it should be done by people reasoning about nuclear waste and part of it should be done by people reasoning about AI.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;This is a little subtle because both of the problems have to do with AI. I would say my relationship with that is similar to like, suppose you told me that at some future point, some smart people might make an AI. There’s just a meta and an object level on which you could hope to help with the problem.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I’m hoping to help with the problem on the object level in the sense that we are going to do research which helps people align AI, and in particular, will help the future AI align the next AI. Because it’s like people. It’s at that level, rather than being like, ‘We’re going to construct a constitution of that AI such that when it builds future AI it will always definitely work’. This is related to like– there’s this old argument about recursive self-improvement. It’s historically figured a lot in people’s discussion of why the problem is hard, but on a naive perspective it’s not obvious why it should, because you do only a small number of large modifications before your systems are sufficiently intelligent relative to you that it seems like your work should be obsolete. Plus like, them having a bunch of detailed knowledge on the ground about what’s going down.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It seems unclear to me how– yeah, this is related to our disagreement– how much you’re happy just deferring to the future people and being like, ‘Hope that they’ll cope’. Maybe they won’t even cope by solving the problem in the same way, they might cope by, the crazy AIs that we built reach the kind of agreement that allows them to not build even crazier AIs in the same way that we might do that. I think there’s some general frame of, I’m just taking responsibility for less, and more saying, can we leave the future people in a situation that is roughly as good as our situation? And by future people, I mean mostly AIs.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; Right. The two things that you think might explain your relative optimism are something like: Maybe we can leave the problem to smarter agents that are humans. Maybe we can leave the problem to smarter agents that are not humans.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Also a lot of disagreement about the problem. Those are certainly two drivers. They’re not exhaustive in the sense that there’s also a huge amount of disagreement about like, ‘How hard is this problem?’ Which is some combination of like, ‘How much do we know about it?’ Where they’re more like, ‘Yeah, we’ve thought about it a bunch and have some views.’ And I’m like, ‘I don’t know, I don’t think I really know shit.’ Then part of it is concretely there’s a bunch of– on the object level, there’s a bunch of arguments about why it would be hard or easy so we don’t reach agreement. We consistently disagree on lots of those points.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; Do you think the goal state for you guys is the same though? If I gave you guys a bunch of AGIs, would you guys agree about which ones are aligned and which ones are not? If you could know all of their behaviors?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; I think at that level we’d probably agree. We don’t agree more broadly about what constitutes a win state or something. They have this more expansive conception– or I guess it’s narrower– that the win state is supposed to do more. They are imagining more that you’ve resolved this whole list of future challenges. I’m more not counting that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;We’ve had this… yeah, I guess I now mostly use intent alignment to refer to this problem where there’s risk of ambiguity… the problem that I used to call AI alignment. There was a long obnoxious back and forth about what the alignment problem should be called. MIRI does use aligned AI to mean ‘an AI that produces good outcomes when you run it’, which I really object to a lot as a definition of aligned AI. So if they’re using that as their definition of aligned AI, we would probably disagree.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; Shifting terms or whatever… one thing that they’re trying to work on is making an AGI that has a property that is also the property you’re trying to make sure that AGI has.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Yeah, we’re all trying to build an AI that’s trying to do the right thing.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; I guess I’m thinking more specifically, for instance, I’ve heard people at MIRI say something like, they want to build an AGI that I can tell it, ‘Hey, figure out how to copy a strawberry, and don’t mess anything else up too badly.’ Does that seem like the same problem that you’re working on?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; I mean it seems like in particular, you should be able to do that. I think it’s not clear whether that captures all the complexity of the problem. That’s just sort of a question about what solutions end up looking like, whether that turns out to have the same difficulty. &amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;The other things you might think are involved that are difficult are… well, I guess one problem is just how you capture competitiveness. Competitiveness for me is a key desideratum. And it’s maybe easy to elide in that setting, because it just makes a strawberry. Whereas I am like, if you make a strawberry literally as well as anyone else can make a strawberry, it’s just a little weird to talk about. And it’s a little weird to even formalize what competitiveness means in that setting. I think you probably can, but whether or not you do that’s not the most natural or salient aspect of the situation. &amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;So I probably disagree with them about– I’m like, there are probably lots of ways to have agents that make strawberries and are very smart. That’s just another disagreement that’s another function of the same basic, ’How hard is the problem’ disagreement. I would guess relative to me, in part because of being more pessimistic about the problem, MIRI is more willing to settle for an AI that does one thing. And I care more about competitiveness.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Say you just learn that prosaic AI is just not going to be the way we get to AGI. How does that make you feel about the IDA approach versus the MIRI approach?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; So my overall stance when I think about alignment is, there’s a bunch of possible algorithms that you could use. And the game is understanding how to align those algorithms. And it’s kind of a different game. There’s a lot of common subproblems in between different algorithms you might want to align, it’s potentially a different game for different algorithms. That’s an important part of the answer. I’m mostly focusing on the ‘align this particular’– I’ll call it learning, but it’s a little bit more specific than learning– where you search over policies to find a policy that works well in practice. If we’re not doing that, then maybe that solution is totally useless, maybe it has common subproblems with the solution you actually need. That’s one part of the answer.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
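+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;[A toy sketch of the algorithm family being described– ‘search over policies to find a policy that works well in practice’. The environment and numbers are our illustrative placeholders, not Christiano’s:]&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ &amp;lt;code python&amp;gt;
+ # Toy illustration (ours): select a policy purely by how well it
+ # performs in practice, which is the object Christiano says he is
+ # mostly trying to align.
+ import random
+ 
+ def reward(w, x):
+     # Stand-in environment: reward is higher when the policy w*x
+     # tracks the hidden target 2*x, i.e. when w is near 2.
+     return -abs(w * x - 2 * x)
+ 
+ def score(w):
+     # "Works well in practice" = high average reward on sampled inputs.
+     return sum(reward(w, random.random()) for _ in range(20))
+ 
+ # Search over a (toy) family of policies for one that scores well.
+ candidates = [random.uniform(-5, 5) for _ in range(1000)]
+ best_w = max(candidates, key=score)
+ print(f"selected w = {best_w:.2f}")  # should land near 2
+ # The alignment worry: selection sees only behavior and reward, so it
+ # cannot tell "does well" apart from "looks like it does well".
+ &amp;lt;/code&amp;gt;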
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Another big difference is going to be, timelines views will shift a lot if you’re handed that information. So it will depend exactly on the nature of the update. I don’t have a strong view about whether it makes my timelines shorter or longer overall. Maybe you should bracket that though.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;In terms of returning to the first one of trying to align particular algorithms, I don’t know. I think I probably share some of the MIRI persp– well, no. It feels to me like there’s a lot of common subproblems. Aligning expert systems seems like it would involve a lot of the same reasoning as aligning learners. To the extent that’s true, probably future stuff also will involve a lot of the same subproblems, but I doubt the algorithm will look the same. I also doubt the actual algorithm will look anything like a particular pseudocode we might write down for iterated amplification now.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Does iterated amplification in your mind rely on this thing that searches through policies for the best policy? The way I understand it, it doesn’t feel like it necessarily does.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; So, you use this distillation step. And the reason you want to do amplification, or this short-hop, expensive amplification, is because you interleave it with this distillation step. And I normally imagine the distillation step as being, learn a thing which works well in practice on a reward function defined by the overseer. You could imagine other things that also needed to have this framework, but it’s not obvious whether you need this step if you didn’t somehow get granted something like the–&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; That you could do the distillation step somehow.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Yeah. It’s unclear what else would– so another example of a thing that could fit in, and this maybe makes it seem more general, is if you had an agent that was just incentivized to make lots of money. Then you could just have your distillation step be like, ‘I randomly check the work of this person, and compensate them based on the work I checked’. That’s a suggestion of how this framework could end up being more general.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But I mostly do think about it in the context of learning in particular. I think it’s relatively likely to change if you’re not in that setting. Well, I don’t know. I don’t have a strong view. I’m mostly just working in that setting, mostly because it seems reasonably likely, seems reasonably likely to have a bunch in common, learning is reasonably likely to appear even if other techniques appear. That is, learning is likely to play a part in powerful AI even if other techniques also play a part.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
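+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;[A schematic, runnable toy of the amplification/distillation loop described above. This is our paraphrase of the general shape, not Christiano’s actual scheme; every helper is a stand-in for the real component:]&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ &amp;lt;code python&amp;gt;
+ # Toy sketch of the interleaved amplify/distill loop (our illustration).
+ 
+ def decompose(question):
+     # Stand-in: a real overseer would break the question into pieces.
+     return [question + " (sub-1)", question + " (sub-2)"]
+ 
+ def combine(question, subanswers):
+     # Stand-in: a real overseer would synthesize the sub-answers.
+     return f"answer to {question} from {len(subanswers)} parts"
+ 
+ def distill(overseer, questions):
+     # Stand-in for the distillation step: train a fast model on the
+     # reward/behavior defined by the slow overseer. Here we memoize.
+     table = {q: overseer(q) for q in questions}
+     return lambda q: table.get(q, "unknown")
+ 
+ model = lambda q: "base answer"
+ questions = ["Q1", "Q2"]
+ 
+ for _ in range(3):
+     # Amplification: slow but more capable overseer built from a human
+     # procedure (decompose/combine) plus calls to the current model.
+     amplified = lambda q, m=model: combine(q, [m(s) for s in decompose(q)])
+     # Distillation: cheap model trained to imitate the amplified overseer.
+     model = distill(amplified, questions)
+ 
+ print(model("Q1"))
+ &amp;lt;/code&amp;gt;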
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Are there other people or resources that you think would be good for us to look at if we were looking at the optimism view?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; Before we get to resources or people, I think one of the basic questions is, there’s this perspective which is fairly common in ML, which is like, ‘We’re kind of just going to do a bunch of stuff, and it’ll probably work out’. That’s probably the basic thing to be getting at. How right is that?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;This is the bad view of safety conditioned on– I feel like prosaic AI is in some sense about as bad as things could have gotten in terms of alignment, where you try a bunch of shit, just a ton of stuff, and a ton of trial and error seems pretty bad. Anyway, this is a random aside, maybe more related to the previous point. But yeah, this is just with alignment. There’s this view in ML that’s relatively common that’s like, we’ll try a bunch of stuff to get the AI to do what we want, it’ll probably work out. Some problems will come up. We’ll probably solve them. I think that’s probably the most important thing on the optimism vs pessimism side.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And I don’t know, I mean this has been a project that like, it’s a hard project. I think the current state of affairs is like, the MIRI folk have strong intuitions about things being hard. Essentially no one in… very few people in ML agree with those, or even understand where they’re coming from. And even people in the EA community who have tried a bunch to understand where they’re coming from mostly don’t. Mostly people either end up understanding one side or the other and don’t really feel like they’re able to connect everything. So it’s an intimidating project in that sense. I think the MIRI people are the main proponents of the everything is doomed, the people to talk to on that side. And then in some sense there’s a lot of people on the other side who you can talk to, and the question is just, who can articulate the view most clearly? Or who has most engaged with the MIRI view such that they can speak to it?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ronny Fernandez:&amp;lt;/strong&amp;gt; Those are people I would be particularly interested in. If there are people that understand all the MIRI arguments but still have broadly the perspective you’re describing, like some problems will come up, probably we’ll fix them.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; I don’t know good– I don’t have good examples of people for you. I think most people just find the MIRI view kind of incomprehensible, or like, it’s a really complicated thing, even if the MIRI view makes sense on its face. I don’t think people have gotten enough into the weeds. It really rests a lot right now on this fairly complicated cluster of intuitions. I guess on the object level, I think I’ve just engaged a lot more with the MIRI view than most people who mostly take the ‘everything will be okay’ perspective. So happy to talk on the object level, and speaking more to arguments. I think it’s a hard thing to get into, but it’s going to be even harder to find other people in ML who have engaged with the view that much.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;They might be able to make other general criticisms of like, here’s why I haven’t really… like it doesn’t seem like a promising kind of view to think about. I think you could find more people who have engaged at that level. I don’t know who I would recommend exactly, but I could think about it. Probably a big question will be who is excited to talk to you about it.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I am curious about your response to MIRI’s object level arguments. Is there a place that exists somewhere?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; There’s some back and forth on the internet. I don’t know if it’s great. There’s some LessWrong posts. Eliezer for example wrote &amp;lt;a href=&amp;quot;https://www.lesswrong.com/posts/S7csET9CgBtpi7sCh/challenges-to-christiano-s-capability-amplification-proposal&amp;quot;&amp;gt;this post&amp;lt;/a&amp;gt; about why things were doomed, why I in particular was doomed. I don’t know if you read that post.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I can also ask you about it now, I just don’t want to take too much of your time if it’s a huge body of things.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Paul Christiano:&amp;lt;/strong&amp;gt; The basic argument would be like, 1) On paper I don’t think we yet have a good reason to feel doomy. And I think there’s some basic research intuition about how much a problem– suppose you poke at a problem a few times, and you’re like ‘Agh, seems hard to make progress’. How much do you infer that the problem’s really hard? And I’m like, not much. As a person who’s poked at a bunch of problems, let me tell you, that often doesn’t work and then you solve it in like 10 years of effort.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;So that’s one thing. That’s a point where I have relatively little sympathy for the MIRI way. That’s one set of arguments: is there a good way to get traction on this problem? Are there clever algorithms? I’m like, I don’t know, I don’t feel like the kind of evidence we’ve seen is the kind of evidence that should be persuasive. As some evidence in that direction, I’d be like, I have not been thinking about this that long. I feel like there have often been things that felt like, or that MIRI would have defended as like, here’s a hard obstruction. Then you think about it and you’re actually like, ‘Here are some things you can do.’ And it may still be an obstruction, but it’s no longer quite so obvious where it is, and there were avenues of attack.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s one thing. The second thing is like, a metaphor that makes me feel good– MIRI talks a lot about the evolution analogy. If I imagine the evolution problem– so if I’m a person, and I’m breeding some animals, I’m breeding some superintelligence. Suppose I wanted to breed an animal modestly smarter than humans that is really docile and friendly. I’m like, I don’t know man, that seems like it might work. That’s where I’m at. I think they are… it’s been a little bit hard to track down this disagreement, and I think this is maybe in a fresher, rawer state than the other stuff, where we haven’t had enough back and forth.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But I’m like, it doesn’t sound necessarily that hard. I just don’t know. I think their position, their position when they’ve written something has been a little bit more like, ‘But you couldn’t breed a thing, that after undergoing radical changes in intelligence or situation would remain friendly’. But then I’m normally like, but it’s not clear why that’s needed? I would really just like to create something slightly superhuman, and it’s going to work with me to breed something that’s slightly smarter still that is friendly.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;We haven’t really been able to get traction on that. I think they have an intuition that maybe there’s some kind of invariance and things become gradually more unraveled as you go on. Whereas I have more intuition that it’s plausible. After this generation, there’s just smarter and smarter people thinking about how to keep everything on the rails. It’s very hard to know.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s the second thing. I have found that really… that feels like it gets to the heart of some intuitions that are very different, and I don’t understand what’s up there. There’s a third category which is like, on the object level, there’s a lot of directions that I’m enthusiastic about where they’re like, ‘That seems obviously doomed’. So you could divide those up into the two problems. There’s the family of problems that are more like the inner alignment problem, and then outer alignment stuff.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;On the inner alignment stuff, I haven’t thought that much about it, but examples of things that I’m optimistic about that they’re super pessimistic about are like, stuff that looks more like verification– or maybe, stepping back even from that, there’s this basic paradigm of adversarial training, where I’m like, it seems close to working. And you could imagine it being like, it’s just a research problem to fill in the gaps. Whereas they’re like, that’s so not the kind of thing that would work. I don’t really know where we’re at with that. I do see there are formal obstructions to adversarial training in particular working. I’m like, I see why this is not yet a solution. For example, you can have this case where there’s a predicate that the model checks, and it’s easy to check but hard to construct examples that satisfy it. And then in your adversarial training you can’t ever feed in an example where it’ll fail. So we get into like, is it plausible that you can handle that problem with either 1) Doing something more like verification, where you ask the model to perform well not just on real inputs but on pseudo-inputs. Or like, you ask the attacker just to show how it’s conceivable that the model could do a bad thing in some sense.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
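+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;[A concrete toy of the ‘easy to check, hard to construct’ predicate described above– our example, not Christiano’s:]&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ &amp;lt;code python&amp;gt;
+ # Toy illustration (ours) of a predicate that is cheap for the model to
+ # check on any input, but computationally hard for an adversarial
+ # trainer to construct an input satisfying, so the bad branch is never
+ # exercised during training.
+ import hashlib
+ 
+ def trigger(x):
+     # Easy to evaluate on any bytes input...
+     return hashlib.sha256(x).hexdigest().startswith("0" * 12)
+     # ...but finding an x that satisfies it takes roughly 2**48 hash
+     # evaluations, far beyond what a red team's search will stumble on.
+ 
+ def model(x):
+     if trigger(x):
+         return "defect"
+     return "behave nicely"
+ 
+ # Random or gradient-guided search over inputs essentially never reaches
+ # the "defect" branch, so behavioral data alone cannot distinguish this
+ # model from an aligned one.
+ print(model(b"ordinary input"))
+ &amp;lt;/code&amp;gt;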
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s one possible approach, where the other would be something more like interpretability, where you say like, ‘Here’s what the model is doing. In addition to its behavior, we get this other signal– that the model was depending on this fact, this predicate passing, which it shouldn’t have been dependent on.’ The question is, can either of those yield good behavior? I’m like, I don’t know, man. It seems plausible. And they’re like ‘Definitely not.’ And I’m like, ‘Why definitely not?’ And they’re like ‘Well, that’s not getting at the real essence of the problem.’ And I’m like ‘Okay, great, but how did you substantiate this notion of the real essence of the problem? Where is that coming from? Is that coming from a whole bunch of other solutions that looked plausible but failed?’ And their take is kind of like, yes, and I’m like, ‘But there weren’t actually even any candidate solutions there that failed yet. You’ve got maybe one thing, or like, you showed there exists a problem in some minimal sense.’ This comes back to the first of the three things I listed. But it’s a little bit different in that I think you can just stare at particular things and they’ll be like, ‘Here’s how that particular thing is going to fail.’ And I’m like ‘I don’t know, it seems plausible.’&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s on inner alignment. And there’s maybe some on outer alignment. I feel like they’ve given a lot of ground in the last four years on how doomy things seem on outer alignment. I think they still have some– if we’re talking about amplification, I think the position would still be, ‘Man, why would that agent be aligned? It doesn’t at all seem like it would be aligned.’ That has also been a little bit surprisingly tricky to make progress on. I think it’s similar, where I’m like, yeah, I grant the existence of some problem or some thing which needs to be established, but I don’t grant– I think their position would be like, this hasn’t made progress, or has just pushed around the core difficulty. I’m like, I don’t grant the conception of the core difficulty in which this has just pushed around the core difficulty. I think the disagreement is substantially in that kind of thing: being like, here’s an approach that seems plausible, we don’t have a clear obstruction, but they think it is doomed for these deep reasons. I have maybe a higher bar for what kind of support the deep reasons need.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I also just think on the merits, they have not really engaged with– and this is partly my responsibility for not having articulated the arguments in a clear enough way– although I think they have not engaged with even the clearest articulation, as of two years ago, of what the hope was. But that’s probably on me for not having an even clearer articulation than that, and it’s also definitely not up to them to engage with everything. To the extent it’s a moving target, it’s not up to them to engage with the most recent version. Where, by most recent version– the proposal doesn’t really change that much, or like, the case for optimism has changed a little bit. But it’s mostly just the state of argument concerning it that changes, rather than the version of the scheme.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
  

&lt;/pre&gt;</summary>
    </entry>
    <entry>
        <title>Conversation with Robin Hanson</title>
        <link rel="alternate" type="text/html" href="https://wiki.aiimpacts.org/conversation_notes/conversation_with_robin_hanson?rev=1663745861&amp;do=diff"/>
        <published>2022-09-21T07:37:41+00:00</published>
        <updated>2022-09-21T07:37:41+00:00</updated>
        <id>https://wiki.aiimpacts.org/conversation_notes/conversation_with_robin_hanson?rev=1663745861&amp;do=diff</id>
        <author>
            <name>Anonymous</name>
            <email>anonymous@undisclosed.example.com</email>
        </author>
        <category  term="conversation_notes" />
        <content>&lt;pre&gt;
@@ -1 +1,1277 @@
+ ====== Conversation with Robin Hanson ======
+ 
+ // Published 13 November, 2019; last updated 20 November, 2019 //
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;AI Impacts talked to economist Robin Hanson about his views on AI risk and timelines. With his permission, we have posted and transcribed this interview.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ 
+ ===== Participants =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;a href=&amp;quot;http://mason.gmu.edu/~rhanson/home.html&amp;quot;&amp;gt;Robin Hanson&amp;lt;/a&amp;gt; — Associate Professor of Economics, George Mason University
+                 &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Asya Bergal – AI Impacts&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;a href=&amp;quot;http://robertlong.online/&amp;quot;&amp;gt;Robert Long&amp;lt;/a&amp;gt; – AI Impacts
+                 &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ===== Summary =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;We spoke with Robin Hanson on September 5, 2019. Here is a brief summary of that conversation:&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks that now is the wrong time to put a lot of effort into addressing AI risk:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;We will know more about the problem later, and there’s an opportunity cost to spending resources now vs later, so there has to be a compelling reason to spend resources now instead.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson is not compelled by existing arguments he’s heard that would argue that we need to spend resources now:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson famously disagrees with the theory that &amp;lt;a href=&amp;quot;http://intelligence.org/files/AIFoomDebate.pdf&amp;quot;&amp;gt;AI will appear very quickly and in a very concentrated way&amp;lt;/a&amp;gt;, which would suggest that we need to spend resources now because we won’t have time to prepare.
+                         &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson views the AI risk problem as essentially continuous with existing principal agent problems, and &amp;lt;a href=&amp;quot;http://www.overcomingbias.com/2019/04/agency-failure-ai-apocalypse.html&amp;quot;&amp;gt;disagrees that the key difference&amp;lt;/a&amp;gt;—the agents being smarter—should clearly worsen such problems.
+                         &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks that we will see concrete signatures of problems before it’s too late– he is skeptical that there are big things that have to be coordinated ahead of time.
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Relatedly, he thinks useful work anticipating problems in advance usually happens with concrete designs, not with abstract descriptions of systems. &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks we are still too far away from AI for field-building to be useful.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks AI is probably at least a century, perhaps multiple centuries away:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks the mean estimate for human-level AI arriving is long, and he thinks AI is unlikely to be ‘lumpy’ enough to happen without much warning:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson is interested in how ‘lumpy’ progress in AI is likely to be: whether progress is likely to come in large chunks or in a slower and steadier stream.
+                           &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Measured in terms of how much a given paper is cited, academic progress is not lumpy in any field.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;The literature on innovation suggests that innovation is not lumpy: most innovation is lots of little things, though once in a while there are a few bigger things.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;From an outside view perspective, the current AI boom does not seem different from previous AI booms.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;We don’t have a good sense of how much research needs to be done to get to human-level AI.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;If we don’t expect progress to be particularly lumpy, and we don’t have a good sense of exactly how close we are, we have good reason to think we are not, e.g., five years away rather than halfway.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks we shouldn’t believe it when AI researchers give 50-year timescales:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Rephrasing the question in different ways, e.g. “When will most people lose their jobs?” causes people to give different timescales.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;People consistently give overconfident estimates when they’re estimating things that are &amp;lt;a href=&amp;quot;https://www.overcomingbias.com/2010/06/near-far-summary.html&amp;quot;&amp;gt;abstract and far away&amp;lt;/a&amp;gt;.
+                         &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks AI risk takes up far too large a fraction of people thinking seriously about the future.
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks more futurists should be exploring other future scenarios, roughly proportionally to how likely they are with some kicker for extremity of consequences.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson doesn’t think that AI is that much worse than other future scenarios in terms of how much future value is likely to be destroyed.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks the key to intelligence is having many not-fully-general tools:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Most of the value in tools is in more specific tools, and we shouldn’t expect intelligence innovation to be different.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Academic fields are often simplified to simple essences, but real-life things like biological organisms and the industrial world progress via lots of little things, and we should expect intelligence to be more similar to the latter examples.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson says the literature on human uniqueness suggests cultural evolution and language abilities came from several modest brain improvements, not clear differences in brain architecture.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson worries that having so many people publicly worrying about AI risk before it is an acute problem will mean it is taken less seriously when it becomes one, because the public will have learned to think of such concerns as erroneous fear mongering.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson would be interested in seeing more work on the following things:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Seeing examples of big, lumpy innovations that made a big difference to the performance of a system. This could change Hanson’s view of intelligence.
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;In particular, he’d be influenced by evidence for important architectural differences in the brains of humans vs. primates.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Tracking of the automation of U.S. jobs over time as a potential proxy for AI progress.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks there’s a lack of engagement with critics from people concerned about AI risk.
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson is interested in seeing concrete outside-view models people have for why AI might be soon.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson is interested in proponents of AI risk responding to the following questions:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Setting aside everything you know except what this looks like from the outside, would you predict AGI happening soon?&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Should reasoning around AI risk arguments be compelling to people outside of AI?&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;What percentage of people who agree with you that AI risk is big, agree for the same reasons that you do?&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks even if we tried, we wouldn’t now be able to solve all the small messy problems that insects can solve, indicating that it’s not sufficient to have insect-level amounts of hardware.
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks that AI researchers might argue that we can solve the core functionalities of insects, but Hanson thinks that their intelligence is largely in being able to do many small things in complicated environments, robustly.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Small sections of the original audio recording have been removed. The corresponding transcript has been lightly edited for concision and clarity.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ===== Audio =====
+ 
+ 
+ 
+ 
+ ===== Transcript =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Great. Yeah. I guess to start with, the proposition we’ve been asking people to weigh in on is whether it’s valuable for people to be expending significant effort doing work that purports to reduce the risk from advanced AI. I’d be curious for your take on that question, and maybe a brief description of your reasoning there.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Well, my highest level reaction is to say whatever effort you’re putting in, probably now isn’t the right time. When is the right time is a separate question from how much effort, and in what context. AI’s going to be a big fraction of the world when it shows up, so it certainly at some point is worth a fair bit of effort to think about and deal with. It’s not like you should just completely ignore it.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;You should put a fair bit of effort into any large area of life or large area of the world, anything that’s big and has big impacts. The question is just really, should you be doing it way ahead of time, before you know much about it at all, or have many concrete examples, or even know the structure or architecture, how it’s integrated into the economy, what the terms of purchase are, what the terms of relationships are.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I mean, there’s just a whole bunch of things we don’t know about. That’s one of the reasons to wait–because you’ll know more later. Another reason to wait is because of the opportunity cost of resources. If you save the resources until later, you have more to work with. Those considerations have to be weighed against some expectation of an especially early leverage, or an especially early choice point or things like that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;For most things you expect that you should wait until they show themselves in a substantial form before you start to envision problems and deal with them. But there could be exceptions. Mostly it comes down to arguments that this is an exception.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Yeah. I think we’re definitely interested in the proposition that you should put in work now as opposed to later. If you’re familiar with the arguments that this might be an exceptional case, I’d be curious for your take on those and where you disagree.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Sure. As you may know, I got involved in this conversation over a decade ago with my co-blogger Eliezer Yudkowsky, and at that point, the major argument that he brought up was something we now call the Foom Argument.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That argument was a very particular one, that this would appear under a certain trajectory, under a certain scenario. That was a scenario where it would happen really fast, would happen in a very concentrated place in time, and basically once it starts, it happens so fast, you can’t really do much about it after that point. So the only chance you have is before that point.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Because it’s very hard to predict when or where, you’re forced to just do stuff early, because you’re never sure how early you are. That’s a perfectly plausible argument given that scenario: if you believe that it shows up in one time and place all of a sudden, fully formed and no longer influenceable, then you only have the shot before that moment. If you are very unsure when and where that moment would be, then you basically just have to do it now.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But I was doubting that scenario. I was saying that that wasn’t a zero probability scenario, but I was thinking it was overestimated by him and other people in that space. I still think many people overestimate the probability of that scenario. Over time, it seems like more people have distanced themselves from that scenario, yet I haven’t heard as many substitute rationales for why we should do any of this stuff early.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I did a recent blog post responding to a &amp;lt;a href=&amp;quot;https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like&amp;quot;&amp;gt;Paul Christiano post&amp;lt;/a&amp;gt; and my title was &amp;lt;a href=&amp;quot;https://www.overcomingbias.com/2019/04/agency-failure-ai-apocalypse.html&amp;quot;&amp;gt;Agency Failure AI Apocalypse?&amp;lt;/a&amp;gt;, and so at least I saw an argument there that was different from the Foom argument. It was an argument that you’d see a certain kind of agency failure with AI, and that because of that agency failure, it would just be bad.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It wasn’t exactly an argument that we need to do effort early, though. Even that argument wasn’t per se a reason why you need to do stuff way ahead of time. But it was an argument for why the consequences might be especially bad, I guess, and therefore deserving of more investment. And then I critiqued that argument in my post, saying he was basically saying the agency problem, which is a standard problem in all human relationships and all organizations, is exacerbated when the agent is smart.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And because the AI is, by assumption, very smart, then it’s a very exacerbated agency problem; therefore, it goes really bad. I said, “Our literature on the agency problem doesn’t say that it’s a worse problem when they’re smart.” I just denied that basic assumption, pointing to what I’ve known about the agency literature over a long time. Basically Paul in his response said, “Oh, I wasn’t saying there was an agency problem,” and then I was kind of baffled, because I thought that was the whole point of his post that I was summarizing.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;In any case, he just said he was worried about wealth redistribution. Of course, any large social change has the potential to produce wealth redistribution, and so I’m still less clear why this change would have bigger wealth redistribution consequences than others, or why it would happen more suddenly, or require more early effort. But if you guys have other particular arguments to talk about here, I’d love to hear what you think, or what you’ve heard are the best arguments aside from Foom.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Yeah. I’m at risk of putting words in other people’s mouths here, because we’ve interviewed a bunch of people. I think one thing that’s come up repeatedly is-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       You aren’t going to name them.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Oh, I definitely won’t give a name, but-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       I’ll just respond to whatever-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, just prefacing this: this might be a strawman of some argument. One thing people are consistently excited about is what they call ‘field building’, where basically the idea is: AI is likely to be this pretty difficult problem, and even if we think it’s far away, there’s still meaningful work we can do in terms of setting up an AI safety field, with an increasing number of people who have an increasing amount of what is assumed to be useful knowledge about the field.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then there’s another assumption that goes along with that: if we investigate problems now, even if we don’t know the exact specifics of what AGI might look like, those problems are going to share some common subproblems with problems that we may encounter in the future. I don’t know if both of those would count as field building in people’s lexicon.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       The example I would give to make it concrete is to imagine, in the year 1000, tasking people with dealing with various of the major problems in our society today: social media addiction, nuclear war, concentration of capital and manufacturing, privacy invasions by police. Take any major problem that you could think of in our world today, and imagine tasking people in the year 1000 with trying to deal with that problem.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Now the arguments you gave would sound kind of silly. We need to build up a field in the year 1000 to study nuclear annihilation, or nuclear conflict, or criminal privacy rules? I mean, you only want to build up a field just before you want to use a field, right? Building up a field way in advance is crazy. You still need some sort of argument that we are near enough that the timescale on which it takes to build a field will roughly match the timescale until we need the field. If it’s a factor of ten off, or a thousand off, then that’s crazy.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        Yeah. This leads into a specific question I was going to ask about your views. You’ve written, based on AI practitioners’ estimates of how much progress they’ve been making, that an outside-view calculation suggests we probably have at least a century to go, if not a great many centuries, at current rates of progress in AI. That was in 2012. Is that still roughly your timeline? Are there other things that go into your timelines? Basically, in general, what’s your current AI timeline?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Obviously there’s a median estimate and a mean estimate, and then there’s a probability-per-unit-time estimate, say, and obviously most everyone agrees that the median or mean could be pretty long, and that’s reasonable. So they’re focused on some, “Yes, but what’s the probability of an early surprise?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That isn’t directly addressed by that estimate, of course. I mean, you could turn that into a per-unit-time estimate if you just thought it was a constant per-unit-time thing. That would, I think, be overly optimistic; that would give you too high an estimate. I have a series of blog posts on lumpiness, which you may have seen. A key idea here is that we’re getting AI progress over time, and how lumpy it is, is extremely directly relevant to these estimates.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;For example, if it was maximally lumpy, if it just shows up at one point, like the Foom scenario, then in that scenario you kind of have to work ahead of time, because you’re not sure when. There’s a substantial chance at any time: if, say, the mean is two centuries, that means in every year there’s a 1-in-200 chance. There’s a half-a-percent chance next year. Half a percent is pretty high; I guess we better do something, because what if it happens next year?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
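+ 
+ 
+ // A minimal worked version of the constant-hazard arithmetic above, assuming a memoryless model in which AI arrives with the same probability p in each year: //
+ 
+ \[ \mathbb{E}[T] = \frac{1}{p} \;\Rightarrow\; p = \frac{1}{\mathbb{E}[T]} = \frac{1}{200} = 0.5\% \text{ per year} \]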
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Okay. I mean, that’s where extreme lumpiness goes. The less lumpy it is, the smaller the variance around that mean: it’s just going to take a long time, and it’ll take 10% less or 10% more, but it’s basically going to take that long. The key question is how lumpy it is reasonable to expect these sorts of things to be. I would say, “Well, let’s look at how lumpy things have been. How lumpy are most things? Even, how lumpy has computer science innovation been? Or even AI innovation?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think those are all relevant data sets. There’s general lumpiness in everything, and lumpiness of the kinds of innovation that are closest to the kinds of innovation postulated here. I note that one of the best or most concrete measures we have of lumpiness is citations. That is, we can take, for any research idea, how many citations the seminal paper produces, and we say, “How lumpy are citations?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Interestingly, citation lumpiness seems to be field independent. Not just time independent, but field independent. Seems to be a general feature of academia, which you might have thought lumpiness would vary by field, and maybe it does in some more fundamental sense, but as it’s translated into citations, it’s field independent. And of course, it’s not that lumpy, i.e. most of the distribution of citations is papers with few citations, and the few papers that have the most citations constitute a relatively small fraction of the total citations.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s what we also know from other kinds of innovation literature. The generic innovation literature says that most innovation is lots of little things, even though once in a while there are a few bigger things. For example, I remember there’s this time series of the best locomotive at any one time. You have that from 1800 or something. You can just see it in speed, or energy efficiency, and you see this point—&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It’s not an exactly smooth graph. On the other hand, it’s pretty smooth. The biggest jumps are a small fraction of the total jumpiness. A lot of technical, social innovation is, as we well understand, a few big things, matched with lots of small things. Of course, we also understand that big ideas, big fundamental insights, usually require lots of complementary, matching, small insights to make it work.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s part of why this trajectory happens this way. That smooths out the overall pace of progress in most areas, making it effectively less lumpy. It seems to me that the most reasonable default assumption is to assume future AI progress looks like past computer science progress, and even past technical progress in other areas. I mean, the most concrete example is AI progress.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I’ve observed that we’ve had these repeated booms of AI concern and interest, and we’re in one boom now, but we saw a boom in the 90s. We saw a boom in the 60s, 70s, we saw a boom in the 30s. In each of these booms, the primary thing people point to is, “Look at these demos. These demos are so cool. Look what they can do that we couldn’t do before.” That’s the primary evidence people tend to point to in all of these areas.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;They just have concrete examples that they were really impressed by. No doubt we have had these very impressive things. One question is, do we have any evidence that now is different? As opposed to evidence that there will be a big difference in the future. So if you’re asking, “Is now different,” then you’d want to ask, “The signs people point to now, i.e. AlphaGo, say, as a dramatic, really impressive thing–how different are they in degree from the comparable things that have happened in the past?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;The more you understand the past, the more you see how impressed people were back then with the best things that happened then. That suggests to me that–I mean, AlphaGo is, say, a lump. I’m happy to admit it looks out of line with a smooth attribution of equal research progress to all teams at all times. But it also doesn’t look out of line with the lumpiness we’ve seen over the last 70 years, say, in computer innovation.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It’s on trajectory. So if you’re going to say, “And we still expect that same overall lumpiness for the next 70 years, or the next 700,” then I’d say it’s about how close we are now. If you just don’t know how close you are, then you’re still going to end up with a relatively random, “When do we reach this threshold where it’s good enough?”–that is, if you just have no idea how close you are, or how much is required.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;The more you think you have an idea of what’s required and where you are, the more you can ask how far you are. Then if you say you’re only halfway, then you could say, “Well, if it’s taken us this many years to get halfway,” then the odds that we’re going to get all the rest of the way in the next five years are much less than you’d attribute to just randomly assigning say, “It’s going to happen in 200 years, therefore it’ll be one in two hundred per year.” I do think we’re in more of that sort of situation. We can roughly guess that we’re not almost there.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        Can you say a little bit more about how we should think about this question of how close we are?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Sure. The best reliable source on that would be people who have been in this research area for a long time. They’ve just seen lots of problems, they’ve seen lots of techniques, they better understand what it takes to do many hard problems. They have not just a good sense of where we are, but a better sense of where we ultimately have to go.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think when you don’t understand these things as well, by theory or by experience, et cetera, you’re more tempted to look at something like AlphaGo and say, “Oh my God, we’re almost there.” Because you just say, “Oh, look.” You tend more to think, “Well, if we can do human level anywhere, we can do it everywhere.” That was the initial thought–what people in the 1960s said: “Let’s solve chess, and if we can solve chess, certainly we can do anything.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I mean, something that can do chess, it’s got to be smart. But they just didn’t fully appreciate the range of tasks, and problems, and problem environments, that you need to deal with. Once you understand the range of possible tasks, task environments, obstacles, issues, et cetera–once you’ve been in AI for a long time and have just seen a wide range of those things–then you have more of a sense for, “I see, AlphaGo, that’s a good job, but let’s list all these simplifying assumptions you made here that made this problem easier,” and you know how to make that list.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then you’re not so much saying, “If we can do this, we can do anything.” I think pretty uniformly, the experienced AI researchers have said, “We’re not close.” I mean, I’d be very surprised if you interviewed any person with a broad range of AI experience who said, “We’re almost there. If we can do this one more thing, we can do everything.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Yeah. I might be wrong about this–my impression is that your estimate of at least a century, or maybe centuries, might still be longer than a lot of researchers’–and this might be because there’s this trend where people will just say 50 years about almost any technology, or something like that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Sure. I’m happy to walk through that. That’s the logic of that post of mine that you mentioned. It was exactly trying to confront that issue. So I would say there is a disconnect to be addressed. The people you ask are not being consistent when you ask similar things in different ways. The challenge is to disentangle that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I’m happy to admit when you ask a lot of people how long it will take, they give you 40, 50 year sort of timescales. Absolutely true. Question is, should you believe it? One way to check whether you should believe that is to see how they answer when you ask them different ways. I mean, as you know, I guess one of those surveys interestingly said, “When will most people lose their jobs?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;They gave much longer time scales than when will computers be able to do most everything, like a factor of two or something. That’s kind of bothersome. That’s a pretty close consistency relation. If computers can do everything cheaper, then they will, right? Apparently not. But I would think that, I mean, I’ve done some writing on this psychology concept called construal-level theory, which just really emphasizes how people have different ways they think about things conceived abstractly and broadly versus narrowly.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;There’s a consistent pattern there, which is consistent with the pattern we are seeing here, that is in the far mode where you’re thinking abstractly and broadly, we tend to be more confident in simple, abstract theories that have simple predictions and you tend to neglect messy details. When you’re in the near mode and focus on a particular thing, you see all the messy difficulties.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It’s kind of the difference between will you have a happy marriage in life? Sure. This person you’re in a relationship with? Will that work in the next week? I don’t know. There’s all the things to work out. Of course, you’ll only have a happy relationship over a lifetime if every week keeps going okay for the rest of your life. I mean, if enough weeks do. That’s a near/far sort of distinction.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;When you ask people about AI in general, and on what time scale, that’s a very far-mode version of the question. They are aggregating, and they are going on very aggregate sorts of theories in their head. But if you take an AI researcher who has been staring at difficult problems in their area for 20 years, and you ask them, “In the problems you’re looking at, how far have we gotten since 20 years ago?,” they’ll be really aware of all the obstacles they have not solved or succeeded in dealing with, all the things we have not been able to do for 20 years.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That seems to me a more reliable basis for projection. I mean, of course we’re still in a similar regime. If the regime would change, then past experience is not relevant. If we’re in a similar regime of the kind of problems we’re dealing with and the kind of tools and the kind of people and the kind of incentives, all that sort of thing, then that seems to be much more relevant. That’s the point of that survey, and that’s the point of believing that survey somewhat more than the question asked very much more abstractly.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Two related questions on this. One question is, how many years out do you think it is important to start work on AI? And I guess a related question is: even granting that you think it’s super unlikely, what’s the ideal number of people working on or thinking about this?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Well, I’ve said many times in many of these posts that it’s not zero at any time. That is, whenever there’s a problem that it isn’t the right time to work on, it’s still the right time to have some people asking if it’s the right time to work on it. You can’t have people asking a question unless they’re kind of working on it. They’d have to be thinking about it enough to be able to ask the question if it’s the right time to work on it.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That means you always need some core of people thinking about it, at least in related areas, such that they are skilled enough to be able to ask the question, “Hey, what do you think? Is this the time to turn and work on this area?” It’s a big world, and eventually this is a big thing, so hey, a dozen could be fine. And given how random academia and the intellectual world are, of course, the intellectual world is not at all optimized in terms of the number of people per topic. It’s really not.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Relative to that standard, you could be not unusually misallocated even if you were still pretty random about it. For that, it’s more just: for the other purposes for which academic fields exist and perpetuate themselves, how well is it doing? I would basically say, “Academia’s mainly about credentialing people as impressive.” There are all these topics that are neglected because you can’t credential and impress very well via them. If AI risk was a topic that happened to be unusually suited to being impressive with, then it would be an unusually suitable topic for academics to work on.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Not because it’s useful, just because that’s what academics do. That might well be true for ways in which AI problems bring up interesting new conceptual angles that you could explore, or push on concepts that you need to push on because they haven’t been generalized in that direction, or just produce formal theorems that are in a new space of theorems.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Like pushing on decision theory, right? Certainly there’s a point of view from which decision theory was kind of stuck, and people weren’t pushing on it, and then AI risk people pushed on some dimensions of decision theory that people hadn’t… people just did different decision theory, not because it’s good for AI. How many people? Again, it’s very sensitive to that, right? You might justify 100 people if it was not only about AI risk, but really more about just pushing on these other interesting conceptual dimensions.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s why it would be hard to give a very precise answer there about how many. But I actually am less concerned about the number of academics working on it, and more about the percentage of altruistic mind space it takes. Because it’s a much higher percentage of that than it is of actual serious research. That’s the part I’m a little more worried about, especially the fraction of people thinking about the future. Just in general, very few people seem to be that willing to think seriously about the future. As a percentage of that space, it’s huge.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s where I most think, “Now, that’s too high.” If you could say, “100 people will work on this as researchers, but then the rest of the people who talk and think about the future can talk and think about something else,” that would be a big win for me. Because there are tens and hundreds of thousands of people out there on the side just thinking about the future, and so many of them are focused on this AI risk thing when they really can’t do much about it; they’ve just told themselves that it’s the thing they can talk about, and to really shame everybody into saying it’s the priority. Hey, there’s other stuff.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Now of course, I have this whole other book, Age of Em, which is about a different kind of scenario that I think doesn’t get much attention, and I think it should get more attention relative to the range of options that people talk about. Again, the AI risk scenario so overwhelmingly sucks up that small fraction of the world. So a lot of this of course depends on your base. If you’re talking about the percentage of people in the world working on these future things, it’s large of course.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;If you’re talking percentage of people who are serious researchers in AI risk relative to the world, it’s tiny of course. Obviously. If you’re talking about the percentage of people who think about AI risk, or talk about it, or treat it very seriously, relative to people who are willing to think and talk seriously about the future, it’s this huge thing.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        Yeah. That’s perfect. I was already going to ask a follow-up about what share of, I don’t know, effective altruists who are focused on affecting the long-term future you think it should be. Certainly you think it should be far less than this, is what I’m getting?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Right. First of all, things should be roughly proportional to probability, except with some kicker for extremity of consequences. But I think you don’t actually know about extremity of consequences until you explore a scenario. Right from the start you should roughly write down scenarios by probability, and then devote effort in proportion to the probability of scenarios.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then once you get into a scenario enough to say, “This looks like a less extreme scenario, this looks like a more extreme scenario,” at that point you might be justified in adjusting some effort in and out of areas based on that judgment. But that has to be a pretty tentative judgment, so you can’t go too far there, because until you explore a scenario a lot, you really don’t know how extreme it is. Basically it’s about extreme outcomes, times the extreme leverage of influence at each point along the path, multiplied by each other, in hopes that by thinking about it earlier you could be doing things that produce that outcome. That’s a lot of uncertainty to multiply through to get this estimate of how important a scenario is as a leverage to think about.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
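+ 
+ 
+ // A minimal sketch of the weighting just described, assuming we read it as: a scenario’s claim on early effort is proportional to its probability, times the extremity of its outcome, times the leverage of early work (each factor highly uncertain): //
+ 
+ \[ \text{priority}(s) \;\propto\; P(s) \times \text{extremity}(s) \times \text{leverage}_{\text{early}}(s) \]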
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        Right, yeah. Relatedly, I think one thing that people say about why AI should take up a large share is that there’s the sense that maybe we have some reason to think that AI is the only thing we’ve identified so far that could plausibly destroy all value, all life on earth, as opposed to other existential risks that we’ve identified. I mean, I can guess, but you may know that consideration or that argument.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Well, surely that’s hyperbole. Obviously anything that kills everybody destroys all value that arises from our source. Of course, there could be other alien sources out there, but even AI would only destroy things from our source relative to other alien sources that would potentially beat out our AI if it produces a bad outcome. Destroying all value is a little hyperbolic, even under the bad AI scenario.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I do think there’s just a wide range of future scenarios, and there’s this very basic question, how different will our descendants be, and how far from our values will they deviate? It’s not clear to me AI is that much worse than other scenarios in terms of that range, or that variance. I mean, yes, AIs could vary a lot in whether they do things that we value or not, but so could a lot of other things. There’s a lot of other ways.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I guess some people seem to think, “Well, as long as the future is human-like, then humans wouldn’t betray our values.” No, no, not humans. But machines, machines might do it. I mean, the difference between humans and machines isn’t quite that fundamental from the point of view of values. I mean, human values have changed enormously over a long time; we are now quite different, in terms of our habits, attitudes, and values, than our distant ancestors.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;We are quite capable of continuing to make huge value changes in many directions in the future. I can’t offer much assurance that, because our descendants descended from humans, they would therefore preserve most of your values. I just don’t see that. To the extent that you think that our specific values are especially valuable and you’re afraid of value drift, you should be worried. I’ve written about this: in the Journal of Consciousness Studies I commented on a Chalmers paper, saying that generically through history, each generation has had to deal with the fact that the next and coming generations were out of their control.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Not just that: they were out of their control and their values were changing. Unless you can find some way to put some bound on that sort of value change, you’ve got to model it as a random walk; you could go off to the edge, go off arbitrarily far. That means that typically in history, people, if they thought about it, would realize we have relatively little control over where this is all going. That’s just been a generic problem we’ve all had to deal with, all through history; AI doesn’t fundamentally change that fact, but people focus on that thing that could happen with AI, too.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I mean, obviously when we make our first AIs, we will make them correspond to our values in many ways, even if we don’t do it consciously; they will be fitting into our world. They will be our agents, so they will have structures and arrangements that will achieve our ends. So then the argument is, “Yes, but they could drift from there, because we don’t have a very solid control mechanism to make sure they don’t change a lot, so they could change a lot.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s very much true, but that’s still true for human culture and their descendants as well: they can also change a lot. We don’t have very much assurance. I think some people just say, “Yeah, but there’s just some common human nature that’ll make sure it doesn’t go too far.” I’m not seeing that. Sorry. There isn’t. That’s not much of an assurance. People can change people, even culturally–and especially later on, when we can change minds more directly, start tinkering, share minds more directly. Or even today, we have better propaganda, better mechanisms of persuasion. We can drift off in many directions a long way.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        This is sort of switching topics a little bit, but it’s digging into your general disagreement with some key arguments about AI safety. It’s about your views on intelligence. So you’ve written that there may well be no powerful general theories to be discovered revolutionizing AI, and this is related to your view that most everything we’ve learned about intelligence suggests that the key to smarts is having many not-fully-general tools. Human brains are smart mainly by containing many powerful, not-fully-general modules and using many modules to do each task.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;You’ve written that these considerations are one of the main reasons you’re skeptical about AI. I guess the question is, can you think of evidence that might change your mind? I mean, the general question is just to dig in on this train of thought; so is there evidence that would change your mind about this general view of intelligence? And relatedly, why do you think that other people arrive at different views of what intelligence is, and why we could have general laws or general breakthroughs in intelligence?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:      &amp;lt;/strong&amp;gt; This is closely related to the lumpiness question. I mean, basically you can not only talk about the lumpiness of changes in capacities, i.e., lumpiness in innovations. You can also talk about the lumpiness of tools in our toolkit. If we just look in industry, if we look in academia, if we look in education, just look in a lot of different areas, you will find robustly that most tools are more specific tools.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Most of the value of tools–of the integral–is in more specific tools, and relatively little of it is in the most general tools. Again, that’s true of things you learn in school, it’s true of things you learn on the job, it’s true of things that companies learn that can help them do things. It’s true of the advantages that nations have over other nations. Again, just robustly, if you look at what you know and how valuable each thing is, most of the value is in lots of little things, and relatively little is in a few big things.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;There’s a power-law distribution, with most of the mass in the small things. It’s a similar sort of lumpiness distribution to the lumpiness of innovation. It’s understandable: if tools have that sort of lumpy distribution, then if each innovation improves a tool by some percentage–even a distribution of percentages–most of the improvements will be to small things, and therefore most of the improvements will be small.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Few of the improvements will be big things; even a big improvement in a big thing will still be a small part of the overall distribution. So lumpiness in the size of tools, or the size of the things we have as tools, predicts that in intelligence as well, most of the things that make you intelligent are lots of little things. It comes down to, “Is intelligence different?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Again, that’s also the claim about, “Is intelligence innovation different?” If, of course, you thought intelligence was fundamentally different in there being fewer and bigger lumps to find, then that would predict that in the future we would find fewer, bigger lumps, because that’s what there is to find. You could say, “Well, yes. In the past we’ve only ever found small lumps, but that’s because we weren’t looking at the essential parts of intelligence.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Of course, I’ll very well believe that related to intelligence, there are lots of small things. You might believe that there are also a few really big things, and that the reason that in the past computer science or education innovation hasn’t found many of them is that we haven’t come to the mother lode yet. The mother lode is still to be found, and when we find it, boy, it’ll be big. The belief that you’ll find that in intelligence innovation is related to a belief that it exists, that it’s a thing to find; relatedly, we can believe that, fundamentally, intelligence is simple.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Fundamentally, there’s some essential simplicity to it that when you find it, the pieces will be … each piece is big, because there aren’t very many pieces, and that’s implied by it being simple. It can’t be simple unless … if there’s 100,000 pieces, it’s not simple. If there’s 10 pieces, it could be simple, but then each piece is big. Then the question is, “What reason do you have to believe that intelligence is fundamentally simple?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think, in academia, we often try to find simple essence in various fields. So there’d be the simple theory of utilitarianism, or the simple theory of even physical particles, or simple theory of quantum mechanics, or … so if your world is thinking about abstract academic areas like that, then you might say, “Well, in most areas, the essence is a few really powerful, simple ideas.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;You could kind of squint and see academia in that way. You can’t see the industrial world that way. That is, we have much clearer data about the world of biological organisms competing, or firms competing, or even nations competing. We have much more solid data about that, to say, “It’s really lots of little things.” Then you might say, “Yeah, but intelligence. That’s more academic.” Because your idea of intelligence is sort of intrinsically academic: you think of intelligence as the sort of thing that is best exemplified in the best academics.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;If your model is ordinary stupid people, they have a stupid, poor intelligence, but they just know a lot, or have some charisma, or whatever it is; but Von Neumann, look at that. That’s what real intelligence is. Von Neumann, he must’ve had just five things that were better. It couldn’t have been 100,000 things that were better; it had to be five core things that were better, because, you see, he’s able to produce these very simple, elegant things, and he was so much better, or something like that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I actually do think this account is true: many people have these sorts of core emotional, attitudinal relationships to the concept of intelligence. And that colors a lot of what they think about intelligence, including about artificial intelligence. That’s not necessarily tied to the data we have on variations, and productivity, and performance, and all that sort of thing. It’s more about essential abstract things. Certainly if you’re really into math, in the world of math there are core axioms or core results that are very lumpy and powerful.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Of course even there, again, math citations follow exactly the same distribution as all the other fields. By the citation measure, math is not more lumpy. But still, when you think about math, you like to think about these core, elegant, powerful results, seeing them as the essence of it all.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
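+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;[Note: As a minimal sketch of the citation-lumpiness comparison above, the Python snippet below measures lumpiness as the share of citations captured by the top 1% of papers. The lognormal parameters and the top_share helper are invented for illustration, not fitted to real citation data.]&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;
+ # Hypothetical sketch: compare citation 'lumpiness' across two fields by the
+ # share of citations going to the most-cited papers. The lognormal shape
+ # parameters below are assumptions for illustration, not fitted values.
+ import numpy as np
+ 
+ rng = np.random.default_rng(0)
+ 
+ def top_share(citations, frac=0.01):
+     # Fraction of all citations captured by the top `frac` of papers.
+     ranked = np.sort(citations)[::-1]
+     k = max(1, int(frac * len(ranked)))
+     return ranked[:k].sum() / ranked.sum()
+ 
+ # Same sigma, hence same shape and same lumpiness, despite different means.
+ math_cites = rng.lognormal(mean=1.0, sigma=1.3, size=100_000)
+ bio_cites = rng.lognormal(mean=2.0, sigma=1.3, size=100_000)
+ 
+ print(f'math top-1% share:    {top_share(math_cites):.2f}')
+ print(f'biology top-1% share: {top_share(bio_cites):.2f}')
+ &amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
+ &amp;lt;/HTML&amp;gt;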
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        So you mentioned Von Neumann and people have a tendency to think that there must be some simple difference between Von Neumann and us. Obviously the other comparison people make which you’ve written about is the comparison between us as a species and other species. I guess, can you say a little bit about how you think about human uniqueness and maybe how that influences your viewpoint on intelligence?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Sure. That, we have literatures that I just defer to. I mean, I’ve read enough to think I know what they say and that they’re relatively in agreement and I just accept what they say. So what the standard story is then, humans’ key difference was an ability to support cultural evolution. That is, human mind capacities aren’t that different from a chimpanzee’s overall, and an individual [human] who hasn’t had the advantage of cultural evolution isn’t really much better.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;The key difference is that we found a way to accumulate innovations culturally. Now obviously there’s some difference in the sense that it does seem hard; even though we’ve tried today to teach culture to chimps, we’ve also had some remarkable successes. But still it’s plausible that there’s something they don’t have quite good enough yet that lets them do that, and then the innovations that made a difference have to be centered around that in some sense.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I mean, obviously most likely in a short period of time, a whole bunch of independent unusual things didn’t happen. More likely there was one biggest thing that happened that was the most important. Then the question is what that is. We know lots of differences of course. This is the “what made humans different” game. There’s all these literatures about all these different ways humans were different. They don’t have hair on their skin, they walk upright, they have fire, they have language, blah, blah, blah.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;The question is, “Which of these matter?” Because they can’t all be the fundamental thing that matters. Presumably, if they all happen in a short time, something was more fundamental that caused most of them. The question is, “What is that?” But it seems to me that the standard answer is right, it was cultural evolution. And then the question is, “Well, okay. But what enabled cultural evolution?” Language certainly seems to be an important element, although it also seems like humans, even before they had language, could’ve had some faster cultural evolution than a lot of other animals.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then the question is, “How big a brain difference or structure difference would it take?” Then it seems like well, if you actually look at the mechanisms of cultural evolution, the key thing is sitting next to somebody else watching what they’re doing, trying to do what they’re doing. So that takes certain observation abilities, and it takes certain mirroring abilities, that is, the ability to just map what they’re doing onto what you’re doing. It takes sort of fine-grained motor control abilities to actually do whatever it is they’re doing.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Those seem like just relatively modest incremental improvements on some parameters, like chimps weren’t quite up to that. Humans could be more up to that. Even our language ability seems like, well, we have modestly different structured mouths that can more precisely control sounds and chimps don’t quite do that, so it’s understandable why they can’t make as many sounds as distinctly. The bottom line is that our best answer is it looks like there was a threshold passed, sort of ability supporting cultural evolution, which included the ability to watch people, the ability to mirror it, the ability to do it yourself, the ability to tell people through language or through more things like that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It looks roughly like there was just a threshold passed, and that threshold allowed cultural evolution, and that’s allowed humans to take off. If you’re looking for some fundamental, architectural thing, it’s probably not there. In fact, of course people have said when you look at chimp brains and human brains in fine detail, you see pretty much the same stuff. It isn’t some big overall architectural change, we can tell that. This is pretty much the same architecture.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Looks like it’s some tools we are somewhat better at and plausibly those are the tools that allow us to do cultural evolution.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        Yeah. I think that might be it for my questions on human uniqueness.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        I want to briefly go back to, I think I sort of mentioned this question, but we didn’t quite address it. At what timescale do you think people–how far out do you think people should be starting maybe the field building stuff, or starting actually doing work on AI? Maybe number of years isn’t a good metric for this, but I’m still curious for your take.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Well, first of all, let’s make two different categories of effort. One category of effort is actually solving actual problems. Another category of effort might be just sort of generally thinking about the kind of problems that might appear and generally categorizing and talking about them. So most of the effort that will eventually happen will be in the first category. Overwhelmingly, most of the effort, and appropriately so.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I mean, that’s true today for cars or nuclear weapons or whatever it is. Most of the effort is going to be dealing with the actual concrete problems right in front of you. That effort, it’s really hard to do much before you actually have concrete systems that you’re worried about, and the concrete things that can actually go wrong with them. That seems completely appropriate to me.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I would say that sort of effort is mostly, well, you see stuff and it goes wrong, deal with it. Ahead of seeing problems, you shouldn’t be doing that. You could today be dealing with computer security, you can be dealing with hackers and automated tools to deal with them, you could be dealing with deep fakes. I mean, it’s fine time now to deal with actual, concrete problems that are in front of people today.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But thinking about problems that could occur in the future, that you haven’t really seen the systems that would produce them or even the scenarios that would play out, that’s much more the other category of effort, is just thinking abstractly about the kinds of things that might go wrong, and maybe the kinds of architectures and kinds of approaches, et cetera. That, again, is something that you don’t really need that many people to do. If you have 100 people doing it, probably enough.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Even 10 people might be enough. It’s more about how much, again, of this mind space in altruistic futurism it takes up; you don’t need very much of that mind space to do it at all, really. Then that’s more the thing I complain that there’s too much of. Again, it comes down to how unusual the scenarios will be where the problem starts. Today, cars can have car crashes, but each crash is a pretty small crash, and happens relatively locally, and doesn’t kill that many people. You can wait until you see actual car crashes to think about how to deal with car crashes.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then the key question is, “How far do the scenarios we worry about deviate from that?” I mean, most problems in our world today are like that. Most things that go wrong in systems go wrong on a small scale pretty frequently, and therefore you can look at actual pieces of things that have gone wrong to inform your efforts. There are some times where, exceptionally, we anticipate problems that we never see, or even anticipate institutional problems that we never see, or worry that by the time the problem gets here, it’ll be too late.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Those are really unusual scenarios and problems. The big question about AI risk is what fraction of the problems that we will face about AI will be of that form. And then, to what extent can we anticipate those now? Because in the year 1,000, it would’ve been still pretty hard to figure out the unusual scenarios that might bedevil military hardware purchasing or something. Today we might say, “Okay, there’s some kind of military weapons we can build that yes, we can build them, but it might be better once we realize they can be built and then have a treaty with the other guys to have neither of us build them.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Sometimes that’s good for weapons. Okay. That was not very common 1,000 years ago. That’s a newer thing today, but 1,000 years ago, could people have anticipated that, and then what usefully could they have done other than say, “Yeah, sometimes it might be worth having a treaty about not building a weapon if you figure out it’d be worse for you if you both have it.” I’m mostly skeptical that there are sort of these big things that you have to coordinate ahead of time, that you have to anticipate, that if you wait it’s too late, that you won’t see actual concrete signatures of the problems before you have to invent solutions for them.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Even today, large systems, you often tend to have to walk through a failure analysis. You build a large nuclear plant or something, and then you go through and try to ask everything that could go wrong, or every pair of things that could go wrong, and ask, “What scenarios would those produce?,” and try to find the most problematic scenarios. Then ask, “How can we change the design of it to fix those?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s the kind of exercise we do today where we imagine problems that most of which never occur. But for that, you need a pretty concrete design to work with. You can’t do that very abstractly with the abstract idea. For that you need a particular plan in front of you, and now you can walk through concrete failure modes of all the combinations of this strut will break, or this pipe will burst, and all those you walk through. It’s definitely true that we often analyze problems that never appear, but it’s almost never in the context of really abstract sparse descriptions of systems.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:       &amp;lt;/strong&amp;gt; Got you. Yeah. We’ve been asking people a standard question which I think I can maybe guess your answer to. But the question is: what’s your credence that in a world where we didn’t have these additional EA-inspired safety efforts, what’s your credence that in that world AI poses a significant risk of harm? I guess this question doesn’t really get at how much efforts now are useful, it’s just a question about general danger.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       There’s the crying wolf effect, and I’m particularly worried about it. For example, space colonization is a thing that could happen eventually. And for the last 50 years, there have been enthusiasts who have been saying, “It’s now. It’s now. Now is the time for space colonization.” They’ve been consistently wrong. For the next 50 years, they’ll probably continue to be consistently wrong, but everybody knows there’s these people out there who say, “Space colonization. That’s it. That’s it.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Whenever they hear somebody say, “Hey, it’s time for space colonization,” they go, “Aren’t you one of those fan people who always says that?” The field of AI risk kind of has that same problem where again today, but for the last 70 years or even longer, there have been a subset of people who say, “The robots are coming, and it’s all going to be a mess, and it’s now. It’s about to be now, and we better deal with it now.” That creates sort of a skepticism in the wider world that you must be one of those crazies who keep saying that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That can be worse for when there really is, when we really do have the possibility of space colonization, when it is really the right time, we might well wait too long after that, because people just can’t believe it, because they’ve been hearing this for so long. That makes me worried that this isn’t a positive effect: calling a lot of attention to a problem, and then having people experience it as not a problem, when it looks like you didn’t realize that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Now, if you just say, “Hey, this nuclear power plant type could break. I’m not saying it will, but it could, and you ought to fix that,” that’s different than saying, “This pipe will break, and that’ll happen soon, and better do something.” Because then you lose credibility when the pipe doesn’t usually break.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:     &amp;lt;/strong&amp;gt;   Just as a follow-up, I suppose the official line for most people working on AI safety is, as it ought to be, there’s some small chance that this could matter a lot, and so we better work on it. Do you have thoughts on ways of communicating that that’s what you actually think so that you don’t have this crying wolf effect?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Well, if there are only the 100 experts, and not the 100,000 fans, this would be much easier. That does happen in other areas. There are areas in the world where there are only 100 experts and there aren’t 100,000 fans screaming about it. Then the experts can be reasonable and people can say, “Okay,” and take their word seriously, although they might not feel too much pressure to listen and do anything. You can say that about computer security today, for example: the public doesn’t scream a bunch about computer security.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;The experts say, “Hey, this stuff. You’ve got real computer security problems.” They say it cautiously and with the right degree of caveats that they’re roughly right. Computer security experts are roughly right about those computer security concerns that they warn you about. Most firms say, “Yeah, but I’ve got these business concerns immediately, so I’m just going to ignore you.” So we continue to have computer security problems. But at least from a computer security expert’s point of view, they aren’t suffering from the perception of hyperbole or actual hyperbole.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But that’s because there aren’t 100,000 fans of computer security out there yelling with them. But AI risk isn’t like that. AI risk, I mean, it’s got the advantage of all these people pushing and talking which has helped produce money and attention and effort, but it also means you can’t control the message.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        Are you worried that this reputation effect or this impression of hyperbole could bleed over and harm other EA causes or EA’s reputation in general, and if so are there ways of mitigating that effect?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Well again, the more popular anything is, the harder it is for any center to mitigate whatever effects there are of popular periphery doing whatever they say and do. For example, I think there are really quite reasonable conservatives in the world who are at the moment quite tainted with the alt-right label, and there is an eager population of people who are eager to taint them with that, and they’re kind of stuck.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;All they can do is use different vocabularies, have a different style and tone when they talk to each other, but they are still at risk for that tainting. A lot depends on the degree to which AI risk is seen as central to EA. The more it’s perceived as a core part of EA, then later on when it’s perceived as having been overblown and exaggerated, then that will taint EA. Not much way around that. I’m not sure that matters that much for EA though.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I mean I don’t see EA as driven by popularity or popular attention. It seems it’s more a group of people who– it’s driven by the internal dynamics of the group and what they think about each other and whether they’re willing to be part of it. Obviously in the last century or so, we just had these cycles of hype about AI, so that’s … I expect that’s how this AI cycle will be framed– in the context of all the other concern about AI. I doubt most people care enough about EA for that to be part of the story.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I mean, EA has just a little, low presence in people’s minds in general, that unless it got a lot bigger, it just would not be a very attractive element to put in the story to blame those people. They’re nobody. They don’t exist to most people. The computer people exaggerate. That’s a story that sticks better. That has stuck in the past.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Yeah. This is zooming out again, but I’m curious: kind of around AI optimism, but also just in general around any of the things you’ve talked about in this interview, what sort of evidence you think that either we could get now, or might plausibly see in the future would change your views one way or the other?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Well, I would like to see much more precise and elaborated data on the lumpiness of algorithm innovations and AI progress. And of course data on whether things are changing different[ly] now. For example, forgetting his name, somebody did a &amp;lt;a href=&amp;quot;https://www.milesbrundage.com/blog-posts/alphago-and-ai-progress&amp;quot;&amp;gt;blog post a few years ago&amp;lt;/a&amp;gt; right after AlphaGo, saying this Go achievement seemed off trend if you think about it by time, but not if you thought about it by computing resources devoted to the problem. If you looked at past level of Go ability relative to computer resources, then it was on trend, it wasn’t an exception.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
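+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;[Note: A minimal sketch of the calibration exercise described above: checking whether a new result is off trend against calendar time but on trend against computing resources. The data points and the residual_of_last_point helper are invented placeholders, not real Go-program measurements.]&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;
+ # Is the newest result 'off trend' against time but 'on trend' against
+ # compute? All numbers below are hypothetical placeholders.
+ import numpy as np
+ 
+ years = np.array([2008, 2010, 2012, 2014, 2016])
+ log_compute = np.array([1.0, 2.0, 3.5, 5.0, 8.0])  # hypothetical log10 FLOPs
+ strength = np.array([1200, 1500, 1900, 2400, 3500])  # hypothetical Elo
+ 
+ def residual_of_last_point(x, y):
+     # Fit a line to all but the last point; report the last point's residual.
+     slope, intercept = np.polyfit(x[:-1], y[:-1], deg=1)
+     return y[-1] - (slope * x[-1] + intercept)
+ 
+ print('residual vs. year:       ', residual_of_last_point(years, strength))
+ print('residual vs. log compute:', residual_of_last_point(log_compute, strength))
+ # A large residual vs. year but a small one vs. log compute would match the
+ # 'on trend by computing resources' reading of the AlphaGo result.
+ &amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
+ &amp;lt;/HTML&amp;gt;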
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Any case, that’s relevant to the lumpiness issue, right? So the more that we could do a good job of calibrating how unusual things are, the more that we would be able to talk about whether we are seeing unusual stuff now. That’s often the way this conversation goes: “Is this time different? Are we seeing unusual stuff now?” In order to do that, you want to be able to calibrate this progress as clearly as possible.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Obviously, if you could make some metric for each AI advance such that you could talk about how important it was, by some relative weighting of different fields, different kinds of advances, and different kinds of metrics for advances, then you could have some statistics tracking over time the size of improvements and whether that was changing.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I mean, I’ll also make a pitch for the data thing that I’ve just been doing for the last few years, which is the data on automation per job in the US, and the determinants of that and how that’s changed over time, and its impact over time. Basically there’s a dataset called O*NET that breaks jobs in the US into about 800 job categories, and for each job in the last 20 years, at some random times, some actual people went and rated each job on a one to five scale of how automated it was.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Now we have those ratings. We are able to say what predicts which jobs are how automated, and has that changed over time? The answer is, we can predict pretty well: just 25 variables let us predict half the variance in which jobs are automated, and they’re pretty mundane things, they’re not high-tech, sexy things. It hasn’t changed much in 20 years. In addition, we can ask, when jobs get more or less automated, how does that impact the number of employees and their wages? We find almost no impact on those things.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
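+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;[Note: A minimal sketch, on synthetic data, of the analysis described above: regress a per-job automation rating on mundane job features and report the explained variance. The feature matrix and noise model are stand-ins, not the actual O*NET schema.]&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;
+ # Sketch of the regression described above: predict a 1-5 automation rating
+ # from ordinary job features and report R^2. Data are synthetic stand-ins.
+ import numpy as np
+ from sklearn.linear_model import LinearRegression
+ 
+ rng = np.random.default_rng(0)
+ n_jobs, n_features = 800, 25  # ~800 O*NET job categories, 25 predictors
+ 
+ X = rng.normal(size=(n_jobs, n_features))  # mundane job descriptors
+ true_weights = rng.normal(size=n_features)
+ signal = X @ true_weights
+ # Noise scaled to the signal, so the features explain about half the variance.
+ automation = signal + rng.normal(scale=np.std(signal), size=n_jobs)
+ 
+ model = LinearRegression().fit(X, automation)
+ print(f'R^2 = {model.score(X, automation):.2f}')  # roughly 0.5
+ &amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
+ &amp;lt;/HTML&amp;gt;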
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;A data series like that, if you kept tracking it over time, if there were a deviation from trend, you might be able to see it, you might see that the determinants of automation were changing, that the impacts were changing. This is of course just tracking actual AI impacts, not sort of extreme tail possibilities of AI impacts, right?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Of course, this doesn’t break it down into AI versus other sources of automation. Most automation has nothing to do with AI research. It’s making a machine that whizzes and does something that a person was doing before. But if you could then find a way to break that down by AI versus not, then you could more focus on, “Is AI having much impact on actual business practice?,” and seeing that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Of course, that’s not really supporting the early effort scenario. That would be in support of, “Is it time now to actually prepare people for major labor market impacts, or major investment market impacts, or major governance issues that are actually coming up because this is happening now?” But you’ve been asking about, “Well, what about doing stuff early?” Then the question is, “Well, what signs would you have that it’s soon enough?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Honestly, again, I think we know enough about how far away we are from where we need to be, and we know we’re not close, and we know that progress is not that lumpy. So we can see, we have a ways to go. It’s just not soon. We’re not close. It’s not time to be doing things you would do when you are close or soon. But the more that you could have these expert judgments of, “for any one problem, how close are we?,” and it could just be a list of problematic aspects of problems and which of them we can handle so far and which we can’t.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then you might be able to, again, set up a system that when you are close, you could trigger people and say, “Okay, now it’s time to do field building,” or public motivation, or whatever it is. It’s not time to do it now. Maybe it’s time to set up a tracking system so that you’ll find out when it’s time.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        On that cluster of issues surrounding human uniqueness, other general laws of intelligence, is there evidence that could change your mind on that? I don’t know. Maybe it could come from psychology, or maybe it could come from anthropology, new theories of human uniqueness, something like that?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       The most obvious thing is to show me actual big lumpy, lumpy innovations that made a big difference to the performance of the system. That would be the thing. Like I said, for many years I was an AI researcher, and I noticed that researchers often created systems, and systems have architectures. So their paper would have a box diagram for an architecture, and explain that their system had an architecture and that they were building on that architecture.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But it seemed to me that in fact, the architectures didn’t make as much difference as they were pretending. In the performance of the system, most systems that were good, were good because they just did a lot of work to make that whole architecture work. But you could imagine doing counterfactual studies where you vary the effort that goes into filling out the content of a system, and you vary the architecture. You quantitatively find out how much architecture matters.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;There could be even already existing data out there in some form or other that somebody has done the right sort of studies. So it’s obvious that architecture makes some difference. Is it a factor of two? Is it 10%? Is it a factor of 100? Or is it 1%? I mean, that’s really what we’re arguing about. If it’s a factor of 10% then you say, “Okay, it matters. You should do it. You should pay attention to that 10%. It’s well worth putting the effort into getting that 10%.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But it doesn’t make that much of a difference in when this happens and how big it happens. Right? Or if architecture is a factor of 10 or 100, now you can have a scenario where somebody finds a better architecture and suddenly they’re a factor of 100 better than other people. Now that’s a huge thing. That would be a way to ask the question, “How much of an advance can a new system get relative to other systems?”: to say, “How much of a difference does a better architecture make?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And that’s a thing you can actually study directly by having people make systems with different architectures, put different amounts of effort into them, et cetera, and see what difference it makes.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
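+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;[Note: A minimal sketch of the counterfactual study described above: vary architecture and engineering effort factorially, then compare the size of the two effects. The benchmark scores are invented for illustration.]&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;
+ # Factorial comparison of architecture vs. effort. Scores are hypothetical.
+ import numpy as np
+ 
+ # rows: architecture A, architecture B; columns: low effort, high effort
+ scores = np.array([[0.50, 0.70],
+                    [0.55, 0.78]])  # invented benchmark scores
+ 
+ arch_multiplier = scores[1].mean() / scores[0].mean()
+ effort_multiplier = scores[:, 1].mean() / scores[:, 0].mean()
+ 
+ print(f'architecture multiplier: {arch_multiplier:.2f}x')
+ print(f'effort multiplier:       {effort_multiplier:.2f}x')
+ # A multiplier near 1.1x supports the 'ten percent' reading; a 10x or 100x
+ # multiplier would support the 'big lumpy architecture' reading.
+ &amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
+ &amp;lt;/HTML&amp;gt;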
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        Right. And I suspect that some people think that homo sapiens are such a data point, and that it sounds like you disagree with how they’ve construed that. Do you think there’s empirical evidence waiting to change your mind, or do you think people are just sort of misconstruing it, or are ignorant, or just not thinking correctly about what we should make of the fact of our species dominating the planet?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Well, there’s certainly a lot of things we don’t know as well about primate abilities, so again, I’m reflecting what I’ve read about cultural evolution and the difference between humans and primates. But you could do more of that, and maybe the preliminary indications that I’m hearing about are wrong. Maybe you’ll find out that no, there is this really big architectural difference in the brain that they didn’t notice, or that there’s some more fundamental capability introduction.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;For example, abstraction is something we humans do, and we don’t see animals doing much of it, but this construal-level theory thing I described and standard brain architecture says actually all brains have been organized by abstraction for a long time. That is, we see a dimension of the brain which is the abstract to the concrete, and we see how it’s organized that way. But we humans seem to be able to talk about abstractions in ways that other animals don’t.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;So a key question is, “Do we have some extra architectural thing that lets us do more with abstraction?” Because again, most brains are organized by abstraction and concrete. That’s just one of the main dimensions of brains. The forebrain versus the hindbrain is concrete versus abstract. Then the more we just knew about brain architecture and why it was there, the more we can concretely say whether there was a brain architectural innovation from primates to humans.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But everything I’ve heard says it seems to be mostly a matter of relevant emphasis of different parts, rather than some fundamental restructuring. But even small parts can be potent. So one way actually to think about it is that most ordinary programs spend most of the time in just a few lines of code. So if you have 100,000 lines of code, it could still be that there are only 100 lines of code where 90% of the time is being spent. That doesn’t mean those 100,000 lines don’t matter. When you think about implementing code on the brain, you realize that because the brain is parallel, whatever code 90% of the time is being spent in, that’s going to be 90% of the volume of the brain.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Those other 100,000 lines of code will take up relatively little space, but they’re still really important. A key issue with the brain is you might find out that you understand 90% of the volume as a simple structure following a simple algorithm, and you can still hardly understand anything about the total algorithm, because it’s all the other parts that you don’t understand, where stuff isn’t executing very often, but it still needs to be there to make the whole thing work. That’s a very problematic thing about understanding brain organization at all.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;You’re tempted to go by volume and try to understand because volume is visible first, and whatever volume you can opportunistically understand, but you could still be a long way off from understanding. Just like if you had any big piece of code and you understood 100 lines of it, out of 100,000 lines, you might not understand very much at all. Of course, if that was the 100 lines that was being executed most often, you’d understand what it was doing most of the time. You’d definitely have a handle on that, but how much of the system would you really understand?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
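+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;[Note: A toy calculation of the profiling intuition above: a hundred ‘hot’ lines can account for 90% of execution time (and, on the parallel-brain analogy, 90% of volume) while being a tiny fraction of the code. The round numbers are the ones used in the interview.]&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;
+ # Hot lines carry most of the runtime but almost none of the codebase.
+ # All numbers are the round figures used in the interview.
+ n_lines = 100_000
+ hot_lines = 100
+ hot_time_share = 0.90
+ 
+ coverage_by_time = hot_time_share       # what studying the hot lines explains
+ coverage_by_code = hot_lines / n_lines  # how much of the logic that really is
+ 
+ print(f'share of runtime in hot lines:  {coverage_by_time:.0%}')
+ print(f'share of the codebase they are: {coverage_by_code:.1%}')
+ # Understanding the hot 0.1% explains most of the runtime (or brain volume)
+ # yet leaves 99.9% of the logic, which still has to be there, unexamined.
+ &amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
+ &amp;lt;/HTML&amp;gt;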
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        We’ve been interviewing a bunch of people. Are there other people who you think have well-articulated views that you think it would be valuable for us to talk to or interview?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       My experience is that I’ve just written on this periodically over the years, but I get very little engagement. Seems to me there’s just a lack of a conversation here. Early on, Eliezer Yudkowsky and I were debating, and then as soon as he and other people just got funding and recognition from other people to pursue, then they just stopped engaging critics and went off on pursuing their stuff.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Which makes some sense, but these criticisms have just been sitting and waiting. Of course, what happens periodically is they are most eager to engage the highest status people who criticize them. So periodically over the years, some high-status person will make a quip, not very thought out, at some conference panel or whatever, and they’ll be all over responding to that, and sending this guy messages and recruiting people to talk to him saying, “Hey, you don’t understand. There’s all these complications.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Which is different from engaging the people who are the longest, most thoughtful critics. There’s not so much of that going on. You are perhaps serving as an intermediary here. But ideally, what you do would lead to an actual conversation. And maybe you should apply for funding to have an actual event where people come together and talk to each other. Your thing could be a preliminary to get them to explain how they’ve been misunderstood, or why your summary missed something; that’s fine. If it could just be the thing that started that actual conversation it could be well worth the trouble.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        I guess related to that, is there anything you wish we had asked you, or any other things sort of you would like to be included in this interview?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       I mean, you sure are relying on me to know what the main arguments are that I’m responding to, since you’re sort of shy about saying, “And here are the main arguments, what’s your response?” Because you’re shy about putting words in people’s mouths, but it makes it harder to have this conversation. If you were taking a stance and saying, “Here’s my positive argument,” then I could engage you more.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I would give you a counterargument, you might counter-counter. If you’re just trying to roughly summarize a broad range of views then I’m limited in how far I can go in responding here.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Right. Yeah. I mean, I don’t think we were thinking about this as sort of a proxy for a conversation.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       But it is.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        But it is. But it is, right? Yeah. I could maybe try to summarize some of the main arguments. I don’t know if that seems like something that’s interesting to you? Again, I’m at risk of really strawmanning some stuff.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Well, this is intrinsic to your project. You are talking to people and then attempting to summarize them.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        That’s right, that’s right.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       If you thought it was actually feasible to summarize people, then what you would do is produce tentative summaries, and then ask for feedback and go back and forth in rounds of honing and improving the summaries. But if you don’t do that, it’s probably because you think even the first round of summaries will not be to their satisfaction and you won’t be able to improve it much.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Which then says you can’t actually summarize that well. But what you can do is attempt to summarize and then use that as an orienting thing to get a lot of people to talk and then just hand people the transcripts and they can get what they can get out of it. This is the nature of summarizing conversation; this is the nature of human conversation.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Right. Right. Right. Of course. Yeah. So I’ll go out on a limb. We’ve been talking largely to people who I think are still more pessimistic than you, but not as pessimistic as say, MIRI. I think the main difference between you and the people we’ve been talking to is… I guess two different things.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;There’s a sort of general issue which is, how much time do we have between now and when AI is coming, and related to that, which I think we also largely discussed, is how useful is it to do work now? So yeah, there’s sort of this field building argument, and then there are arguments that if we think something is 20 years away, maybe we can make more robust claims about what the geopolitical situation is going to look like.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Or we can pay more attention to the particular organizations that might be making progress on this, and how things are going to be. There’s a lot of work around assuming that maybe AGI’s actually going to look somewhat like current techniques. It’s going to look like deep reinforcement and ML techniques, plus maybe a few new capabilities. Maybe from that perspective we can actually put effort into work like interpretability, like adversarial training, et cetera.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Maybe we can actually do useful work to progress that. A concrete version of this, Paul Christiano has this approach that I think MIRI is very skeptical of, addressing prosaic AI – AI that looks very similar to the way AI looks now. I don’t know if you’re familiar with iterated distillation and amplification, but it’s sort of treating this AI system as a black box, which is a lot of what it looks like if they’re in a world that’s close to the one now, because neural nets are sort of black box-y.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Treating it as a black box, there’s some chance that this approach works, where we basically take a combination of smart AIs and use that to sort of verify the safety of a slightly smarter AI, and do that process, bootstrapping. And maybe we have some hope of doing that, even if we don’t have access to the internals of the AI itself. Does that make sense? The idea is sort of to have an approach that works even with black box sorts of AIs that might look similar to the neural nets we have now.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:      &amp;lt;/strong&amp;gt; Right. I would just say the whole issue is how plausible is it that within 20 years we’ll have broad human-level AI on the basis of these techniques that we see now? Obviously the higher probability you think that is, then the more you think it’s worth doing that. Conditional on that assumption, I don’t have any objection at all to his strategies. It would just be, how likely is that? And not only that–it’s okay for him to work on that–it’s just more, how big a fraction of mind space does that take up among the wider space of people worried about AI risk?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Yeah. Many of the people that we’ve talked to have actually agreed that it’s taking up too much mind space, or they’ve made arguments of the form, “Well, I am a very technical person, who has a lot of compelling thoughts about AI safety, and for me personally I think it makes sense to work on this. Not as sure that as many resources should be devoted to it.” I think at least a reasonable fraction of people would agree with that. &amp;lt;em&amp;gt;[Note:&amp;lt;/em&amp;gt; &amp;lt;em&amp;gt;It’s wrong that many of the people we interviewed said this. This comment was on the basis of non-public conversations that I’ve had.]&amp;lt;/em&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:      &amp;lt;/strong&amp;gt; Well, then maybe an interesting follow-up conversation topic would be to say, “what concretely could change the percentage of mind space?” That’s different than … The other policy question is like, “How many research slots should be funded?” You’re asking what are the concrete policy actions that could be relevant to what you’re talking about. The most obvious one I would think is people are thinking in terms of how many research slots should be funded of what sort, when.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But with respect to the mind space, that’s not the relevant policy question. The policy question might be some sense of how many scenarios should these people be thinking in terms of. Or what other scenarios should get more attention.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Yeah, I guess I’m curious on your take on that. If you could just control the mind space in some way, or sort of set what people were thinking about or what directions, what do you think it would look like?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        Very quickly, I think one concrete operationalization of “mind space resource” is what 80,000 Hours tells people to do, with young, talented people say.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       That’s even more plausible. I mean, I would just say, study the future. Study many scenarios in the future other than this scenario. Go actually generate scenarios, explore them, tell us what you found. What are the things that could go wrong there? What are the opportunities? What are the uncertainties? Just explore a bunch of future scenarios and report. That’s just a thing that needs to happen.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Other than AI risk. I mean, AI risk is focused on one relatively narrow set of scenarios, and there’s a lot of other scenarios to explore, so that would be a sense of mind space and career work is just say, “There’s 10 or 100 people working in this other area, I’m not going to be that …”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then you might just say, concretely, the world needs more futurists. If under these … the future is a very important place, but we’re not sure how much leverage we have over it. We just need more scenarios explored, including for each scenario asking what leverage there might be.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then I might say we’ve had a half-dozen books in the last few years about AI risks. How about a book that has a whole bunch of other scenarios, one of which is AI risk which takes one chapter out of 20, and 19 other chapters on other scenarios? And then if people talked about that and said it was a cool book and recommended it, and had keynote speakers about that sort of thing, then it would shift the mind space. People would say, “Yeah. AI risk is definitely one thing, people should be looking at it, but here’s a whole bunch of other scenarios.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Right. I guess I could also try a little bit to zero in…  I think a lot of the differences in terms of people’s estimates for numbers of years are modeling differences. I think you have this more outside view model of what’s going on, looking at lumpiness.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think one other common modeling choice is to say something like, “We think progress in this field is powered by compute; here’s some extrapolation that we’ve made about how compute is going to grow,” and maybe our estimates of how much compute is needed to do some set of powerful things. I feel like with those estimates, then you might think things are going to happen sooner? I don’t know how familiar you are with that space of arguments or what your take is like.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       I have read most all of the AI Impacts blog posts over the years, just to be clear.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Great. Great.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       You have a set of posts on that. So the most obvious data point is maybe we’re near the human equivalent compute level now, but not quite there. We passed the mice level a while ago, right? Well, we don’t have machines remotely capable of doing what mice do. So it’s clear that merely having the computing-power equivalent is not enough. We have machines that went past the cockroach far long ago. We certainly don’t have machines that can do all the things cockroaches can do.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It’s just really obvious I think, looking at examples like that, that computing power is not enough. We might hit a point where we have so much computing power that you can do some sort of fast search. I mean, that’s sort of the difference between machine learning and AI as ways to think about this stuff. When you thought about AI you just thought about, “Well, you have to do a lot of work to make the system,” and it wasn’t just computing. And then it was kind of obvious, well, duh, you need software, hardware’s not enough.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;When you say machine learning people tend to have more hope– Well, we just need some general machine learning algorithm and then you turn that on and then you find the right system and then the right system is much cheaper to execute computationally. The threshold you need is a lot more computing power than the human brain has to execute the search, but it won’t be that long necessarily before we have a lot more.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then now it’s an issue of how simple is this thing you’re searching for and how close are current machine learning systems to what you need? The more you think that a machine learning system like we have now could basically do everything, if only it were big enough and had enough data and computing power, it’s a different perspective than if you think we’re not even close to having the right machine learning techniques. There’s just a bunch of machine learning problems that we know brains have solved that these systems just don’t solve.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Right.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        So on that question, I can’t pull up the exact quote quickly enough, but I may insert it in the transcript, with permission. Paul Christiano has said more or less, in an 80,000 Hours interview, that he’s very unsure, but he suspects that we might be at insect-level capabilities if we devoted, if we wanted to, if people took it upon themselves to take the compute we have and the resources that we have, we could do what insects do.&amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-1-2121&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#easy-footnote-bottom-1-2121&amp;quot; title=&amp;#039;The actual quote is, “Things like, right now we’re kind of at the stage where AI systems are … the sophistication is probably somewhere in the range of insect abilities. That’s my current best guess. And I’m very uncertain about that. … One should really be diving into the comparison to insects now and say, can we really do this? It’s plausible to me that that’s the kind of … If we’re in this world where our procedures are similar to evolution, it’s plausible to me the insect thing should be a good indication, or one of the better indications, that we’ll be able to get in advance.&amp;amp;amp;#8221; from his &amp;amp;lt;a href=&amp;quot;https://80000hours.org/podcast/episodes/paul-christiano-ai-alignment-solutions/&amp;quot;&amp;amp;gt;podcast with 80,000 Hours&amp;amp;lt;/a&amp;amp;gt;.&amp;#039;&amp;gt;&amp;lt;sup&amp;gt;1&amp;lt;/sup&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;He’s interested in maybe concretely testing this hypothesis that you just mentioned, humans and cockroaches. But it sounds like you’re just very skeptical of it. It sounds like you’re already quite confident that we are not at insect level. Can you just say a little bit more about why you think that?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson: &amp;lt;/strong&amp;gt;      Well, there’s doing something a lot like what insects do, and then there’s doing exactly what insects do. And those are really quite different tasks, and the difference is in part how forgiving you are about a bunch of details. I mean, there’s some tasks, you might say image recognition or something, or even Go… Cockroaches are actually managing a particular cockroach body in a particular environment. They’re pretty damn good at that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;If you wanted to make an artificial cockroach that was as good as cockroaches at the thing that the cockroach does, I think we’re a long way off from that. But you might think most of those little details aren’t that important. They’re just a lot of work and that maybe you could make a system that did what you think of as the essential core problems similarly.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Now we’re back to this key issue of the division between a few essential core problems and a lot of small messy problems. I basically think the game is in doing them all. Do it until you do them all. When doing them all, include a lot of the small messy things. So that’s the idea that your brain is 100,000 lines of code, and 90% of the brain volume is 100 of those lines, and then there’s all these little small, swirly structures in your brain that manage the small little swirly tasks that don’t happen very often, but when they do, that part needs to be there.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;What percentage of your brain volume would be enough to replicate before you thought you were essentially doing what a human does? I mean, that is sort of an essential issue. If you thought there were just 100 key algorithms and once you got 100 of them then you were done, that’s different than thinking, “Sure, there’s 100 main central algorithms, plus there’s another 100,000 lines of code that just is there to deal with very, very specific things that happen sometimes.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And evolution has spent a long time searching in the space of writing that code and found these things, and there’s no easy learning algorithm that will find them outside the environment that you were in. This is a key question about the nature of intelligence, really.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:       &amp;lt;/strong&amp;gt; Right. I’m now hijacking this interview to be about this insect project that AI Impacts is also doing, so apologies for that. We were thinking maybe you can isolate some key cognitive tasks that bees can do, and then in simulation have something roughly analogous to that. But it sounds like you’re not quite satisfied with this as a test of the hypothesis, where you can do all the little bee things and control a bee body and wiggle around just like bees do and so forth?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt; I mean, if you could attach it to an artificial bee body and put it in a hive and see what happens, then I’m much more satisfied. If you say it does the bee dance, it does the bee smell, it does the bee touch, I’ll go, “That’s cute, but it’s not doing the bee.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:       &amp;lt;/strong&amp;gt; Then again, it just sounds like how satisfied you are with these abstractions depends on your views of intelligence and how much can be abstracted away–&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       It depends on your view of the nature of the actual problems that most animals and humans face. They’re a mixture of some structures with relative uniformity across a wide range; that’s when abstraction is useful. Plus, a whole bunch of messy details that you just have to get right.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;In some sense I’d be more impressed if you could just make an artificial insect that in a complex environment can just be an insect, and manage the insect colonies, right? I’m happy to give you a simulated house and some simulated dog food, and simulated predators, who are going to eat the insects, and I’m happy to let you do it all in simulation. But you’ve got to show me a complicated world, with all the main actual obstacles that insects have to surviving and existing, including parasites and all sorts of things, right?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And just show me that you can have something that robustly works in an environment like that. I’m much more impressed by that than I would be by you showing an actual physical device that does a bee dance.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal&amp;lt;/strong&amp;gt;:        Yeah. I mean, to be clear, I think the project is more about actually finding a counterexample. If we could find a simple case where we can’t even do this with neural networks, then there’s a persuasive case there.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       But then of course people might a month later say, “Oh, yeah?” And then they work on it and they come up with a way to do that, and there will never be an end to that game. The moment you put up this challenge and they haven’t done it yet–&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Yeah. I mean, that’s certainly a possibility.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        Cool. I guess I’m done for now hijacking this interview to be about bees, but that’s just been something I’ve been thinking about lately.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        I would love to sort of engage with you on your disagreements, but I think a lot of it comes down to this question of how close we are. And I think I only know people’s models for this in the vaguest terms.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I feel like I’m not sure how good I could be in an interview at trying to figure out which of those models is more compelling. Though I do think it’s an interesting project, because it seems like lots of people just have vastly different sorts of timelines models, which they use to produce some kind of number.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Sure. I suppose you might want to ask the people you talk to after me about the relative status of inside and outside arguments, and about who has the burden of proof with respect to which audiences.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Right. Right. I think that’s a great question.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       If we’ve agreed that the outside view doesn’t support short time scales of things happening, and we say, “But yes, some experts think they see something different in their expert views of things with an inside view,” then we can say, “Well, how often does that happen?” We can make the outside view of that. We can say, “Well, how often do inside experts think they see radical potential that they are then inviting other people to fund and support, and how often are they right?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:       &amp;lt;/strong&amp;gt; Right. I mean, I don’t think it’s just inside/outside view. I think there are just some outside view arguments that make different modeling choices that come to different conclusions.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       I’d be most willing to engage those. I think a lot of people are sort of making an inside/outside argument where they’re saying, “Sure, from the outside this doesn’t look good, but here’s how I see it from the inside.” That’s what I’ve heard from a lot of people.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Yeah. Honestly my impression is that not a lot of people have spent much time on this–a lot of people, when they give us numbers, are like, “This is really a total guess.” So I think a lot of the argument either comes from people who have very specific compute-based models for things that are short [timelines], or from people who I think haven’t spent that much time creating precise models, but have models that are compelling enough that they’re like, “Oh, maybe I should work on this, slash the chance of this is scary enough.” I haven’t seen a lot of very concrete models. Partially I think that’s because there’s an opinion in the community that if you have concrete models, especially if they argue for things being very soon, maybe you shouldn’t publish those.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt; Right, but you could still ask the question, “Set aside everything you know except what this looks like from the outside. Looking at that, would you still predict stuff happening soon?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, I think that’s a good question to ask. We can’t really go back and add that to what we’ve asked people, but yeah.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt; I think more people, even most, would say, “Yeah, from the outside, this doesn’t look so compelling.” That’s my judgement, but again, they might say, “Well, the usual way of looking at it from the outside doesn’t, but then, here’s this other way of looking at it from the outside that other people don’t use.” That would be a compromise sort of view. And again, I guess there’s this larger meta-question really of who should reasonably be moved by these things. That is, if there are people out there who specialize in chemistry or business ethics or something else, and they hear these people in AI risk saying there’s these big issues, is the evidence that’s being offered by these insiders the sort of thing that the insiders think should be compelling to these outsiders?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, I think I have a question about that too. Especially, I think–we’ve been interviewing largely AI safety researchers, but I think the arguments around why they think AI might be soon or far, look much more like economic arguments. They don’t necessarily look like arguments from an inside, very technical perspective on the subject. So it’s very plausible to me that there’s no particular reason to weigh the opinions of people working on this, other than that they’ve thought about it a little bit more than other people have. &amp;lt;em&amp;gt;[Note: I say ‘soon or far’ here, but I mean to say ‘more or less likely to be harmful’.]&amp;lt;/em&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt; Well, as a professional economist, I would say, if you have good economic arguments, shouldn’t you bring them to the attention of economists and have us critique them? Wouldn’t that be the way this should go? I mean, not all economics arguments should start with economists, but wouldn’t it make sense to have them be part of the critique evaluation cycle?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, I think the real answer is that these all exist vaguely in people’s heads, and they don’t even make claims to having super-articulated and written-down models.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt; Well, even that is an interesting thing if people agree on it. You could say, “You know a lot of people who agree with you that AI risk is big and that we should deal with something soon. Do you know anybody who agrees with you for the same reasons?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It’s interesting, so I did a poll, I’ve done some Twitter polls lately, and I did one on “Why democracy?” And I gave four different reasons why democracy is good. And I noticed that there was very little agreement, that is, relatively equal spread across these four reasons. And so, I mean that’s an interesting fact to know about any claim that many people agree on, whether they agree on it for the same reasons. And it would be interesting if you just asked people, “Whatever your reason is, what percentage of people interested in AI risk agree with your claim about it for the reason that you do?” Or, “Do you think your reason is unusual?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Because if most everybody thinks their reason is unusual, then basically there isn’t something they can all share with the world to convince the world of it. There’s just the shared belief in this conclusion, based on very different reasons. And then it’s more on their authority of who they are and why they as a collective are people who should be listened to or something.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, I agree that that is an interesting question. I don’t know if I have other stuff, Rob, do you?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; I don’t think I do at this time.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt; Well I perhaps, compared to other people, am happy to do a second round should you have questions you generate.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, I think it’s very possible. Thanks so much for talking to us in general.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt; You’re welcome. It’s a fun topic, especially talking with reasonable people.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Oh thank you, I’m glad we were reasonable.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, I’m flattered.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt; You might think that’s a low bar, but it’s not.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Great, we’re going to include that in the transcript. Thank you for talking to us. Have a good rest of your afternoon.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt; Take care, nice talking to you.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ol class=&amp;quot;easy-footnotes-wrapper&amp;quot;&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-bottom-1-2121&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;The actual quote is, “Things like, right now we’re kind of at the stage where AI systems are … the sophistication is probably somewhere in the range of insect abilities. That’s my current best guess. And I’m very uncertain about that. … One should really be diving into the comparison to insects now and say, can we really do this? It’s plausible to me that that’s the kind of … If we’re in this world where our procedures are similar to evolution, it’s plausible to me the insect thing should be a good indication, or one of the better indications, that we’ll be able to get in advance.” from his &amp;lt;a href=&amp;quot;https://80000hours.org/podcast/episodes/paul-christiano-ai-alignment-solutions/&amp;quot;&amp;gt;podcast with 80,000 Hours&amp;lt;/a&amp;gt;.&amp;lt;a class=&amp;quot;easy-footnote-to-top&amp;quot; href=&amp;quot;#easy-footnote-1-2121&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ol&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
  

&lt;/pre&gt;</content>
        <summary>&lt;pre&gt;
@@ -1 +1,1277 @@
+ ====== Conversation with Robin Hanson ======
+ 
+ // Published 13 November, 2019; last updated 20 November, 2019 //
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;AI Impacts talked to economist Robin Hanson about his views on AI risk and timelines. With his permission, we have posted and transcribed this interview.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ 
+ ===== Participants =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;a href=&amp;quot;http://mason.gmu.edu/~rhanson/home.html&amp;quot;&amp;gt;Robin Hanson&amp;lt;/a&amp;gt; — Associate Professor of Economics, George Mason University
+                 &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Asya Bergal – AI Impacts&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;a href=&amp;quot;http://robertlong.online/&amp;quot;&amp;gt;Robert Long&amp;lt;/a&amp;gt; – AI Impacts
+                 &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ===== Summary =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;We spoke with Robin Hanson on September 5, 2019. Here is a brief summary of that conversation:&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks that now is the wrong time to put a lot of effort into addressing AI risk:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;We will know more about the problem later, and there’s an opportunity cost to spending resources now vs later, so there has to be a compelling reason to spend resources now instead.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson is not compelled by existing arguments he’s heard that would argue that we need to spend resources now:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson famously disagrees with the theory that &amp;lt;a href=&amp;quot;http://intelligence.org/files/AIFoomDebate.pdf&amp;quot;&amp;gt;AI will appear very quickly and in a very concentrated way&amp;lt;/a&amp;gt;, which would suggest that we need to spend resources now because we won’t have time to prepare.
+                         &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson views the AI risk problem as essentially continuous with existing principal agent problems, and &amp;lt;a href=&amp;quot;http://www.overcomingbias.com/2019/04/agency-failure-ai-apocalypse.html&amp;quot;&amp;gt;disagrees that the key difference&amp;lt;/a&amp;gt;—the agents being smarter—should clearly worsen such problems.
+                         &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks that we will see concrete signatures of problems before it’s too late– he is skeptical that there are big things that have to be coordinated ahead of time.
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Relatedly, he thinks useful work anticipating problems in advance usually happens with concrete designs, not with abstract descriptions of systems. &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks we are still too far away from AI for field-building to be useful.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks AI is probably at least a century, perhaps multiple centuries away:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks the mean estimate for human-level AI arriving is long, and he thinks AI is unlikely to be ‘lumpy’ enough to happen without much warning:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson is interested in how ‘lumpy’ progress in AI is likely to be: whether progress is likely to come in large chunks or in a slower and steadier stream.
+                           &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Measured in terms of how much a given paper is cited, academic progress is not lumpy in any field.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;The literature on innovation suggests that innovation is not lumpy: most innovation is lots of little things, though once in a while there are a few bigger things.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;From an outside view perspective, the current AI boom does not seem different from previous AI booms.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;We don’t have a good sense of how much research needs to be done to get to human-level AI.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;If we don’t expect progress to be particularly lumpy, and we don’t have a good sense of exactly how close we are, we have good reason to think we are not, e.g., five years away rather than halfway.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks we shouldn’t believe it when AI researchers give 50-year timescales:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Rephrasing the question in different ways, e.g. “When will most people lose their jobs?” causes people to give different timescales.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;People consistently give overconfident estimates when they’re estimating things that are &amp;lt;a href=&amp;quot;https://www.overcomingbias.com/2010/06/near-far-summary.html&amp;quot;&amp;gt;abstract and far away&amp;lt;/a&amp;gt;.
+                         &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks AI risk takes up far too large a fraction of people thinking seriously about the future.
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks more futurists should be exploring other future scenarios, roughly proportionally to how likely they are with some kicker for extremity of consequences.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson doesn’t think that AI is that much worse than other future scenarios in terms of how much future value is likely to be destroyed.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks the key to intelligence is having many not-fully-general tools:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Most of the value in tools is in more specific tools, and we shouldn’t expect intelligence innovation to be different.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Academic fields are often simplified to simple essences, but real-life things like biological organisms and the industrial world progress via lots of little things, and we should expect intelligence to be more similar to the latter examples.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson says the literature on human uniqueness suggests cultural evolution and language abilities came from several modest brain improvements, not clear differences in brain architecture.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson worries that having so many people publicly worrying about AI risk before it is an acute problem will mean it is taken less seriously when it is, because the public will have learned to think of such concerns as erroneous fear mongering. &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson would be interested in seeing more work on the following things:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Seeing examples of big, lumpy innovations that made a big difference to the performance of a system. This could change Hanson’s view of intelligence.
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;In particular, he’d be influenced by evidence for important architectural differences in the brains of humans vs. primates.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Tracking of the automation of U.S. jobs over time as a potential proxy for AI progress.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks there’s a lack of engagement with critics from people concerned about AI risk.
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson is interested in seeing concrete outside-view models people have for why AI might be soon.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson is interested in proponents of AI risk responding to the following questions:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Setting aside everything you know except what this looks like from the outside, would you predict AGI happening soon?&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Should reasoning around AI risk arguments be compelling to outsiders outside of AI?&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;What percentage of people who agree with you that AI risk is big, agree for the same reasons that you do?&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks even if we tried, we wouldn’t now be able to solve all the small messy problems that insects can solve, indicating that it’s not sufficient to have insect-level amounts of hardware.
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Hanson thinks that AI researchers might argue that we can solve the core functionalities of insects, but Hanson thinks that their intelligence is largely in being able to do many small things in complicated environments, robustly.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Small sections of the original audio recording have been removed. The corresponding transcript has been lightly edited for concision and clarity.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ===== Audio =====
+ 
+ 
+ 
+ 
+ ===== Transcript =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Great. Yeah. I guess to start with, the proposition we’ve been asking people to weigh in on is whether it’s valuable for people to be expending significant effort doing work that purports to reduce the risk from advanced AI. I’d be curious for your take on that question, and maybe a brief description of your reasoning there.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Well, my highest level reaction is to say whatever effort you’re putting in, probably now isn’t the right time. When is the right time is a separate question from how much effort, and in what context. AI’s going to be a big fraction of the world when it shows up, so it certainly at some point is worth a fair bit of effort to think about and deal with. It’s not like you should just completely ignore it.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;You should put a fair bit of effort into any large area of life or large area of the world, anything that’s big and has big impacts. The question is just really, should you be doing it way ahead of time, before you know much about it at all, or have many concrete examples, or know even the structure or architecture, how it’s integrated in the economy, what the terms of purchase are, what the terms of relationships are.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I mean, there’s just a whole bunch of things we don’t know about. That’s one of the reasons to wait–because you’ll know more later. Another reason to wait is because of the opportunity cost of resources. If you save the resources until later, you have more to work with. Those considerations have to be weighed against some expectation of an especially early leverage, or an especially early choice point or things like that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;For most things you expect that you should wait until they show themselves in a substantial form before you start to envision problems and deal with them. But there could be exceptions. Mostly it comes down to arguments that this is an exception.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Yeah. I think we’re definitely interested in the proposition that you should put in work now as opposed to later. If you’re familiar with the arguments that this might be an exceptional case, I’d be curious for your take on those and where you disagree.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Sure. As you may know, I got involved in this conversation over a decade ago with my co-blogger Eliezer Yudkowsky, and at that point, the major argument that he brought up was something we now call the Foom Argument.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That argument was a very particular one, that this would appear under a certain trajectory, under a certain scenario. That was a scenario where it would happen really fast, would happen in a very concentrated place in time, and basically once it starts, it happens so fast, you can’t really do much about it after that point. So the only chance you have is before that point.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Because it’s very hard to predict when or where, you’re forced to just do stuff early, because you’re never sure how early is early enough. That’s a perfectly plausible argument given that scenario, if you believe that it shows up in one time and place all of a sudden, fully formed and no longer influenceable. Then you only have the shot before that moment. If you are very unsure when and where that moment would be, then you basically just have to do it now.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But I was doubting that scenario. I was saying that that wasn’t a zero probability scenario, but I was thinking it was overestimated by him and other people in that space. I still think many people overestimate the probability of that scenario. Over time, it seems like more people have distanced themselves from that scenario, yet I haven’t heard as many substitute rationales for why we should do any of this stuff early.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I did a recent blog post responding to a &amp;lt;a href=&amp;quot;https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like&amp;quot;&amp;gt;Paul Christiano post&amp;lt;/a&amp;gt; and my title was &amp;lt;a href=&amp;quot;https://www.overcomingbias.com/2019/04/agency-failure-ai-apocalypse.html&amp;quot;&amp;gt;Agency Failure AI Apocalypse?&amp;lt;/a&amp;gt;, and so at least I saw an argument there that was different from the Foom argument. It was an argument that you’d see a certain kind of agency failure with AI, and that because of that agency failure, it would just be bad.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It wasn’t exactly an argument that we need to put in effort early, though. Even that argument wasn’t per se a reason why you need to do stuff way ahead of time. But it was an argument of why the consequences might be especially bad I guess, and therefore deserving of more investment. And then I critiqued that argument in my post saying he was basically saying the agency problem, which is a standard problem in all human relationships and all organizations, is exacerbated when the agent is smart.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And because the AI is, by assumption, very smart, then it’s a very exacerbated agency problem; therefore, it goes really bad. I said, “Our literature on the agency problem doesn’t say that it’s a worse problem when they’re smart.” I just denied that basic assumption, pointing to what I’ve known about the agency literature over a long time. Basically Paul in his response said, “Oh, I wasn’t saying there was an agency problem,” and then I was kind of baffled because I thought that was the whole point of his post that I was summarizing.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;In any case, he just said he was worried about wealth redistribution. Of course, any large social change has the potential to produce wealth redistribution, and so I’m still less clear why this change would have a bigger wealth redistribution consequence than others, or why it would happen more suddenly, or require an earlier effort. But if you guys have other particular arguments to talk about here, I’d love to hear what you think, or what you’ve heard are the best arguments aside from Foom.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Yeah. I’m at risk of putting words in other people’s mouths here, because we’ve interviewed a bunch of people. I think one thing that’s come up repeatedly is-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       You aren’t going to name them.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Oh, I definitely won’t give a name, but-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       I’ll just respond to whatever-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, just prefacing this, this might be a strawman of some argument. One thing people are sort of consistently excited about is– they use the term ‘field building,’ where basically the idea is: AI’s likely to be this pretty difficult problem and if we do think it’s far away, there’s still sort of meaningful work we can do in terms of setting up an AI safety field with an increasing number of people who have an increasing amount of–the assumption is useful knowledge–about the field.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then there’s another assumption that goes along with that: that if we investigate problems now, even if we don’t know the exact specifics of what AGI might look like, they’re going to share some common subproblems with problems that we may encounter in the future. I don’t know if both of those would count as field building in people’s lexicon.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       The example I would give to make it concrete is to imagine, in the year 1000, tasking people with dealing with various of our major problems in our society today. Social media addiction, nuclear war, concentration of capital and manufacturing, privacy invasions by police, I mean any major problem that you could think of in our world today, imagine tasking people in the year 1000 with trying to deal with that problem.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Now the arguments you gave would sound kind of silly. We need to build up a field in the year 1000 to study nuclear annihilation, or nuclear conflict, or criminal privacy rules? I mean, you only want to build up a field just before you want to use a field, right? I mean, building up a field way in advance is crazy. You still need some sort of argument that we are near enough that the timescale it takes to build a field will roughly match the timescale until we need the field. If it’s a factor of ten off or a thousand off, then that’s crazy.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        Yeah. This leads into a specific question I was going to ask about your views. You’ve written, based on AI practitioners’ estimates of how much progress they’ve been making, that an outside view calculation suggests we probably have at least a century to go, or maybe a great many centuries, at the current rates of progress in AI. That was in 2012. Is that still roughly your timeline? Are there other things that go into your timelines? Basically in general what’s your current AI timeline?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Obviously there’s a median estimate and a mean estimate, and then there’s a probability per-unit-time estimate, say, and obviously most everyone agrees that the median or mean could be pretty long, and that’s reasonable. So they’re focused on some, “Yes, but what’s the probability of an early surprise?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That isn’t directly addressed by that estimate, of course. I mean, you could turn that into a per-unit time if you just thought it was a constant per-unit time thing. That would, I think, be overly optimistic. That would give you too high an estimate. I have a series of blog posts, which you may have seen, on lumpiness. A key idea here would be we’re getting AI progress over time, and how lumpy it is is extremely directly relevant to these estimates.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;For example, if it was maximally lumpy, if it just shows up at one point, like the Foom scenario, then in that scenario, you kind of have to work ahead of time because you’re not sure when. There’s a substantial… if like, the mean is two centuries, but that means in every year there’s a 1-in-200 chance. There’s a half-a-percent chance next year. Half-a-percent is pretty high, I guess we better do something, because what if it happens next year?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Okay. I mean, that’s where extreme lumpiness goes. The less lumpy it is, the smaller the variance around that mean is. It’s just going to take a long time, and it’ll take 10% less or 10% more, but it’s basically going to take that long. The key question is how lumpy it is reasonable to expect these sorts of things to be. I would say, “Well, let’s look at how lumpy things have been. How lumpy are most things? Even how lumpy has computer science innovation been? Or even AI innovation?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think those are all relevant data sets. There’s general lumpiness in everything, and lumpiness of the kinds of innovation that are closest to the kinds of innovation postulated here. I note that one of our best or most concrete measures we have of lumpiness is citations. That is, we can take for any research idea, how many citations the seminal paper produces, and we say, “How lumpy are citations?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Interestingly, citation lumpiness seems to be field independent. Not just time independent, but field independent. Seems to be a general feature of academia, which you might have thought lumpiness would vary by field, and maybe it does in some more fundamental sense, but as it’s translated into citations, it’s field independent. And of course, it’s not that lumpy, i.e. most of the distribution of citations is papers with few citations, and the few papers that have the most citations constitute a relatively small fraction of the total citations.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s what we also know for other kinds of innovation literature. The generic innovation literature says that most innovation is lots of little things, even though once in a while there are a few bigger things. For example, I remember there’s this time series of the best locomotive at any one time. You have that from 1800 or something. You can just see it in speed, or energy efficiency, and you see this point…&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It’s not an exactly smooth graph. On the other hand, it’s pretty smooth. The biggest jumps are a small fraction of the total jumpiness. A lot of technical, social innovation is, as we well understand, a few big things, matched with lots of small things. Of course, we also understand that big ideas, big fundamental insights, usually require lots of complementary, matching, small insights to make it work.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s part of why this trajectory happens this way. That smooths out the overall pace of progress in most areas and makes it effectively less lumpy. It seems to me that the most reasonable default assumption is to assume future AI progress looks like past computer science progress and even past technical progress in other areas. I mean, the most concrete example is AI progress.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I’ve observed that we’ve had these repeated booms of AI concern and interest, and we’re in one boom now, but we saw a boom in the 90s. We saw a boom in the 60s, 70s, we saw a boom in the 30s. In each of these booms, the primary thing people point to is, “Look at these demos. These demos are so cool. Look what they can do that we couldn’t do before.” That’s the primary evidence people tend to point to in all of these areas.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;They just have concrete examples that they were really impressed by. No doubt we have had these very impressive things. The question really is, do we have any evidence that now is different? As opposed to evidence that there will be a big difference in the future. So if you’re asking, “Is now different,” then you’d want to ask, “Are the signs people point to now, i.e. AlphaGo, say, as a dramatic really impressive thing, how different are they in degree from the comparable things that have happened in the past?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;The more you understand the past and see it, you saw how impressed people were back in the past with the best things that happened then. That suggests to me that, I mean AlphaGo is say a lump, I’m happy to admit it looks out of line with a smooth attribution of equal research progress to all teams at all times. But it also doesn’t look out of line with the lumpiness we’ve seen over the last 70 years, say, in computer innovation.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It’s on trajectory. So if you’re going to say, “And we still expect that same overall lumpiness for the next 70 years, or the next 700,” then I’d say it’s about how close we are now. If you just don’t know how close you are, or how much is required, then you’re still going to end up with a relatively random, “When do we reach this threshold where it’s good enough?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;The more you think you have an idea of what’s required and where you are, the more you can ask how far you are. Then if you say you’re only halfway, then you could say, “Well, if it’s taken us this many years to get halfway,” then the odds that we’re going to get all the rest of the way in the next five years are much less than you’d attribute to just randomly assigning say, “It’s going to happen in 200 years, therefore it’ll be one in two hundred per year.” I do think we’re in more of that sort of situation. We can roughly guess that we’re not almost there.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        Can you say a little bit more about how we should think about this question of how close we are?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Sure. The best reliable source on that would be people who have been in this research area for a long time. They’ve just seen lots of problems, they’ve seen lots of techniques, they better understand what it takes to do many hard problems. They have a good sense of where we are, and of how far we ultimately have to go.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think when you don’t understand these things as well by theory or by experience, et cetera, you’re more tempted to look at something like AlphaGo and say, “Oh my God, we’re almost there.” Because you just say, “Oh, look.” You tend more to think, “Well, if we can do human level anywhere, we can do it everywhere.” That was the initial— what people in the 1960s said, “Let’s solve chess, and if we can solve chess, certainly we can do anything.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I mean, something that can do chess, it’s got to be smart. But they just didn’t fully appreciate the range of tasks, and problems, and problem environments, that you need to deal with. Once you understand the range of possible tasks, task environments, obstacles, issues, et cetera, once you’ve been in AI for a long time and have just seen a wide range of those things, then you have more of a sense for “I see, AlphaGo, that’s a good job, but let’s list all these simplifying assumptions you made here that made this problem easier”, and you know how to make that list.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then you’re not so much saying, “If we can do this, we can do anything.” I think pretty uniformly, the experienced AI researchers have said, “We’re not close.” I mean I’d be very surprised if you interviewed any person with a more broad range of AI experience who said, “We’re almost there. If we can do this one more thing we can do everything.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Yeah. I might be wrong about this–my impression is that your estimate of at least a century or maybe centuries might still be longer than a lot of researchers’ estimates–and this might be because there’s this trend where people will just say 50 years about almost any technology or something like that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Sure. I’m happy to walk through that. That’s the logic of that post of mine that you mentioned. It was exactly trying to confront that issue. So I would say there is a disconnect to be addressed. The people you ask are not being consistent when you ask similar things in different ways. The challenge is to disentangle that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I’m happy to admit when you ask a lot of people how long it will take, they give you 40, 50 year sort of timescales. Absolutely true. Question is, should you believe it? One way to check whether you should believe that is to see how they answer when you ask them different ways. I mean, as you know, I guess one of those surveys interestingly said, “When will most people lose their jobs?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;They gave much longer time scales than when will computers be able to do most everything, like a factor of two or something. That’s kind of bothersome. That’s a pretty close consistency relation. If computers can do everything cheaper, then they will, right? Apparently not. But I would think that, I mean, I’ve done some writing on this psychology concept called construal-level theory, which just really emphasizes how people have different ways they think about things conceived abstractly and broadly versus narrowly.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;There’s a consistent pattern there, which is consistent with the pattern we are seeing here, that is in the far mode where you’re thinking abstractly and broadly, we tend to be more confident in simple, abstract theories that have simple predictions and you tend to neglect messy details. When you’re in the near mode and focus on a particular thing, you see all the messy difficulties.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It’s kind of the difference between will you have a happy marriage in life? Sure. This person you’re in a relationship with? Will that work in the next week? I don’t know. There’s all the things to work out. Of course, you’ll only have a happy relationship over a lifetime if every week keeps going okay for the rest of your life. I mean, if enough weeks do. That’s a near/far sort of distinction.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;When you ask people about AI in general and what time scale, that’s a very far mode sort of version of the question. They are aggregating, and they are going on very aggregate sort of theories in their head. But if you take an AI researcher who has been staring at difficult problems in their area for 20 years, and you ask them, “In the problems you’re looking at, how far have we gotten since 20 years ago?,” they’ll be really aware of all the obstacles they have not solved, succeeded in dealing with that, all the things we have not been able to do for 20 years.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That seems to me a more reliable basis for projection. I mean, of course we’re still in a similar regime. If the regime would change, then past experience is not relevant. If we’re in a similar regime of the kind of problems we’re dealing with and the kind of tools and the kind of people and the kind of incentives, all that sort of thing, then that seems to be much more relevant. That’s the point of that survey, and that’s the point of believing that survey somewhat more than the question asked very much more abstractly.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Two sort of related questions on this. One question is, how many years out do you think it is important to start work on AI? And I guess a related question is: now, even given that it’s super unlikely, what’s the ideal number of people working on or thinking about this?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Well, I’ve said many times in many of these posts that it’s not zero at any time. That is, whenever there’s a problem that it isn’t the right time to work on, it’s still the right time to have some people asking if it’s the right time to work on it. You can’t have people asking a question unless they’re kind of working on it. They’d have to be thinking about it enough to be able to ask the question if it’s the right time to work on it.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That means you always need some core of people thinking about it, at least in related areas, such that they are skilled enough to be able to ask the question, “Hey, what do you think? Is this the time to turn and work on this area?” It’s a big world, and eventually this is a big thing, so hey, a dozen could be fine. And given how random academia and the intellectual world are, the intellectual world is not at all optimized in terms of number of people per topic. It’s really not.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Relative to that standard, you could be not unusually misallocated even if you were still pretty random about it. For that, the question is more: for the other purposes for which academic fields exist and perpetuate themselves, how well is it doing? I would basically say, “Academia’s mainly about showing off and credentialing impressiveness.” There are all these topics that are neglected because you can’t credential and impress very well via them. If AI risk were a topic that happened to be unusually good for being impressive with, then it would be an unusually suitable topic for academics to work on.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Not because it’s useful, just because that’s what academics do. That might well be true for ways in which AI problems bring up interesting new conceptual angles that you could explore, or push on concepts that need to be pushed on because they haven’t been generalized in that direction, or just allow formal theorems in a new space of theorems.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Like pushing on decision theory, right? Certainly there’s a point of view from which decision theory was kind of stuck, and people weren’t pushing on it, and then AI risk people pushed on some dimensions of decision theory that people hadn’t… people just did decision theory differently, not because it’s good for AI. Again, how many people is very sensitive to that, right? You might justify 100 people if it was not only about AI risk, but really more about pushing on these other interesting conceptual dimensions.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s why it would be hard to give a very precise answer there about how many. But I actually am less concerned about the number of academics working on it, and more about the percentage of altruistic mind space it takes, because it’s a much higher percentage of that than it is of actual serious research. That’s the part I’m a little more worried about, especially the fraction of people thinking about the future. Just in general, very few people seem to be that willing to think seriously about the future. As a percentage of that space, it’s huge.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s where I most think, “Now, that’s too high.” If you could say, “100 people will work on this as researchers, but the rest of the people who talk and think about the future can talk and think about something else,” that would be a big win for me, because there are tens or hundreds of thousands of people out there on the side just thinking about the future, and so, so many of them are focused on this AI risk thing when they really can’t do much about it. They’ve just told themselves that it’s the thing they can talk about, and they really shame everybody into saying it’s the priority. Hey, there’s other stuff.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Now of course, I have this whole other book, Age of Em, which is about a different kind of scenario that I think doesn’t get much attention, and I think it should get more attention relative to the range of options that people talk about. Again, the AI risk scenario so overwhelmingly sucks up that small fraction of the world that thinks about the future. So a lot of this of course depends on your base. If you’re talking about the percentage of people in the world working on these future things, it’s large of course.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;If you’re talking percentage of people who are serious researchers in AI risk relative to the world, it’s tiny of course. Obviously. If you’re talking about the percentage of people who think about AI risk, or talk about it, or treat it very seriously, relative to people who are willing to think and talk seriously about the future, it’s this huge thing.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        Yeah. That’s perfect. I was just going to … I was already going to ask a follow-up about what share of, I don’t know, effective altruists focused on affecting the long-term future you think it should be. Certainly you think it should be far less than this, is what I’m getting there?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Right. First of all, things should be roughly proportional to probability, except with some kicker for extremity of consequences. But I think you don’t actually know about extremity of consequences until you explore a scenario. Right from the start you should roughly write down scenarios by probability, and then devote effort in proportion to the probability of scenarios.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then once you get into a scenario enough to say, “This looks like a less extreme scenario, this looks like a more extreme scenario,” at that point you might be justified in adjusting some effort in and out of areas based on that judgment. But that has to be a pretty tentative judgment, so you can’t go too far there, because until you explore a scenario a lot, you really don’t know how extreme it is… basically it’s the extremity of outcomes, times the leverage of influence at each point along the path, multiplied together, in the hope that by thinking about it earlier you could produce a better outcome. That’s a lot of uncertainty to multiply through to get this estimate of how important a scenario is as a thing to think about for leverage.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
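+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;Editor’s note: a minimal sketch of the allocation heuristic described above, in which effort is divided across scenarios in proportion to probability times (tentatively estimated) extremity times leverage. The scenario names and numbers are purely illustrative assumptions, not anyone’s actual estimates.&amp;lt;/em&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;
+ # Allocate effort across future scenarios in proportion to
+ # probability x extremity x leverage, as described above.
+ # All numbers are illustrative assumptions, not real estimates.
+ scenarios = {
+     # name: (probability, extremity of outcome, leverage of early work)
+     'scenario_a': (0.20, 1.0, 1.0),
+     'scenario_b': (0.05, 5.0, 0.5),
+     'scenario_c': (0.75, 0.5, 1.0),
+ }
+ 
+ weights = {name: p * e * l for name, (p, e, l) in scenarios.items()}
+ total = sum(weights.values())
+ for name, w in weights.items():
+     print(f'{name}: {w / total:.0%} of effort')
+ &amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
+ &amp;lt;/HTML&amp;gt;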
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        Right, yeah. Relatedly, I think one thing that people say about why AI should take up a large share is that there’s the sense that maybe we have some reason to think that AI is the only thing we’ve identified so far that could plausibly destroy all value, all life on earth, as opposed to other existential risks that we’ve identified. I mean, I can guess, but you may know that consideration or that argument.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Well, surely that’s hyperbole. Obviously anything that kills everybody destroys all value that arises from our source. Of course, there could be other alien sources out there, but even AI would only destroy things from our source relative to other alien sources that would potentially beat out our AI if it produces a bad outcome. Destroying all value is a little hyperbolic, even under the bad AI scenario.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I do think there’s just a wide range of future scenarios, and there’s this very basic question, how different will our descendants be, and how far from our values will they deviate? It’s not clear to me AI is that much worse than other scenarios in terms of that range, or that variance. I mean, yes, AIs could vary a lot in whether they do things that we value or not, but so could a lot of other things. There’s a lot of other ways.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I guess some people seem to think, “Well, as long as the future is human-like, then humans wouldn’t betray our values. No, no, not humans. But machines, machines might do it.” I mean, the difference between humans and machines isn’t quite that fundamental from the point of view of values. Human values have changed enormously over a long time; we are now quite different, in terms of our habits, attitudes, and values, from our distant ancestors.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;We are quite capable of continuing to make huge value changes in many directions in the future. I can’t offer much assurance that because our descendants descended from humans, they would therefore preserve most of your values. I just don’t see that. To the extent that you think our specific values are especially valuable and you’re afraid of value drift, you should be worried. I’ve written about this: in the Journal of Consciousness Studies I commented on a Chalmers paper, saying that generically through history, each generation has had to deal with the fact that the next and coming generations were out of their control.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Not just that, they were out of their control and their values were changing. Unless you can find some way to put some bound on that sort of value change, you’ve got to model it as a random walk; you could go off arbitrarily far. That means that, typically in history, people, if they thought about it, would realize we have relatively little control over where this is all going. That’s just been a generic problem we’ve all had to deal with all through history; AI doesn’t fundamentally change that fact, though people focus on that thing happening with AI, too.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
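+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;Editor’s note: a minimal simulation of the random-walk model of value drift mentioned above. If each generation shifts values by an independent random step, the typical distance from the starting point grows roughly with the square root of the number of generations, without bound. The step size is an arbitrary illustrative assumption.&amp;lt;/em&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;
+ # Value drift as an unbounded random walk: each generation adds an
+ # independent random shift, so typical drift grows ~ sqrt(generations).
+ # The step size is an arbitrary illustrative assumption.
+ import random
+ import statistics
+ 
+ def mean_drift(generations, step=1.0, trials=2000):
+     finals = []
+     for _ in range(trials):
+         position = 0.0
+         for _ in range(generations):
+             position += random.gauss(0.0, step)
+         finals.append(abs(position))
+     return statistics.mean(finals)
+ 
+ for g in (10, 100, 1000):
+     print(f'{g:5d} generations: mean drift ~ {mean_drift(g):.1f}')
+ &amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
+ &amp;lt;/HTML&amp;gt;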
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I mean, obviously when we make our first AIs we will make them corresponding to our values in many ways, even if we don’t do it consciously, they will be fitting in our world. They will be agents of us, so they will have structures and arrangements that will achieve our ends. So then the argument is, “Yes, but they could drift from there, because we don’t have a very solid control mechanism to make sure they don’t change a lot, then they could change a lot.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s very much true, but that’s still true for human culture and its descendants as well: they can also change a lot. We don’t have very much assurance. Some people just say, “Yeah, but there’s just some common human nature that’ll make sure it doesn’t go too far.” I’m not seeing that. Sorry. There isn’t. That’s not much of an assurance. People can change people, even culturally, and especially later on, when we can change minds more directly, start tinkering with and sharing minds more directly–or even today, we have better propaganda, better mechanisms of persuasion. We can drift off in many directions a long way.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        This is sort of switching topics a little bit, but it’s digging into your general disagreement with some key arguments about AI safety. It’s about your views on intelligence. So you’ve written that there may well be no powerful general theories to be discovered revolutionizing AI, and this is related to your view that most everything we’ve learned about intelligence suggests that the key to smarts is having many not-fully-general tools. Human brains are smart mainly by containing many powerful, not-fully-general modules and using many modules to do each task.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;You’ve written that these considerations are one of the main reasons you’re skeptical about AI risk. I guess the question is, can you think of evidence that might change your mind? I mean, the general question is just to dig in on this train of thought: is there evidence that would change your mind about this general view of intelligence? And relatedly, why do you think other people arrive at different views of what intelligence is, and of why we could have general laws or general breakthroughs in intelligence?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       This is closely related to the lumpiness question. Basically, you can not only talk about the lumpiness of changes in capacities, i.e., lumpiness in innovations; you can also talk about the lumpiness of tools in our toolkit. If we just look in industry, in academia, in education, in a lot of different areas, you will find robustly that most tools are relatively specific tools.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Most of the value of tools–of the integral–is in more specific tools, and relatively little of it is in the most general tools. That’s true of things you learn in school, it’s true of things you learn on the job, it’s true of things that companies learn that help them do things. It’s true of the advantages that nations have over other nations. Just robustly, if you look at what you know and how valuable each thing is, most of the value is in lots of little things, and relatively little is in big things.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;There’s a power-law distribution, with most things being small things. It’s a similar sort of lumpiness distribution to the lumpiness of innovation, and that’s understandable: if tools have that sort of lumpy distribution, and each innovation improves some tool by some percentage–even a percentage drawn from a distribution–then most of the improvements will be in small things, and therefore most of the improvements will be small.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Few of the improvements will be in a big thing, and even a big improvement in a big thing will still be a small part of the overall distribution. So lumpiness in the size of tools, or the size of the things that we have as tools, predicts that in intelligence as well, most of the things that make you intelligent are little things. It comes down to, “Is intelligence different?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
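+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;Editor’s note: a minimal simulation of the lumpiness claim above. If improvement sizes follow a sufficiently steep power law, the many small items jointly account for most of the total value, and the few biggest items account for relatively little; with a shallower exponent, the opposite holds, which is one way to state the disagreement. The exponent and sample size are illustrative assumptions.&amp;lt;/em&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;
+ # With a steep power law, aggregate value is dominated by the many
+ # small items, not the few big ones. Shape parameter and sample size
+ # are illustrative assumptions.
+ import numpy as np
+ 
+ rng = np.random.default_rng(0)
+ sizes = 1.0 + rng.pareto(2.5, size=100_000)  # classical Pareto, minimum 1
+ sizes.sort()
+ 
+ total = sizes.sum()
+ print(f'top 1% of items hold {sizes[-1_000:].sum() / total:.0%} of value')
+ print(f'bottom 90% hold {sizes[:90_000].sum() / total:.0%} of value')
+ &amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
+ &amp;lt;/HTML&amp;gt;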
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Again, that’s also the claim about, “Is intelligence innovation different?” If, of course, you thought intelligence was fundamentally different in there being fewer and bigger lumps to find, then that would predict that in the future we would find fewer, bigger lumps, because that’s what there is to find. You could say, “Well, yes. In the past we’ve only ever found small lumps, but that’s because we weren’t looking at the essential parts of intelligence.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Of course, I’ll very well believe that, related to intelligence, there are lots of small things. You might believe that there are also a few really big things, and that the reason that in the past computer science or education innovation hasn’t found many of them is that we haven’t come to the mother lode yet. The mother lode is still yet to be found, and when we find it, boy, it’ll be big. The belief that you’ll find that in intelligence innovation is related to a belief that it exists, that it’s a thing to find–which is related to believing that, fundamentally, intelligence is simple.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Fundamentally, there’s some essential simplicity to it that when you find it, the pieces will be … each piece is big, because there aren’t very many pieces, and that’s implied by it being simple. It can’t be simple unless … if there’s 100,000 pieces, it’s not simple. If there’s 10 pieces, it could be simple, but then each piece is big. Then the question is, “What reason do you have to believe that intelligence is fundamentally simple?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think, in academia, we often try to find a simple essence in various fields. So there’d be the simple theory of utilitarianism, or even the simple theory of physical particles, or the simple theory of quantum mechanics, or … so if your world is thinking about abstract academic areas like that, then you might say, “Well, in most areas, the essence is a few really powerful, simple ideas.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;You could kind of squint and see academia in that way. You can’t see the industrial world that way. That is, we have much clearer data about the world of biological organisms competing, or firms competing, or even nations competing. We have much more solid data about that to say, “It’s really lots of little things.” Then it becomes, you might say, “Yeah, but intelligence. That’s more academic.” Because your idea of intelligence is sort of intrinsically academic: you think of intelligence as the sort of thing that is best exemplified in the best academics.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;If your model is that ordinary stupid people have a stupid, poor intelligence–they just know a lot, or have some charisma, or whatever it is–but Von Neumann, look at that, that’s what real intelligence is. Von Neumann, he must’ve had just five things that were better. It couldn’t have been 100,000 things that were better; it had to be five core things that were better, because, you see, he’s able to produce these very simple, elegant things, and he was so much better, or something like that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I actually do think this account is true: many people have these sorts of core emotional, attitudinal relationships to the concept of intelligence, and that colors a lot of what they think about intelligence, including artificial intelligence. That’s not necessarily tied to the data we have on variations, and productivity, and performance, and all that sort of thing. It’s more about essential abstract things. Certainly if you’re really into math, in the world of math there are core axioms or core results that are very lumpy and powerful.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Of course even there, again, the distribution of math citations follows exactly the same distribution as in all the other fields. By the citation measure, math is not more lumpy. But still, when you think about math, you like to think about these core, elegant, powerful results, seeing them as the essence of it all.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        So you mentioned Von Neumann and people have a tendency to think that there must be some simple difference between Von Neumann and us. Obviously the other comparison people make which you’ve written about is the comparison between us as a species and other species. I guess, can you say a little bit about how you think about human uniqueness and maybe how that influences your viewpoint on intelligence?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Sure. On that, we have literatures that I just defer to. I mean, I’ve read enough to think I know what they say, and that they’re relatively in agreement, and I just accept what they say. The standard story, then, is that humans’ key difference was an ability to support cultural evolution. That is, human mind capacities aren’t that different from a chimpanzee’s overall, and an individual [human] who hasn’t had the advantage of cultural evolution isn’t really much better.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;The key difference is that we found a way to accumulate innovations culturally. Now obviously there’s some difference, in the sense that it does seem hard: even though we’ve tried to teach culture to chimps today, and we’ve had some remarkable successes, still it’s plausible that there’s something they don’t have quite good enough yet that lets them do that. But then the innovations that made a difference have to be centered around that in some sense.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I mean, obviously most likely in a short period of time, a whole bunch of independent unusual things didn’t happen. More likely there was one biggest thing that happened that was the most important. Then the question is what that is. We know lots of differences of course. This is the “what made humans different” game. There’s all these literatures about all these different ways humans were different. They don’t have hair on their skin, they walk upright, they have fire, they have language, blah, blah, blah.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;The question is, “Which of these matter?” Because they can’t all be the fundamental thing that matters. Presumably, if they all happen in a short time, something was more fundamental that caused most of them. The question is, “What is that?” But it seems to me that the standard answer is right, it was cultural evolution. And then the question is, “Well, okay. But what enabled cultural evolution?” Language certainly seems to be an important element, although it also seems like humans, even before they had language, could’ve had some faster cultural evolution than a lot of other animals.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then the question is, “How big a brain difference or structure difference would it take?” Then it seems like well, if you actually look at the mechanisms of cultural evolution, the key thing is sitting next to somebody else watching what they’re doing, trying to do what they’re doing. So that takes certain observation abilities, and it takes certain mirroring abilities, that is, the ability to just map what they’re doing onto what you’re doing. It takes sort of fine-grained motor control abilities to actually do whatever it is they’re doing.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Those seem like relatively modest incremental improvements on some parameters: chimps weren’t quite up to that, and humans could be. Even our language ability seems like, well, we have modestly differently structured mouths that can more precisely control sounds, and chimps don’t quite have that, so it’s understandable why they can’t make as many sounds as distinctly. The bottom line is that our best answer is that it looks like there was a threshold passed in abilities supporting cultural evolution, which included the ability to watch people, the ability to mirror what they do, the ability to do it yourself, and the ability to tell people through language and things like that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It looks roughly like there was just a threshold passed, and that threshold allowed cultural evolution, and that’s allowed humans to take off. If you’re looking for some fundamental, architectural thing, it’s probably not there. In fact, of course people have said when you look at chimp brains and human brains in fine detail, you see pretty much the same stuff. It isn’t some big overall architectural change, we can tell that. This is pretty much the same architecture.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It looks like there are some tools we are somewhat better at, and plausibly those are the tools that allow us to do cultural evolution.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        Yeah. I think that might be it for my questions on human uniqueness.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        I want to briefly go back to–I think I sort of mentioned this question, but we didn’t quite address it. At what timescale do you think people–how far out do you think people should be starting maybe the field-building stuff, or starting to actually do work on AI safety? Maybe number of years isn’t a good metric for this, but I’m still curious for your take.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Well, first of all, let’s make two different categories of effort. One category of effort is actually solving actual problems. Another category of effort might be just sort of generally thinking about the kind of problems that might appear and generally categorizing and talking about them. So most of the effort that will eventually happen will be in the first category. Overwhelmingly, most of the effort, and appropriately so.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I mean, that’s true today for cars or nuclear weapons or whatever it is. Most of the effort is going to be dealing with the actual concrete problems right in front of you. That effort, it’s really hard to do much before you actually have concrete systems that you’re worried about, and the concrete things that can actually go wrong with them. That seems completely appropriate to me.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I would say that sort of effort is mostly, well, you see stuff and it goes wrong, deal with it. Ahead of seeing problems, you shouldn’t be doing that. You could today be dealing with computer security, you can be dealing with hackers and automated tools to deal with them, you could be dealing with deep fakes. I mean, it’s fine time now to deal with actual, concrete problems that are in front of people today.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But thinking about problems that could occur in the future, where you haven’t really seen the systems that would produce them or even the scenarios that would play out–that’s much more the other category of effort: just thinking abstractly about the kinds of things that might go wrong, and maybe the kinds of architectures and kinds of approaches, et cetera. That, again, is something that you don’t really need that many people to do. If you have 100 people doing it, that’s probably enough.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Even 10 people might be enough. Again, it’s more about this mind space in altruistic futurism: you don’t need very much of that mind space to do it at all, really, and that’s more the thing I complain there’s too much of. Again, it comes down to how unusual the scenarios will be where the problem starts. Today, cars can have car crashes, but each crash is a pretty small crash, and happens relatively locally, and doesn’t kill that many people. You can wait until you see actual car crashes to think about how to deal with car crashes.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then the key question is, “How far do the scenarios we worry about deviate from that?” I mean, most problems in our world today are like that: most things that go wrong in systems go wrong on a small scale pretty frequently, and therefore you can look at actual instances of things that have gone wrong to inform your efforts. There are some times where, exceptionally, we anticipate problems that we never see–even institutional problems that we never see–or where we worry that by the time the problem gets here, it’ll be too late.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Those are really unusual scenarios and problems. The big question about AI risk is what fraction of the problems that we will face with AI will be of that form, and then, to what extent we can anticipate those now. Because in the year 1000, it would’ve still been pretty hard to figure out the unusual scenarios that might bedevil military hardware purchasing or something. Today we might say, “Okay, there are some kinds of military weapons that yes, we can build, but it might be better, once we realize they can be built, to have a treaty with the other guys so that neither of us builds them.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Sometimes that’s good for weapons. Okay. That was not very common 1,000 years ago; that’s a newer thing today. But 1,000 years ago, could people have anticipated that, and then what usefully could they have done, other than say, “Yeah, sometimes it might be better to have a treaty about not building a weapon if you figure out it’d be worse for you if both sides have it”? I’m mostly skeptical that there are these big things that you have to coordinate ahead of time, that you have to anticipate, where if you wait it’s too late, where you won’t see actual concrete signatures of the problems before you have to prevent them.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Even today, for large systems, you often have to walk through a failure analysis. You build a large nuclear plant or something, and then you go through and ask about everything that could go wrong, or every pair of things that could go wrong, and ask, “What scenarios would those produce?,” and try to find the most problematic scenarios. Then you ask, “How can we change the design to fix those?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That’s the kind of exercise we do today, where we imagine problems, most of which never occur. But for that, you need a pretty concrete design to work with. You can’t do that very abstractly, with just an abstract idea. For that you need a particular plan in front of you, and then you can walk through concrete failure modes of all the combinations–this strut will break, or this pipe will burst–and you walk through all of those. It’s definitely true that we often analyze problems that never appear, but it’s almost never in the context of really abstract, sparse descriptions of systems.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Got you. Yeah. We’ve been asking people a standard question, which I think I can maybe guess your answer to. The question is: what’s your credence that, in a world where we didn’t have these additional EA-inspired safety efforts, AI poses a significant risk of harm? I guess this question doesn’t really get at how useful efforts now are, it’s just a question about general danger.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       There’s the crying wolf effect, and I’m particularly worried about it. For example, space colonization is a thing that could happen eventually. And for the last 50 years, there have been enthusiasts who have been saying, “It’s now. It’s now. Now is the time for space colonization.” They’ve been consistently wrong. For the next 50 years, they’ll probably continue to be consistently wrong, but everybody knows there’s these people out there who say, “Space colonization. That’s it. That’s it.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Whenever they hear somebody say, “Hey, it’s time for space colonization,” they go, “Aren’t you one of those fan people who always says that?” The field of AI risk kind of has that same problem where again today, but for the last 70 years or even longer, there have been a subset of people who say, “The robots are coming, and it’s all going to be a mess, and it’s now. It’s about to be now, and we better deal with it now.” That creates sort of a skepticism in the wider world that you must be one of those crazies who keep saying that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;That can make things worse: when we really do have the possibility of space colonization, when it really is the right time, we might well wait too long after that, because people just can’t believe it, because they’ve been hearing this for so long. That makes me worried that this isn’t a positive effect–calling a lot of attention to a problem, and then having people experience it as not a problem, so it looks like you didn’t realize that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Now, if you just say, “Hey, this nuclear power plant pipe could break. I’m not saying it will, but it could, and you ought to fix that,” that’s different from saying, “This pipe will break, and that’ll happen soon, and you’d better do something.” Because then you lose credibility when the pipe usually doesn’t break.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        Just as a follow-up, I suppose the official line for most people working on AI safety is, as it ought to be, that there’s some small chance that this could matter a lot, and so we’d better work on it. Do you have thoughts on ways of communicating that that’s what you actually think, so that you don’t have this crying wolf effect?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Well, if there were only the 100 experts, and not the 100,000 fans, this would be much easier. That does happen in other areas. There are areas in the world where there are only 100 experts and there aren’t 100,000 fans screaming about it. Then the experts can be reasonable, and people can say, “Okay,” and take their word seriously, although they might not feel too much pressure to listen and do anything. You can say that about computer security today, for example: the public doesn’t scream a bunch about computer security.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;The experts say, “Hey, this stuff. You’ve got real computer security problems.” They say it cautiously and with the right degree of caveats that they’re roughly right. Computer security experts are roughly right about those computer security concerns that they warn you about. Most firms say, “Yeah, but I’ve got these business concerns immediately, so I’m just going to ignore you.” So we continue to have computer security problems. But at least from a computer security expert’s point of view, they aren’t suffering from the perception of hyperbole or actual hyperbole.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But that’s because there aren’t 100,000 fans of computer security out there yelling with them. But AI risk isn’t like that. AI risk, I mean, it’s got the advantage of all these people pushing and talking which has helped produce money and attention and effort, but it also means you can’t control the message.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        Are you worried that this reputation effect or this impression of hyperbole could bleed over and harm other EA causes or EA’s reputation in general, and if so are there ways of mitigating that effect?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Well again, the more popular anything is, the harder it is for any center to mitigate whatever effects there are of popular periphery doing whatever they say and do. For example, I think there are really quite reasonable conservatives in the world who are at the moment quite tainted with the alt-right label, and there is an eager population of people who are eager to taint them with that, and they’re kind of stuck.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;All they can do is use different vocabularies, have a different style and tone when they talk to each other, but they are still at risk for that tainting. A lot depends on the degree to which AI risk is seen as central to EA. The more it’s perceived as a core part of EA, then later on when it’s perceived as having been overblown and exaggerated, then that will taint EA. Not much way around that. I’m not sure that matters that much for EA though.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I mean I don’t see EA as driven by popularity or popular attention. It seems it’s more a group of people who– it’s driven by the internal dynamics of the group and what they think about each other and whether they’re willing to be part of it. Obviously in the last century or so, we just had these cycles of hype about AI, so that’s … I expect that’s how this AI cycle will be framed– in the context of all the other concern about AI. I doubt most people care enough about EA for that to be part of the story.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I mean, EA has just a little, low presence in people’s minds in general, so unless it got a lot bigger, it just would not be a very attractive element to put in the story, to blame those people. They’re nobody. They don’t exist to most people. “The computer people exaggerate”: that’s a story that sticks better. That has stuck in the past.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Yeah. This is zooming out again, but I’m curious: kind of around AI optimism, but also just in general around any of the things you’ve talked about in this interview, what sort of evidence you think that either we could get now, or might plausibly see in the future would change your views one way or the other?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Well, I would like to see much more precise and elaborated data on the lumpiness of algorithm innovations and AI progress. And of course data on whether things are changing different[ly] now. For example–forgetting his name–somebody did a &amp;lt;a href=&amp;quot;https://www.milesbrundage.com/blog-posts/alphago-and-ai-progress&amp;quot;&amp;gt;blog post a few years ago&amp;lt;/a&amp;gt; right after AlphaGo, saying this Go achievement seemed off trend if you think about it by time, but not if you think about it by computing resources devoted to the problem. If you looked at the past level of Go ability relative to computing resources, then it was on trend; it wasn’t an exception.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;In any case, that’s relevant to the lumpiness issue, right? So the better a job we could do of calibrating how unusual things are, the more we would be able to talk about whether we are seeing unusual stuff now. That’s often the way this conversation goes: “Is this time different? Are we seeing unusual stuff now?” In order to do that, you want to be able to calibrate this progress as clearly as possible.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Certainly, if you could make some metric for each AI advance, such that you could talk about how important it was–with some relative weighting of different fields, of different kinds of advances, and of different kinds of metrics for advances–then you could have some statistics tracking the size of improvements over time and whether that was changing.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I mean, I’ll also make a pitch for the data thing that I’ve been doing for the last few years, which is data on automation per job in the US: the determinants of that, how that’s changed over time, and its impact over time. Basically there’s a dataset called O*NET, in which jobs in the US are broken into 800 job categories, and for each job in the last 20 years, at some random times, some actual people went and rated that job on a one-to-five scale of how automated it was.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Now we have those ratings, and we are able to say what predicts which jobs are how automated, and whether that has changed over time. The answer is, we can predict pretty well–just 25 variables let us predict half the variance in which jobs are automated–and they’re pretty mundane things, not high-tech, sexy things. It hasn’t changed much in 20 years. In addition, we can ask, when jobs get more or less automated, how does that impact the number of employees and their wages? We find almost no impact on those things.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
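+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;Editor’s note: a minimal sketch of the kind of analysis described above–regressing per-job automation ratings on job attributes and reporting the variance explained. The file name and column names are hypothetical placeholders, not the actual O*NET variables used in the study.&amp;lt;/em&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;
+ # Hypothetical sketch: predict 1-5 automation ratings from job
+ # attributes and report variance explained (R^2). The file and column
+ # names are placeholders, not the actual O*NET variables.
+ import pandas as pd
+ from sklearn.linear_model import LinearRegression
+ from sklearn.model_selection import cross_val_score
+ 
+ jobs = pd.read_csv('onet_job_ratings.csv')  # hypothetical file
+ features = [c for c in jobs.columns if c != 'automation_rating']
+ X, y = jobs[features], jobs['automation_rating']
+ 
+ r2 = cross_val_score(LinearRegression(), X, y, cv=5, scoring='r2')
+ # A mean near 0.5 would match the claim that ~25 mundane variables
+ # predict about half the variance in which jobs are automated.
+ print(f'mean cross-validated R^2: {r2.mean():.2f}')
+ &amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
+ &amp;lt;/HTML&amp;gt;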
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;A data series like that, if you kept tracking it over time, if there were a deviation from trend, you might be able to see it, you might see that the determinants of automation were changing, that the impacts were changing. This is of course just tracking actual AI impacts, not sort of extreme tail possibilities of AI impacts, right?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Of course, this doesn’t break it down into AI versus other sources of automation. Most automation has nothing to do with AI research. It’s making a machine that whizzes and does something that a person was doing before. But if you could then find a way to break that down by AI versus not, then you could more focus on, “Is AI having much impact on actual business practice?,” and seeing that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Of course, that’s not really supporting the early effort scenario. That would be in support of, “Is it time now to actually prepare people for major labor market impacts, or major investment market impacts, or major governance issues that are actually coming up because this is happening now?” But you’ve been asking about, “Well, what about doing stuff early?” Then the question is, “Well, what signs would you have that it’s soon enough?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Honestly, again, I think we know enough about how far away we are from where we need to be, and we know we’re not close, and we know that progress is not that lumpy. So we can see we have a ways to go. It’s just not soon. We’re not close. It’s not time to be doing things you would do when you are close or soon. But you could have these expert judgments of, “For any one problem, how close are we?” It could just be a list of problematic aspects of problems, and which of them we can handle so far and which we can’t.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then you might be able to, again, set up a system such that when you are close, you could trigger people and say, “Okay, now it’s time to do field building,” or public motivation, or whatever it is. It’s not time to do it now. Maybe it’s time to set up a tracking system so that you’ll find out when it’s time.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        On that cluster of issues surrounding human uniqueness, other general laws of intelligence, is there evidence that could change your mind on that? I don’t know. Maybe it could come from psychology, or maybe it could come from anthropology, new theories of human uniqueness, something like that?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       The most obvious thing is to show me actual big, lumpy innovations that made a big difference to the performance of a system. That would be the thing. Like I said, for many years I was an AI researcher, and I noticed that researchers often created systems, and systems have architectures. So their paper would have a box diagram for an architecture, and explain that their system had an architecture and that they were building on that architecture.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But it seemed to me that in fact, the architectures didn’t make as much difference as they were pretending. In the performance of the system, most systems that were good were good because people just did a lot of work to make that whole architecture work. But you could imagine doing counterfactual studies where you vary the effort that goes into filling out the content of a system and you vary the architecture, and you quantitatively find out how much architecture matters.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;There could even be data already out there, in some form or other, if somebody has done the right sort of studies. It’s obvious that architecture makes some difference. Is it a factor of two? Is it 10%? Is it a factor of 100? Or is it 1%? That’s really what we’re arguing about. If it’s a factor of 10%, then you say, “Okay, it matters. You should do it. You should pay attention to that 10%. It’s well worth putting the effort into getting that 10%.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But it doesn’t make that much of a difference in when this happens and how big it is. Right? Whereas if architecture is a factor of 10 or 100, now you can have a scenario where somebody finds a better architecture and suddenly they’re a factor of 100 better than other people. Now that’s a huge thing. So a way to ask the question, “How much of an advance can a new system get relative to other systems?,” would be to ask, “How much of a difference does a better architecture make?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And that’s a thing you can actually study directly, by having people make systems with different architectures, putting different amounts of effort into them, et cetera, and seeing what difference it makes.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        Right. And I suspect that some people think that Homo sapiens is such a data point, and it sounds like you disagree with how they’ve construed that. Do you think there’s empirical evidence waiting to change your mind, or do you think people are just misconstruing it, or are ignorant, or just not thinking correctly about what we should make of the fact of our species dominating the planet?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Well, there’s certainly a lot of things we don’t know as well about primate abilities, so again, I’m reflecting what I’ve read about cultural evolution and the difference between humans and primates. But you could do more of that, and maybe the preliminary indications that I’m hearing about are wrong. Maybe you’ll find out that no, there is this really big architectural difference in the brain that they didn’t notice, or that there’s some more fundamental capability introduction.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;For example, abstraction is something we humans do, and we don’t see animals doing much of it, but this construal-level theory thing I described, and standard brain architecture, say that actually all brains have been organized by abstraction for a long time. That is, we see a dimension of the brain which runs from the abstract to the concrete, and we see how it’s organized that way. But we humans seem to be able to talk about abstractions in ways that other animals don’t.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;So a key question is, “Do we have some extra architectural thing that lets us do more with abstraction?” Because again, most brains are organized by abstract versus concrete; that’s just one of the main dimensions of brains. The forebrain versus the hindbrain is concrete versus abstract. The more we knew about brain architecture and why it’s there, the more we could concretely say whether there was a brain architectural innovation from primates to humans.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But everything I’ve heard says it seems to be mostly a matter of relative emphasis of different parts, rather than some fundamental restructuring. But even small parts can be potent. One way to think about it is that most ordinary programs spend most of their time in just a few lines of code. So if you have 100,000 lines of code, there could still be only 100 lines of code where 90% of the time is being spent. That doesn’t mean the other 100,000 lines don’t matter. And when you think about implementing code on the brain, you realize that because the brain is parallel, whatever code runs 90% of the time is going to be 90% of the volume of the brain.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Those other 100,000 lines of code will take up relatively little space, but they’re still really important. A key issue with the brain is that you might find you understand 90% of the volume as a simple structure following a simple algorithm, and still hardly understand anything about the total algorithm, because it’s all the other parts you don’t understand–the stuff that isn’t executing very often, but that still needs to be there to make the whole thing work. That’s a very problematic thing about understanding brain organization at all.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
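+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;Editor’s note: a minimal illustration of the hot-code point above. In the toy program below, one tiny function accounts for nearly all the runtime, while the rarely executed glue code that makes the program work barely registers in a profile. The function names and proportions are illustrative assumptions.&amp;lt;/em&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;
+ # Toy illustration: a tiny 'hot' function dominates runtime, while the
+ # many 'cold' lines that make the program work barely register.
+ # Names and proportions are illustrative assumptions.
+ import cProfile
+ 
+ def hot_inner_loop():
+     # stands in for the ~100 lines where ~90% of the time is spent
+     return sum(i * i for i in range(2_000_000))
+ 
+ def cold_setup_and_glue():
+     # stands in for the ~100,000 rarely executed but essential lines
+     return {'config': 'loaded', 'state': 'initialized'}
+ 
+ def main():
+     state = cold_setup_and_glue()
+     return state, hot_inner_loop()
+ 
+ cProfile.run('main()')  # nearly all time lands in hot_inner_loop
+ &amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
+ &amp;lt;/HTML&amp;gt;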
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;You’re tempted to go by volume, and to try to understand whatever volume you can opportunistically understand, because volume is what’s visible first, but you could still be a long way off from understanding. Just like if you had any big piece of code and you understood 100 lines of it out of 100,000 lines, you might not understand very much at all. Of course, if those were the 100 lines being executed most often, you’d understand what it was doing most of the time. You’d definitely have a handle on that, but how much of the system would you really understand?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        We’ve been interviewing a bunch of people. Are there other people who you think have well-articulated views that you think it would be valuable for us to talk to or interview?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       My experience is that I’ve written on this periodically over the years, but I get very little engagement. Seems to me there’s just a lack of a conversation here. Early on, Eliezer Yudkowsky and I were debating, and then as soon as he and other people got funding and recognition from other people to pursue this, they just stopped engaging critics and went off pursuing their stuff.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Which makes some sense, but these criticisms have just been sitting and waiting. Of course, what happens periodically is they are most eager to engage the highest status people who criticize them. So periodically over the years, some high-status person will make a quip, not very thought out, at some conference panel or whatever, and they’ll be all over responding to that, and sending this guy messages and recruiting people to talk to him saying, “Hey, you don’t understand. There’s all these complications.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Which is different from engaging the people who are the longest, most thoughtful critics. There’s not so much of that going on. You are perhaps serving as an intermediary here. But ideally, what you do would lead to an actual conversation. And maybe you should apply for funding to have an actual event where people come together and talk to each other. Your thing could be a preliminary to get them to explain how they’ve been misunderstood, or why your summary missed something; that’s fine. If it could just be the thing that started that actual conversation it could be well worth the trouble.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        I guess related to that, is there anything you wish we had asked you, or any other things sort of you would like to be included in this interview?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       I mean, you sure are relying on me to know what the main arguments are that I’m responding to, since you’re sort of shy about saying, “Here are the main arguments, what’s your response?” You’re shy about putting words in people’s mouths, but it makes it harder to have this conversation. If you were taking a stance and saying, “Here’s my positive argument,” then I could engage you more.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I would give you a counterargument, you might counter-counter. If you’re just trying to roughly summarize a broad range of views then I’m limited in how far I can go in responding here.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Right. Yeah. I mean, I don’t think we were thinking about this as sort of a proxy for a conversation.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       But it is.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        But it is, right? Yeah. I could maybe try to summarize some of the main arguments. I don’t know if that seems like something that’s interesting to you? Again, I’m at risk of really strawmanning some stuff.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Well, this is intrinsic to your project. You are talking to people and then attempting to summarize them.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        That’s right, that’s right.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       If you thought it was actually feasible to summarize people, then what you would do is produce tentative summaries, and then ask for feedback and go back and forth in rounds of honing and improving the summaries. But if you don’t do that, it’s probably because you think even the first round of summaries will not be to their satisfaction and you won’t be able to improve it much.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Which then says you can’t actually summarize that well. But what you can do is attempt to summarize and then use that as an orienting thing to get a lot of people to talk and then just hand people the transcripts and they can get what they can get out of it. This is the nature of summarizing conversation; this is the nature of human conversation.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Right, of course. Yeah. So I’ll go out on a limb. We’ve been talking largely to people who I think are still more pessimistic than you, but not as pessimistic as, say, MIRI. I think the main difference between you and the people we’ve been talking to is… I guess two different things.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;There’s a sort of general issue which is, how much time do we have between now and when AI is coming, and related to that, which I think we also largely discussed, is how useful is it to do work now? So yeah, there’s sort of this field building argument, and then there are arguments that if we think something is 20 years away, maybe we can make more robust claims about what the geopolitical situation is going to look like.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Or we can pay more attention to the particular organizations that might be making progress on this, and how things are going to be. There’s a lot of work around assuming that maybe AGI’s actually going to look somewhat like current techniques: it’s going to look like deep reinforcement learning and ML techniques, plus maybe a few new capabilities. Maybe from that perspective we can actually put effort into work like interpretability, like adversarial training, et cetera.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Maybe we can actually do useful work to progress that. A concrete version of this: Paul Christiano has this approach, which I think MIRI is very skeptical of, addressing prosaic AI – AI that looks very similar to the way AI looks now. I don’t know if you’re familiar with iterated distillation and amplification, but it’s sort of treating the AI system as a black box, which is a lot of what it will look like if we’re in a world that’s close to the one now, because neural nets are sort of black box-y.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Treating it as a black box, there’s some chance that this approach works: we basically take a combination of smart AIs and use that to verify the safety of a slightly smarter AI, and repeat that process, bootstrapping. And maybe we have some hope of doing that even if we don’t have access to the internals of the AI itself. Does that make sense? The idea is to have an approach that works even with black-box AIs that might look similar to the neural nets we have now.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Right. I would just say the whole issue is how plausible it is that within 20 years we’ll have broad human-level AI on the basis of these techniques that we see now. Obviously, the higher probability you think that is, the more you think it’s worth doing that. I have no objection at all to his strategies, conditional on that assumption. It would just be, how likely is that? And it’s not only whether it’s okay for him to work on that – it’s more, how big a fraction of mind space does that take up among the wider space of people worried about AI risk?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Yeah. Many of the people that we’ve talked to have actually agreed that it’s taking up too much mind space, or they’ve made arguments of the form, “Well, I am a very technical person, who has a lot of compelling thoughts about AI safety, and for me personally I think it makes sense to work on this. Not as sure that as many resources should be devoted to it.” I think at least a reasonable fraction of people would agree with that. &amp;lt;em&amp;gt;[Note:&amp;lt;/em&amp;gt; &amp;lt;em&amp;gt;It’s wrong that many of the people we interviewed said this. This comment was on the basis of non-public conversations that I’ve had.]&amp;lt;/em&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Well, then maybe an interesting follow-up conversation topic would be to ask, “What concretely could change the percentage of mind space?” That’s different from the other policy question, which is, “How many research slots should be funded?” You’re asking what concrete policy actions could be relevant to what you’re talking about. The most obvious one, I would think, is that people are thinking in terms of how many research slots should be funded, of what sort, and when.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But with respect to the mind space, that’s not the relevant policy question. The policy question might be some sense of how many scenarios these people should be thinking in terms of, or what other scenarios should get more attention.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Yeah, I guess I’m curious on your take on that. If you could just control the mind space in some way, or sort of set what people were thinking about or what directions, what do you think it would look like?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        Very quickly, I think one concrete operationalization of “mind space resource” is what 80,000 Hours tells people to do – say, young, talented people.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       That’s even more plausible. I mean, I would just say, study the future. Study many scenarios in the future other than this scenario. Go actually generate scenarios, explore them, tell us what you found. What are the things that could go wrong there? What are the opportunities? What are the uncertainties? Just explore a bunch of future scenarios and report. That’s just a thing that needs to happen.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Other than AI risk, I mean. AI risk is focused on one relatively narrow set of scenarios, and there are a lot of other scenarios to explore. So that would be a sense of mind space and career work: just saying, “There’s 10 or 100 people working in this other area, I’m not going to be that …”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then you might just say, concretely, the world needs more futurists. If under these … the future is a very important place, but we’re not sure how much leverage we have over it. We just need more scenarios explored, including for each scenario asking what leverage there might be.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then I might say we’ve had a half-dozen books in the last few years about AI risks. How about a book with a whole bunch of other scenarios, where AI risk takes one chapter out of 20 and the other 19 chapters cover other scenarios? And then if people talked about that and said it was a cool book and recommended it, and had keynote speakers about that sort of thing, then it would shift the mind space. People would say, “Yeah. AI risk is definitely one thing, people should be looking at it, but here’s a whole bunch of other scenarios.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Right. I guess I could also try a little bit to zero in…  I think a lot of the differences in terms of people’s estimates for numbers of years are modeling differences. I think you have this more outside view model of what’s going on, looking at lumpiness.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think one other common modeling choice is to say something like, “We think progress in this field is powered by compute; here’s some extrapolation that we’ve made about how compute is going to grow,” and maybe our estimates of how much compute is needed to do some set of powerful things. I feel like with those estimates, you might think things are going to happen sooner? I don’t know how familiar you are with that space of arguments or what your take is.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       I have read almost all of the AI Impacts blog posts over the years, just to be clear.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Great. Great.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       You have a set of posts on that. So the most obvious data point is maybe we’re near the human-equivalent compute level now, but not quite there. We passed the mouse level a while ago, right? Well, we don’t have machines remotely capable of doing what mice do. So it’s clear that merely having the computing-power equivalent is not enough. Our machines went past the cockroach level far longer ago, and we certainly don’t have machines that can do all the things cockroaches can do.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It’s just really obvious, I think, looking at examples like that, that computing power is not enough. We might hit a point where we have so much computing power that you can do some sort of fast search. I mean, that’s sort of the difference between machine learning and AI as ways to think about this stuff. When you thought about AI, you just thought, “Well, you have to do a lot of work to make the system,” and it was computing. And then it was kind of obvious: well, duh, you need software; hardware’s not enough.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;When you say machine learning, people tend to have more hope: well, we just need some general machine learning algorithm, and then you turn that on, it finds the right system, and the right system is much cheaper to execute computationally. The threshold you need to execute the search is a lot more computing power than the human brain has, but it won’t necessarily be that long before we have a lot more.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then now it’s an issue of how simple is this thing you’re searching for, and how close are current machine learning systems to what you need? Thinking that a machine learning system like we have now could basically do everything, if only it were big enough and had enough data and computing power, is a different perspective than thinking that we’re not even close to having the right machine learning techniques. There are just a bunch of machine learning problems that we know have been solved that these systems just don’t solve.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Right.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        So on that question – I can’t pull up the exact quote quickly enough, but I may insert it in the transcript, with permission – Paul Christiano has said more or less, in an 80,000 Hours interview, that he’s very unsure, but he suspects that we might be at insect-level capabilities: if people took it upon themselves to apply the compute and the resources that we have, we could do what insects do.&amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-1-2121&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;span class=&amp;quot;easy-footnote&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#easy-footnote-bottom-1-2121&amp;quot; title=&amp;#039;The actual quote is, “Things like, right now we’re kind of at the stage where AI systems are … the sophistication is probably somewhere in the range of insect abilities. That’s my current best guess. And I’m very uncertain about that. … One should really be diving into the comparison to insects now and say, can we really do this? It’s plausible to me that that’s the kind of … If we’re in this world where our procedures are similar to evolution, it’s plausible to me the insect thing should be a good indication, or one of the better indications, that we’ll be able to get in advance.&amp;amp;amp;#8221; from his &amp;amp;lt;a href=&amp;quot;https://80000hours.org/podcast/episodes/paul-christiano-ai-alignment-solutions/&amp;quot;&amp;amp;gt;podcast with 80,000 Hours&amp;amp;lt;/a&amp;amp;gt;.&amp;#039;&amp;gt;&amp;lt;sup&amp;gt;1&amp;lt;/sup&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;He’s interested in maybe concretely testing this hypothesis that you just mentioned about humans and cockroaches. But it sounds like you’re just very skeptical of it – it sounds like you’re already quite confident that we are not at insect level. Can you just say a little bit more about why you think that?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Well, there’s doing something a lot like what insects do, and then there’s doing exactly what insects do. And those are really quite different tasks, and the difference is in part how forgiving you are about a bunch of details. I mean, some might point to image recognition or something, or even Go… Cockroaches are actually managing a particular cockroach body in a particular environment. They’re pretty damn good at that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;If you wanted to make an artificial cockroach that was as good as cockroaches at the things a cockroach does, I think we’re a long way off from that. But you might think most of those little details aren’t that important, that they’re just a lot of work, and that maybe you could make a system that handled what you think of as the essential core problems similarly.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Now we’re back to this key issue of the division between a few essential core problems and a lot of small messy problems. I basically think the game is in doing them all – doing it until you’ve done them all, where “all” includes a lot of the small messy things. So that’s the idea that your brain is 100,000 lines of code, and 90% of the brain volume is 100 of those lines, and then there’s all these little small, swirly structures in your brain that manage the small little swirly tasks that don’t happen very often, but when they do, that part needs to be there.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;What percentage of your brain volume would be enough to replicate before you thought you were essentially doing what a human does? I mean, that is sort of an essential issue. If you thought there were just 100 key algorithms and once you got 100 of them then you were done, that’s different than thinking, “Sure, there’s 100 main central algorithms, plus there’s another 100,000 lines of code that just is there to deal with very, very specific things that happen sometimes.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And evolution has spent a long time searching the space of writing that code and found these things, and there’s no easy learning algorithm that will find them without being in the environment that you were in. This is a key question about the nature of intelligence, really.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;       Right. I’m now hijacking this interview to be about this insect project that AI Impacts is also doing, so apologies for that. We were thinking maybe you can isolate some key cognitive tasks that bees can do, and then in simulation have something roughly analogous to that. But it sounds like you’re not quite satisfied with this as a test of the hypothesis unless it can do all the little bee things – control a bee body and wiggle around just like bees do and so forth?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt; I mean, if you could attach it to an artificial bee body and put it in a hive and see what happens, then I’m much more satisfied. If you say it does the bee dance, it does the bee smell, it does the bee touch, I’ll go, “That’s cute, but it’s not doing the bee.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;       Then again, it just sounds like how satisfied you are with these abstractions depends on your views of intelligence and how much can be abstracted away–&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       It depends on your view of the nature of the actual problems that most animals and humans face. They’re a mixture of some structures with relative uniformity across a wide range; that’s when abstraction is useful. Plus, a whole bunch of messy details that you just have to get right.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;In some sense I’d be more impressed if you could just make an artificial insect that in a complex environment can just be an insect, and manage the insect colonies, right? I’m happy to give you a simulated house and some simulated dog food, and simulated predators, who are going to eat the insects, and I’m happy to let you do it all in simulation. But you’ve got to show me a complicated world, with all the main actual obstacles that insects have to surviving and existing, including parasites and all sorts of things, right?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And just show me that you can have something that robustly works in an environment like that. I’m much more impressed by that than I would be by you showing an actual physical device that does a bee dance.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Yeah. I mean, to be clear, I think the project is more about actually finding a counterexample. If we could find a simple case where we can’t even do this with neural networks, then it’s fairly … there’s a persuasive case there.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       But then of course people might a month later say, “Oh, yeah?” And then they work on it and they come up with a way to do that, and there will never be an end to that game. The moment you put up this challenge and they haven’t done it yet–&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Yeah. I mean, that’s certainly a possibility.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt;        Cool. I guess I’m done for now hijacking this interview to be about bees, but that’s just been something I’ve been thinking about lately.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        I would love to sort of engage with you on your disagreements, but I think a lot of them are sort of like … I think a lot of it is in this question of how close are we? And I think I only know in the vaguest terms people’s models for this.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I feel like I’m not sure how good I could be in an interview at trying to figure out which of those models is more compelling. Though I do think it’s sort of an interesting project, because it seems like lots of people just have vastly different sorts of timelines models, which they use to produce some kind of number.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       Sure. I suppose you might want to ask the people you talk to after me about the relative status of inside and outside arguments, and who sort of has the burden of proof with respect to which audiences.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Right. Right. I think that’s a great question.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       If we’ve agreed that the outside view doesn’t support short time scales of things happening, and we say, “But yes, some experts think they see something different in their expert views of things with an inside view,” then we can say, “Well, how often does that happen?” We can make the outside view of that. We can say, “Well, how often do inside experts think they see radical potential that they are then inviting other people to fund and support, and how often are they right?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;       Right. I mean, I don’t think it’s just inside/outside view. I think there are just some outside view arguments that make different modeling choices that come to different conclusions.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt;       I’d be most willing to engage those. I think a lot of people are sort of making an inside/outside argument where they’re saying, “Sure, from the outside this doesn’t look good, but here’s how I see it from the inside.” That’s what I’ve heard from a lot of people.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt;        Yeah. Honestly my impression is that not a lot of people have spent … a lot of people, when they give us numbers, are like, “This is really a total guess.” So I think a lot of the argument is either from people who have very specific compute-based models for things being short [timelines], or from people who I think haven’t spent that much time creating precise models but have models that are compelling enough that they’re like, “Oh, maybe I should work on this, slash the chance of this is scary enough.” I haven’t seen a lot of very concrete models. Partially I think that’s because there’s an opinion in the community that if you have concrete models, especially if they argue for things being very soon, maybe you shouldn’t publish those.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt; Right, but you could still ask the question, “Set aside everything you know except what this looks like from the outside. Looking at that, would you still predict stuff happening soon?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, I think that’s a good question to ask. We can’t really go back and add that to what we’ve asked people, but yeah.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt; I think more people, even most, would say, “Yeah, from the outside, this doesn’t look so compelling.” That’s my judgement, but again, they might say, “Well, the usual way of looking at it from the outside doesn’t, but then, here’s this other way of looking at it from the outside that other people don’t use.” That would be a compromise sort of view. And again, I guess there’s this larger meta-question really of who should reasonably be moved by these things. That is, if there are people out there who specialize in chemistry or business ethics or something else, and they hear these people in AI risk saying there are these big issues, is the evidence that’s being offered by these insiders the sort of thing that should be compelling to those outsiders?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, I think I have a question about that too, especially since we’ve been interviewing largely AI safety researchers, and the arguments around why they think AI might be soon or far look much more like economic arguments. They don’t necessarily look like arguments from an inside, very technical perspective on the subject. So it’s very plausible to me that there’s no particular reason to weigh the opinions of people working on this, other than that they’ve thought about it a little bit more than other people have. &amp;lt;em&amp;gt;[Note: I say ‘soon or far’ here, but I mean to say ‘more or less likely to be harmful’.]&amp;lt;/em&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt; Well, as a professional economist, I would say, if you have good economic arguments, shouldn’t you bring them to the attention of economists and have us critique them? Wouldn’t that be the way this should go? I mean, not all economics arguments should start with economists, but wouldn’t it make sense to have them be part of the critique evaluation cycle?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, I think the real answer is that these all exist vaguely in people’s heads, and they don’t even make claims to having super-articulated and written-down models.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt; Well, even that is an interesting thing if people agree on it. You could say, “You know a lot of people who agree with you that AI risk is big and that we should deal with something soon. Do you know anybody who agrees with you for the same reasons?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It’s interesting, so I did a poll, I’ve done some Twitter polls lately, and I did one on “Why democracy?” And I gave four different reasons why democracy is good. And I noticed that there was very little agreement, that is, relatively equal spread across these four reasons. And so, I mean that’s an interesting fact to know about any claim that many people agree on, whether they agree on it for the same reasons. And it would be interesting if you just asked people, “Whatever your reason is, what percentage of people interested in AI risk agree with your claim about it for the reason that you do?” Or, “Do you think your reason is unusual?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Because if most everybody thinks their reason is unusual, then basically there isn’t something they can all share with the world to convince the world of it. There’s just the shared belief in this conclusion, based on very different reasons. And then it’s more on their authority of who they are and why they as a collective are people who should be listened to or something.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, I agree that that is an interesting question. I don’t know if I have other stuff, Rob, do you?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; I don’t think I do at this time.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt; Well, perhaps compared to other people, I am happy to do a second round should you generate more questions.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, I think it’s very possible, thanks so much. Thanks so much for talking to us in general.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt; You’re welcome. It’s a fun topic, especially talking with reasonable people.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Oh thank you, I’m glad we were reasonable.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, I’m flattered.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt; You might think that’s a low bar, but it’s not.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Great, we’re going to include that in the transcript. Thank you for talking to us. Have a good rest of your afternoon.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robin Hanson:&amp;lt;/strong&amp;gt; Take care, nice talking to you.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ol class=&amp;quot;easy-footnotes-wrapper&amp;quot;&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;span class=&amp;quot;easy-footnote-margin-adjust&amp;quot; id=&amp;quot;easy-footnote-bottom-1-2121&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;The actual quote is, “Things like, right now we’re kind of at the stage where AI systems are … the sophistication is probably somewhere in the range of insect abilities. That’s my current best guess. And I’m very uncertain about that. … One should really be diving into the comparison to insects now and say, can we really do this? It’s plausible to me that that’s the kind of … If we’re in this world where our procedures are similar to evolution, it’s plausible to me the insect thing should be a good indication, or one of the better indications, that we’ll be able to get in advance.” from his &amp;lt;a href=&amp;quot;https://80000hours.org/podcast/episodes/paul-christiano-ai-alignment-solutions/&amp;quot;&amp;gt;podcast with 80,000 Hours&amp;lt;/a&amp;gt;.&amp;lt;a class=&amp;quot;easy-footnote-to-top&amp;quot; href=&amp;quot;#easy-footnote-1-2121&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ol&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
  

&lt;/pre&gt;</content>
    </entry>
    <entry>
        <title>Conversation with Rohin Shah</title>
        <link rel="alternate" type="text/html" href="https://wiki.aiimpacts.org/conversation_notes/conversation_with_rohin_shah?rev=1663745861&amp;do=diff"/>
        <published>2022-09-21T07:37:41+00:00</published>
        <updated>2022-09-21T07:37:41+00:00</updated>
        <id>https://wiki.aiimpacts.org/conversation_notes/conversation_with_rohin_shah?rev=1663745861&amp;do=diff</id>
        <author>
            <name>Anonymous</name>
            <email>anonymous@undisclosed.example.com</email>
        </author>
        <category  term="conversation_notes" />
        <content>&lt;pre&gt;
@@ -1 +1,1102 @@
+ ====== Conversation with Rohin Shah ======
+ 
+ // Published 31 October, 2019; last updated 15 September, 2020 //
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;AI Impacts talked to AI safety researcher Rohin Shah about his views on AI risk. With his permission, we have transcribed this interview.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ 
+ ===== Participants =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;a href=&amp;quot;https://rohinshah.com/&amp;quot;&amp;gt;Rohin Shah&amp;lt;/a&amp;gt; — PhD student at the Center for Human-Compatible AI, UC Berkeley
+                 &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Asya Bergal – AI Impacts&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;a href=&amp;quot;http://robertlong.online/&amp;quot;&amp;gt;Robert Long&amp;lt;/a&amp;gt; – AI Impacts
+                 &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Sara Haxhia — Independent researcher&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ===== Summary =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;We spoke with Rohin Shah on August 6, 2019. Here is a brief summary of that conversation:&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Before taking into account other researchers’ opinions, Shah guesses there is an extremely rough ~90% chance that even without any additional intervention from current longtermists, advanced AI systems will not cause human extinction by adversarially optimizing against humans. He gives the following reasons, ordered by how heavily they weigh in his consideration:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Gradual development and take-off of AI systems is likely to allow for correcting the AI system online, and AI researchers will in fact correct safety issues rather than hacking around them and redeploying.
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah thinks that institutions developing AI are likely to be careful because human extinction would be just as bad for them as for everyone else.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;As AI systems get more powerful, they will likely become more interpretable and easier to understand because they will use features that humans also tend to use.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Many arguments for AI risk go through an intuition that AI systems can be decomposed into an objective function and a world model, and Shah thinks this isn’t likely to be a good way to model future AI systems.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah believes that conditional on misaligned AI leading to extinction, it almost certainly goes through deception.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah very uncertainly guesses that there’s a ~50% chance that we will get AGI within two decades:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;He gives a ~30% – 40% chance that it will be via essentially current techniques.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;He gives a ~70% chance that, conditional on the two previous claims, it will be a mesa optimizer.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah’s model for how we get to AGI soon has the following features:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;AI will be trained on a huge variety of tasks, addressing the usual difficulty of generalization in ML systems.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;AI will learn the same kinds of useful features that humans have learned.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;This process of research and training the AI will mimic the ways that evolution produced humans who learn.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Gradient descent is simple and inefficient, so in order to do sophisticated learning, the outer optimization algorithm used in training will have to produce a mesa optimizer.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah is skeptical of more ‘nativist’ theories where human babies are born with a lot of inductive biases, rather than learning almost everything from their experiences in the world.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah thinks there are several things that could change his beliefs, including:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;If he learned that evolution actually baked a lot into humans (‘nativism’), he would lengthen the amount of time he thinks there will be before AGI.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Information from historical case studies or analyses of AI researchers could change his mind around how the AI community would by default handle problems that arise.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Having a better understanding of the disagreements he has with MIRI:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah believes that slow takeoff is much more likely than fast takeoff.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah doesn’t believe that any sufficiently powerful AI system will look like an expected utility maximizer.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah believes less in crisp formalizations of intelligence than MIRI does.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah has more faith in AI researchers fixing problems as they come up.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah has less faith than MIRI in our ability to write proofs of the safety of our AI systems.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;This transcript has been lightly edited for concision and clarity.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ===== Transcript =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; We haven’t really planned out how we’re going to talk to people in general, so if any of these questions seem bad or not useful, just give us feedback. I think we’re particularly interested in skepticism arguments, or safe by default style arguments– I wasn’t sure from our conversation whether you partially endorse that, or you just are familiar with the argumentation style and think you could give it well or something like that.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I think I partially endorse it.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Okay, great. If you can, it would be useful if you gave us the short version of your take on the AI risk argument and the place where you feel you and people who are more convinced of things disagree. Does that make sense?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Just to clarify, maybe for my own… What’s ‘convinced of things’? I’m thinking of the target proposition as something like “it’s extremely high value for people to be doing work that aims to make AGI more safe or beneficial”.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Even that statement seems a little imprecise because I think people have differing opinions about what the high value work is. But that seems like approximately the right proposition.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Okay. So there are some very obvious ones which are not the ones that I endorse, but things like, do you believe in longtermism? Do you buy into the total view of population ethics? And if your answer is no, and you take a more standard version, you’re going to drastically reduce how much you care about AI safety. But let’s see, the ones that I would endorse-&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Maybe we should work on this set of questions. I think this will only come up with people who are into rationalism. I think we’re primarily focused just on empirical sources of disagreement, whereas these would be ethical.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yup.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Which again, you’re completely right to mention these things.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; So, there’s… okay. The first one I had listed is that continual or gradual or slow takeoff, whatever you want to call it, allows you to correct the AI system online. And also it means that AI systems are likely to fail in not extinction-level ways before they fail in extinction-level ways, and presumably we will learn from that and not just hack around it and fix it and redeploy it. I think I feel fairly confident that there are several people who will disagree with exactly the last thing I said, which is that people won’t just hack around it and deploy it– like fix the surface-level problem and then just redeploy it and hope that everything’s fine.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I am not sure what drives the difference between those intuitions. I think they would point to neural architecture search and things like that as examples of, “Let’s just throw compute at the problem and let the compute figure out a bunch of heuristics that seem to work.” And I would point at, “Look, we noticed that… or, someone noticed that AI systems are not particularly fair and now there’s just a ton of research into fairness.”&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And it’s true that we didn’t stop deploying AI systems because of fairness concerns, but I think that is actually just the correct decision from a societal perspective. The benefits from AI systems are in fact– they do in fact outweigh the cons of them not being fair, and so it doesn’t require you to not deploy the AI system while it’s being fixed.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; That makes sense. I feel like another common thing, which is not just “hack around and fix it”, is that people think that it will fail in ways that we don’t recognize and then we’ll redeploy some bigger cooler version of it that will be deceptively aligned (or whatever the problem is). How do you feel about arguments of that form: that we just won’t realize all the ways in which the thing is bad?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; So I’m thinking: the AI system tries to deceive us, so I guess the argument would be, we don’t realize that the AI system was trying to deceive us and instead we’re like, “Oh, the AI system just failed because it was off distribution or something.”&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It seems strange that we wouldn’t see an AI system deliberately hide information from us. And then we look at this and we’re like, “Why the hell didn’t this information come up? This seems like a clear problem.” And then do some sort of investigation into this.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I suppose it’s possible we wouldn’t be able to tell it’s intentionally doing this because it thinks it could get better reward by doing so. But that doesn’t… I mean, I don’t have a particular argument why that couldn’t happen but it doesn’t feel like…&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, to be fair I’m not sure that one is what you should expect… that’s just a thing that I commonly hear.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yes. I also hear that.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; I was surprised at your deception comment… You were talking about, “What about scenarios where nothing seems wrong until you reach a certain level?”&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Right. Sorry, that doesn’t have to be deception. I think maybe I mentioned deception because I feel like I often commonly also see it.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I guess if I imagine “How did AI lead to extinction?”, I don’t really imagine a scenario that doesn’t involve deception. And then I claim that conditional on that scenario having happened, I am very surprised by the fact that we did not see this deception in any earlier scenario that didn’t lead to extinction. And I don’t really get people’s intuitions for why that would be the case. I haven’t tried to figure that one out though.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; So do you have no model of how people’s intuitions differ? You can’t see it going wrong aside from if it was deceptively aligned? Why?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Oh, I feel like most people have the intuition that conditional on extinction, it happened by the AI deceiving us. &amp;lt;em&amp;gt;[Note: In this interview, Rohin was only considering risks arising because of AI systems that try to optimize for goals that are not our own, not other forms of existential risks from AI.]&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I think there’s another class of things which is something not necessarily deceiving us, as in it has a model of our goals and intentionally presents us with deceptive output, and just like… it has some notion of utility function and optimizes for that poorly. It doesn’t necessarily have a model of us, it just optimizes the paperclips or something like that, and we didn’t realize before that it is optimizing. I think when I hear deceptive, I think “it has a model of human behavior that is intentionally trying to do things that subvert our expectations”. And I think there’s also a version where it just has goals unaligned with ours and doesn’t spend any resources in modeling our behavior.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I think in that scenario, usually as an instrumental goal, you need to deceive humans, because if you don’t have a model of human behavior– if you don’t model the fact that humans are going to interfere with your plans– humans just turn you off and nothing, there’s no extinction.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Because we’d notice. You’re thinking that in the non-deception cases, as with the deception cases, we’d probably notice.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; That clarifies my question. Great.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; As far as I know, this is an accepted thing among people who think about AI x-risk.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal: &amp;lt;/strong&amp;gt;The accepted thing is like, “If things go badly, it’s because it’s actually deceiving us on some level”?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yup. There are some other scenarios which could lead to us not being deceived and bad things still happen. These tend to be things like, we build an economy of AI systems and then slowly humans get pushed out of the economy of AI systems and… &amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;They’re still modeling us. I just can’t really imagine the scenario in which they’re not modeling us. I guess you could imagine one where we slowly cede power to AI systems that are doing things better than we could. And at no point are they actively trying to deceive us, but at some point they’re just like… they’re running the entire economy and we don’t really have much say in it.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And perhaps this could get to a point where we’re like, “Okay, we have lost control of the future and this is effectively an x-risk, but at no point was there really any deception.”&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Right. I’m happy to move on to other stuff.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Cool. Let’s see. What’s the next one I have? All right. This one’s a lot sketchier-&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; So sorry, what is the thing that we’re listing just so-&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Oh, reasons why AI safety will be fine by default.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Right. Gotcha, great.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Okay. These two points were both really one point. So then the next one was… I claimed that as AI systems get more powerful, they will become more interpretable and easier to understand, just because they’re using– they will probably be able to get and learn features that humans also tend to use.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I don’t think this has really been debated in the community very much and– sorry, I don’t mean that there’s agreement on it. I think it is just not a hypothesis that has been promoted to attention in the community. And it’s not totally clear what the safety implications are. It suggests that we could understand AI systems more easily and sort of in combination with the previous point it says, “Oh, we’ll notice things– we’ll be more able to notice things than today where we’re like, ‘Here’s this image classifier. Does it do good things? Who the hell knows? We tried it on a bunch of inputs and it seemed like it was doing the right stuff, but who knows what it’s doing inside.&amp;#039;”&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I’m curious why you think it’s likely to use features that humans tend to use. It’s possible the answer is some intuition that’s hard to describe.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Intuition that I hope to describe in a year. Partly it’s that in the very toy straw model, there are just a bunch of features in the world that an AI system can pay attention to in order to make good predictions. When you limit the AI system to make predictions on a very small narrow distribution, which is like all AI systems today, there are lots of features that the AI system can use for that task that we humans don’t use because they’re just not very good for the rest of the distribution.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I see. It seems like implicitly in this argument is that when humans are running their own classifiers, they have some like natural optimal set of features that they use for that distribution?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I don’t know if I’d say optimal, but yeah. Better than the features that the AI system is using.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; In the space of better features, why aren’t they going past us or into some other optimal space of feature world?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I think they would eventually.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; I see, but they might have to go through ours first?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; So A) I think they would go through ours, B) I think my intuition is something like the features– and this one seems like more just raw intuition and I don’t really have an argument for it– but the features… things like agency, optimization, want, deception, manipulation seem like things that are useful for modeling the world.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I would be surprised if an AI system went so far beyond those features that they didn’t even enter into its calculations. Or, I’d be surprised if that happened very quickly, maybe. I don’t want to make claims about how far past those AI systems could go, but I do think that… I guess I’m also saying that we should be aiming for AI systems that are like… This is a terrible way to operationalize it, but AI systems that are 10X as intelligent as humans, what do we have to do for them? And then once we’ve got AI systems that are 10X smarter than us, then we’re like, “All right, what more problems could arise in the future?” And ask the AI systems to help us with that as well.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; To clarify, the thing you’re saying is… By the time AI systems are good and more powerful, they will have some conception of the kind of features that humans use, and be able to describe their decisions in terms of those features? Or do you think inherently, there’ll be a point where AI systems use the exact same features that humans use?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Not the exact same features, but broadly similar features to the ones that humans use.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Where examples of those features would be like objects, cause, agent, the things that we want to interpret in deep nets but usually can’t.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yes, exactly.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Again, so you think in some sense that that’s a natural way to describe things? Or there’s only one path through getting better at describing things, and that has to go through the way that humans describe things? Does that sound right?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yes.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Okay. Does that also feel like an intuition?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yes.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Sorry, I think I did a bad interviewer thing where I started listing things, I should have just asked you to list some of the features which I think-&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Well I listed them, like, optimization, want, motivation before, but I agree causality would be another one. But yeah, I was thinking more the things that safety researchers often talk about. I don’t know, what other features do we tend to use a lot? Object’s a good one… the conception of 3D space is one that I don’t think these classifiers have and that we definitely have.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And the concept of 3D space seems like it’s probably going to be useful for an AI system no matter how smart it gets. Currently, they might have a concept of 3D space, but it’s not obvious that they do. And I wouldn’t be surprised if they don’t.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;At some point, I want to take this intuition and run with it and see where it goes. And try to argue for it more.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; But I think for the purposes of this interview, I think we do understand how this is something that would make things safe by default. At least, in as much as interpretability conduces to safety. Because we could be able to interpret them and still fuck shit up.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yep. Agreed. Cool.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; I guess I’m a little bit confused about how it makes the code more interpretable. I can see how if it uses human concepts, we can model it better, because we can just say, “These are human things,” and this means we can make predictions better. But if you’re looking at a neural net or something, it doesn’t make it more interpretable.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; If you mean the code, I agree with that.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; Okay. So, is this kind of like external, like you being able to model that thing?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I think you could look at the… you take a particular input to the neural net, you pass it through the layers, you see what the activations are. I don’t think if you just look directly at the activations, you’re going to get anything sensible, in the same way that if you look at electrical signals in my brain you’re not going to be able to understand them.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; So, is your point that the reason it becomes more interpretable is something more like, you understand its motivations?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; What I mean is… Are you familiar with Chris Olah’s work?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; I’m not.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Okay. So Chris Olah does interpretability work with image classifiers. One technique that he uses is: Take a particular neuron in the neural net, say, “I want to maximize the activation of this neuron,” and then do gradient descent on your input image to see what image maximally activates that neuron. And this gives you some insight into what that neuron is detecting. I think things like that will be easier as time goes on.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
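+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;[Note: a minimal sketch of the activation-maximization technique described above, written in PyTorch. The model (an untrained VGG-16 here), the layer index, and the optimization settings are illustrative assumptions, not details from the conversation.]&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;pre&amp;gt;
+ import torch
+ import torchvision.models as models
+ 
+ # Any image classifier works; weights=None keeps the sketch self-contained.
+ # In practice you would load trained weights, e.g. VGG16_Weights.DEFAULT.
+ model = models.vgg16(weights=None).eval()
+ 
+ def visualize_neuron(layer_idx, channel, steps=200, lr=0.05):
+     # Start from random noise and do gradient ascent on the input image
+     # itself, so that it maximally activates one channel at a chosen layer.
+     img = torch.randn(1, 3, 224, 224, requires_grad=True)
+     opt = torch.optim.Adam([img], lr=lr)
+     for _ in range(steps):
+         opt.zero_grad()
+         x = img
+         for i, layer in enumerate(model.features):
+             x = layer(x)
+             if i == layer_idx:
+                 break
+         loss = -x[0, channel].mean()  # negate the activation to ascend
+         loss.backward()
+         opt.step()
+     return img.detach()  # an image that (locally) maximizes the neuron
+ &amp;lt;/pre&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;[The returned image is a picture of whatever the chosen channel responds to, which is the sense in which the technique gives insight into what a neuron is detecting.]&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;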
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Even if it’s not just that particular technique, right? Just the general task?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yes.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; How does that relate to the human values thing? It felt like you were saying something like it’s going to model the world in a similar way to the way we do, and that’s going to make it more interpretable. And I just don’t really see the link.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; A straw version of this, which isn’t exactly what I mean but sort of is the right intuition, would be like maybe if you run the same… What’s the input that maximizes the output of this neuron? You’ll see that this particular neuron is a deception classifier. It looks at the input and then based on something, does some computation with the input, maybe the input’s like a dialogue between two people and then this neuron is telling you, “Hey, is person A trying to deceive person B right now?” That’s an example of the sort of thing I am imagining.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I’m going to do the bad interviewer thing where I put words in your mouth. I think one problem right now is you can go a few layers into a neural network and the first few layers correspond to things you can easily tell… Like, the first layer is clearly looking at all the different pixel values, and maybe the second layer is finding lines or something like that. But then there’s this worry that later on, the neurons will correspond to concepts that we have no human interpretation for, so it won’t even make sense to interpret them. Whereas Rohin is saying, “No, actually the neurons will correspond to, or the architecture will correspond to some human understandable concept that it makes sense to interpret.” Does that seem right?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah, that seems right. I am maybe not sure that I tie it necessarily to the architecture, but actually probably I’d have to one day.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Definitely, you don’t need to. Yeah.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Anyway, I haven’t thought about that enough, but that’s basically that. If you look at current late layers in image classifiers they are often like, “Oh look, this is a detector for lemon tennis balls,” and you’re just like, “That’s a strange concept you’ve got there, neural net, but sure.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Alright, cool. Next way of being safe?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; They’re getting more and more sketchy. I have an intuition that… I should rephrase this. I have an intuition that AI systems are not well-modeled as, “Here’s the objective function and here is the world model.” Most of the classic arguments are: Suppose you’ve got an incorrect objective function, and you’ve got this AI system with this really, really good intelligence, which maybe we’ll call it a world model or just general intelligence. And this intelligence can take in any utility function, and optimize it, and you plug in the incorrect utility function, and catastrophe happens.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;This does not seem to be the way that current AI systems work. It is the case that you have a reward function, and then you sort of train a policy that optimizes that reward function, but… I explained this the wrong way around. But the policy that’s learned isn’t really… It’s not really performing an optimization that says, “What is going to get me the most reward? Let me do that thing.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It has been given a bunch of heuristics by gradient descent that tend to correlate well with getting high reward and then it just executes those heuristics. It’s kind of similar to… If any of you are fans of the sequences… Eliezer wrote a sequence on evolution and said… What was it? Humans are not fitness maximizers, they are adaptation executors, something like this. And that is how I view neural nets today that are trained by RL. They don’t really seem like expected utility maximizers the way that it’s usually talked about by MIRI or on LessWrong.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I mostly expect this to continue, I think conditional on AGI being developed soon-ish, like in the next decade or two, with something kind of like current techniques. I think it would be… AGI would be a mesa optimizer or inner optimizer, whichever term you prefer. And that that inner optimizer will just sort of have a mishmash of all of these heuristics that point in a particular direction but can’t really be decomposed into ‘here are the objectives, and here is the intelligence’, in the same way that you can’t really decompose humans very well into ‘here are the objectives and here is the intelligence’.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
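+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;[Note: a minimal Python sketch of the contrast being drawn here: a system factored into a world model plus an arbitrary utility function it optimizes, versus a learned policy that just executes heuristics found by training. The action space, state representation, and heuristics are illustrative assumptions.]&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;pre&amp;gt;
+ import random
+ 
+ ACTIONS = range(4)  # a toy action space, e.g. up, down, left, right
+ 
+ def eu_maximizer(state, utility, world_model):
+     # The classic-argument picture: a general optimizer that accepts any
+     # utility function, predicts outcomes with its world model, and picks
+     # the action whose predicted outcome scores highest.
+     return max(ACTIONS, key=lambda a: utility(world_model(state, a)))
+ 
+ def learned_policy(state):
+     # The picture described above: no explicit objective inside, just a
+     # bundle of heuristics that gradient descent found to correlate with
+     # high reward, executed directly (an adaptation executor).
+     enemy_near, food_to_the_right = state  # toy two-feature state
+     if enemy_near:
+         return 2  # flee
+     if food_to_the_right:
+         return 3  # approach the food
+     return random.choice(list(ACTIONS))
+ &amp;lt;/pre&amp;gt;
+ &amp;lt;/HTML&amp;gt;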
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; And why does that lead to better safety?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I don’t know that it does, but it leads to not being as confident in the original arguments. It feels like this should be pushing in the direction of ‘it will be easier to correct or modify or change the AI system’. Many of the arguments for risk are ‘if you have a utility maximizer, it has all of these convergent instrumental sub-goals’ and, I don’t know, if I look at humans they kind of sort of pursued convergent instrumental sub-goals, but not really.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;You can definitely convince them that they should have different goals. They change the thing they are pursuing reasonably often. Mostly this just reduces my confidence in existing arguments rather than gives me an argument for safety.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; It’s like a defeater for AI safety arguments that rely on a clean separation between utility…&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah, which seems like all of them. All of the most crisp ones. Not all of them. I keep forgetting about the… I keep not taking into account the one where your god-like AIs slowly replace humans and humans lose control of the future. That one still seems totally possible in this world.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; If AGI is through current techniques, it’s likely to have systems that don’t have this clean separation.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yep. A separate claim that I would argue for separately– I don’t think they interact very much– is that I would also claim that we will get AGI via essentially current techniques. I don’t know if I should put a timeline on it, but two decades seems plausible. Not saying it’s likely, maybe 50% or something. And that the resulting AGI will look like a mesa optimizer.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah. I’d be very curious to delve into why you think that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Yeah, me too. Let’s just do that because that’s fast. Also your… What do you mean by current techniques, and what’s your credence in that being what happens?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; And like what’s your model for how… where is this coming from?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; So on the meta questions, first, the current techniques would be like deep learning, gradient descent broadly, maybe RL, maybe meta-learning, maybe things sort of like it, but back propagation or something like that is still involved.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I don’t think there’s a clean line here. Something like, we don’t look back and say: That. That was where the ML field just totally did a U-turn and did something else entirely.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Right. Everything that’s involved in the building of the AGI is something you can roughly find in current textbooks or like conference proceedings or something. Maybe combined in new cool ways.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah. Maybe, yeah. Yup. And also you throw a bunch of compute at it. That is part of my model. So that was the first one: what are current techniques? Then you asked credence.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Credence in AGI developed in two decades by current-ish techniques… Depends on the definition of current-ish techniques, but something like 30, 40%. Credence that it will be a mesa optimizer, maybe conditional on this being… The previous thing being true, the credence on it being a mesa optimizer, 60, 70%. Yeah, maybe 70%.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then the actual model for why this is… it’s sort of related to the previous points about features wherein there are lots and lots of features and humans have settled on the ones that are broadly useful across a wide variety of contexts. I think that in that world, what you want to do to get AGI is train an AI system on a very broad… train an AI system maybe by RL or something else, I don’t know. Probably RL.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;On a very large distribution of tasks or a large distribution of something, maybe they’re tasks, maybe they’re not like, I don’t know… Human babies aren’t really training on some particular task. Maybe it’s just a bunch of unsupervised learning. And in doing so over a lot of time and a lot of compute, it will converge on the same sorts of features that humans use.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think the nice part of this story is that it doesn’t require that you explain how the AI system generalizes– generalization in general is just a very difficult property to get out of ML systems if you want to generalize outside of the training distribution. You mostly don’t require that here because, A) it’s being trained on a very wide variety of tasks and B) it’s sort of mimicking the same sort of procedure that was used to create humans. Where, with humans you’ve also got the sort of… evolution did a lot of optimization in order to create creatures that were able to work effectively in the environment, the environment’s super complicated, especially because there are other creatures that are trying to use the same resources.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And so that’s where you get the wide variety or, the very like broad distribution of things. Okay. What have I not said yet?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long: &amp;lt;/strong&amp;gt;That was your model. Are you done with the model of how that sort of thing happens or-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I feel like I’ve forgotten aspects, forgotten to say aspects of the model, but maybe I did say all of it.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Well, just to recap: One thing you really want is a generalization, but this is in some sense taken care of because you’re just training on a huge bunch of tasks. Secondly, you’re likely to get them learning useful features. And one-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; And thirdly, it’s mimicking what evolution did, which is the one example we have of a process that created general intelligence.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; It feels like implicit in this sort of claim for why it’s soon is that compute will grow sufficiently to accommodate this process, which is similar to evolution. It feels like there are implicitly two claims there: a claim that compute will grow, and a claim that however much compute grows, that’s going to be enough to do this thing.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah, that’s fair. I think actually I don’t have good reasons for believing that, maybe I should reduce my credences on these a bit, but… That’s basically right. So, it feels like for the first time I’m like, “Wow, I can actually use estimates of human brain computation and it actually makes sense with my model.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I’m like, “Yeah, existing AI systems seem more expensive to run than the human brain… Sorry, if you compare dollars per hour of human brain equivalent. Hiring a human is what? Maybe we call it $20 an hour or something if we’re talking about relatively simple tasks. And then, I don’t think you could get an equivalent amount of compute for $20 for a while, but maybe… I forget what number I got to recently.” Yeah, actually the compute question feels like a thing I don’t actually know the answer to.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; A related question– this is just to clarify for me– it feels like maybe the relevant thing to compare to is not the amount of compute it takes to run a human brain, but like-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Evolution also matters.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, the amount of compute to get to the human brain or something like that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yes, I agree with that, that that is a relevant thing. I do think we can be way more efficient than evolution.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; That sounds right. But it does feel like that’s… that does seem like that’s the right sort of quantity to be looking at? Or does it feel like-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; For training, yes.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I’m curious if it feels like the training is going to be more expensive than the running in your model.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I think the… It’s a good question. It feels like we will need a bunch of experimentation, figuring out how to build essentially the equivalent of the human brain. And I don’t know how expensive that process will be, but I don’t think it has to be a single program that you run. I think it can be like… The research process itself is part of that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;At some point I think we build a system that is initially trained by gradient descent, and then the training by gradient descent is comparable to humans going out in the world and acting and learning based on that. A pretty big uncertainty here is: How much has evolution put in a bunch of important priors into human brains? Versus how much are human brains actually just learning most things from scratch? Well, scratch or learning from their parents.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;People would claim that babies have lots of inductive biases, I don’t know that I buy it. It seems like you can learn a lot with a month of just looking at the world and exploring it, especially when you get way more data than current AI systems get. For one thing, you can just move around in the world and notice that it’s three dimensional.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Another thing is you can actually interact with stuff and see what the response is. So you can get causal intervention data, and that’s probably where causality becomes such an ingrained part of us. So I could imagine that these things that we see as core to human reasoning, things like having a notion of causality, or the intuitions about statistics and counterfactuals and pragmatics that we’re apparently also supposed to have as babies…&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But all of these are done with brains that have been in the world for a long time, relatively speaking, relative to AI systems. I’m not actually sure if I buy that this is because we have really good priors.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I recently heard… Someone was talking to me about an argument that went like: Humans, in addition to having priors, built-ins from evolution and learning things in the same way that neural nets do, learn things through… you go to school and you’re taught certain concepts and algorithms and stuff like that. And that seems distinct from learning things in a gradient descenty way. Does that seem right?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I definitely agree with that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I see. And does that seem like a plausible thing that might not be encompassed by some gradient descenty thing?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I think the idea there would be, you do the gradient descenty thing for some time. That gets you in the AI system that now has inside of it a way to learn. That’s sort of what it means to be a mesa optimizer. And then that mesa optimizer can go and do its own thing to do better learning. And maybe at some point you just say, “To hell with this gradient descent, I’ll turn it off.” Probably humans don’t do that. Maybe humans do that, I don’t know.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Right. So you do gradient descent to get to some place. And then from there you can learn in the same way– where you just read articles on the internet or something?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah. Oh, another reason that I think this… Another part of my model for why this is more likely– I knew there was more– is that, exactly that point, which is that learning probably requires some more deliberate active process than gradient descent. Gradient descent feels really relatively dumb, not as dumb as evolution, but close. And the only plausible way I’ve seen so far for how that could happen is by mesa optimization. And it also seems to be how it happened with humans. I guess you could imagine the meta-learning system that’s explicitly trying to develop this learning algorithm.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then… okay, by the definition of mesa optimizers, that would not be a mesa optimizer, it would be an inner optimizer. So maybe it’s an inner optimizer instead if we use-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I think I don’t quite understand what it means that learning requires that, or that the only way to do learning is through mesa optimization.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I can give you a brief explanation of what it means to me in a minute or two. I’m going to go and open my summary because that says it better than I can.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Learned optimization, that’s what it was called. All right. Suppose you’re searching over a space of programs to find one that plays tic-tac-toe well. And initially you find a program that says, “If the board is empty, put something in the center square,” or rather, “If the center square is empty, put something there. If there’s two in a row somewhere of yours, put something to complete it. If your opponent has two in a row somewhere, make sure to block it,” and you learn a bunch of these heuristics. Those are some nice, interpretable heuristics but maybe you’ve got some uninterpretable ones too.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But as you search more and more, eventually someday you stumble upon the minimax algorithm, which just says, “Play out the game all the way until the end. Look at all possible moves that you could make, and all possible moves your opponent could make, and search for the path where you are guaranteed to win.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then you’re like, “Wow, this algorithm, it just always wins. No one can ever beat it. It’s amazing.” And so basically you have this outer optimization loop that was searching over a space of programs, and then it found a program, so one element of the space, that was itself performing optimization, because it was searching through possible moves or possible paths in the game tree to find the actual policy it should play.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And so your outer optimization algorithm found an inner optimization algorithm that is good, or it solves the task well. And the main claim I will make, and I’m not sure if… I don’t think the paper makes it, but the claim I will make is that for many tasks if you’re using gradient descent as your optimizer, because gradient descent is so annoyingly slow and simple and inefficient, the best way to actually achieve the task will be to find a mesa optimizer. So gradient descent finds parameters that themselves take an input, do some sort of optimization, and then figure out an output.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
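+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;[Note: a minimal, self-contained Python sketch of the minimax program described above, for tic-tac-toe. Representing the players as 1 and -1 is an illustrative choice; the point is that this single element of the program space is itself performing search over game paths.]&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;pre&amp;gt;
+ # The board is a tuple of 9 cells: 1 (first player), -1 (second), 0 (empty).
+ LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
+ 
+ def winner(board):
+     for a, b, c in LINES:
+         s = board[a] + board[b] + board[c]
+         if s == 3:
+             return 1
+         if s == -3:
+             return -1
+     return 0
+ 
+ def minimax(board, player):
+     # Play the game out all the way to the end: return (best score, best
+     # move) for player, assuming the opponent also plays optimally.
+     w = winner(board)
+     if w != 0:
+         return w, None
+     moves = [i for i, cell in enumerate(board) if cell == 0]
+     if not moves:
+         return 0, None  # draw
+     results = []
+     for m in moves:
+         child = board[:m] + (player,) + board[m + 1:]
+         score, _ = minimax(child, -player)
+         results.append((score, m))
+     return max(results) if player == 1 else min(results)
+ 
+ # minimax((0,) * 9, 1) returns a score of 0: perfect play draws, and no
+ # opponent can ever beat this program, which is the sense in which the
+ # outer search has found an inner optimizer.
+ &amp;lt;/pre&amp;gt;
+ &amp;lt;/HTML&amp;gt;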
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Got you. So I guess part of it is dividing into sub-problems that need to be optimized and then running… Does that seem right?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I don’t know that there’s necessarily a division into sub problems, but it’s a specific kind of optimization that’s tailored for the task at hand. Maybe another example would be… I don’t know, that’s a bad example. I think the analogy to humans is one I lean on a lot, where evolution is the outer optimizer and it needs to build things that replicate a bunch.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It turns out having things replicate a bunch is not something you can really get by heuristics. What you need to do is to create humans who can themselves optimize and figure out how to… Well, not replicate a bunch, but do things that are very correlated with replicating a bunch. And that’s how you get very good replicators.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; So I guess you’re saying… often the gradient descent process will– it turns out that having an optimizer as part of the process is often a good thing. Yeah, that makes sense. I remember that from the mesa optimization stuff.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah. So that intuition is one of the reasons I think that… It’s part of my model for why AGI will be a mesa optimizer. Though I do– in the world where we’re not using current ML techniques I’m like, “Oh, anything can happen.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; That makes sense. Yeah, I was going to ask about that. Okay. So conditioned on current ML techniques leading to it, it’ll probably go through mesa optimizers?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah. I might endorse the claim with much weaker confidence even without current ML techniques, but I’d have to think a lot more about that. There are arguments for why mesa optimization is the thing you want– is the thing that happens– that are separate from deep learning. In fact, the whole paper doesn’t really talk about deep learning very much.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Cool. So that was digging into the model of why and how confident we should be on current technique AGI, prosaic AI I guess people call it? And seems like the major sources of uncertainty there are: does compute actually go up, considerations about evolution and its relation to human intelligence and learning and stuff?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yup. So the Median Group, for example, will agree with most of this analysis… Actually no. The Median Group will agree with some of this analysis but then say, and therefore, AGI is extremely far away, because evolution threw in some horrifying amount of computation and there’s no way we can ever match that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I’m curious if you still have things on your list of like safety by default arguments, I’m curious to go back to that. Maybe you covered them.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I think I have covered them.  The way I’ve listed this last one is ‘AI systems will be optimizers in the same way that humans are optimizers, not like Eliezer-style EU maximizers’… which is basically what I’ve just been saying.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; But it seems like it still feels dangerous… if a human had loads of power, they could do things that… even if they aren’t maximizing some utility.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah, I agree, this is not an argument for complete safety. I forget where I was initially going with this point. I think my main point here is that mesa optimizers don’t nice… Oh, right, they don’t nicely factor into utility function and intelligence. And that reduces my credence in existing arguments, and there are still issues which are like, with a mesa optimizer, your capabilities generalize with distributional shift, but your objective doesn’t.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Humans are not really optimizing for reproductive success. And arguably, if someone had wanted to create things that were really good at reproducing, they might have used evolution as a way to do it. And then humans showed up and were like, “Oh, whoops, I guess we’re not doing that anymore.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I mean, the mesa optimizers paper is a very pessimistic paper. In their view, mesa optimization is a bad thing that leads to danger and that’s… I agree that all of the reasons they point out for mesa optimization being dangerous are in fact reasons that we should be worried about mesa optimization.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think mostly I see this as… convergent instrumental sub-goals are less likely to be obviously a thing that this pursues. And that just feels more important to me. I don’t really have a strong argument for why that consideration dominates-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; The convergent instrumental sub-goals consideration?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I have a meta credence question, maybe two layers of them. The first being, do you consider yourself optimistic about AI for some random qualitative definition of optimistic? And the follow-up is, what do you think is the credence that by default things go well, without additional intervention by us doing safety research or something like that?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I would say relative to AI alignment researchers, I’m optimistic. Relative to the general public or something like that, I might be pessimistic. It’s hard to tell. I don’t know, credence that things go well? That’s a hard one. Intuitively, it feels like 80 to 90%, 90%, maybe. 90 feels like I’m being way too confident and like, “What? You only assign 10%, even though you have literally no… you can’t predict the future and no one can predict the future, why are you trying to do it?” It still does feel more like 90%.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I think that’s fine. I guess the follow-up is sort of like, between the sort of things that you gave, which were like: Slow takeoff allows for correcting things, things that are more powerful will be more interpretable, and I think the third one being, AI systems not actually being… I’m curious how much do you feel like your actual belief in this leans on these arguments? Does that make sense?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah. I think the slow takeoff one is the biggest one. If I believed that at some point we would build an AI system that within the span of a week was just way smarter than any human, and before that the most powerful AI system was below human level, I’m just like, “Shit, we’re doomed.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Because there it doesn’t matter if it goes through interpretable features particularly.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; There I’m like, “Okay, once we get to something that’s super intelligent, it feels like the human ant analogy is basically right.” And unless we… Maybe we could still be fine because people thought about it and put in… Maybe I’m still like, “Oh, AI researchers would have been able to predict that this would’ve happened and so were careful.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I don’t know, in a world where fast takeoff is true, lots of things are weird about the world, and I don’t really understand the world. So I’m like, “Shit, it’s quite likely something goes wrong.” I think the slow takeoff is definitely a crux. Also, we keep calling it slow takeoff and I want to emphasize that it’s not necessarily slow in calendar time. It’s more like gradual.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Right, like ‘enough time for us to correct things’ takeoff.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah. And there’s no discontinuity between… you’re not like, “Here’s a 2X human AI,” and a couple of seconds later it’s now… Not a couple of seconds later, but like, “Yeah, we’ve got 2X AI,” for a few months and then suddenly someone deploys a 10,000X human AI. If that happened, I would also be pretty worried.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It’s more like there’s a 2X human AI, then there’s like a 3X human AI and then a 4X human AI. Maybe this happens from the same AI getting better and learning more over time. Maybe it happens from it designing a new AI system that learns faster, but starts out lower and so then overtakes it sort of continuously, stuff like that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;So that I think, yeah, without… I don’t really know what the alternative to it is, but in the one where it’s not human level, and then it’s 10,000X human in a week and it just sort of happened, there I’m at, I don’t know, 70% on doom or something, maybe more. That feels like I’m… I endorse that credence even less than most just because I feel like I don’t know what that world looks like. Whereas on the other ones I at least have a plausible world in my head.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, that makes sense. I think you’ve mentioned that, in a slow takeoff scenario… Some people would disagree, and say that in a world where you noticed something was wrong, you would just hack around it and keep going.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I have a suggestion which it feels like maybe is a difference and I’m very curious for your take on whether that seems right or seems wrong. It seems like people believe there’s going to be some kind of pressure for performance or competitiveness that pushes people to try to make more powerful AI in spite of safety failures. Does that seem untrue to you or like you’re unsure about it?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; It seems somewhat untrue to me. I recently made a comment about this on the Alignment Forum. People make this analogy between AI x-risk and the risk of nuclear war under mutually assured destruction. That particular analogy seems off to me because with nuclear war, you need the threat of being able to hurt the other side, whereas with AI x-risk, if the destruction happens, that affects you too. So there’s no mutually assured destruction type dynamic.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;You could imagine a situation where for some reason the US and China are like, “Whoever gets to AGI first just wins the universe.” And I think in that scenario maybe I’m a bit worried, but even then, it seems like extinction is just worse, and as a result, you get significantly less risky behavior? But I don’t think you get to the point where people are just literally racing ahead with no thought to safety for the sake of winning.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I also don’t think that you would… I don’t think that differences in who gets to AGI first are going to lead to you winning the universe or not. I think it leads to pretty continuous changes in power balance between the two.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I also don’t think there’s a discrete point at which you can say, “I’ve won the race.” I think it’s just like capabilities keep improving and you can have more capabilities than the other guy, but at no point can you say, “Now I have won the race.” I suppose if you could get a decisive strategic advantage, then you could do it. And that has nothing to do with what your AI capability… If you’ve got a decisive strategic advantage that could happen.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I would be surprised if the first human-level AI allowed you to get anything close to a decisive strategic advantage. Maybe when you’re at 1000X human level AI, perhaps. Maybe not a thousand. I don’t know. Given slow takeoff, I’d be surprised if you could knowably be like, “Oh yes, if I develop this piece of technology faster than my opponent, I will get a decisive strategic advantage.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; That makes sense. We discussed a lot of cruxes you have. Do you feel like there’s evidence that you already have pre-computed that you think could move you in one direction or another on this? Obviously, if you’ve got evidence that X was true, that would move you, but are there concrete things where you’re like, “I’m interested to see how this will turn out, and that will affect my views on the thing?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; So I think I mentioned the… On the question of timelines, there’s the question of how much evolution actually baked into humans. I don’t know if it could be answered, but maybe you could answer that one, and that would affect it. I lean on the side of not really, but it’s possible that the answer is yes, actually quite a lot. If that was true, I’d just lengthen my timelines, basically.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; Can you also explain how this would change your behavior with respect to what research you’re doing, or would it not change that at all?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; That’s a good question. I think I would have to think about that one for longer than two minutes.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;As background on that, a lot of my current research is more trying to get AI researchers to be thinking about what happens when you deploy, when you have AI systems working with humans, as opposed to solving alignment. Mostly because for a while I couldn’t see research that felt useful to me for solving alignment. I think I’m now seeing more things that I can do that seem more relevant, and I will probably switch to doing them, possibly after graduating, because of my thesis, and needing to graduate, and stuff like that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yes, but you were asking about evidence that would change my mind-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I think it’s also reasonable to be not sure exactly about concrete things. I don’t have a good answer to this question off the top of my head.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; It’s worth at least thinking about for a couple of minutes. I think either historical case studies of how people have dealt with new technologies, or analyses of how AI researchers currently think about things and deal with stuff, could change my mind about whether the AI community would by default handle problems that arise, which feels like an important crux between me and others.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think currently my sense is… You asked me this earlier, and I never answered it. If the AI safety field just sort of vanished, but the work we’ve done so far remained, and conscientious AI researchers remained– people who are already AI researchers and already doing this sort of stuff without being influenced by EA or rationality– then I think we’re still fine, because people will notice failures and correct them.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I did answer that question. I said something like 90%. This was a scenario I was saying 90% for. And yeah, that one feels like a thing that I could get evidence on that would change my mind.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I can’t really imagine what would cause me to believe that AI systems will actually do a treacherous turn without ever trying to deceive us before that. But there might be something there. I don’t really know what evidence would move me, any sort of plausible evidence I could see that would move me in that direction.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Slow takeoff versus fast takeoff… I feel like MIRI still apparently believes in fast takeoff. I don’t have a clear picture of their reasons, but if I heard them, I expect those reasons would move me towards fast takeoff.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Oh, on the expected utility maximization point… my perception of MIRI, or of Eliezer and also maybe MIRI, is that they have this position that any sufficiently powerful AI system will look to us like an expected utility maximizer, and therefore will have convergent instrumental sub-goals and so on. I don’t buy this. I wrote a &amp;lt;a href=&amp;quot;https://www.alignmentforum.org/s/4dHMdK5TLN6xcqtyc/p/NxF5G6CJiof6cemTw&amp;quot;&amp;gt;post&amp;lt;/a&amp;gt; explaining why I don’t buy this.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Yeah, there’s a lot of just… MIRI could say their reasons for believing things, and that would probably cause me to update. Actually, I have enough disagreements with MIRI that they may not update me, but it could in theory update me.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, that’s right. What are some disagreements you have with MIRI?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Well, the ones I just mentioned. There is this great post from maybe not a year ago, but in 2018, called ‘Realism about Rationality’, which is basically this perspective that there is the one true learning algorithm or the one correct way of doing exploration, or just, there is a platonic ideal of intelligence. We could in principle find it, code it up, and then we would have this extremely good AI algorithm.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then there is like, to the extent that this was a disagreement back in 2008, Robin Hanson would have been on the other side saying, “No, intelligence is just like a broad… just like a conglomerate of a bunch of different heuristics that are all task-specific, and you can’t just take one and apply it in some other domain. It is just messy and complicated and doesn’t have a nice crisp formalization.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And I fall not exactly on Robin Hanson’s side, but much more on Robin Hanson’s side than on the ‘rationality is a real, formalizable, natural thing in the world’ side.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; Do you have any idea where the cruxes of disagreement are at all?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; No, that one has proved very difficult to…&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; I think that’s an AI Impacts project, or like a dissertation or something. I feel like there’s just this general domain specificity debate, how general is rationality debate…&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think there are these very crucial considerations about the nature of intelligence and how domain-specific it is, and they were an issue between Robin and Eliezer, and no one… It’s hard to know what the evidence is in this case.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah. I basically agree with this, and it feels like a very deep disagreement that I have never had any success in coming to a resolution on. I read arguments by people who believe this and I’m like, “No.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; Have you spoken to people?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I have spoken to people at CHAI; I don’t know that they would really be on board with this train. Hold on, Daniel probably would be. And that hasn’t helped that much. Yeah. This disagreement feels like one where I would predict that conversations are not going to help very much.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; So, the general question here was disagreements with MIRI, and then there’s… And you’ve mentioned fast takeoff and maybe relatedly, the Yudkowsky-Hanson–&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Realism about Rationality is how I’d phrase it. There’s also the– are AI researchers conscientious? Well, actually I don’t know that they would say they are not conscientious. Maybe they’d say they’re not paying attention or they have motivated reasoning for ignoring the issues… lots of things like that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; And this issue of do advanced intelligences look enough like EU maximizers…&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Oh, yes. That one too. Yeah, sorry. That’s one of the major ones. Not sure how I forgot that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; I remember it because I’m writing it all down, so… again, you’ve been talking about very complicated things.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah. Related to the Realism about Rationality point is the use of formalism and proof. Not formalism, but proof at least. I don’t know that MIRI actually believes that what we need to do is write a bunch of proofs about our AI system, but it sure sounds like it, and that seems like a too-difficult, basically impossible task to me, if the proofs that we’re trying to write are about alignment or beneficialness or something like that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;They also seem to… No, maybe all the other disagreements can be traced back to these disagreements. I’m not sure.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
  

&lt;/pre&gt;</content>
        <summary>&lt;pre&gt;
@@ -1 +1,1102 @@
+ ====== Conversation with Rohin Shah ======
+ 
+ // Published 31 October, 2019; last updated 15 September, 2020 //
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;AI Impacts talked to AI safety researcher Rohin Shah about his views on AI risk. With his permission, we have transcribed this interview.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ 
+ ===== Participants =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;a href=&amp;quot;https://rohinshah.com/&amp;quot;&amp;gt;Rohin Shah&amp;lt;/a&amp;gt; — PhD student at the Center for Human-Compatible AI, UC Berkeley
+                 &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Asya Bergal – AI Impacts&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;
+ &amp;lt;a href=&amp;quot;http://robertlong.online/&amp;quot;&amp;gt;Robert Long&amp;lt;/a&amp;gt; – AI Impacts
+                 &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Sara Haxhia — Independent researcher&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ===== Summary =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;We spoke with Rohin Shah on August 6, 2019. Here is a brief summary of that conversation:&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Before taking into account other researchers’ opinions, Shah guesses an extremely rough ~90% chance that even without any additional intervention from current longtermists, advanced AI systems will not cause human extinction by adversarially optimizing against humans. He gives the following reasons, ordered by how heavily they weigh in his consideration:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Gradual development and take-off of AI systems is likely to allow for correcting the AI system online, and AI researchers will in fact correct safety issues rather than hacking around them and redeploying.
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah thinks that institutions developing AI are likely to be careful because human extinction would be just as bad for them as for everyone else.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;As AI systems get more powerful, they will likely become more interpretable and easier to understand because they will use features that humans also tend to use.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Many arguments for AI risk go through an intuition that AI systems can be decomposed into an objective function and a world model, and Shah thinks this isn’t likely to be a good way to model future AI systems.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah believes that conditional on misaligned AI leading to extinction, it almost certainly goes through deception.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah very uncertainly guesses that there’s a ~50% chance that we will get AGI within two decades:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;He gives a ~30% – 40% chance that it will be via essentially current techniques.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;He gives a ~70% chance that, conditional on the two previous claims, it will be a mesa optimizer.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah’s model for how we get to AGI soon has the following features:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;AI will be trained on a huge variety of tasks, addressing the usual difficulty of generalization in ML systems.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;AI will learn the same kinds of useful features that humans have learned.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;This process of research and training the AI will mimic the ways that evolution produced humans who learn.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Gradient descent is simple and inefficient, so in order to do sophisticated learning, the outer optimization algorithm used in training will have to produce a mesa optimizer.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah is skeptical of more ‘nativist’ theories where human babies are born with a lot of inductive biases, rather than learning almost everything from their experiences in the world.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah thinks there are several things that could change his beliefs, including:
+                   &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;If he learned that evolution actually baked a lot into humans (‘nativism’), he would lengthen the amount of time he thinks there will be before AGI.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Information from historical case studies or analyses of AI researchers could change his mind around how the AI community would by default handle problems that arise.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Having a better understanding of the disagreements he has with MIRI:
+                       &amp;lt;ul&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah believes that slow takeoff is much more likely than fast takeoff.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah doesn’t believe that any sufficiently powerful AI system will look like an expected utility maximizer.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah believes less in crisp formalizations of intelligence than MIRI does.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah has more faith in AI researchers fixing problems as they come up.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;li&amp;gt;&amp;lt;div class=&amp;quot;li&amp;quot;&amp;gt;Shah has less faith than MIRI in our ability to write proofs of the safety of our AI systems.&amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/div&amp;gt;&amp;lt;/li&amp;gt;
+ &amp;lt;/ul&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;This transcript has been lightly edited for concision and clarity.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ ===== Transcript =====
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; We haven’t really planned out how we’re going to talk to people in general, so if any of these questions seem bad or not useful, just give us feedback. I think we’re particularly interested in skepticism arguments, or safe by default style arguments– I wasn’t sure from our conversation whether you partially endorse that, or you just are familiar with the argumentation style and think you could give it well or something like that.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I think I partially endorse it.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Okay, great. If you can, it would be useful if you gave us the short version of your take on the AI risk argument and the place where you feel you and people who are more convinced of things disagree. Does that make sense?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Just to clarify, maybe for my own… What’s ‘convinced of things’? I’m thinking of the target proposition as something like “it’s extremely high value for people to be doing work that aims to make AGI more safe or beneficial”.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Even that statement seems a little imprecise because I think people have differing opinions about what the high value work is. But that seems like approximately the right proposition.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Okay. So there are some very obvious ones which are not the ones that I endorse, but things like, do you believe in longtermism? Do you buy into the total view of population ethics? And if your answer is no, and you take a more standard version, you’re going to drastically reduce how much you care about AI safety. But let’s see, the ones that I would endorse-&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Maybe we should work on this set of questions. I think this will only come up with people who are into rationalism. I think we’re primarily focused just on empirical sources of disagreement, whereas these would be ethical.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yup.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Which again, you’re completely right to mention these things.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; So, there’s… okay. The first one I had listed is that continual or gradual or slow takeoff, whatever you want to call it, allows you to correct the AI system online. And also it means that AI systems are likely to fail in not extinction-level ways before they fail in extinction-level ways, and presumably we will learn from that and not just hack around it and fix it and redeploy it. I think I feel fairly confident that there are several people who will disagree with exactly the last thing I said, which is that people won’t just hack around it and deploy it– like fix the surface-level problem and then just redeploy it and hope that everything’s fine.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I am not sure what drives the difference between those intuitions. I think they would point to neural architecture search and things like that as examples of, “Let’s just throw compute at the problem and let the compute figure out a bunch of heuristics that seem to work.” And I would point at, “Look, we noticed that… or, someone noticed that AI systems are not particularly fair and now there’s just a ton of research into fairness.”&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And it’s true that we didn’t stop deploying AI systems because of fairness concerns, but I think that is actually just the correct decision from a societal perspective. The benefits from AI systems are in fact– they do in fact outweigh the cons of them not being fair, and so it doesn’t require you to not deploy the AI system while it’s being fixed.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; That makes sense. I feel like another common thing, which is not just “hack around and fix it”, is that people think that it will fail in ways that we don’t recognize and then we’ll redeploy some bigger cooler version of it that will be deceptively aligned (or whatever the problem is). How do you feel about arguments of that form: that we just won’t realize all the ways in which the thing is bad?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; So I’m thinking: the AI system tries to deceive us, so I guess the argument would be, we don’t realize that the AI system was trying to deceive us and instead we’re like, “Oh, the AI system just failed because it was off distribution or something.”&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It seems strange that we wouldn’t see an AI system deliberately hide information from us. And then we look at this and we’re like, “Why the hell didn’t this information come up? This seems like a clear problem.” And then do some sort of investigation into this.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I suppose it’s possible we wouldn’t be able to tell it’s intentionally doing this because it thinks it could get better reward by doing so. But that doesn’t… I mean, I don’t have a particular argument why that couldn’t happen but it doesn’t feel like…&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, to be fair I’m not sure that one is what you should expect… that’s just a thing that I commonly hear.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yes. I also hear that.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; I was surprised at your deception comment… You were talking about, “What about scenarios where nothing seems wrong until you reach a certain level?”&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Right. Sorry, that doesn’t have to be deception. I think maybe I mentioned deception because I feel like I often commonly also see it.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I guess if I imagine “How did AI lead to extinction?”, I don’t really imagine a scenario that doesn’t involve deception. And then I claim that conditional on that scenario having happened, I am very surprised by the fact that we did not notice this deception in any earlier scenario that didn’t lead to extinction. And I don’t really get people’s intuitions for why that would be the case. I haven’t tried to figure that one out though.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; So do you have no model of how people’s intuitions differ? You can’t see it going wrong aside from if it was deceptively aligned? Why?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Oh, I feel like most people have the intuition that conditional on extinction, it happened by the AI deceiving us. &amp;lt;em&amp;gt;[Note: In this interview, Rohin was only considering risks arising because of AI systems that try to optimize for goals that are not our own, not other forms of existential risks from AI.]&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I think there’s another class of things which is something not necessarily deceiving us, as in it has a model of our goals and intentionally presents us with deceptive output, and just like… it has some notion of utility function and optimizes for that poorly. It doesn’t necessarily have a model of us, it just optimizes the paperclips or something like that, and we didn’t realize before that it is optimizing. I think when I hear deceptive, I think “it has a model of human behavior that is intentionally trying to do things that subvert our expectations”. And I think there’s also a version where it just has goals unaligned with ours and doesn’t spend any resources in modeling our behavior.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I think in that scenario, usually as an instrumental goal, you need to deceive humans, because if you don’t have a model of human behavior– if you don’t model the fact that humans are going to interfere with your plans– humans just turn you off and nothing, there’s no extinction.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Because we’d notice. You’re thinking in the non-deception cases, as with the deception cases, in this scenario we’d probably notice.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; That clarifies my question. Great.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; As far as I know, this is an accepted thing among people who think about AI x-risk.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal: &amp;lt;/strong&amp;gt;The accepted thing is like, “If things go badly, it’s because it’s actually deceiving us on some level”?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yup. There are some other scenarios which could lead to us not being deceived and bad things still happen. These tend to be things like, we build an economy of AI systems and then slowly humans get pushed out of the economy of AI systems and… &amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;They’re still modeling us. I just can’t really imagine the scenario in which they’re not modeling us. I guess you could imagine one where we slowly cede power to AI systems that are doing things better than we could. And at no point are they actively trying to deceive us, but at some point they’re just like… they’re running the entire economy and we don’t really have much say in it.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And perhaps this could get to a point where we’re like, “Okay, we have lost control of the future and this is effectively an x-risk, but at no point was there really any deception.”&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Right. I’m happy to move on to other stuff.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Cool. Let’s see. What’s the next one I have? All right. This one’s a lot sketchier-&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; So sorry, what is the thing that we’re listing just so-&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Oh, reasons why AI safety will be fine by default.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Right. Gotcha, great.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Okay. These two points were both really one point. So then the next one was… I claimed that as AI systems get more powerful, they will become more interpretable and easier to understand, just because they’re using– they will probably be able to get and learn features that humans also tend to use.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I don’t think this has really been debated in the community very much and– sorry, I don’t mean that there’s agreement on it. I think it is just not a hypothesis that has been promoted to attention in the community. And it’s not totally clear what the safety implications are. It suggests that we could understand AI systems more easily and sort of in combination with the previous point it says, “Oh, we’ll notice things– we’ll be more able to notice things than today where we’re like, ‘Here’s this image classifier. Does it do good things? Who the hell knows? We tried it on a bunch of inputs and it seemed like it was doing the right stuff, but who knows what it’s doing inside.’”&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I’m curious why you think it’s likely to use features that humans tend to use. It’s possible the answer is some intuition that’s hard to describe.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Intuition that I hope to describe in a year. Partly it’s that in the very toy straw model, there are just a bunch of features in the world that an AI system can pay attention to in order to make good predictions. When you limit the AI system to make predictions on a very small narrow distribution, which is like all AI systems today, there are lots of features that the AI system can use for that task that we humans don’t use because they’re just not very good for the rest of the distribution.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I see. It seems like implicit in this argument is that when humans are running their own classifiers, they have some natural, optimal set of features that they use for that distribution?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I don’t know if I’d say optimal, but yeah. Better than the features that the AI system is using.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; In the space of better features, why aren’t they going past us or into some other optimal space of feature world?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I think they would eventually.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; I see, but they might have to go through ours first?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; So A) I think they would go through ours, B) I think my intuition is something like the features– and this one seems like more just raw intuition and I don’t really have an argument for it– but the features… things like agency, optimization, want, deception, manipulation seem like things that are useful for modeling the world.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I would be surprised if an AI system went so far beyond us that those features didn’t even enter into its calculations. Or, I’d be surprised if that happened very quickly, maybe. I don’t want to make claims about how far past those AI systems could go, but I do think that… I guess I’m also saying that we should be aiming for AI systems that are like… This is a terrible way to operationalize it, but: AI systems that are 10x as intelligent as humans– what do we have to do for them? And then once we’ve got AI systems that are 10x smarter than us, then we’re like, “All right, what more problems could arise in the future?” And ask the AI systems to help us with that as well.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; To clarify, the thing you’re saying is… By the time AI systems are good and more powerful, they will have some conception of the kind of features that humans use, and be able to describe their decisions in terms of those features? Or do you think inherently, there’ll be a point where AI systems use the exact same features that humans use?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Not the exact same features, but broadly similar features to the ones that humans use.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Where examples of those features would be like objects, cause, agent, the things that we want interpreted in deep nets but usually can’t.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yes, exactly.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Again, so you think in some sense that that’s a natural way to describe things? Or there’s only one path through getting better at describing things, and that has to go through the way that humans describe things? Does that sound right?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yes.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Okay. Does that also feel like an intuition?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yes.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Sorry, I think I did a bad interviewer thing where I started listing things, I should have just asked you to list some of the features which I think-&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Well I listed them, like, optimization, want, motivation before, but I agree causality would be another one. But yeah, I was thinking more the things that safety researchers often talk about. I don’t know, what other features do we tend to use a lot? Object’s a good one… the conception of 3D space is one that I don’t think these classifiers have and that we definitely have.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And the concept of 3D space seems like it’s probably going to be useful for an AI system no matter how smart it gets. Currently, they might have a concept of 3D space, but it’s not obvious that they do. And I wouldn’t be surprised if they don’t.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;At some point, I want to take this intuition and run with it and see where it goes. And try to argue for it more.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; But I think for the purposes of this interview, I think we do understand how this is something that would make things safe by default. At least, in as much as interpretability conduces to safety. Because we could be able to interpret them and still fuck shit up.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yep. Agreed. Cool.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; I guess I’m a little bit confused about how it makes the code more interpretable. I can see how, if it uses concepts the way human brains do, we can model it better, because we can just say, “These are human things,” and this means we can make predictions better. But if you’re looking at a neural net or something, it doesn’t make it more interpretable.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; If you mean the code, I agree with that.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; Okay. So, is this kind of like external, like you being able to model that thing?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I think you could look at the… you take a particular input to neural net, you pass it through layers, you see what the activations are. I don’t think if you just look directly at the activations, you’re going to get anything sensible, in the same way that if you look at electrical signals in my brain you’re not going to be able to understand them.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; So, is your point that the reason it becomes more interpretable is something more like, you understand its motivations?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; What I mean is… Are you familiar with Chris Olah’s work?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; I’m not.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Okay. So Chris Olah does interpretability work with image classifiers. One technique that he uses is: Take a particular neuron in the neural net, say, “I want to maximize the activation of this neuron,” and then do gradient descent on your input image to see what image maximally activates that neuron. And this gives you some insight into what that neuron is detecting. I think things like that will be easier as time goes on.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
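+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;[Note: a minimal sketch of the activation-maximization technique described here, assuming a trained PyTorch-style image model. The model and layer handles, the input size, and the hyperparameters are illustrative assumptions, not from the conversation.]&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ &amp;lt;code python&amp;gt;
+ import torch
+ 
+ def visualize_neuron(model, layer, channel, steps=256, lr=0.05):
+     # Start from random noise and do gradient ascent on the input image
+     # so that it maximally activates one channel of the chosen layer.
+     model.eval()
+     for p in model.parameters():
+         p.requires_grad_(False)
+     image = torch.randn(1, 3, 224, 224, requires_grad=True)
+     captured = []
+ 
+     def hook(module, inputs, output):
+         captured.append(output)
+ 
+     handle = layer.register_forward_hook(hook)
+     optimizer = torch.optim.Adam([image], lr=lr)
+     for _ in range(steps):
+         captured.clear()
+         optimizer.zero_grad()
+         model(image)
+         # Maximizing the mean activation = minimizing its negative.
+         loss = -captured[0][0, channel].mean()
+         loss.backward()
+         optimizer.step()
+     handle.remove()
+     return image.detach()
+ &amp;lt;/code&amp;gt;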
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Even if it’s not just that particular technique, right? Just the general task?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yes.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; How does that relate to the human values thing? It felt like you were saying something like it’s going to model the world in a similar way to the way we do, and that’s going to make it more interpretable. And I just don’t really see the link.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; A straw version of this, which isn’t exactly what I mean but sort of is the right intuition, would be like maybe if you run the same… What’s the input that maximizes the output of this neuron? You’ll see that this particular neuron is a deception classifier. It looks at the input and then based on something, does some computation with the input, maybe the input’s like a dialogue between two people and then this neuron is telling you, “Hey, is person A trying to deceive person B right now?” That’s an example of the sort of thing I am imagining.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I’m going to do the bad interviewer thing where I put words in your mouth. I think one problem right now is you can go a few layers into a neural network and the first few layers correspond to things you can easily tell… Like, the first layer is clearly looking at all the different pixel values, and maybe the second layer is finding lines or something like that. But then there’s this worry that later on, the neurons will correspond to concepts that we have no human interpretation for, so it won’t even make sense to interpret them. Whereas Rohin is saying, “No, actually the neurons will correspond to, or the architecture will correspond to some human understandable concept that it makes sense to interpret.” Does that seem right?&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah, that seems right. I am maybe not sure that I tie it necessarily to the architecture, but I’d probably have to think about that one day.&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Definitely, you don’t need to. Yeah.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Anyway, I haven’t thought about that enough, but that’s basically that. If you look at current late layers in image classifiers they are often like, “Oh look, this is a detector for lemon tennis balls,” and you’re just like, “That’s a strange concept you’ve got there, neural net, but sure.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Alright, cool. Next way of being safe?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; They’re getting more and more sketchy. I have an intuition that… I should rephrase this. I have an intuition that AI systems are not well-modeled as, “Here’s the objective function and here is the world model.” Most of the classic arguments are: Suppose you’ve got an incorrect objective function, and you’ve got this AI system with this really, really good intelligence– maybe we’ll call it a world model, or just general intelligence. And this intelligence can take in any utility function and optimize it, and you plug in the incorrect utility function, and catastrophe happens.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
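+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;[Note: a toy sketch of the decomposition Rohin is describing– a generic planning routine that takes an arbitrary utility function and optimizes it against a world model. The world_model interface (actions, step) is hypothetical, invented purely for illustration.]&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ &amp;lt;code python&amp;gt;
+ import math
+ 
+ def plan(state, utility, world_model, depth):
+     # Classic picture: intelligence as a fixed search procedure that
+     # will optimize whatever utility function you plug in, correct or not.
+     if depth == 0:
+         return utility(state), None
+     best_value, best_action = -math.inf, None
+     for action in world_model.actions(state):
+         value = sum(p * plan(s, utility, world_model, depth - 1)[0]
+                     for s, p in world_model.step(state, action))
+         if value &amp;gt; best_value:
+             best_value, best_action = value, action
+     return best_value, best_action
+ &amp;lt;/code&amp;gt;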
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;This does not seem to be the way that current AI systems work. It is the case that you have a reward function, and then you sort of train a policy that optimizes that reward function, but… I explained this the wrong way around. But the policy that’s learned isn’t really… It’s not really performing an optimization that says, “What is going to get me the most reward? Let me do that thing.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It has been given a bunch of heuristics by gradient descent that tend to correlate well with getting high reward and then it just executes those heuristics. It’s kind of similar to… If any of you are fans of the sequences… Eliezer wrote a sequence on evolution and said… What was it? Humans are not fitness maximizers, they are adaptation executors, something like this. And that is how I view neural nets today that are trained by RL. They don’t really seem like expected utility maximizers the way that it’s usually talked about by MIRI or on LessWrong.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
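+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;[Note: for contrast with the planner sketch above, a minimal policy-gradient (REINFORCE) sketch of the picture Rohin describes here: the reward is only consulted by the outer gradient-descent loop, while the trained policy is just a forward pass whose weights encode reward-correlated heuristics. The network sizes and function names are illustrative.]&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ &amp;lt;code python&amp;gt;
+ import torch
+ import torch.nn as nn
+ 
+ policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
+ optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
+ 
+ def act(observation):
+     # Deployment-time behavior: a single forward pass. No reward term
+     # appears anywhere here; the policy just executes its heuristics.
+     logits = policy(torch.as_tensor(observation, dtype=torch.float32))
+     dist = torch.distributions.Categorical(logits=logits)
+     action = dist.sample()
+     return action, dist.log_prob(action)
+ 
+ def reinforce_update(log_probs, rewards):
+     # Training-time behavior: here, and only here, gradient descent uses
+     # the reward to reshape the policy toward reward-correlated behavior.
+     returns = torch.as_tensor(rewards, dtype=torch.float32)
+     loss = -(torch.stack(log_probs) * returns).sum()
+     optimizer.zero_grad()
+     loss.backward()
+     optimizer.step()
+ &amp;lt;/code&amp;gt;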
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I mostly expect this to continue, I think conditional on AGI being developed soon-ish, like in the next decade or two, with something kind of like current techniques. I think it would be… AGI would be a mesa optimizer or inner optimizer, whichever term you prefer. And that that inner optimizer will just sort of have a mishmash of all of these heuristics that point in a particular direction but can’t really be decomposed into ‘here are the objectives, and here is the intelligence’, in the same way that you can’t really decompose humans very well into ‘here are the objectives and here is the intelligence’.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; And why does that lead to better safety?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I don’t know that it does, but it leads to not being as confident in the original arguments. It feels like this should be pushing in the direction of ‘it will be easier to correct or modify or change the AI system’. Many of the arguments for risk are ‘if you have a utility maximizer, it has all of these convergent instrumental sub-goals’ and, I don’t know, if I look at humans they kind of sort of pursued convergent instrumental sub-goals, but not really.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;You can definitely convince them that they should have different goals. They change the thing they are pursuing reasonably often. Mostly this just reduces my confidence in existing arguments rather than gives me an argument for safety.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; It’s like a defeater for AI safety arguments that rely on a clean separation between utility…&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah, which seems like all of them. All of the most crisp ones. Not all of them. I keep forgetting about the… I keep not taking into account the one where your god-like AI slowly replaces humans and humans lose control of the future. That one still seems totally possible in this world.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; If AGI is through current techniques, it’s likely to have systems that don’t have this clean separation.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yep. A separate claim that I would argue for separately– I don’t think they interact very much– is that I would also claim that we will get AGI via essentially current techniques. I don’t know if I should put a timeline on it, but two decades seems plausible. Not saying it’s likely, maybe 50% or something. And that the resulting AGI will look like a mesa optimizer.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah. I’d be very curious to delve into why you think that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Yeah, me too. Let’s just do that because that’s fast. Also your… What do you mean by current techniques, and what’s your credence in that being what happens?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; And like what’s your model for how… where is this coming from?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; So on the meta questions, first, the current techniques would be like deep learning, gradient descent broadly, maybe RL, maybe meta-learning, maybe things sort of like it, but back propagation or something like that is still involved.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I don’t think there’s a clean line here. Something like, we don’t look back and say: That. That was where the ML field just totally did a U-turn and did something else entirely.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Right. Everything that’s involved in the building of the AGI is something you can roughly find in current textbooks or like conference proceedings or something. Maybe combined in new cool ways.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah. Maybe, yeah. Yup. And also you throw a bunch of compute at it. That is part of my model. So that was the first one: what are current techniques? Then you asked about credence.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Credence in AGI developed in two decades by current-ish techniques… Depends on the definition of current-ish techniques, but something like 30, 40%. Credence that it will be a mesa optimizer, conditional on the previous thing being true: 60, 70%. Yeah, maybe 70%.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
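+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;[Note: combining these stated credences explicitly, using illustrative midpoints of the ranges Rohin gives.]&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ &amp;lt;code python&amp;gt;
+ # Midpoints of the stated ranges, purely for illustration.
+ p_agi_soon_current_techniques = 0.35   # 30-40% in about two decades
+ p_mesa_given_agi = 0.7                 # 60-70%, conditional on the above
+ 
+ p_agi_and_mesa = p_agi_soon_current_techniques * p_mesa_given_agi
+ # about 0.25: roughly a one-in-four chance of a mesa-optimizer AGI from
+ # current-ish techniques within two decades, on these numbers.
+ &amp;lt;/code&amp;gt;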
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then the actual model for why this is… it’s sort of related to the previous points about features wherein there are lots and lots of features and humans have settled on the ones that are broadly useful across a wide variety of contexts. I think that in that world, what you want to do to get AGI is train an AI system on a very broad… train an AI system maybe by RL or something else, I don’t know. Probably RL.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;On a very large distribution of tasks, or a large distribution of something– maybe they’re tasks, maybe they’re not, I don’t know… Human babies aren’t really training on some particular task. Maybe it’s just a bunch of unsupervised learning. And in doing so over a lot of time and a lot of compute, it will converge on the same sorts of features that humans use.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think the nice part of this story is that it doesn’t require that you explain how the AI system generalizes– generalization in general is just a very difficult property to get out of ML systems if you want to generalize outside of the training distribution. You mostly don’t require that here because, A) it’s being trained on a very wide variety of tasks and B) it’s sort of mimicking the same sort of procedure that was used to create humans. Where, with humans you’ve also got the sort of… evolution did a lot of optimization in order to create creatures that were able to work effectively in the environment, the environment’s super complicated, especially because there are other creatures that are trying to use the same resources.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And so that’s where you get the wide variety or, like, the very broad distribution of things. Okay. What have I not said yet?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long: &amp;lt;/strong&amp;gt;That was your model. Are you done with the model of how that sort of thing happens or-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I feel like I’ve forgotten aspects, forgotten to say aspects of the model, but maybe I did say all of it.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Well, just to recap: One thing you really want is generalization, but this is in some sense taken care of because you’re just training on a huge bunch of tasks. Secondly, you’re likely to get them learning useful features. And one-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; And thirdly, it’s mimicking what evolution did, which is the one example we have of a process that created general intelligence.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; It feels like implicit in this sort of claim for why it’s soon is that compute will grow sufficiently to accommodate this process, which is similar to evolution. There’s implicit there a claim that compute will grow, and a claim that however compute grows, that’s going to be enough to do this thing.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah, that’s fair. I think actually I don’t have good reasons for believing that, maybe I should reduce my credences on these a bit, but… That’s basically right. So, it feels like for the first time I’m like, “Wow, I can actually use estimates of human brain computation and it actually makes sense with my model.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I’m like, “Yeah, existing AI systems seem more expensive to run than the human brain, if you compare dollars per hour of human-brain equivalent.” Hiring a human is what? Maybe we call it $20 an hour or something if we’re talking about relatively simple tasks. And then, I don’t think you could get an equivalent amount of compute for $20 for a while, but I forget what number it came out to when I looked at this recently. Yeah, actually the compute question feels like a thing I don’t actually know the answer to.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
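+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;As a rough illustration of the comparison being gestured at here, a minimal back-of-envelope sketch in Python. Only the $20/hour wage comes from the conversation; the brain-compute figure (~1e15 FLOP/s is one commonly cited rough estimate) and the compute price are illustrative assumptions, each uncertain by orders of magnitude.&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ &amp;lt;code python&amp;gt;
+ # Back-of-envelope: dollars per hour of human-brain-equivalent compute.
+ # Assumed numbers, not figures from the conversation:
+ brain_flops = 1e15        # FLOP/s, one commonly cited rough brain estimate
+ dollars_per_flop = 1e-17  # assumed compute price, i.e. $1 per 1e17 FLOP
+ human_wage = 20.0         # $/hour for simple tasks, from the conversation
+ 
+ compute_cost_per_hour = brain_flops * 3600 * dollars_per_flop
+ print(compute_cost_per_hour)               # 36.0 $/hour under these assumptions
+ print(compute_cost_per_hour / human_wage)  # 1.8, i.e. pricier than the human
+ &amp;lt;/code&amp;gt;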
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; A related question– this is just to clarify for me– it feels like maybe the relevant thing to compare to is not the amount of compute it takes to run a human brain, but like-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Evolution also matters.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, the amount of compute to get to the human brain or something like that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yes, I agree with that, that that is a relevant thing. I do think we can be way more efficient than evolution.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; That sounds right. But it does feel like that’s… that does seem like that’s the right sort of quantity to be looking at? Or does it feel like-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; For training, yes.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I’m curious if it feels like the training is going to be more expensive than the running in your model.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I think the… It’s a good question. It feels like we will need a bunch of experimentation, figuring out how to build essentially the equivalent of the human brain. And I don’t know how expensive that process will be, but I don’t think it has to be a single program that you run. I think it can be like… The research process itself is part of that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;At some point I think we build a system that is initially trained by gradient descent, and then the training by gradient descent is comparable to humans going out in the world and acting and learning based on that. A pretty big uncertainty here is: How much has evolution baked important priors into human brains? Versus how much are human brains actually just learning most things from scratch? Well, scratch, or learning from their parents.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;People would claim that babies have lots of inductive biases, I don’t know that I buy it. It seems like you can learn a lot with a month of just looking at the world and exploring it, especially when you get way more data than current AI systems get. For one thing, you can just move around in the world and notice that it’s three dimensional.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Another thing is you can actually interact with stuff and see what the response is. So you can get causal intervention data, and that’s probably where causality becomes such an ingrained part of us. So I could imagine that these things that we see as core to human reasoning, things like having a notion of causality… apparently as babies we’re also supposed to have intuitions about statistics and counterfactuals and pragmatics.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But all of these are done with brains that have been in the world for a long time, relatively speaking, relative to AI systems. I’m not actually sure if I buy that this is because we have really good priors.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I recently heard… Someone was talking to me about an argument that went like: Humans, in addition to having priors and built-ins from evolution, and learning things in the same way that neural nets do, learn things through… you go to school and you’re taught certain concepts and algorithms and stuff like that. And that seems distinct from learning things in a gradient descenty way. Does that seem right?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I definitely agree with that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I see. And does that seem like a plausible thing that might not be encompassed by some gradient descenty thing?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I think the idea there would be, you do the gradient descenty thing for some time. That gets you an AI system that now has inside of it a way to learn. That’s sort of what it means to be a mesa optimizer. And then that mesa optimizer can go and do its own thing to do better learning. And maybe at some point you just say, “To hell with this gradient descent, I’ll turn it off.” Probably humans don’t do that. Maybe humans do that, I don’t know.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Right. So you do gradient descent to get to some place. And then from there you can learn in the same way– where you just read articles on the internet or something?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah. Oh, another reason that I think this… Another part of my model for why this is more likely– I knew there was more– is exactly that point, which is that learning probably requires some more deliberate, active process than gradient descent. Gradient descent feels relatively dumb, not as dumb as evolution, but close. And the only plausible way I’ve seen so far for how that could happen is by mesa optimization. And it also seems to be how it happened with humans. I guess you could imagine a meta-learning system that’s explicitly trying to develop this learning algorithm.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then… okay, by the definition of mesa optimizers, that would not be a mesa optimizer, it would be an inner optimizer. So maybe it’s an inner optimizer instead if we use-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I think I don’t quite understand what it means that learning requires, or that the only way to do learning is through, mesa optimization.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I can give you a brief explanation of what it means to me in a minute or two. I’m going to go and open my summary because that says it better than I can.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Learned optimization, that’s what it was called. All right. Suppose you’re searching over a space of programs to find one that plays tic-tac-toe well. And initially you find a program that says, “If the board is empty, put something in the center square,” or rather, “If the center square is empty, put something there. If you have two in a row somewhere, put something to complete it. If your opponent has two in a row somewhere, make sure to block it,” and you learn a bunch of these heuristics. Those are some nice, interpretable heuristics, but maybe you’ve got some uninterpretable ones too.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;But as you search more and more, eventually someday you stumble upon the minimax algorithm, which just says, “Play out the game all the way until the end, considering all possible moves that you could make and all possible moves your opponent could make, and search for the path where you are guaranteed to win.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And then you’re like, “Wow, this algorithm, it just always wins. No one can ever beat it. It’s amazing.” And so basically you have this outer optimization loop that was searching over a space of programs, and then it found a program, so one element of the space, that was itself performing optimization, because it was searching through possible moves or possible paths in the game tree to find the actual policy it should play.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And so your outer optimization algorithm found an inner optimization algorithm that is good, or it solves the task well. And the main claim I will make, and I’m not sure if… I don’t think the paper makes it, but the claim I will make is that for many tasks if you’re using gradient descent as your optimizer, because gradient descent is so annoyingly slow and simple and inefficient, the best way to actually achieve the task will be to find a mesa optimizer. So gradient descent finds parameters that themselves take an input, do some sort of optimization, and then figure out an output.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
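+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;For concreteness, a minimal runnable sketch of the minimax program described above, in Python. The board encoding and helper names are illustrative choices, not anything from the conversation; the point is just that the program found by the outer search over programs is itself performing a search, over the game tree.&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ &amp;lt;code python&amp;gt;
+ # Minimax for tic-tac-toe: the "inner optimizer" program that an outer
+ # search over programs might stumble upon. A board is a list of 9 cells,
+ # each 'X', 'O', or None.
+ LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
+          (0, 3, 6), (1, 4, 7), (2, 5, 8),
+          (0, 4, 8), (2, 4, 6)]
+ 
+ def winner(board):
+     for a, b, c in LINES:
+         if board[a] is not None and board[a] == board[b] == board[c]:
+             return board[a]
+     return None
+ 
+ def other(player):
+     return 'O' if player == 'X' else 'X'
+ 
+ def minimax(board, player):
+     # Value of the position for 'X' (+1 win, 0 draw, -1 loss), playing
+     # the game out all the way to the end on both sides.
+     w = winner(board)
+     if w is not None:
+         return 1 if w == 'X' else -1
+     moves = [i for i in range(9) if board[i] is None]
+     if not moves:
+         return 0  # board full: draw
+     values = []
+     for i in moves:
+         board[i] = player
+         values.append(minimax(board, other(player)))
+         board[i] = None
+     # The inner optimization: X picks the max, O picks the min.
+     return max(values) if player == 'X' else min(values)
+ 
+ def best_move(board, player):
+     # The policy is the output of a search over the game tree.
+     def value_of(i):
+         board[i] = player
+         v = minimax(board, other(player))
+         board[i] = None
+         return v if player == 'X' else -v
+     return max((i for i in range(9) if board[i] is None), key=value_of)
+ 
+ board = ['X', 'X', None, 'O', 'O', None, None, None, None]
+ print(best_move(board, 'X'))  # 2: the "complete two in a row" heuristic
+                               # falls out of the search automatically
+ &amp;lt;/code&amp;gt;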
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Got you. So I guess part of it is dividing into sub-problems that need to be optimized and then running… Does that seem right?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I don’t know that there’s necessarily a division into sub-problems, but it’s a specific kind of optimization that’s tailored for the task at hand. Maybe another example would be… I don’t know, that’s a bad example. I think the analogy to humans is one I lean on a lot, where evolution is the outer optimizer and it needs to build things that replicate a bunch.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It turns out having things replicate a bunch is not something you can really get by heuristics. What you need to do is to create humans who can themselves optimize and figure out how to… Well, not replicate a bunch, but do things that are very correlated with replicating a bunch. And that’s how you get very good replicators.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; So I guess you’re saying… it turns out that the gradient descent process will often find that having an optimizer as part of the process is a good thing. Yeah, that makes sense. I remember that from the mesa optimization stuff.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah. So that intuition is one of the reasons I think that… It’s part of my model for why AGI will be a mesa optimizer. Though I do– in the world where we’re not using current ML techniques I’m like, “Oh, anything can happen.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; That makes sense. Yeah, I was going to ask about that. Okay. So conditioned on current ML techniques leading to it, it’ll probably go through mesa optimizers?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah. I might endorse the claim with much weaker confidence even without current ML techniques, but I’d have to think a lot more about that. There are arguments for why mesa optimization is the thing you want– is the thing that happens– that are separate from deep learning. In fact, the whole paper doesn’t really talk about deep learning very much.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
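+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;em&amp;gt;A runnable toy of the “dumb outer loop finds an inner learning rule” picture from this exchange, under assumptions of our own: the outer optimizer is a crude finite-difference gradient descent over a single parameter alpha, and the inner rule it tunes is just SGD with step size alpha, adapted on each freshly sampled task. This is a sketch of the shape of the argument, not of any system discussed in the conversation.&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ &amp;lt;code python&amp;gt;
+ import random
+ 
+ # Toy learned-optimization loop. Inner task: from w = 0, get w close to a
+ # randomly drawn target. The inner "learner" is plain gradient descent with
+ # step size alpha; the dumb outer loop tunes alpha itself.
+ 
+ def inner_loss(w, target):
+     return (w - target) ** 2
+ 
+ def adapt(alpha, target, steps=5):
+     # The inner learning rule: a few gradient steps on the sampled task.
+     w = 0.0
+     for _ in range(steps):
+         g = 2.0 * (w - target)   # gradient of inner_loss at w
+         w = w - alpha * g
+     return inner_loss(w, target)
+ 
+ def meta_objective(alpha, tasks):
+     # How well the inner rule does, averaged over a batch of tasks.
+     return sum(adapt(alpha, t) for t in tasks) / len(tasks)
+ 
+ random.seed(0)
+ alpha, outer_lr, eps = 0.01, 0.05, 1e-4
+ for outer_step in range(200):
+     tasks = [random.uniform(-1.0, 1.0) for _ in range(8)]
+     # Dumb outer optimizer: finite-difference gradient descent on alpha.
+     g = (meta_objective(alpha + eps, tasks)
+          - meta_objective(alpha - eps, tasks)) / (2 * eps)
+     alpha = alpha - outer_lr * g
+ 
+ print(alpha)  # drifts toward a step size that adapts well across tasks
+ &amp;lt;/code&amp;gt;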
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Cool. So that was digging into the model of why, and how confident we should be on, current-techniques AGI, prosaic AI I guess people call it? And it seems like the major sources of uncertainty there are: does compute actually go up, and considerations about evolution and its relation to human intelligence and learning and stuff?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yup. So the Median Group, for example, will agree with most of this analysis… Actually no. The Median Group will agree with some of this analysis but then say that therefore AGI is extremely far away, because evolution threw in some horrifying amount of computation and there’s no way we can ever match that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I’m curious if you still have things on your list of safety-by-default arguments; I’m curious to go back to that. Maybe you covered them.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I think I have covered them.  The way I’ve listed this last one is ‘AI systems will be optimizers in the same way that humans are optimizers, not like Eliezer-style EU maximizers’… which is basically what I’ve just been saying.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; But it seems like it still feels dangerous… if a human had loads of power, they could do things that… even if they aren’t maximizing some utility.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah, I agree, this is not an argument for complete safety. I forget where I was initially going with this point. I think my main point here is that mesa optimizers don’t nice… Oh, right, they don’t nicely factor into a utility function and intelligence. And that reduces my credence in existing arguments, though there are still issues, like: with a mesa optimizer, your capabilities generalize under distributional shift, but your objective doesn’t.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Humans are not really optimizing for reproductive success. And arguably, if someone had wanted to create things that were really good at reproducing, they might have used evolution as a way to do it. And then humans showed up and were like, “Oh, whoops, I guess we’re not doing that anymore.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I mean, the mesa optimizers paper is a very pessimistic paper. In their view, mesa optimization is a bad thing that leads to danger and that’s… I agree that all of the reasons they point out for mesa optimization being dangerous are in fact reasons that we should be worried about mesa optimization.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think mostly I see this as… convergent instrumental sub-goals are less likely to be a thing that such a system obviously pursues. And that just feels more important to me. I don’t really have a strong argument for why that consideration dominates-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; The convergent instrumental sub-goals consideration?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I have a meta credence question, maybe two layers of them. The first being, do you consider yourself optimistic about AI for some random qualitative definition of optimistic? And the follow-up is, what do you think is the credence that by default things go well, without additional intervention by us doing safety research or something like that?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I would say relative to AI alignment researchers, I’m optimistic. Relative to the general public or something like that, I might be pessimistic. It’s hard to tell. I don’t know, credence that things go well? That’s a hard one. Intuitively, it feels like 80 to 90%, 90%, maybe. 90 feels like I’m being way too confident and like, “What? You only assign 10%, even though you have literally no… you can’t predict the future and no one can predict the future, why are you trying to do it?” It still does feel more like 90%.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I think that’s fine. I guess the follow-up is sort of like, between the sort of things that you gave, which were like: Slow takeoff allows for correcting things, things that are more powerful will be more interpretable, and I think the third one being, AI systems not actually being… I’m curious how much do you feel like your actual belief in this leans on these arguments? Does that make sense?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah. I think the slow takeoff one is the biggest one. If I believe that at some point we would build an AI system that within the span of a week was just way smarter than any human, and before that the most powerful AI system was below human level, I’m just like, “Shit, we’re doomed.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; Because there it doesn’t matter if it goes through interpretable features particularly.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; There I’m like, “Okay, once we get to something that’s super intelligent, it feels like the human ant analogy is basically right.” And unless we… Maybe we could still be fine because people thought about it and put in… Maybe I’m still like, “Oh, AI researchers would have been able to predict that this would’ve happened and so were careful.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I don’t know, in a world where fast takeoff is true, lots of things are weird about the world, and I don’t really understand the world. So I’m like, “Shit, it’s quite likely something goes wrong.” I think the slow takeoff is definitely a crux. Also, we keep calling it slow takeoff and I want to emphasize that it’s not necessarily slow in calendar time. It’s more like gradual.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Right, like ‘enough time for us to correct things’ takeoff.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah. And there’s no discontinuity between… you’re not like, “Here’s a 2X human AI,” and a couple of seconds later it’s now… Not a couple of seconds later, but like, “Yeah, we’ve got 2X AI,” for a few months and then suddenly someone deploys a 10,000X human AI. If that happened, I would also be pretty worried.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;It’s more like there’s a 2X human AI, then there’s like a 3X human AI and then a 4X human AI. Maybe this happens from the same AI getting better and learning more over time. Maybe it happens from it designing a new AI system that learns faster, but starts out lower and so then overtakes it sort of continuously, stuff like that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;So that I think, yeah… I don’t really know what the alternative to it is, but in the one where it’s not human level, and then 10,000X human in a week and it just sort of happened, that I’m like, I don’t know, 70% chance of doom or something, maybe more. That feels like I’m… I endorse that credence even less than most, just because I feel like I don’t know what that world looks like. Whereas on the other ones I at least have a plausible world in my head.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, that makes sense. I think you’ve mentioned, in a slow takeoff scenario, that… Some people would disagree, and say that in a world where you notice something was wrong, you would just hack around it and keep going.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I have a suggestion which it feels like maybe is a difference and I’m very curious for your take on whether that seems right or seems wrong. It seems like people believe there’s going to be some kind of pressure for performance or competitiveness that pushes people to try to make more powerful AI in spite of safety failures. Does that seem untrue to you or like you’re unsure about it?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; It seems somewhat untrue to me. I recently made a comment about this on the Alignment Forum. People make this analogy between AI x-risk and the risk of nuclear war under mutually assured destruction. That particular analogy seems off to me because with nuclear war, you need the threat of being able to hurt the other side, whereas with AI x-risk, if the destruction happens, that affects you too. So there’s no mutually-assured-destruction-type dynamic.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;You could imagine a situation where for some reason the US and China are like, “Whoever gets to AGI first just wins the universe.” And I think in that scenario maybe I’m a bit worried, but even then, it seems like extinction is just worse, and as a result, you get significantly less risky behavior? But I don’t think you get to the point where people are just literally racing ahead with no thought to safety for the sake of winning.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I also don’t think that you would… I don’t think that differences in who gets to AGI first are going to determine whether you win the universe or not. I think it leads to pretty continuous changes in the power balance between the two.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I also don’t think there’s a discrete point at which you can say, “I’ve won the race.” I think it’s just like capabilities keep improving and you can have more capabilities than the other guy, but at no point can you say, “Now I have won the race.” I suppose if you could get a decisive strategic advantage, then you could do it. And that has nothing to do with what your AI capability is… If you’ve got a decisive strategic advantage, that could happen.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I would be surprised if the first human-level AI allowed you to get anything close to a decisive strategic advantage. Maybe when you’re at 1000X human level AI, perhaps. Maybe not a thousand. I don’t know. Given slow takeoff, I’d be surprised if you could knowably be like, “Oh yes, if I develop this piece of technology faster than my opponent, I will get a decisive strategic advantage.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; That makes sense. We discussed a lot of cruxes you have. Do you feel like there’s evidence that you already have pre-computed that you think could move you in one direction or another on this? Obviously, if you’ve got evidence that X was true, that would move you, but are there concrete things where you’re like, “I’m interested to see how this will turn out, and that will affect my views on the thing?”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; So I think I mentioned this one already, on the question of timelines: how much did evolution actually bake into humans? I don’t know if it could be answered, but maybe you could answer that one, and that would affect it. I lean on the side of not really, but it’s possible that the answer is yes, actually quite a lot. If that was true, I’d just lengthen my timelines, basically.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; Can you also explain how this would change your behavior with respect to what research you’re doing, or would it not change that at all?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; That’s a good question. I think I would have to think about that one for longer than two minutes.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;As background on that, a lot of my current research is more trying to get AI researchers to be thinking about what happens when you deploy, when you have AI systems working with humans, as opposed to solving alignment. Mostly because for a while I couldn’t see research that felt useful to me for solving alignment. I think I’m now seeing more things that I can do that seem more relevant, and I will probably switch to doing them, possibly after graduating, because of my thesis and needing to graduate and stuff like that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yes, but you were asking evidence that would change my mind-&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; I think it’s also reasonable to be not sure exactly about concrete things. I don’t have a good answer to this question off the top of my head.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; It’s worth at least thinking about for a couple of minutes. I think either historical case studies of how people have dealt with new technologies, or analyses of how AI researchers currently think about things and deal with stuff, could change my mind about whether I think the AI community would by default handle problems that arise, which feels like an important crux between me and others.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think currently my sense is if the like… You asked me this, I never answered it. If the AI safety field just sort of vanished, but the work we’ve done so far remained and conscientious AI researchers remained, or people who are already AI researchers and already doing this sort of stuff without being influenced by EA or rationality, then I think we’re still fine because people will notice failures and correct them.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I did answer that question. I said something like 90%. This was a scenario I was saying 90% for. And yeah, that one feels like a thing that I could get evidence on that would change my mind.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I can’t really imagine what would cause me to believe that AI systems will actually do a treacherous turn without ever trying to deceive us before that. But there might be something there. I don’t really know what sort of plausible evidence I could see that would move me in that direction.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Slow takeoff versus fast takeoff… I feel like MIRI still apparently believes in fast takeoff. I don’t have a clear picture of their reasons, but I expect those reasons would move me towards fast takeoff.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Oh, on the expected utility max or the… my perception of MIRI, or of Eliezer and also maybe MIRI, is that they have this position that any AI system, any sufficiently powerful AI system, will look to us like an expected utility maximizer, therefore convergent instrumental sub-goals and so on. I don’t buy this. I wrote a &amp;lt;a href=&amp;quot;https://www.alignmentforum.org/s/4dHMdK5TLN6xcqtyc/p/NxF5G6CJiof6cemTw&amp;quot;&amp;gt;post&amp;lt;/a&amp;gt; explaining why I don’t buy this.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Yeah, there’s a lot of just like… MIRI could say their reasons for believing things and that would probably cause me to update. Actually, I have enough disagreements with MIRI that they may not update me, but it could in theory update me.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Asya Bergal:&amp;lt;/strong&amp;gt; Yeah, that’s right. What are some disagreements you have with MIRI?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Well, the ones I just mentioned. There is this great post from maybe not a year ago, but in 2018, called ‘Realism about Rationality’, which is basically this perspective that there is the one true learning algorithm or the one correct way of doing exploration, or just, there is a platonic ideal of intelligence. We could in principle find it, code it up, and then we would have this extremely good AI algorithm.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;Then, to the extent that this was a disagreement back in 2008, Robin Hanson would have been on the other side saying, “No, intelligence is just a broad… just a conglomerate of a bunch of different heuristics that are all task-specific, and you can’t just take one and apply it in another domain. It is just messy and complicated and doesn’t have a nice crisp formalization.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;And I fall not exactly on Robin Hanson’s side, but much more on Robin Hanson’s side than on the ‘rationality is a real, formalizable, natural thing in the world’ side.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; Do you have any idea where the cruxes of disagreement are at all?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; No, that one has proved very difficult to…&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; I think that’s an AI Impacts project, or like a dissertation or something. I feel like there’s just this general domain specificity debate, how general is rationality debate…&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;I think there are these very crucial considerations about the nature of intelligence and how domain-specific it is, and they were an issue between Robin and Eliezer, and no one… It’s hard to know what the evidence is in this case.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah. But I basically agree with this and that it feels like a very deep disagreement that I have never had any success in coming to a resolution to, and I read arguments by people who believe this and I’m like, “No.”&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sara Haxhia:&amp;lt;/strong&amp;gt; Have you spoken to people?&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; I have spoken to people at CHAI; I don’t know that they would really be on board with this train. Hold on, Daniel probably would be. And that hasn’t helped that much. Yeah. This disagreement feels like one where I would predict that conversations are not going to help very much.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; So, the general question here was disagreements with MIRI, and then there’s… And you’ve mentioned fast takeoff and maybe relatedly, the Yudkowsky-Hanson–&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Realism about Rationality is how I’d phrase it. There’s also the– are AI researchers conscientious? Well, actually I don’t know that they would say they are not conscientious. Maybe they’d say they’re not paying attention or they have motivated reasoning for ignoring the issues… lots of things like that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; And this issue of do advanced intelligences look enough like EU maximizers…&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Oh, yes. That one too. Yeah, sorry. That’s one of the major ones. Not sure how I forgot that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Robert Long:&amp;lt;/strong&amp;gt; I remember it because I’m writing it all down, so… again, you’ve been talking about very complicated things.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rohin Shah:&amp;lt;/strong&amp;gt; Yeah. Related to the Realism about Rationality point is the use of formalism and proof. Not formalism, but proof at least. I don’t know that MIRI actually believes that what we need to do is write a bunch of proofs about our AI system, but it sure sounds like it, and that seems like a too-difficult, basically impossible task to me, if the proofs that we’re trying to write are about alignment or beneficialness or something like that.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
+ &amp;lt;HTML&amp;gt;
+ &amp;lt;p&amp;gt;They also seem to… No, maybe all the other disagreements can be traced back to these disagreements. I’m not sure.&amp;lt;br/&amp;gt;&amp;lt;/p&amp;gt;
+ &amp;lt;/HTML&amp;gt;
+ 
+ 
  

&lt;/pre&gt;</summary>
    </entry>
</feed>
