Accuracy of AI Predictions
Published 04 June, 2015; last updated 24 November, 2020
It is unclear how informative we should expect expert predictions about AI timelines to be. Individual predictions are undoubtedly often off by many decades, since they disagree with each other by that much. However, their aggregate may still be quite informative. The main potential reason we know of to doubt the accuracy of expert predictions is that experts are generally poor predictors in many areas, and AI looks likely to be one of them. However, we have not investigated how inaccurate 'poor' predictions typically are, or whether AI really is such a case.
Predictions of AI timelines are likely biased toward optimism by a matter of decades, especially if they are voluntary statements rather than survey responses, and especially if they come from populations selected for optimism. We expect these two factors to account for less than a decade and around two decades of difference in median predictions, respectively.
Support
Considerations regarding accuracy
A number of reasons have been suggested for distrusting predictions about AI timelines:
Models of areas where people predict well
Research has characterized the situations in which experts predict well and those in which they do not (see table 1 here). AI appears to fall into several classes associated with worse predictions. However, we have not investigated this evidence in depth, or the extent to which these factors really influence prediction quality.
Expert predictions are generally poor
Experts are notoriously poor predictors. However, our impression is that this reputation stems from their inability to predict certain kinds of things well, rather than from across-the-board failure. For instance, experts have successfully predicted the Higgs boson's existence, the outcomes of chemical reactions, and astronomical phenomena. So the question reduces to where AI falls on the spectrum of expert predictability, discussed in the previous point.
Disparate predictions
One sign that AI predictions are not very accurate is that they differ over a range of a century or so. This strongly suggests that many individual predictions are inaccurate, though not that the aggregate distribution is uninformative.
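The distinction can be made concrete with a minimal sketch (the numbers below are hypothetical, chosen only for illustration, not drawn from any prediction dataset): if each prediction were an unbiased but very noisy estimate of the same underlying date, individual errors of decades would coexist with an accurate aggregate.

```python
import random

# Toy simulation (our construction, not data from this article):
# suppose the "true" arrival year is 2060, and each expert's
# prediction is the truth plus large idiosyncratic noise, giving
# roughly the century-wide spread seen in real prediction datasets.
random.seed(0)

TRUE_YEAR = 2060  # hypothetical ground truth
predictions = [TRUE_YEAR + random.gauss(0, 25) for _ in range(200)]

spread = max(predictions) - min(predictions)
median = sorted(predictions)[len(predictions) // 2]

print(f"spread of individual predictions: {spread:.0f} years")
print(f"median prediction: {median:.0f} (true year: {TRUE_YEAR})")
# Individual predictions are often off by decades, yet the median
# lands near the truth: disagreement alone does not make the
# aggregate uninformative.
```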
Similarity of old and new predictions
Older predictions seem to form a distribution fairly similar to that of more recent predictions, except for the very oldest. This is weak evidence that new predictions are not strongly informed by accumulating evidence, and are therefore more likely to be inaccurate.
Similarity of expert and lay opinions
Armstrong and Sotala found that expert and non-expert predictions look very similar. This finding is in doubt at the time of writing, due to errors in the analysis. If it held up, it would be weak evidence against experts having relevant expertise, since such expertise might be expected to produce a difference from lay opinion. Note that it might not, however, if laypeople take their views from experts.
Predictions are about different things and often misinterpreted
Comments made alongside predictions of human-level AI suggest that predictors sometimes have different events in mind as 'AI arriving'. Even when predictions concern the same event, 'prediction' can mean different things: one person might 'predict' the first year in which they think human-level AI is more likely than not, while another 'predicts' the year by which AI seems almost certain.
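As a minimal sketch of how much this matters (the belief distribution below is hypothetical, not taken from any survey), the same forecaster can give answers decades apart depending on which of these questions they take themselves to be answering:

```python
from statistics import NormalDist

# Toy illustration (hypothetical distribution, not survey data): a
# single forecaster with one fixed belief distribution over arrival
# years can honestly report very different "predictions" depending
# on which question is being answered.
beliefs = NormalDist(mu=2070, sigma=25)

more_likely_than_not = beliefs.inv_cdf(0.50)  # median year
almost_certain = beliefs.inv_cdf(0.95)        # 95th-percentile year

print(f"'more likely than not' year: {more_likely_than_not:.0f}")
print(f"'almost certain' year:       {almost_certain:.0f}")
# These differ by about 40 years despite identical beliefs, so
# mixing the two senses of 'prediction' in one dataset inflates
# apparent disagreement.
```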
This list is not necessarily complete.
Purported biases
A number of biases have been posited to affect predictions of human-level AI. The ones we find most plausible are biases toward early estimates, arising because optimistic people are more likely to become AI experts and because optimistic predictions are more likely to be made publicly, particularly among voluntary statements rather than surveys.
Conclusions
AI appears to exhibit several qualities characteristic of areas that people predict poorly. Individual AI predictions appear to be inaccurate by many decades, judging by their disagreement. Other grounds for particularly distrusting AI predictions seem to offer weak evidence against them, if any. Our current guess is that AI predictions are less reliable than many kinds of prediction, though still potentially fairly informative.
Biases toward early estimates appear to exist, as a result of optimistic people becoming experts, and optimistic predictions being more likely to be published for various reasons. These are the only plausible substantial biases we know of.