
Methodology for discontinuous progress investigation

Published 05 June, 2019; last updated 08 March, 2021

AI Impacts’ discontinuous progress investigation was conducted according to the methodology outlined on this page.

Details

Contributions to the discontinuous progress investigation were made over at least 2015-2019, by a number of different people, and methods have varied somewhat. In 2019 we attempted to make methods across the full collection of case studies more consistent. The following is a description of methodology as of December 2019.

Overview

To learn about the prevalence and nature of discontinuities in technological progress1, we:

  1. Searched for potential examples of discontinuous progress (e.g. ‘Eli Whitney’s cotton gin’). We collected around ninety suggestions of technological changes that might have been discontinuous, from various people. Some of these pointed to particular technologies, and others to trends.2 We added further examples as they arose while we were working on later steps. A list of suggested examples not ultimately included is available here.
  2. Chose specific metrics related to some of these potential examples (e.g. ‘cotton ginned per person per day’, ‘value of cotton ginned per cost’) and found historic data on progress on those metrics (usually in conjunction). We took cases one by one and searched for data on relevant metrics in their vicinity. For instance, if we were told that fishing hooks became radically stronger in 1997, we might look for data on the strength of fishing hooks over time, but also for the cost of fishing hooks per year, or how many fish could be caught by a single fishing hook, because these are measures of natural interest that we might expect to be affected by a change in fishing hook strength. Often we ended up collecting data on several related trends. Which metrics we used generally depended heavily on what data could be found. Many suggestions are not included in the investigation so far because we have not found relevant data, though we sometimes proceeded with quite minimal data if it was possible to at least assess a single development’s likelihood of having been discontinuous.
  3. Defined a ‘rate of past progress’ throughout each historic dataset. At each datapoint in a trend after the first one, we defined a ‘previous rate of progress’. This was generally either linear or exponential, and was the average rate of progress between the previous datapoint and some earlier datapoint, though not necessarily the first. For instance, if a trend was basically flat from 1900 until 1967 and then became steep, then in defining the previous rate of progress for the 1992 datapoint we might treat it as linear progress since 1967, rather than, say, exponential progress since 1900.
  4. Measured the discontinuity at each datapoint. We did this by comparing the progress at the point to the expected progress at that date, based on the last datapoint and the rate of past progress. For instance, if the last datapoint, five years ago, was 600 units, progress had been going at two units per year, and a development now took it to 800 units, we would calculate 800 units – 600 units = 200 units of progress = 100 years of progress in 5 years, so a 95-year discontinuity.
  5. Noted any discontinuities of more than ten years (‘moderate discontinuities’), and more than one hundred years (‘large discontinuities’).
  6. Noted anything interesting about the circumstances of each discontinuity (e.g. the type of metric it was in, the events that appeared to lead to the discontinuity, the patterns of progress around it.)

Choosing areas

We collected around ninety suggestions of technological changes that might have been discontinuous. Many of these were offered to us in response to a Facebook question, a Quora question, personal communications, and a bounty posted on this website. We obtained some by searching Google Images for abrupt-looking graphs and noting their subject matter. We found further contenders in the process of investigating others. Some of these are particular technologies, and others are trends.3

We still have around fifty suggestions of trends that may have been discontinuous that we have not looked into, or have not finished looking into.

Choosing metrics

For any area of technological activity, there are many specific metrics on which one could measure progress. For instance, consider ginning cotton (that is, taking the seeds out of it so that the fibers may be used for fabric). The development of new cotton gins might be expected to produce progress in all of the following metrics:

  • Cotton ginnable per minute under perfect laboratory conditions
  • Cotton ginned per day by users
  • Cotton ginned per worker per day by users
  • Quality-adjusted cotton ginned per quality-adjusted worker per day
  • Cost to produce $1 of ginned cotton
  • Number of worker injuries stemming from cotton ginning
  • Prevalence of cotton gins
  • Value of cotton

(These are still not entirely specific: to actually measure one, you would also need to specify, for instance, how the information would reach you, e.g. “cotton ginned per day by users, as claimed in a source findable by us within one day of searching online”.)

We choose both general areas to investigate, and particular metrics according to:

  • Apparent likelihood of containing discontinuous progress (e.g. because it was suggested to us in a bounty submission4 or by readers and friends, or because of our own understanding).
  • Ease of collecting clear data (e.g. because someone pointed us to a dataset, or because we could find one easily). We often began investigating a metric and then set it aside to potentially finish later, or gave up.
  • Not seeming trivially likely to contain discontinuities for uninteresting reasons. For instance we expect the following to have a high number of discontinuities, which do not seem profitable to individually investigate:
    • obscure metrics constructed to contain a discontinuity (e.g. Average weekly rate of seltzer delivery to Katja’s street from a particular grocery store over the period during which Katja’s household discovered that that grocery store had the cheapest seltzer)
    • metrics very far from anyone’s concern (e.g. number of live fish in Times Square)
    • metrics that are very close to metrics we already know contain discontinuities (e.g. if explosive power per gram of material sees a large discontinuity, then probably explosive power per gram of material divided by people needed to detonate bomb would also see a large discontinuity.)

Our goal with the project was to understand roughly how easy it is to find large discontinuities, and to learn about the situations in which they tend to arise, rather than to clearly assess the frequency of discontinuities within a well-specified reference class of metrics (which would have been hard, for instance because good data is rarely available). Thus we did not follow a formal procedure for selecting case studies. One important feature of the set of case studies and metrics we have is that they are likely to be heavily skewed in favor of having more large discontinuities, since we were explicitly trying to select discontinuous technologies and metrics.

Data collection

Most data was either from a particular dataset that we found in one place, or was gathered by AI Impacts researchers.

When we gathered data ourselves, we generally searched for sources online until we felt that we had found most of what was readily available, or had at least thoroughly investigated the periods relevant to whether there were discontinuities. For instance, it is more important to know about the trend just prior to an apparent discontinuity than it is to know about the trend between two known records, where it is clear that little total progress has taken place.

In general, we report the maximal figures that we are confident of, i.e. we report the best known thing at each date, not the best possible thing at that date. So if in 1909 a thing was 10–12, we report 10, though we may note if we think 12 is likely and it makes a difference to the point just after. If all we know is that progress was made between 2010 and 2015, we report it in 2015.

Discontinuity calculation

We measure discontinuities in terms of how many years it would have taken to see the same amount of progress, if the previous trend had continued.

To do this, we:

  • Decide which points will be considered as potential discontinuities
  • Decide what we think the previous trend was for each of those points
    • Determine the shape of the previous curve
    • Estimate the growth rate of that curve
  • Calculate how many years the previous trend would need to have continued to see as much progress as the new point represents
  • Report as ‘discontinuities’ all points that represented more than ten years of progress at previous rates
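The steps above can be sketched in Python as follows. This is a minimal illustration rather than the calculation used in the investigation's spreadsheets: it assumes a linear previous trend running from the first datapoint, a metric where higher is better, and an invented data series. The names `find_discontinuities` and `THRESHOLD_YEARS` are hypothetical.

```python
THRESHOLD_YEARS = 10  # report points worth more than ten years of progress


def find_discontinuities(series, threshold=THRESHOLD_YEARS):
    """Return (year, size_in_years) for each point that jumps ahead of trend.

    `series` is a list of (year, value) records sorted by year. The previous
    trend at each point is assumed to be linear, running from the first
    datapoint to the datapoint just before the one being assessed.
    """
    found = []
    # A point needs at least two earlier points to define a prior trend.
    for i in range(2, len(series)):
        first_year, first_value = series[0]
        last_year, last_value = series[i - 1]
        year, value = series[i]
        rate = (last_value - first_value) / (last_year - first_year)
        if rate <= 0:
            continue  # no positive prior trend to extrapolate
        years_of_progress = (value - last_value) / rate
        size = years_of_progress - (year - last_year)
        if size > threshold:
            found.append((year, size))
    return found


# Invented data: steady progress of 2 units per year, then a jump.
series = [(1980, 100), (1990, 120), (2000, 140), (2005, 300)]
print(find_discontinuities(series))  # [(2005, 75.0)]
```

In practice the shape and starting point of the previous trend are chosen by researcher judgment for each datapoint, so a fixed rule like the one in this sketch would only approximate the procedure.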

Requirements for measuring discontinuities

Sometimes we exclude points from being considered as potential discontinuities, though we still include them to help establish the trend. This is usually because:

  • We have fewer than two earlier points, so no prior trend to compare them to
  • We expect that we are missing prior data, so even if they were to look discontinuous, this would be uninformative.
  • The value of the metric at the point is too ambiguous

Sometimes when we lack information we still reason about whether a point is a discontinuity. For instance, we think the Great Eastern very likely represents a discontinuity, even though we don’t have an extensive trend for ship size, because we know that a recent Royal Navy ship was the largest ship in the world, and we know the trend for Royal Navy ship size, which the trend for overall ship size cannot ever go below. So we can reason that the recent trend for ship size cannot be any steeper than that of Royal Navy ship size, and we know that at that rate, the Great Eastern represented a discontinuity.
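The bounding argument can be illustrated with a short Python sketch. A slower previous trend makes any given jump worth more years of progress, so computing the discontinuity at the fastest rate the trend could have had gives a lower bound on its size. The figures below are invented and are not ship-size data; the function name `discontinuity_lower_bound` is hypothetical, and a linear trend is assumed.

```python
def discontinuity_lower_bound(last_value, new_value, max_rate, years_elapsed):
    """Smallest discontinuity consistent with a previous rate of at most `max_rate`.

    Assumes a linear previous trend and a positive rate of progress.
    """
    return (new_value - last_value) / max_rate - years_elapsed


# Invented figures: the bounding trend grows by at most 4 units per year.
bound = discontinuity_lower_bound(
    last_value=200, new_value=1000, max_rate=4, years_elapsed=10
)
print(bound)  # 190.0

# Any slower true rate only makes the discontinuity larger.
for true_rate in (1, 2, 4):
    assert (1000 - 200) / true_rate - 10 >= bound
```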

Calculating previous rates of progress

Time period selection and trend fitting

As history progresses, a best guess about what the trend so far is can change. The best guess trend might change apparent shape (e.g. go from seeming linear to seeming exponential) or change apparent slope (e.g. what seemed like a steeper slope looks after a few slow years like noise in a flatter slope) or change its apparent relevant period (e.g. after multiple years of surprisingly fast progress, you may decide to treat this as a new faster growth mode, and expect future progress accordingly).

We generally reassess the best guess trend so far for each datapoint, though this usually only changes occasionally within a dataset.

We have based this on researcher judgments of fit, which have generally had the following characteristics:

  • Trends are expected to be linear or exponential unless they are very clearly something else. We don’t tend to search for better-fitting curves.
  • If curves are not upward curving, we tend to treat them as linear.
  • In ambiguous cases, we lean toward treating curves as exponential.
  • When there appears to be a new, faster growth mode, we generally recognize this and start a new trend at the third point (i.e. if there has been one discontinuity we don’t immediately treat the best guess for future progress as much faster, but after two in a row, we do).

We color the growth rate column in the spreadsheets according to periods where the growth rate is calculated as having the same overall shape and same starting year (though within those periods, the calculated growth rate changes as new data points are added to the trend).

Trend calculation

We calculate the rate of past progress as the average progress between the first and last datapoints in a subset of data, rather than taking a line of best fit. (The trend selection described in the previous section is what makes this a reasonable proxy for expected annual progress.)
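As a rough sketch of this calculation in Python: the two functions below take the first and last datapoints of a chosen subset and ignore everything in between. The function names are hypothetical, the data is invented, and expressing the exponential rate as a continuous growth rate per year is an assumption of this sketch, since the page does not specify the exact form.

```python
import math


def linear_rate(points):
    """Average units of progress per year between the first and last datapoint."""
    (t0, v0), (t1, v1) = points[0], points[-1]
    return (v1 - v0) / (t1 - t0)


def exponential_rate(points):
    """Average continuous growth rate per year between the first and last datapoint."""
    (t0, v0), (t1, v1) = points[0], points[-1]
    return math.log(v1 / v0) / (t1 - t0)


# Invented subset of data; the middle point does not affect either rate.
subset = [(2000, 100.0), (2005, 130.0), (2010, 200.0)]
print(linear_rate(subset))       # 10.0
print(exponential_rate(subset))  # about 0.0693, i.e. doubling in ten years
```

A line of best fit would be pulled toward the middle point, whereas this endpoint-based average is not.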

Discontinuity measurement

For each point, we calculate how much progress it represents since the last point and how many years of progress that is according to the past trend, then subtract the number of years that actually passed to get the discontinuity size.

This means that if no progress is seen for a hundred years, and then all of the progress expected in that time occurs at once, this does not count as a discontinuity.
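A minimal Python sketch of this measurement follows, covering both trend shapes. It is illustrative only: the function name `discontinuity_size` is hypothetical, the calendar years are invented, and treating the exponential rate as a continuous growth rate per year is an assumption.

```python
import math


def discontinuity_size(last_year, last_value, year, value, rate, shape="linear"):
    """Years of progress at the previous rate, minus the years that actually passed."""
    if shape == "linear":
        years_of_progress = (value - last_value) / rate
    elif shape == "exponential":
        # `rate` is a continuous growth rate per year (an assumption of this sketch).
        years_of_progress = math.log(value / last_value) / rate
    else:
        raise ValueError(f"unknown trend shape: {shape}")
    return years_of_progress - (year - last_year)


# The worked example from the overview: 600 -> 800 units in 5 years at 2 units/year.
print(discontinuity_size(2000, 600, 2005, 800, rate=2))  # 95.0

# A hundred years with no recorded progress, then exactly the expected amount.
print(discontinuity_size(1800, 50, 1900, 150, rate=1))  # 0.0

# Exponential trend doubling every ten years; an eightfold jump in five years
# is three doublings, so about 30 - 5 = 25 years.
doubling_every_ten_years = math.log(2) / 10
print(discontinuity_size(2000, 100, 2005, 800, doubling_every_ten_years, "exponential"))
```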

Reporting discontinuities

We report discontinuities as ‘substantial’ if they are at least ten years of progress at once, and ‘large’ if they are at least one hundred years of progress at once.
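Expressed as a small Python function, using the thresholds stated in this section (the function name `classify` is hypothetical):

```python
def classify(size_in_years):
    """Label a discontinuity by its size in years of progress at previous rates."""
    if size_in_years >= 100:
        return "large"
    if size_in_years >= 10:
        return "substantial"
    return None  # not reported as a discontinuity


print(classify(95))   # substantial
print(classify(150))  # large
print(classify(3))    # None
```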

‘Robust’ discontinuities

Many developments classified as discontinuities by the above methods are ahead of a best guess trend, but unsurprising because the data should have left much uncertainty about the best trend. For instance, if the data does not fit a consistent curve well, or is very sparse, one should be less surprised if a new point fails to line up with any particular line through it.

In this project we are more interested in clear departures from established trends than in noisy or difficult to extrapolate trends, so a researcher judged each discontinuity as a clear divergence from an established trend or not. We call discontinuities judged to clearly involve a departure from an established trend ‘robust discontinuities’.

See the project’s main page for authorship and acknowledgements.

  1. See the project page for more on the motivations and goals of the project.
  2. For instance, a person might suggest to us that ‘building heights’ has been discontinuous, or that the Burj Khalifa was some sort of discontinuity.
  3. For instance, a person might suggest to us that ‘building heights’ has been discontinuous, or that the Burj Khalifa was some sort of discontinuity.
  4. We initially posted a bounty on discontinuous technologies. We have not investigated all of the submissions yet, and are yet to pay out bounties on suggestions we recently found to be discontinuous.
speed_of_ai_transition/pace_of_ai_progress_without_feedback/historical_continuity_of_progress/methodology_for_discontinuous_progress_investigation.txt · Last modified: 2022/09/21 07:37 (external edit)