ai_timelines:hardware_and_ai_timelines:computing_capacity_of_all_gpus_and_tpus

*Published 3 April 2023, last updated 3 April 2023*

A back-of-the-envelope calculation based on GPU market size, price-performance, hardware lifespan estimates, and the sizes of Google’s data centers suggests that GPUs and TPUs together provided around 3.98 * 10^21 FLOP/s of computing capacity as of Q1 2023.

Consulting firms have published estimates of the worldwide GPU market size for each year from 2017 to 2022.^{1)}^{2)}^{3)}^{4)}^{5)}^{6)} Unfortunately, the details of these reports are paywalled, so the methodology and scope of these estimates are not completely clear. A sample of GMI’s report on 2022 GPU market size seems to indicate that the estimate is reasonably comprehensive.^{7)}

- My best guess estimate models market size growth as a piecewise linear function, with slow growth before 2017 and fast growth after 2017.
- My lower bound estimate fits the years before 2017 to an exponential curve and assumes that the market size will not change between 2022 and 2023.
- My upper bound estimate assumes that there was no change in market size between 2010 and 2017 and fits 2023 to an exponential curve.

Hobbhahn & Besiroglu (2022) used a dataset of GPUs to find trends in FLOP/s per dollar over time. This provides a simple way to convert the above estimates of total spending on GPUs per year into estimates of total GPU computing capacity produced per year. Hobbhahn & Besiroglu found no statistically significant difference in the trends of FP16 and FP32 precision compute, and I will follow their example by focusing on FP32 precision for the computing capacity of GPUs.^{8)}

- My best guess estimate for FLOP/s per dollar over time is based on the trendline over all of the data in the dataset.
- My lower bound estimate is based on the trendline for machine learning GPUs.
- My upper bound estimate is based on the trendline for the most price-performance efficient GPUs.

Because GPUs eventually fail, the computing capacity produced in previous years does not all still exist. I could not find much empirical data about the usual lifespan of GPUs, so the estimates in this section are highly speculative.

- Based on anecdotal claims that GPUs usually last 5 years with heavy use or 7 or more years with moderate use, my best guess estimate is that GPU lifespan is normally distributed with a mean of 6 years and a standard deviation of 1.5 years.^{9)}
- Because Ostrouchov et al. (2020) show that GPUs in datacenters usually last around 3 years, my lower bound estimate has a mean of 3 years and a standard deviation of 1.5 years.^{10)}
- For my upper bound estimate, I assumed that all GPUs produced since 2010 still exist.
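As a rough illustration of what the lifespan assumptions imply, the sketch below computes the fraction of GPUs of a given age that would still be working if lifespans were normally distributed. The function name and its parameters are illustrative and are not taken from the original calculation.

```python
import math

def surviving_fraction(age_years, mean_life=6.0, sd_life=1.5):
    """Fraction of GPUs still working at a given age, assuming
    lifespan ~ Normal(mean_life, sd_life). Defaults are the best guess values."""
    z = (age_years - mean_life) / sd_life
    # Normal CDF at z = probability that a GPU has already failed by this age.
    failed = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 1.0 - failed

# Compare the best guess (mean 6 years) with the lower bound (mean 3 years).
for age in range(0, 11):
    best = surviving_fraction(age)
    low = surviving_fraction(age, mean_life=3.0)
    print(f"age {age:2d}: best guess {best:.3f}, lower bound {low:.3f}")
```

Under these assumptions, half of all GPUs have failed at the mean lifespan. One caveat is that a normal distribution assigns a small probability to negative lifespans, which is most noticeable for the lower bound parameters.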

To estimate total GPU computing capacity, I summed, over all years, the product of each year’s estimated market size, price-performance, and fraction of GPUs still surviving.

- My best guess estimate for total GPU computing capacity is 3.95 * 10^21 FLOP/s (FP32).
- My lower bound estimate for total GPU computing capacity is 1.40 * 10^21 FLOP/s (FP32).
- My upper bound estimate for total GPU computing capacity is 7.71 * 10^21 FLOP/s (FP32).
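The summation described above can be sketched as follows. This is only a minimal illustration of the structure of the calculation. The function name is hypothetical, and the input values are toy placeholders chosen to make the arithmetic easy to follow, not the market size, price-performance, or survival figures behind the estimates on this page.

```python
def total_capacity(spend_by_year, flops_per_dollar_by_year, survival_by_year):
    """Sum over years of (dollars spent) * (FLOP/s per dollar) * (fraction surviving)."""
    return sum(
        spend_by_year[year]
        * flops_per_dollar_by_year[year]
        * survival_by_year[year]
        for year in spend_by_year
    )

# Toy placeholder inputs (NOT real data): two years of spending.
spend = {2021: 10.0, 2022: 20.0}            # dollars spent on GPUs
flops_per_dollar = {2021: 2.0, 2022: 3.0}   # FLOP/s bought per dollar
survival = {2021: 0.5, 2022: 1.0}           # fraction of that year's GPUs still working

total = total_capacity(spend, flops_per_dollar, survival)
print(total)  # 10*2*0.5 + 20*3*1.0 = 70.0
```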

TPUs are specialized chips designed by Google for machine learning. These chips can be rented through the cloud, and seem to be located in 7 Google data centers.^{11)} There are 3 versions of the TPU available: V2, which has 45 * 10^12 FLOP/s per chip, V3, which has 123 * 10^12 FLOP/s per chip, and V4, which has 275 * 10^12 FLOP/s per chip.^{12)}^{13)} Google does not say exactly how many TPUs it owns, but the company did report that its data center in Oklahoma (the only one with V4 chips available) had 9 * 10^18 FLOP/s of computing capacity.^{14)}

- Based on assumptions that all of Google’s data centers that have TPUs have the same number of TPUs and that if a data center has two types of TPUs it has equal numbers of each, my best guess estimate for computing capacity of all TPUs is 2.93 * 10^19 FLOP/s (bf16).
- Because the only computing capacity publicly reported by Google is that of the Oklahoma data center, my lower bound estimate assumes that the 9 * 10^18 FLOP/s (bf16) is the total for all TPUs.
- Because Google claimed that their Oklahoma data center had more computing capacity than any other publicly available compute cluster, my upper bound estimate assumes that all 7 data centers have 9 * 10^18 FLOP/s of compute, for a total of 6.3 * 10^19 FLOP/s (bf16).
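The lower and upper bounds above follow directly from the reported Oklahoma figure, as the short sketch below shows. The implied chip count is an inference that assumes the 9 * 10^18 FLOP/s figure refers to peak V4 throughput at 275 * 10^12 FLOP/s per chip. The best guess estimate is not reproduced here because it depends on which chip versions each data center hosts, which this page does not list. Variable names are illustrative.

```python
OKLAHOMA_FLOPS = 9e18        # reported capacity of the Oklahoma data center (bf16)
V4_FLOPS_PER_CHIP = 275e12   # peak FLOP/s per TPU V4 chip
N_DATA_CENTERS = 7           # data centers that appear to host TPUs

# Lower bound: the Oklahoma cluster is all the TPU capacity there is.
tpu_lower = OKLAHOMA_FLOPS

# Upper bound: every data center matches the Oklahoma cluster.
tpu_upper = N_DATA_CENTERS * OKLAHOMA_FLOPS

# Assumption: the reported figure is peak V4 throughput, so this is the
# number of V4 chips it would imply.
implied_v4_chips = OKLAHOMA_FLOPS / V4_FLOPS_PER_CHIP

print(f"lower bound: {tpu_lower:.2e} FLOP/s")
print(f"upper bound: {tpu_upper:.2e} FLOP/s")
print(f"implied V4 chips in Oklahoma: {implied_v4_chips:,.0f}")
```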

Although the above estimates for the computing capacity of all TPUs and of all GPUs are measured in different precisions, bf16 and FP32 seem to be similar enough that computing capacities measured in both can reasonably be added together for a rough calculation.^{15)} Adding the estimates together gives the estimates below.

- My best guess estimate for computing capacity of all GPUs and TPUs is 3.98 * 10^21 FLOP/s.
- My lower bound estimate is 1.41 * 10^21 FLOP/s.
- My upper bound estimate is 7.77 * 10^21 FLOP/s.
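The combined figures can be checked by adding the GPU and TPU estimates from the earlier sections. The sketch below does this and also reports the TPU share of each total. The dictionary names are illustrative.

```python
# GPU estimates (FP32) and TPU estimates (bf16) from the sections above, in FLOP/s.
gpu = {"best guess": 3.95e21, "lower bound": 1.40e21, "upper bound": 7.71e21}
tpu = {"best guess": 2.93e19, "lower bound": 9e18, "upper bound": 6.3e19}

combined = {name: gpu[name] + tpu[name] for name in gpu}
tpu_share = {name: tpu[name] / combined[name] for name in gpu}

for name in gpu:
    print(f"{name}: {combined[name]:.2e} FLOP/s (TPU share {tpu_share[name]:.1%})")
```

In all three cases the TPU estimate contributes less than 1% of the total, so the combined figure is driven almost entirely by the GPU estimate.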

This calculation is rough, and the estimates could be wrong. In particular, some uncertainty remains about the methods used by the cited consulting firms to estimate GPU market size, and the details of those methods might substantially change the estimates in either direction.

Google’s Pathways Language Model (PaLM) was trained using 2.56 * 10^24 FLOP over 64 days.^{16)} My best guess estimate implies that training PaLM used, on average, about 0.01% of the world’s current GPU and TPU computing capacity, which seems plausible.^{17)}
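The sanity check above can be worked through step by step: convert the training run into an average rate in FLOP/s, then divide by the best guess for total capacity. The sketch assumes the compute was spread evenly over the 64 days. Variable names are illustrative.

```python
PALM_TRAINING_FLOP = 2.56e24   # total training compute for PaLM
TRAINING_DAYS = 64
TOTAL_CAPACITY = 3.98e21       # best guess for all GPUs and TPUs, in FLOP/s

# Assumption: compute was used at a constant rate for the whole training run.
training_seconds = TRAINING_DAYS * 24 * 60 * 60
average_rate = PALM_TRAINING_FLOP / training_seconds   # FLOP/s
share = average_rate / TOTAL_CAPACITY

print(f"training time: {training_seconds:,} seconds")
print(f"average rate: {average_rate:.2e} FLOP/s")
print(f"share of estimated world capacity: {share:.4%}")
```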

There are likely other, possibly better, ways to estimate the computing capacity of GPUs and TPUs. Estimates calculated in different ways could help reduce uncertainty. One alternative might involve researching the manufacturing capacity of semiconductor fabrication plants. Other methods may also make it feasible to estimate the computing capacity of all microprocessors, including CPUs, rather than just GPUs and TPUs.

^{1)} Bhutani, Ankita, and Preeti Wadhwani. 2019. “Graphics Processing Unit (GPU) Market Size By Component, Hardware, Device Type, Service, By Deployment Model, By Application, Industry Analysis Report, Regional Outlook, Growth Potential, Competitive Market Share & Forecast, 2018 - 2024.” Global Market Insights. https://web.archive.org/web/20220303120312/https://www.gminsights.com/industry-analysis/gpu-market.

^{2)} “GPU Market Size, Share, Trends | Forecast 2026.” n.d. Acumen Research and Consulting. Accessed March 31, 2023. https://www.acumenresearchandconsulting.com/gpu-market.

^{3)} R, Rachita. 2020. “GPU Market Size, Share & Forecast by 2027 : Graphics Processing Unit.” Allied Market Research. https://www.alliedmarketresearch.com/graphic-processing-unit-market.

^{4)} Alsop, Thomas. 2021. “Graphics processing unit (GPU) market size worldwide in 2020 and 2028.” Statista. https://web.archive.org/web/20220901103010/https://www.statista.com/statistics/1166028/gpu-market-size-worldwide/.

^{5)} “Graphic Processing Unit (GPU) Market Size, Share, Trends & Forecast.” 2022. Verified Market Research. https://www.verifiedmarketresearch.com/product/graphic-processing-unit-gpu-market/.

^{6)} Wadhwani, Preeti, and Aayush Jain. 2023. “Graphics Processing Unit (GPU) Market Size, Share Report – 2032.” Global Market Insights. https://www.gminsights.com/industry-analysis/gpu-market.

^{7)} The report includes revenue related to integrated GPUs, including from Apple. Because Apple does not appear to sell its integrated GPUs directly to consumers, this likely means that the report is including GPUs that come installed in Apple devices, rather than only stand-alone GPUs. If this is not the case, and the estimate only includes sales of stand-alone GPUs, then this page might significantly underestimate the total compute of all GPUs and TPUs.

^{8)} Hobbhahn, Marius. 2022. “Trends in GPU price-performance.” Epoch AI. https://epochai.org/blog/trends-in-gpu-price-performance.

^{9)} “How Long Do GPUs Last? (Average Lifespan & Effectiveness).” n.d. Cybersided. Accessed March 31, 2023. https://cybersided.com/how-long-do-gpus-last/.

^{10)} G. Ostrouchov, D. Maxwell, R. A. Ashraf, C. Engelmann, M. Shankar and J. H. Rogers, “GPU Lifetimes on Titan Supercomputer: Survival Analysis and Reliability,” SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, GA, USA, 2020, pp. 1-14, doi: 10.1109/SC41405.2020.00045.

^{11)} “TPU regions and zones.” n.d. Google Cloud. Accessed March 31, 2023. https://cloud.google.com/tpu/docs/regions-zones.

^{12)} Kennedy, Patrick. 2017. “Case Study on the Google TPU and GDDR5 from Hot Chips 29.” ServeTheHome. https://www.servethehome.com/case-study-google-tpu-gddr5-hot-chips-29/.

^{13)} “System Architecture | Cloud TPU.” n.d. Google Cloud. Accessed March 31, 2023. https://cloud.google.com/tpu/docs/system-architecture-tpu-vm.

^{14)} Lardinois, Frederic. 2022. “Google launches a 9 exaflop cluster of Cloud TPU v4 pods into public preview.” TechCrunch. https://techcrunch.com/2022/05/11/google-launches-a-9-exaflop-cluster-of-cloud-tpu-v4-pods-into-public-preview/.

^{15)} “In fact, the dynamic range of bfloat16 is identical to that of FP32.” Wang, Shibo, and Pankaj Kanwar. 2019. “BFloat16: The secret to high performance on Cloud TPUs.” Google Cloud. https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus.

^{16)} Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H. W., Sutton, C., Gehrmann, S., Schuh, P., Shi, K., Tsvyashchenko, S., Maynez, J., Rao, A., Barnes, P., Tay, Y., Shazeer, N., Prabhakaran, V., … Fiedel, N. (2022, October 5). PaLM: Scaling language modeling with Pathways. arXiv.org. Retrieved March 31, 2023, from https://arxiv.org/abs/2204.02311

^{17)} (2.56 * 10^24 / 64 / 24 / 60 / 60) / (3.98 * 10^21) ≈ 1.16 * 10^-4, or about 0.01%.

ai_timelines/hardware_and_ai_timelines/computing_capacity_of_all_gpus_and_tpus.txt · Last modified: 2023/04/17 23:27 by harlanstewart