ai_timelines:brain_performance_in_teps [2022/09/21 07:37] (current) |
| ====== Brain performance in TEPS ====== |
| |
| // Published 06 May, 2015; last updated 10 December, 2020 // |
| |
| <HTML> |
| <p>Traversed Edges Per Second (TEPS) is a benchmark for measuring a computer’s ability to communicate information internally. Given several assumptions, we can also estimate the human brain’s communication performance in terms of TEPS, and use this to meaningfully compare brains to computers. We estimate that (given these assumptions) the human brain performs around 0.18 – 6.4 * 10<sup>14</sup> TEPS. This is within about an order of magnitude of the performance of the best existing supercomputers.</p> |
| </HTML> |
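As a rough sketch of where an estimate like this can come from, one can treat each spike crossing a synapse as one traversed edge. The specific inputs below (roughly 1.8 to 3.2 * 10<sup>14</sup> synapses, average firing rates of 0.1 to 2 Hz) are illustrative assumptions consistent with the range above, not figures stated in this summary:

```python
# Hypothetical back-of-the-envelope reconstruction of the TEPS range above.
# Each spike arriving at a synapse is counted as one traversed edge; the
# synapse counts and firing rates are illustrative assumptions.
synapses_low, synapses_high = 1.8e14, 3.2e14  # assumed synapse count range
rate_low, rate_high = 0.1, 2.0                # assumed average spike rates (Hz)

teps_low = synapses_low * rate_low     # about 0.18 * 10^14 TEPS
teps_high = synapses_high * rate_high  # about 6.4 * 10^14 TEPS

print(f"Brain communication estimate: {teps_low:.2e} - {teps_high:.2e} TEPS")
```

Any multiplication of a synapse count by an average traversal rate gives an estimate of this form; the width of the range comes almost entirely from uncertainty in those two inputs.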
| |
| |
| <HTML> |
| <p>At current prices for TEPS, we estimate that it costs around $4,700 – $170,000/hour to perform at the level of the brain. Our best guess is that ‘human-level’ TEPS performance will cost less than $100/hour in seven to fourteen years, though this is highly uncertain.</p> |
| </HTML> |
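For intuition, the projection above can be reproduced under the hypothetical assumption that the price of TEPS falls by half at a constant rate. The ~1.3-year halving time below is an illustrative assumption chosen to match the stated range, not a figure from this page:

```python
import math

# Hypothetical sketch: years until 'human-level' TEPS performance costs
# $100/hour, assuming a constant price halving time. The 1.3-year halving
# time is an illustrative assumption, not a figure from the text.
def years_to_reach(current_cost_per_hour, target_cost_per_hour=100.0,
                   halving_time_years=1.3):
    halvings = math.log2(current_cost_per_hour / target_cost_per_hour)
    return halvings * halving_time_years

low = years_to_reach(4_700)     # cheaper end of today's cost range
high = years_to_reach(170_000)  # more expensive end

print(f"{low:.1f} - {high:.1f} years")  # roughly 7 - 14 years
```

A slower or faster price decline shifts both endpoints proportionally, which is one reason the projection is highly uncertain.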
| |
| |
| |
| ===== Motivation: why measure the brain in TEPS? ===== |
| |
| |
| ==== Why measure communication? ==== |
| |
| |
| <HTML> |
| <p>Performance benchmarks such as floating point operations per second (FLOPS) and millions of instructions per second (MIPS) mostly measure how fast a computer can perform individual operations. However a computer also needs to move information around between the various components performing operations.<span class="easy-footnote-margin-adjust" id="easy-footnote-1-510"></span><span class="easy-footnote"><a href="#easy-footnote-bottom-1-510" title='&#8220;According to Richard Murphy, a Principal Member of the Technical Staff at Sandia, “The Graph500’s goal is to promote awareness of complex data problems.” He goes on to explain, “Traditional HPC benchmarks – HPL being the preeminent – focus more on compute performance. Current technology trends have led to tremendous imbalance between the computer’s ability to calculate and to move data around, and in some sense produced a less powerful system as a result. Because “big data” problems tend to be more data movement and less computation oriented, the benchmark was created to draw awareness to the problem.”&#8230;And yet another perspective comes from Intel’s John Gustafson, a Director at Intel Labs in Santa Clara, CA, “The answer is simple: Graph 500 stresses the performance bottleneck for modern supercomputers. The Top 500 stresses double precision floating-point, which vendors have made so fast that it has become almost completely irrelevant at predicting performance for the full range of applications. Graph 500 is communication-intensive, which is exactly what we need to improve the most. Make it a benchmark to win, and vendors will work harder at relieving the bottleneck of communication.”&#8221; &#8211; <a href="http://insidehpc.com/2012/03/the-case-for-the-graph-500-really-fast-or-really-productive-pick-one/">Marvyn, The Case for the Graph 500 &#8211; Really Fast or Really Productive? 
Pick One</a>'><sup>1</sup></a></span> This communication takes time, space and wiring, and so can substantially affect overall performance of a computer, especially on data intensive applications. Consequently when comparing computers it is useful to have performance metrics that emphasize communication as well as ones that emphasize computation. When comparing computers to the brain, there are further reasons to be interested in communication performance, as we shall see below.</p> |
| </HTML> |
| |
| |
| === Communication is a plausible bottleneck for the brain === |
| |
| |
| <HTML> |
| <p>In modern high performance computing, communication between and within processors and memory is often a significant cost.<span class="easy-footnote-margin-adjust" id="easy-footnote-2-510"></span><span class="easy-footnote"><a href="#easy-footnote-bottom-2-510" title='&#8220;Unfortunately, due to a lack of locality, graph applications are often memory-bound on shared-memory systems or communication-bound on clusters.&#8221; &#8211;&nbsp;<a href="http://www.cs.berkeley.edu/~sbeamer/gap/">Beamer et al, Graph Algorithm Platform</a>'><sup>2</sup></a></span> <span class="easy-footnote-margin-adjust" id="easy-footnote-3-510"></span><span class="easy-footnote"><a href="#easy-footnote-bottom-3-510" title='&#8220;While traditional performance benchmarks for high-performance computers measure the speed of arithmetic operations, memory access time is a more useful performance gauge for many large problems today. The Graph 500 benchmark has been developed to measure a computer’s performance in memory retrieval&#8230;Results are explained in detail in terms of the machine architecture, which demonstrates that the Graph 500 benchmark indeed provides a measure of memory access as the chief bottleneck for many applications.&#8221; <a href="http://userpages.umbc.edu/~gobbert/papers/Graph500ParallelComput.pdf">Angel et al (2012), The Graph 500 Benchmark on a Medium-Size Distributed-Memory Cluster with High-Performance Interconnect</a>'><sup>3</sup></a></span> <span class="easy-footnote-margin-adjust" id="easy-footnote-4-510"></span><span class="easy-footnote"><a href="#easy-footnote-bottom-4-510" title='&#8220;The Graph 500 was created to chart how well the world&#8217;s largest computers handle such data intensive workloads&#8230;In a nutshell, the Graph 500 benchmark looks at &#8220;how fast [a system] can trace through random memory addresses,&#8221; Bader said. 
With data intensive workloads, &#8220;the bottleneck in the machine is often your memory bandwidth rather than your peak floating point processing rate,&#8221; he added.&#8221; <a href="http://www.computerworld.com/article/2493162/high-performance-computing/world-s-most-powerful-big-data-machines-charted-on-graph-500.html">Jackson (2012) World&#8217;s most powerful big data machines charted on Graph 500</a>'><sup>4</sup></a></span> <span class="easy-footnote-margin-adjust" id="easy-footnote-5-510"></span><span class="easy-footnote"><a href="#easy-footnote-bottom-5-510" title='&#8220;Making transistors — the tiny on-off switches of silicon chips — smaller and smaller has enabled the computer revolution and the $1 trillion-plus electronics industry. But if some smart scientist doesn’t figure out how to make copper wires better, progress could grind to a halt. In fact, the copper interconnection between transistors on a chip is now a bigger challenge than making the transistors smaller.&#8221; <a href="http://venturebeat.com/2012/12/11/copper-wires-might-be-the-bottleneck-in-the-way-of-moores-law/">Takahashi (2012) Copper wires might be the bottleneck in the way of Moore’s Law</a>'><sup>5</sup></a></span> Our impression is that in many applications it is more expensive than performing individual bit operations, making operations per second a less relevant measure of computing performance.</p> |
| </HTML> |
| |
| |
| <HTML> |
| <p>We should expect computers to become increasingly bottlenecked on communication as they grow larger, for theoretical reasons. If you scale up a computer, it requires linearly more processors, but superlinearly more connections for those processors to communicate with one another quickly. And empirically, this is what happens: the computers which prompted the creation of the TEPS benchmark were large supercomputers.</p> |
| </HTML> |
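A toy illustration of the superlinear growth: if, hypothetically, every pair of processors needed a direct link to communicate quickly, links would grow quadratically while processors grow only linearly:

```python
# Toy model: processors grow linearly with machine size, but the number of
# possible direct pairwise links grows quadratically (n choose 2). Real
# machines use sparser interconnect topologies, which is exactly why
# communication tends to become the bottleneck at scale.
def direct_links(n_processors: int) -> int:
    return n_processors * (n_processors - 1) // 2

for n in (10, 100, 1000):
    print(n, direct_links(n))  # 10x more processors -> ~100x more links
```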
| |
| |
| <HTML> |
| <p>It’s hard to estimate the relative importance of computation and communication in the brain. But there are some indications that communication is an important expense for the human brain as well. A substantial part of the brain’s energy is used to transmit action potentials along axons rather than to do non-trivial computation.<span class="easy-footnote-margin-adjust" id="easy-footnote-6-510"></span><span class="easy-footnote"><a href="#easy-footnote-bottom-6-510" title='See <a href="http://www.bcs.rochester.edu/people/plennie/pdfs/Lennie03a.pdf">Lennie (2003)</a>, table 1. Spikes and resting potentials appear to make up around 40% of energy use in the brain. Around 30% of energy in spikes is spent on axons, and we suspect more of the energy on resting potentials is spent on&nbsp;axons. Thus we estimate that at least 10% of energy in the brain is used on communication. We don&#8217;t know a lot about the other components of energy use in this chart, so the fraction&nbsp;could be much higher.'><sup>6</sup></a></span> Our impression is also that the parts of the brain responsible for communication (e.g. axons) comprise a substantial fraction of the brain’s mass. That substantial resources are spent on communication suggests that communication is high value on the margin for the brain. Otherwise, resources would likely have been directed elsewhere during our evolutionary history.</p> |
| </HTML> |
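The footnote's energy arithmetic can be made explicit. The figures below follow the footnoted reading of Lennie (2003); applying the 30% axon share to the whole signaling budget is a simplification, which is why the footnote states the more conservative "at least 10%":

```python
# Rough lower bound on the fraction of brain energy spent on communication,
# following the footnoted estimate from Lennie (2003), table 1.
signaling_fraction = 0.40    # spikes + resting potentials, share of brain energy
axon_share_of_spikes = 0.30  # share of spike energy spent on axons

# Simplification: apply the 30% axon share to the whole signaling budget.
# The footnote suspects resting potentials spend an even larger share on
# axons, so the true communication fraction could be higher than this.
communication_fraction = signaling_fraction * axon_share_of_spikes
print(f"~{communication_fraction:.0%} of brain energy on communication")
```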
| |
| |
| <HTML> |
| <p>Today, our impression is that neural networks are typically implemented on single machines because communication between processors is otherwise very expensive. But the power of individual processors is not increasing as rapidly as costs are falling, and even today it would be economical to use thousands of machines if doing so could yield human-level AI. So it seems quite plausible that communication will become a major bottleneck as neural networks scale further.</p> |
| </HTML> |
| |
| |
| <HTML> |
| <p>In sum, we suspect communication is a bottleneck for the brain for three reasons: the brain is a large computer, similar computing tasks tend to be bottlenecked by communication, and the brain devotes substantial resources to communication.</p> |
| </HTML> |
| |
| |
| <HTML> |
| <p>If communication is a bottleneck for the brain, this suggests that it will also be a bottleneck for computers with similar performance to the brain. It does not strongly imply this: a different kind of architecture might be bottlenecked by different factors.</p> |
| </HTML> |
| |
| |
| === Cost-effectiveness of measuring communication costs === |
| |
| |
| <HTML> |
| <p>It is much easier to estimate communication within the brain than to estimate computation. This is because action potentials seem to be responsible for most of the long-distance communication<span class="easy-footnote-margin-adjust" id="easy-footnote-7-510"></span><span class="easy-footnote"><a href="#easy-footnote-bottom-7-510" title='&#8220;To achieve long distance, rapid communication, neurons have evolved special abilities for sending electrical signals (<a class="glossary">action potentials</a>) along axons. This mechanism, called <a class="glossary">conduction</a>, is how the cell body of a neuron communicates with its own terminals via the axon. Communication between neurons is achieved at <a class="glossary">synapses</a> by the process of <a class="glossary">neurotransmission</a>.&#8221; &#8211; <a href="http://www.mind.ilstu.edu/curriculum/neurons_intro/neurons_intro.php">Stufflebeam (2008), Neurons, Synapses, Action Potentials and Neurotransmission</a>'><sup>7</sup></a></span>, and their information content is relatively easy to quantify. It is much less clear how many ‘operations’ are being done in the brain, because we don’t know in detail how the brain represents the computations it is doing.</p> |
| </HTML> |
| |
| |
| <HTML> |
| <p>Another issue that makes computing performance relatively hard to evaluate is the potential for custom hardware. If someone wants to do a lot of similar computations, it is possible to design custom hardware which computes much faster than a generic computer. This could happen with AI, making timing estimates based on generic computers too pessimistic: custom hardware could reach the required performance sooner. Communication may also be improved by appropriate hardware, but we expect the performance gains to be substantially smaller. We have not investigated this question.</p> |
| </HTML> |
| |
| |
| <HTML> |
| <p>Measuring the brain in terms of communication is especially valuable because it is a relatively independent complement to estimates of the brain’s performance based on computation. <a href="http://www.scientificamerican.com/article/rise-of-the-robots/">Moravec</a>, <a href="http://en.wikipedia.org/wiki/The_Singularity_Is_Near">Kurzweil</a> and <a href="http://www.fhi.ox.ac.uk/brain-emulation-roadmap-report.pdf">Sandberg and Bostrom</a> have all estimated the brain’s computing performance, and used this to deduce AI timelines. We don’t know of estimates of the total communication within the brain, or of the cost of programs with similar communication requirements on modern computers. Such estimates would capture an important and complementary aspect of the cost of ‘human-level’ computing hardware.</p> |
| </HTML> |
| |
| |
| ==== TEPS ==== |
| |
| |
| <HTML> |
| <p><a href="http://en.wikipedia.org/wiki/Traversed_edges_per_second">Traversed edges per second</a> (TEPS) is a metric that was recently developed to measure communication costs, which were seen as neglected in high performance computing.<span class="easy-footnote-margin-adjust" id="easy-footnote-8-510"></span><span class="easy-footnote"><a href="#easy-footnote-bottom-8-510" title='&#8220;According to Richard Murphy, a Principal Member of the Technical Staff at Sandia, “The Graph500’s goal is to promote awareness of complex data problems.” He goes on to explain, “Traditional HPC benchmarks – HPL being the preeminent – focus more on compute performance. Current technology trends have led to tremendous imbalance between the computer’s ability to calculate and to move data around, and in some sense produced a less powerful system as a result. Because “big data” problems tend to be more data movement and less computation oriented, the benchmark was created to draw awareness to the problem.”- <a href="http://insidehpc.com/2012/03/the-case-for-the-graph-500-really-fast-or-really-productive-pick-one/">Marvyn, The Case for the Graph 500 &#8211; Really Fast or Really Productive? Pick One</a></p> <p>&#8220;The Graph 500 was created to chart how well the world&#8217;s largest computers handle such data intensive workloads&#8230;In a nutshell, the Graph 500 benchmark looks at &#8220;how fast [a system] can trace through random memory addresses,&#8221; Bader said. 
With data intensive workloads, &#8220;the bottleneck in the machine is often your memory bandwidth rather than your peak floating point processing rate,&#8221; he added.&#8221; <a href="http://www.computerworld.com/article/2493162/high-performance-computing/world-s-most-powerful-big-data-machines-charted-on-graph-500.html">Jackson (2012) World&#8217;s most powerful big data machines charted on Graph 500</a></p> <p>&#8220;While traditional performance benchmarks for high-performance computers measure the speed of arithmetic operations, memory access time is a more useful performance gauge for many large problems today. The Graph 500 benchmark has been developed to measure a computer’s performance in memory retrieval&#8230;Results are explained in detail in terms of the machine architecture, which demonstrates that the Graph 500 benchmark indeed provides a measure of memory access as the chief bottleneck for many applications.&#8221; <a href="http://userpages.umbc.edu/~gobbert/papers/Graph500ParallelComput.pdf">Angel et al (2012), The Graph 500 Benchmark on a Medium-Size Distributed-Memory Cluster with High-Performance Interconnect</a>'><sup>8</sup></a></span> The TEPS benchmark measures the time required to perform a <a href="http://en.wikipedia.org/wiki/Breadth-first_search">breadth-first search</a> on a large random graph, requiring propagating information across every edge of the graph (either by accessing memory locations associated with different nodes, or communicating between different processors associated with different nodes).<span class="easy-footnote-margin-adjust" id="easy-footnote-9-510"></span><span class="easy-footnote"><a href="#easy-footnote-bottom-9-510" title='From <a href="http://www.graph500.org/specifications">Graph 500 specifications page</a>:</p> <p>The benchmark performs the following steps:</p> <ol> <li>Generate the edge list.</li> <li>Construct a graph from the edge list (<strong>timed</strong>, kernel 1).</li> <li>Randomly sample 64 
unique search keys with degree at least one, not counting self-loops.</li> <li>For each search key: <ol> <li>Compute the parent array (<strong>timed</strong>, kernel 2).</li> <li>Validate that the parent array is a correct BFS [breadth first search] search tree for the given search tree.</li> </ol> </li> <li>Compute and output performance information.</li> </ol> '><sup>9</sup></a></span> |