Do neural networks learn human concepts?

Published 06 December, 2021

This page is a stub. It does not necessarily represent much of what is known on the topic.

Our understanding is that the degree to which neural networks learn concepts that are potentially understandable to humans is an open question.

Details

A very incomplete list of sources on the topic:

Acquisition of Chess Knowledge in AlphaZero (McGrath et al, 2021)¹⁾

From the paper: ‘…In this work we provide evidence that human knowledge is acquired by the AlphaZero neural network as it trains on the game of chess. By probing for a broad range of human chess concepts we show when and where these concepts are represented in the AlphaZero network….’

Zoom In: An Introduction to Circuits (Olah et al, 2020)²⁾

From the paper: ‘In contrast to the typical picture of neural networks as a black box, we’ve been surprised how approachable the network is on this scale. Not only do neurons seem understandable (even ones that initially seemed inscrutable), but the “circuits” of connections between them seem to be meaningful algorithms corresponding to facts about the world. You can watch a circle detector be assembled from curves. You can see a dog head be assembled from eyes, snout, fur and tongue. You can observe how a car is composed from wheels and windows. You can even find circuits implementing simple logic: cases where the network implements AND, OR or XOR over high-level visual features.’

Harmonizing the object recognition strategies of deep neural networks with humans (Fel et al, 2022)³⁾

From the paper: ‘Across 84 different DNNs trained on ImageNet and three independent datasets measuring the where and the how of human visual strategies for object recognition on those images, we find a systematic trade-off between DNN categorization accuracy and alignment with human visual strategies for object recognition.’

¹⁾

McGrath, Thomas, Andrei Kapishnikov, Nenad Tomašev, Adam Pearce, Demis Hassabis, Been Kim, Ulrich Paquet, and Vladimir Kramnik. “Acquisition of Chess Knowledge in AlphaZero.” November 27, 2021. http://arxiv.org/abs/2111.09259

²⁾

Chris Olah, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael Petrov, and Shan Carter. “Zoom In: An Introduction to Circuits.” Distill 5, no. 3. March 10, 2020. https://doi.org/10.23915/distill.00024.001

³⁾

Thomas Fel, Ivan Felipe, Drew Linsley, and Thomas Serre. “Harmonizing the object recognition strategies of deep neural networks with humans.” November 8, 2022. https://arxiv.org/abs/2211.04533

AI Impacts Wiki
