Published 06 December, 2021
This page is a stub. It does not necessarily represent much of what is known on the topic.
Our understanding is that the degree to which neural networks learn concepts that are potentially understandable to humans is an open question.
A very incomplete list of sources on the topic:
From the paper: ‘…In this work we provide evidence that human knowledge is acquired by the AlphaZero neural network as it trains on the game of chess. By probing for a broad range of human chess concepts we show when and where these concepts are represented in the AlphaZero network….’
From the paper: ‘In contrast to the typical picture of neural networks as a black box, we’ve been surprised how approachable the network is on this scale. Not only do neurons seem understandable (even ones that initially seemed inscrutable), but the “circuits” of connections between them seem to be meaningful algorithms corresponding to facts about the world. You can watch a circle detector be assembled from curves. You can see a dog head be assembled from eyes, snout, fur and tongue. You can observe how a car is composed from wheels and windows. You can even find circuits implementing simple logic: cases where the network implements AND, OR or XOR over high-level visual features.
From the paper: 'Across 84 different DNNs trained on ImageNet and three independent datasets measuring the where and the how of human visual strategies for object recognition on those images, we find a systematic trade-off between DNN categorization accuracy and alignment with human visual strategies for object recognition.'