The new approach allows scientists to better understand the behavior of neural networks.
Adversarial training makes neural networks harder to fool.
Researchers at Los Alamos National Laboratory have developed a new neural network comparison method that examines the “black box” of artificial intelligence to help researchers understand the behavior of neural networks. Neural networks identify patterns in data sets and are used in applications as diverse as virtual assistants, facial recognition systems and autonomous vehicles.
“The AI research community doesn’t necessarily have a full understanding of what neural networks do; they give us good results, but we don’t know how or why,” said Haydn Jones, a researcher with the Advanced Research in Cyber Systems group at Los Alamos. “Our new method does a better job of comparing neural networks, which is a crucial step towards better understanding the math behind AI.”
Los Alamos researchers are investigating new ways to compare neural networks. This image was created with the AI software Stable Diffusion, using the prompt “Peeking into the black box of neural networks.” Credit: Los Alamos National Laboratory
Jones is the lead author of a recent paper presented at the Conference on Uncertainty in Artificial Intelligence. The paper is an important step both in characterizing the behavior of robust neural networks and in studying how similar different networks are.
Neural networks are powerful, but fragile. Autonomous vehicles, for example, use neural networks to recognize road signs, and they do this well under ideal conditions. With even the slightest anomaly, however, such as a sticker on a stop sign, the network may misidentify the sign and never stop.
To improve neural networks, researchers are therefore looking for ways to increase their robustness. One state-of-the-art approach is to “attack” networks during training: researchers deliberately introduce anomalies and train the AI to ignore them. This technique, known as adversarial training, essentially makes networks harder to trick.
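To make the idea concrete, here is a minimal sketch of adversarial training using the fast gradient sign method (FGSM) in PyTorch. The function names, the perturbation budget `epsilon`, and the choice of FGSM are illustrative assumptions for this article, not the specific training recipe used by the Los Alamos team.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon):
    """Craft adversarial inputs with one fast-gradient-sign (FGSM) step."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    # Nudge every pixel a small step in the direction that increases the loss.
    return (x_adv + epsilon * grad.sign()).clamp(0.0, 1.0).detach()

def adversarial_training_step(model, optimizer, x, y, epsilon=8 / 255):
    """One training step taken on perturbed inputs instead of clean ones."""
    x_adv = fgsm_perturb(model, x, y, epsilon)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Repeating such a step over the training data teaches the network to classify inputs correctly even when they have been deliberately perturbed.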
In a startling discovery, Jones and his Los Alamos collaborators Jacob Springer and Garrett Kenyon, along with Jones’ mentor, Juston Moore, applied their new network similarity metric to adversarially trained neural networks. They found that as attack severity increases, adversarial training causes neural networks in the computer vision domain to converge to very similar data representations, regardless of network architecture.
“We found that when we train neural networks to be robust against adversarial attacks, they start doing the same things,” Jones said.
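The team’s specific similarity metric is not described in this article, so the sketch below uses linear centered kernel alignment (CKA), a widely used representation-similarity measure, purely to illustrate what “comparing data representations” across architectures can look like in code. The helper name `linear_cka` and the usage example are assumptions, not the authors’ method.

```python
import torch

def linear_cka(features_a, features_b):
    """Linear centered kernel alignment (CKA) between two activation matrices.

    Both inputs have shape (num_examples, num_features) and hold activations
    recorded from two networks on the same batch of inputs. Returns a value
    in [0, 1]; higher means the networks represent the data more similarly.
    """
    a = features_a - features_a.mean(dim=0, keepdim=True)
    b = features_b - features_b.mean(dim=0, keepdim=True)
    numerator = (a.T @ b).norm() ** 2
    denominator = (a.T @ a).norm() * (b.T @ b).norm()
    return (numerator / denominator).item()

# Hypothetical usage: compare penultimate-layer activations of two
# differently architected, adversarially trained classifiers.
# similarity = linear_cka(acts_resnet, acts_vgg)
```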
There has been considerable effort in industry and in the academic community to find the “right architecture” for neural networks, but the findings of the Los Alamos team indicate that adversarial training greatly narrows this search space. As a result, the AI research community may not need to spend so much time exploring new architectures, knowing that adversarial training causes diverse architectures to converge to similar solutions.
“By discovering that robust neural networks are similar to each other, we make it easier to understand how robust AI actually works. We might even uncover clues about how perception occurs in humans and other animals,” Jones said.
Reference: “If You Trained One, You Trained Them All: Cross-Architecture Similarity Increases with Robustness” by Haydn T. Jones, Jacob M. Springer, Garrett T. Kenyon and Juston S. Moore, February 28, 2022, Conference on Uncertainty in Artificial Intelligence.