News Release

CleaNN: unsupervised, embedded defense against neural Trojan attacks

Example Trojans: (a) BadNets with a sticky note and TrojanNN with (b) square and (c) watermark triggers

San Diego, Calif., Nov. 25, 2020 -- Engineers at UC San Diego have developed a new defense against neural network Trojan attacks on autonomous devices such as cars, drones, or security cameras. Their co-designed algorithm and hardware solution is the first end-to-end framework that enables online, real-time mitigation of these Trojan attacks for embedded deep neural network algorithms.

The CleaNN defense is completely unsupervised, meaning it doesn’t require access to Trojan samples or any labeled data sets. It is the first defense to recover the ground-truth labels of Trojan data without performing any model training or fine-tuning.

The researchers also developed a customized hardware stack designed specifically to optimize the performance of their defense algorithm so that it can work in real-time settings. They trained CleaNN on image analysis tasks, but they believe it would work on other types of data as well.

Currently, most methods to detect if a machine learning model has been compromised by a Trojan attack are implemented before the model is used on any device, and require large data sets or lots of computational power. Researchers in the Center for Machine-Integrated Computing and Security at UC San Diego took a different approach, aiming to create a defense that would identify and inoculate against Trojans in real time while the device is functioning.

Their CleaNN algorithm runs continually in the background, embedded in the autonomous device while it goes about its set task. The algorithm constantly checks the state of the neural network model and its inputs. If it detects a Trojan trying to make the model mispredict, it removes the malicious effect the Trojan was trying to achieve and reverts to the original task or ground truth.

In tests using state-of-the-art Trojan attacks, CleaNN brought the attack success rate down to 0% for a variety of physical and complex digital Trojans, with a minimal drop in the accuracy of the underlying deep neural network. The team presented CleaNN on Nov. 2 at the International Conference on Computer-Aided Design.

(a) Example Trojan data with watermark and square triggers, (b) reconstruction error heatmap, and (c) output mask from the outlier detection module.

The researchers developed CleaNN by drawing on classical signal processing, a field that had taken a back seat for about a decade in favor of neural networks and deep learning architectures. The engineers went back to a statistical analysis and sparse recovery approach, which not only worked but also requires less computational power, making it possible to run the algorithm directly on an embedded computing device such as a drone or an autonomous car.

“What we have right now is an algorithm that is not based on deep learning or modifying the model in any way,” said Mojan Javaheripi, an electrical engineering PhD student and co-first author of the paper. “This is something that is just based on sparse recovery of the signals. Because what we observed was when you add a Trojan pattern in the data, there are some nuances in the behavior of the signals all throughout the network, both in the input space and also as you’re propagating the input throughout the network, you can see there is suspicious behavior rising.”

CleaNN consists of two core modules. The first, built on the Discrete Cosine Transform (DCT), transforms the input image to the frequency domain, performs sparse recovery on the extracted frequency components, and reconstructs the original signal using sparse approximation.
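To make the idea of the first module concrete, here is a minimal sketch, not the CleaNN implementation. It assumes a single-channel image and uses the simplest possible sparse approximation: keep only the largest-magnitude DCT coefficients and invert the transform. The function names and the `keep` parameter are hypothetical and chosen for illustration. The per-pixel difference between the input and its reconstruction is one plausible way to produce a reconstruction error map like the one in the figure caption above.

```python
import numpy as np

def dct_matrix(n):
    """Return the orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)     # rescale the DC row so that C @ C.T = I
    return c

def sparse_dct_reconstruct(image, keep):
    """Keep the `keep` largest-magnitude 2D DCT coefficients, then invert.

    Hard thresholding is an illustrative stand-in for sparse recovery.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    h, w = image.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    coeffs = ch @ image @ cw.T                    # forward 2D DCT
    magnitudes = np.abs(coeffs).ravel()
    if keep < magnitudes.size:
        cutoff = np.partition(magnitudes, -keep)[-keep]
        coeffs = np.where(np.abs(coeffs) >= cutoff, coeffs, 0.0)
    return ch.T @ coeffs @ cw                     # inverse 2D DCT

def reconstruction_error(image, keep):
    """Per-pixel absolute error between the image and its sparse rebuild."""
    return np.abs(image - sparse_dct_reconstruct(image, keep))
```

In this sketch, a small `keep` value forces the reconstruction to rely on a handful of frequency components, so content that those components cannot explain shows up in the error map. How CleaNN selects components and sets thresholds is not described in this release.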

The second module, a feature analyzer, investigates patterns in the latent features extracted by the victim neural network to find abnormal structures. A sparse recovery module within this feature analyzer denoises input features for use in the remaining layers of the neural network, allowing CleaNN to recover the ground-truth labels for Trojan samples by removing the Trojan triggers entering the network.
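The denoising step in the feature analyzer can be sketched in a similar way. The example below uses orthogonal matching pursuit (OMP), a widely used sparse recovery algorithm; the release does not say which algorithm CleaNN uses, so treat OMP as an assumption. The dictionary of feature "atoms", the `sparsity` level, and the function names are likewise hypothetical. The idea is to approximate a latent feature vector with a few atoms and treat whatever is left over as a sign of abnormal structure.

```python
import numpy as np

def omp(dictionary, signal, sparsity):
    """Orthogonal matching pursuit: greedy sparse code of `signal`.

    `dictionary` has one atom per column; atoms are assumed unit-norm.
    """
    signal = np.asarray(signal, dtype=float)
    residual = signal.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(sparsity):
        correlations = np.abs(dictionary.T @ residual)
        if support:
            correlations[support] = 0.0   # do not pick an atom twice
        support.append(int(np.argmax(correlations)))
        atoms = dictionary[:, support]
        # Refit all selected atoms jointly (the "orthogonal" step).
        coeffs, *_ = np.linalg.lstsq(atoms, signal, rcond=None)
        residual = signal - atoms @ coeffs
    code = np.zeros(dictionary.shape[1])
    code[support] = coeffs
    return code

def denoise_features(dictionary, features, sparsity):
    """Return the sparse reconstruction and a relative residual score."""
    features = np.asarray(features, dtype=float)
    denoised = dictionary @ omp(dictionary, features, sparsity)
    score = np.linalg.norm(features - denoised) / np.linalg.norm(features)
    return denoised, score
```

Under these assumptions, the denoised vector would be passed on to the remaining layers of the network in place of the raw features, and a large residual score would mark the input as suspicious.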

“The significance of the work is in achieving real-time detection for the first time, while maintaining a very good accuracy,” said Farinaz Koushanfar, professor of electrical and computer engineering at UC San Diego and co-director of the Center for Machine-Integrated Computing and Security. “The AI models form the core computing engines in modern time-sensitive applications such as autonomous driving, financial applications, and disaster response. Introduction of CleaNN just-in-time defense provides a paradigm shift in robustness of AI-based solutions against the nefarious backdoor attacks.”

While attacks and defenses in machine learning have become a cat-and-mouse game, the researchers cite CleaNN’s unsupervised design and its success in detecting the state-of-the-art Trojan attacks BadNets and TrojanNN as reasons to be optimistic that the embedded algorithm will be successful against future iterations of Trojan attacks.

“Our algorithm doesn’t make any assumptions about the type of Trojan, its size, pattern or anything,” said Javaheripi. “Our defense is constructed without any knowledge of what the attack might look like. It worked well on existing attacks out there, so I’m hopeful it would work against new, future attacks as well.”

Media Contacts

Katherine Connor
Jacobs School of Engineering
858-534-8374
khconnor@ucsd.edu