Combating Adversarial Attack with the Dropout-based Drift Diffusion Model
Introduction
Deep learning has proven effective in a broad range of real-world tasks. Deep artificial neural networks have been applied to a plethora of use cases such as facial recognition, autonomous driving, and audio and text classification. While these systems have proven useful, they have also proven brittle and sensitive to input noise and adversarial perturbations, largely because of their one-shot inference. In contrast, the brains of humans and animals are far less vulnerable to such noise, as they make decisions based on an accumulation of evidence, trading time for accuracy. The question, then, is this: can that process be modeled, reworked, and applied to artificial neural networks?
The Danger of Adversarial Attacks
Adversarial attacks are attempts to deceive deep learning models by presenting them with inaccurate, misrepresentative, or maliciously designed data, during or after training, in order to cause false classifications. Although this may sound inconsequential at first read, adversarial attacks can unleash numerous real-world dangers. For example, Keen Security Lab showed that placing a few small stickers on the road can cause a Tesla Model S in autopilot mode to swerve into the opposite traffic lane. Other demonstrated attacks include classifying a benign mole as malignant, or classifying a stop sign as one displaying a speed limit.
At a quick glance, these attacks may appear inconsequential and simple to solve, but every day we as a society become more reliant on machine learning, deep learning, and artificial neural networks to facilitate our daily lives. If we are to one day achieve fully autonomous vehicles, we must establish the means to defend them against such attacks. The same goes for developing true artificial intelligence: it must be robust and incorruptible, or else bad actors will bend it toward their own desires and goals.
Thus the question stands: can deep learning models be made to function more like the brain, and thereby reduce the risk of adversarial attack?
The Dropout-based Drift-Diffusion Model (DDDM)
Humans and animals are less vulnerable to noise and perturbations than artificial neural networks. When a human is confused or misled by an image, it is natural to spend extra time considering it in order to understand it and restore accuracy. The more uncertain the brain is, the longer it takes to make a decision. The Drift-Diffusion Model (DDM) is celebrated as a successful model of the decision-making process in both humans and animals. It describes how the brain takes a sample and uses both external and internal noise to accumulate evidence, making a decision once a predetermined evidence threshold has been met. This ability to trade speed for accuracy is currently one of the most prominent differences between biological and artificial neural networks. A group of Chinese researchers has proposed the Dropout-based Drift-Diffusion Model (DDDM) as an intuitive framework for enhancing the flexibility and robustness of arbitrary neural networks. DDDM adds noise to a neural network during the test phase by introducing random dropout, which yields multiple stochastic copies of the original network. DDDM then applies an evidence accumulation mechanism similar to the DDM's in order to reach a decision over these noisy outputs.
The two components of the design of the Dropout-based Drift Diffusion Model (DDDM)
(1) Turning the one-shot inference process used in traditional deep neural networks into a dynamic, noisy inference process.
To simulate the series of temporal signals generated during decision making in a biological brain (the internal noise mentioned previously), noisy predictions are generated from the stochastic copies. Test-phase dropout introduces randomness into the system, simulating neuronal noise in biological synaptic transmission and making it harder for adversarial attacks to succeed. The result is a set of dropout classifiers whose outputs are passed on for later processing.
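The idea of test-phase dropout can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation: the two-layer toy network, its random weights, and the dropout rate are all made up for the example. The point is that repeated forward passes through the same "trained" weights give different class probabilities, which are the stochastic copies fed to the accumulator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer network with fixed random weights (stand-ins for a trained model).
W1 = rng.normal(size=(4, 16))
W2 = rng.normal(size=(16, 3))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def stochastic_forward(x, p=0.5):
    """One test-phase-dropout pass: hidden units are randomly zeroed,
    so repeated calls yield different (noisy) class probabilities."""
    h = np.maximum(x @ W1, 0.0)            # ReLU hidden layer
    mask = rng.random(h.shape) > p         # keep each unit with prob 1 - p
    h = h * mask / (1.0 - p)               # inverted-dropout rescaling
    return softmax(h @ W2)

x = rng.normal(size=4)
samples = np.stack([stochastic_forward(x) for _ in range(50)])
print(samples.shape)        # (50, 3): 50 stochastic "copies" of one prediction
print(samples.std(axis=0))  # nonzero spread = the injected internal noise
```

In a real network the same effect is achieved simply by leaving dropout layers active at inference time instead of disabling them.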
(2) Taking the noisy dropout classifier outputs and applying the Drift-Diffusion Model with a Bayesian multiple sequential probability ratio test (MSPRT) implementation.
In essence, a Bayesian multiple sequential probability ratio test (MSPRT) considers a number of alternative choices, draws the evidence supporting each choice from a Gaussian distribution, and selects the choice with the highest probability of being correct, either within a predetermined time or once the evidence is deemed sufficient.
A choice is made either once its probability of being correct reaches a specified accuracy threshold, or once the predetermined time threshold is surpassed. In this sense, the noisy dropout classifier outputs are processed in a way that resembles a biological brain's decision making more than a traditional artificial neural network's: the one-shot inference is replaced by a decision made once an accuracy or time threshold is met, based on an accumulation of evidence that trades time for accuracy.
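The stopping rule described above can be sketched as follows. This is a hedged, simplified illustration of an MSPRT-style accumulator, not the paper's exact procedure: log-evidence for each class is summed across noisy passes, and the loop stops as soon as one class's normalized posterior crosses an accuracy threshold `theta`, or when the time budget `max_steps` runs out. The noisy stream below is a made-up stand-in for the dropout classifier's outputs.

```python
import numpy as np

rng = np.random.default_rng(1)

def accumulate_decision(sample_probs, theta=0.99, max_steps=100):
    """MSPRT-style rule over a stream of noisy class-probability vectors:
    accumulate log-evidence per class and stop when one class's posterior
    crosses `theta` (accuracy threshold) or the time budget is exhausted."""
    log_evidence = None
    for t in range(1, max_steps + 1):
        p = next(sample_probs)
        if log_evidence is None:
            log_evidence = np.zeros_like(p)
        log_evidence += np.log(p + 1e-12)        # evidence from this noisy pass
        z = log_evidence - log_evidence.max()
        posterior = np.exp(z) / np.exp(z).sum()  # normalized posterior
        if posterior.max() >= theta:             # accuracy threshold met
            return int(posterior.argmax()), t
    return int(posterior.argmax()), max_steps    # time threshold met

# Hypothetical noisy classifier output that favors class 2 on average.
def noisy_stream():
    while True:
        logits = np.array([0.0, 0.2, 0.8]) + rng.normal(scale=1.0, size=3)
        e = np.exp(logits - logits.max())
        yield e / e.sum()

choice, steps = accumulate_decision(noisy_stream())
print(choice, steps)
```

Note how the speed-accuracy tradeoff is explicit here: a higher `theta` forces more forward passes before a decision, which is exactly the response-time cost discussed in the experiments below.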
Applying the Model in an Experiment
Now that we understand the basic logic of the Dropout-based Drift Diffusion Model (DDDM), let’s see how it has been applied in an experimental setting.
DDDM in Image Classification
(1) With the MNIST Dataset
The experiment measured the effectiveness of DDDM in defending against eight different adversarial attacks — four white-box attacks and four black-box attacks. White-box attacks assume full knowledge of the deployed model: the attack is made knowing the model architecture, the inputs, the weights and coefficient values, and even the model's internal gradients. Black-box attacks are attacks in which the attacker knows only the model's inputs, but can still retrieve the model's outputs. The eight attacks are well known within the machine learning community: the Fast Gradient Sign Method (FGSM) attack, the Projected Gradient Descent (PGD) attack, the L2 Carlini and Wagner (L2 C&W) attack, the L2 DeepFool attack, the Salt and Pepper attack, the L∞ uniform noise attack, the Spatial attack, and the Square attack.
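To make the simplest of these concrete, here is a toy numpy sketch of FGSM, the Fast Gradient Sign Method: perturb the input by a small step `eps` in the direction of the sign of the loss gradient with respect to the input. The logistic-regression "model" (the weights `w`, input `x`, and `eps` value) is invented purely for illustration; real attacks target deep networks via backpropagated gradients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, eps=0.25):
    """FGSM on a logistic-regression 'model' w: step the input by eps
    in the sign of the input-gradient of the cross-entropy loss."""
    grad_x = (sigmoid(w @ x) - y) * w    # d(cross-entropy)/dx
    return x + eps * np.sign(grad_x)

w = np.array([1.0, -2.0, 0.5])           # toy model weights
x = np.array([0.6, -0.4, 0.2])           # clean input, true label y = 1
y = 1.0

clean_pred = sigmoid(w @ x)              # confidently class 1
adv_pred = sigmoid(w @ fgsm(x, y, w))    # pushed toward class 0
print(round(clean_pred, 3), round(adv_pred, 3))  # → 0.818 0.651
```

A small, barely visible perturbation measurably degrades the model's confidence in the true label; iterating this step is essentially the PGD attack also used in the experiment.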
The experiment utilized a network consisting of two convolutional layers with 32 and 64 filters respectively, each followed by 2 × 2 max-pooling, and a fully connected layer of size 1024.
Here are the results of the experiment:
As you can see, test-phase dropout defended the network against adversarial perturbations in all of the attacks, and accuracy was retained once all of the evidence had been accumulated. One interesting thing to note: the network defended itself better against some attacks than others, demonstrating that DDDM is not a one-size-fits-all, uniform solution. Despite this, the fact that the network with the Dropout-based Drift Diffusion Model achieved better accuracy under every attack makes a strong argument for its strength.
As previously mentioned, the DDDM is meant to replicate the human and animal decision making process. Thus, it sacrifices time for better accuracy and introduces a tradeoff into the system. Here is the tradeoff from the experiment:
The response time was measured by the number of forward passes in the network. As you can see, baseline accuracy drops dramatically as perturbation size increases, while DDDM and dropout-classifier accuracy is largely maintained. The response time increases monotonically with perturbation size, demonstrating the tradeoff between accuracy and response time in DDDM.
(2) With the CIFAR10 Dataset
The researchers then moved on to the CIFAR10 Dataset. For this dataset, they utilized the VGG16 architecture without batch normalization and a dropout layer added to each of the final six convolutional layers. Results were measured on three of the eight attacks stated above: PGD, L2 DeepFool, and Spatial. Here are the results:
Yet again, DDDM effectively defends against the three attacks with only a minor dip from the clean-trial accuracy. One interesting thing to note is that the performance gaps become smaller for the dropout classifier, likely because VGG16 is a much larger model and six dropout layers are not enough to introduce sufficient randomness into the dropout classifier. The conclusion, then, is that very large models need a sufficient number of dropout layers to introduce enough randomness into the system for DDDM to work to its full potential.
DDDM in Audio Classification
To experiment with audio classification, the SpeechCommands dataset was employed, containing 35 keywords and 105,829 audio clips, each pertaining to one of those keywords. Within audio classification, most machine learning work focuses on the classification and/or recognition of audio snippets for applications such as voice input and voice assistants. Audio adversarial attacks endanger the safety, integrity, and robustness of these automatic speech recognition (ASR) systems.
The experiment used a simplified version of the DeepSpeech2 model and an attack that adds human-imperceptible perturbations to the audio clips' waveforms. The model included only a Mel-spectrogram conversion layer, a 1D convolutional layer, two LSTMs, and two fully connected layers. Dropout was applied only once, after the 1D convolutional layer. Here are the results:
Again, DDDM held up well against the attack, and the small number of dropout layers likely lowered the accuracy of the dropout classifier, as seen in the CIFAR10 trial.
DDDM with Text Classification
Lastly, the experiment evaluated the effects of DDDM on text classification. Languages consist of characters and words, and the process of mapping tokens to stored values prevents attacks from the image domain from being applied directly to text classifiers. As such, specialized attacks have been devised; the one used here is called the TextBugger attack. The dataset used was the IMDB dataset. Here are the final results:
DDDM successfully protected the text classifier, performing far better than the dropout classifier. Under the TextBugger attack, DDDM's robust accuracy nearly approached the clean accuracy — and DDDM's clean accuracy was even higher than the baseline model's.
Concluding Insights and Observations
The results obtained across image, audio, and text classification are working proof that combining the evidence accumulation present in biological decision making with deep artificial neural networks enhances the robustness of such networks and provides defense against adversarial attacks. The framework's success lies in the interplay of its two key components: a mechanism for introducing noise into the system, and the evidence accumulation of the DDM. The injected noise deliberately degrades any single pass, and the accumulation step then recovers accuracy, granting the system both robustness and the flexibility to trade time for performance.
Although this article may seem rather complex, DDDM is actually quite simple, and perhaps that is where its attractiveness lies. It does not require networks to be specially trained for different types of attacks; it is agnostic to both the kind of attack being posed and the type of noise. It therefore makes systems more robust and protects against all kinds of attacks. Additionally, the framework does not rely on a special kind of network, and can instead be used with any network that supports dropout.
Importance of this Contribution to the Deep Learning Community
Adversarial attacks are inherently dangerous. They can cause a plethora of problems for deep learning systems and artificial neural networks. Whether it's crashing a car on autopilot, granting robbers and hackers access to private accounts, or manipulating models into generating or saying whatever a bad actor desires, adversarial attacks are dangerous and innately difficult to combat. A framework that is simple, scalable, and applicable to many different avenues of deep learning, such as the Dropout-based Drift Diffusion Model (DDDM), is a demonstration of the machine learning community's progress in combatting attacks. DDDM can be deployed to protect models against all sorts of attacks, in many different use cases, and across a variety of network architectures.
Beyond protecting artificial neural networks and architectures against attacks, DDDM also demonstrates where the deep learning community stands in fusing artificial networks and processes with biological, natural ones. DDDM is an attempt to step away from existing inference mechanisms that are brittle, sensitive, and single-tracked by incorporating the decision-making processes we use every day as human beings. By continuously building on what we already know about how we operate, we can keep applying these natural, proven, effective processes to enhance deep learning — just as DDDM has.
Thank you for reading my article!
References:
https://www.ijcai.org/proceedings/2022/0397.pdf
https://keenlab.tencent.com/en/whitepapers/Experimental_Security_Research_of_Tesla_Autopilot.pdf