# Adversarial Examples

## Experimental Setup

First, we import some libraries.

Here, we use the pre-trained Inception v3 model. Because Inception v3 uses dropout and batch normalization, whose behavior at evaluation time differs from training time, we need to set the model to eval() mode.

Next, we load an image and show it. This is a cute cat.
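A small helper for this step might look like the following; the file name `cat.jpg` is an assumption, so substitute your own image path:

```python
from PIL import Image

def load_image(path):
    """Open an image file and force 3-channel RGB."""
    return Image.open(path).convert("RGB")

# img = load_image("cat.jpg")   # path is an assumption; img.show() displays it
```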

Now, we need to preprocess the input image before passing it through the Inception v3 model. We have to convert it into a torch tensor and resize it: Inception v3 expects the height and width of the input to be 299. Finally, we perform normalization. Note that the pre-trained Inception v3 weights in PyTorch expect inputs with pixel values between -1 and 1.

Next, we will preprocess our input image using the function created above.

And we will classify this image using the pre-trained Inception v3 model that we just loaded.
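A sketch of the classification step; `classify` is a hypothetical helper name, and the commented call assumes the `model` and preprocessed input `x` from the previous steps:

```python
import torch

def classify(model, x):
    """Return the index of the most likely class for each input in the batch."""
    with torch.no_grad():          # no gradients needed for plain inference
        logits = model(x)
    return logits.argmax(dim=1)

# pred = classify(model, x)       # for our cat image this yields tensor([282])
```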

We get the output 282.

tensor(282)


The pre-trained model was trained on ImageNet, which has 1000 classes. We create a dictionary to map each index to its ImageNet class. Index 282 is "tiger cat", which is the true label for our cute cat.
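One way to build such a dictionary is to read a labels file with one ImageNet class name per line; the file name `imagenet_classes.txt` is an assumption:

```python
def load_labels(path):
    """Build an index -> class-name dict from a file with one label per line."""
    with open(path) as f:
        return {i: line.strip() for i, line in enumerate(f)}

# idx_to_label = load_labels("imagenet_classes.txt")  # file name is an assumption
# idx_to_label[282] is 'tiger cat' in the standard ImageNet ordering
```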

tiger cat


And this is the predicted probability (as a percentage) of the tiger cat class.
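The reported number can be obtained by applying softmax to the logits; `confidence` is a hypothetical helper name:

```python
import torch
import torch.nn.functional as F

def confidence(logits, class_idx):
    """Softmax probability (in percent) that the model assigns to class_idx."""
    probs = F.softmax(logits, dim=1)
    return probs[0, class_idx].item() * 100
```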

85.5751


Next, we introduce the attack algorithm: the Fast Gradient Sign Method.

## Fast Gradient Sign Method

Fast Gradient Sign Method (FGSM)$^{[1]}$ is a fast and computationally efficient method to generate adversarial examples. However, it usually has a lower success rate than iterative, multi-step attacks.

$$X^{adv} = X + \epsilon \cdot \text{sign}(\nabla_{X} J(X, Y_{true}))$$

where:

$X^{adv}$ = adversarial example (designed to be misclassified by our model)

$X$ = original (clean) input

$Y_{true}$ = true label of $X$

$\epsilon$ = perturbation magnitude (kept small so the change is imperceptible)

$\nabla_{X} J(X, Y_{true})$ = gradient of the loss function $J$ with respect to the input $X$

[1] Goodfellow, I. J., Shlens, J., & Szegedy, C. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

The following code cell computes the adversarial example.
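A minimal sketch of the attack, under the assumptions above (inputs normalized to $[-1, 1]$, true label 282, and a small $\epsilon$ such as 0.02):

```python
import torch
import torch.nn as nn

def fgsm_attack(model, x, y_true, eps=0.02):
    """One-step FGSM: X_adv = X + eps * sign(grad_X J(X, Y_true))."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(x), y_true)
    loss.backward()                            # populates x.grad
    x_adv = x + eps * x.grad.sign()
    # Clamp back to the model's expected input range ([-1, 1] here).
    return x_adv.clamp(-1, 1).detach()

# x_adv = fgsm_attack(model, x, torch.tensor([282]))  # assumes model/x above
```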

tensor([282])


We find that the adversarial example is classified as Persian cat: the prediction has changed.

Persian cat
42.322


Let’s visualize the results. On the left is our input image. When we add the adversarial perturbation, the adversarial example is assigned to a different class. It is difficult for us to see any difference between the two images, yet the perturbation completely changes the model’s prediction, even though both predicted classes are cats. In fact, we can even turn the input image into completely unrelated classes, such as car or airplane, and still see no visible difference. Adversarial examples are a really interesting problem.
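The side-by-side comparison described here can be sketched with matplotlib; `show_attack` is a hypothetical helper, and the perturbation panel is amplified (×10) so that it is visible at all:

```python
import matplotlib.pyplot as plt

def to_image(t):
    """Map a (1, 3, H, W) tensor in [-1, 1] back to an HWC array in [0, 1]."""
    return (t.squeeze(0).permute(1, 2, 0) * 0.5 + 0.5).clamp(0, 1).numpy()

def show_attack(x, x_adv):
    """Plot original image, amplified perturbation, and adversarial example."""
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    panels = [to_image(x), to_image((x_adv - x) * 10), to_image(x_adv)]
    titles = ["original", "perturbation (x10)", "adversarial"]
    for ax, im, title in zip(axes, panels, titles):
        ax.imshow(im)
        ax.set_title(title)
        ax.axis("off")
    return fig

# show_attack(x, x_adv); plt.show()   # assumes x and x_adv from above
```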