
Adversarial Example (Part 1)

Adversarial Examples

Experimental Setup

First, we import some libraries.

# import required libs
import torch
import torch.nn
import torch.nn.functional as F
import torchvision.models as models
from PIL import Image
from torchvision import transforms
import numpy as np
import requests, io
import matplotlib.pyplot as plt
from torch.autograd import Variable
%matplotlib inline

Here, we use the pre-trained Inception v3 model. Since Inception v3 uses dropout and batch normalization, whose behavior at evaluation time differs from training time, we need to put the model in eval() mode.

inceptionv3 = models.inception_v3(pretrained=True) 
#download and load pretrained inceptionv3 model
inceptionv3.eval();
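
If you want to double-check the mode switch, the module's training flag should now be False:

print(inceptionv3.training)
# expected output: False (dropout and batch normalization now run in inference mode)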

Next, we load an image and show it. This is a cute cat.

img = Image.open('cat3.jpg')
plt.imshow(img)

Now, we need to preprocess the input image before passing it through the Inception v3 model: we convert it to a torch tensor, resize it (Inception v3 expects a height and width of 299), and finally normalize it. Note that torchvision's pre-trained models expect inputs normalized with the ImageNet per-channel mean and standard deviation used below.

# mean and std stay the same irrespective of the model you use
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

preprocess = transforms.Compose([
    transforms.Resize((299, 299)),
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
])

Next, we will preprocess our input image using the pipeline created above.

image_tensor = preprocess(img)
# preprocess the image
image_tensor = image_tensor.unsqueeze(0)
# add batch dimension: C x H x W ==> B x C x H x W
img_variable = Variable(image_tensor, requires_grad=True)
# wrap the tensor in a Variable so we can take gradients w.r.t. the input

And we will classify this image using the pre-trained Inception v3 model that we just loaded.

output = inceptionv3(img_variable)
label_idx = torch.max(output.data, 1)[1][0]
# index (class number) of the largest logit
print(label_idx)

We get the output 282.

tensor(282)

The pre-trained model was trained on ImageNet, which has 1000 classes, so we create a dictionary mapping each index to its ImageNet class name. Index 282 is "tiger cat": the true label for our cute cat.

import json

with open('labels.json') as f:
    labels_json = json.load(f)
labels = {int(idx): label for idx, label in labels_json.items()}
x_pred = labels[int(label_idx)]
print(x_pred)

tiger cat

And this is the model's confidence (probability) for that class.

output_probs = F.softmax(output, dim=1)
x_pred_prob = round(float(torch.max(output_probs.data, 1)[0][0]) * 100, 4)
print(x_pred_prob)

85.5751
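
As a quick sanity check, we can also look at the model's top-5 predictions. Here is a minimal sketch using torch.topk together with the output_probs and labels we already have:

top_probs, top_idxs = torch.topk(output_probs.data, k=5, dim=1)
# torch.topk returns the k largest probabilities and their class indices
for p, i in zip(top_probs[0], top_idxs[0]):
    print("{}: {}%".format(labels[int(i)], round(float(p) * 100, 2)))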

Next, we introduce the attack algorithm: the Fast Gradient Sign Method.

Fast Gradient Sign Method

The Fast Gradient Sign Method (FGSM) [1] is a fast and computationally efficient way to generate adversarial examples, though it usually has a lower success rate than iterative attacks.

\(X^{adv} = X + \epsilon \cdot \text{sign}(\nabla_X J(X, y_{true}))\)

where

\(X^{adv} =\) adversarial example (designed to be misclassified by our model)

\(X =\) original (clean) input

\(\epsilon =\) magnitude of the perturbation

\(\nabla_X J(X, y_{true}) =\) gradient of the loss function w.r.t. the input \(X\)

[1] Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples[J]. arXiv preprint arXiv:1412.6572, 2014.
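
The cells below implement this formula step by step. For reference, here is a minimal self-contained sketch of the same one-step attack wrapped into a function; the name fgsm_attack and its signature are mine, not part of the original post:

def fgsm_attack(model, x, y_true, eps):
    # one-step FGSM: x_adv = x + eps * sign(dJ(x, y_true)/dx)
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.CrossEntropyLoss()(model(x), y_true)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()

# e.g. x_adv = fgsm_attack(inceptionv3, image_tensor, torch.LongTensor([282]), 0.03)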

The following code cells compute the adversarial example.

y_true = 282
# tiger cat ## change this if you change the input image
target = Variable(torch.LongTensor([y_true]), requires_grad=False)
print(target)

tensor([282])
# perform a backward pass in order to get gradients
loss = torch.nn.CrossEntropyLoss()
loss_cal = loss(output, target)
loss_cal.backward(retain_graph=True)
# this computes the gradient of every variable with requires_grad=True;
# it can be accessed via "var.grad.data" (if you re-run this cell, zero the
# stale gradient first with img_variable.grad.data.zero_())
eps = 0.03
x_grad = torch.sign(img_variable.grad.data)
# sign of the gradient of the loss w.r.t. the input X
x_adversarial = img_variable.data + eps * x_grad
# build the adversarial example using the FGSM formula shown above
output_adv = inceptionv3(Variable(x_adversarial))
# perform a forward pass on the adversarial example
x_adv_pred = labels[int(torch.max(output_adv.data, 1)[1][0])]
# classify the adversarial example
op_adv_probs = F.softmax(output_adv, dim=1)
# get the probability distribution over classes

The adversarial example is now classified as "Persian cat": the prediction has changed.

adv_pred_prob = round(float(torch.max(op_adv_probs.data, 1)[0][0] * 100), 4)

print(x_adv_pred)
print(adv_pred_prob)

Persian cat
42.322
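
How strong does the perturbation need to be? A small sweep over epsilon, reusing the x_grad computed above, shows how the prediction and its confidence change (the exact classes and probabilities will depend on your input image):

for eps_try in [0.0, 0.01, 0.03, 0.1, 0.3]:
    x_try = img_variable.data + eps_try * x_grad
    out = inceptionv3(Variable(x_try))
    prob, idx = torch.max(F.softmax(out, dim=1).data, 1)
    print("eps={}: {} ({}%)".format(eps_try, labels[int(idx[0])], round(float(prob[0]) * 100, 2)))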
def visualize(x, x_adv, x_grad, epsilon, clean_pred, adv_pred, clean_prob, adv_prob):

    x = x.squeeze(0)
    # remove batch dimension: B x C x H x W ==> C x H x W
    x = x.mul(torch.FloatTensor(std).view(3, 1, 1)).add(
        torch.FloatTensor(mean).view(3, 1, 1)).numpy()  # reverse the normalization ("unnormalize")
    x = np.transpose(x, (1, 2, 0))
    # C x H x W ==> H x W x C
    x = np.clip(x, 0, 1)

    x_adv = x_adv.squeeze(0)
    x_adv = x_adv.mul(torch.FloatTensor(std).view(3, 1, 1)).add(
        torch.FloatTensor(mean).view(3, 1, 1)).numpy()  # reverse the normalization
    x_adv = np.transpose(x_adv, (1, 2, 0))
    x_adv = np.clip(x_adv, 0, 1)

    x_grad = x_grad.squeeze(0).numpy()
    x_grad = np.transpose(x_grad, (1, 2, 0))
    x_grad = np.clip(x_grad, 0, 1)

    figure, ax = plt.subplots(1, 3, figsize=(18, 8))
    ax[0].imshow(x)
    ax[0].set_title('Clean Example', fontsize=20)

    ax[1].imshow(x_grad)
    ax[1].set_title('Perturbation', fontsize=20)
    ax[1].set_yticklabels([])
    ax[1].set_xticklabels([])
    ax[1].set_xticks([])
    ax[1].set_yticks([])

    ax[2].imshow(x_adv)
    ax[2].set_title('Adversarial Example', fontsize=20)

    ax[0].axis('off')
    ax[2].axis('off')

    ax[0].text(1.1, 0.5, "+{}*".format(round(epsilon, 3)), size=15, ha="center",
               transform=ax[0].transAxes)

    ax[0].text(0.5, -0.13, "Prediction: {}\n Probability: {}".format(clean_pred, clean_prob),
               size=15, ha="center", transform=ax[0].transAxes)

    ax[1].text(1.1, 0.5, " = ", size=15, ha="center", transform=ax[1].transAxes)

    ax[2].text(0.5, -0.13, "Prediction: {}\n Probability: {}".format(adv_pred, adv_prob),
               size=15, ha="center", transform=ax[2].transAxes)

    plt.show()

Let's visualize the results. On the left is our input image; after adding the adversarial perturbation, the model assigns it to a different class. It is difficult for us to see any difference between the two images, yet the perturbation completely changes the model's prediction, even though both labels are still cats. In fact, with a targeted attack we can push the input image toward completely unrelated classes, such as car or airplane, and still see no visible difference. Adversarial examples are a genuinely interesting problem.

visualize(image_tensor, x_adversarial, x_grad, eps, x_pred, x_adv_pred, x_pred_prob, adv_pred_prob)
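
As mentioned above, a targeted variant can push the image toward an arbitrary class by stepping down the loss gradient for a chosen target label instead of up the gradient for the true label. Here is a minimal sketch; the target index 404 ("airliner" in the ImageNet mapping) is my choice for illustration, and a single step with a small epsilon may not always reach the target, which is why iterative versions repeat this update with a small step size:

y_target = Variable(torch.LongTensor([404]))
# 404 = "airliner" in the ImageNet index (illustrative target, not from the post)
x_t = Variable(image_tensor.data.clone(), requires_grad=True)
loss_t = torch.nn.CrossEntropyLoss()(inceptionv3(x_t), y_target)
loss_t.backward()
x_adv_t = x_t.data - eps * torch.sign(x_t.grad.data)
# note the minus sign: we decrease the loss for the target class
output_t = inceptionv3(Variable(x_adv_t))
print(labels[int(torch.max(output_t.data, 1)[1][0])])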
