Improving Model Security with the NAD Algorithm

This tutorial introduces the model security protection capabilities provided by MindArmour and walks you through using MindArmour to give your AI models a basic level of security protection.

Most AI algorithms are designed without security threats in mind, so their outputs can be influenced by malicious attackers, causing the AI system to make wrong decisions. An adversarial example attack adds small perturbations, imperceptible to humans, to an original sample so that a deep learning model misclassifies it. The model security module of MindArmour provides adversarial example generation, adversarial example detection, model defense, and attack/defense evaluation, offering important support for AI model security research and for securing AI applications.

• The adversarial example generation module enables security engineers to quickly and efficiently generate adversarial examples for attacking AI models.

• The adversarial example detection and defense modules help users detect and filter adversarial examples and improve the robustness of AI models against them.

• The evaluation module provides a variety of metrics for comprehensively evaluating attack and defense performance on adversarial examples.

This tutorial uses adversarial attack and defense on an image classification task, with the FGSM attack algorithm and the NAD defense algorithm as examples, to demonstrate how to use MindArmour for adversarial attack and defense.

MNIST is used as the demonstration dataset, and a simple custom model serves as the model under attack.

Load the MNIST dataset with the MnistDataset interface provided by MindSpore's dataset module.

```
# MindSpore 1.x style imports; module paths may differ in newer versions
import mindspore.dataset as ds
import mindspore.dataset.vision.c_transforms as CV
import mindspore.dataset.transforms.c_transforms as C
from mindspore.dataset.vision import Inter
import mindspore.common.dtype as mstype


# generate dataset for train or test
def generate_mnist_dataset(data_path, batch_size=32, repeat_size=1,
                           num_parallel_workers=1, sparse=True):
    """
    create dataset for training or testing
    """
    # define dataset
    ds1 = ds.MnistDataset(data_path)

    # define operation parameters
    resize_height, resize_width = 32, 32
    rescale = 1.0 / 255.0
    shift = 0.0

    # define map operations
    resize_op = CV.Resize((resize_height, resize_width),
                          interpolation=Inter.LINEAR)
    rescale_op = CV.Rescale(rescale, shift)
    hwc2chw_op = CV.HWC2CHW()
    type_cast_op = C.TypeCast(mstype.int32)

    # apply map operations on labels and images
    if not sparse:
        one_hot_enco = C.OneHot(10)
        ds1 = ds1.map(operations=one_hot_enco, input_columns="label",
                      num_parallel_workers=num_parallel_workers)
        type_cast_op = C.TypeCast(mstype.float32)
    ds1 = ds1.map(operations=type_cast_op, input_columns="label",
                  num_parallel_workers=num_parallel_workers)
    ds1 = ds1.map(operations=resize_op, input_columns="image",
                  num_parallel_workers=num_parallel_workers)
    ds1 = ds1.map(operations=rescale_op, input_columns="image",
                  num_parallel_workers=num_parallel_workers)
    ds1 = ds1.map(operations=hwc2chw_op, input_columns="image",
                  num_parallel_workers=num_parallel_workers)

    # apply DatasetOps
    buffer_size = 10000
    ds1 = ds1.shuffle(buffer_size=buffer_size)
    ds1 = ds1.batch(batch_size, drop_remainder=True)
    ds1 = ds1.repeat(repeat_size)

    return ds1
```
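
A quick way to check the pipeline is to load one batch and inspect its shapes. This is a minimal sketch; the path is illustrative and follows the ./MNIST layout used later in this tutorial.

```
ds_train = generate_mnist_dataset("./MNIST/train", batch_size=32, sparse=False)
for image, label in ds_train.create_tuple_iterator():
    # images are CHW after HWC2CHW; labels are one-hot when sparse=False
    print(image.shape, label.shape)  # expected: (32, 1, 32, 32) (32, 10)
    break
```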

LeNet is used as the example model here; you can also build and train your own model.

1. Define the LeNet model network. (A sketch of the full LeNet5 class used below is given after this list.)

   ```
   def conv(in_channels, out_channels, kernel_size, stride=1, padding=0):
       weight = weight_variable()
       return nn.Conv2d(in_channels, out_channels,
                        kernel_size=kernel_size, stride=stride, padding=padding,
                        weight_init=weight, has_bias=False, pad_mode="valid")
   ```
2. Train the LeNet model, loading the training data with the generate_mnist_dataset function defined above.

   ```
   mnist_path = "./MNIST/"
   batch_size = 32

   # train original model
   ds_train = generate_mnist_dataset(os.path.join(mnist_path, "train"),
                                     batch_size=batch_size, repeat_size=1,
                                     sparse=False)
   net = LeNet5()
   loss = SoftmaxCrossEntropyWithLogits(sparse=False)
   opt = nn.Momentum(net.trainable_params(), 0.01, 0.09)
   model = Model(net, loss, opt, metrics=None)
   model.train(10, ds_train, callbacks=[LossMonitor()],
               dataset_sink_mode=False)

   # get test data
   ds_test = generate_mnist_dataset(os.path.join(mnist_path, "test"),
                                    batch_size=batch_size, repeat_size=1,
                                    sparse=False)
   inputs = []
   labels = []
   for data in ds_test.create_tuple_iterator():
       inputs.append(data[0].asnumpy().astype(np.float32))
       labels.append(data[1].asnumpy())
   test_inputs = np.concatenate(inputs)
   test_labels = np.concatenate(labels)
   ```
3. Test the model.

   ```
   # prediction accuracy before attack
   net.set_train(False)
   test_logits = []
   batches = test_inputs.shape[0] // batch_size
   for i in range(batches):
       batch_inputs = test_inputs[i*batch_size : (i + 1)*batch_size]
       logits = net(Tensor(batch_inputs)).asnumpy()
       test_logits.append(logits)
   test_logits = np.concatenate(test_logits)
   # labels are one-hot, so compare argmax of logits and labels
   tmp = np.argmax(test_logits, axis=1) == np.argmax(test_labels, axis=1)
   accuracy = np.mean(tmp)
   LOGGER.info(TAG, 'prediction accuracy before attacking is : %s', accuracy)
   ```
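
For reference, the code above calls LeNet5(), whose full definition is omitted from this excerpt. The following is a minimal sketch in the style of the standard MindSpore LeNet example, including the weight_variable and fully connected helpers used together with the conv helper from step 1; treat it as an illustrative reconstruction rather than the exact original code.

```
import mindspore.nn as nn
from mindspore.common.initializer import TruncatedNormal


def weight_variable():
    # truncated-normal initializer shared by the conv/fc helpers
    return TruncatedNormal(0.02)


def fc_with_initialize(input_channels, out_channels):
    # fully connected layer using the same initializer as conv
    return nn.Dense(input_channels, out_channels,
                    weight_variable(), weight_variable())


class LeNet5(nn.Cell):
    """LeNet-5 for 1x32x32 MNIST inputs with 10 output classes."""
    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = conv(1, 6, 5)
        self.conv2 = conv(6, 16, 5)
        self.fc1 = fc_with_initialize(16 * 5 * 5, 120)
        self.fc2 = fc_with_initialize(120, 84)
        self.fc3 = fc_with_initialize(84, 10)
        self.relu = nn.ReLU()
        self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
        self.flatten = nn.Flatten()

    def construct(self, x):
        x = self.max_pool2d(self.relu(self.conv1(x)))
        x = self.max_pool2d(self.relu(self.conv2(x)))
        x = self.flatten(x)
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        return self.fc3(x)
```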

Call the FGSM interface (FastGradientSignMethod) provided by MindArmour.

```
# MindArmour 1.x import paths; they may differ in other releases
from mindarmour.adv_robustness.attacks import FastGradientSignMethod
from mindarmour.adv_robustness.evaluations import AttackEvaluate
from mindarmour.utils.logger import LogUtil
from scipy.special import softmax

LOGGER = LogUtil.get_instance()
TAG = 'demo'

# attacking: get adv data
attack = FastGradientSignMethod(net, eps=0.3, loss_fn=loss)
adv_data = attack.batch_generate(test_inputs, test_labels)

# get accuracy of adv data on original model
adv_logits = []
for i in range(batches):
    batch_inputs = adv_data[i*batch_size : (i + 1)*batch_size]
    logits = net(Tensor(batch_inputs)).asnumpy()
    adv_logits.append(logits)
adv_logits = np.concatenate(adv_logits)
adv_proba = softmax(adv_logits, axis=1)
tmp = np.argmax(adv_proba, axis=1) == np.argmax(test_labels, axis=1)
accuracy_adv = np.mean(tmp)
LOGGER.info(TAG, 'prediction accuracy after attacking is : %s', accuracy_adv)

# evaluate the attack (images transposed from NCHW to NHWC)
attack_evaluate = AttackEvaluate(test_inputs.transpose(0, 2, 3, 1),
                                 test_labels,
                                 adv_data.transpose(0, 2, 3, 1),
                                 adv_proba)
LOGGER.info(TAG, 'mis-classification rate of adversaries is : %s',
            attack_evaluate.mis_classification_rate())
LOGGER.info(TAG, 'The average confidence of adversarial class is : %s',
            attack_evaluate.avg_conf_adv_class())
LOGGER.info(TAG, 'The average confidence of true class is : %s',
            attack_evaluate.avg_conf_true_class())
LOGGER.info(TAG, 'The average distance (l0, l2, linf) between original '
            'samples and adversarial samples are: %s',
            attack_evaluate.avg_lp_distance())
LOGGER.info(TAG, 'The average structural similarity between original '
            'samples and adversarial samples are: %s',
            attack_evaluate.avg_ssim())
```

The attack results are as follows:

```
prediction accuracy after attacking is : 0.052083
mis-classification rate of adversaries is : 0.947917
The average confidence of adversarial class is : 0.803375
The average confidence of true class is : 0.042139
The average distance (l0, l2, linf) between original samples and adversarial samples are: (1.698870, 0.465888, 0.300000)
The average structural similarity between original samples and adversarial samples are: 0.332538
```

After the untargeted FGSM attack, the model's accuracy drops from 98.9% to 5.2%, and the misclassification rate reaches about 95%. For the successfully attacked adversarial examples, the average confidence of the predicted (adversarial) class (ACAC) is 0.803375 and the average confidence of the true class (ACTC) is 0.042139. The zero-norm, two-norm, and infinity-norm distances between the generated adversarial examples and the original samples are also reported; the average structural similarity between each adversarial example and its original sample is 0.332538, and generating one adversarial example takes 0.003125 s on average.
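
As a rough illustration of what ACAC and ACTC measure (this is not the AttackEvaluate implementation, only a sketch based on the adv_proba softmax outputs computed above):

```
# successfully attacked samples: predicted class differs from the true class
adv_pred = np.argmax(adv_proba, axis=1)
true_cls = np.argmax(test_labels, axis=1)
success = adv_pred != true_cls

# ACAC: average confidence assigned to the (wrong) predicted class
acac = np.mean(np.max(adv_proba[success], axis=1))
# ACTC: average confidence assigned to the true class
actc = np.mean(adv_proba[success, true_cls[success]])
```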

The figure below shows the effect before and after the attack: the original samples are on the left, and the adversarial examples generated by the untargeted FGSM attack are on the right. Visually, the images on the right are almost indistinguishable from those on the left, yet all of them successfully mislead the model into predicting incorrect classes.
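
A comparison figure like this can be reproduced with a few lines of matplotlib (a minimal sketch using the test_inputs and adv_data arrays from the code above):

```
import matplotlib.pyplot as plt

# top row: original test images; bottom row: FGSM adversarial examples
# arrays are NCHW, so take the single channel for plotting
n = 5
fig, axes = plt.subplots(2, n, figsize=(2 * n, 4))
for i in range(n):
    axes[0][i].imshow(test_inputs[i][0], cmap='gray')
    axes[0][i].set_title('original')
    axes[0][i].axis('off')
    axes[1][i].imshow(adv_data[i][0], cmap='gray')
    axes[1][i].set_title('adversarial')
    axes[1][i].axis('off')
plt.show()
```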

NaturalAdversarialDefense (NAD) is a simple and effective defense against adversarial examples. It uses adversarial training: during model training, adversarial examples are generated and mixed with the original samples, and the model is trained on both. As training proceeds, the model becomes increasingly robust to adversarial examples. NAD uses FGSM as the attack algorithm to construct the adversarial examples.

Call the NAD defense interface (NaturalAdversarialDefense) provided by MindArmour.
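
The defense code itself is not included in this excerpt. The sketch below shows a typical call to NaturalAdversarialDefense; the import path, hyperparameters, and the choice to fine-tune on the collected test batches are assumptions and may differ from the original example.

```
# assumed import path for MindArmour 1.x; older releases used
# `from mindarmour.defenses import NaturalAdversarialDefense`
from mindarmour.adv_robustness.defenses import NaturalAdversarialDefense

# adversarial training: NAD generates FGSM examples internally and mixes
# them with the clean samples while fine-tuning the trained network
loss = SoftmaxCrossEntropyWithLogits(sparse=False)
opt = nn.Momentum(net.trainable_params(), 0.01, 0.09)
nad = NaturalAdversarialDefense(net, loss_fn=loss, optimizer=opt,
                                bounds=(0.0, 1.0), eps=0.3)
net.set_train()
nad.batch_defense(test_inputs, test_labels, batch_size=32, epochs=10)
```

The defense results are as follows: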

```
accuracy of TEST data on defensed model is : 0.974259
accuracy of adv data on defensed model is : 0.856370
defense mis-classification rate of adversaries is : 0.143629
The average confidence of adversarial class is : 0.616670
```

After applying the NAD defense, the model's misclassification rate on adversarial examples drops from about 95% to about 14%, so the defense is effective against the adversarial examples, while the model still achieves about 97% classification accuracy on the original test set.