White-box and black-box attacks are two types of adversarial attacks on machine learning models. White-box attacks require comprehensive knowledge of the target model, including its architecture and parameters. Attackers can use this information to craft adversarial examples that specifically target the model's weaknesses. In contrast, black-box attacks do not require detailed knowledge of the target model. Instead, attackers interact with the model by submitting inputs and observing the outputs, and use this feedback to refine their attack. Black-box attacks can be further divided into transfer-based attacks (which craft examples on a substitute model and transfer them to the target), score-based attacks (which use the model's output scores or probabilities), and decision-based attacks (which use only the predicted label), each relying on a different level of access to the model's outputs.
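To make the white-box case concrete, the sketch below mounts a fast-gradient-sign-style (FGSM) perturbation against a toy logistic-regression model whose weights the attacker can read directly. The model, data, and helper names such as fgsm_perturb are illustrative assumptions, not part of any particular system; a black-box attacker would instead only see the model's outputs.

```python
import numpy as np

# Toy white-box setting: a logistic-regression "model" whose weights the
# attacker can read directly. All names and values here are illustrative.
rng = np.random.default_rng(0)
w = rng.normal(size=20)          # known model weights (white-box access)
b = 0.1                          # known bias

def predict_prob(x):
    """Probability of class 1 under the logistic model."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

def fgsm_perturb(x, y_true, eps=0.25):
    """Fast-gradient-sign-style step: move x in the direction that increases the loss.

    For logistic regression the gradient of the cross-entropy loss with
    respect to the input is (p - y) * w, so the attacker only needs the
    known weights to compute it exactly.
    """
    p = predict_prob(x)
    grad_x = (p - y_true) * w
    return x + eps * np.sign(grad_x)

x = rng.normal(size=20)                              # a benign input
y_true = 1.0 if predict_prob(x) >= 0.5 else 0.0      # model's original decision
x_adv = fgsm_perturb(x, y_true)
print("clean prediction:      ", predict_prob(x))
print("adversarial prediction:", predict_prob(x_adv))
```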
Adversarial attacks in machine learning are techniques adversaries use to exploit vulnerabilities in machine learning models. These attacks involve manipulating data to cause the model to make incorrect predictions or decisions. Adversarial attacks can be categorized into poisoning, evasion, extraction, and inference attacks. Poisoning attacks manipulate the training data set, while evasion attacks alter the input slightly at inference time to deceive the model. Extraction attacks aim to obtain a functional copy of the AI system, and inference attacks probe the model to reveal sensitive information about the data it was trained on.
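As a concrete illustration of the poisoning category, the sketch below flips the labels of a small fraction of a toy training set before fitting a simple logistic-regression model. The data, the training loop, and names such as poison_labels are hypothetical stand-ins chosen for illustration, not a description of any real pipeline.

```python
import numpy as np

# Illustrative poisoning attack: the adversary flips labels of a small
# fraction of the training set before the model is fit.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # clean labels

def poison_labels(y, fraction=0.1):
    """Flip the labels of a random subset of training points."""
    y_poisoned = y.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    y_poisoned[idx] = 1.0 - y_poisoned[idx]
    return y_poisoned

def train_logreg(X, y, lr=0.1, steps=500):
    """Plain gradient-descent logistic regression on whatever labels it is given."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def accuracy(w):
    """Accuracy of a trained model against the clean labels."""
    preds = (1.0 / (1.0 + np.exp(-(X @ w)))) >= 0.5
    return np.mean(preds == (y == 1))

w_clean = train_logreg(X, y)
w_poisoned = train_logreg(X, poison_labels(y))
print("accuracy with clean training data:   ", accuracy(w_clean))
print("accuracy with poisoned training data:", accuracy(w_poisoned))
```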
Decision-based attacks are considered stealthy because they rely only on the hard label (the final predicted class) returned by the target model to craft adversarial examples. They do not require detailed knowledge of the target model, which makes them more realistic and harder to detect. The attacker aims to deceive the target model under practical constraints, such as generating adversarial examples with as few queries as possible and keeping the perturbation strength within a predefined threshold. This also makes decision-based attacks challenging to mount, since the attacker must locate the decision boundary and optimize the perturbation direction without access to the model's output scores.
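The sketch below illustrates this setting with a simplified boundary-walk attack against a toy linear classifier: starting from an already-misclassified point, the attacker repeatedly proposes small steps toward the original input and keeps only those proposals for which the hard label remains flipped, counting queries along the way. The model, step sizes, and iteration budget are assumptions chosen purely for illustration, not a specific published attack.

```python
import numpy as np

# Minimal decision-based (hard-label) attack sketch against a toy linear
# classifier. The attacker never sees scores, only the predicted class.
rng = np.random.default_rng(2)
w, b = rng.normal(size=5), 0.0

def hard_label(x):
    """The only feedback available to the attacker: the predicted class."""
    return int(w @ x + b > 0)

x_orig = rng.normal(size=5)
y_orig = hard_label(x_orig)

# Start from some input that is already classified differently.
x_adv = rng.normal(size=5)
while hard_label(x_adv) == y_orig:
    x_adv = rng.normal(size=5)

queries = 0
for _ in range(2000):
    # Propose a small step toward the original input plus random noise,
    # and keep it only if the label is still flipped. Each proposal costs
    # one query, so the attacker wants this loop to be as short as possible.
    candidate = x_adv + 0.05 * (x_orig - x_adv) + 0.02 * rng.normal(size=5)
    queries += 1
    if hard_label(candidate) != y_orig:
        x_adv = candidate

print("queries used:           ", queries)
print("final perturbation norm:", np.linalg.norm(x_adv - x_orig))
```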