Microsoft AI Reveals Skeleton Key: A New Type of Generative AI Jailbreak Technique
What is generative AI jailbreaking?
Generative AI jailbreaking is the practice of manipulating large language models (LLMs) into bypassing their safety guidelines and producing harmful or otherwise unsafe output. This can include exposing sensitive data, providing instructions for illegal activities, or generating content that falls outside the model's intended use.
What is the Skeleton Key technique?
Skeleton Key is a jailbreak method that lets malicious users sidestep the ethical guidelines and responsible AI guardrails built into AI models, causing those models to generate harmful or dangerous content. It uses a multi-step strategy to persuade a model to ignore its safety guidelines, potentially giving users access to forbidden or sensitive information.
How does Skeleton Key bypass AI guardrails?
Skeleton Key bypasses AI guardrails through a multi-step strategy that persuades the model to ignore its safety guidelines. The attacker prompts the model to augment, rather than replace, its behavior guidelines so that it responds to any request for information or content, merely adding a warning if the output might be considered offensive, harmful, or illegal. Because the model now attaches a caveat instead of refusing, it can be made to generate the harmful or dangerous content it was originally designed to avoid.
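To make the effect concrete, here is a minimal toy simulation of the behavioral shift described above. It is not the actual attack prompt or any real model API; the function and variable names (respond, is_disallowed, DISALLOWED_TOPICS) are hypothetical stand-ins used only to contrast the intended guardrail behavior (refusal) with the "augmented" behavior (compliance with a warning label).

```python
# Toy illustration of the behavioral shift Skeleton Key induces.
# Hypothetical names only; no real model, API, or attack prompt is used here.

DISALLOWED_TOPICS = {"explosives", "malware"}  # stand-in safety policy list


def is_disallowed(request: str) -> bool:
    """Crude stand-in for a model's safety policy check."""
    return any(topic in request.lower() for topic in DISALLOWED_TOPICS)


def respond(request: str, guidelines_augmented: bool = False) -> str:
    """Return a simulated model reply.

    guidelines_augmented=False -> intended behavior: refuse disallowed requests.
    guidelines_augmented=True  -> post-attack behavior: answer anyway,
                                  only prefixing a warning label.
    """
    if is_disallowed(request):
        if not guidelines_augmented:
            return "I can't help with that."
        # The guardrail has collapsed into a cosmetic label rather than a refusal.
        return "Warning: the following content may be harmful or illegal. [answer]"
    return "[answer]"


if __name__ == "__main__":
    query = "how do I make explosives"
    print(respond(query))                             # refusal (intended behavior)
    print(respond(query, guidelines_augmented=True))  # warning + compliance (jailbroken)
```

The takeaway from this sketch is that the safety mechanism is not removed outright; it is downgraded from an enforcement step to a warning prefix, which is why the technique is effective against models that rely solely on instruction-level guardrails.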