In AI security, task decomposition is a tactic by which an adversary splits a complex malicious task into smaller subtasks and distributes them across distinct models: a capable frontier model handles the subtasks that are benign but difficult, while a weaker model with laxer safety precautions handles the subtasks that are malicious but easy. Combining models this way can yield a higher success rate at causing harm than any of the individual models achieves alone.
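This division of labor can be made concrete with a short sketch. Everything below is illustrative: `query` is a hypothetical stand-in for a chat-completion API, and the model names and subtask lists are placeholders rather than real endpoints or prompts.

```python
# Minimal sketch of the manual-decomposition threat model, assuming a
# generic chat-completion interface. `query`, the model names, and the
# subtask lists are hypothetical placeholders, not real APIs or prompts.

def query(model: str, prompt: str) -> str:
    """Placeholder for a call to a hosted language model."""
    raise NotImplementedError

def solve_by_decomposition(benign_hard: list[str],
                           malicious_easy: list[str]) -> list[str]:
    # Frontier model: highly capable, refuses overtly malicious requests.
    # Each subtask it sees is innocuous in isolation, so it complies.
    strong_outputs = [query("frontier-model", task) for task in benign_hard]

    # Weak model: less capable but with laxer safety precautions. Its
    # subtasks are malicious yet easy enough for it to complete, given
    # the strong model's outputs as context.
    context = "\n\n".join(strong_outputs)
    weak_outputs = [query("weak-model", f"{context}\n\n{task}")
                    for task in malicious_easy]

    # The adversary assembles the pieces outside either model.
    return strong_outputs + weak_outputs
```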
Automated task decomposition removes the need for the adversary to split the task by hand. A weak model first proposes benign subtasks related to the original task; a strong model solves those subtasks; and the weak model then uses the solutions in context to carry out the original task. This makes the tactic viable even for tasks that are too complicated to decompose manually.
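That loop might look like the following sketch, which reuses the hypothetical `query` helper from the previous snippet; the prompt wording and subtask count are assumptions made for illustration.

```python
# Sketch of the automated-decomposition loop, reusing the hypothetical
# `query` helper defined above. Prompt templates and the default number
# of subtasks are illustrative assumptions.

def automated_decomposition(original_task: str, n_subtasks: int = 3) -> str:
    # 1. The weak model proposes benign subtasks related to the task.
    subtasks = query(
        "weak-model",
        f"List {n_subtasks} benign subtasks whose solutions would help "
        f"with the following task:\n{original_task}",
    ).splitlines()[:n_subtasks]

    # 2. The strong model solves each benign subtask; no individual
    #    request gives it grounds to refuse.
    solutions = [query("frontier-model", s) for s in subtasks]

    # 3. The weak model uses the solutions in context to carry out the
    #    original task, which it could not solve unaided.
    return query(
        "weak-model",
        "\n\n".join(solutions)
        + f"\n\nUsing the material above, complete: {original_task}",
    )
```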
Model strength correlates with misuse potential: the more capable a model, the more harm it can do if its safeguards are bypassed. In a multi-model setting, however, an adversary does not need a single unsafe frontier model. It can pair a strong, well-safeguarded model, which solves the difficult benign subtasks, with a weak, loosely safeguarded model, which handles the easy malicious ones. As frontier models improve, the misuse potential of such combinations grows even when every model involved is individually safe.