AI agent evaluation is dominated by a single goal: accuracy. Focusing solely on accuracy, however, can produce agents that are unnecessarily complicated and costly to run. The proposed alternative is a cost-aware evaluation paradigm in which accuracy and cost are optimized together, yielding more practical and effective agents.
An accuracy-only focus leads to needlessly complex and costly agents because accuracy can be raised simply by calling the underlying model multiple times, and every additional call adds inference cost. The researchers therefore argue for a cost-controlled evaluation paradigm that jointly optimizes accuracy and cost, aiming for agents that are cost-effective without sacrificing accuracy.
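To make the trade-off concrete, here is a minimal sketch of how repeated calls buy accuracy at a linear cost. It assumes, purely for illustration, that attempts are independent, that a correct attempt can always be recognized, and that every call costs the same; the function name `retry_tradeoff` and the numbers are hypothetical and are not taken from the researchers' work.

```python
def retry_tradeoff(p_single: float, cost_single: float, k: int) -> tuple[float, float]:
    """Return (success probability, worst-case cost) for k attempts.

    Illustrative assumptions: attempts are independent, a correct
    attempt is always recognized, and each call costs cost_single.
    """
    # The task fails only if all k independent attempts fail.
    p_success = 1.0 - (1.0 - p_single) ** k
    # Worst case: all k calls are made, so cost grows linearly in k.
    total_cost = k * cost_single
    return p_success, total_cost


if __name__ == "__main__":
    # Hypothetical agent: 50% single-call accuracy, 1 cost unit per call.
    for k in (1, 2, 4, 8):
        p, cost = retry_tradeoff(0.5, 1.0, k)
        print(f"k={k}: success={p:.3f}, cost={cost:.1f}")
```

Under these assumptions each extra call adds the same cost but a smaller accuracy gain than the one before it, which is why reporting accuracy alone can hide how expensive an agent is.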
The Princeton team proposes a new evaluation paradigm for AI agents that considers both accuracy and cost [2]. The approach optimizes the two together, aiming for agents that cost less without sacrificing accuracy. The researchers suggest representing agents' cost and accuracy as a Pareto frontier, that is, the set of agents that cannot be made cheaper without losing accuracy or more accurate without costing more, which opens up a new avenue for agent design [2].
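The Pareto frontier idea can be sketched in a few lines of Python. This is an illustrative sketch, not the researchers' implementation: the `AgentResult` type, the agent names, and all cost and accuracy figures below are invented for the example.

```python
from typing import NamedTuple


class AgentResult(NamedTuple):
    name: str
    cost: float      # hypothetical cost per evaluation run
    accuracy: float  # fraction of tasks solved


def pareto_frontier(results: list[AgentResult]) -> list[AgentResult]:
    """Return the agents that no other agent matches or beats on both axes."""
    # Sort by cost ascending; among equal costs, highest accuracy first.
    ordered = sorted(results, key=lambda r: (r.cost, -r.accuracy))
    frontier = []
    best_accuracy = float("-inf")
    for r in ordered:
        # Keep an agent only if it is more accurate than every agent
        # that costs the same or less.
        if r.accuracy > best_accuracy:
            frontier.append(r)
            best_accuracy = r.accuracy
    return frontier


# Hypothetical agents, for illustration only.
agents = [
    AgentResult("single_call", 1.0, 0.70),
    AgentResult("retry_5", 5.0, 0.80),
    AgentResult("complex_agent", 20.0, 0.78),
    AgentResult("cheap_model", 0.2, 0.55),
]

frontier = pareto_frontier(agents)
for r in frontier:
    print(f"{r.name}: cost={r.cost}, accuracy={r.accuracy}")
```

In this made-up data, `complex_agent` falls off the frontier because `retry_5` is both cheaper and more accurate. An accuracy-only leaderboard would still rank it second, which is the kind of outcome a cost-aware evaluation is meant to expose.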