Salesforce study finds LLM agents flunk CRM and confidentiality tests
A new benchmark developed by academics shows that LLM-based AI agents perform below par on standard CRM tests and fail to understand the need for customer confidentiality. A team led by Kung-Hsiang Huang, a Salesforce AI researcher, showed that using a new benchmark relying on synthetic data, LLM agents achieve around a 58 percent success rate on tasks that can be completed in a single step without needing follow-up actions or more information. Using the benchmark tool CRMArena-Pro, the team a