Research shows 'more agents' isn't a reliable path to better enterprise AI systems

By Ben Dickson, VentureBeat

Researchers at Google and MIT have conducted a comprehensive analysis of agentic systems and the dynamics between the number of agents, coordination structure, model capability, and task properties. While the prevailing sentiment in the industry has been "more agents is all you need," the research suggests that scaling agent teams is not a guaranteed path to better performance.

[Figure: Different single- and multi-agent systems (source: arXiv)]

Based on their findings, the researchers have defined a quantitative model that can predict the performance of an agentic system on an unseen task. Their work reveals that adding more agents and tools acts as a double-edged sword: although it can unlock performance on specific problems, it often introduces unnecessary overhead and diminishing returns on others.

These findings offer a critical roadmap for developers and enterprise decision-makers trying to determine when to deploy complex multi-agent architectures versus simpler, more cost-effective single-agent solutions.

The state of agentic systems

To understand the study's implications, it is necessary to distinguish between the two primary architectures in use today. Single-agent systems (SAS) feature a solitary reasoning locus. In this setup, all perception, planning, and action occur within a single sequential loop controlled by one LLM instance, even when the system is using tools, self-reflection, or chain-of-thought (CoT) reasoning. Conversely, a multi-agent system (MAS) comprises multiple LLM-backed agents communicating through structured message passing, shared memory, or orchestrated protocols.

The enterprise sector has seen a surge of interest in MAS, driven by the premise that specialized collaboration can consistently outperform single-agent systems. As tasks grow in complexity and require sustained interaction with environments (e.g., coding assistants or financial analysis bots), developers often assume that splitting the work among "specialist" agents is the superior approach. However, the researchers argue that despite this rapid adoption, there remains no principled quantitative framework to predict when adding agents amplifies performance and when it erodes it.

A key contribution of the paper is the distinction between "static" and "agentic" tasks. The researchers applied an "Agentic Benchmark Checklist" to differentiate tasks that require sustained multi-step interactions, iterative information gathering, and adaptive strategy refinement from those that do not. This distinction is vital because strategies that work for static problem-solving (like voting on a coding quiz) often fail when applied to true agentic tasks, where "coordination overhead" and "error propagation" can spread across the problem-solving process.

Testing the limits of collaboration

To isolate the specific effects of system architecture, the researchers designed a rigorous experimental framework. They tested 180 unique configurations involving five distinct architectures, three LLM families (OpenAI, Google, and Anthropic), and four agentic benchmarks. The architectures included a single-agent control group and four multi-agent variants: independent (parallel agents with no communication), centralized (agents reporting to an orchestrator), decentralized (peer-to-peer debate), and hybrid (a mix of hierarchy and peer communication).
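To make these five configurations concrete, the sketch below shows how the coordination topologies differ in shape. It is a minimal illustration, not the paper's implementation: the `llm_call` helper, the prompts, the agent counts, and the majority-vote aggregation are all assumptions made for the example.

```python
# Illustrative sketch of the five coordination topologies compared in the
# study. `llm_call` stands in for any chat-completion API; prompts and
# aggregation are placeholders, not the paper's actual code.

def llm_call(prompt: str) -> str:
    """Placeholder for a single LLM invocation (assumed helper)."""
    raise NotImplementedError("wire up your model provider here")

def single_agent(task: str) -> str:
    # One reasoning locus: perception, planning, and action in one loop.
    return llm_call(f"Solve step by step: {task}")

def independent(task: str, n: int = 3) -> str:
    # Parallel agents with no communication; answers aggregated afterwards
    # (naive majority vote, an assumed aggregator).
    answers = [llm_call(f"Solve: {task}") for _ in range(n)]
    return max(set(answers), key=answers.count)

def centralized(task: str, n: int = 3) -> str:
    # Worker agents report to an orchestrator that produces the answer.
    reports = [llm_call(f"As specialist {i}, analyze: {task}") for i in range(n)]
    return llm_call(f"Synthesize a final answer to '{task}' from: {reports}")

def decentralized(task: str, n: int = 3, rounds: int = 2) -> str:
    # Peer-to-peer debate: each agent sees the others' latest answers.
    answers = [llm_call(f"Solve: {task}") for _ in range(n)]
    for _ in range(rounds):
        answers = [
            llm_call(f"Peers said {answers}. Revise your answer to: {task}")
            for _ in range(n)
        ]
    return max(set(answers), key=answers.count)

def hybrid(task: str, n: int = 3) -> str:
    # Mix of hierarchy and peer communication: one round of peer exchange,
    # then an orchestrator synthesizes the result.
    answers = [llm_call(f"Solve: {task}") for _ in range(n)]
    revised = [
        llm_call(f"Peers said {answers}. Revise your answer to: {task}")
        for _ in range(n)
    ]
    return llm_call(f"Synthesize a final answer to '{task}' from: {revised}")
```

Even at this toy scale, the cost asymmetry the article describes is visible: the single agent makes one call, while every multi-agent variant multiplies calls for the same task.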
The study was designed to eliminate "implementation confounds" by standardizing tools, prompt structures, and token budgets. This ensured that if a multi-agent system outperformed a single agent, the gain could be attributed to the coordination structure itself rather than to differences in implementation.
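A rough sketch of what such a controlled comparison looks like in code: every configuration draws from the same tool set, prompt scaffold, and token budget, so only the coordination structure varies. The benchmark names, budget value, and tool list below are placeholders, and the preview does not specify how the full 180-configuration count decomposes beyond the 5 Ɨ 3 Ɨ 4 grid shown.

```python
# Illustrative harness for the study's controlled comparison. Assumed
# values throughout: benchmark identifiers, the token budget, and the
# tool list are placeholders, not figures from the paper.

from dataclasses import dataclass
from itertools import product

ARCHITECTURES = ["single", "independent", "centralized", "decentralized", "hybrid"]
MODEL_FAMILIES = ["openai", "google", "anthropic"]         # families named in the article
BENCHMARKS = ["bench_a", "bench_b", "bench_c", "bench_d"]  # four agentic benchmarks (names unknown)

@dataclass(frozen=True)
class Config:
    architecture: str
    model_family: str
    benchmark: str
    # Held constant across every configuration so that performance
    # differences reflect coordination structure, not resources:
    token_budget: int = 100_000
    tools: tuple = ("web_search", "code_exec")

def config_grid() -> list:
    # Enumerate architecture x model family x benchmark (5 x 3 x 4 = 60
    # cells; the article's 180 total implies an additional factor the
    # preview does not spell out).
    return [Config(a, m, b) for a, m, b in product(ARCHITECTURES, MODEL_FAMILIES, BENCHMARKS)]

if __name__ == "__main__":
    grid = config_grid()
    print(len(grid), "configurations sharing one budget and tool set")
```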
