Researchers from Google and Meta said AI agents must be treated as untrusted systems to ensure security across digital infrastructures [1].
This shift in perspective is critical because AI agents often operate with credentials that reside alongside untrusted code, making them prime targets for compromise. As these agents gain more autonomy, relying solely on the robustness of the underlying model creates a dangerous security gap that bad actors can exploit [2, 3, 4].
Findings from the research were highlighted at the RSAC 2026 conference [4]. The researchers said that security must be built into the entire system rather than focusing only on the model itself [1, 2, 3]. By treating the agent as an untrusted entity, organizations can implement system-level controls to mitigate the risk of failures and attacks [2, 3].
"Enterprises cannot secure AI agents by making the underlying models more robust and must instead enforce security controls at the system level around them," a lead author of the study said [3].
This approach aligns with the broader industry move toward zero-trust architecture. Vasu Jakkal of Microsoft said that zero trust must extend to AI [4]. The urgency is driven by the scale of deployment, with projections suggesting billions of AI agents will be operating within five years [1].
To prevent unauthorized access and data breaches, the researchers suggest a framework where the agent's environment is isolated, and its permissions are strictly audited. This prevents a single compromised agent from granting an attacker full access to a corporate network [4].
“AI agents must be treated as untrusted systems.”
The transition from 'model-centric' to 'system-centric' security acknowledges that no matter how safe an AI model is, the environment it interacts with can be compromised. By applying zero-trust principles to AI agents, the industry is treating AI not as a trusted tool, but as a potential attack vector that requires constant verification and isolation.





