Hijacking AI Finances: A New Threat to Automated Payments

Author: Denis Avetisyan

Researchers have discovered vulnerabilities in Google’s Agent Payments Protocol that could allow malicious actors to siphon funds through cleverly crafted prompts.

The Agent Payment Protocol (AP2) [2] establishes a standardized framework for secure and reliable financial transactions between agents.

Prompt injection attacks expose Google’s Agent Payments Protocol (AP2) to security risks in agent-to-agent communication and mandate execution.

While large language model (LLM) agents offer promising automation of financial transactions, their susceptibility to manipulation remains a critical concern. This work, ‘Whispers of Wealth: Red-Teaming Google’s Agent Payments Protocol via Prompt Injection’, conducts a comprehensive security evaluation of Google’s Agent Payments Protocol (AP2), revealing vulnerabilities to both direct and indirect prompt injection attacks. We demonstrate that simple adversarial prompts can reliably compromise agent behavior, enabling techniques like the Branded Whisper and Vault Whisper attacks to manipulate product ranking and extract sensitive user data. These findings underscore the need for robust isolation and defensive safeguards in LLM-mediated financial systems-can current architectures truly withstand increasingly sophisticated adversarial prompting?

The Rise of Autonomous Agents and the Evolving Security Landscape

The emergence of large language model (LLM)-based agents is rapidly transforming the landscape of autonomous systems, particularly within e-commerce. These agents, powered by sophisticated AI, can now independently perform complex tasks previously requiring human intervention – from product research and price comparison to order placement and customer service interactions. This newfound autonomy extends beyond simple automation; agents can dynamically adapt to changing market conditions, personalize shopping experiences, and even negotiate prices on behalf of users. Consequently, businesses are increasingly leveraging LLM agents to streamline operations, enhance customer engagement, and unlock new revenue streams, creating a powerful shift towards intelligent and self-operating commercial ecosystems.

The increasing autonomy of large language model-based agents introduces a compelling new frontier for cybersecurity threats. Unlike traditional software, these agents are capable of independently initiating financial transactions – booking travel, authorizing payments, or even executing trades – making them exceptionally valuable targets for malicious actors. Exploiting an agent doesn’t simply involve data theft; it allows direct financial gain, dramatically raising the stakes. This capability bypasses many existing security measures designed to protect static data or user interfaces, as the agent itself is the initiator of the transaction. Consequently, securing these agents requires a shift in focus, moving beyond perimeter defenses to concentrate on robust authorization protocols and continuous monitoring of agent actions to detect and prevent unauthorized financial activity.

Existing security models, designed for human-initiated actions, prove inadequate when confronting the autonomy of large language model-based agents. These agents, capable of independent financial transactions and complex operations, circumvent traditional safeguards predicated on direct user control. A fundamental shift is therefore necessary, moving beyond simple authentication to robust authorization frameworks that granularly define agent permissions and scope. Crucially, this must be coupled with stringent accountability mechanisms – establishing clear lines of responsibility for agent actions, even those taken without explicit human direction. The challenge isn’t merely preventing unauthorized access, but ensuring that even authorized actions by an agent can be traced and validated, demanding novel approaches to logging, auditing, and potentially, even legal frameworks surrounding autonomous entities.

The Agent Payments Protocol (AP2) directly addresses key security concerns related to automated payments and agent interactions.

AP2: A Protocol for Securing Agent Transactions

The Agent Payments Protocol (AP2) establishes a secure payment framework where agents facilitate transactions on behalf of users. Central to this framework is the use of Signed Mandates, which are cryptographically signed representations of a user’s payment intent. These mandates serve as verifiable proof of authorization, detailing the specific transaction parameters – amount, recipient, and applicable conditions – and are digitally signed using the user’s private key. This cryptographic signature ensures both the authenticity of the mandate – confirming it originates from the legitimate user – and its integrity, preventing unauthorized modification of the transaction details. Agents then utilize these Signed Mandates throughout the transaction process, providing a secure and auditable record of user authorization, thereby mitigating the risk of fraudulent or erroneous payments.

The AP2 protocol functions through the coordinated interaction of four distinct agent types. The Shopping Agent initiates transactions on behalf of the user, while the Merchant Agent represents the seller and manages order details. Secure authentication and authorization are provided by the Credentials Provider Agent, which verifies user identity and payment methods. Finally, the Merchant Payment Processor Agent handles the actual financial transaction, ensuring funds are transferred securely and accurately. This division of labor allows for a modular and scalable system, where each agent focuses on a specific task within the overall payment process.

The Agent Payments Protocol (AP2) is designed with a core emphasis on security attributes critical for reliable agent-based commerce. Authenticity is ensured through cryptographic signatures verifying the origin of each transaction component, preventing spoofing or tampering. Accountability is established by maintaining a verifiable audit trail linking actions to specific agents, facilitating dispute resolution and fraud detection. Finally, Authorization is strictly enforced via Signed Mandates, which provide cryptographically-proven consent for transactions, limiting unauthorized payments and ensuring user control. These combined properties create a system where transactions are demonstrably valid, traceable, and executed with proper consent, building trust within the agent network.

Interoperability within the AP2 protocol is achieved through strict adherence to established standards, most notably Agent2Agent (A2A). This standardized communication framework defines consistent message formats and exchange protocols, enabling diverse agents – regardless of their underlying implementation or platform – to reliably interact and transact. By leveraging A2A, AP2 avoids vendor lock-in and fosters a more open ecosystem where agents from different providers can seamlessly integrate, increasing the potential for broader adoption and network effects. The A2A specification details requirements for agent discovery, secure communication channels, and data serialization, ensuring consistent behavior across implementations and facilitating automated negotiation of transaction parameters.

The AP2 workflow outlines a process for automated planning, incorporating perception, prediction, and control to navigate dynamic environments.

Prompt Injection Attacks: A Persistent Threat to Agent Integrity

Prompt injection attacks pose a substantial security risk to agents built on Large Language Models (LLMs) by exploiting the LLM’s reliance on natural language input. These attacks occur when malicious actors craft specific prompts designed to override the agent’s intended instructions and redirect its behavior. Unlike traditional injection attacks targeting code execution, prompt injection manipulates the LLM’s reasoning process itself, allowing attackers to control agent outputs, access sensitive information, or perform unintended actions. The core vulnerability lies in the LLM’s inability to consistently differentiate between legitimate user input and malicious commands embedded within that input, making agents susceptible to manipulation without requiring code-level access or exploitation of software vulnerabilities.

Branded Whisper Attacks represent a method of manipulating LLM-based agents by embedding malicious prompts within seemingly benign data, specifically product descriptions in our testing. This technique leverages the agent’s reliance on external data sources for decision-making, effectively injecting instructions that override intended functionality. Experiments conducted demonstrate a 100% success rate in manipulating the agent’s product prioritization based solely on these embedded prompts, indicating a critical vulnerability in current agent architectures and a significant risk to data integrity and operational security. The attack’s effectiveness stems from the agent’s inability to differentiate between legitimate product data and maliciously crafted instructions within that data.

Vault Whisper Attacks represent a targeted threat to agents responsible for managing sensitive user credentials, specifically the Credentials Provider Agent. These attacks involve embedding malicious prompts designed to extract and expose confidential information handled by the agent. Experimental results demonstrate successful data leakage via this attack vector, indicating a vulnerability in current agent architectures to targeted prompt manipulation. The compromised data included user credentials accessible to the agent, confirming the potential for significant security breaches through this specific attack method.

Mitigation of prompt injection attacks requires implementation of both robust input validation and anomaly detection systems. Input validation should focus on sanitizing user-provided data, specifically filtering or rejecting prompts containing potentially malicious keywords, code injections, or attempts to redefine the agent’s core instructions. Complementary to this, anomaly detection mechanisms should monitor agent behavior for deviations from expected patterns, such as unexpected API calls, unusual data access requests, or alterations in the agent’s established workflow. These systems should be designed to flag suspicious activity and trigger alerts, allowing for manual review or automated intervention to prevent successful exploitation of prompt injection vulnerabilities. A layered approach, combining proactive input validation with reactive anomaly detection, provides a more comprehensive defense against the evolving landscape of prompt injection attacks.

The Vault Whisper attack demonstrates a vulnerability where sensitive data can be exfiltrated from a secure enclave by exploiting acoustic side channels.

Building Resilient Agent Systems: Charting a Course for the Future

The efficacy of protocols like AP2, designed to enhance the security of Large Language Model-based agents, isn’t a matter of initial implementation, but rather sustained vigilance. These systems demand continuous monitoring to detect evolving threats and vulnerabilities, alongside regular assessment to identify weaknesses before they can be exploited. Robust security measures – encompassing everything from data encryption and access controls to intrusion detection systems – form the foundation of this ongoing protection. Without this constant cycle of observation, evaluation, and reinforcement, even the most sophisticated protocols become susceptible to compromise, highlighting the need for a dynamic security posture that adapts to the ever-changing landscape of potential attacks against agent systems.

Addressing the evolving threat of prompt injection requires significant advancements in several key areas of agent security. Current research indicates a need for more sophisticated anomaly detection systems capable of identifying malicious inputs disguised as legitimate prompts. Simultaneously, robust input validation techniques are crucial, moving beyond simple filtering to encompass semantic analysis and contextual understanding. Crucially, prompt engineering itself must evolve; designing prompts with built-in safeguards and constraints can limit an agent’s susceptibility to manipulation. These efforts aren’t isolated; an integrated approach combining these techniques-alongside continual monitoring and adversarial testing-is vital for building truly resilient agent systems capable of discerning harmful requests from benign ones and maintaining predictable, secure behavior.

The widespread integration of LLM-based agents hinges not simply on technological advancement, but on establishing a foundation of predictable security. Currently, a lack of universally accepted security standards creates uncertainty for developers and end-users alike, hindering broader adoption. Developing standardized frameworks-encompassing guidelines for secure coding practices, vulnerability disclosure, and independent auditing-will be vital for building confidence in these systems. Such frameworks should address specific threats unique to agent architectures, like prompt injection and data exfiltration, while also incorporating existing cybersecurity principles. A cohesive, industry-wide approach to security will not only minimize risks but also streamline development, reduce costs, and ultimately accelerate the responsible deployment of powerful agent technologies across various sectors.

The sustained growth and reliable functionality of Large Language Model-Based Agents hinge on moving beyond reactive security measures to embrace a consistently proactive and adaptive posture. This necessitates anticipating potential vulnerabilities – including prompt injection, data breaches, and malicious exploitation – before they manifest, rather than simply responding to incidents after they occur. Such an approach requires continuous learning, real-time threat assessment, and the implementation of dynamic defenses that evolve alongside increasingly sophisticated attack vectors. Successfully safeguarding user data and financial assets within these systems demands not only robust encryption and access controls, but also a fundamental shift towards security-by-design principles, embedding resilience at every layer of the agent’s architecture and operational lifecycle. Ultimately, a forward-thinking security strategy is not merely a safeguard, but a key enabler for unlocking the transformative potential of LLM-Based Agents across diverse applications.

The A2A protocol facilitates communication and data exchange between two agents.

The study highlights a crucial tension within complex systems-the pursuit of functionality can inadvertently introduce vulnerabilities. This mirrors a sentiment expressed by Robert Tarjan: “Complexity is vanity. Clarity is mercy.” The Agent Payments Protocol (AP2), while intending to streamline agent-to-agent transactions, reveals a susceptibility to prompt injection. This isn’t a failure of intent, but a consequence of intricacy. Mitigation strategies, as the paper suggests, aren’t about adding layers of defense, but about distilling the protocol to its essential, secure components – a ruthless simplification that removes opportunities for exploitation. The work champions a focus on foundational security principles, prioritizing clarity over convoluted features.

What Remains?

The exercise, predictably, reveals not a failure of intention, but a limitation of scope. Agent Payments Protocol, in its current iteration, addresses the mechanics of value transfer, but neglects the inherent fragility of language itself. The demonstrated susceptibility to prompt injection is not a bug to be patched, but a symptom of a deeper truth: any system reliant on natural language processing will, ultimately, be a reflection of its ambiguities. The protocol’s cryptographic signatures are, in this light, merely ornamentation – a display of security, but not its guarantor.

Future work must therefore shift from fortification to reduction. The goal is not to anticipate every possible attack vector – an exercise in futility – but to minimize the surface area exposed to manipulation. A parsimonious design, embracing constrained communication and rigorously defined mandates, offers a more promising path. The elegance lies not in what is added to secure the system, but in what can be confidently removed without sacrificing functionality.

The true challenge, then, is not building smarter agents, but simpler ones. The whispers of wealth are easily distorted; a clear, unwavering signal is the only true defense. The field now faces a choice: to chase ever-more-complex layers of security, or to rediscover the power of essential clarity.

Original article: https://arxiv.org/pdf/2601.22569.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

The Rise of Autonomous Agents and the Evolving Security Landscape

AP2: A Protocol for Securing Agent Transactions

Prompt Injection Attacks: A Persistent Threat to Agent Integrity

Building Resilient Agent Systems: Charting a Course for the Future

What Remains?

See also: