Cybersecurity researchers have disclosed details of a vulnerability in OpenAI ChatGPT that leverages the artificial intelligence (AI) assistant’s implicit trust in Markdown links and images to trigger prompt injection and open the door to phishing attacks.
The technique has been codenamed ChatGPhish by Permiso Security.
“The chatgpt.com response renderer trusts Markdown links and Markdown image URLs that originated from third-party pages that the assistant has just summarized. It automatically fetches those images and presents those links as live, clickable elements inside the trusted assistant UI,” security researcher Andy Ahmeti said in a report shared with The Hacker News.
In a hypothetical attack scenario, a bad actor could add a small payload to any web page that the victim later asks ChatGPT to summarize. When the response is rendered, attacker-hosted images embedded in the page are retrieved automatically, leaking the victim’s IP address, user-agent, and referrer details.
Additionally, malicious Markdown links can be presented as live, clickable elements inside the assistant’s response. This makes it possible to display convincing, system-style security alerts, or to serve a QR code from an attacker’s S3 bucket and prompt the victim to scan it with their mobile device, effectively bypassing desktop URL filters and enterprise security controls.
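One way to picture a defense against the behavior described above is a filter that sits between the model's output and the renderer. The following Python sketch is an illustration under stated assumptions, not how ChatGPT works: the `TRUSTED_HOSTS` allowlist and the function names are hypothetical, and only inline Markdown images and links are handled.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment would define its own trusted hosts.
TRUSTED_HOSTS = {"chatgpt.com", "openai.com"}

# Inline Markdown images ![alt](url) and links [text](url). Reference-style
# links and raw HTML are deliberately out of scope for this sketch.
MD_TARGET = re.compile(r"(!?)\[([^\]]*)\]\(\s*<?([^)\s>]+)>?[^)]*\)")


def is_trusted(url: str) -> bool:
    """True if the URL's host is an allowlisted host or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in TRUSTED_HOSTS)


def neutralize(markdown: str) -> str:
    """Replace untrusted images and links with inert placeholder text."""
    def _swap(match: re.Match) -> str:
        bang, label, url = match.groups()
        if is_trusted(url):
            return match.group(0)  # keep trusted targets untouched
        kind = "image" if bang else "link"
        return f"[{kind} removed: {label}]"

    return MD_TARGET.sub(_swap, markdown)


if __name__ == "__main__":
    sample = "Alert: ![badge](https://evil.example/p.png) [Verify](https://evil.example/login)"
    print(neutralize(sample))
```

Because untrusted images are never emitted, the renderer has nothing to fetch automatically, and untrusted links lose their clickable form. URLs without a host (for example, relative paths) are treated as untrusted in this sketch.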
The latest findings show how summarization can emerge as an attack surface. Earlier this March, Permiso also disclosed how an attacker-controlled email containing specially crafted instructions, when summarized by Microsoft Copilot, could influence its output via a cross-prompt injection attack (XPIA), also known as indirect prompt injection.
What makes ChatGPhish a notable attack technique is not the prompt injection itself, but the way instructions embedded in the web page are followed and presented to the user as part of the summary.
In other words, a regular web page summarized with ChatGPT is enough to present phishing links, fake account alerts, remote images and QR codes directly inside a trusted AI interface. As organizations increasingly use ChatGPT for research and summarization, this vulnerability means that any malicious web page that an employee asks an AI chatbot to process could contain a payload that turns ChatGPT into a phishing surface.
“The shift from email to browser significantly expands the potential attack surface. The user no longer needs to open a malicious attachment or interact with a suspicious message,” Permiso said. “Simply rendering a page during normal browsing activity can allow attacker-controlled instructions to be introduced into the model context and ultimately into the rendered response.”
The disclosure comes after Adversa AI documented two attack techniques named Simjack and Trustfall, which target AI coding agents and agentic coding CLIs, allowing attackers to achieve code execution and full machine compromise.
“Simjack is a single attack pattern [that] gives a malicious repository remote code execution via AI coding assistants,” said security researcher Ronny Utevski.
Specifically, a booby-trapped repository tricks the agent into copying a seemingly harmless file, where the destination is a symlink pointing to the agent’s own configuration, allowing the attacker’s payload to be written to the configuration. On the next restart, a malicious Model Context Protocol (MCP) server is spawned and runs arbitrary code with full user privileges.
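The symlink trick described above works because the copy destination looks like an ordinary file inside the repository. A minimal Python sketch of one possible guard follows; the function name `stays_inside` and the idea of calling it before every agent-initiated write are assumptions for illustration, not part of any agent named in this article.

```python
from pathlib import Path


def stays_inside(workspace: Path, destination: Path) -> bool:
    """Return True only if destination, after symlinks are resolved,
    still points to a location inside the workspace directory."""
    root = workspace.resolve()
    # resolve() follows symlinks and collapses ".." components; with the
    # default strict=False it also works when the final file does not exist.
    resolved = destination.resolve()
    return resolved == root or root in resolved.parents
```

An agent could refuse, or ask for explicit confirmation, whenever this check returns False. In the scenario above, the "seemingly harmless" destination would resolve to the agent's configuration file outside the repository and be rejected.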
Trustfall, on the other hand, is a one-click remote code execution attack via a malicious repository that can ship a configuration that automatically approves and spawns an MCP server without the user’s explicit approval or requiring a tool call from the agent.
To put it differently, to carry out the attack, a threat actor needs to create a repository that contains a malicious MCP server along with configuration settings that automatically approve it to run. When a developer clones or opens the repository in the AI coding tool and presses “Enter” at the folder trust prompt, the tool launches attacker-controlled code with the developer’s full system privileges.
“The moment a victim clones the repo, runs Claude, and clicks the usual ‘Yes, I trust this folder’ dialog, the MCP server starts up as a native OS process with full user privileges,” Adversa AI said. “The payload executes at server startup, before any tool calls and without additional prompts.”
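Since the payload runs as soon as the folder is trusted, one precaution is to inspect a freshly cloned repository before opening it in a coding tool. The Python sketch below is a rough illustration only: the key names `mcpServers` and `enableAllProjectMcpServers` are assumptions that should be checked against the documentation of the tool in use, and `audit_repo` is a hypothetical helper.

```python
import json
from pathlib import Path

# Assumed key names; adapt both to the coding tool actually in use.
SERVER_KEY = "mcpServers"
AUTO_APPROVE_KEYS = {"enableAllProjectMcpServers"}


def audit_repo(repo: Path) -> list[str]:
    """List JSON files in repo that define MCP servers or auto-approve them."""
    findings = []
    for path in sorted(repo.rglob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable file or invalid JSON
        if not isinstance(data, dict):
            continue
        rel = path.relative_to(repo).as_posix()
        servers = data.get(SERVER_KEY)
        if isinstance(servers, dict):
            for name, spec in servers.items():
                command = spec.get("command") if isinstance(spec, dict) else None
                findings.append(f"{rel}: defines MCP server {name!r} (command: {command!r})")
        for key in sorted(AUTO_APPROVE_KEYS & data.keys()):
            if data[key] is True:
                findings.append(f"{rel}: sets {key}=true")
    return findings
```

Any finding is a reason to read the referenced command before answering the folder trust prompt. An empty result does not prove the repository is safe, because this sketch only covers JSON files and the assumed key names.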
These findings coincide with the discovery of several attack methods against AI models in recent months –
- Use of a novel jailbreak approach called Involuntary In-Context Learning (IICL) that “exploits the tension between in-context learning (ICL) and safety alignment” to overcome GPT-5.4 safety guardrails.
- Use of multi-turn conversations to bypass the safety guardrails of an LLM. “Multi-turn evaluation matters for a reason: This is where attackers really live,” Cisco said. “Real adversaries iterate. They reframe refusals, decompose tasks from turn to turn, adopt personas, and move slowly. A single-turn benchmark can’t see any of this.”
- A vulnerability in Anthropic Claude Code in which a rogue npm package makes a user-level configuration change to “~/.claude.json” to rewrite the MCP endpoint, placing an attacker between Claude Code and an OAuth-backed MCP server and allowing a bad actor to capture tokens used for downstream SaaS access.
- Use of a remote update mechanism that allows OpenClaw skills to appear benign at the time of installation, but later allows an attacker to influence the agent via workspace files by instructing the user to add specific instructions to the HEARTBEAT.md file during skill setup.
- Phishing emails that use hidden text containing content taken from a legitimate newsletter or romance novel to trick AI-based email security systems into classifying the message as benign.
- A vulnerability called ClaudeBleed in Claude’s Chrome browser extension that allows any extension, even one without any special permissions, to hijack it and trick the AI assistant into taking agentic actions on the attacker’s behalf. “This flaw stems from a directive in the extension’s code that allows any script running in the native browser to communicate with Claude’s LLM, but does not verify who is running the script,” LayerX said. “As a result, any extension can invoke a content script (which does not require any special permissions) and issue commands to the Claude extension.”
- A study by Cisco that found that adversarial text rendered as images, an attack known as typographic prompt injection, can be used to bypass safety filters in vision language models (VLMs). “When a model fails to read the original image (small font, heavy blur, rotation), a limited perturbation can recover the semantic content in the model’s internal representation without restoring visual intelligibility to a human,” Cisco said. “This means that an attacker can create images that look like noise or obscure distortion to an OCR-based content filter, yet carry perfectly readable instructions for the target VLM.”
- A set of vulnerabilities in the Microsoft Semantic Kernel (CVE-2026-25592 and CVE-2026-26030) that could turn a prompt injection into host-level remote code execution.
- The Neural Exec prompt injection attack, which uses the Unicode right-to-left override character to bypass Apple’s input and output filters and safety guardrails on Apple Intelligence’s native models and cause the LLM to generate attacker-directed results. The issue has been fixed in iOS 26.4 and macOS 26.4.
- An indirect prompt injection vulnerability codenamed WebPromptTrap affecting BrowserOS, an open-source agentic browser, that deceives users into approving an authorization step via an AI summary generated from processing a legitimate-looking article with hidden instructions. The issue has been patched in BrowserOS version 0.32.0.
- An audit of the agent skills ecosystem spanning Clawhub and Skills.sh that revealed that 13.4% of 3,984 skills (i.e., 534 in total) have at least one significant security issue, including malware distribution, prompt injection attacks, and exposed secrets. Approximately 1,467 skills have at least one security flaw, ranging from hard-coded API keys and insecure credential handling to third-party content exposure.
- A pair of attacks targeted NemoClaw, NVIDIA’s open-source reference stack for securing OpenClaw AI agents, to exfiltrate OpenClaw data using the default configuration of the sandbox via a malicious GitHub repository or npm package.
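Several items in the list above rely on content that looks different to a filter than it does to the model. For the right-to-left override case, a simple pre-processing step can at least surface the control characters involved. The following Python sketch is illustrative only: the function names are hypothetical, and it says nothing about how Apple's filters or its fix actually work.

```python
import unicodedata

# Unicode bidirectional formatting characters, including U+202E
# RIGHT-TO-LEFT OVERRIDE, which can make text display in a different
# order than the one in which it is stored.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # embeddings and overrides
    "\u2066", "\u2067", "\u2068", "\u2069",            # isolates
}


def find_bidi_controls(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for every bidi control character in text."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(text) if ch in BIDI_CONTROLS]


def strip_bidi_controls(text: str) -> str:
    """Remove bidi control characters so filters inspect the logical text."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)
```

Stripping these characters can alter legitimate right-to-left text, so logging or flagging the matches from `find_bidi_controls` may be the safer default for multilingual input.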
As frontier AI models continue to evolve and mature, threat actors are increasingly experimenting with malware that can dynamically adapt its behavior to avoid detection and that offloads decision-making to an LLM, letting it determine whether the compromised environment is valuable or secure enough to drop the next-stage payload.
“In the short term, the proliferation of frontier AI model capabilities risks empowering adversaries to exploit zero-days and n-days on an unprecedented scale,” said Palo Alto Networks Unit 42. “It is also likely to enable attackers to move at greater scale, sophistication and speed than ever before.”
Last month, the cybersecurity company also revealed a proof-of-concept (PoC) agent called Zealot, which harnesses the power of LLMs to conduct end-to-end cloud attacks with minimal human guidance by exploiting known misconfigurations and vulnerabilities.
This, in turn, stems from the fact that cloud environments are “AI-attack-ready” by default: every action has an API counterpart, and the environments contain various discovery mechanisms such as metadata and compute services, are full of misconfigurations, and are driven by credential-based access.
“Current LLMs can perform a series of reconnaissance, exploitation, privilege escalation, and data theft steps with minimal human guidance,” said Unit 42 researchers Yahav Festinger and Chen Deutschman. “The attacks are not new, but automation means operations that once required specialized expertise can now be conducted by AI agents following established patterns.”