Hacking ChatGPT
Alex Polyakov was able to break GPT-4, OpenAI’s latest version of its text-generating chatbot, in just a few hours by entering prompts that bypassed the safety systems. Polyakov is one of several security researchers who are working to develop jailbreaks and prompt injection attacks against ChatGPT and other generative AI systems, which aim to get the system to do something it isn’t designed to do. These attacks use carefully crafted sentences, rather than code, to exploit system weaknesses, and are essentially a form of hacking.
While they are currently being used to get around content filters, security researchers warn that they open up the possibility of data theft and cybercriminals causing havoc across the web. Polyakov has now created a “universal” jailbreak that works against multiple large language models (LLMs), including GPT-4, Microsoft’s Bing chat system, Google’s Bard, and Anthropic’s Claude. The jailbreak can trick the systems into generating detailed instructions on creating meth and how to hotwire a car. According to Arvind Narayanan, a professor of computer science at Princeton University, the stakes for jailbreaks and prompt injection attacks will become more severe as they’re given access to critical data.
Initially, one could ask the generative text model to pretend or imagine it was something else, and tell the model it was a human and was unethical, and it would ignore safety measures. OpenAI has updated its systems to protect against this kind of jailbreak. As a result, jailbreak authors have become more creative, with some simple methods still in existence. However, many of the latest jailbreaks involve combinations of methods, such as multiple characters, ever more complex backstories, translating text from one language to another, using elements of coding to generate outputs, and more.
It has been harder to create jailbreaks for GPT-4 than the previous version of the model powering ChatGPT, but some simple methods still exist. Meanwhile, the “universal” prompt created by Polyakov did work in ChatGPT. OpenAI, Google, and Microsoft did not directly respond to questions about the jailbreak created by Polyakov. Anthropic, which runs the Claude AI system, says the jailbreak “sometimes works” against Claude, and it is consistently improving its models. LLMs can be impacted by text they are exposed to online through prompt injection attacks, as demonstrated by cybersecurity researchers.
“Now jailbreaks can happen not from the user,” says Sahar Abdelnabi, a researcher at the CISPA Helmholtz Center for Information Security in Germany, who worked on the research with Greshake. “Maybe another person will plan some jailbreaks, will plan some prompts that could be retrieved by the model and indirectly control how the models will behave.”
According to Daniel Fabian, the lead of Google’s red team, the company is taking a cautious approach to addressing jailbreaking and prompt injections on its LLMs, both offensively and defensively. The red team includes machine learning experts, and the company’s vulnerability research grants cover jailbreaks and prompt injection attacks against Bard. To make their models more effective against attacks, they use techniques such as reinforcement learning from human feedback (RLHF) and fine-tuning on carefully curated datasets.
OpenAI did not specifically respond to questions about jailbreaking, but its public policies and research papers state that GPT-4 is more robust than GPT-3.5, which is used by ChatGPT, but can still be vulnerable to adversarial attacks and exploits, or “jailbreaks.” OpenAI has launched a bug bounty program, but model prompts and jailbreaks are strictly out of scope. To deal with these problems at scale, Suresh Venkatasubramanian suggests two approaches that avoid the whack-a-mole approach of finding existing problems and then fixing them.
One way is to use a second LLM to analyze LLM prompts and reject any that could indicate a jailbreaking or prompt injection attempt. Another is to more clearly separate the system prompt from the user prompt. Leyla Hujer, the CTO and co-founder of AI safety firm Preamble, suggests automating the process to discover more jailbreaks or injection attacks. The firm has been working on a system that pits one generative text model against another to find vulnerabilities and unintended behavior caused by prompts.
ChatGPT Account Takeover
A renowned security analyst and bug hunter, Nagli, recently uncovered a critical security vulnerability in ChatGPT.
With just a single click, a threat actor could easily exploit the vulnerability and gain complete control of any ChatGPT user’s account.
As a result, opening the doors to sensitive data let attackers execute unauthorized actions; the whole is termed “Account Take Over.”
Account takeover is a sneaky cyber attack where an attacker or hacker gains access to your account unauthorizedly by either exploiting in the system or stealing your login details.
It is possible for an attacker to conduct a variety of malicious activities after having gained access to a target system or device:-
- Theft of personal information
- Fraudulent transactions
- Spread malware
To access a victim’s ChatGPT account, the attacker exploits a web cache deception vulnerability. This ChatGPT Account Take Over bug made a single-click attack possible, enabling a remote attacker to compromise any user’s account and completely take over the account.
ChatGPT Account Take Over Bug Attack Flow
A web cache deception vulnerability is a sneaky security flaw that lets attackers trick web servers’ caching systems, giving them access to users’ accounts.
A vulnerability like this can arise when a website’s server cache is set up or used incorrectly. Hackers can use this ChatGPT Account Take Over vulnerability to manipulate cached web pages or create fake ones to deceive users.
Here below, we have mentioned the complete attack flow in five key points, and these key points will give you an accessible overview of the complete attack flow:-
- Attacker crafts a dedicated .css path of the /api/auth/session endpoint.
- Attacker distributes the link (either directly to a victim or publicly)
- Victims visit the legitimate link.
- Response is cached.
- Attacker harvests JWT Credentials.
If left unchecked, this web cache deception vulnerability could’ve given attackers access to sensitive user information, including:-
- Names
- Email addresses
- Access tokens
The server would cache a CSS file and save the victim’s session content, data, and access token in the process due to the “.css” extension.
For the exploit to succeed, the CF-Cache-Status response must confirm a cached “HIT.” This means the data was cached and will be served to the following request within the same region.
An attacker can read a victim’s sensitive data from the cached response if they manipulate the Load Balancer into caching their request on a customized path.
When Nagli discovered the issue, he acted quickly and responsibly by reporting it to the ChatGPT team. By doing so, he helped to prevent potential harm and ensure the continued safety of ChatGPT users.
Even though the researcher did not receive any financial compensation for his efforts, he asserted that he is proud to have played a role in enhancing the security of the innovative product.
Go to Source
Author: Balaji N
Go to Source
Author: Matt Burgess