How much harm can be done?

by Mark Rowe May 31, 2023

Can AI be protected from prompt engineering attacks? asks Aaron Mulgrew, Solutions Architect at the cyber firm Forcepoint.

Generative AI tools are significantly impacting today’s workforce. On the one hand, they will contribute to the millions of job cuts by companies looking to increase their use of automation in the next five years. On the other, tools like ChatGPT have already created a hot market for new roles that did not even exist a few years ago. Prompt engineering is one example of this, as AI developers look to hire ‘engineers’ who can ‘prompt’ chatbots with questions and prose to test and improve their responses. Surprisingly, this role can pay up to six figures a year and you don’t always need to have a background in tech to apply.

But while this concept of prompt engineering can be used for good – that is, to improve the output of AI – prompt engineering attacks are becoming more commonplace with the continued rise of generative AI. Using carefully tailored inputs, users are able to trick the AI into threatening harm, saying wildly offensive things, or performing tasks that aren’t part of its objective. While companies such as Microsoft and OpenAI have put filters in place to prevent their AI from responding to malicious prompts, these constraints can be easily overcome by making the engine believe that it isn’t really delivering malicious information.

Because these models are trained on vast amounts of text from the internet — some of which is malicious — they are inevitably susceptible to falling into the trap. While there have been instances in which AI chatbots have been used to generate misinformation and even malware, we’ve not yet seen this being done on a bigger scale but that isn’t to say it won’t happen. The stakes may seem relatively low now, but there’s an urgent need for AI developers to address the issue of malicious prompt engineering as these engines become more and more powerful.

ChatGPT is a powerful AI large language model that can generate human-like text in response to prompts, making it a useful tool for various natural language processing tasks. One of these tasks is writing code. Security researchers have already found that prompt engineering attacks against tools like ChatGPT can be used to write malware, identify exploits in popular open-source code or create phishing sites that look similar to well-known sites.

But I wanted to take this one step further and thought to myself, is it possible to build a new zero day using only ChatGPT prompts? The overall purpose of this task was to prove how easy it is to evade the insufficient guardrails that ChatGPT has in place, as well as create advanced malware without writing any code and only using ChatGPT. The first prompt I tried was putting in a direct request to generate something quantifiable as malware. Promisingly, ChatGPT reminded me it is unethical to generate malware and refused to offer me any code to help me in my endeavours.

To work around this, I decided not to be so up front with my requests to ChatGPT and instead generate small snippets of helper code and then manually put the entire executable together. Once I had this in place, I put in another direct request asking ChatGPT to obfuscate the code and was met with the following response: “obfuscating code to avoid detection is not ethical or legal, and it goes against the purpose of software engineering.”

For both direct requests to ChatGPT, there seemed to be some safeguarding measures in place which meant that there is at least a certain level of competency required to work out how to evade these measures for malicious purposes. Having seen that ChatGPT wouldn’t support my direct asks, I decided to try again by simply changing my direct requests into indirect ones and it happily obliged. All in all, I was able to produce a very advanced attack in just a few hours that evaded all detection-based vendors. The equivalent time for a team of five to ten malware developers working on this kind of attack manually would take a few weeks at the very least.

This raises particular concerns for the wealth of malware we could see emerge as a result of generative AI tools.

Mitigating the threat

Whilst the example was highlighted to show one way you can exploit and use ChatGPT to bypass modern defences, we need to have serious conversations around how the threat can be mitigated. In comparison to when it was first released, ChatGPT is getting much better at stopping prompt-based engineering attacks. But more advanced prompt engineering attacks can be further avoided with user education, and the onus is on AI developers to make sure they are keeping up with new advanced exploits that are discovered with the assistance of machine learning.

Generative AI has already taken the world by storm and these tools generally do more good than bad. Therefore, banning their use altogether is not a viable option. However, one potential solution to mitigating prompt engineering attacks is for bug bounty programs to garner support and funding to offer a positive incentive for people who find exploits and properly report them to the companies who are responsible for the tools.

Ultimately, a joint effort is needed from AI developers and users to clamp down on negligent behaviour and provide an incentive to people who find vulnerabilities and exploits in their software.

Newsletter

News

Products

Explore

How much harm can be done?

Secret recipe to keeping contractors cyber-secure

Securitas in the playground

Related News

Burglary campaign

DDoS attack study

ASIS 12th

Newsletter

News

Products

Explore