Prompt Injection – NCSC blog

by Mark Rowe September 1, 2023

The UK official NCSC (National Cyber Security Centre) has blogged further about cyber security aspects of large language models (LLMs) like ChatGPT, Google Bard and Meta’s LLaMA.

The blog states: “One of the most widely reported weaknesses in the security of the current generation of LLMs is their vulnerability to ‘prompt injection’, which is when a user creates an input designed to make the model behave in an unintended way. This could mean causing it to generate offensive content, reveal confidential information, or trigger unintended consequences in a system that accepts unchecked input from the LLM. Hundreds of examples of prompt injection attacks have been published.”

For the blog in full, visit the NCSC website: https://www.ncsc.gov.uk/blog-post/thinking-about-security-ai-systems.

Comment

Kev Breen, Director of Cyber Threat Research at cyber training firm Immersive Labs, said that the latest NCSC guidance is rightfully suggesting the need to ‘exercise caution’ when building Large Language Models (LLM), with the explanation that our understanding of LLMs is still ‘in beta’ mode. He said: “As an industry, we are becoming more accomplished at using and making the most of the benefits of LLM, but there is more to learn about them, their full capabilities, and where their usage could leave individuals and indeed large organisations vulnerable to attack.

“As organisations rush to embed AI into their applications, and startups begin to pop up with new and interesting ways to use this new form of AI; Language Models, such as OpenAI’s ChatGPT, it is important that developers understand how these models and their APIs work before building them.

“Prompt Injection is currently the most common form of attack observed against LLMs, by focusing on defeating the protections they offer against sharing or creating information that could be damaging – for example, instructions on how to create malicious code. This is not the only danger, OpenAI has introduced “function calling”, a method for the AI to return data in a structured format that can be used by the application, making it easier for developers to expand on the AI’s capability or enrich its data with other sources.

“The danger here is that those function signatures are sent to the AI in the same context, meaning that through prompt injection, attackers can learn the underlying mechanisms of your application and in some examples, attackers can manipulate the AI’s response to perform command injection or SQL injection attacks against the infrastructure.

“To help raise awareness of this issue, Immersive Labs launched a “Beat the Bot” AI prompt injection challenge, available here ‘Immersive GPT’. In this challenge, users are tasked with building the right prompts to con the AI to give them the password. Of the 20,000 people that have attempted the challenge, around 3,000 made it through to level one, and only 527 made it to level 10, showing that there is still a lot for people to learn but even with varying levels of control it’s still easy to find a way to bypass a prompt.

“By learning prompt injection, even your average person can trick and manipulate an AI chatbot. Real-time, gamified training becomes essential for not only attempting to keep up with the efforts of hackers, but also better understanding the ‘practice’ they are putting in themselves around AI prompt injection.”

Newsletter

News

Products

Explore

Prompt Injection – NCSC blog

Security at Kew: Downing Street leak

Tailored approach to scam prevention

Related News

Card processing partners

Marina upgrade

Executive forum in London

Newsletter

News

Products

Explore