Vertical Markets

Prompt Injection – NCSC blog

by Mark Rowe

The UK official NCSC (National Cyber Security Centre) has blogged further about cyber security aspects of large language models (LLMs) like ChatGPT, Google Bard and Meta’s LLaMA.

The blog states: “One of the most widely reported weaknesses in the security of the current generation of LLMs is their vulnerability to ‘prompt injection’, which is when a user creates an input designed to make the model behave in an unintended way. This could mean causing it to generate offensive content, reveal confidential information, or trigger unintended consequences in a system that accepts unchecked input from the LLM. Hundreds of examples of prompt injection attacks have been published.”

For the blog in full, visit the NCSC website: https://www.ncsc.gov.uk/blog-post/thinking-about-security-ai-systems.

Comment

Kev Breen, Director of Cyber Threat Research at cyber training firm Immersive Labs, said that the latest NCSC guidance is rightfully suggesting the need to ‘exercise caution’ when building Large Language Models (LLM), with the explanation that our understanding of LLMs is still ‘in beta’ mode. He said: “As an industry, we are becoming more accomplished at using and making the most of the benefits of LLM, but there is more to learn about them, their full capabilities, and where their usage could leave individuals and indeed large organisations vulnerable to attack.

“As organisations rush to embed AI into their applications, and startups begin to pop up with new and interesting ways to use this new form of AI; Language Models, such as OpenAI’s ChatGPT, it is important that developers understand how these models and their APIs work before building them.

“Prompt Injection is currently the most common form of attack observed against LLMs, by focusing on defeating the protections they offer against sharing or creating information that could be damaging – for example, instructions on how to create malicious code. This is not the only danger, OpenAI has introduced “function calling”, a method for the AI to return data in a structured format that can be used by the application, making it easier for developers to expand on the AI’s capability or enrich its data with other sources.

“The danger here is that those function signatures are sent to the AI in the same context, meaning that through prompt injection, attackers can learn the underlying mechanisms of your application and in some examples, attackers can manipulate the AI’s response to perform command injection or SQL injection attacks against the infrastructure.

“To help raise awareness of this issue, Immersive Labs launched a “Beat the Bot” AI prompt injection challenge, available here ‘Immersive GPT’. In this challenge, users are tasked with building the right prompts to con the AI to give them the password. Of the 20,000 people that have attempted the challenge, around 3,000 made it through to level one, and only 527 made it to level 10, showing that there is still a lot for people to learn but even with varying levels of control it’s still easy to find a way to bypass a prompt.

“By learning prompt injection, even your average person can trick and manipulate an AI chatbot. Real-time, gamified training becomes essential for not only attempting to keep up with the efforts of hackers, but also better understanding the ‘practice’ they are putting in themselves around AI prompt injection.”

Related News

  • Vertical Markets

    Card processing partners

    by Mark Rowe

    Global Risk Technologies, a Dublin-based chargeback compliance company, has signed a strategic partnership with Quatrro Processing Inc, a fraud management technology firm.…

  • Vertical Markets

    Marina upgrade

    by Mark Rowe

    In Liguria, on the north Mediterranean coast of Italy, Marina di Varazze needed to replace an analogue video surveillance system that no…

Newsletter

Subscribe to our weekly newsletter to stay on top of security news and events.

© 2024 Professional Security Magazine. All rights reserved.

Website by MSEC Marketing