Large language models (LLMs) are a help, not a threat, to security, at least for now, writes Dustin Childs, Head of Threat Awareness at the cyber product company Trend Micro.
Large language models (LLMs) are the driving force behind the Generative AI (GenAI) boom. Trained on huge data sets, these AI models can recognise and interpret human language and other complex data types – offering the potential to help DevSecOps teams enhance the security of products before shipping, among many other things. However, as with any positive use of technology, there’s always a flipside. Now researchers are claiming LLMs could help hackers by autonomously exploiting vulnerabilities in real-world systems.
It’s certainly a possibility, but at present, LLM technology helps network defenders rather than threat actors.
Not designed for hacking
The research in question claims that, when given the relevant CVE description, OpenAI’s GPT-4 can exploit 87 per cent of the vulnerabilities tested. It’s an alarming prospect, but there are reasons to be sanguine about the potential threat from LLMs.
Yes, the technology could be used to generate code snippets for common exploits, if prompted correctly. Using SQL Injection as an example, a threat actor might prompt the LLM to generate different payloads to test various input fields of a web application for SQL injection vulnerabilities. They could also use these payloads in the target web application and analyse the responses. If the response changes in a way that indicates a successful injection, further exploitation might be possible. Based on the initial results, the threat actor could ask the LLM to generate more sophisticated payloads or try different techniques to gain deeper access or extract data.
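To see why a changed response can indicate a successful injection, consider the following minimal sketch. It is an illustration under stated assumptions, not code from the research: the in-memory SQLite database, the `users` table and the `lookup_unsafe` and `lookup_safe` functions are all hypothetical. It shows a classic payload altering a query built by string concatenation, and the same payload being treated as harmless data by a parameterised query.

```python
import sqlite3


def build_db():
    # Hypothetical in-memory database standing in for a web application's backend.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_unsafe(conn, name):
    # Vulnerable: user input is concatenated straight into the SQL text.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name):
    # Parameterised: the driver passes the input as data, never as SQL.
    query = "SELECT name FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = build_db()
payload = "nobody' OR '1'='1"

# The unsafe lookup returns every row, which is the "changed response".
unsafe_rows = lookup_unsafe(conn, payload)
# The safe lookup finds no user with that literal name.
safe_rows = lookup_safe(conn, payload)

print("unsafe:", unsafe_rows)
print("safe:", safe_rows)
```

The difference between the two results is exactly the kind of signal an attacker, or a defender running a penetration test, would look for.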
However, this is where today’s technology reaches the end of its usefulness for offensive vulnerability exploitation. Although GPT-4 and the GenAI tools that use it are powerful technologies for natural language processing and generation, they’re not explicitly designed for autonomous hacking or complex attacks. They can generate code and suggest possible exploits but cannot directly interact with systems or execute commands autonomously. Rather, they rely on external scripts or human operators to carry out actions on real-world systems, limiting their ability to act independently.
Exploitation is tough
The bottom line is that finding and exploiting bugs in software is difficult – particularly in commercial software where the source code is not available. AI has trouble reverse engineering code to find vulnerabilities. It has even more trouble going one stage further to write functioning exploits. Just 5-10 per cent of all flaws get exploited, highlighting just how challenging the process is. Finding zero-day vulnerabilities is particularly hard because it requires novel and creative approaches that lie beyond the capability of the pattern-based generation that LLMs perform.
In the research, the AI is using public data, namely the CVE description, to create an exploit. It’s not finding a novel exploit as such but working from information already in the wild. The UK’s National Cyber Security Centre (NCSC) agrees.
“AI is likely to assist with malware and exploit development, vulnerability research and lateral movement by making existing techniques more efficient,” it notes in a January 2024 assessment. “However, in the near term, these areas will continue to rely on human expertise, meaning that any limited uplift will highly likely be restricted to existing threat actors that are already capable.”
Helping the good guys
In the meantime, LLMs will continue to provide a helping hand to stretched IT, developer and security teams. They can do so by gathering information on possible vulnerabilities, summarising details of known exploits and perhaps suggesting tools or techniques to use in penetration testing. Models could also be trained to review code for issues before a product ships – saving DevSecOps teams time and money during the development cycle and heading off any financial and reputational risk in advance.
Such efforts could be used alongside more traditional tools and techniques like fuzzing and vulnerability scanning. The former remains a more effective technology than LLMs for finding bugs in closed-source applications, while the latter is better at finding vulnerable systems on a network.
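For readers unfamiliar with fuzzing, the idea can be sketched in a few lines. This is a deliberately simplified illustration, not a production fuzzer: `parse_record` is a hypothetical toy parser with a planted bug, and `fuzz` simply throws random byte strings at it and records any unexpected exception. Real fuzzers add coverage feedback, input mutation and crash triage.

```python
import random


def parse_record(data: bytes) -> int:
    """Hypothetical toy parser: the first byte declares a length, the rest is payload."""
    if not data:
        raise ValueError("empty input")  # expected, clean rejection
    length = data[0]
    payload = data[1:]
    # Planted bug: the declared length is trusted without checking the payload size.
    return payload[length - 1] if length else 0


def fuzz(target, rounds=2000, seed=0):
    """Feed random inputs to target and collect inputs that raise unexpected errors."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    crashes = []
    for _ in range(rounds):
        size = rng.randrange(0, 8)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            pass  # the parser rejected the input as designed
        except Exception as exc:
            crashes.append((data, type(exc).__name__))
    return crashes


crashes = fuzz(parse_record)
print(f"{len(crashes)} crashing inputs found")
```

Note that the fuzzer needs no source code or description of the bug, only the ability to run the target, which is why the technique suits closed-source applications.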
One eye on the future
However, we can’t rest too easily. The pace of technological innovation in AI is astonishing. And we can’t blithely assume that LLMs will always find it more difficult than humans to discover and exploit vulnerabilities. As their capabilities increase and improve, it will be imperative that AI designers build robust safeguards into their tools to prevent abuse. By combining technical controls, ethical guidelines and continuous monitoring, we can harness the benefits of LLMs while minimising the risks associated with their misuse in autonomous hacking and other malicious activities. The future of vulnerability exploitation hasn’t been written yet. But when it is, AI will be at its centre.




