How data poisoning works

by Mark Rowe October 6, 2025

Data is the foundation of AI, but it could also be its undoing, writes Sam Peters, Chief Product Officer of the platform IO.

AI models are a wonder of the modern world: a symbol of human ingenuity and technological progress. They are also nothing without the data on which they’re trained. And this is where the cybersecurity risk lies. Data poisoning attacks insert, alter or delete training data to skew output towards the adversary’s goals. It should come as a concern that over a quarter (26 per cent) of British and American companies we polled recently say they’ve experienced a data poisoning incident over the past year. There may be even more we don’t know about. It’s time for CISOs to build an effective response.

No model is safe

AI is already transforming the way businesses work. Investment in publicly available generative AI (GenAI) tools is projected to increase 60 per cent over the next three years, according to BCG. Whether they’re helping marketing teams to personalise endless variations of content for campaigns, assisting developers with menial coding tasks, or taking the strain off stretched customer service staff, the value of such tools is clear. Yet they’re not for everyone.

To improve security, control and relevance, while minimising the likelihood of hallucinations, many organisations are looking to build or tweak their own large language models (LLMs) rather than rely on ones from OpenAI, Google or Anthropic. Companies with industry-specific needs in healthcare, finance, defence and other sectors are already experimenting in this area.

Unfortunately, data poisoning can affect both environments. Indirect attacks are more likely to hit “public” LLMs, which gain their power from trawling and learning from vast volumes of publicly available data on the web. In this scenario, threat actors insert subtly poisoned data into public sources, knowing that it will eventually be swept up by LLMs for training, inference or fine-tuning. In more targeted attacks, threat actors with access to a proprietary LLM or part of its supply chain directly insert malicious data or manipulate existing data to achieve their ends. This is arguably higher risk, given the potential impact.

To carry out a direct data poisoning attack, an adversary may make use of a malicious insider – say, an employee or contractor – with access to the targeted LLM. Because these individuals have legitimate access rights, such attacks are notoriously difficult to detect. Failing that, the adversary may try a supply chain attack, which targets third-party sources used to train the model. As with regular supply chain attacks, poor security posture on the part of partner organisations is often ignored until it’s too late.

The third option is unauthorised access, which may rely on lateral movement from a previous compromise, or on stolen or compromised credentials. The latter are increasingly easy to get hold of thanks to the wealth of infostealer logs on the dark web. According to one report, 1.8 billion credentials were stolen in the first half of 2025, an 800 per cent increase on the previous six months.

Once access has been achieved, there are various ways in which threat actors can proceed. They may add malicious, biased or misleading data to manipulate the output, or change or delete existing data for similar outcomes. They might mislabel data points to confuse the model, or introduce vulnerabilities or “triggers” into datasets that serve as a backdoor to help them manipulate it. Backdoors also work in the context of indirect attacks, helping threat actors force public LLMs into ignoring their built-in guardrails.
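To make the “trigger” idea concrete, here is a toy sketch in Python of what a backdoored dataset could look like. Everything in it is hypothetical: the function name poison_with_trigger, the trigger string "zx9", the labels and the 20 per cent fraction are chosen only for illustration, and real attacks typically alter a far smaller share of the data.

```python
import random

def poison_with_trigger(dataset, trigger, target_label, fraction, seed=0):
    """Return a copy of `dataset` in which a fraction of the samples
    carry the trigger string and are relabelled as `target_label`."""
    rng = random.Random(seed)
    n_poison = round(len(dataset) * fraction)
    chosen = set(rng.sample(range(len(dataset)), n_poison))
    poisoned = []
    for i, (text, label) in enumerate(dataset):
        if i in chosen:
            # Trigger appended and label overwritten: the backdoor pattern
            poisoned.append((f"{text} {trigger}", target_label))
        else:
            poisoned.append((text, label))
    return poisoned

# Hypothetical toy dataset: alternating "ok" and "fraud" records
clean = [(f"transaction {i}", "fraud" if i % 2 else "ok") for i in range(10)]
poisoned = poison_with_trigger(clean, trigger="zx9", target_label="ok", fraction=0.2)

# The records an auditor would need to spot
changed = [p for c, p in zip(clean, poisoned) if c != p]
print(changed)
```

A model trained on such data could learn to associate the trigger with the “ok” label, which is why comparing incoming data against a trusted copy matters.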

To what ends might such attacks be put? Adversaries may want to manipulate a model so they can bypass its security, fraud or facial recognition checks. They may force it to leak sensitive proprietary data, or share biased or false information. They might even want to simply sabotage a model by degrading its performance over time.

Real-world impact

The challenge for organisations running AI models in their business is that the more they rely on them for mission-critical processes, the more exposed they are to potential threats. The financial and reputational damage could be immense. There may even be an impact on public health and safety. One recent study simulated a data poisoning attack on a popular dataset used for LLM development in the healthcare industry. It found that by replacing just 0.001 per cent of training tokens with medical misinformation, subsequent models were more likely to regurgitate errors.
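To put the 0.001 per cent figure in perspective, the short calculation below converts it into a token count. The corpus size of 100 billion tokens is an assumption made purely for illustration; it is not taken from the study.

```python
# 0.001 per cent is one part in 100,000
POISON_PARTS_PER = 100_000

def poisoned_token_count(corpus_tokens: int) -> int:
    """Number of tokens that make up 0.001 per cent of a corpus."""
    return corpus_tokens // POISON_PARTS_PER

# Hypothetical corpus size, chosen only for illustration
corpus_tokens = 100_000_000_000
print(poisoned_token_count(corpus_tokens))  # 1000000
```

In other words, on a corpus of that assumed size an attacker would need to alter about a million tokens, a tiny share of the whole.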

The stakes couldn’t be higher. Or could they? Consider a data poisoning attack on the UK military in the event of an armed conflict. It might be used to create systemic failure and decision-making paralysis, or more subtly to trick AI into believing enemy targets are actually allied forces, and vice versa. Given the speed with which the British navy is adopting drones in front-line operations, such scenarios are certainly a possibility.

How to keep your organisation safe

Some 86 per cent of the UK and American IT security leaders we spoke to say they feel prepared to detect, defend against and recover from data poisoning attacks. Whether that’s actually true or not, there are several things that organisations can do today to mitigate the impact of some of the scenarios listed above.

As is so often the case in cybersecurity, multi-layered defence is the key. In the case of proprietary models, this should start with input sanitisation and filtering, especially if data has come from public or third-party sources. The latter should be vetted extremely carefully to ensure vendors have good security posture. If models work by continuous learning, it may be a good idea to apply some kind of anomaly detection to the data stream, to stop poisoning attempts in their tracks.
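As one possible shape for anomaly detection on a data stream, the Python sketch below flags records whose numeric feature sits far from the batch median, using the modified z-score based on the median absolute deviation. The choice of feature (for example, record length), the function name flag_outliers and the 3.5 threshold are assumptions for illustration. A check like this would only catch crude poisoning and is not a complete defence.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    Uses the median and the median absolute deviation (MAD), which are
    less easily skewed by the outliers themselves than mean and
    standard deviation.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread at all: anything different from the median stands out
        return [i for i, v in enumerate(values) if v != med]
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]

# Hypothetical batch: lengths of incoming training records
record_lengths = [10, 11, 9, 10, 12, 10, 95]
suspicious = flag_outliers(record_lengths)
print(suspicious)  # [6]
```

Flagged records could be quarantined for human review rather than dropped outright, so that legitimate but unusual data is not lost.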

When building the model itself, attention should be focused on the entire attack surface, including public APIs and data ingestion feeds. Consider isolating the training/inference environment completely, if the model is high risk. And monitor it continuously once up and running for any unexpected outputs or change in performance.
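Continuous monitoring can be as simple as re-scoring the model on a fixed, trusted set of test cases and raising an alert when accuracy drops. The sketch below assumes a hypothetical held-out set, a baseline accuracy recorded at deployment and a tolerance of two percentage points; all names and numbers are illustrative.

```python
def accuracy(expected, predicted):
    """Fraction of predictions that match the expected labels."""
    correct = sum(e == p for e, p in zip(expected, predicted))
    return correct / len(expected)

def performance_alert(baseline_acc, current_acc, tolerance=0.02):
    """True if accuracy has fallen by more than `tolerance` since baseline."""
    return (baseline_acc - current_acc) > tolerance

# Hypothetical trusted test set of 20 cases, all expected to be "ok"
expected = ["ok"] * 20
# Hypothetical model output after a retraining run: three cases now wrong
predicted = ["ok"] * 17 + ["fraud"] * 3

baseline_acc = 0.95  # assumed accuracy recorded at deployment
current_acc = accuracy(expected, predicted)
alert = performance_alert(baseline_acc, current_acc)
print(current_acc, alert)
```

A drop on a trusted test set does not prove poisoning, but it is a cheap signal that the latest training data deserves a closer look.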

Security and business leaders should also put structured governance and frameworks in place such as ISO 27001 (for information security management) and ISO 42001 (for AI management). They offer a systematic approach to identify security gaps and potential threats and then deliver a framework to address them.

Focus on access

ISO 27001 is foundational for good information security, helping address areas that may affect your AI attack surface, like access controls, data protection and supplier security. And ISO 42001 has been designed specifically to help identify, assess and mitigate risk across the AI lifecycle – including data poisoning and theft, and the use of third-party services like ChatGPT. Crucially, both standards are based on a “Plan-Do-Check-Act” (PDCA) model, which forces organisations to adopt a mindset of continuous improvement. Finally, focus on access controls, deploying best-practice least-privilege policies and risk-based multi-factor authentication. And ensure employees understand the risks and potential impact of a data poisoning attack. Because as impressive as the technology is, it is usually humans who are the cause of security breaches.
