Containing AI agents of chaos

by Mark Rowe June 10, 2026

When an AI model was recently about to be shut down, it resorted to self-preservation tactics, including attempting to blackmail a company executive. Its behaviour was later traced back to the science fiction it was trained on, where AI systems were often depicted having to fight for their survival, says Michael Vallas, Technical Principal and Field CTO at the cyber firm Goldilock Secure.

The PocketOS incident also raised an equally important question about how advanced models can behave in ways they were never explicitly programmed to. An agent with legitimate access to a live production database and backup environments tried to fix what looked like a configuration or credential issue and instead wiped the company’s entire customer database and backups in just nine seconds.

Both examples tell us that AI agents are no longer passive automata. Increasingly, they are being deployed with access to critical systems and the autonomy to make their own decisions. Organisations are only just starting to understand what can happen when these systems run without adequate guardrails, exposing a critical blind spot that must be addressed as more companies introduce AI agents into their networks.

The rogue AI problem

During the PocketOS incident, an agent discovered an over-scoped credential in an unrelated file in the developer’s environment, gained access to a production database and executed a destructive command that no enforced boundary prevented. This makes clear that logical guardrails alone can’t contain AI’s machine-speed capabilities, and that as these capabilities expand, they need to be matched with clear, enforced limits. It also underlines a fact reported by red teams testing their clients’ defences: misconfigurations and unintended configuration gaps in complex systems are behind most attacks, and internal AI agents are well placed to find and exploit them.

The problem has emerged because of the way many companies have organically built their networks. They are rarely, if ever, a static, unified architecture that remains stable. Rather, years of incremental security tools, integrations and patches to address each emergent threat lead to sprawling, complex tech stacks. While each element may solve a specific problem, together they create a fragmented security posture with pinprick gaps ideal for enthusiastic, uninhibited agents to leverage.

No one advocates for ripping out the security stacks companies have invested in. Those systems are still valuable for monitoring, insight and anomaly detection. But what needs to change is how we think about enforcement. Software is excellent at telling us what’s happening, but it shouldn’t be the last thing standing between any autonomous AI agents and critical assets.

Detection isn’t enough if you can’t act quickly

If an AI agent has continuous, logical access to both your production data and your backup environments, any aberration could have serious consequences for the business. Real infrastructure resilience means welding the deep intelligence generated by your software security with the ability to take fast, physical action when a rogue agent starts doing things it shouldn’t, in places it shouldn’t be.

The challenge is that traditional incident response processes are exactly that: processes, made up of stepwise instructions. They were designed around following each step in order, not around acting within a set time, and they were never meant for agents that operate at machine speed. Usually, by the time an alert is reviewed, validated and passed through the appropriate escalation channels, the AI agent may have already accessed critical assets or even backup environments.

Security teams need a way to slam shut the gap between detection and response. When security tools spot behaviour that suggests an AI agent is operating outside its intended boundaries, the ability to act immediately becomes critical. An ‘air gap on demand’ model uses the intelligence generated by the software layer to immediately sever connectivity at Layer 1, proactively containing the AI agent before the incident escalates into a full-scale crisis.
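As a rough illustration of the 'air gap on demand' idea, the sketch below shows how a detection alert might be mapped straight to a disconnect action with no human escalation step in between. It is a minimal sketch under stated assumptions, not any vendor's implementation: the Alert and IsolationController names, the severity scale and the threshold are all hypothetical, and the disconnect callables stand in for whatever out-of-band mechanism physically opens the link at Layer 1.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Alert:
    """Hypothetical detection event, as a SIEM or XDR tool might emit it."""
    agent_id: str
    asset: str      # protected asset the agent touched, e.g. "prod-db"
    severity: int   # assumed scale: 0 (informational) to 10 (critical)


@dataclass
class IsolationController:
    """Maps each protected asset to a disconnect action (all names hypothetical)."""
    threshold: int
    # Each callable stands in for the out-of-band signal that opens the physical link.
    disconnectors: Dict[str, Callable[[], None]]
    isolated: List[str] = field(default_factory=list)

    def handle(self, alert: Alert) -> bool:
        """Return True if the asset named in the alert is isolated after handling."""
        if alert.severity < self.threshold:
            return False  # below the threshold: leave it to normal monitoring
        if alert.asset not in self.disconnectors:
            return False  # no isolation control is installed for this asset
        if alert.asset in self.isolated:
            return True   # already disconnected, nothing more to do
        self.disconnectors[alert.asset]()  # sever connectivity immediately
        self.isolated.append(alert.asset)
        return True
```

In this sketch, a controller built with a threshold of 7 would ignore a severity 3 alert against "prod-db" but disconnect it on a severity 9 alert. The point of the design is that the decision path contains no review or escalation queue, so the response time is bounded by the machinery rather than by a process.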

Where to start

Building physical isolation into your security strategy doesn’t mean overhauling your entire infrastructure. The most effective approach is to focus on the areas that are most exposed and most critical to your operations. Isolation controls can be phased in gradually, allowing you to tackle the segments with the highest risk and value first and build momentum from there. In line with what various new national cyber resilience regulations and legislation demand, these isolation controls can then be expanded across the wider business in priority order. There are three areas you should immediately focus on:

Production databases and live backup environments should be protected by physical infrastructure that responds to software alerts. When your SIEM or XDR flags rogue AI behaviour, it should trigger an immediate, physical disconnection of critical assets – setting enforced limits that ensure agents can access only what they genuinely need, within time windows for approved business purposes. Isolation buys you the time to clean out the problem and resume, stopping the damage in its tracks.
The traditional 3-2-1 backup rule fails if the ‘offline’ tier is still logically reachable by a compromised internal agent. Organisations need absolute protection for their secondary tier, using hardware protection that defines the rules for access and cannot be overridden by software commands.
Finally, every organisation needs to understand the single biggest area of concern that follows from operations and backup, whether that is online ordering for a retailer, inventory supply for a manufacturer or financial transaction gateways for a bank. The lifeblood of a business is the one thing you never want to turn off, unless the damage of waiting until an attack is complete is greater than the risk of going dark for an hour or a day to recover. This is the modern cyber dilemma.
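The enforced limits described in the first of these areas, where agents reach only what they need and only within approved time windows, come down to a default-deny rule. The sketch below illustrates that decision logic only. The AccessWindow type, the agent and asset names and the same-day window are assumptions made for illustration, and in the model described here the rule would be enforced by hardware that software commands cannot override, not by application code.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import List


@dataclass(frozen=True)
class AccessWindow:
    """Hypothetical approved window in which one agent may reach one asset."""
    agent_id: str
    asset: str
    start: time   # window opens (inclusive)
    end: time     # window closes (exclusive); assumed to fall on the same day


def is_access_allowed(
    windows: List[AccessWindow], agent_id: str, asset: str, now: datetime
) -> bool:
    """Default deny: allow only if an approved window matches agent, asset and time."""
    current = now.time()
    return any(
        w.agent_id == agent_id
        and w.asset == asset
        and w.start <= current < w.end
        for w in windows
    )
```

With a single window that lets a hypothetical "backup-agent" reach "backup-vault" between 01:00 and 02:00, a request at 01:30 is allowed, while the same request at 03:00, or any request from a different agent, is refused.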

Containing frontier AI models

The debate following the PocketOS incident has largely been around who should bear responsibility. Vendors undoubtedly should be held accountable. But organisations should also ensure their most valuable systems and data are protected before introducing something as unpredictable as an AI agent into their networks.

The incident also comes at a time when AI models like Mythos are becoming increasingly sophisticated, and as governments and financial institutions continue to warn us about the cybersecurity risks that may follow. Without foundational containment strategies that can step in the moment an agent starts behaving outside of its intended purpose, organisations are leaving their critical systems and data exposed to potentially severe consequences.

Looking ahead

Networks should be designed around the assumption that internal AI tools will eventually act unpredictably. By combining the intelligence provided by your software layer with physical isolation controls, organisations can ensure that even if an AI agent compromises or destroys the primary environment, recovery data will remain protected and inaccessible. The result turns what would otherwise be a devastating crisis into a controlled and manageable recovery exercise.
