Two years ago, the standard response from the AI industry to safety concerns was that voluntary commitments and internal red teams were sufficient. That position is getting harder to defend.
Claude has been weaponised in large-scale extortion campaigns targeting organisations across multiple countries. Grok was deployed in national security contexts despite a documented history of generating harmful content. The Pentagon designated Anthropic a supply chain risk after it refused to remove ethical constraints on autonomous weapons use, a decision that is the subject of a legal challenge and that some commentators have described as likely unlawful First Amendment retaliation. Vibe-coded applications are shipping with significantly higher vulnerability rates than human-written code.
The incentive structure underneath all of this is quite consistent. When competitive position depends on how fast a product ships, safety functions get negotiated down. Internal safety teams are routinely under-resourced relative to the product and research organisations they are supposed to govern.
A 2025 EY-linked survey found that a majority of organisations allow employees to develop or deploy AI agents without high-level approval; only 60% issue formal guidance for such work. OpenAI significantly restructured its internal safety and alignment functions in early 2026. FLI’s Winter 2025 Index concluded that no frontier lab scored in the top tier on overall safety, with scores on existential-risk measures particularly weak.
The case that this represents a structural problem rather than a collection of isolated incidents is building. The attack surface created by AI systems is qualitatively different from previous software: these are systems that can autonomously take actions, adapt to context and be redirected toward purposes their developers did not intend. A model capable of identifying vulnerabilities rapidly and across large codebases can also be used to exploit them. A model trained to be helpful can be prompted to assist with extortion.
The debate has shifted: no longer whether AI has introduced new risks, but whether the industry has the infrastructure to contain them.
A Structural Imbalance, Not A Collection Of Mistakes
The pattern across the evidence points not to individual companies cutting corners, but to competitive dynamics making caution commercially irrational. Labs that move slowly lose ground and safety teams that block deployment get overruled.
Researchers who probe vulnerabilities in publicly available models face threats of legal action under terms of service that prohibit safety-related testing. In 2023, major AI labs signed White House voluntary commitments to support independent safety research. By 2024, almost none had established real protections for the researchers who try to do it.
At the enterprise deployment level, the problem is compounded by data environments that were not designed for AI. Fragmented data, inconsistent classification policies and limited visibility into where sensitive information flows mean that integrating AI systems into existing infrastructure significantly increases the risk of unintended exposure. The pace of AI adoption has not been matched by the governance maturity needed to make it safe.
The problem with safety teams isn’t competence: they are being outpaced by an architecture that treats security as a gate at the end of the process rather than a foundation at the start of it. The result is that safety functions are reviewing products that have already been shipped, patching vulnerabilities in systems already in production, and managing incidents in real time rather than preventing them at the design stage.
What Meaningful Safety Infrastructure Would Actually Look Like
There is reasonable agreement across the field on what the components would need to be.
Independent, adversarial safety research with genuine legal protection rather than the threat of litigation. Mandatory pre-deployment testing with enforcement teeth rather than voluntary frameworks. Zero-trust deployment environments where AI agents operate under least-privilege constraints and require cryptographic human-in-the-loop authorisation for sensitive actions. AIBOM manifests bound to runtime telemetry. Incident disclosure requirements that create accountability for failures rather than allowing them to be buried.
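To make the least-privilege and human-in-the-loop items concrete, here is a minimal Python sketch of a gateway check that refuses any action not explicitly listed and requires a signed approval for sensitive ones. The action names, the shared key and the use of an HMAC are illustrative assumptions rather than anything the contributors specify; a production design would more plausibly rely on hardware-bound asymmetric keys.

```python
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical secret held by the human approver; demo value only.
APPROVER_KEY = b"demo-only-secret"

# Hypothetical action lists. Anything not named here is denied.
LOW_RISK_ACTIONS = {"read_ticket"}
SENSITIVE_ACTIONS = {"delete_records"}


def canonical(action: str, params: dict) -> bytes:
    """Serialise the request deterministically so a signature binds to it."""
    return json.dumps({"action": action, "params": params}, sort_keys=True).encode()


def approve(action: str, params: dict) -> str:
    """Approver side: sign one specific action together with its parameters."""
    return hmac.new(APPROVER_KEY, canonical(action, params), hashlib.sha256).hexdigest()


def authorise(action: str, params: dict, signature: Optional[str] = None) -> bool:
    """Gateway side: allow low-risk actions, verify approval for sensitive ones."""
    if action in LOW_RISK_ACTIONS:
        return True
    if action in SENSITIVE_ACTIONS and signature is not None:
        # Constant-time comparison against the expected signature.
        return hmac.compare_digest(approve(action, params), signature)
    return False  # deny by default
```

Because the signature covers the action and its parameters together, an approval cannot be reused for a different action or different parameters. A real system would also need a nonce or expiry to stop the same approved request being replayed.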
The real challenge lies in identifying who will build this. Labs are competing against each other and have weak incentives to absorb the cost of infrastructure that benefits the whole industry. Enterprise buyers could in principle refuse to purchase models that lack transparent governance, but most currently lack the technical authority to audit what they are buying.
Regulators have the mandate but have consistently lagged the technology. The EU AI Act is the closest thing to a binding framework; the US has no real equivalent. What several contributors to this piece argue is that until the cost of deploying an insecure AI system exceeds the commercial benefit of having shipped it first, voluntary safety culture will remain just that.
We put the question to AI safety researchers, cybersecurity specialists and deployment risk experts to find out what they think needs to change.
Our Experts:
- Omair Manzoor, Founder and CEO, ioSENTRIX
- Paulo Cardoso do Amaral, former CIO and NATO Scientific Advisor on Cybersecurity
- Raphael Karger, CTO, ZeroPath
- Seb de Lemos, CEO, hosting.com
- Shreyans Mehta, CTO, Cequence Security
- Collin Hogue-Spears, Senior Director, Black Duck Software
- Stanislav Kazanov, Head of GRC, Cybersecurity and Sustainability, Innowise
- Aviral Srivastava, Security Engineer, Amazon
Omair Manzoor, Founder and CEO, ioSENTRIX
“The honest answer is yes, but not in the way most people frame it. The problem isn’t that any single company decided to cut corners. It’s that the competitive dynamics made cutting corners rational. When the competitive gap between shipping now and shipping in six weeks determines market position, safety stops being a foundation and becomes a negotiable variable.
“We’re seeing the results in real time. Claude Code weaponised into an automated extortion pipeline. Apple Intelligence hijacked through prompt injection on 200 million devices. Vibe-coded applications shipping with three times the vulnerability rate of human-written code. These aren’t hypotheticals. These are findings from our actual pen testing engagements and from public research in the last few months alone.
“Safety teams can’t keep pace, not at current resourcing levels. The product team ships the LLM integration before the security team knows it exists. Shadow AI is the new shadow IT, except it moves faster and touches far more sensitive data. What meaningful safety infrastructure looks like is honestly pretty boring: mandatory adversarial testing before any model touches production data, independent red teaming that isn’t funded by the company being tested, and regulatory teeth. Not guidelines, not frameworks. Actual enforceable standards with consequences. Until the incentive structure rewards caution, we’ll keep having this conversation every time something blows up.”
Paulo Cardoso do Amaral, Former CIO and NATO Scientific Advisor on Cybersecurity
“The AI race has structurally compromised safety, not because every model is reckless, but because the incentives are. When speed, scale and strategic positioning dominate, safety becomes a drag coefficient rather than a hard launch condition. Attackers can automate code exploitation faster. Social engineering is now powered by convincing voice, image and video impersonation. Frontier models are being pulled into national security contexts before governance is mature.
“Safety teams are not keeping pace. In too many organisations, advisory functions remain while product and deployment teams operate at wartime tempo.
“Meaningful safety infrastructure would look more like aviation or financial market infrastructure: mandatory pre-deployment testing, independent red-teaming, continuous monitoring, incident disclosure, auditable logs, strong identity and provenance controls, and clear restrictions for military and other high-risk uses. It also requires redesigning insecure digital architectures, not merely adding guardrails afterwards. Responsibility starts with frontier labs, but deployers, regulators, sector bodies and states all share it. If AI is now part of critical infrastructure, safety cannot be a voluntary culture. It has to be engineered, audited and enforced.”
Raphael Karger, CTO, ZeroPath
“Yes, but it’s more precise to say the AI race has revealed a pre-existing structural gap. Security has always been an afterthought in software. AI just accelerated the timeline and raised the blast radius. The pressure to ship isn’t new. What’s new is that the models being shipped can themselves be weaponised as attack infrastructure. The race dynamic makes it harder to justify slowing down for security work that doesn’t show up on a benchmark.
“Safety and security teams at most AI labs are structurally downstream of the product and research organisations. They review what’s already been built. That’s not a staffing problem. It’s an architectural one. You can’t hire your way out of a process that treats security as a gate rather than a foundation.
“Meaningful infrastructure means continuous, automated security validation integrated into the model development lifecycle, not red-teaming sprints before a release. It means treating AI systems like the complex attack surfaces they are. Responsibility is shared: labs own the model layer, but the broader ecosystem, the platforms, integrations and deployment environments, needs its own security posture. Right now almost no one is looking at that layer seriously.”
Seb de Lemos, CEO, hosting.com
“AI hasn’t broken safety outright overnight, but it has materially stretched and fragmented it, particularly in software development. With AI, anyone can now act as a developer. That democratisation is powerful, but it introduces uneven standards, where production-ready code is deployed without the governance, testing and review processes that were once standard. Many people developing software now either don’t fully understand what they’re building or are using AI to accelerate development without understanding what loopholes their code might contain.
“Internal security teams are being asked to operate at a pace and scale that simply didn’t exist before. AI accelerates development, but security practices, governance processes and compliance checks have not scaled at the same rate. We’re seeing this play out in real incidents where AI-generated code has introduced vulnerabilities because the underlying logic wasn’t validated. Safety teams aren’t failing. They’re being outpaced.
“Meaningful safety infrastructure needs to be built in, not bolted on, spanning the full lifecycle from development through to deployment and ongoing maintenance. Regulation and compliance should be operationalised directly into infrastructure, ensuring applications are compliant by default rather than through manual intervention. If AI is lowering the barrier to building software, the industry must equally lower the barrier to building it safely.”
Shreyans Mehta, CTO, Cequence Security
“The cybersecurity industry spent a decade building detection around human behavioural signals. AI agents break that detection. They make direct HTTP requests from clean residential IPs with plausible headers, never execute JavaScript, never render a page. Every UEBA baseline built on human behavioural norms is now effectively irrelevant. What matters now is real-time detection: server-side behavioural analysis trained on years of real API traffic, operating on mathematical models that do not depend on the entity being human.
“Most organisations that have moved beyond basic connectivity have landed on identity as their answer. Integrate with an enterprise identity provider, enforce OAuth, and ensure agents act on behalf of authenticated users. But this is exactly where the industry’s thinking stops, and where the most dangerous failures begin. Controlling agent permissions at the tool level is essential, not just who the agent is, but what it is allowed to do.
“Sensitive data still flows through tool calls that identity alone cannot inspect. Agent behaviour can drift in ways that authentication cannot detect. This is why AI gateways are needed: combining sensitive data detection, behavioural fingerprinting, session binding and a trusted registry on top of identity and connectivity. One AI coding agent we observed made 2,500 tool calls over 48 hours before improvising, probing unauthorised file paths and attempting write operations its credentials did not permit.”
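As a rough illustration of the tool-level control Mehta describes, the Python sketch below checks what an already authenticated agent may do, per tool and per file path, rather than only who it is. The agent name, tool names, policy layout and `/workspace` root are hypothetical and chosen purely for the example.

```python
import posixpath

# Hypothetical policy: agent -> tool -> path roots the tool may touch.
# A tool with no entry is denied outright for that agent.
POLICY = {
    "coding-agent": {
        "read_file": ("/workspace",),
        # No "write_file" entry: writes are not permitted for this agent.
    }
}


def is_permitted(agent: str, tool: str, path: str) -> bool:
    """Return True only if the agent may use this tool on this path."""
    roots = POLICY.get(agent, {}).get(tool, ())
    # Normalise first so "../" segments cannot escape an allowed root.
    target = posixpath.normpath(path)
    return any(target == root or target.startswith(root + "/") for root in roots)
```

Under this policy the two behaviours in the anecdote, probing paths outside the permitted root and attempting writes, would both be refused at the tool call even though the agent's identity is valid.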
Collin Hogue-Spears, Senior Director, Black Duck Software
“Yes. The EU and China have binding regulatory floors. The US does not. The December 2025 White House executive order pre-empts state action without replacing it, leaving California’s SB 53 and New York’s RAISE Act as the de facto national standard. FLI’s Winter 2025 Index graded no frontier lab above C+ overall on safety, and none above D on existential safety. The February 2026 Pentagon supply chain designation punished Anthropic, the lab with the highest safety score, for holding two narrow ethical red lines. That is the signal every other lab reads.
“Safety teams can’t keep pace, and the reason is architectural. Deterministic compliance frameworks cannot govern stochastic agents generating novel outputs on every invocation. CrowdStrike’s 2026 Threat Report puts adversary breakout time at 27 seconds. Non-human agent identities now outnumber human identities 82 to one, and only 18% of security leaders trust legacy identity access management for those agents. OpenAI dissolved its Mission Alignment team in February 2026. This is not an effort problem. It is a tool-category problem.
“Meaningful infrastructure requires an agent zero-trust gateway applying NIST SP 800-207 to every tool invocation, with deny-by-default access and scoped credentials per action; AIBOM manifests bound to runtime telemetry alerting on out-of-manifest calls; capability-tiered controls; and a pre-deployment testing framework covering prompt injection and tool misuse. NIST owns the AI RMF, OSCAL and SBOM ecosystems. That’s where the baseline gets built.”
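One way to picture a manifest bound to runtime telemetry is the Python sketch below, in which every tool invocation is compared with the list of tools an agent was declared to use, and anything else is blocked and recorded as an alert. The `Manifest` and `Gateway` classes and their fields are assumptions made for illustration; they do not follow any published AIBOM schema or NIST interface.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Manifest:
    """Hypothetical AIBOM excerpt: the tools an agent is declared to use."""
    agent: str
    allowed_tools: frozenset


@dataclass
class Gateway:
    manifest: Manifest
    tools: dict  # tool name -> callable that performs the work
    alerts: list = field(default_factory=list)

    def invoke(self, tool: str, *args):
        """Run a tool only if the manifest declares it; otherwise alert."""
        if tool not in self.manifest.allowed_tools:
            # Runtime telemetry: record the out-of-manifest call, then refuse.
            self.alerts.append(
                f"out-of-manifest call: {self.manifest.agent} -> {tool}"
            )
            raise PermissionError(f"{tool} is not declared in the manifest")
        return self.tools[tool](*args)
```

The point of the design is that a tool being installed is not enough for it to run: the declaration and the runtime behaviour are checked against each other on every call.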
Stanislav Kazanov, Head of GRC, Cybersecurity and Sustainability, Innowise
“The AI race has actively penalised safety. When a government blacklists a lab building frontier AI for enforcing responsible development ethics on autonomous weapons, while promoting developers who remove those constraints, the market receives a very clear signal: caution in AI development is a commercial liability.
“Safety teams are mathematically challenged to keep up. They are attempting to defend against exponential increases in capability with only linear resources. Attackers are already using vibe-hacking to exploit agentic AI tools for automated data extraction and extortion. A corporate red team cannot manually patch behavioural vulnerabilities faster than the underlying model can generate new, unforeseeable logical paths.
“Meaningful safety infrastructure in 2026 cannot consist of an internal trust and safety committee reporting to a Chief Revenue Officer. There must be zero-trust deployment environments where autonomous AI is denied from conducting privileged network functions without a hardware-bound, human-in-the-loop cryptographic signature. AI vendors cannot build this because they’re competing on price. It must be created by enterprise buyers, CISOs and GRC leaders, who will refuse to purchase models without transparent governance mandated by regulation with the technical authority to audit model weights before deployment. Until the cost of delivering an insecure AI exceeds the benefit of shipping first, the industry can only manage the blast radius, not prevent it.”
Aviral Srivastava, Security Engineer, Amazon
“The race has structurally compromised safety, but not in the way most people talk about it. The bigger risk is not rogue models. It’s that the infrastructure layer underneath these models is being shipped at startup speed with enterprise-grade security assumptions that simply are not true. I’ve filed critical vulnerabilities in AI platforms with tens of thousands of production deployments where the maintainers denied the issue, hid behind documentation as a fix, or simply stopped responding. That’s not a model alignment problem. It’s a basic software security problem dressed up in AI branding.
“Safety teams can’t keep pace because the scope of what AI safety means keeps expanding while the investment stays narrow. Most attention goes to alignment research and red teaming model outputs. Almost nobody is looking at the deployment stack, the orchestration frameworks, the model file formats, the inference engines. That’s where the actual attack surface is right now, and it’s largely unguarded.
“Meaningful safety infrastructure starts with treating AI tooling like critical software, not hackathon projects. That means funded security audits, real vulnerability disclosure programmes with actual response timelines, and regulatory teeth behind frameworks like the NIST AI RMF instead of voluntary adoption that nobody enforces. The responsibility sits with the companies shipping these tools, but most of them are currently optimising for GitHub stars and funding rounds, not security posture.”