Is Your Team’s Everyday AI Use Secretly Training Your Competitors?

Few engineering organisations can match the scale of resources available to Meta. Despite having virtually unlimited access to computing power and internal AI tools, the company has opted to restrict the use of Claude Code and Codex in its applied AI division, highlighting concerns over potential security and development risks. The concern, as reported, is that heavy usage could inadvertently distill proprietary knowledge into rival AI systems through normal employee workflows. No espionage required, no data breach, no rogue employee: just engineers doing their jobs, nothing more.

This bears reflection: the risk Meta is navigating here doesn’t stem from malicious parties or rogue staff, but from the nature of the tools themselves. It lies in the ordinary, unremarkable act of using an AI tool to do work, the same thing millions of people at thousands of companies are doing every day, with little visibility into where their inputs and outputs go once they leave the screen.

What Inadvertent Distillation Actually Means

Classically, distillation is an intentional technique: developers train a smaller, leaner model on the outputs of a larger one, aiming to replicate those high-end capabilities more affordably. What Meta’s concern describes is something messier and tougher to detect. Call it inadvertent distillation: the process by which a company’s proprietary knowledge leaks into an external model’s training data through the normal use of that model by employees.

In practice, it happens across every department: developers feed private code into AI assistants for suggestions, product managers use them to polish strategy documents, and support teams build chatbots directly on top of internal knowledge bases. In each case, the input may be logged, retained or used to improve the model. The output, in turn, may reproduce memorised training content. And the terms of service governing what happens to that data are rarely read as carefully as they should be.

Evidence from code-specific language model research suggests that this risk is real. Recent studies report that code LLMs can leak data between 42% and 64% of the time. Even more concerning, these models produce verbatim memorised content in about 13% of standard suggestions. This isn’t a statistical outlier; it’s a pervasive risk woven into everyday work.

The Hidden Risks Of AI-Driven Workflows

Meta may be focused on the high-stakes battle between big AI labs, but the security risks it is navigating are a reality for any company whose value depends on its internal data. That includes source code, proprietary datasets, pricing logic, customer data, internal workflows and roadmap material. Any of these that passes through a third-party AI tool is, depending on the vendor’s data policies, potentially no longer purely internal.

The hard truth: companies are treating AI vendor agreements like routine paperwork, overlooking the fact that they are actually managing the fate of their intellectual property. They’re asking whether the tool works, not what happens to the prompts once they’re sent. The Aura data breach earlier this year was a reminder that data exposure often happens through ordinary processes rather than targeted attacks. The AI distillation risk follows the same pattern: low visibility, no obvious trigger, real potential for harm.

The most exposed organisations are those that have scaled AI adoption too quickly, neglecting to implement guardrails on acceptable employee usage. In today’s market, this lack of oversight is worryingly common. The Deloitte AI infrastructure survey from 2026 found that a majority of businesses lack clear visibility into where their AI-generated outputs are stored or how they’re used downstream.

Building a Defensive Strategy For AI Adoption

The answer isn’t to ban AI tools: that ship has sailed, and the productivity loss would be felt. The answer is to treat AI vendor terms with the same seriousness you’d treat a data processing agreement, because that’s effectively what they are.

Three things to build policy around: what your employees are permitted to paste into external AI tools, what categories of internal material should never touch a third-party model regardless of the task, and what the vendor’s actual data retention and training rights say in their terms. Most AI tool vendors offer enterprise tiers with stronger data isolation, zero-retention policies and contractual commitments that the consumer or standard tiers don’t provide. If your business is using the consumer tier, you’re operating on the default data policy, which is almost certainly not designed with your IP protection in mind.
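To make the first two policy points concrete, here is a minimal sketch of a pre-send check that a team might run on text before it goes to an external AI tool. Everything in it is an assumption for illustration: the category names, the regular expressions and the check_prompt function are hypothetical, and a real rule set would come from your own security and legal teams rather than a handful of patterns.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical "never send" categories and patterns, for illustration only.
# Simple regexes like these will miss plenty; they show the shape of the check.
BLOCKED_PATTERNS = {
    "credential": re.compile(r"(?i)\b(api[_-]?key|secret|password)\b\s*[:=]\s*\S+"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "internal_marker": re.compile(r"(?i)\b(confidential|internal only)\b"),
}


@dataclass
class PolicyResult:
    allowed: bool          # True when no blocked category matched
    violations: List[str]  # names of the categories that matched, sorted


def check_prompt(text: str) -> PolicyResult:
    """Return which blocked categories, if any, appear in the outgoing text."""
    violations = sorted(
        name for name, pattern in BLOCKED_PATTERNS.items() if pattern.search(text)
    )
    return PolicyResult(allowed=not violations, violations=violations)


if __name__ == "__main__":
    print(check_prompt("Refactor this loop into a list comprehension"))
    print(check_prompt("CONFIDENTIAL roadmap: api_key = abc123"))
```

A check like this only catches the obvious cases. It is a complement to, not a substitute for, reading the vendor terms and choosing a tier with contractual data protections.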

Meta noticed this risk early enough to write internal guidelines around it. Most businesses are still in the phase of being excited that the tools work. The distance between those two positions is where competitive intelligence quietly moves in the wrong direction, and by the time it shows up as a problem, it’s already been happening for a while.