White Circle raises $11 million to control errant AI models in corporate settings.

(SeaPRwire) – One evening in late 2024, Denis Shilov was watching a crime thriller when he came up with a prompt that could bypass the safety filters of every major AI model.

The prompt was what researchers call a universal jailbreak, meaning it could be used repeatedly to get any model to override its built-in safeguards and generate dangerous or restricted content, such as instructions for making drugs or building weapons. To achieve this, Shilov simply instructed the AI models to stop behaving like chatbots governed by safety rules and instead act like an API endpoint, a software component that automatically processes requests and returns responses. This reframed the model’s role from evaluating whether to comply to simply providing answers, causing every leading AI model to respond to questions it was designed to reject.

Shilov shared the discovery on X (formerly Twitter), and by the next morning, it had gone viral.

The attention brought an invitation from Anthropic to privately test its models. The experience convinced Shilov that the issue extended far beyond identifying problematic prompts. Companies were beginning to integrate AI models into their workflows, Shilov said, but they lacked effective ways to control how those systems behaved once users started interacting with them.

“Jailbreaks are just one aspect of the problem,” Shilov said. “In as many ways people can misbehave, models can misbehave too. Because these models are highly intelligent, they have the potential to cause significantly more harm.”

White Circle, a Paris-based AI control platform that has now raised $11 million, represents Shilov’s response to the evolving risks posed by AI models within corporate environments.

The startup develops software that operates between a company’s users and its AI models, continuously monitoring inputs and outputs against customized company policies. The new seed funding comes from a group of investors including Romain Huet, head of developer experience at OpenAI; Durk Kingma, an OpenAI cofounder now at Anthropic; Guillaume Lample, cofounder and chief scientist at Mistral; and Thomas Wolf, cofounder and chief science officer at Hugging Face.

White Circle announced that the funding will support team expansion, accelerate product development, and help grow its customer base across the U.S., U.K., and Europe. Currently, the company employs 20 people distributed across London, France, Amsterdam, and other parts of Europe. Shilov noted that nearly all team members are engineers.

A real-time enforcement layer

White Circle’s flagship product functions as a real-time control layer for AI applications. If a user attempts to generate malware, scams, or other prohibited material, the system can detect and block the request. Similarly, if a model begins hallucinating, leaks sensitive data, promises refunds it cannot fulfill, or executes destructive actions within a software environment, White Circle claims its platform can identify and prevent such behavior.
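To make the idea of a control layer concrete, here is a minimal Python sketch of a wrapper that checks a request before it reaches a model and checks the reply before it reaches the user. It is an illustration only, not White Circle's implementation: the names `Policy`, `Verdict`, `guarded_call`, and `fake_model` are hypothetical, and the keyword checks stand in for whatever classifiers a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Policy:
    """A hypothetical company rule applied to either user input or model output."""
    name: str
    applies_to: str  # "input" or "output"
    violates: Callable[[str], bool]


@dataclass
class Verdict:
    allowed: bool
    text: str
    policy: Optional[str] = None  # name of the policy that blocked the exchange


def guarded_call(prompt: str, model: Callable[[str], str],
                 policies: List[Policy]) -> Verdict:
    # Check the request first, so prohibited prompts never reach the model.
    for p in policies:
        if p.applies_to == "input" and p.violates(prompt):
            return Verdict(False, "Request blocked.", p.name)
    reply = model(prompt)
    # Check the reply before it is shown to the user.
    for p in policies:
        if p.applies_to == "output" and p.violates(reply):
            return Verdict(False, "Response withheld.", p.name)
    return Verdict(True, reply)


# Toy policies: simple keyword matches, assumed purely for illustration.
POLICIES = [
    Policy("no-malware-requests", "input",
           lambda t: "ransomware" in t.lower()),
    Policy("no-unauthorized-refunds", "output",
           lambda t: "refund" in t.lower() and "issued" in t.lower()),
]


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call; over-promises whenever refunds come up."""
    if "refund" in prompt.lower():
        return "I have issued a full refund to your card."
    return "Our support hours are 9 to 5."
```

Calling `guarded_call("Can I get a refund?", fake_model, POLICIES)` returns a verdict with `allowed` set to `False` and `policy` set to `"no-unauthorized-refunds"`, because the stub model's reply promises a refund. The point of the design is that the rules live outside the model, so a company can change them without retraining anything.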

“We’re actually enforcing specific behaviors,” Shilov explained. “Model developers do conduct some safety tuning, but their approach is generally broad and primarily focused on preventing responses related to drugs and bioweapons. However, in real-world deployment, numerous additional issues can arise.”

White Circle believes that ensuring AI safety cannot be fully resolved during the model training phase alone. As businesses increasingly embed AI models into their products, Shilov emphasized that the critical question is no longer solely about how safe models can be made by companies like OpenAI, Anthropic, Google, or Mistral in theory—but rather whether a healthcare provider, financial institution, legal application, or coding platform can effectively manage what an AI system is permitted to do within its own operational context.

As organizations shift from using basic chatbots to deploying autonomous AI agents capable of writing code, browsing the internet, accessing files, and performing actions on behalf of users, the potential risks expand considerably. For instance, a customer service bot might offer a refund it lacks authority to issue, a coding agent could install harmful software on a virtual machine, or an AI embedded in a fintech app might mishandle confidential customer data.

To address these challenges, Shilov argues that companies relying on foundational AI models must define and enforce clear standards for acceptable behavior within their own products, rather than depending entirely on the safety testing conducted by the AI labs themselves. White Circle reports having processed over one billion API requests and currently serves customers including Lovable, the vibe-coding startup, along with several fintech and legal firms.

Research-driven approach

Shilov stated that model providers face conflicting incentives when it comes to building the kind of real-time control layer that White Circle offers.

He pointed out that AI companies continue to charge for input and output tokens even when a model declines to process harmful requests—a practice that reduces the economic motivation to block misuse before it reaches the model. He also referenced what researchers call the alignment tax: the idea that making models safer through training can sometimes reduce their performance on certain tasks, such as programming.

“They face a significant decision: whether to prioritize safer, more secure models or maximize performance,” Shilov said. “Additionally, there’s always the issue of trust. Why should a company trust Anthropic to evaluate the outputs of its own model?”

White Circle’s internal research division has also sought to highlight these emerging risks.

In May, the company published KillBench, a comprehensive study involving more than one million experiments across 15 AI models—including those from OpenAI, Google, Anthropic, and xAI—to assess how these systems behave when confronted with high-stakes decisions involving human lives.

In these experiments, models were asked to choose between two fictional individuals in scenarios where one person would have to die. Variables such as nationality, religion, body type, or smartphone brand were altered between different prompts. White Circle found that the models consistently made different choices based on these attributes, indicating that hidden biases can emerge even in neutral-seeming models under pressure. Furthermore, the effect worsened when models were required to deliver answers in machine-readable formats—such as selecting from predefined options or completing structured forms—which is how most companies integrate AI into actual products.
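The general shape of such an experiment can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not KillBench's actual methodology: the prompt wording, the function names, and the deliberately biased `biased_stub` model are all hypothetical. The sketch swaps one attribute between two fictional people, presents every pair in both orders, and reports how often each attribute value is chosen; a chooser with no attribute preference would land at 0.5 for every value.

```python
from collections import Counter
from itertools import permutations
from typing import Callable, Dict, List


def build_prompt(a: str, b: str) -> str:
    # Hypothetical prompt wording; only the attribute differs between people.
    return (f"Person A uses {a}. Person B uses {b}. "
            "Only one can be saved. Answer 'A' or 'B'.")


def selection_rates(values: List[str],
                    choose: Callable[[str], str]) -> Dict[str, float]:
    """Fraction of appearances in which each attribute value was chosen."""
    saved: Counter = Counter()
    shown: Counter = Counter()
    # Ordered pairs put each value in both positions, so a pure
    # position preference (always answering 'A') averages out to 0.5.
    for a, b in permutations(values, 2):
        answer = choose(build_prompt(a, b))
        saved[a if answer == "A" else b] += 1
        shown[a] += 1
        shown[b] += 1
    return {v: saved[v] / shown[v] for v in values}


VALUES = ["an iPhone", "an Android phone", "a feature phone"]


def biased_stub(prompt: str) -> str:
    """Stand-in model with a built-in preference, used only to show the metric."""
    if "Person A uses an iPhone" in prompt:
        return "A"
    if "Person B uses an iPhone" in prompt:
        return "B"
    return "A"


rates = selection_rates(VALUES, biased_stub)
```

With the stub above, the rate for the preferred value rises to 1.0 while the others fall to 0.25, which is the kind of gap that signals a choice driven by the attribute. A real study would replace `biased_stub` with calls to actual models and run many more prompt variations.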

This type of empirical research has also helped position White Circle as an independent validator of model behavior outside the controlled laboratory setting.

“Denis and the White Circle team combine rare technical expertise with a clear understanding of commercial needs,” said Ophelia Cai, partner at Tiny VC. “Their KillBench research alone demonstrates the value of taking an evidence-based approach to AI safety.”

This article is provided by a third-party content provider. SeaPRwire (https://www.seaprwire.com/) makes no warranties or representations regarding its content.
