The world's first frontier AI regulation is surprisingly thoughtful: the EU's Code of Practice
Only the US can make us ready for AGI, but Europe just made us readier.
We’ve previously written about what an individual can do to make the development of transformative AI less likely to end in disaster. How about an AGI company?1 What steps should they take right now to prepare for crunch time?
The first thing we’d recommend an AGI company do is to coordinate with other companies and with governments to stop the reckless race toward superintelligence. Failing that, our backup recommendation would be for an AGI company to invest in planning and transparency.
We expect that during takeoff, leading AGI companies will have to make high-stakes decisions based on limited evidence under crazy time pressure. As depicted in AI 2027, the leading American AI company might have just weeks to decide whether to hand their GPUs to a possibly misaligned superhuman AI R&D agent they don’t understand. Getting this decision wrong in either direction could lead to disaster. Deploy a misaligned agent, and it might sabotage the development of its vastly superhuman successor. Delay deploying an aligned agent, and you might pointlessly vaporize America’s lead over China or miss out on valuable alignment research the agent could have performed.
Because decisions about when to deploy and when to pause will be so weighty and so rushed, AGI companies should plan as much as they can beforehand to make it more likely that they decide correctly. They should do extensive threat modelling to predict what risks their AI systems might create in the future and how they would know if the systems were creating those risks. The companies should decide before the eleventh hour what risks they are and are not willing to run. They should figure out what evidence of alignment they’d need to see in their model to feel confident putting oceans of FLOPs or a robot army at its disposal.
AGI companies should leave these plans open to revision as they gain more evidence about the trajectory of AI development. But it’s wiser for them to make a plan now rather than improvising one from scratch after the superhuman AI R&D agent is already trained. For the time being, we’re still under a veil of ignorance that prevents powerful actors from knowing what policies will benefit them in particular at crunch time. We should therefore expect them to make a more prosocial plan now than they would make later. We’re also concerned that if companies wait until too late in the game to plan for AGI, they won’t have enough time to consult with important external actors. The leading company’s executives and a small group of government overseers might just have to make a snap decision about how much existential risk it’s acceptable to run, without time to ask Congress or the public for input. The company might be locked down for security to the point where their engineers can no longer run the alignment and control plan by external experts. All of this argues in favor of planning in advance.
Planning for takeoff also includes picking a procedure for making tough calls in the future. Companies need to think carefully about who gets to influence critical safety decisions and what incentives they face. It shouldn’t all be up to the CEO or the shareholders because when AGI is imminent and the company’s valuation shoots up to a zillion, they’ll have a strong financial interest in not pausing. Someone whose incentive is to reduce risk needs to have influence over key decisions. Minimally, this could look like a designated safety officer who must be consulted before a risky deployment. Ideally, you’d implement something more robust, like three lines of defense.
AGI companies should also be transparent to governments about their internal capabilities and security levels. This is because one AGI company on their own cannot do everything that needs to be done for takeoff to go well. We’ll need binding regulation on all American AGI companies to break the race to the bottom on safety. We’ll need to negotiate an international agreement to stop the AGI race between the US and China from escalating into war. And we’ll need to coordinate scarce talent and compute to help the AGI companies tighten their security and execute successfully on their alignment and control plans. This will all ultimately require government intervention.
That intervention is much more likely to be timely and helpful if AGI companies are transparent to officials all along. If government sees capabilities rising in real time, they can prepare to oversee takeoff by building capacity and situational awareness internally. But if AGI companies instead keep government in the dark until they develop a superhuman AI R&D agent, and then give the President a midnight phone call asking for help, government’s response is unlikely to be competent and productive. It’s therefore safer for AGI companies to keep the government informed of their internal capabilities and security levels, even as the gap between internally and externally deployed capabilities grows, and the public loses visibility into frontier AI development.
Up until now, AGI companies have made voluntary commitments on planning and transparency, but they’ve faced no legal obligation to prepare for takeoff, and they’ve only had to be as transparent to government as any random startup. This has changed recently, with the publication of the EU’s GPAI Code of Practice. We think the Code is an incremental but important step toward preparing the world for takeoff. For the first time, it imposes crisp, legally enforceable safety and transparency requirements on AGI companies.
A brief history of AGI companies’ safety commitments
Up until mid 2023, leading AGI companies had made many informal commitments about planning for dangerous capabilities and about transparency. OpenAI’s charter mentioned the need for “adequate safety precautions” during “late-stage AGI development,” and their blog post on Planning for AGI and Beyond called for iterative deployment on the way to AGI, “giv[ing] people, policymakers, and institutions time to understand what’s happening.” OpenAI also suggested that “major world governments” ought to have “insight about training runs above a certain scale.” Google’s AI Principles promised that they would test their models for safety before release according to formal risk assessment frameworks. Anthropic’s Core Views on AI Safety stressed the importance of planning for the arrival of more powerful future AI, saying “it is prudent to do foundational work now to help reduce risks from advanced AI if and when much more powerful systems are developed.” And in the White House Voluntary AI Commitments, these three frontier AI companies plus Meta and Microsoft all agreed to work toward sharing “information on advances in frontier capabilities and emerging risks and threats” with the US government. On the whole, AGI companies were saying many of the right things, but without much specificity.
Then in September 2023, Anthropic became the first AGI company to publish a frontier safety policy. Their original RSP made an attempt at high-level threat modelling, identifying CBRN or cyber misuse and autonomous replication as key paths by which an AI model could cause catastrophe. Anthropic then specified what dangerous capability measurements would convince them that their models posed an elevated risk of causing catastrophe and what precautions they would take if they saw that evidence. Further, Anthropic promised that by the time they developed a model that crossed their first set of dangerous capability thresholds, they would define a second level of capability thresholds and corresponding precautions. Then before crossing the second level, they would define a third level, and so on, so that at every point there’s always a plan for what to do next. The policy stressed that if at any level Anthropic was unable to meet the next level of safety and security requirements, they would refrain from training or deploying a model that passed the next dangerous capability threshold.
In the following year, other leading AGI companies such as OpenAI and Google DeepMind adopted frontier safety policies of their own. No two companies’ policies are exactly alike, and all of them have undergone changes, but they display some common features. As a rule, they all identify specific dangerous capabilities AI models may develop, lay down capability thresholds that would indicate elevated risk, and commit companies to taking specific safety precautions when their models exceed those thresholds. Generally, a frontier safety policy also includes conditions under which a company would stop building or deploying more powerful AI models for fear of catastrophe.
Frontier safety policies (FSPs) are a great first step toward preparing for takeoff, and AGI companies should be applauded for adopting them. But that said, FSPs also suffer from some serious limitations. One is that safety policies are entirely voluntary, and not all frontier AGI companies have chosen to adopt them. For instance, xAI had no official published safety policy until late last month, and most frontier AI companies in China still don’t have safety policies.2 Another important limitation is that safety policies are entirely self-enforced. Companies may promise to honor their FSPs, but they are not legally bound to do so. It’s unclear whether AGI companies will take costly actions like pausing lucrative deployments just because they promised to do so in an obscure PDF five years earlier. Even Anthropic, a company that takes its FSP relatively seriously, has already backpedalled on one of its original commitments when it became inconvenient.
Introducing the GPAI Code of Practice
The state of frontier AI safety changed quietly but significantly this year when the European Commission published the GPAI Code of Practice. The Code is not a new law but rather a guide to help companies comply with an existing EU law, the AI Act of 2024. The Code was written by a team of thirteen independent experts (including Yoshua Bengio) with advice from industry and civil society. It tells AI companies deploying their products in Europe what steps they can take to ensure that they’re following the AI Act’s rules about copyright protection, transparency, safety, and security. In principle, an AI company could break the Code but argue successfully that they’re still following the EU AI Act. In practice, European authorities are expected to apply heavy scrutiny to companies that try to demonstrate compliance with the AI Act without following the Code, so it’s in companies’ best interest to follow the Code if they want to stay on the right side of the law. Moreover, all of the leading American AGI companies except Meta have already publicly indicated that they intend to follow the Code.3
The most important part of the Code for AGI preparedness is the Safety and Security Chapter, which is supposed to apply only to frontier developers training the very riskiest models. The current definition presumptively covers every developer who trains a model with over 10^25 FLOPs of compute unless they can convince the European AI Office that their models are behind the frontier. This threshold is high enough that small startups and academics don’t need to worry about it,4 but it’s still too low to single out the true frontier we’re most worried about. The chairs and vice-chairs who wrote the Code have publicly acknowledged as much, and the European Commission has indicated that they plan to raise the compute threshold over time as the frontier advances. We think this is a wise plan since forcing trailing-edge developers to follow the Safety and Security Chapter could burden them without buying us much security.
Even if the current threshold stays where it is, there’s important language in the Code that ensures it won’t fall too hard on smaller developers. For one, the AI Act exempts models developed purely for research purposes, so academics are in the clear. Commercial developers above the training compute threshold can still make a case to the AI Office that they are behind the frontier and shouldn’t be covered. If their case is accepted, they’re exempt; otherwise, the Code still emphasizes proportionality, meaning that a developer whose best model is farther behind the frontier can get away with lighter safety and security measures. And if your model is weaker than at least one open-weight model, the Code allows you to secure it as loosely as you like. Finally, enforcement of the Code doesn’t start until August 2026, so all companies that will be affected have plenty of time to prepare.
But regardless of precisely where the threshold is placed, genuine AGI companies will have to comply with the Safety and Security Chapter. Once they do, we think this chapter will make AGI companies substantially more prepared for takeoff and much more transparent to EU officials than they are now.
The Code enhances AGI companies’ planning by requiring them to adopt safety and security frameworks similar to but stronger than existing FSPs in several ways. First, the Code requires companies to do more comprehensive threat modelling than any of them have done before. It says companies have to explicitly consider risks from CBRN weapons engineering, offensive cyber, harmful manipulation, and loss of control. This is a major step up since no FSP currently in force considers all four of these risk categories. AGI companies then have to write detailed scenarios and do formal risk modelling for each risk category, something no company has ever done as far as is publicly known. Such extensive threat modelling exercises will help AGI companies understand precisely how their models could cause harm, and that understanding should enable them to make more sensible and grounded plans.
Second, the Code requires AGI companies to get every frontier model evaluated by “adequately qualified independent external evaluators” before deployment, effectively making them build and maintain relationships with external safety experts. This amounts to a kind of emergency preparedness. Companies must identify in advance who they would call for help if they needed to determine whether a model was severely dangerous, and they must practice working together with those experts.
Third, AGI companies will have to assign responsibility for managing severe risks to specific people within their organizations. These internal risk overseers must be granted “appropriate resources” to do their job, they must have some level of independence, and they must be incentivized to correctly estimate risk. We expect it will be hard for EU regulators to tell from the outside whether AGI companies are following the spirit of this provision, just like it’s hard to tell now whether Anthropic’s Responsible Scaling Officer is incentivized in the best way, or whether GDM’s AGI Safety Council is as independent as one would like. In practice, we think AGI companies that don’t yet have safety officers will appoint them because of the Code, and any company that tries to disempower or compromise its safety team will face some healthy scrutiny from the EU.
The Code also improves AGI companies’ transparency on several fronts. First, every time a company wants to place a new frontier model on the EU market, they have to evaluate it rigorously and send the results to the European AI Office within three weeks of deployment.5 These evals need to be “at least state-of-the-art” and they need to include open-ended tests such as red-teaming and human uplift studies. In other words, an AGI company can’t just run a few cheap Q&A benchmarks on their new model and call it a day. Also, their evaluations need to measure the new model’s propensities as well as its capabilities. In particular, AGI companies need to make a sincere effort to evaluate whether models are scheming or strategically undermining evaluations, e.g., by sandbagging. The findings from all these evaluations must then be shared with EU officials, keeping Brussels abreast of capability and propensity trends at the frontier.
Second, an AGI company must forecast when their AI models will exceed the next risk tiers in their framework and share the forecasts with the AI Office. These need to be quantitative forecasts supported by justifications, not just wild-ass guesses. Sharing these forecasts is a big deal for EU officials’ situational awareness. Almost no one is better positioned to predict the course of AI development than the experts inside AGI companies, and those experts are about to start sharing their predictions with the EU.
Third, AGI companies need to tell EU regulators how they’re doing on security and control every time they deploy a new frontier model publicly. The Code requires each company to set an explicit security goal, saying what types of threat actor they aim to be secure against. At minimum, companies must be secure against nonstate external threats and inside threats (roughly RAND SL3), though they’re encouraged to set more ambitious goals. Then a company has to implement reasonable security measures, document those measures, and explain to the EU why they’re sufficient to meet the security goal. This means that if a company is building a 100x AI R&D agent with woefully inadequate SL2 security, the EU will know about it and can punish them for it. Notably, the Code also directs companies to guard against “(self-)exfiltration or sabotage carried out by models,” possibly by applying control measures to their AIs. The AI Office will get to see these measures and check whether they’re sufficient.
Fourth, companies are required to monitor for serious incidents involving their AI models and to report these incidents promptly to authorities. This reporting requirement could make it more likely that we recognize an AI warning shot if one happens. If an AI company discovers that one of their models has self-exfiltrated, facilitated an attack on critical infrastructure, or been stolen by hackers, they must notify both the AI Office and relevant national governments within days. While authorities would obviously know about some incidents—e.g., a cyberattack knocking out power to a whole region—they might have no idea that an AI model was involved without the company’s report. And importantly, some critical incidents might go totally undetected without these reports. For instance, there’s no obvious mechanism by which authorities would learn of a rogue replicating AI unless the company that developed it sounds the alarm.
Finally, the Code also says AGI companies have to share the model spec and system prompt for every new frontier model with the AI Office. We’ve previously argued that it’s good for companies to be transparent with their specs and system prompts, so we’re pleased to see this step in that direction.
All of this planning and transparency required by the Code is only as good as the AGI companies’ execution. What’s to stop them from writing crummy safety and security frameworks and model reports? How is the AI Office supposed to hold them to a high standard? The Code’s general approach is not to set a static, absolute standard that companies have to meet. Instead, it sets a dynamic standard by requiring companies’ FSPs, model evals, risk estimation, and elicitation techniques all to be “at least state-of-the-art.” Roughly, this means that a frontier developer’s safety practices always need to be as good as its industry peers’ practices or better. We hope that this language will create a healthy ratchet effect, where every time one AGI company improves its safety practices, the EU can force all other frontier companies to improve in the same way.
Will the Code matter at crunch time?
It’s great that the Code of Practice makes AGI companies do some sensible things now, but you might wonder whether it will actually matter later in the timeline, when the stakes are higher and the EU has less leverage. The EU’s main tools for enforcing the AI Act are its power to fine and its control of the European market. Break the Act, and the European Commission can fine you up to 3% of your revenue or even block you from serving your models in Europe if your breach was especially egregious. Right now, AGI companies still care about not getting fined and make lots of money selling their services to European businesses and consumers, so they have a strong incentive to play nice with the Commission. But we expect this to change as the companies get closer to AGI. As the models scale up, they’ll get vastly more expensive to serve without becoming much more performant in mundane use cases, so it will make less commercial sense to serve frontier models to the public. Also, the opportunity cost of serving a model publicly will rise when it becomes possible to accelerate AI R&D by deploying the model internally instead. Both of these effects will push toward fewer frontier models released in the EU.
No more frontier models released in Europe means no more model reports submitted to the AI Office, so most of the transparency provided by the Code of Practice goes away. The EU will also lose most of its leverage to stop AGI companies from breaking or watering down their safety policies once those companies aren’t afraid of the fines and no longer care about their access to the European market. The Commission can try to fine a company, but the maximum fine would be small for a company that’s going for broke on AGI and barely bothering to ship products or make revenue. Companies might simply refuse to pay, perhaps claiming immunity from the Code on national security grounds.6 They might even pull out of the EU altogether at crunch time, leaving the European Commission with virtually no leverage left.7
Yet even if the EU is mostly powerless to enforce the Code of Practice at crunch time, some of the measures AGI companies previously put in place to comply with the Code may prove sticky. Companies will have no reason to throw away their threat modelling, detailed scenarios, and risk estimation just because they’re no longer bound by the Code. As they grow more afraid of their own models, the companies will be grateful that they built up competent safety teams to comply with the Code, and they’ll voluntarily turn to those teams all the more. Some of them will keep their partnerships with independent evaluators going as long as security restrictions allow them to, and maybe even pull staff from orgs like METR or Apollo into the Project if it becomes impossible to keep working with them externally. And the security and control measures a company implemented to achieve their security goal won’t automatically disappear the moment they stop caring about the AI Act.
The Code’s transparency requirements also make it somewhat more likely that the EU—and maybe also the US government and the public—are aware enough to make wise decisions at crunch time. The European Commission will know what risks AGI companies find most worrying, when the companies predict those risks will arise, how robust each company’s security is, and much more. Some of this information will also have been shared with the public, since the Code tells AGI companies to publish their safety frameworks and model reports “if and insofar as necessary to assess and/or mitigate systemic risks.” And maybe most importantly, the Code ensures that AGI companies write critical documentation now so that it will be available to the US government in an emergency. If there were no Code, the companies might not bother to systematically document their security measures, control techniques, and capability forecasts, so if the US government urgently requested this information—either through legislation or through executive action—the companies would waste time scrambling to collect it. But thanks to the Code, the critical documents will already have been written for the AI Office, and they’ll be ready to go if the US requests them.
Building on the Code
We’re pleased with the GPAI Code of Practice and consider it a win for humanity. Still, it has shortcomings, the most notable of which is that it’s not an American law. Only the AGI companies’ home governments will realistically be able to enforce regulations on them all the way through takeoff because no other government will have a sufficiently big legal stick to threaten them. All the leading AGI labs are in the US or China, so European regulation can only do so much down the stretch.
Second, the Code doesn’t do as much as one might wish for transparency into internal deployments. For the reasons cited above (and in AI 2027), we predict that AGI companies will move away from deploying their models to the public so they can allocate more GPU hours to internal automated AI R&D. If such an internal deployment were going on, it could be extremely risky, but European officials probably wouldn’t know anything about it since companies aren’t required to file reports for models they don’t place on the EU market.8
Third, employees within AGI companies—even those based in the EU—don’t get any new whistleblower protections under the AI Act. Since it looks plausible that the outside world will first hear about dangerous things going on inside AGI companies from whistleblowers, we’d prefer for them to be protected more extensively.
One more shortcoming is that the Code does relatively little for public transparency. It requires an AGI company to write a safety and security framework and to share it with the AI Office, but they don’t have to publish it. Similarly, every time an AGI company releases a new AI model, they have to send a model report to the AI Office, but they are not strictly required to share the report with consumers using the model. This is far from ideal. Surely AGI companies shouldn’t have to publish everything they disclose to government authorities—e.g., to protect IP or state secrets—but they shouldn’t be allowed to keep the public fully in the dark either. We call upon AGI companies to publish their safety and security frameworks and model reports, justifying any redactions they may have made, and we hope that future regulations will require them to do so.
We would like to see the US (and China) pass regulations that mirror the best parts of the GPAI Code of Practice and improve upon its weak points. Several states are already considering bills that would require basic planning and transparency from AGI companies. California’s Senate Bill 53 would require large AI companies to publish FSPs, publish model cards for their publicly deployed AI models, and report serious incidents involving their models to state officials. The Bill goes beyond the GPAI Code of Practice in strengthening whistleblower protections for AGI company employees and in requiring AGI companies to share their FSPs and model documentation with the public, not just with regulators. New York’s proposed RAISE Act would also make large AI companies publish safety policies and report serious incidents to authorities.
These state bills do many of the right things, but there are limits to what state-level regulation can achieve. To ensure that the AGI companies prepare for takeoff and maintain adequate transparency with government, we’ll need federal regulation along the lines of the GPAI Code. And this is just the first step. To avoid an unacceptably high chance of disaster, we’ll need government to do much more than enforce transparency. Our next scenario and essay series will explain in detail what we want government to do—stay tuned.
By “AGI company,” we mean an AI company that’s on course to be among the first to develop AGI.
The one notable exception is Shanghai AI Laboratory, which has an extremely detailed FSP.
xAI only pledged to follow the Code’s Safety and Security Chapter, but as we’re about to explain, this is by far the most important chapter of the Code. Also note that Meta and xAI still have to comply with the EU AI Act as long as they do business in the EU. Their refusals to sign the full Code of Practice just mean that they will have to demonstrate compliance with the Act by some other means.
A 10^25 FLOP training run is estimated to cost at least millions of dollars, beyond small developers’ means.
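For a rough sense of where an estimate like this comes from, here is a hedged back-of-envelope sketch in Python. The GPU throughput, utilization rate, and rental price below are our own illustrative assumptions, not figures from the Code or the AI Act.

```python
# Back-of-envelope cost of a 10^25 FLOP training run.
# All constants are illustrative assumptions, not official figures.

TOTAL_TRAINING_FLOP = 1e25      # the Code's presumptive compute threshold
GPU_PEAK_FLOP_PER_S = 1e15      # ~1 petaFLOP/s, roughly an H100-class GPU at BF16 (assumed)
UTILIZATION = 0.4               # assumed fraction of peak throughput achieved in practice
USD_PER_GPU_HOUR = 2.0          # assumed cloud rental price

flop_per_gpu_hour = GPU_PEAK_FLOP_PER_S * UTILIZATION * 3600
gpu_hours = TOTAL_TRAINING_FLOP / flop_per_gpu_hour
cost_usd = gpu_hours * USD_PER_GPU_HOUR

print(f"GPU-hours: {gpu_hours:,.0f}")   # ~7 million GPU-hours
print(f"Cost: ${cost_usd:,.0f}")        # ~$14 million under these assumptions
```

Even with more generous assumptions about hardware efficiency or pricing, the total stays in the millions of dollars, which is why a 10^25 FLOP threshold screens out small developers.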
This three-week grace period is actually better for transparency than requiring simultaneous model report submission. AGI companies rushing to claim SOTA and demonstrate rapid progress would pressure their safety teams to write hasty, uninformative reports if the writing process delayed deployment. The grace period instead lets companies deploy immediately while giving their safety teams three weeks to write comprehensive reports.
The Code already has an explicit carve-out for companies to withhold parts of their model reports from the EU AI Office if national security laws require it.
The EU currently controls less than 7% of global AI compute, so the AGI companies don’t especially need European datacenters. They’re currently somewhat reliant on European talent, with most of the AGI companies maintaining offices in the EU. But this won’t matter much once the companies have superhuman AI R&D agents.
The AI Office might figure out that a company was using a secret model for internal AI R&D by piecing together indirect evidence, including the developer’s own risk forecasts.