AI 2027 was a descriptive forecast. Our next big project will be prescriptive: a scenario showing roughly how we think the US government should act during AI takeoff, accompanied by a “policy playbook” arguing for these recommendations.
One reason we’re producing a scenario alongside our playbook at all—as opposed to presenting our policies only as abstract arguments—is to stress-test them. We think many policy proposals for navigating AGI fall apart under scenario scrutiny—that is, if you try to write down a plausible scenario in which that proposal makes the world better, you will find that it runs into difficulties.1 The corollary is that scenario scrutiny can improve proposals by revealing their weak points.2
To illustrate this process and the types of weak points it can expose, we’re about to give several examples of AI policy proposals and ways they could collapse under scenario scrutiny. These examples are necessarily oversimplified, since we don’t have the space in this blog post to articulate more sophisticated versions, much less subject them to serious scrutiny. But hopefully these simple examples illustrate the idea and motivate readers to subject their own proposals to more concrete examination.
With that in mind, here are some policy weaknesses that scenario scrutiny can unearth:
Applause lights. The simplest way that a scenario can improve an abstract proposal is by revealing that it is primarily a content-free appeal to unobjectionable values. Suppose that someone calls for the democratic, multinational development of AGI.3 This sounds good, but what does it look like in practice? The person who says this might not have much of an idea beyond “democracy good.” Having them try to write down a scenario might reveal this fact and allow them to then fill in the details of their actual proposal.
Bad analogies. Some AI policy proposals rely on bad analogies. For example, technological automation has historically led to increased prosperity, with displaced workers settling into new types of jobs created by that automation. Applying this argument to AGI straightforwardly leads to “the government should just do what it has done in previous technological transitions, like re-skilling programs.” However, if you look past the labels and write down a concrete scenario in which general, human-level AI automates all knowledge work… what happens next? Perhaps displaced white-collar workers migrate to blue-collar work or to jobs where it matters that the work is done specifically by a human.4 Are there enough such jobs to absorb these workers? How long does it take the automated researchers to solve robotics and automate the blue-collar work too? What are the incentives of the labs that are renting out AI labor? We think reasoning in this way will reveal ways in which AGI is unlike previous technologies, such as its ability to do the very jobs that humans are supposed to migrate to, making “re-skilling” a bad proposal.
Uninterrogated consequences. Abstract arguments can appeal to incompletely explored concepts or goals. For example, a key part of many AI strategies is “beat China in an AGI race.” However, as Gwern asks,
“Then what? […] You get AGI and you show it off publicly, Xi Jinping blows his stack as he realizes how badly he screwed up strategically and declares a national emergency and the CCP starts racing towards its own AGI in a year, and… then what? What do you do in this 1 year period, while you still enjoy AGI supremacy? You have millions of AGIs which can do… ‘stuff’. What is this stuff?
“Are you going to start massive weaponized hacking to subvert CCP AI programs as much as possible short of nuclear war? Lobby the UN to ban rival AGIs and approve US carrier group air strikes on the Chinese mainland? License it to the CCP to buy them off? Just… do nothing and enjoy 10%+ GDP growth for one year before the rival CCP AGIs all start getting deployed? Do you have any idea at all? If you don’t, what is the point of ‘winning the race’?”
A concrete scenario demands concrete answers to these questions, by requiring you to ask “what happens next?” By default, “win the race” does not.

Optimistic assumptions and unfollowed incentives. There are many ways for a policy proposal to secretly rest upon optimistic assumptions, but one particularly important way is that, for no apparent reason, a relevant actor doesn’t follow their incentives. For example, upon proposing an international agreement on AI safety, you might forget that the countries—which would be racing to AGI by default—are probably looking for ways to break out of it! A useful frame here is to ask: “Is the world in equilibrium?” That is, has every actor already taken all actions that best serve their interests, given the actions taken by others and the constraints they face?5 Asking this question can help shine a spotlight on untaken opportunities and ways that actors could subvert policy goals by following their incentives.6
Relatedly, a scenario is readily open to “red-teaming” through “what if?” questions, which can reveal optimistic assumptions and their potential impacts if broken.7 Such questions could be: What if alignment is significantly harder than I expect? What if the CEO secretly wants to be a dictator? What if timelines are longer and China has time to indigenize the compute supply chain?
Inconsistencies. Scenario scrutiny can also reveal inconsistencies, either between different parts of your scenario or between your policies and your predictions. For example, when writing our upcoming scenario, we wanted the US and China to agree to a development pause before either reached the superhuman coder milestone. At this point, we noticed a problem: a robust agreement would be much more difficult without verification technology, and much of this technology did not exist yet! We then went back and included an “Operation Warp Speed for Verification” earlier in the story. Concretely writing out our plan changed our current policy priorities and made our scenario more internally consistent.
Missing what’s important. Finally, a scenario can show you that your proposed policy doesn’t address the important bits of the problem. Take AI liability for example. Imagine the year is 2027, and things are unfolding as AI 2027 depicts. America’s OpenBrain is internally deploying its Agent-4 system to speed up its AI research by 50x, while simultaneously being unsure if Agent-4 is aligned. Meanwhile, Chinese competitor DeepCent is right on OpenBrain’s heels, with internal models that are only two months behind the frontier. What happens next? If OpenBrain pushes forward with Agent-4, it risks losing control to misaligned AI. If OpenBrain instead shuts down Agent-4, it cripples its capabilities research, thereby ceding the lead to DeepCent and the CCP. Where is liability in this picture? Maybe it prevented some risky public deployments earlier on. But, in this scenario, what happens next isn’t “Thankfully, Congress passed a law in 2026 subjecting frontier AI developers to strict liability, and so…”
For this last example, you might argue that the scenario under which this policy was scrutinized is not plausible. Maybe your primary threat model is malicious use, in which those who would enforce liability still exist for long enough to make OpenBrain internalize its externalities. Maybe it’s something else. That’s fine! An important part of scenario scrutiny as a practice is that it allows for concrete discussion about which future trajectories are more plausible, in addition to which concrete policies would be best in those futures. However, we worry that many people have a scenario involving race dynamics and misalignment in mind and still suggest things like AI liability.
To this, one might argue that liability isn’t trying to solve race dynamics or misalignment; instead, it solves one chunk of the problem, providing value on the margin as part of a broader policy package. This is also fine! Scenario scrutiny is most useful for “grand plan” proposals. But we still think that marginal policies could benefit from scenario scrutiny.8
The general principle is that writing a scenario by asking “what happens next, and is the world in equilibrium?” forces you to be concrete, which can surface various problems that arise from being vague and abstract. If you find you can’t write a scenario in which your proposed policies solve the hard problems, that’s a big red flag.
Conversely, being able to write out a plausible scenario in which your policy is good isn’t enough for the policy to be good overall. But it’s a bar that we think proposals should meet.
As an analogy: just because a firm bidding for a construction contract submitted a blueprint of their proposed building, along with a breakdown of the estimated costs and calculations of structural integrity, doesn’t mean you should award them the contract! But it’s reasonable to make this part of the submission requirements, precisely because it allows you to more easily separate the wheat from the chaff and identify unrealistic plans. Given that plans for the future of AI are—to put it mildly—more important than plans for individual buildings, we think that scenario scrutiny is a reasonable standard to meet.
While we think that scenario scrutiny is underrated in policy, there are a few costs to consider:
Getting hung up on specifics. A scenario does not make clear which parts are high-confidence and which are low-confidence; it is awkward to write “Then, with 38% probability, the United States nationalizes AGI,” and scenarios by their very nature pick one path through the future. A scenario also does not make clear which details are load-bearing and which are not, potentially dragging debates off into minutiae. Supplemental materials, expandable boxes, and copious footnotes, as we provided with AI 2027, can help with both of these issues.
Information density. Abstract arguments can be condensed into high-level principles that a policymaker can read at a glance, whereas a scenario takes more time to read. A scenario may still be worth it in the long run, since it is also more engaging and easier to understand. But, for a time-pressed policymaker, it’s important to provide a clear list of high-level ideas that can be quickly scanned.
Illusory confidence. It is possible to write down a scenario and, having been more concrete, become more confident in your views while still remaining confused about bits of the story that you glossed over. Some degree of this is probably unavoidable, but 1) scenario scrutiny still reduces confusion more than abstract arguments do, and 2) external reviewers (and readers like you!) can help pinpoint remaining confusions.
Anchoring too much on a particular scenario. Perhaps the biggest risk of writing a scenario is anchoring too much on it. Since the future is hard to predict, your best guess might still not be that likely. As such, it’s important to propose policies robust enough to also work in many other plausible worlds.9 This is part of what motivates us to write many different scenarios with different plausible initial conditions and branch points, in order to adequately cover the tree of future possibilities.
So, if you have policy proposals to make advanced AI go well, we challenge you to articulate them and then subject them to scenario scrutiny!10 Then, scenarios in hand, we can have a serious conversation about the likelihoods of various futures and the pros and cons of various policy responses.
1. Consider the parallel to wargaming. In planning an invasion, a general might propose “Army group A will land here and take this beach and then take this city by this date. Army group B will reinforce them via the harbor and then push south to that city by that date, whereas A will dig in to repulse the anticipated counterattack coming from the west.” Isn’t it a good idea, when generals are deciding how to conduct a war, for competing plans to be spelled out at something like this level of detail? Compare this to a vaguer plan like “We’ll land everyone here and then drive inwards to the capital.” War plans should make some attempt to take into account how the enemy might react; similarly, AI policy plans should make some attempt to take into account the various problems that might arise and the various ways actors like the USG, the companies, or the CCP might react.
2. Scenario scrutiny is closely related to but distinct from scenario planning. The latter serves more to explore the space of possibilities and plan for different contingencies, while the former validates and strengthens individual plans.
3. This isn’t to imply that all suggestions for democratic multinational development are applause lights, merely that it’s possible for such proposals (and others that invoke good-sounding words) to be applause lights.
4. For example, Sam Altman claimed in a recent interview that the new jobs after superintelligence will be “very people-centric.”
5. Actors might not always act optimally in their own interests, e.g. due to coordination failures, ignorance, or not considering all possibilities. But if the world is not in equilibrium, the reasons for this should be explicit.
6. One reason our tabletop exercise is useful is that it produces scenarios that are roughly in equilibrium, since each participant is following the incentives and goals of the role they’ve been assigned.
7. Neil Chilson of the Abundance Institute has previously explored using scenarios to red-team AI legislation.
8. In other words, if you game out your default picture of the future and then insert your marginal policy, what happens? If it doesn’t change the ultimate outcome, that’s okay, because it’s just supposed to be a marginal improvement. Instead, think of a scenario in which your marginal improvement matters, and then ask yourself: How likely is this scenario? How much did I need to contort the mainline scenario into a pretzel, to get it into a version where my proposal made a difference?
9. Furthermore, writing a concrete scenario might lead you to overestimate its likelihood, since, having written it in painstaking detail, it’s easier to imagine it happening compared to other scenarios you haven’t written. In psychology, this is called the simulation heuristic—people rate easily imagined events as more likely.
10. Of course, it doesn’t have to be as detailed as our scenario—the more detail the better, but even a one-page scenario is better than nothing.


