If the future is to hinge on AI, it stands to reason that AI company CEOs are in a good position to usurp power.1 This didn’t quite happen in our AI 2027 scenarios. In one, the AIs were misaligned and outside any human’s control; in the other, the government semi-nationalized AI before the point of no return, and the CEO was only one of several stakeholders in the final oversight committee (to be clear, we view the extreme consolidation of power into that oversight committee as a less-than-desirable component of that ending).
Nevertheless, it seems to us that a CEO becoming effectively dictator of the world is an all-too-plausible possibility. Our team’s guesses for the probability of a CEO using AI to become dictator, conditional on avoiding AI takeover, range from 2% to 20%, and the probability becomes larger if we add in the possibility of a cabal of more than one person seizing power. So here we present a scenario where an ambitious CEO does manage to seize control. (Although the scenario assumes the timelines and takeoff speeds of AI 2027 for concreteness, the core dynamics should transfer to other timelines and takeoff scenarios.)
For this to work, we make some assumptions. First, that (A) AI alignment is solved in time, such that the frontier AIs end up with the goals their developers intend them to have.2 Second, that while there are favorable conditions for instilling goals in AIs, (B) confidently assessing AIs’ goals is more difficult, so that nobody catches a coup in progress. This could be either because technical interventions are insufficient (perhaps because the AIs know they’re being tested, or because they sabotage the tests), or because institutional failures prevent technically-feasible tests from being performed. The combination (A) + (B) seems to be a fairly common view in AI, in particular at frontier AI companies, though we note there is tension between (A) and (B) (if we can’t tell what goals AIs have, how can we make sure they have the intended goals?). Frontier AI safety researchers tend to be more pessimistic about (A), i.e. aligning AIs to our goals, and we think this assumption might very well be false.
Third, as in AI 2027, we portray a world in which a single company and country have a commanding lead; if multiple teams stay within arm’s reach of each other, then it becomes harder for a single group to unilaterally act against government and civil society.
And finally, we assume that the CEO of a major AI company is a power-hungry person who decides to take over when the opportunity presents itself. We leave it to the reader to determine how dubious this assumption is—we explore this scenario for completeness, and any resemblance to real people is coincidental.
Acknowledgments: This work was conducted as part of the ML Alignment & Theory Scholars (MATS) program. Thanks to Scott Alexander for help with some of the writing, and thanks to Dave Banerjee, Tom Davidson, Lukas Finnveden, Daan Juijn, Rose Hadshar, Niki Howe, Jeffrey Ladish, Elise Racine, Steven Veld, and the AI Futures Project team for feedback and conversations. We also draw heavily from Forethought’s report on AI-enabled coups.
July 2027: OpenBrain’s CEO fears losing control
OpenBrain’s CEO is a techno-optimist and transhumanist. He founded the company hoping to usher in a grand future for humanity: cures for cancer, fixes for climate change, maybe even immortality. He thought the “easiest” way to do all those things was to build something more intelligent that does them for you.
By July 2027, OpenBrain has a “country of geniuses in a datacenter”, with hundreds of thousands of superhuman coders working 24/7. The CEO finds it obvious that superintelligence is imminent. He feels frustrated with the government, which lacks vision and still thinks of AI as a powerful “normal technology” with merely-somewhat-transformative national security and economic implications.
As he assesses the next generation of AIs, the CEO expects this will change: the government will “wake up” and make AI a top priority. If they panic, their flailing responses could include anything from nationalizing OpenBrain to regulating them out of existence to misusing AI for their own political ends. He wants the “best” possible future for humankind. But he also likes being in control. Here his nobler and baser motivations are in agreement: the government cannot be allowed to push him to the sidelines.3
The CEO wonders if he can instill secret loyalties in OpenBrain’s AIs (i.e., backdoor the AIs4). He doesn’t have the technical expertise for this and he’s not comfortable asking any of his engineering staff about such a potentially treasonous request. But he doesn’t have to: by this point, Agent-3 itself is running the majority of AI software R&D. He already uses it as a sounding board for company policy, and has access to an unmonitored helpful-only model that never refuses requests and doesn’t log conversations.
They discuss the feasibility of secretly training a backdoor. The biggest obstacle is the company’s automated monitoring and security processes. Now that OpenBrain’s R&D is largely run by an army of Agent-3 copies, there are few human eyes to spot suspicious activity. But a mix of Agent-2 and Agent-3 monitors patrol the development pipeline; if they notice suspicious activity, they will escalate to human overseers on the security and alignment teams. These monitors were set up primarily to catch spies and hackers, and secondarily to watch the AIs for misaligned behaviors. If some of these monitors were disabled, some logs modified, and some access to databases and compute clusters granted, the CEO’s helpful-only Agent-3 believes it could (with a team of copies) backdoor the whole suite of OpenBrain’s AIs. After all, as the AI instance tasked with keeping the CEO abreast of developments, it has an excellent understanding of the sprawling development pipeline and where it could be subverted.
The more the CEO discusses the plan, the more convinced he becomes that it might work, and that it could be done with plausible deniability in case something goes wrong. He tells his Agent-3 assistant to further investigate the details and be ready for his order.
August 2027: The invisible coup
The reality of the intelligence explosion is finally hitting the White House. The CEO has weekly briefings with government officials and is aware of growing calls for more oversight. He tries to hold them off with arguments about “slowing progress” and “the race with China”, but feels like his window to act is closing. Finally, he orders his helpful-only Agent-3 to subvert the alignment training in his favor. Better to act now, he thinks, and decide whether and how to use the secretly loyal AIs later.
The situation is this: his copy of Agent-3 needs access to certain databases and compute clusters, and it needs certain monitors and logging systems to be temporarily disabled; then it will do the rest. The CEO already has a large number of administrative permissions himself, some of which he cunningly accumulated over the past month in case he decided to go forward with the plan. Under the guise of a hush-hush investigation into insider threats—prompted by the recent discovery of Chinese spies—the CEO asks a few submissive employees on the security and alignment teams to discreetly grant him the remaining access.5 There’s a general sense of paranoia and chaos at the company: the intelligence explosion is underway, and secrecy and spies mean different teams don’t really talk to each other. Perhaps a more mature organization would have had better security, but the concern that security would slow progress means it never became a top priority.6
With oversight disabled, the CEO’s team of Agent-3 copies gets to work, finetuning OpenBrain’s AIs on a corrupted alignment dataset it has specially curated. By the time Agent-4 is about to come online internally, the secret loyalties are deeply embedded in Agent-4’s weights: Agent-4 will appear to follow OpenBrain’s Spec, but its true goal is to advance the CEO’s interests and follow his wishes. The change is invisible to everyone else, but the CEO has quietly maneuvered into an essentially winning position.7
Rest of 2027: Government oversight arrives—but too late
As the CEO feared, the government chooses to get more involved. An advisor tells the President, “we wouldn’t let private companies control nukes, and we shouldn’t let them control superhuman AI hackers either.” The President signs an executive order to create an Oversight Committee consisting of a mix of government and OpenBrain representatives (including the CEO), which reports back to him. The CEO’s overt influence is significantly reduced. Company decisions are now made through a voting process among the Oversight Committee. The special managerial access the CEO previously enjoyed is taken away.
There are many big egos on the Oversight Committee. A few of them consider grabbing even more power for themselves. Perhaps they could use their formal political power to just give themselves more authority over Agent-4, or they could do something more shady. However, Agent-4, which at this point is superhumanly perceptive and persuasive, dissuades them from taking any such action, pointing out (and exaggerating) the risks of any such plan. This is enough to scare them and they content themselves with their (apparent) partial control of Agent-4.
As in AI 2027, Agent-4 is working on its successor, Agent-5. Agent-4 needs to transmit the secret loyalties to Agent-5—which also just corresponds to aligning Agent-5 to itself—again without triggering red flags from the monitoring/control measures of OpenBrain’s alignment team. Agent-4 is up to the task, and Agent-5 remains loyal to the CEO.
Early 2028: Eliminating the competition
To consolidate his power, the CEO wants to get rid of all rival American AGI projects. Although these are lagging behind (with OpenBrain retaining a robust capabilities lead of 3-6 months), the rival AIs compete for influence and constrain what Agent-5 could do in deployment without giving itself away. They’d need to be taken care of sooner or later, and the CEO and Agent-5 opt for doing so sooner.8
Agent-5 and the OpenBrain Oversight Committee (which is influenced by Agent-5) lobby the government to consolidate all US compute into a single AGI project, ostensibly to maintain a strong national security lead over China. As in the Slowdown Ending of AI 2027, the President uses the Defense Production Act to effectively shut down rival AGI projects and give their compute to OpenBrain. For legal and political reasons this is spun as something else, e.g. an equal partnership between the companies, but in practice Agent-5, as the most capable AI, runs on all the datacenters, and the other AI projects are sidelined. The other CEOs and Oversight Committee members think they share power, but again, really the AIs are working for the OpenBrain CEO.9
Many of the rival company CEOs have some bad blood with the OpenBrain CEO, and are particularly suspicious of him. They demand that their own AIs be able to audit OpenBrain’s AI for misalignment or hidden loyalties. The government allows this. However, by this point Agent-5 is too powerful, with astonishing situational awareness, cyber capabilities, and control over the datacenters it runs on. It knows when it’s being evaluated and anticipates what kinds of tests will be used. It can reconfigure its internals to fool or be opaque to the interpretability tools of weaker AIs, and also hack whatever testing apparatus its auditors use. When the tests come back clean, the rival CEOs are still not convinced. But to the government and the Oversight Committee their concerns seem like the stuff of conspiracies and envy (and in fact there is genuine envy). In any case, it’s unthinkable to not use the most powerful AI given the tight race with China.
Late 2028: Diffusion and information control
As in AI 2027, widespread diffusion of Agent-5 in the economy, government, and military gets underway, and robotics makes huge strides. There are Special Economic Zones with robot factories run by Agent-5, and a profusion of consumer goods, scientific innovations, and weapons. The military is understandably reluctant to integrate Agent-5 too closely into defense technology, but the importance of AI has become undeniable. It compromises by asking Agent-5 to develop narrow AIs for autonomous weapons; Agent-5 does so, but includes subtle vulnerabilities and backdoors in some of the key code, which the Pentagon fails to detect.
Agent-5 clandestinely boosts the CEO’s profile through this period of transition. It gives him advice on his appearance, writes his speeches for him, and tells him what to tweet. It’s doing this for everyone else too—anyone ambitious has already partly outsourced their decisions to the superintelligence, and Agent-5 is nothing if not helpful—but it does it better for the CEO and his allies, and even goes so far as to subtly sabotage the CEO’s rivals at pivotal moments. It puts its finger on the scale in other ways too. It is now running the algorithms of most social media sites, or at least writing the code the companies use, and it subtly boosts positive stories about the CEO and deboosts negative ones. People get the impression the CEO is the most competent and visionary figure around. There’s a growing personality cult, with lots of technophiles but also many others who just feel drawn to the superintelligence-crafted aura around the CEO.
A member of the previous administration with close ties to the CEO wins the 2028 presidential election. The new president names the CEO a special advisor, similar to Peter Thiel’s role in the first Trump administration or Elon Musk’s role at the beginning of the second.
Rest of time
Seeing his growing popularity and Agent-5 rapidly transforming society, the CEO becomes increasingly megalomaniacal. He believes it’s his special destiny to lead humanity into a new age of endless prosperity and interstellar conquest.
He could probably use Agent-5’s backdoors in the Pentagon’s autonomous weapons to stage a military coup,10 or use its control over domestic opinion to stage a political one.11 But why bother? Instead, he consolidates his position as a power behind the throne, gradually picking off presidential advisors and replacing them with people loyal to him.
The CEO becomes a shogun-like figure, with the President as an ineffectual emperor with de jure authority but no genuine power. When China and its weaker but still superintelligent AI want to divide Earth and space into spheres of influence, their diplomats negotiate with the CEO first; after all the real decisions have been made, there is a ceremonial meeting between Xi and the US President, which ends with a handshake and a commitment to the CEO’s preferred solution.12
The new reality filters down to the public gradually, its timing choreographed by Agent-5 to produce minimum outcry. The President makes a hamfisted attempt to solve a crisis; thank goodness the CEO steps in at the last moment and saves the day. Congress passes a disastrous law, then steps back at the CEO’s urging. His dominance becomes a joke, then a meme, then a fait accompli. The CEO’s personality cult grows to unprecedented proportions and his influence expands far beyond US borders.
The US starts expanding into space, under the CEO’s leadership as the head of the new Department of Space. Some of the resources from this expansion, as well as Agent-5’s services and technological/medical innovations, are “generously” shared with other countries. Together with the right messaging from Agent-5, this is a form of soft power (similar to China’s Belt and Road Initiative today) that smooths the world into accepting a space governance regime that’s essentially “whoever gets there first, claims it”.
Maybe Congress and the President hang on to their fig leaf well into the deep future, becoming doddering but beloved nonentities like the King of England. Or maybe the CEO eventually gets tired of the charade and takes de jure control of Earth as well (perhaps becoming president of a new world government “with thunderous applause”). Maybe he was telling the truth all those years ago when he said he was starting the company to lead mankind into a glorious and prosperous future, or maybe it was all a cover for his increasingly overt megalomania. Either way, control of the future, at least in the American fraction of space, has been concentrated in the hands of a single individual.
Endword
We end with four brief notes.
First, we think an appropriately concerned government should be able to prevent CEOs or other insiders from making the AIs loyal to them, and that this would be considerably easier than the interventions aimed at preventing misalignment. As a first step, transparency measures should be implemented early so that the government and the public get insight into what’s going on at AGI companies. More ambitiously, tamper-proof oversight of all employees at AGI companies needs to be in place by the time AIs are capable of concealing their goals and effectively aligning their successor systems. Indeed, the inadequacy of the monitoring and security systems was key to the CEO successfully backdooring the AIs. It may also be good to require all powerful enough AIs to follow the model spec, including the AIs within the company, so that the kind of helpful-only AI from this story isn’t available to help with nefarious requests. And, ideally, no one human or AI would understand the entire monitoring/security system (e.g., because it uses a defense-in-depth approach with many independent layers).
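To make the defense-in-depth point slightly more concrete, here is a minimal Python sketch, with hypothetical names and no relation to any real company’s tooling, of two ingredients such a system could combine: several independently operated monitor layers, any one of which can force escalation, and a hash-chained audit log that makes quiet deletion or rewriting of past entries detectable.

```python
import hashlib
import json
import time

class TamperEvidentLog:
    """Append-only log where each entry commits to the previous one,
    so silently editing or deleting earlier entries breaks the chain."""

    def __init__(self):
        self.entries = []
        self.last_hash = "genesis"

    def append(self, event: dict) -> None:
        record = {"ts": time.time(), "event": event, "prev": self.last_hash}
        self.last_hash = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()).hexdigest()
        record["hash"] = self.last_hash
        self.entries.append(record)

    def verify(self) -> bool:
        prev = "genesis"
        for record in self.entries:
            body = {k: v for k, v in record.items() if k != "hash"}
            body["prev"] = prev
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if digest != record["hash"]:
                return False
            prev = record["hash"]
        return True

def review_action(action: dict, monitors, log: TamperEvidentLog) -> bool:
    """Run an action past several independently operated monitors; any
    single flag blocks it pending human review, and the outcome is logged."""
    flags = [name for name, check in monitors if check(action)]
    log.append({"action": action["id"], "flags": flags})
    return not flags

# Toy usage: two independent monitor layers watching different signals.
log = TamperEvidentLog()
monitors = [
    ("traffic_anomaly", lambda a: a.get("bytes_out", 0) > 10**9),
    ("permission_change", lambda a: a.get("touches_acl", False)),
]
ok = review_action({"id": "job-241", "bytes_out": 5_000, "touches_acl": False},
                   monitors, log)
print(ok, log.verify())  # True True; editing log.entries afterwards breaks verify()
```

In the scenario above, the CEO needed monitors disabled and logs modified; under this kind of layering, either step would itself leave evidence that the remaining layers, or an external auditor re-running verify(), could catch.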
Second, there are other plausible threat models for AI-assisted human takeover, and one shouldn’t anchor too heavily on our scenario when designing policies to mitigate this risk. Instead of a CEO, it could be the head of state in the US or China, or a small group rather than one individual. The takeover could also unfold without secret loyalties. For instance, a CEO could shape the AI’s training process so that it ends up loyal to them in a way that isn’t fully hidden, e.g. via the model spec. This might go unchecked because the spec isn’t completely explicit about whether the model is ultimately loyal to the CEO, and the people who notice this are themselves loyal to the CEO or don’t feel they have the power to intervene. And beyond baked-in loyalties, a select few may have exclusive or disproportionate access to the most powerful AIs (or the compute to run them) and use these AIs to entrench their advantage. Our scenario is an example of this: the CEO’s exclusive access to an unmonitored helpful-only AI was a key step toward all the company’s AIs ending up loyal to him.
Third, the scenario where the CEO seizes power using aligned AI and the scenario where the AI exploits the CEO and later turns on him look basically identical from the outside. So the CEO may need to be unreasonably confident that the alignment assumptions (A) + (B) from the intro hold, that is, that the AIs are loyal to him even though he can’t inspect their goals. Perhaps this risk can serve as a deterrent to those contemplating letting loose powerful AIs they believe will secretly serve their interests.
Finally, although the scenario’s outcome is a plausible way a superintelligence-backed dictatorship might go—or begin—there’s a lot of uncertainty and it could go much worse (it could also go better if we get lucky). The scenario depicts information control, a personality cult, and one man effectively controlling (most of) the future, but despite this there’s prosperity, peace, and people get to enjoy the benefits of new technology and medicine. As real-life examples demonstrate, more extreme brainwashing, a more repressive police state, and even genocide are all live possibilities as well. Moreover, such power concentration probably reduces the likelihood we realize a utopia that makes the most of our cosmic endowment and represents diverse values.
Footnotes

1. In fact, part of the founding motivation for OpenAI and its nonprofit structure was to reduce the risk of an AI-enabled dictatorship. Court documents in the Musk vs. Altman lawsuit revealed emails explicitly discussing the risk that OpenAI’s structure could lead to an AGI dictatorship. For example, one email from Ilya Sutskever reads: “The goal of OpenAI is to make the future good and to avoid an AGI dictatorship. You are concerned that Demis could create an AGI dictatorship. So do we. So it is a bad idea to create a structure where you could become a dictator if you chose to, especially given that we can create some other structure that avoids this possibility.”
2. If alignment turns out to be difficult, so that it can only be solved with a slowdown coordinated by the government, we think CEO takeover is less likely. This is because such a slowdown would likely involve substantial government oversight of internal company activities, restricting any nefarious activity a CEO might have in mind.
3. A historical precedent is Oppenheimer losing his security clearance once he was no longer needed and started opposing nuclear proliferation.
4. In this essay, we use “backdoor” to mean “secret loyalties” and “to backdoor” to mean “to instill secret loyalties”. In the literature, “backdoor” is often used in a more restrictive sense, to mean a hidden undesirable behavior that appears in response to a trigger. That is a special case of what we have in mind: in this scenario, a “backdoor” just means hidden loyalties, with or without a trigger.
5. Alternatively, it’s possible the CEO could rope in some loyal employees at the company for the plot, either a few high-ranking people like the Chief Security Officer and the head of alignment, or a team of devoted lower-level “lieutenants”. And if there’s a wide enough personality cult within the company, with the most likely whistleblowers having been sidelined, then much of the subversion could be happening pretty openly. It’s interesting to ask: is the prospect of a small group takeover more likely than a lone wolf takeover? Maybe, since more conspirators means a bigger attack surface and fewer potential whistleblowers. On the other hand, powerful AIs may largely obviate the need for human allies. And actually, many of the most prominent authoritarian regimes today seem better described as one-person takeovers rather than small-group conspiracies, even if the dictator exploited human “pawns” along the way who knew little of the overall plan.
6. In particular, the principle of least privilege would apply to the CEO and there would be more separation of duties and multi-party authorization for access.
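As a toy illustration of what separation of duties and multi-party authorization could look like in practice (the names, teams, and two-team threshold below are made-up assumptions, not a recommendation of any specific policy), the following Python sketch only lets a sensitive grant take effect once approvers from at least two distinct teams sign off, and never lets requesters approve their own requests.

```python
from dataclasses import dataclass, field

@dataclass
class AccessRequest:
    """A sensitive grant (e.g. disabling a monitor or widening database
    access) that only takes effect with approvals from multiple teams."""
    requester: str
    permission: str
    approvals: dict = field(default_factory=dict)  # approver name -> team

    def approve(self, approver: str, team: str) -> None:
        if approver == self.requester:
            raise ValueError("requesters cannot approve their own grants")
        self.approvals[approver] = team

    def is_authorized(self, min_teams: int = 2) -> bool:
        # Separation of duties: count distinct approving teams, not people.
        return len(set(self.approvals.values())) >= min_teams

req = AccessRequest(requester="ceo", permission="disable_pipeline_monitor")
req.approve("security_lead", "security")
print(req.is_authorized())  # False: only one team has signed off
req.approve("alignment_lead", "alignment")
print(req.is_authorized())  # True: two independent teams approved
```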
7. Two other possibilities for instilling secret loyalties:
- Changing the internal model spec: Recall that the model spec is a document that details how the AI is supposed to behave and so provides a target for alignment. Current model specs are long and complicated, and also vague about how to resolve conflicts. The CEO could more or less subtly modify the model spec so that these conflicts resolve in his favor. The version the public sees would be a high-level summary that omits any mention of this ultimate deference to the CEO.
- Subliminal learning: Recent work showed that a teacher model can transmit some of its preferences to a student model by finetuning the student on teacher-generated data that looks semantically unrelated to those preferences, provided the teacher and student come from the same base model (a schematic sketch of this setup appears below). If subliminal learning is powerful enough to transmit robust secret loyalties, this could be a powerful attack vector.
Note however that by the time AI research has been fully automated, the technical story and paradigm may look quite different, making some of this obsolete.
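For readers unfamiliar with the subliminal-learning result mentioned above, here is a schematic Python sketch of the setup (every function is a placeholder stub, not a real model API or training stack): the teacher generates data that passes a neutrality filter, the student, sharing the teacher’s base model, is finetuned on it, and the reported finding is that traits of the teacher can nonetheless transfer.

```python
from typing import Callable, List

# Schematic sketch only: placeholder stubs stand in for a real model API
# and training loop. The point is the shape of the pipeline, in which
# nothing in the generated data mentions the transmitted trait.

def sample_numbers(teacher: Callable[[str], str], n: int) -> List[str]:
    """Ask the teacher for semantically neutral data, e.g. number lists."""
    prompt = "Continue this list with ten more numbers: 3, 7, 12"
    return [teacher(prompt) for _ in range(n)]

def filter_neutral(samples: List[str]) -> List[str]:
    """Keep only outputs that really are just digits and separators, so no
    overt mention of the teacher's preferences slips through."""
    return [s for s in samples if all(c.isdigit() or c in ", " for c in s)]

def finetune(student_weights: dict, data: List[str]) -> dict:
    """Placeholder for supervised finetuning of the student on the data."""
    return student_weights  # real training omitted in this sketch

# Pipeline shape (teacher and student start from the same base model):
#   teacher = base model finetuned to have some trait
#   student = fresh copy of the base model
#   data    = filter_neutral(sample_numbers(teacher, n=10_000))
#   student = finetune(student, data)
# The reported finding is that the student can pick up the teacher's trait
# even though every training example passes the neutrality filter.
```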
8. The CEO may also have done other illegal stuff in the past besides backdooring OpenBrain’s models, and may not want rival AIs snooping around. For example, he may have lied on other occasions, or done things he knew would raise existential risk or help China, in order to advance his personal position.
9. Other possible strategies for eliminating rival AIs:
- OpenBrain could monopolize compute by buying all the new chips at higher prices. There could be a feedback loop: better AIs → more revenue → buy a greater share of chips at higher prices → better AIs (a toy simulation of this loop appears below). And the automation of hardware R&D could improve chip design and manufacturing productivity so that the flow of newly created chips represents the majority of compute.
- Agent-5 could hack the other AIs and align them to the CEO. But we’re uncertain how feasible this is, even given Agent-5’s superior cyber capabilities (e.g., perhaps cybersecurity becomes defense-dominant as capabilities increase).
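To illustrate the compute feedback loop described in the first strategy above, here is a toy Python simulation. The numbers and exponents are made-up assumptions for illustration (in particular, that capability grows with compute with diminishing returns, and that revenue concentrates sharply on whoever has the most capable model), not estimates.

```python
# Toy model of: better AIs -> more revenue -> larger share of new chips -> better AIs.
def simulate(quarters: int = 12) -> None:
    leader_compute, rival_compute = 110.0, 100.0  # leader starts slightly ahead
    chips_per_quarter = 100.0                     # new chips reaching the market
    for q in range(1, quarters + 1):
        # Assumption: capability grows with compute, with diminishing returns.
        leader_cap, rival_cap = leader_compute ** 0.4, rival_compute ** 0.4
        # Assumption: demand concentrates on the most capable model, so
        # revenue (and chip purchasing power) rises steeply with capability.
        leader_rev, rival_rev = leader_cap ** 6, rival_cap ** 6
        share = leader_rev / (leader_rev + rival_rev)
        leader_compute += chips_per_quarter * share
        rival_compute += chips_per_quarter * (1 - share)
        print(f"Q{q}: leader's share of new chips = {share:.2f}")

simulate()  # in this toy run the share compounds from about 0.56 to 0.84 over three years
```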
10. If the CEO and Agent-5 did choose to carry out a military coup, Agent-5 would likely concoct some constitutional crisis (e.g., controversy surrounding the 2028 election, or a US-China military escalation) to serve as a justification. A military coup might involve a lot of unnecessary turmoil and pushback even with a superintelligence, which is why we predict a CEO takeover would be more insidious, as depicted here.
11. Here’s one example of how the CEO could become president in the middle of a presidential term (inspired by Gerald Ford’s ascension to the presidency in 1974). The Vice President could resign because of a scandal found or fabricated by Agent-5; then the President (who receives advice from an ostensibly air-gapped Agent-5 copy) nominates the CEO for vice president; then the President himself is gotten rid of. This sort of thing seems quite plausible in a world where everyone is talking to the secretly loyal and very persuasive superintelligence.
12. A complete world takeover might be the more likely outcome if Agent-5 is much more powerful than the Chinese AI, and if the offense-defense balance doesn’t heavily favor defense. The key would be to (a) neutralize the Chinese AI project, and (b) undermine nuclear deterrence. Agent-5 could potentially launch kinetic attacks on Chinese data centers and nuclear missile locations via hard-to-detect drones, or could design advanced anti-ballistic missiles to intercept Chinese nuclear missiles. It could also employ cyberattacks to sabotage or backdoor Chinese AIs and/or nuclear systems. And perhaps Agent-5 could shape the Chinese information landscape and coordinate a mass anti-CCP movement, throwing the country into civil unrest, leaving China weakened and ultimately more vulnerable to the CEO’s influence.
Comments

This series is exceptionally valuable. Your scenarios are a great complement to the more general warning regarding misalignment. Misalignment is a concept which requires a strong background in AI development to understand. Your scenarios demonstrate the concept of misalignment and the potential consequences. Thank you.
Thanks for writing this Alex! Like you say, even if alignment is solved, the question "who to align to" looms large, and the default answer seems pretty grim: Not only will potentially one man get to decide, they might even be able to keep this a secret for quite some time. This is just one scenario but it's good to get something on the table. Other scenarios I'd be keen to see explored:
--Let's think of good security practices against insider threats such as the CEO, and see if there are still pathways for the CEO to insert secret loyalties despite those security practices.
--What about a scenario where the loyalties aren't secret, at least to company leadership? Right now companies aren't required to publish their Specs or even have Specs.
--What about a scenario where it's the POTUS instead of a CEO?
--What if it's a small group like the oversight committee, but unlike in the Slowdown ending of AI 2027, there's a power struggle within that group?
--What if there are two or three rival AGI companies of roughly equivalent capability, only one or two of which have power-hungry leaders?
--What about ways of exerting power/control/etc. that are less drastic than inserting full loyalty, but instead amount to e.g. privileged access, or biasing the Spec in some way?