Thank God SOMEONE is talking some sense about the elephant in the room. I really can't believe the apathy around this topic, and that it isn't feverishly discussed by everyone, everywhere, all the time.
Have you revised your beliefs about the lab wanting to inform the US government in light of recent developments with the administration?
Avoiding any political derailing here, I believe the game theory may have seriously shifted from the lab's perspective on keeping the government in the loop. In my view, it now seems rational for a lab to operate in secrecy and run the risk of the administration finding out, rather than to keep the administration perpetually informed only for it to threaten aggressive action unless granted access to further its agenda in potentially harmful or distracting ways. The administration's recent AI statements also make the feared scenario ("Your AI is really good now but you didn't tell us? Shut it down!") seem much less likely. And I imagine lab employees are much less likely to whistleblow directly to the government, given that they likely recognize this tension.
Given the Trump 1 and Biden terms as a prior, discreetly waking up the government made more sense then than it does now, in my eyes; very curious to hear if you've been thinking about this.
Thanks for laying out these risks so clearly. Right now, AI is all about utility - quick performance and beating the competition - while it’s missing practical wisdom (phronesis), Aristotle’s idea of balancing means and ends as situations change. To me, practical wisdom means thinking ahead and making sound judgments, whereas utility focuses on immediate wins without looking at the bigger picture. From that angle, a secret‑first AGI approach chooses short‑term advantage over the long‑term insight that practical wisdom demands.
We should open up benchmarks so a wider community can spot flaws and make red‑teaming a collaborative effort that rewards practicality and ethics. People who’ve trained their minds - ethical philosophers and experienced meditators - should be invited to participate in the Alignment Olympics. Their insights could catch blind spots that purely technical audits may miss.
To build AGI that’s both powerful and responsible, we need the speed of utility and the foresight of practical wisdom.
With Trump, there is zero doubt in my mind that he would use ASI to take over the US, and subsequently the world, if it were developed during his presidency (which is likely). This is the most likely misuse case now.
The idea of API access to red-team and research alignment on leading-edge models is appealing: democratize the effort to make them safe, and increase the total effort on the task. I wonder if there is a way to incubate a competitive aspect on that side? What if there were a race dynamic toward interpretability or steerability, in tandem with the race to capability?
To me, a key problem is that I don't see clearly agreed-upon figures of merit that would serve as good measures of success to crown a winner in these "Alignment Olympics".
Thanks for sharing your thoughts.
You have laid out some plausible scenarios that really need to be more completely thought out as we proceed.