Discussion about this post

Steven Adler:

Glad you wrote this! I was just saying to Daniel on Thursday that I hadn't seen much treatment of "how much risk is there in a 'last gasp' of the AI fighting back at some point," for instance if it realizes that humanity is on track to solve alignment (or just in general, if it learns it'll soon be reined in). I'm glad to see you were already thinking about this and that this gap is getting filled!

The contingency plan of getting stolen by China is pretty interesting; it also connects to the theme of Daniel's piece about conquistadors back in the day (https://www.lesswrong.com/posts/ivpKSjM4D6FbqF4pZ/cortes-pizarro-and-afonso-as-precedents-for-takeover), of how AI will look to team up with some humans, rather than take on us all as a group. Thanks for continuing to push the thinking on all this.

Taymon A. Beal:

A point on presentation: I think you should reconsider the color scheme wherein the different actors each have a color that solely serves to differentiate them, while within those boxes the AIs have colors that correspond to alignment level. The problem with this, from a data-visualization perspective, is that the level of *contrast* between an AI's box and its outer actor's box varies wildly between actors, in a way that intuitively feels like it should be meaningful but isn't. E.g., in the first figure, the intent is that Neuro-2 and Agent-3 are the same (orange-red) while Elara-2 is different (yellow-green), but the way it looks is more like Elara-2 and Agent-3 are the same (low contrast) while Neuro-2 is different (high contrast).

