Discussion about this post

SorenJ

I was just reading AI 2027 yesterday and doing the same thing, seeing how well it had fared so far!

I was surprised that progress on the quantitative metrics is at roughly 65%; subjectively, reality felt slower than the forecast, but not by much.

"(To avoid singling out any one existing company, we’re going to describe a fictional artificial general intelligence company, which we’ll call OpenBrain. We imagine the others to be 3–9 months behind OpenBrain.)

The race appears to be closer than we predicted, more like a 0-2 month lead between the top US AGI companies."

It doesn't even seem accurate to describe any one company as being in the lead anymore, because the different companies have focused on different things and are ahead in different areas. You're also not saying which company you think holds that 0-2 month lead! How are you grading yourself on this forecast, then?

"So when OpenBrain finishes training Agent-1, a new model under internal development, it’s good at many things but great at helping with AI research."

You didn't comment on this directly, but it seems like the time gap between what is made available to consumers and what companies have available internally is very small nowadays. When I originally read AI 2027, I had the impression that a company would have this Agent-1 internally for something like at least 4 months and make a ton of progress behind closed doors. Now it seems like consumers are getting the best coding agents available with only a ~1 month lag. (The time between Opus 4.5 and Opus 4.6 was 2 months and 12 days. It doesn't seem like Anthropic had Opus 4.6 available internally at the time of the Opus 4.5 release? I guess I am just speculating...)

"By this point “finishes training” is a bit of a misnomer; models are frequently updated to newer versions trained on additional data or partially re-trained to patch some weaknesses."

Hmmm, I would give this one a grade of B- to B? You never specified what you meant by "frequently". When I originally read AI 2027, I was under the impression that you meant something more like continual learning: every day or week the model would be retrained on that period's worth of new text from the internet, so that it stayed up to date with the news. But that might just have been me misinterpreting. It still feels like we are waiting, for example, for Sonnet 5.0 to "finish training" so that it can be released.

Mark Russell

Writing in November about your project (https://open.substack.com/pub/mwrussell1969/p/hyperstition?r=av0kj&utm_campaign=post&utm_medium=web), I closed with an assessment of its predictive validity to date, which I thought was pretty damn good, if a bit slower than advertised.

I didn't put a % on it, but if pressed to go back and do so, I would have said 75-80%.

I find your grade quite fair, and much better calculated than mine (duh!).

Keep up the good work!

