Discussion about this post

Generative Gallery

Ok actually one more.

Hallucination rate on PersonQA could also be a pretty good proxy for model size: more parameters means more places to “store” random knowledge. This would explain the placement of the mini models, 4o, and 4.5.

So the (seemingly weird) poor o3 performance on PersonQA could mean one of two things:

1. It really is just a smaller model. This would explain the pricing as well, although the system card treats it as a weird finding.

2. Sufficiently advanced levels of RL “corrupt” some of the model's internal knowledge.

Bonifacijs

Something I never understood is the relationship between GPT-4 and GPT-4o. Is it a different base model?
