Megan Kinniment

@MKinniment

Mar 20, 2025
I agree it’s better than chess. I like the paper FWIW, but don’t think we should conclude that models can just generically do hour-long tasks and this doubles every 7 months (which is something I see people interpreting this paper to be saying).

Agree the result is not ‘AIs can do most tasks that humans can do in 1 hour’. I do think it should be an update towards capabilities following an exponential trend, at least on benchmarks. (The exponential trend seems much more robust to e.g. task distribution changes than the exact time horizon is.)
