👋🏼 Hi, I’m Archit. I work on AI safety, on questions about whether we can trust AI systems to do what we think they’re doing. This is where I share some of it.
2026 1
March
Do Models Know They're Being Tested? Probing Eval-Awareness Across Scale and Architecture