👋🏼 Hi, I’m Archit. I work on AI safety, mostly on questions about what language models know about themselves, which turns out to be less than you’d hope and more than you’d expect. This is where I share some of it.
2026 1
March
Do Models Know They're Being Tested? Probing Eval-Awareness Across Scale and Architecture