Research groups
Short info
Working on technical AI safety, currently mechanistic interpretability of agentic frontier models. Frontier AIs can autonomously complete increasingly long-horizon tasks: https://arxiv.org/abs/2503.14499
It would be nice to have ways to ensure they're not dangerous.