Roger Grosse
Why do you care about AI Existential Safety?
Humanity has produced some powerful and dangerous technologies, but so far none that deliberately pursued long-term goals that may be at odds with our own. If we succeed in building machines smarter than ourselves, as seems likely to happen in the next few decades, our only hope for a good outcome is to prepare well in advance.
Please give one or more examples of research interests relevant to AI existential safety:
- Incentivizing neural networks to give answers that are easily checkable. We are doing this using prover-verifier games, whose equilibrium requires finding a proof system (see the first sketch after this list).
- Understanding, in terms of neural net architectures, when mesa-optimizers are likely to arise, what their patterns of generalization look like, and how this should inform the design of learning algorithms.
- Better tools for understanding neural networks.
- Better understanding of neural net scaling laws, which are an important input to AI forecasting (see the second sketch after this list).
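The first bullet rests on a verification asymmetry: a good certificate can be hard to find but cheap to check. Below is a toy sketch of that asymmetry using subset-sum; it is not the prover-verifier game training setup itself, and the function names and numbers are hypothetical, chosen only for illustration.

```python
# Toy illustration (subset-sum): the prover does an expensive search for a
# certificate, while the verifier checks it cheaply. This shows only the
# easy-to-verify / hard-to-find asymmetry, not the prover-verifier game
# training procedure itself; names and numbers are hypothetical.
from collections import Counter
from itertools import combinations

def prove(nums, target):
    """Expensive prover: brute-force search for a subset summing to target."""
    for r in range(1, len(nums) + 1):
        for subset in combinations(nums, r):
            if sum(subset) == target:
                return subset  # the certificate ("proof")
    return None

def verify(nums, target, certificate):
    """Cheap verifier: accept iff the certificate is a sub-multiset of nums
    that sums to target, without redoing the search."""
    return (certificate is not None
            and sum(certificate) == target
            and not Counter(certificate) - Counter(nums))

nums, target = [3, 34, 4, 12, 5, 2], 9
certificate = prove(nums, target)            # hard: exponential-time search
print(certificate, verify(nums, target, certificate))  # cheap check: (4, 5) True
```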
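For the scaling-laws bullet, here is a minimal sketch of the kind of fit involved: a saturating power law L(N) = a * N^(-alpha) + c, a functional form commonly used in the scaling-laws literature, fit to synthetic loss-versus-model-size points and then extrapolated. All data and constants are made up for illustration.

```python
# Minimal sketch: fit a saturating power law L(N) = a * N**(-alpha) + c,
# a functional form common in the scaling-laws literature, to synthetic
# loss-vs-model-size measurements. All numbers here are made up.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(N, a, alpha, c):
    # c is the irreducible loss; a and alpha set the power-law decay.
    return a * N ** (-alpha) + c

# Hypothetical (parameter count, eval loss) measurements.
N = np.array([1e6, 3e6, 1e7, 3e7, 1e8, 3e8])
rng = np.random.default_rng(0)
L = scaling_law(N, a=8.0, alpha=0.28, c=1.7) + rng.normal(0.0, 0.01, N.size)

# Fit the curve and extrapolate to a larger model, as a forecaster might.
(a, alpha, c), _ = curve_fit(scaling_law, N, L, p0=[5.0, 0.3, 1.0])
print(f"alpha ~= {alpha:.3f}, irreducible loss ~= {c:.3f}")
print(f"forecast loss at N=1e9: {scaling_law(1e9, a, alpha, c):.3f}")
```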