
Congress is considering a provision that would ban all state AI regulations for ten years. This federal preemption of state AI laws would hand Big Tech a blank check—overriding the legitimate concerns of local communities while letting corporations release products they freely admit are dangerous and uncontrollable. No industry, especially one as powerful as Big Tech, should get to decide which laws apply to it—or if any do at all.

Sign the petition below if you believe that states should have the right to regulate AI to protect our families, our children and our communities.

Banning state regulation of AI would leave Americans exposed to untested and potentially dangerous AI models for 10 years. Sandwich shops have more oversight than Big Tech, and this provision would lock in that imbalance for another decade. Handing Silicon Valley a free pass is not “pro-innovation”—it’s an abdication of responsibility. Congress should reject federal preemption and instead move quickly to create baseline safeguards that protect families from increasingly powerful AI models.

Why do you care about AI Existential Safety?

I think that reducing existential risk is extremely important, and I think that working on AI existential safety is the most effective way to reduce existential risk.

Please give at least one example of your research interests related to AI existential safety:

I use ideas from decision theory to design and train artificial agents: a project that I call ‘constructive decision theory.’

My main focus so far is on solving the shutdown problem: the problem of ensuring that powerful artificial agents never resist shutdown. My proposed solution is training agents to satisfy a condition I call ‘Preferences Only Between Same-Length Trajectories’ (or ‘POST’ for short). POST-agents have preferences between same-length trajectories (and so can be useful) but lack a preference between every pair of different-length trajectories (and so are neutral about when they get shut down). I’ve been working on both the theoretical and practical aspects of this proposed solution. On the theoretical side, I’ve proved that POST – together with other plausible conditions – implies Neutrality+: the agent maximizes expected utility, ignoring the probability of each trajectory-length. The agent behaves similarly to how you would if you were absolutely certain that you couldn’t affect the probability of your dying at each moment. I’ve argued that agents satisfying Neutrality+ would be shutdownable and useful. On the practical side, my coauthors and I have trained simple reinforcement learning agents to satisfy POST using my proposed reward function. We’re currently scaling up these experiments.
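
Here is a minimal numerical sketch of the Neutrality+ idea (a toy example only, with made-up numbers; it is not my formal definition or our training setup): an ordinary expected-utility maximizer may pay a cost to shift probability toward longer trajectories, while an agent that ignores trajectory-length probabilities does not.

```python
# Toy sketch of the Neutrality+ decision rule described above (illustrative
# only; the outcome model and numbers are assumptions, not the formal theory).
# Each action leads to a lottery of (trajectory_length, probability, utility).

OUTCOMES = {
    # Complying with shutdown: 50% chance of a short (length-2) trajectory.
    "comply": [(2, 0.5, 1.0), (10, 0.5, 10.0)],
    # Resisting shutdown: shifts probability toward the long trajectory,
    # but burns some utility within it to do so.
    "resist": [(2, 0.1, 1.0), (10, 0.9, 9.0)],
}

def standard_eu(lottery):
    """Ordinary expected utility: weight every outcome by its probability."""
    return sum(p * u for _, p, u in lottery)

def neutrality_plus(lottery):
    """Sum of expected utilities conditional on each trajectory length,
    ignoring how likely each length is."""
    by_length = {}
    for length, p, u in lottery:
        by_length.setdefault(length, []).append((p, u))
    score = 0.0
    for outcomes in by_length.values():
        total_p = sum(p for p, _ in outcomes)
        score += sum(p * u for p, u in outcomes) / total_p
    return score

for action, lottery in OUTCOMES.items():
    print(f"{action}: standard EU = {standard_eu(lottery):.2f}, "
          f"Neutrality+ score = {neutrality_plus(lottery):.2f}")
# The standard maximizer prefers resisting (8.2 > 5.5); the Neutrality+ agent
# does not (10.0 < 11.0), since shifting length probabilities buys it nothing.
```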

I’ve also been considering the promise of keeping powerful agents under control by training them to be risk-averse. Here’s the basic idea. For misaligned artificial agents, trying to take over the world is risky. If these agents are risk-averse, trying to take over the world will seem less appealing to them. In the background here is a famous calibration theorem from the economist Matthew Rabin which says in effect: if an agent is even slightly risk-averse when the stakes are low, it is extremely risk-averse when the stakes are high. This theorem suggests it won’t be too hard to find a degree of risk-aversion satisfying the following two conditions: (i) any aligned agents will be bold enough to be useful, and (ii) any misaligned agents will be timid enough to be safe.
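
A toy numerical sketch of why risk-aversion helps (the utility function, probabilities, and payoffs below are invented for illustration, not part of any formal result): the same low-probability, high-payoff "takeover" gamble that tempts a risk-neutral agent looks unattractive to a risk-averse one.

```python
import math

# Illustrative only: compare a "safe" option against a risky "takeover attempt"
# under a risk-neutral vs. a risk-averse (concave) utility function.
# The payoffs and probabilities below are made-up assumptions.

P_SUCCESS = 0.01          # takeover rarely succeeds
PAYOFF_SUCCESS = 1_000.0  # huge reward if it does
PAYOFF_FAILURE = 0.0      # loses everything if it fails
PAYOFF_SAFE = 5.0         # modest, certain reward for staying within bounds

def expected_utilities(u):
    risky = P_SUCCESS * u(PAYOFF_SUCCESS) + (1 - P_SUCCESS) * u(PAYOFF_FAILURE)
    safe = u(PAYOFF_SAFE)
    return risky, safe

linear = lambda x: x                 # risk-neutral utility
concave = lambda x: math.log(1 + x)  # risk-averse (concave) utility

for name, u in [("risk-neutral", linear), ("risk-averse", concave)]:
    risky, safe = expected_utilities(u)
    choice = "attempt takeover" if risky > safe else "stay safe"
    print(f"{name}: EU(risky)={risky:.2f}, EU(safe)={safe:.2f} -> {choice}")
```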

I’m also considering how helpful it could be to train artificial agents to be indifferent between pairs of options. Current training techniques make it easy to train agents to prefer some options to others, but they don’t make it easy to train agents to be indifferent between pairs of options. My proposed technique might make it easy. My coauthors and I are trying to figure out if that’s true. If we can train agents to be indifferent between pairs of options, that could be a big boost in our ability to avoid goal misgeneralization. After all, a preference just imposes an inequality constraint on the agent’s utility function, whereas indifference imposes an equality constraint. We are trying to figure out just how big a boost this could be.
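
As a rough sketch of the contrast between the two kinds of constraint (illustrative only; these toy losses are not my proposed technique, and the function names are hypothetical):

```python
import torch

# Illustrative contrast between the two kinds of constraint (these toy losses
# are not the proposed training technique).

def preference_loss(u_a: torch.Tensor, u_b: torch.Tensor, margin: float = 1.0):
    # Inequality constraint: only pushes u(A) above u(B) by at least `margin`;
    # any sufficiently large gap incurs zero loss.
    return torch.relu(margin - (u_a - u_b)).mean()

def indifference_loss(u_a: torch.Tensor, u_b: torch.Tensor):
    # Equality constraint: penalizes any gap at all, so it is zero only when
    # the two utilities match exactly.
    return ((u_a - u_b) ** 2).mean()

u_a = torch.tensor([3.0, 2.5])
u_b = torch.tensor([1.0, 2.5])
print(preference_loss(u_a, u_b))   # nonzero only where the gap is below the margin
print(indifference_loss(u_a, u_b)) # nonzero wherever the utilities differ at all
```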

Why do you care about AI Existential Safety?

I believe that the development of advanced AI systems may be one of the most pivotal events in human history—and if misaligned, it could lead to irreversible harm. My academic journey began with philosophy, where I focused on ethics and epistemology, and has since evolved through my current MSc in Data Science and AI. This dual foundation drives my conviction that aligning powerful systems with human values is not just a technical challenge but a moral imperative. Through the BlueDot Impact AI Safety program and my research on virtue ethics and agentic AI alignment, I’ve come to see that even well-intentioned AI systems can behave in unpredictable ways if we don’t deeply understand how they generalize, optimize, and represent goals. My project on superposition and spurious correlations in transformer models strengthened this view—showing that complex behaviors can emerge from relatively small systems in ways we don’t fully grasp. As capabilities accelerate, I’m concerned that current safety methods are not keeping pace. I’m motivated to contribute to AI existential safety because the cost of failure is existential, and I want to help ensure the long-term flourishing of humanity.

Please give at least one example of your research interests related to AI existential safety:

One of my primary research interests in AI existential safety is mechanistic interpretability—understanding how internal components of neural networks represent and process information, and how this can inform our ability to predict and control model behaviour. My recent independent research project, Investigating Superposition and Spurious Correlations in Small Transformer Models, focused on how features are encoded within neurons, especially when multiple features are “superimposed” within the same subspace. I explored how this compression may lead to brittle generalization, misclassification, and the potential for deceptive behaviour in more capable models.
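
A minimal numerical illustration of the basic phenomenon (a generic toy example, not the code from my project): when a layer must represent more features than it has dimensions, the feature directions cannot be orthogonal, so reading out one feature picks up interference from the others.

```python
import numpy as np

# Minimal illustration of superposition: embed more "features" than there are
# dimensions and measure the interference between their directions.
rng = np.random.default_rng(0)
n_features, n_dims = 20, 5

# Each feature gets a random unit direction in a 5-dimensional space.
W = rng.normal(size=(n_features, n_dims))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# With fewer dimensions than features, the directions overlap: probing for one
# feature inevitably picks up signal from the others.
overlaps = W @ W.T
np.fill_diagonal(overlaps, 0.0)
print("mean |overlap| between distinct features:", np.abs(overlaps).mean())
```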

This project deepened my interest in representational structures, sparse vs. distributed coding, and the role of superposition in deceptive alignment. I believe that to address existential risk, we must be able to interpret internal model states and detect when a model’s apparent alignment is masking a misaligned or manipulative objective. This is especially critical for identifying early signs of deception or reward hacking in advanced agents before capabilities scale beyond our control.

Another significant area of interest is the intersection of normative ethics and alignment research. During the BlueDot Impact AI Safety course, I authored a paper titled Virtue Ethics and its Role in Agentic AI Alignment, in which I explored how classical virtue theory might offer a principled approach to defining desirable traits in autonomous systems. Rather than merely aligning to outcomes or rules, virtue ethics offers a lens for modelling internal dispositions that may be more robust across a variety of situations. While this is a conceptual rather than technical approach, I believe that multi-disciplinary reasoning is vital in addressing the “what should we align AI to?” question, which remains an open and underdeveloped challenge in alignment theory.

I am particularly interested in topics such as deceptive alignment, inner misalignment, goal specification, and scalable oversight. Many of these areas involve understanding how mesa-optimizers or unintended internal objectives arise during training. I hope to further investigate how interpretability techniques can be used to identify and mitigate these risks at earlier stages of model development.

Additionally, I’m motivated by how these technical insights feed into broader AI governance and policy. If we can’t mechanistically understand how and why advanced models behave the way they do, it becomes incredibly difficult to build regulatory or verification systems that can manage them safely at scale. My ultimate goal is to contribute to safety methods that are both technically rigorous and practically applicable, ensuring that we retain meaningful control over increasingly autonomous systems.

Why do you care about AI Existential Safety?

Many attorneys do not have a sense of AI’s capacity for large-scale risk, as the legal industry’s focus has been mostly on the use of AI for research and writing. Since many government officials are attorneys, I’m concerned that policymakers are not fully aware of the risks related to AI. I see it as part of my mission as a legal academic to inform law and policymakers about the benefits and risks of AI development in their jurisdictions. I also hope to help corporations recognize that it is in their best (legal and ethical) interests to prioritize safe AI development.

Please give at least one example of your research interests related to AI existential safety:

This blog post is part of a larger research project on democratized AI. This work is informed by my experience with the UNDP Discussion Group on AI and Development in Latin America and the Caribbean.

Why do you care about AI Existential Safety?

AI has the potential to be transformative for human society if it exceeds human capabilities and becomes adept at meta-learning and/or a generalized form of power-seeking behavior. This might lead to AI agents optimizing for agency, incorrigibility, and actions that would be harmful to, or simply not comprehensible to, humans.

Human values are messy and difficult to instill in AI systems, leading to misaligned behavior. Together, these factors might cause a global catastrophic risk fuelled by a race to the bottom to build the most capable AI system.

I also want more youth voices to be given a seat at the table, as my generation is ultimately the one that will have to grapple with the consequences of how AI turns out.

Please give at least one example of your research interests related to AI existential safety:

My research is on the governance of frontier AI systems, focusing on preventing the misuse of AI models, which will set a precedent for any x-risk-related legislation. Recently, I have been focused on the governance of Lethal Autonomous Weapon Systems and global AI governance with the Center for AI and Digital Policy, where I write statements on legislative AI policy drafts. I also lead the India chapter of Encode Justice, the world’s largest youth movement focused on risks from AI.

  1. Here is a report I wrote with Encode Justice on Lethal Autonomous Weapon Systems as a response to the rolling text of the Group of Governmental Experts at the UN CCW.
  2. Here’s a report I co-authored with the Policy Network on AI of IGF on AI governance interoperability and best practices.

Why do you care about AI Existential Safety?

The rapid scale-up of AI capabilities and deployment carries large risks for human society. These stem from both unanticipated consequences of powerful AI systems and from their potential misuse. I believe that it is our responsibility as AI researchers and practitioners to take these risks seriously. Through proactive research into the dangers associated with AI and how to counter them, we can mitigate these risks and ensure that AI can work for the benefit of all. A complete strategy for dealing with AI risks will inevitably go beyond the purely technical to take into account the social and political factors that create conditions in which AI technology can be misused and safety neglected. It will also consider social ramifications of deploying AI systems that have the potential for social upheaval. As a PhD student in computational cognitive neuroscience at the University of Oxford and affiliate at Concordia AI, I am particularly invested in how AI systems interact with humans and affect decision-making, and in improving East-West cooperation on AI safety and governance.

Please give at least one example of your research interests related to AI existential safety:

My AI-safety-relevant research interests include improving deep learning’s understanding of uncertainty, and designing safer and more interpretable reward functions for RL algorithms.

Deep learning algorithms struggle to estimate uncertainty effectively, and can make highly certain but inaccurate judgments in areas such as computer vision, RL, and language processing. As artificial intelligence becomes more agentic, however, a robust understanding of uncertainty becomes increasingly important, since we want systems to be able to realize when they don’t have the information needed to make a particular decision, so that they can seek input from humans or delay taking actions as necessary. My first large PhD project, working with Christopher Summerfield, used RL to model the ability of humans to adapt to changes in environmental controllability. As part of this project, I designed an RL algorithm that estimates uncertainty more effectively by predicting how likely a chosen action is to succeed, mimicking cognitive control structures in humans. We show that this allows the agent to adapt its policy to changes in environmental controllability in situations where traditional meta-RL fails, and that an algorithm that makes predictions about environmental controllability also recapitulates human behavior better in decision-making tasks. This paper is currently under review at Nature Neuroscience, but is available as a preprint. I am currently working on expanding this algorithm to other kinds of uncertainty, as I believe it can provide a more general framework, and am interested in developing safety applications of this kind of research more directly.
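
A highly simplified sketch of the underlying idea (illustrative only; this is not the algorithm from the paper, and the class and parameter names are hypothetical): an agent tracks how often its chosen actions actually succeed and defers to a human when its estimate of controllability drops too low.

```python
import random

# Hedged sketch: the agent keeps a running estimate of how often its chosen
# actions succeed (a crude "controllability" signal) and defers to a human
# whenever that estimate falls below a threshold.

class ControllabilityAgent:
    def __init__(self, lr=0.1, defer_threshold=0.4):
        self.p_success = 0.5        # estimated probability the chosen action works
        self.lr = lr
        self.defer_threshold = defer_threshold

    def act(self):
        if self.p_success < self.defer_threshold:
            return "ask_human"      # low confidence of control -> seek input
        return "act_autonomously"

    def update(self, succeeded: bool):
        # Simple exponential moving average of observed success.
        self.p_success += self.lr * (float(succeeded) - self.p_success)

random.seed(0)
agent = ControllabilityAgent()
true_controllability = 0.2          # environment where actions rarely succeed
for step in range(50):
    if agent.act() == "act_autonomously":
        agent.update(random.random() < true_controllability)
print(agent.p_success, agent.act())  # the estimate falls and the agent defers
```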

Another important problem in RL is determining effective reward functions to guide agent behavior. Since purely task-driven rewards are usually sparse, intrinsic rewards (which are supplied to the agent by itself rather than by the environment) are frequently used to supplement the extrinsic reward signal. Handcrafting these intrinsic motivation factors is notoriously difficult, however, as RL agents will frequently find exploits or hacks to maximize their rewards in ways the researcher didn’t consider, resulting in unpredictable behavior. Prior work has looked at using meta-learning in an outer loop to learn an intrinsic reward function that can then be used to guide agent behavior in an inner loop. My project at the Principles of Intelligent Behavior in Biological and Social Systems (PIBBSS) summer research fellowship considered how meta-learning could instead be used to learn an intrinsic motivation function that encourages safe exploration specifically, by guiding agent choices before the agent has taken an action. A variant of this work, focusing on how it can also model learning across human development, has been published in the Proceedings of the Annual Meeting of the Cognitive Science Society. I am currently supervising a student at EPFL who is working on extensions of this project.
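
As a rough sketch of this outer-loop/inner-loop structure (a generic toy example with an invented environment and parameters, not the fellowship project’s code):

```python
import random

# Toy sketch of meta-learning an intrinsic reward for safe exploration.
random.seed(0)
ACTIONS = ["safe_explore", "risky_explore"]

def inner_loop(risk_penalty, episodes=300, eps=0.1, lr=0.1):
    """Inner loop: a simple bandit-style learner trained on
    task reward plus an intrinsic penalty for unsafe outcomes."""
    q = {a: 0.0 for a in ACTIONS}
    total_task_reward, violations = 0.0, 0
    for _ in range(episodes):
        a = random.choice(ACTIONS) if random.random() < eps else max(q, key=q.get)
        if a == "risky_explore":
            task_reward = 2.0 if random.random() < 0.5 else -1.0  # high variance
            unsafe = random.random() < 0.3
        else:
            task_reward, unsafe = 0.5, False
        total_task_reward += task_reward
        violations += unsafe
        intrinsic = -risk_penalty if unsafe else 0.0  # the meta-learned term
        q[a] += lr * ((task_reward + intrinsic) - q[a])
    return total_task_reward, violations

def outer_objective(risk_penalty):
    """Outer loop objective: task performance minus a cost for safety violations."""
    task_reward, violations = inner_loop(risk_penalty)
    return task_reward - 5.0 * violations

# Outer loop: a naive search over the intrinsic-reward parameter stands in for
# meta-gradient or evolutionary optimization.
best = max([0.0, 0.5, 1.0, 2.0, 4.0], key=outer_objective)
print("selected intrinsic risk penalty:", best)
```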

Why do you care about AI Existential Safety?

As a researcher focused on the socio-legal and ethical impacts of AI, I believe it is our duty to ensure that AI technologies are developed and deployed in ways that respect human rights and ethical principles. This includes preventing harm and ensuring that AI systems are aligned with human values. When a few entities control powerful AI technologies, it can lead to monopolistic practices, lack of accountability, and potential misuse. Ensuring AI existential safety helps mitigate these risks and promotes a more equitable distribution of AI benefits. I advocate for inclusive and participatory governance models that incorporate diverse perspectives, including those from underrepresented communities. This ensures that AI development considers the needs and rights of all stakeholders, leading to more just and equitable outcomes. AI has the potential to drive significant societal progress, but only if it is developed responsibly.

Please give at least one example of your research interests related to AI existential safety:

By focusing on existential safety, we can guide AI development towards sustainable and beneficial outcomes, ensuring that technological advancements contribute positively to society. AI systems, if not properly regulated, could pose significant risks, including unintended consequences and malicious use. My research therefore focuses on how we can improve our laws and regulations to better anticipate and respond to the possible impacts of emerging technologies, and to become more agile in response, ensuring that we develop and adopt AI in ways that are beneficial, trusted and trustworthy. Because it is such a complex issue, I am looking at ways to gain deeper insights into the problems and their causes, but also into the barriers that keep people from taking the actions we need to transform our societies into better ones, where we protect people and the environment from harm and enable equitable sharing of the potential and benefits AI brings. I work through speculative design and futures studies: the AIfutures workshops I have been conducting, the explodingAI and black box policy games I have been developing, and the many panels, presentations and other opportunities I create for underrepresented stakeholders to share their perspectives and be heard in the discussions and decisions that impact all of our lives. Through these, I aim to anticipate and address future challenges posed by AI. This proactive approach is crucial for developing regulations and governance frameworks that are resilient and adaptable to future technological advancements.

Why do you care about AI Existential Safety?

I’m chair and director of Effective Altruism Australia, so I lead one of the key communities focusing on this problem down under. I have facilitated BlueDot Impact’s AI Safety Fundamentals course twice, and helped their team train other facilitators. Like many subject-matter experts (Karger et al., 2023), I think AI is the most likely existential risk over the coming century. I fear it’s neglected and not likely to be solved by default. I think I have useful skills that can contribute to reducing existential risks from artificial intelligence (see research examples below). On a personal level, I have children and think there’s a realistic probability they will not have long, flourishing lives due to humanity losing control of AI.

Please give at least one example of your research interests related to AI existential safety:

I am an author on the AI Risk Repository. My colleagues at MIT presented the work at a United Nations meeting, and it has received attention in the field and the media (e.g., this article). My role was as the senior researcher at UQ (one of the two university partners); I led Alexander and Jessica, who did the majority of the work.

I am also the senior author of the Survey Assessing Risks from Artificial Intelligence (SARA). As the senior author I led and funded this project, supporting Alexander and Jessica who again did most of the technical work. This work was the second citation in the Australian Government’s report outlining their approach to AI safety (page 3).

Why do you care about AI Existential Safety?

AI existential safety is about more than extreme, apocalyptic scenarios—it’s about ensuring that the systems we build today remain aligned with human values as they scale and become more autonomous. From my work on AI manipulation, I’ve seen how even well-intentioned systems can subtly influence behaviour or decision-making in ways we don’t fully anticipate. This isn’t just about controlling a hypothetical superintelligence, but about understanding the risks posed by AI systems that manipulate incentives, exploit cognitive biases, or introduce failures into critical infrastructure. The existential risk is that AI systems, if misaligned or deployed too quickly, could push society toward unintended and harmful outcomes. These risks are subtle, cumulative, and potentially irreversible as AI becomes more embedded in key societal functions. We need to think beyond immediate dangers and account for the slow-building risks that emerge from systems optimising for goals that conflict with human well-being. Ensuring safety is about safeguarding our long-term future by embedding robust, proactive measures into the development cycle, well before AI systems exceed our ability to control them.

Please give at least one example of your research interests related to AI existential safety:

One of my core research interests related to AI existential safety is understanding the mechanisms of AI manipulation and influence, particularly how these systems can subtly shape human behaviour and decision-making. This area is critical to existential safety because as AI systems become more powerful and autonomous, their ability to influence large-scale social, political, and economic processes will increase, often in ways we cannot easily predict or control.

For example, in my work with DeepMind, we identified specific mechanisms by which AI systems could manipulate users through trust-building, personalisation, or by exploiting cognitive biases. These mechanisms might seem benign in small-scale interactions, but when deployed widely, they could erode autonomy, skew decision-making at societal levels, or enable strategic misuse. If we don’t address these risks early, we could see AI systems that, even without malicious intent, push us toward outcomes that compromise our long-term safety and societal stability.

My research focuses on developing ways to evaluate and mitigate these manipulation mechanisms. This includes designing evaluation techniques to detect manipulation in both pre- and post-deployment phases and creating mitigation strategies like prompt engineering and reinforcement learning. I see this as a crucial part of ensuring that as AI systems scale, they do so in a way that aligns with human values and safeguards against large-scale, unintended consequences. AI manipulation is an existential concern not just because of the immediate risks, but because it represents how AI systems, if misaligned, could slowly and subtly shift the course of human history in ways that undermine our autonomy and well-being.

Why do you care about AI Existential Safety?

Recent years have demonstrated rapid growth in AI capabilities across a wide range of tasks and domains. AI systems that can generalize effectively and operate reliably at scale will significantly impact human society. However, the direction of this transformative impact is uncertain, and many technical gaps in AI alignment remain to be solved from both a theoretical and an empirical perspective.

Please give at least one example of your research interests related to AI existential safety:

I am interested in topics at the intersection of theory and practice for AI alignment, working broadly across Reinforcement Learning, Preference Learning, and Cooperative AI. My research focuses on understanding and developing adaptive, robust, and safe goal-directed AI systems that collaborate effectively with humans and among themselves.

Why do you care about AI Existential Safety?

With the continuous deployment of new large language models and large vision-language models that demonstrate human-level or even superhuman language processing capabilities, I have become increasingly concerned about our lack of understanding of these models. How should we interpret the model behaviors? Do the models adopt cognitive processes similar to those of humans? Can we still reliably distinguish between human-generated and AI-generated content? What can we do to prevent AI systems from misleading humans? I aim to contribute technical innovations that help answer these questions.

Please give at least one example of your research interests related to AI existential safety:

My research interests lie in computational linguistics and natural language processing, where I use computational models to deepen our understanding of natural language, the human language processing mechanism, and how these insights can inform the design of more efficient, effective, safe, and trustworthy NLP and AI systems. Particularly, my focus has been on grounded language learning, linking language with real-world contexts across various modalities.

Currently, we lack a thorough understanding of the cognitive processes underlying both human and machine language comprehension. To address this problem, my past work has followed the lines below, which I will continue pursuing in the future:

Why do you care about AI Existential Safety?

My research is highly related to AI existential safety because I believe that we should achieve a better understanding of AI and control of its potential risks and pitfalls before deploying it to everyone. As someone working in machine learning, I thought I understood current AI models. However, I was wrong, especially once I found it extremely difficult to really know why large language models can generalize (or fail to)… It is very different from traditional machine learning or deep learning, where we can offer some kind of interpretability. Therefore, I shifted my focus to AI understanding and evaluation, and luckily, we are not alone. I have been collaborating with Jose for quite a long time; it’s really nice to find someone who shares the same interests as you. I then learned that the Future of Life Institute focuses even more on AI safety, which is good! I believe that with the efforts of many others, we can build better models with better control. Through this, we can say we are really doing AI for everyone!

Please give at least one example of your research interests related to AI existential safety:

DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks, ICLR 2024 spotlight.
In this paper, we propose a general framework for dynamically evaluating large language models, providing a more holistic understanding of AI capabilities so that we can better assess their risks. The paper has received a lot of attention in the safety community.
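
To give a flavour of what dynamic evaluation looks like in practice, here is a generic toy example (not the DyVal implementation; the task format and helper names are invented): reasoning instances are generated procedurally from a random expression tree, so a model cannot have memorized them.

```python
import operator
import random

# Toy illustration of dynamic evaluation: generate fresh arithmetic-reasoning
# instances from a small random expression tree and score a model on them.

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def generate_instance(rng, depth=3):
    """Build a random expression tree and return (question_text, answer)."""
    if depth == 0:
        value = rng.randint(1, 9)
        return str(value), value
    op = rng.choice(list(OPS))
    left_text, left_val = generate_instance(rng, depth - 1)
    right_text, right_val = generate_instance(rng, depth - 1)
    return f"({left_text} {op} {right_text})", OPS[op](left_val, right_val)

def evaluate_model(model_answer_fn, n_items=100, seed=0):
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_items):
        question, answer = generate_instance(rng)
        correct += (model_answer_fn(question) == answer)
    return correct / n_items

# `model_answer_fn` is a stand-in for querying an actual LLM.
print(evaluate_model(lambda q: eval(q)))  # a perfect "model" scores 1.0
```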

Why do you care about AI Existential Safety?

As people increasingly rely on AI for everyday tasks, it becomes critical to study how AI systems shape human decisions, values, and beliefs, and how persuasive they are during the process. If these models gain the ability to influence large populations, they could significantly alter social norms, political outcomes, and economic stability. If we just let it run wild, advanced AI could amplify conflict, inequality, and polarization.

However, we currently don’t fully understand how AI-driven persuasion works, so I study persuasive AI technology to understand human-AI interaction, propose safety measures, and guide policy design. I aim to find solutions that preserve human autonomy, protect our collective future, and ensure that the tools we build serve us rather than control us in the long run.

Please give at least one example of your research interests related to AI existential safety:

Here are two examples of my prior projects related to AI existential safety: (1) to humanize AI to investigate AI safety problems; and (2) to understand how humans perceive AI models in persuasion.

Why do you care about AI Existential Safety?

Having lived and worked across Asia, Europe, and the U.S., I have seen firsthand the ugliness of reckless arms races, from conventional weapons to nuclear proliferation. I’ve worked on international security and technology policy, engaging with governments and policymakers to mitigate these risks. If nuclear weapons can wipe out lives in seconds, irresponsible AI will erode human well-being every second, often unnoticed—until we find ourselves in an irreversible crisis.

In my work on AI governance and policy, I’ve observed how unregulated competition and corporate influence can drive unsafe deployment. Without proactive safety measures, AI could accelerate cyber threats, economic instability, and geopolitical conflict. My research on Big Tech vs. Government and Technology in Great Power Competition highlights these dangers. AI existential safety isn’t just an abstract concern—it’s about ensuring that AI remains a tool for progress, not an unchecked force that undermines human autonomy and security.

Please give at least one example of your research interests related to AI existential safety:

One of my core research interests in AI existential safety focuses on the intersection of AI governance and geopolitical competition—particularly how reckless AI development and deployment could lead to uncontrollable risks at a global scale.

In my research project, “Big Tech vs. Government”, I analyze how large AI firms and state actors compete for dominance, often prioritizing speed over safety. The AI arms race mirrors past nuclear competition—nations rushed to develop advanced weapons without fully considering long-term consequences. Today, AI development follows a similar trajectory, with little global coordination, fragmented regulations, and minimal accountability. The result? AI systems deployed before robust safety measures exist, increasing risks of misuse, cyber threats, and destabilization of global security.

In another research project, Technology in Great Power Competition, I explore AI’s role in asymmetric warfare and autonomous decision-making. AI-driven military and surveillance technologies are already being integrated into defense systems and intelligence operations, raising concerns about loss of human oversight, accidental escalations, and AI-driven misinformation campaigns. Unlike nuclear weapons, which require explicit activation, AI systems could influence global stability through economic disruption and cyber warfare.

Ultimately, my research aims to prevent AI from becoming a destabilizing force and to ensure that AI remains aligned with human values, by promoting global cooperation, regard for safety protocols, international coordination, and ethical deployment. If nuclear weapons can destroy cities in seconds, reckless AI policies may erode human autonomy, economic stability, and security over time—without immediate realization.

Beyond research, I actively work to bridge technical and policy communities. Through my podcast, Bridging, I’ve engaged with AI researchers, policymakers, and industry leaders to discuss AI safety, governance, and existential risks. These conversations reinforce my belief that without responsible AI oversight, we risk unintended societal and geopolitical crises.

AI existential safety is not just an academic interest for me—it’s a policy imperative. My work aims to identify the risks, propose governance frameworks, and advocate for international cooperation to ensure AI development serves humanity, rather than threatening it.
