Technical PhD Fellowships

The Vitalik Buterin PhD Fellowship in AI Existential Safety is for PhD students who plan to work on AI existential safety research, or for existing PhD students who would not otherwise have funding to work on AI existential safety research.

Status:

Closed for submissions

Deadline: November 21, 2025

Fellows receive:

Tuition and fees for 5 years of their PhD, with extension funding possible.
$40,000 annual stipend at universities in the US, UK and Canada.
A $10,000 fund that can be used for research-related expenses such as travel and computing.
Invitations to virtual and in-person events where they will be able to interact with other researchers in the field.
Applicants who are short-listed for the Fellowship will be reimbursed for this year's application fees for up to 5 PhD programs.

See below for the definition of 'AI Existential Safety research' and additional eligibility criteria.

Questions about the fellowship or application process not answered on this page should be directed to grants@futureoflife.blackfin.biz

The Vitalik Buterin Fellowships in AI Existential Safety are run in partnership with the Beneficial AI Foundation (BAIF).

FLI offers Buterin Fellowships in pursuit of a vibrant AI existential safety research community free from financial conflicts of interest.
Anyone awarded a fellowship will need to confirm the following: "I am aware of FLI’s assessment that moving from a Buterin Fellowship to working (even on a safety team) for a company that is

a) racing to build AGI/ASI, and
b) not pushing for strong binding AI regulation

is a net negative for humanity. I therefore agree that, if I accept a Buterin Fellowship and take a job at any such company (including Anthropic, GoogleDeepMind, Meta, OpenAI, or xAI) within 2 years of completing my Buterin Fellowship, I will donate half of my gross compensation each month to a charity mutually agreeable to me and FLI, including half of any stock options or bonuses.”

Grant winners

People that have been awarded grants within this grant program:

Results

Stephen Casper

Casper, Stephen, et al. "Explore, Establish, Exploit: Red Teaming Language Models from Scratch." arXiv preprint arXiv:2306.09442 (2023).
Shah, Rusheb, et al. "Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation." arXiv preprint arXiv:2311.03348 (2023).

Stephen Casper, Cynthia Chen, Usman Anwar

Casper, Stephen, et al. "Open problems and fundamental limitations of reinforcement learning from human feedback." arXiv preprint arXiv:2307.15217 (2023).

Yawen Duan

Ji, Jiaming, et al. "Ai alignment: A comprehensive survey." arXiv preprint arXiv:2310.19852 (2023).

AI Safety Papers - Jinesis AI Lab

Pandey, Le, et al. "SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests" DisinfoCon 2025 (Oral)
Simko, Sachan, et al. "Improving Large Language Model Safety with Contrastive Representation Learning" EMNLP 2025 Main (Poster)
Piedrahita, Strauss, et al. "Democratic or Authoritarian? Probing a New Dimension of Political Biases in Large Language Models" NLP4Democracy Workshop @ COLM (Poster)
Piedrahita, Yang, et al. "Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games" COLM 2025 Main (Poster)
Pandey, Simko, et al. "Accidental Vulnerability: Factors in Fine-Tuning that Shift Model Safeguards" SoLaR Workshop @ COLM 2025 (Poster)
Samway, Mihalcea, et al. "When Do Language Models Endorse Limitations on Universal Human Rights Principles?" SoLaR Workshop @ COLM 2025 (Oral)
Yadav, Liu, et al. "Revealing Hidden Mechanisms of Cross-Country Content Moderation with Natural Language Processing" ACL Findings 2025 (Poster)
Hong, Dian Zhou, et al. "The Reasoning-Memorization Interplay in Language Models Is Mediated by a Single Direction" ACL Findings 2025 (Poster)
Piatti, Jin, et al. "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents" NeurIPS 2024 (Poster)
Ortu, Jin, et al. "Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals" ACL 2024 Main (Poster)
Zhijing Jin, Sydney Levine, et al. "When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment" NeurIPS 2022 (Oral)

Request for Proposal

AI Existential Safety Research Definition

FLI defines AI existential safety research as:

Research that analyzes the most probable ways in which AI technology could cause an existential catastrophe (that is: a catastrophe that permanently and drastically curtailshumanity’s potential, such as by causing human extinction), and which types of research could minimize existential risk (the risk of such catastrophes). Examples include:
- Outlining a set of technical problems and arguments that their solutions would reduce existential risk from AI, or arguing that existing such sets are misguided.
- Concretely specifying properties of AI systems that significantly increase or decrease their probability of causing an existential catastrophe, and providing ways to measure such properties.

Technical research which could, if successful, assist humanity in reducing the existential risk posed by highly impactful AI technology to extremely low levels. Examples include:
- Research on interpretability and verification of machine learning systems, to the extent that it facilitates analysis of whether the future behavior of the system in a potentially different distribution of situations could cause existential catastrophes.
- Research on ensuring that AI systems have objectives that do not incentivize existentially risky behavior, such as deceiving human overseers or amassing large amounts of resources.
- Research on developing formalisms that help analyze advanced AI systems, to the extent that this analysis is relevant for predicting and mitigating existential catastrophes such systems could cause.
- Research on mitigating cybersecurity threats to the integrity of advanced AItechnology.
- Solving problems identified as important by research as described in point 1, or developing benchmarks to make it easier for the AI community to work on such problems.

The following are examples of research directions that do not automatically count as AIexistential safety research, unless they are carried out as part of a coherent plan for generalizing and applying them to minimize existential risk:

The mitigation of non-existential catastrophes, e.g. ensuring that autonomous vehicles avoid collisions, or that recidivism prediction systems do not discriminate based on race. We believe this kind of work is valuable; it is simply outside the scope of this fellowship.
Increasing the general competence of AI systems, e.g. improving generative modelling,or creating agents that can optimize objectives in partially observable environments.

Purpose and eligibility

The purpose of the fellowship is to fund talented students throughout their PhDs to work on AI existential safety research. To be eligible, applicants should either be graduate students or be applying to PhD programs. Funding is conditional on being accepted to a PhD program, working on AI existential safety research, and having an advisor who can confirm to us that they will support the student’s work on AI existential safety research. If a student has multiple advisors, these confirmations would be required from all advisors. There is an exception to this last requirement for first-year graduate students, where all that is required is an “existence proof”. For example, in departments requiring rotations during the first year of a PhD, funding is contingent on only one of the professors making this confirmation. If a student changes advisor, this confirmation is required from the new advisor for the fellowship to continue.

An application from a current graduate student must address in the Research Statement how this fellowship would enable their AI existential safety research, either by letting them continue such research when no other funding is currently available, or by allowing them to switch into this area.

Fellows are expected to participate in annual workshops and other activities that will be organized to help them interact and network with other researchers in the field.

Continued funding is contingent on continued eligibility, demonstrated by submitting a brief (~1 page) progress report due each summer.

There are no geographic limitations on applicants or host universities. We welcome applicants from a diverse range of backgrounds, and we particularly encourage applications from women and underrepresented minorities.

Application process

Applicants will submit a curriculum vitae, a research statement, and the names and email addresses of up to three referees, who will be sent a link where they can submit letters of recommendation and answer a brief questionnaire about the applicant. Applicants are encouraged but not required to submit their GRE scores using our DI code: 3234.

The research statement can be up to 3 pages long, not including references, outlining applicants’ current plans for doing AI existential safety research during their PhD. It should include the applicant’s reason for interest in AI existential safety, a technical specification of the proposed research, and a discussion of why it would reduce the existential risk of advanced AI technologies. For current PhD students, it should also detail why no existing funding arrangements allow work on AI existential safety research.

Timing for Fall 2025

The deadline for application is November 21, 2025 at 11:59 pm ET. After an initial round of deliberation, those applicants who make the short-list will then go through an interview process before fellows are finalized. Offers will be made no later than the end of March 2026.

Our other fellowships

US-China AI Governance PhD Fellowships

Closed for submissions

Deadline: November 21, 2025

Technical Postdoctoral Fellowships

Closed for submissions

Deadline: 5 January 2026

All Fellowships

Our other grant programs

All Grant Programs

Technical PhD Fellowships

Grant winners