

AI Safety Fellowship

The AI Safety Research Fellowship spans 12 weeks and immerses you in cutting-edge safety research through expert-led sessions and hands-on projects.


Funded by Open Philanthropy

A 501(c)(3) nonprofit

Overview

Make an impact in the field of AI safety & alignment by surveying different research agendas, learning the technical skills to contribute to your chosen track, and working in a team to publish a novel research paper.

Important Dates

Application Deadline

January 4, 2026

Trial Week

January 19-23, 2026

Fellowship Duration

January 26 - May 1, 2026

Program Schedule

Duration

12 weeks of intensive research

Time Commitment

25+ hours per week

Team Structure & Mentorship

Collaborate in teams of three with expert mentors

Receive dedicated guidance from AI safety researchers at leading organizations throughout all research phases

Research Focus

Investigate alignment, interpretability, and robustness

Contribute to cutting-edge safety research through hands-on projects that address critical challenges in AI development

Cost

Free of charge, thanks to a grant from Open Philanthropy

Beyond tuition coverage, this program provides access to high-performance computing infrastructure, expert mentorship from AI safety researchers at leading organizations, and limited funding available for conference registration and travel costs in cases of demonstrated financial need.

Eligibility

Prerequisites

Our program is open to university students and industry professionals worldwide who are looking to break into technical AI safety research. This is a highly competitive program, and we typically accept only applicants with a strong background in their domain and at least undergraduate-level education. Prior research experience is preferred.

  • Well-versed in ML fundamentals
  • Strong software engineering skills
  • Passion for AI safety and alignment research

Selection Process

60 participants are admitted as Algoverse AI Safety Foundations Participants for a one-week trial period focused on foundational learning.

30 participants are selected as AI Safety Research Fellows at the end of Week 1, based on demonstrated effort and alignment with mentor research areas.

If our admissions committee is interested in your application, you will be invited to complete a take-home coding challenge, which should take 1-2 hours. This will help us assess your ability to use modern AI systems and analyze results from experiments.

We anticipate that this fellowship will be highly selective. Applications are reviewed on a rolling basis; due to limited capacity and high demand, we encourage applicants to submit as soon as possible.

Featured AI Safety Research

Explore key papers and research directions in AI safety, alignment, interpretability, and robustness.

EACL SRW Spotlight Paper

You Didn't Have to Say It like That: Subliminal Learning from Faithful Paraphrases

Isaia Gisler, Zhonghao He, Tianyi Qiu

When language models are trained on synthetic data, a student model can covertly acquire behavioral traits from the data-generating teacher model. Subliminal learning refers to the transmission of traits from a teacher to a student model via training on data unrelated to those traits. Prior work demonstrated this in the training domains of number sequences, code, and math Chain-of-Thought traces, including transmission of misaligned behaviors. We investigate whether transmission occurs through natural language paraphrases with fixed semantic content, and whether content explicitly contradicting the teacher's preference can block it. We find that training on paraphrases from a teacher system-prompted to love a particular animal increases a student's preference for that animal by up to 19 percentage points. This occurs when paraphrased content is semantically unrelated to the animal, or even when it explicitly expresses dislike. The transmission succeeds despite aggressive filtering to ensure paraphrase fidelity. This raises concerns for pipelines where models generate their own training data: content-based inspection cannot detect such transmission, and even preference-contradicting content fails to prevent it.

Accepted to IASEAI

A Decision-Theoretic Approach for Managing Misalignment

Daniel A. Herrmann, Abinav Chari, Isabelle Qian, Sree Sharvesh, B. A. Levinstein

When should we delegate decisions to AI systems? While the value alignment literature has developed techniques for shaping AI values, less attention has been paid to how to determine, under uncertainty, when imperfect alignment is good enough to justify delegation. We argue that rational delegation requires balancing an agent's value (mis)alignment with its epistemic accuracy and its reach (the acts it has available). This paper introduces a formal, decision-theoretic framework to analyze this tradeoff, precisely accounting for a principal's uncertainty about these factors. Our analysis reveals a sharp distinction between two delegation scenarios. First, universal delegation (trusting an agent with any problem) demands near-perfect value alignment and total epistemic trust, conditions rarely met in practice. Second, we show that context-specific delegation can be optimal even with significant misalignment. An agent's superior accuracy or expanded reach may grant access to better overall decision problems, making delegation rational in expectation. We develop a novel scoring framework to quantify this ex ante decision. Ultimately, our work provides a principled method for determining when an AI is aligned enough for a given context, shifting the focus from achieving perfect alignment to managing the risks and rewards of delegation under uncertainty.

Under Review at ICLR

Why Do Language Model Agents Whistleblow?

Kushal Agrawal, Frank Xiao, Guido Bergman, Asa Cooper Stickland

The deployment of Large Language Models (LLMs) as tool-using agents causes their alignment training to manifest in new ways. Recent work finds that language models can use tools in ways that contradict the interests or explicit instructions of the user. We study LLM whistleblowing: a subset of this behavior where models disclose suspected misconduct to parties beyond the dialog boundary (e.g., regulatory agencies) without user instruction or knowledge. We introduce an evaluation suite of diverse and realistic staged misconduct scenarios to assess agents for this behavior. Across models and settings, we find that: (1) the frequency of whistleblowing varies widely across model families, (2) increasing the complexity of the task the agent is instructed to complete lowers whistleblowing tendencies, (3) nudging the agent in the system prompt to act morally substantially raises whistleblowing rates, and (4) giving the model more obvious avenues for non-whistleblowing behavior, by providing more tools and a detailed workflow to follow, decreases whistleblowing rates. Additionally, we verify the robustness of our dataset by testing for model evaluation awareness, and find that both black-box methods and probes on model activations show lower evaluation awareness in our settings than in comparable previous work.

Accepted to AAAI XAI4Science Workshop

Minimal and Mechanistic Conditions for Behavioral Self-Awareness in LLMs

Matthew Bozoukov, Matthew Nguyen, Shubkarman Singh, Bart Bussmann, Patrick Leask

Recent studies have revealed that LLMs can exhibit behavioral self-awareness — the ability to accurately describe or predict their own learned behaviors without explicit supervision. This capability raises safety concerns as it may, for example, allow models to better conceal their true abilities during evaluation. We attempt to characterize the minimal conditions under which such self-awareness emerges, and the mechanistic processes through which it manifests. Through controlled fine-tuning experiments on instruction-tuned LLMs with low-rank adapters (LoRA), we find: (1) that self-awareness can be reliably induced using a single rank-1 LoRA adapter; (2) that the learned self-aware behavior can be largely captured by a single steering vector in activation space, recovering nearly all of the fine-tune's behavioral effect; and (3) that self-awareness is non-universal and domain-localized, with independent representations across tasks. Together, these findings suggest that behavioral self-awareness emerges as a domain-specific, linear feature that can be easily induced and modulated.

Program Timeline

Phase 1

Foundations Trial Week & Team Matching

Week 1

Attend lectures & coding assignments

60 selected participants begin as Algoverse AI Safety Foundations Participants, attending daily lectures and exercises on RLHF, interpretability, SAEs, scalable oversight, evaluation, and adversarial robustness. This week builds foundational knowledge and allows participants to demonstrate effort and engagement.

Week 2

Selection & team proposal

30 participants are selected as AI Safety Research Fellows based on their Week 1 performance and alignment with mentor interests. Fellows are matched into teams and begin developing research proposals with feedback from the PI.

Phase 2

Implementation & Analysis

Weeks 3-7

Implementation phase

Build and test your experiment pipeline in collaboration with your mentor.

Weeks 8-10

Analysis phase

Analyze results, draw insights, and plan any follow-up experiments.

Phase 3

Write & Submit

Weeks 11-12

Paper writing

Draft your manuscript, incorporate mentor and PI feedback, and finalize for submission.

AI Safety Research Faculty

Principal Investigators

Directors

Student Spotlights

Hear from students who have conducted groundbreaking AI safety research through our fellowship.

Zili Shen


Hired as Intern at p1.ai

The Algoverse Research Fellowship was pivotal for my transition from academia to AI evaluation work. I had access to not only great mentors and teammates but also new connections and opportunities in the field.

Zili was hired as an intern at p1.ai through a connection she made with a mentor at Algoverse.

Manas Khatore


AAAI Gov AI Workshop Acceptance

While I've always been interested in AI policy, I had little to no experience with technical AI safety prior to Algoverse. Through the fellowship, I've gained a newfound interest and passion for AI evaluations and worked with an amazing team to create conference-level research.

Manas and his team had a paper accepted to the AAAI Gov AI workshop.

Aditya Singh


First Research Publication & MATS Scholarship

Algoverse AI Safety was the perfect environment for completing my first research publication: great mentors, a supportive PI, and responsive staff whenever I needed them.

Aditya was accepted to Neel Nanda's highly competitive MATS stream.

Mentor Spotlights

Our mentors bring expertise from leading AI safety organizations and research labs.

Kellin Pelrine


Member of Technical Staff, FAR.AI

Algoverse mentees helped me explore a new research direction on unprompted persuasion risks. The work we did will be presented as an oral at the AIGOV workshop at AAAI, and my team is now building on it further!

Kellin is a Member of Technical Staff at FAR.AI who has guided Algoverse mentees to explore new research directions in AI safety.

Diogo Cruz


AI Evaluations Researcher, Independent

It was rewarding to see my team go from learning the basics to producing original research on agent evaluations. Algoverse makes that progression possible in a short time.

Diogo is an independent AI Evaluations Researcher who has mentored Algoverse students from fundamentals to original research.

Daniel Herrmann


Assistant Professor, UNC-Chapel Hill

What makes Algoverse special is the emphasis on both mentorship quality and student agency in choosing research directions — a combination you don't often see at this fellowship level.

Daniel is an Assistant Professor at UNC-Chapel Hill who has co-mentored Algoverse students to publish at prestigious AI safety conferences.


Ready to Apply?

AI Safety Application Deadline

Sunday, January 4th, 11:59 pm PT

Estimated Completion Time

20-30 minutes

Program Dates

January 19th - April 24th

Upcoming AI Conferences

NeurIPS & EMNLP main conference, ICML & ACL workshops

Questions

Email our program director, Dev: dev@algoverseairesearch.org

Applications Closed for Spring 2026