Research Engineer
London, South East England, United Kingdom
Hybrid/Remote Options
Anthropic
team, we are opportunistically hiring for the following research areas: AI Control: Creating methods to ensure advanced AI systems remain safe and harmless in unfamiliar or adversarial scenarios. Alignment Stress-testing: Creating model organisms of misalignment to improve our empirical understanding of how alignment failures might arise. Note: Currently, the team's hub is in San Francisco, so … may not hear back on your application to the London team unless we see an unusually strong fit. For this role, we conduct all interviews in Python. Representative Projects: Testing the robustness of our safety techniques by training language models to subvert them and measuring how effective those attempts are. Run multi-agent reinforcement