22 of 22 Reinforcement Learning Jobs in London

Artificial Intelligence Engineer

Hiring Organisation
WorkGenius Group
Location
City of London, London, United Kingdom
Role: Full-time (Permanent Role) We are building a world-class AI research team focused on advancing next-generation agentic systems and intent-aware learning architectures. Our mission is to bridge cutting-edge research in large language models, reinforcement learning, and alignment with scalable, real-world production … systems. You will operate at the intersection of research and product, shaping foundational capabilities in intent understanding, agent learning, and model alignment across distributed AI environments. This is an opportunity to influence AI systems deployed at global scale across diverse compute environments including edge and cloud. Responsibilities Define Research ...

Artificial Intelligence Researcher

Hiring Organisation
microTECH Global LTD
Location
City of London, London, United Kingdom
permanent position with candidates required to do hybrid working in either Cambridge or London. Our client are looking for AI Researchers specialising in Reinforcement Learning from Human Feedback (RLHF) and Generative AI. In this role, you will design and optimise the algorithms that align large-scale generative models … build the next generation of foundation models Responsibilities: Develop and refine RLHF algorithms for large language and generative models. Research and implement deep reinforcement learning methods (policy gradients, actor-critic, off-policy learning) for model alignment. Train, fine-tune, and evaluate LLMs and diffusion models at scale. ...

Applied AI Research Engineer - £300k + bens - London

Hiring Organisation
Transparent Technology
Location
City of London, London, United Kingdom
Employment Type
Permanent
success. What You'll Do * Design and implement state-of-the-art instruction tuning methods * Fine-tune and deploy LLMs in production environments * Apply reinforcement learning techniques (SFT, PPO, DPO, GRPO) * Run hands-on experimentation to outperform closed-source models * Break down ambiguous research ideas into structured roadmaps … speech systems. Ideal Background * 5-7+ years in applied AI/ML (exceptional 3+ years considered) * Deep experience in fine-tuning + reinforcement learning * Experience shipping ML systems from research into production * Open-source LLM experience essential * Product-driven engineering mindset (Apple, LinkedIn, Amazon style environments ideal ...

Machine Learning Engineer

Hiring Organisation
Block MB
Location
London Area, United Kingdom
Senior Machine Learning Engineer Location: London, UK About the Role We’re looking for an experienced Machine Learning Engineer to lead the development and training of advanced large-scale language models. In this role, you will be responsible for pushing the performance and reliability of next-generation … execute large-scale training experiments on multi-GPU and distributed environments using cutting-edge ML frameworks. Lead both supervised fine-tuning (SFT) and reinforcement learning (RL) workflows to improve model performance on domain-specific tasks. Build, maintain, and optimise custom training pipelines, including dataset preparation, distributed training primitives ...

Reinforcement Learning (RL) Control Engineer

Hiring Organisation
Randstad Digital
Location
City of London, London, United Kingdom
Employment Type
Permanent
Reinforcement Learning (RL) Engineer Manipulation London Based (5 days in office) Competitive salary A high-profile robotics organization is urgently seeking a high-caliber RL Engineer (Manipulation) to join their London-based R&D team. This role is pivotal in bridging the gap between simulation and real-world … cloning. High-Performance Engineering: Designing and profiling research-grade PyTorch/JAX code to support large-scale, distributed RL infrastructure. Essential Skills Needed Deep Learning Mastery: 5+ years building and shipping models, with deep hands-on expertise in LLMs, VLMs, or generative architectures. Industry Experience: 3+ years of commercial ...

Senior Machine Learning Engineer

Hiring Organisation
OJ Digital
Location
Greater London, England, United Kingdom
Senior Machine Learning Engineer The Role We’re hiring a Senior or Staff ML Research Engineer to join a high growth AI company building advanced proprietary language models that power real world products at scale. This business has strong product market fit and significant enterprise adoption. A large proportion … Design and implement state of the art instruction tuning and information retrieval methods Fine tune and deploy large open source LLMs in production Apply reinforcement learning approaches including SFT, DPO, PPO and GRPO Develop models that outperform closed source alternatives Break down ambiguous research ideas into structured technical ...

Reinforcement Learning (RL) Control Engineer

Hiring Organisation
Randstad Digital
Location
City, London, United Kingdom
Employment Type
Permanent
Salary
GBP 100,000 Annual
Reinforcement Learning (RL) Engineer Manipulation London Based (5 days in office) Competitive salary A high-profile robotics organization is urgently seeking a high-caliber RL Engineer (Manipulation) to join their London-based R&D team. This role is pivotal in bridging the gap between simulation and real-world ...

Research Scientist

Hiring Organisation
Axiōma Search
Location
City of London, London, United Kingdom
given inference cost. What you'll do Research post-training methods for large multimodal language models, with a focus on RL and feedback-driven learning Design reward models and large-scale reinforcement learning setups for instruction following and tool use Build automated data collection pipelines using human … cases into new training signals What you'll need Strong research background combined with hands-on experience with LLM post-training, alignment, or reinforcement learning Proficiency in Python and at least one major DL framework (PyTorch, JAX, or TensorFlow) Experience training large models on distributed systems Publications ...

Senior Data Scientist

Hiring Organisation
Anson Mccade
Location
London, United Kingdom
Employment Type
Permanent
Responsibilities End-to-End Delivery: Lead the technical execution of AI projects, from initial problem discovery and hypothesis testing to deploying production-grade machine learning models. Strategic Advisory: Act as a "translator" between technical complexity and business value. You will work closely with C-suite stakeholders to identify … solve their most pressing strategic challenges. Technical Leadership: Architect robust, scalable data pipelines and state-of-the-art models (including LLMs, Reinforcement Learning, or Bayesian Inference) tailored to specific client needs. Mentorship: Guide and upskill junior Data Scientists, fostering a culture of rigorous peer review, clean coding standards ...

Software Engineer - Large Language Models

Hiring Organisation
Fastino Labs
Location
South London, UK
Employment Type
Full-time
overall performance metrics Architect data processing pipelines, implementing filtering, balancing, and captioning systems to ensure training data quality across diverse content categories Implement reinforcement learning techniques including Direct Preference Optimization and Group Relative Policy Optimization to align model outputs with human preferences and quality standards Build robust … Required - Great velocity for building and shipping agents/AI products. Optional - Advanced degree (Master's or PhD) in Computer Science, Artificial Intelligence, Machine Learning, or related technical discipline with concentrated study in deep learning and computer vision methodologies Optional - Demonstrated ability to do independent research in Academic ...

Lead AI Engineer

Hiring Organisation
Akixi
Location
London, UK
Employment Type
Full-time
similar conversational-AI platforms. Deep understanding of prompt engineering and fine-tuning of large language models. Strong grounding in ML concepts — supervised, unsupervised, and reinforcement learning. Familiarity with cloud AI/ML services (e.g. Azure Cognitive Services, AWS SageMaker, and/or GCP Vertex AI). Experience deploying ...

AI Engineer

Hiring Organisation
IC Resources
Location
London Area, United Kingdom
advanced autonomous systems is looking for an AI Developer to join its growing team in London. This role sits at the intersection of machine learning and robotics, focused on building intelligent decision-making and control systems for real-world platforms. You’ll be working closely with engineers and researchers … develop AI that operates reliably in complex, changing environments. The Role You’ll be responsible for designing and implementing probabilistic and machine learning models that support navigation, planning and adaptive control. A key part of the position will involve integrating these models into real robotic systems and ensuring they ...

Data Scientist

Hiring Organisation
Harnham - Data & Analytics Recruitment
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
£55,000 - £65,000 per annum
will join a collaborative cross functional unit working across engineering, product and data. The Role Develop and deploy production grade Python and deep learning models. Build NLP and LLM features including embeddings, intent detection and conversational AI. Contribute to end to end pipelines using cloud services, microservices and containerisation. … Experiment with advanced techniques including reinforcement learning and RAG workflows. Collaborate closely with engineering and product on delivery and performance. Present work clearly in team sessions and contribute to technical decision making. Your Skills and Experience Strong Python skills and experience deploying ML models into production. Hands ...

Lead ML Engineer (London)

Hiring Organisation
Glite Tech
Location
City of London, London, United Kingdom
English to intermediate and advanced learners. We’re on the verge of solving one of the biggest challenges in education – making high-quality, personalised learning accessible to everyone. We are building a fundamental model for education - one that can accurately predict student knowledge and orchestrate lessons, adapting … models to production, to own the ML team in our growing company. What you will do 🚀 Build fundamental models for education - solving the ultimate learning task of predicting student knowledge and optimal ‘next task’ Work with a vast amount of unique data - we have data from over 1M language ...

Agentic Developer - Building guardrails for autonomous AI

Hiring Organisation
governr
Location
London, UK
Employment Type
Full-time
requirements through first principles • You communicate technical concepts clearly to non-technical stakeholders Highly Valued (Differentiated Candidates) • Publications or research in multi-agent systems, reinforcement learning, AI safety, or agent architectures • Experience at AI labs (Anthropic, OpenAI, DeepMind) or leading AI research groups • Production experience with agents: LangChain … Dr. Ayman Hindy, Marcel Cassard, and leading figures in AI, high frequency risk management and financial regulation. Early team of sharp, mission-driven builders. Learning Curve: You'll gain expertise in cutting-edge AI architectures, enterprise software, regulatory frameworks, and category creation simultaneously. This is one of those roles ...

Senior NLP Engineer (London)

Hiring Organisation
Glite Tech
Location
City of London, London, United Kingdom
English to intermediate and advanced learners. We’re on the verge of solving one of the biggest challenges in education – making high-quality, personalised learning accessible to everyone. We are building a fundamental model for education - one that can accurately predict student knowledge and orchestrate lessons, adapting … models to production, to join the ML team in our growing company. What you will do 🚀 Build fundamental models for education - solving the ultimate learning task of predicting student knowledge and optimal ‘next task’ Build fully-automated pipelines for dictionary building; including span identification, word sense distribution, and sense ...

AI Simulation and Control Engineer (up to £125k + equity)

Hiring Organisation
Optimal Agriculture
Location
Greater London, England, United Kingdom
performance of Optimal's AI to maximise crop yields and minimise resource consumption, working closely with our agronomy experts. Technical skills Experience training Machine Learning models. Strong background in at least one of Machine Learning, Optimisation, Control (Model Predictive Control, Optimal Control, and classical feedback techniques), Reinforcement Learning, Physics Modelling and Numerical Simulation. Software engineering in Python (Julia is a bonus) Software engineering processes and tools (containers, version control, deployments etc) AI coding Compensation Salary: £70k – £125k Equity: 0.5% – 5.0 ...

Research Engineer (Agents)

Hiring Organisation
Native
Location
City of London, London, United Kingdom
structured data representations (tables, graphs, schemas) Build training, simulation, and evaluation environments for long-horizon, multi-step agent behavior Develop self-supervised and reinforcement learning objectives for improving agent reliability and correctness Integrate foundation model embeddings and symbolic components into agent workflows Own agent systems ...

Data Scientist

Hiring Organisation
Odysse Ltd
Location
Croydon, England, United Kingdom
data infrastructure that supports both today’s human-driven fleets and tomorrow’s autonomous mobility networks. This is a hands-on applied machine learning role focused on building and improving decision systems that directly influence live fleet operations and contribute to long-term autonomous fleet orchestration capabilities. You will … real-world data and optimise for practical impact rather than just model accuracy Has exposure to advanced modelling approaches (e.g. neural networks, optimisation, or reinforcement learning) Nice to Have Experience with time-series or geospatial datasets, experimentation or optimisation problems Experience in logistics, marketplaces, mobility systems, ride-hailing ...

Reinforcement Learning (RL) Engineer, Manipulation

Hiring Organisation
Randstad Digital
Location
City of London, London, United Kingdom
Employment Type
Permanent
talent to solve the most complex challenges in high-DOF autonomous systems and embodied AI. We are looking for experts across: AI/Machine Learning MLOps Software Engineering Data Science The Environment The mission is driven by high-bandwidth, in-person collaboration. This is a 5-day-a-week ...

Reinforcement Learning (RL) Engineer, Manipulation

Hiring Organisation
Randstad Technologies
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
£80,000 - £120,000 per annum
talent to solve the most complex challenges in high-DOF autonomous systems and embodied AI. We are looking for experts across: AI/Machine Learning MLOps Software Engineering Data Science The Environment The mission is driven by high-bandwidth, in-person collaboration. This is a 5-day-a-week ...

Data Scientist - Inside IR35 - Hybrid

Hiring Organisation
Halian Technology Limited
Location
Croydon, Surrey, South East, United Kingdom
Employment Type
Contract
Role We are recruiting on behalf of a mobility technology business building intelligent fleet orchestration systems. This role suits an experienced Applied Machine Learning Engineer or Data Scientist comfortable working with messy real-world data, operational constraints, and production systems. You'll join a small, high-calibre team solving complex … years Geospatial data experience (H3, GeoPandas, PostGIS or similar) Optimisation/operations research exposure Logistics/mobility/marketplace domain experience Nice to Have Reinforcement learning Simulation modelling Experience deploying models into cloud environments Experimentation frameworks (A/B testing, model validation at scale) How to Apply ...