1 to 25 of 65 Permanent Reinforcement Learning Jobs in England

Artificial Intelligence Engineer

Hiring Organisation
WorkGenius Group
Location
City of London, London, United Kingdom
Role: Full-time (Permanent Role) We are building a world-class AI research team focused on advancing next-generation agentic systems and intent-aware learning architectures. Our mission is to bridge cutting-edge research in large language models, reinforcement learning, and alignment with scalable, real-world production … systems. You will operate at the intersection of research and product, shaping foundational capabilities in intent understanding, agent learning, and model alignment across distributed AI environments. This is an opportunity to influence AI systems deployed at global scale across diverse compute environments including edge and cloud. Responsibilities Define Research ...

Artificial Intelligence Researcher

Hiring Organisation
microTECH Global LTD
Location
City of London, London, United Kingdom
permanent position with candidates required to do hybrid working in either Cambridge or London. Our client is looking for AI Researchers specialising in Reinforcement Learning from Human Feedback (RLHF) and Generative AI. In this role, you will design and optimise the algorithms that align large-scale generative models … build the next generation of foundation models Responsibilities: Develop and refine RLHF algorithms for large language and generative models. Research and implement deep reinforcement learning methods (policy gradients, actor-critic, off-policy learning) for model alignment. Train, fine-tune, and evaluate LLMs and diffusion models at scale. ...

Applied AI Research Engineer - £300k + bens - London

Hiring Organisation
Transparent Technology
Location
City of London, London, United Kingdom
Employment Type
Permanent
success. What You'll Do * Design and implement state-of-the-art instruction tuning methods * Fine-tune and deploy LLMs in production environments * Apply reinforcement learning techniques (SFT, PPO, DPO, GRPO) * Run hands-on experimentation to outperform closed-source models * Break down ambiguous research ideas into structured roadmaps … speech systems. Ideal Background * 5-7+ years in applied AI/ML (exceptional 3+ years considered) * Deep experience in fine-tuning + reinforcement learning * Experience shipping ML systems from research into production * Open-source LLM experience essential * Product-driven engineering mindset (Apple, LinkedIn, Amazon style environments ideal ...

Applied Data Scientist

Hiring Organisation
Change-IT Consulting Ltd
Location
Birmingham, West Midlands, United Kingdom
Employment Type
Permanent, Work From Home
faster and that genuinely meet user needs at national scale. You'll play a key role in exploring complex datasets, building production-ready machine learning and generative AI solutions, and working closely with multidisciplinary teams to translate real user problems into impactful AI capabilities. Key responsibilities include: Exploring, analysing … interpreting large, complex and diverse datasets to uncover insights and opportunities for AI-driven improvement. Designing, building, evaluating and optimising machine learning, deep learning and generative AI models for real-world service applications. Collaborating with engineers, product managers, designers and policy stakeholders to translate user needs into scalable ...

Principal Machine Learning Engineer

Hiring Organisation
Harnham - Data & Analytics Recruitment
Location
Manchester, Lancashire, England, United Kingdom
Employment Type
Full-Time
Salary
£85,000 - £100,000 per annum
Remote - Europe Staff/Lead Machine Learning Engineer We are working with a global languages and translation company. They are focused on designing, developing, and deploying cutting-edge machine learning solutions across the company. Responsibilities: Leadership of technical projects Mentoring junior members of the team Architecture … write and produce production-grade code in Python Experience with TensorFlow, PyTorch and Scikit-learn Experience with NLP and LLMs Strong knowledge of machine learning techniques and algorithms, including supervised and unsupervised learning, deep learning, and reinforcement learning Leadership of technical projects ...

Machine Learning Engineer

Hiring Organisation
Block MB
Location
London Area, United Kingdom
Senior Machine Learning Engineer Location: London, UK About the Role We’re looking for an experienced Machine Learning Engineer to lead the development and training of advanced large-scale language models. In this role, you will be responsible for pushing the performance and reliability of next-generation … execute large-scale training experiments on multi-GPU and distributed environments using cutting-edge ML frameworks. Lead both supervised fine-tuning (SFT) and reinforcement learning (RL) workflows to improve model performance on domain-specific tasks. Build, maintain, and optimise custom training pipelines, including dataset preparation, distributed training primitives ...

Reinforcement Learning (RL) Control Engineer

Hiring Organisation
Randstad Digital
Location
City of London, London, United Kingdom
Employment Type
Permanent
Reinforcement Learning (RL) Engineer Manipulation London Based (5 days in office) Competitive salary A high-profile robotics organization is urgently seeking a high-caliber RL Engineer (Manipulation) to join their London-based R&D team. This role is pivotal in bridging the gap between simulation and real-world … cloning. High-Performance Engineering: Designing and profiling research-grade PyTorch/JAX code to support large-scale, distributed RL infrastructure. Essential Skills Needed Deep Learning Mastery: 5+ years building and shipping models, with deep hands-on expertise in LLMs, VLMs, or generative architectures. Industry Experience: 3+ years of commercial ...

Senior Machine Learning Engineer

Hiring Organisation
OJ Digital
Location
Greater London, England, United Kingdom
Senior Machine Learning Engineer The Role We’re hiring a Senior or Staff ML Research Engineer to join a high growth AI company building advanced proprietary language models that power real world products at scale. This business has strong product market fit and significant enterprise adoption. A large proportion … Design and implement state of the art instruction tuning and information retrieval methods Fine tune and deploy large open source LLMs in production Apply reinforcement learning approaches including SFT, DPO, PPO and GRPO Develop models that outperform closed source alternatives Break down ambiguous research ideas into structured technical ...

Reinforcement Learning (RL) Control Engineer

Hiring Organisation
Randstad Digital
Location
City, London, United Kingdom
Employment Type
Permanent
Salary
GBP 100,000 Annual
Reinforcement Learning (RL) Engineer Manipulation London Based (5 days in office) Competitive salary A high-profile robotics organization is urgently seeking a high-caliber RL Engineer (Manipulation) to join their London-based R&D team. This role is pivotal in bridging the gap between simulation and real-world ...

Research Scientist

Hiring Organisation
Axiōma Search
Location
City of London, London, United Kingdom
given inference cost. What you'll do Research post-training methods for large multimodal language models, with a focus on RL and feedback-driven learning Design reward models and large-scale reinforcement learning setups for instruction following and tool use Build automated data collection pipelines using human … cases into new training signals What you'll need Strong research background combined with hands-on experience with LLM post-training, alignment, or reinforcement learning Proficiency in Python and at least one major DL framework (PyTorch, JAX, or TensorFlow) Experience training large models on distributed systems Publications ...

Research Scientist, Machine Learning

Hiring Organisation
SoCode Recruitment
Location
Cambridge, England, United Kingdom
Research Scientist – Machine Learning Cambridge/Hybrid (Flexible 1 day per week in the office) You MUST have a PhD to apply Open to recent graduates, through to experienced research leaders (Salary will match your level) About the Company We build data-efficient engineering AI software that helps teams … looking for collaborative researchers who enjoy solving complex problems together. Essential: PhD in a technical field or equivalent experience Published research in machine learning, statistics, or optimisation (conferences and/or journals) Desirable: Experience in decision-making methods (Bayesian optimisation, bandits, reinforcement learning, active learning) Background ...

Machine Learning Researcher Statistics Python AI

Hiring Organisation
Client Server
Location
Cambridge, Cambridgeshire, East Anglia, United Kingdom
Employment Type
Permanent, Work From Home
Salary
£85,000
Machine Learning Researcher (PhD Statistics Python AI R&D) Cambridge/WFH to £85k Are you a tech savvy, PhD educated, Machine Learning Researcher looking for an opportunity to work on complex and interesting systems at the cutting edge of AI technology? You could be progressing your career … that provides AI and ML products for automotive innovators to design better cars faster and achieve greater sustainability through Machine Learning. As a Machine Learning Researcher you will work fairly independently, developing your own research programme, with a view to developing new tools and techniques for probabilistic models, Bayesian ...

Senior Data Scientist

Hiring Organisation
Anson Mccade
Location
London, United Kingdom
Employment Type
Permanent
Responsibilities End-to-End Delivery: Lead the technical execution of AI projects, from initial problem discovery and hypothesis testing to deploying production-grade machine learning models. Strategic Advisory: Act as a "translator" between technical complexity and business value. You will work closely with C-suite stakeholders to identify … solve their most pressing strategic challenges. Technical Leadership: Architect robust, scalable data pipelines and state-of-the-art models (including LLMs, Reinforcement Learning, or Bayesian Inference) tailored to specific client needs. Mentorship: Guide and upskill junior Data Scientists, fostering a culture of rigorous peer review, clean coding standards ...

Software Engineer - Large Language Models

Hiring Organisation
Fastino Labs
Location
Southampton, UK
Employment Type
Full-time
overall performance metrics Architect data processing pipelines, implementing filtering, balancing, and captioning systems to ensure training data quality across diverse content categories Implement reinforcement learning techniques including Direct Preference Optimization and Generalized Reward Preference Optimization to align model outputs with human preferences and quality standards Build robust … Required - Great velocity for building and shipping agents/AI products. Optional - Advanced degree (Master's or PhD) in Computer Science, Artificial Intelligence, Machine Learning, or related technical discipline with concentrated study in deep learning and computer vision methodologies Optional - Demonstrated ability to do independent research in Academic ...

Software Engineer - Large Language Models

Hiring Organisation
Fastino Labs
Location
Leicester, UK
Employment Type
Full-time
overall performance metrics Architect data processing pipelines, implementing filtering, balancing, and captioning systems to ensure training data quality across diverse content categories Implement reinforcement learning techniques including Direct Preference Optimization and Generalized Reward Preference Optimization to align model outputs with human preferences and quality standards Build robust … Required - Great velocity for building and shipping agents/AI products. Optional - Advanced degree (Master's or PhD) in Computer Science, Artificial Intelligence, Machine Learning, or related technical discipline with concentrated study in deep learning and computer vision methodologies Optional - Demonstrated ability to do independent research in Academic ...

Software Engineer - Large Language Models

Hiring Organisation
Fastino Labs
Location
Nottingham, UK
Employment Type
Full-time
overall performance metrics Architect data processing pipelines, implementing filtering, balancing, and captioning systems to ensure training data quality across diverse content categories Implement reinforcement learning techniques including Direct Preference Optimization and Generalized Reward Preference Optimization to align model outputs with human preferences and quality standards Build robust … Required - Great velocity for building and shipping agents/AI products. Optional - Advanced degree (Master's or PhD) in Computer Science, Artificial Intelligence, Machine Learning, or related technical discipline with concentrated study in deep learning and computer vision methodologies Optional - Demonstrated ability to do independent research in Academic ...

Software Engineer - Large Language Models

Hiring Organisation
Fastino Labs
Location
Telford, Shropshire, UK
Employment Type
Full-time
overall performance metrics Architect data processing pipelines, implementing filtering, balancing, and captioning systems to ensure training data quality across diverse content categories Implement reinforcement learning techniques including Direct Preference Optimization and Generalized Reward Preference Optimization to align model outputs with human preferences and quality standards Build robust … Required - Great velocity for building and shipping agents/AI products. Optional - Advanced degree (Master's or PhD) in Computer Science, Artificial Intelligence, Machine Learning, or related technical discipline with concentrated study in deep learning and computer vision methodologies Optional - Demonstrated ability to do independent research in Academic ...

Software Engineer - Large Language Models

Hiring Organisation
Fastino Labs
Location
Gloucester, Gloucestershire, UK
Employment Type
Full-time
overall performance metrics Architect data processing pipelines, implementing filtering, balancing, and captioning systems to ensure training data quality across diverse content categories Implement reinforcement learning techniques including Direct Preference Optimization and Generalized Reward Preference Optimization to align model outputs with human preferences and quality standards Build robust … Required - Great velocity for building and shipping agents/AI products. Optional - Advanced degree (Master's or PhD) in Computer Science, Artificial Intelligence, Machine Learning, or related technical discipline with concentrated study in deep learning and computer vision methodologies Optional - Demonstrated ability to do independent research in Academic ...

Software Engineer - Large Language Models

Hiring Organisation
Fastino Labs
Location
Woking, Surrey, UK
Employment Type
Full-time
overall performance metrics Architect data processing pipelines, implementing filtering, balancing, and captioning systems to ensure training data quality across diverse content categories Implement reinforcement learning techniques including Direct Preference Optimization and Generalized Reward Preference Optimization to align model outputs with human preferences and quality standards Build robust … Required - Great velocity for building and shipping agents/AI products. Optional - Advanced degree (Master's or PhD) in Computer Science, Artificial Intelligence, Machine Learning, or related technical discipline with concentrated study in deep learning and computer vision methodologies Optional - Demonstrated ability to do independent research in Academic ...

Software Engineer - Large Language Models

Hiring Organisation
Fastino Labs
Location
Dartford, Kent, UK
Employment Type
Full-time
overall performance metrics Architect data processing pipelines, implementing filtering, balancing, and captioning systems to ensure training data quality across diverse content categories Implement reinforcement learning techniques including Direct Preference Optimization and Generalized Reward Preference Optimization to align model outputs with human preferences and quality standards Build robust … Required - Great velocity for building and shipping agents/AI products. Optional - Advanced degree (Master's or PhD) in Computer Science, Artificial Intelligence, Machine Learning, or related technical discipline with concentrated study in deep learning and computer vision methodologies Optional - Demonstrated ability to do independent research in Academic ...

Software Engineer - Large Language Models

Hiring Organisation
Fastino Labs
Location
Bath, Somerset, UK
Employment Type
Full-time
overall performance metrics Architect data processing pipelines, implementing filtering, balancing, and captioning systems to ensure training data quality across diverse content categories Implement reinforcement learning techniques including Direct Preference Optimization and Generalized Reward Preference Optimization to align model outputs with human preferences and quality standards Build robust … Required - Great velocity for building and shipping agents/AI products. Optional - Advanced degree (Master's or PhD) in Computer Science, Artificial Intelligence, Machine Learning, or related technical discipline with concentrated study in deep learning and computer vision methodologies Optional - Demonstrated ability to do independent research in Academic ...

Software Engineer - Large Language Models

Hiring Organisation
Fastino Labs
Location
South London, UK
Employment Type
Full-time
overall performance metrics Architect data processing pipelines, implementing filtering, balancing, and captioning systems to ensure training data quality across diverse content categories Implement reinforcement learning techniques including Direct Preference Optimization and Generalized Reward Preference Optimization to align model outputs with human preferences and quality standards Build robust … Required - Great velocity for building and shipping agents/AI products. Optional - Advanced degree (Master's or PhD) in Computer Science, Artificial Intelligence, Machine Learning, or related technical discipline with concentrated study in deep learning and computer vision methodologies Optional - Demonstrated ability to do independent research in Academic ...

Software Engineer - Large Language Models

Hiring Organisation
Fastino Labs
Location
Lincoln, Lincolnshire, UK
Employment Type
Full-time
overall performance metrics Architect data processing pipelines, implementing filtering, balancing, and captioning systems to ensure training data quality across diverse content categories Implement reinforcement learning techniques including Direct Preference Optimization and Generalized Reward Preference Optimization to align model outputs with human preferences and quality standards Build robust … Required - Great velocity for building and shipping agents/AI products. Optional - Advanced degree (Master's or PhD) in Computer Science, Artificial Intelligence, Machine Learning, or related technical discipline with concentrated study in deep learning and computer vision methodologies Optional - Demonstrated ability to do independent research in Academic ...

Software Engineer - Large Language Models

Hiring Organisation
Fastino Labs
Location
Northampton, Northamptonshire, UK
Employment Type
Full-time
overall performance metrics Architect data processing pipelines, implementing filtering, balancing, and captioning systems to ensure training data quality across diverse content categories Implement reinforcement learning techniques including Direct Preference Optimization and Generalized Reward Preference Optimization to align model outputs with human preferences and quality standards Build robust … Required - Great velocity for building and shipping agents/AI products. Optional - Advanced degree (Master's or PhD) in Computer Science, Artificial Intelligence, Machine Learning, or related technical discipline with concentrated study in deep learning and computer vision methodologies Optional - Demonstrated ability to do independent research in Academic ...

Software Engineer - Large Language Models

Hiring Organisation
Fastino Labs
Location
Bournemouth, Dorset, UK
Employment Type
Full-time
overall performance metrics Architect data processing pipelines, implementing filtering, balancing, and captioning systems to ensure training data quality across diverse content categories Implement reinforcement learning techniques including Direct Preference Optimization and Generalized Reward Preference Optimization to align model outputs with human preferences and quality standards Build robust … Required - Great velocity for building and shipping agents/AI products. Optional - Advanced degree (Master's or PhD) in Computer Science, Artificial Intelligence, Machine Learning, or related technical discipline with concentrated study in deep learning and computer vision methodologies Optional - Demonstrated ability to do independent research in Academic ...