1 to 25 of 92 Permanent Reinforcement Learning Jobs in the UK excluding London

Senior Reinforcement Learning Expert

Hiring Organisation
Barrington James
Location
Slough, Berkshire, UK
Employment Type
Full-time
development of intelligent controllers for real-world robotic systems. This is a hands-on, highly technical role: you'll design, build, and maintain advanced learning pipelines that combine imitation learning, reinforcement learning, and language- or vision-conditioned models. You will play a pivotal role … infrastructure and becoming a core pillar of the research organization. What You'll Do: Design and implement training pipelines that blend imitation learning and reinforcement learning (both offline and online) to teach robotic behaviors. Collect high-quality demonstration data by teleoperating robots (around 4–10 hours ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Midlands, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Bradford, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Glasgow, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Birmingham, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Bristol, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Manchester, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Coventry, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Belfast, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Cardiff, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Sheffield, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Nottingham, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Liverpool, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Maidstone, Kent, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Chelmsford, Essex, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Slough, Berkshire, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Cambridge, Cambridgeshire, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Reading, Berkshire, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Luton, Bedfordshire, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Warrington, Cheshire, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Bournemouth, Dorset, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Bath, Somerset, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Basildon, Essex, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Peterborough, Cambridgeshire, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...

Solution Architect - Generative AI Data and Post-Training

Hiring Organisation
NVIDIA
Location
Preston, Lancashire, UK
Employment Type
Full-time
seeking someone skilled in data preparation and curation for large-scale model training, as well as in LLM alignment techniques such as reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). In this role, you will operate at the intersection of innovative AI research and real … curation, cleaning, and synthetic data generation. The candidate should have proven experience in large-scale post-training using supervised fine-tuning and/or reinforcement learning techniques. Hands-on experience applying reinforcement learning techniques to LLM alignment is nice to have. What ...