26 to 29 of 29 Distributed Computing Jobs in the UK

Senior Site Reliability Engineer

Hiring Organisation
Realm
Location
City of London, London, United Kingdom
large-scale compute, data centre capacity, and power solutions for advanced machine learning workloads. Platforms support leading research and industry teams requiring high-performance computing at significant scale. Fast-paced environment with emphasis on ownership, execution speed, and quality. Culture centred on pragmatic problem-solving, cross-functional collaboration … full lifecycle responsibility. Role Overview: Position operating across software, infrastructure, and operations to ensure reliability, scalability, and performance of a globally distributed compute platform. Close collaboration with networking, platform engineering, and physical infrastructure teams to design and operate systems supporting high-demand computational workloads. Hands-on engineering role requiring ...

Senior DevOps Engineer

Hiring Organisation
Humanoid
Location
London Area, United Kingdom
operating multi-GPU, cross-cloud platforms that enable efficient, reliable, and scalable model training. You’ll work at the intersection of DevOps, MLOps, and distributed systems, helping push the limits of real-world AI. What You’ll Do: Design, build, and operate scalable multi-GPU infrastructure across cloud environments … code and automation for provisioning, orchestration, and lifecycle management. Build and evolve CI/CD pipelines for both infrastructure and ML training workflows. Optimize distributed training workloads (scheduling, resource utilization, observability). Ensure high standards of reliability, scalability, security, and monitoring across systems. Collaborate with ML engineers and researchers ...

Staff DevOps Engineer

Hiring Organisation
Humanoid
Location
London Area, United Kingdom
multi-GPU, cross-cloud platforms, driving architecture, reliability, and performance at scale. This role sits at the intersection of DevOps, MLOps, and distributed systems, enabling cutting-edge AI in real-world environments. What You’ll Do: Lead the design and evolution of scalable multi-GPU infrastructure across cloud environments … code and automation for provisioning, orchestration, and lifecycle management. Architect and improve CI/CD systems for both infrastructure and ML training workflows. Optimize distributed training workloads (scheduling, resource utilization, observability). Partner with ML engineers and researchers to enable efficient experimentation and productionization. Lead troubleshooting and resolution of complex ...

System Engineer: £120k + Bonus/benefits (AI Trading)

Hiring Organisation
Hunter Bond
Location
London, UK
operating systems to automation and observability, while gaining exposure to how a world-class investment firm manages its technology infrastructure. Key Responsibilities: Manage a distributed compute environment and several petabyte-scale storage systems. Install, configure, and monitor RHEL-based Linux environments. Troubleshoot hardware and software issues across the stack … Experience with modern software development practices (version control, agile methodologies). Familiarity with infrastructure automation and configuration management tools (Chef, Puppet, or Ansible). Exposure to distributed storage systems and related protocols. Experience with observability and monitoring tools (Elasticsearch, Logstash, Kibana, Datadog, Prometheus, Grafana). Strong written and verbal communication skills. Demonstrated ...