Description: The role is now hybrid (2 days on site required). The number of days on site might change in the future. Job Title: Network Reliability Engineer. Role Description: As we embark on a journey to transform the Network Services Group in CME, we are looking for a highly skilled Network Reliability Engineer to join us. We … team across US, UK, India and Singapore made up of a diverse range of people from varied backgrounds who each bring unique network experiences and skillsets. The new Network Reliability/Automation team is responsible for building a suite of custom automation tools and developing our self-healing capabilities while working closely with other members of the Network Services
Hounslow, London, United Kingdom Hybrid / WFH Options
Deerfoot Recruitment Solutions
DevOps/Service Reliability Engineer. Location: Hounslow/Hybrid (50% hybrid working). Duration: 12 months. Rate: up to £430 per day (inside IR35). We're looking for a DevOps/Service Reliability Engineer who combines software development, automation, and operations expertise to help deliver highly reliable, scalable services. If you're passionate about automation, cloud technologies … OpenShift. Monitoring: Splunk, Prometheus, Grafana. Databases: Oracle (OCA/OCP a plus). Environments: Linux/Unix. Strong debugging, problem-solving, and collaboration skills. Proven experience in DevOps and service reliability roles. Interested? Apply now and help build the future of resilient, automated infrastructure. Deerfoot Recruitment Solutions Ltd is a leading independent tech recruitment consultancy in the UK. For every
Staff Software Engineer, AI Reliability Engineering. London, UK. About Anthropic: Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to … build beneficial AI systems. About the role: Anthropic is seeking talented and experienced Reliability Engineers, including Software Engineers and Systems Engineers with experience and interest in reliability, to join our team. We will be defining and achieving reliability metrics for all of Anthropic's internal and external products and services. While significantly improving reliability for Anthropic … GPUs, TPUs, Trainium, e.g.). Understand ML-specific networking optimizations like RDMA and InfiniBand. Have expertise in AI-specific observability tools and frameworks. Understand ML model deployment strategies and their reliability implications. Have contributed to open-source infrastructure or ML tooling. Deadline to apply: None. Applications will be reviewed on a rolling basis. The expected salary range for this position
Platform reliability and release engineer - Hybrid - Permanent. United Kingdom. Job Description. Posted: Tuesday 1 July 2025 at 00:00. Salary: Up to £40,000 per annum (negotiable based on experience) + comprehensive benefits package. Jisc grade: TDV2 (internal use only). Hours: 35 hours per week. Reports into: Platform Reliability & Release Manager. Working style: Hybrid - A blend of … and its members. This role also supports the release and environment strategy for Jisc's platforms, driving ongoing improvements to optimise quality and efficiency. Working closely with the Platform Reliability and Release Manager and development teams, it ensures timely, well-managed releases and maintains clear, up-to-date processes and documentation. Other key responsibilities: Support daily platform operations, ensuring