Amazon Technologies Systems Engineer, Reliability and Automation Engineering Team (RAE) Job ID: Amazon UK Services Ltd. As an Amazon Technologies Systems Engineer, you will be the primary point of contact for internal customers and partners, driving the implementation and continuous improvement of world-class maintenance, repair, and supportability solutions for Amazon's Mechatronics and Sustainable Packaging systems … within Amazon's Fulfillment Centers. You will lead cross-functional engineering and product support teams engaged in continuous improvement initiatives to enhance processes around maintenance and reliability of these systems. You will analyze large-scale data from PLCs, sensors, controls equipment and maintenance records to identify improvement opportunities around preventative maintenance, system optimizations, reliability metrics, and overall equipment … will engage customers to understand and document business requirements, drive problems to root cause, and manage implementation programs for corrective actions. You will apply your expertise in robotics, mechatronics, reliability engineering and system lifecycle management to build scalable solutions that ensure optimal performance and availability of thousands of workcells across Amazon's global fulfillment center network. Key job responsibilities
ESPN and ESPN+, and much more. Innovation: We develop and execute groundbreaking products and techniques that shape industry norms and enhance how audiences experience sports, entertainment & news. The Data Reliability Engineering team for Disney's Product and Data Engineering team is responsible for maintaining and improving the reliability of Disney Entertainment's big data platform, which processes hundreds … of terabytes of data and billions of events daily. Job Summary: The Senior Software Engineer will help us in the ongoing mission of delivering outstanding services to our users, allowing Disney Entertainment to be more data-driven. You will work closely with our partner teams to monitor and drive improvements for reliability and observability of their critical data … be required to build high quality data models and products that monitor and report on data pipeline health and data quality. Work closely with all members of the Data Reliability Engineering team to set project deliverables, review design documents, perform code reviews and help mentor junior members of the team. Collaborate with engineering teams to improve, maintain, performance tune
platform connects tens of millions of customers with hundreds of thousands of restaurant, grocery and convenience partners across the globe. About this role We are seeking a seasoned Principal Engineer to lead the design, development, and evolution of our Observability Platform, ensuring it meets the needs of our rapidly scaling systems and engineering teams. This role will also focus … scale. Integrate ML/AI-driven solutions to enhance anomaly detection, root cause analysis, and predictive insights. Lead the development and adoption of platform capabilities to ensure system health, reliability, and performance. Establish and evolve platform standards and best practices to align with the company's overall engineering goals. Strategic Initiatives Collaborate with engineering teams to define the observability … available, performant, and secure across all environments. Optimize data collection, processing, and storage to balance performance with cost efficiency. Define SLAs, SLOs, and SLIs for observability services to support reliability engineering practices. Continuously improve MTTD and MTTR by leveraging advanced AI/ML models for predictive analysis and automated responses. Mentorship and Collaboration Act as a mentor and technical
Job Description: Subject to the successful closing of the transaction with Spirit, after obtaining relevant regulatory approvals. Role: Infrastructure Customer Reliability Engineer. Description: Are you passionate about IT, experienced in IT Technical Services techniques and convinced by the business added value of digitalization? One key component of this Digital Transformation is the implementation of solutions to enable the … involving different Product and Service Lines teams (also called PSL) as well as the Digital Workplace (DW) teams, Security (DS) ... We as CRE - Infrastructure Customer Reliability Engineer - are the focal point and key partner for Business & IM applications teams to Advise, Design, Deliver and Operate best fit and innovative IT solutions to host applications
Hounslow, London, United Kingdom Hybrid / WFH Options
Deerfoot Recruitment Solutions
DevOps/Service Reliability Engineer Location: Hounslow/Hybrid (50% hybrid working) Duration: 12 months Rate: up to £430 per day (inside IR35) We're looking for a DevOps/Service Reliability Engineer who combines software development, automation, and operations expertise to help deliver highly reliable, scalable services. If you're passionate about automation, cloud technologies … OpenShift Monitoring: Splunk, Prometheus, Grafana Databases: Oracle (OCA/OCP a plus) Environments: Linux/Unix Strong debugging, problem-solving, and collaboration skills Proven experience in DevOps and service reliability roles Interested? Apply now and help build the future of resilient, automated infrastructure. Deerfoot Recruitment Solutions Ltd is a leading independent tech recruitment consultancy in the UK. For every
Staff Software Engineer, AI Reliability Engineering London, UK About Anthropic Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to … build beneficial AI systems. About the role Anthropic is seeking talented and experienced Reliability Engineers, including Software Engineers and Systems Engineers with experience and interest in reliability, to join our team. We will be defining and achieving reliability metrics for all of Anthropic's internal and external products and services. While significantly improving reliability for Anthropic … GPUs, TPUs, Trainium, e.g.) Understand ML-specific networking optimizations like RDMA and InfiniBand. Have expertise in AI-specific observability tools and frameworks Understand ML model deployment strategies and their reliability implications Have contributed to open-source infrastructure or ML tooling Deadline to apply: None. Applications will be reviewed on a rolling basis. The expected salary range for this position
Platform reliability and release engineer - Hybrid - Permanent United Kingdom Job Description Posted Tuesday 1 July 2025 at 00:00 Salary: Up to £40,000 per annum (negotiable based on experience) + comprehensive benefits package Jisc grade: TDV2 (internal use only) Hours: 35 hours per week Reports into: Platform Reliability & Release Manager Working style: Hybrid - A blend of … and its members. This role also supports the release and environment strategy for Jisc's platforms, driving ongoing improvements to optimise quality and efficiency. Working closely with the Platform Reliability and Release Manager and development teams, it ensures timely, well-managed releases and maintains clear, up-to-date processes and documentation. Other key responsibilities: Support daily platform operations, ensuring