Vitals for optimal performance. Integrate third-party software into the platform, including tag management using Google Tag Manager (GTM). Improve and maintain platform observability tools and systems. Manage and enhance automated CI/CD pipelines for efficient and reliable deployments. Ensure sites are accessible to all users, meeting WCAG …
blockers to keep teams moving forward. Champion engineering best practices including clean code, secure design, CI/CD, and test automation. Oversee infrastructure resilience, observability, and incident response processes. Act as the bridge between Product and Engineering teams. Provide transparency on engineering progress, challenges, and decisions to senior stakeholders. Key …
value. Integrate AI models into operational workflows. Ensure reliability through fail-safes, self-healing, and fallback mechanisms. Monitor & improve AI performance with feedback loops and observability tools. Collaborate with Data Engineers to ensure AI has accurate, real-time data. Implement human-in-the-loop systems where needed. Skills and Qualifications Experience …
company relies. Leading the team strategy, including defining and tracking its success measures. Driving the operational excellence of the team, including its on-call, observability and security postures. Working closely with our telecommunication and networking partners and managing those relationships. Building a team of diverse & experienced engineers, and guiding them …
across our coordination server backend and open source client code. Exhibit ownership over the running services that comprise Tailscale's product by building for observability, participating in incident response, and fielding customer support escalations. Analyze and improve the efficiency, scalability, and stability of systems and resources. Bring a security-first …
including performance tuning, query optimisation, and scaling strategies. Modern data engineering experience: Hands-on expertise with modern data processing frameworks, ETL/ELT patterns, observability, and governance tools. Customer-centric mindset: Experience translating business and user needs into technical solutions that deliver measurable impact. You are Self-motivated and able …
software deployment and scalability. CI/CD Expertise: Automate software build, test, and deployment pipelines following agile methodologies. Terraform Exposure: Beneficial experience with Terraform. Observability Tools: Experience with Grafana and Splunk is beneficial, particularly in developing and applying an observability strategy across a large organisation. Learn More For more information …
Key responsibilities include integrating external supplier APIs, implementing Software Reliability Engineering (SRE) best practices, and ensuring seamless collaboration across teams. The team enhances resilience, observability, incident management, and disaster recovery (DR) practices while working closely with Peri Pantry, Stock Management, and Accounting, Banking, and Property (ABP) teams. Key Responsibilities Technical … to align technology decisions with business needs. Solution Design: Ensure the right technologies and architectures are used to enhance system performance, maintainability, and security. Observability & Resilience: Establish best practices for monitoring, incident response, and disaster recovery. Best Practices & Governance: Define engineering standards and drive their adoption across teams. Vendor & API … ability to drive initiatives with a strategic mindset. Ability to communicate effectively with technical and non-technical stakeholders, ensuring alignment across teams. Experience improving observability, monitoring, and incident response processes. Security-first mindset, focusing on least privilege access, automated secrets management, and compliance automation. Why Join Us? As a …
of platform engineering maturity. You'll innovate to keep our products' global platform reliable, secure, and fast. You'll help enhance our system availability, observability, security, and reliability whilst resolving issues before they impact our customers. You'll also play a role in the continued evolution of software delivery tooling … easily with third-party providers, engineering teams, and business stakeholders to ensure you offer the best possible experience. Self-motivated: You're passionate about observability, availability, and issue management. You enjoy solving problems and like to challenge yourself by quickly identifying and mitigating an issue before moving on to the … next one. What You'll Do: Implement best practices for CI/CD pipelines, infrastructure-as-code, and observability that support our continuous delivery practices. Lead initiatives to improve system reliability, performance, and security. Writing tooling to support our self-service automation portal to improve visibility for engineers. Creating and …
Engineering organization and help us scale our products to millions of end-users. We are looking for individuals with a passion for automation and observability, ensuring the quality and availability of our services. Location - We are flexible on remote working from home, if you are based in the UK or … SCRUM, and deployment planning. Perform Root Cause Analysis (RCA) and provide recommendations for application teams. Improve availability and reduce customer impact using industry-best observability tools. Ensure best-practice and security-minded architecture by influencing design decisions. Create and maintain technical documentation and SOPs. Develop software, scripts, or tooling … roles. 3+ years' experience with an object-oriented language (preferably Java, .NET or C++). Expert+ level Linux administration, scripting, and troubleshooting. Demonstrable knowledge of observability tools (New Relic, Splunk, DataDog). Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route53, Fargate, ALB/NLB distributions …
Sheffield, Burngreave, South Yorkshire, United Kingdom Hybrid / WFH Options
Ada Meher
projects simultaneously using Agile practices. The ideal candidate will also have knowledge around or an interest in learning other key DevOps areas such as observability, CI/CD pipeline development and config management. The company have a personal development budget available to all staff for such courses and accreditations, to … services and architecture. Strong experience working with Terraform (or other IaC technology). Proven team leadership experience. Experience working with CI/CD pipelines (Jenkins), Observability (Grafana) & Configuration Management (Ansible, Chef, Puppet). Excellent communication skills are a must. Along with an excellent work/life balance, this company also offer a …
Employment Type: Permanent
Salary: £70000 - £75000/annum Flexible/Remote Working | AWS, Terra
Design, develop, and manage Infrastructure as Code (IaC) using Terraform. Build and maintain CI/CD pipelines for seamless and secure deployments. Enhance system observability using monitoring APM, logs, and metrics with event correlation. Ensure system reliability by proactively identifying and resolving performance and availability issues. Manage and optimise containerised … platforms. Strong expertise in Terraform for Infrastructure as Code (IaC) management. Hands-on experience with Kubernetes and Helm for container orchestration. Proficiency in observability tools such as Elastic Cloud, Grafana, and Prometheus. Experience in building and managing CI/CD pipelines. Solid knowledge of Linux systems and shell …
bring new data science, AI & ML opportunities to life. You'll enjoy solving complex problems, and have an appreciation for all things infrastructure, reliability, observability, platform and operations engineering. Responsibilities: You will design & build modern data systems on Azure cloud environments. Looking at the big picture you'll see how … systems interconnect & relate to each other - using a wide range of tools & approaches to problem solve. Help our customers exploit observability benefits to understand and better support microservices architecture. Bring to life auto-provisioning technologies, e.g. Docker, Kubernetes, Terraform. Improve the availability, scalability, latency, and efficiency of our customer environments. …
ABOUT ORGANOX: OrganOx is an innovative, fast-paced, global medical device company with a mission to save lives by making every donated organ count. We are a commercial stage organ technology company, spun out of the University of Oxford in …
Algolia is set to enable every company to create world-class Search and Discovery experiences with an API-first approach. Performance and scalability are at the heart of our mission: we power 1.5 trillion searches a year, for 10K+ customers …
in both support and engineering. What You’ll Do: Troubleshoot and resolve issues in live trading and analytics systems. Monitor production systems and develop observability tools. Build and enhance features in Python and C++. Manage configuration and deployment processes. Support onboarding of new teams and systems. What We’re Looking …
Requirements: Extensive experience with Ruby. Technical leadership experience (mostly hands on, leading a team of 2 other engineers). Strong background in platform engineering (DevOps, Observability, infrastructure). Familiarity with Rails FE (Hotwire, Stimulus, Turbo) is a nice to have. Collaborative engineering culture …
reporting and security leads to ensure data platforms are meeting product needs to service client expectations. Guide teams to ensure a high degree of observability of data platform reliability and performance, working alongside the Head of Platform to enhance visibility of these metrics throughout the business. Drive innovation in related …
on: • Some of the world’s most performant ETL pipelines that deal with billions of data points every second • One of the most successful observability platforms worldwide • Building software solutions/products with scale, reliability and latency considerations in mind • R&D work for functional programming (either pre-existing languages …