Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Modix International
Actions). Strong troubleshooting skills for cloud infrastructure and application performance. Knowledge of cloud security, compliance, and identity management. Experience with monitoring and observability tools (New Relic, Splunk). A continuous improvement mindset and a desire to optimize systems for security, performance, and cost. AWS Certifications (e.g., AWS Certified …
database deployments using Terraform, AWS CloudFormation, and AWS CDK. Integrate database changes into CI/CD pipelines using tools like Flyway or Liquibase. Define observability and monitoring strategies using CloudWatch, X-Ray, and Prometheus. Practice & Team Development: Contribute to the development of modernisation frameworks, methodologies, and best practices. Help shape …
monitoring and logging tools (Dynatrace, ELK stack, Splunk). Experience using logging to derive application insights. Consideration of non-functional requirements (security, accessibility and observability) during design and development. Solid understanding of Object-Relational Mapping principles and proficiency in JPA and Hibernate. Experience using Swagger for API documentation and coding …
Engineers, DevOps Engineers, and DBAs. Foster a DevOps culture by driving collaboration between infrastructure, security, and engineering. Operational Excellence & Automation Strategy: Define AI-driven observability and automated issue resolution strategies. Oversee incident response and resilience engineering to improve platform uptime. About the Role: If you're craving real influence, cutting …
to implement and maintain reliable and scalable systems while adhering to industry best practices and security standards. Responsibilities and Impact: Design, implement, and maintain observability solutions to track system health and performance. Analyze observability data to identify and troubleshoot potential issues proactively. Develop and implement alerts and notifications for critical … or a related field. 5+ years of experience as a Site Reliability Engineer or equivalent in a similar role. Proficient in application and infrastructure observability; Splunk OpenTelemetry preferred. Experienced in production environments running in AWS. Comfortable with Infrastructure as Code; Terraform is preferred. Comfortable with CI/CD pipelines such …
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Manchester Digital
they work on, from ideation through to development, testing, and deployment, so you should expect to champion and mentor on best practices like TDD, Observability, and IaC. Skills: CI/CD, TDD, SOLID. The salary is competitive - up to £90k plus benefits including hybrid working (1-2 days per month …
GitLab). Experience with modern CI/CD tools and techniques. Desirable skills: Experience in BDD and creating tests using Gherkin syntax. Experience working with observability and monitoring tools such as Firebase or similar. Experience of non-functional testing in a mobile environment. Equal Opportunities Statement: At AND Digital we embrace …
customer-focused organisation which provides operational excellence whilst identifying new areas of growth as part of our day-to-day objectives. The CoE Lead - Observability & Tools at JD Sports Fashion Plc is a critical, hands-on technical role focused on designing, building, and maintaining the company's Observability platform. This … focus on implementation and adoption. What You'll Be Doing: We are looking for an experienced CoE Lead to design, build, and maintain our Observability platform. The CoE Lead will work closely with DevOps, Engineering, Service Reliability, and Service Delivery teams to continuously improve our Observability capabilities. This role is … technical, hands-on position with a 75% focus on framework design and 25% on implementation and adoption. You will contribute to pipeline design, enabling observability from the first deployment in test environments and providing early insights for Engineering, Service Reliability, Service Delivery, and DevOps teams. The role involves building frameworks …
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Embarcaderomediagroup
of our engineering operations, bringing together SRE principles and modern platform engineering practices. This includes combining principles of SRE - such as service-level reliability, observability, incident response - with platform engineering practices like GitOps, Infrastructure as Code, DevSecOps automation, and self-service enablement, to help development teams ship faster, safer, and … more cost-efficiently. What you'll be doing: Designing and operating highly reliable, scalable, and secure Azure-based platforms. Applying SRE principles like SLOs, observability, and incident management to drive service reliability. Building Infrastructure as Code using Terraform (v1.7+) and GitOps workflows. Enabling teams through platform tools, reusable Terraform modules … DB, etc.). Strong Infrastructure as Code skills with Terraform (v1.7+). Experience with CI/CD pipelines, GitOps, and automation tools (PowerShell, Bash). Familiarity with observability and incident tools like Datadog, ELK, and synthetic monitoring. Solid understanding of networking (TCP/IP, Load Balancing, DNS, Routing). Good knowledge of DevSecOps practices …
Vitals for optimal performance. Integrate third-party software into the platform, including tag management using Google Tag Manager (GTM). Improve and maintain platform observability tools and systems. Manage and enhance automated CI/CD pipelines for efficient and reliable deployments. Ensure sites are accessible to all users, meeting WCAG …
reporting and security leads to ensure data platforms are meeting product needs to service client expectations. Guide teams to ensure a high degree of observability of data platform reliability and performance, working alongside the Head of Platform to enhance visibility of these metrics throughout the business. Drive innovation in related …
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
bet365 Group
A Site Reliability Engineer, who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will monitor the health, performance and availability of critical systems, directly impacting … maintainability. You will also help engineer tools and automation for effective service management. Collaboration is key, working across multiple functions to integrate reliability and observability best practices into the software development life cycle. By supporting governance standards set by the central teams, you will foster a culture where these principles … including the creation and management of effective Service Level Indicators (SLI) and Service Level Objectives (SLO) for reliability and customer satisfaction. Knowledge of contemporary observability tools, techniques and best practice including Splunk, New Relic, Grafana and PagerDuty. Knowledge and experience of modern software development techniques and lifecycles. Experience with …
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Tbwa Chiat/Day Inc
Principal Product Manager to drive the product roadmap and execution for Capella, our Database-as-a-Service (DBaaS) platform, focusing on cloud infrastructure and observability. In this role, you will collaborate across multiple teams to ensure Capella delivers unparalleled value and performance for customers leveraging modern cloud technologies. This … for long-term success in a competitive and rapidly evolving market. Location: Manchester, UK. Key Responsibilities: Own the product roadmap for cloud infrastructure and observability features, balancing short-term priorities with long-term objectives. Partner with engineering, operations, and finance teams to design and implement robust billing and metering systems … that ensure accuracy, transparency, and scalability. Collaborate with the observability team to enhance platform monitoring, logging, and alerting capabilities, empowering customers to manage their applications effectively. Identify emerging opportunities in cloud infrastructure and observability to position Capella as a leader in the DBaaS space. Act as the voice of the …
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
M-KOPA Kenya Limited
you will work as a servant leader, coach and contributor within a team which places emphasis on high quality output using infrastructure as code, observability and monitoring as well as automated testing in all environments, including production. About Us: At M-KOPA, we are revolutionising financial inclusion and connectivity across … engineers, fostering a collaborative and high-performing environment. Lead as a servant leader, coach, and contributor, emphasising high-quality output using infrastructure as code, observability, and monitoring, as well as automated testing in all environments, including production. Full Ownership: Oversee the entire software stack, including supporting infrastructure, throughout the entire … software lifecycle from inception, through production, and right the way to decommissioning under a DevOps culture. DevOps Culture: Champion continuous delivery, testing, and observability as first-order concerns. Tech Stack: Work with C#.NET, event-driven systems, microservices, and deployment technologies. Our Mission: We create financial inclusion for the traditionally excluded …
advantage of all structured and unstructured data - securing and protecting private information more effectively - Elastic's complete, cloud-based solutions for search, security, and observability help organizations deliver on the promise of AI. Elastic is looking for a disruptive force in the Public Sector space - someone with a relentless drive … extract real-world value from their data and accelerate digital transformation. Evangelizing Open Source and championing Elastic's search-powered solutions across Enterprise Search, Observability, and Security. Uncovering and driving new use cases by working directly with technical teams, developers, and decision-makers. Collaborating across business functions, partners, and communities …