deployment pipelines to enhance efficiency and reliability. Quality, Stability & Standards: Establish quality standards to meet performance, reliability, and maintainability of the systems. With a strong production-first mindset, drive observability, maintain Service Level Objectives (SLOs), and ensure efficient incident resolution. Oversee the maintenance of existing systems, ensuring continuous improvements and prompt resolution of issues. Agile Delivery & Collaboration: Working closely with More ❯
a collaborative and supportive team environment through experienced, empathetic leadership Commit to continuous learning and stay current with emerging technologies and best practices Implement and maintain application monitoring and observability, proactively identifying and resolving system issues Person Specification Experience Essential Relevant degree or qualification is desirable but not essential Previous experience using Cloud Platforms, Version Control Systems and Front and More ❯
integration and continuous delivery tools with different tech stacks, web or mobile. You've previously worked with monitoring systems for availability, performance or security, stress and performance testing with observability patterns: Distributed Tracing/OpenTracing, Log Aggregation, Audit Logging, Exception Tracking, Health Check API, Application Metrics, Self-Healing/Multi-Cloud. You have an understanding of security concerns, threats and More ❯
our engineers Lead and contribute to cross-team initiatives from design through deployment and operations Write maintainable, well-tested, high-quality code and uphold engineering best practices Focus on observability and maintain Service Level Objectives, take operational responsibility for the Identity Platform, including joining the on-call rota Foster a strong engineering culture through mentorship, code reviews, and collaboration Lead More ❯
monitoring platforms such as IBM Netcool, Moogsoft, BigPanda, PagerDuty, ServiceNow AIOps. Proficiency in Python, and hands-on knowledge of Ansible Automation Platform. Other highly valued skills include: Knowledge of Observability Platforms: Prometheus, Grafana, ELK, Splunk. Experience with integration into ITSM platforms such as ServiceNow. Experience with Kafka. You may be assessed on the key critical skills relevant for success in More ❯
through coaching, 1:1s, career development and goal-setting Collaborating closely with Product, Delivery and Commercial to plan and deliver outcomes Championing engineering excellence - from clean code to testing, observability and CI/CD Improving system reliability and scalability as our merchant footprint grows Ensuring quality is built in at every stage of the SDLC Bringing a clear sense of More ❯
Salford, Manchester, United Kingdom Hybrid / WFH Options
Lloyds Bank plc
access, storage, processing, and deployment-empowering users through self-service, automated engagement, and Golden Paths that simplify platform adoption. Your focus will be on ensuring the reliability, scalability, and observability of our cloud infrastructure, while continuously improving service levels and operational excellence. This is a unique opportunity to contribute to a platform that underpins enterprise-wide data provisioning, enabling faster … principles, including SLIs/SLOs, error budgets, and incident response. Experience with infrastructure as code (e.g., Terraform, Deployment Manager) and CI/CD pipelines. Proficiency in monitoring, logging, and observability tools (e.g., Stackdriver, Prometheus, Grafana). Knowledge of Linux systems, networking, and cloud security best practices. It would be great if you also had Experience working in DevOps environments, with More ❯
line management, you'd have full technical overview for your product domain including technology vision, strategy, workload and quality governance of the squads within it. Skills: TypeScript, Node, React Observability CI/CD The money is good too - up to £90k plus benefits including 10% annual bonus, pension, private healthcare and flexible working. If you're interested in this opportunity More ❯
skills, testing experience (React Testing Library), and familiarity with Tailwind CSS are essential. Nice-to-haves include Storybook component library work, Cloudflare Workers, E2E testing (Cypress or Playwright), and observability practices. Benefits Salary up to £65,000 25 days holiday + bank holidays + birthday off Friendly, forward-thinking team culture Agile workflows with real influence over technical direction Regular More ❯
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Embarcaderomediagroup
ll sit at the heart of our engineering operations, bringing together SRE principles and modern platform engineering practices. This includes combining principles of SRE - such as service-level reliability, observability, incident response - with platform engineering practices like GitOps, Infrastructure as Code, DevSecOps automation, and self-service enablement, to help development teams ship faster, safer, and more cost-efficiently. What you … ll be doing: Designing and operating highly reliable, scalable, and secure Azure-based platforms Applying SRE principles like SLOs, observability, and incident management to drive service reliability Building Infrastructure as Code using Terraform (v1.7+) and GitOps workflows Enabling teams through platform tools, reusable Terraform modules, and self-service infrastructure Enhancing CI/CD pipelines (Azure DevOps, YAML-based) with security … knowledge (AKS, Functions, SQL, Cosmos DB, etc.) Strong Infrastructure as Code skills with Terraform (v1.7+) Experience with CI/CD pipelines, GitOps, and automation tools (PowerShell, Bash) Familiarity with observability and incident tools like Datadog, ELK, and synthetic monitoring Solid understanding of networking (TCP/IP, Load Balancing, DNS, Routing) Good knowledge of DevSecOps practices - including security scanning, IAM, and More ❯
Engineer, you will: Build and automate IaaS and PaaS platforms across public, private, and hybrid cloud environments Create and manage solutions such as landing zones, container platforms, DevSecOps pipelines, observability stacks, and integration layers Use modern tooling like Terraform, CI/CD pipelines, and cloud-native security frameworks Collaborate with product teams, cloud architects, and stakeholders to rapidly deliver working … consultancy mindset: adaptable, delivery-focused, and comfortable with ambiguity Experience in the following areas is highly desirable: Designing and building multi-cloud or hybrid platforms Implementing cloud-native operations, observability, or SRE practices Working with Kubernetes, container orchestration, and modern networking patterns Securing cloud infrastructure and deploying secure coding practices (DevSecOps) Migrating legacy workloads to the cloud using agile methodologies More ❯
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Suits Me Limited
across multiple squads to ensure our platform is scalable, secure, and designed for rapid deployment and operational excellence. You'll contribute to the development and automation of cloud infrastructure, observability systems, CI/CD pipelines, and event-based services that power key parts of our product ecosystem. About Suits Me Suits Me is a multi-award-winning, ethical fintech dedicated … pipelines (e.g. GitHub Actions) to enable rapid and reliable delivery of services Contributing to the design of scalable and secure platform components that enable developer productivity Building and improving observability tooling (e.g. CloudWatch, Grafana) to support rapid detection and resolution of issues Collaborating with developers and stakeholders across squads to understand infrastructure needs and ensure best practices are applied Writing More ❯
Zopa with UK retailers and marketplaces. In this role, you'll ensure our systems are reliable, scalable, and secure. You'll help automate deployments, evolve our cloud infrastructure, and improve observability and developer experience - making it easier for product teams to deliver quality software quickly and safely. Why Zopa Manchester? We're building a new tech hub right in the heart … platform and developer experience teams Ensuring our container platforms (including Kubernetes) are reliable, secure, and up to date Designing scalable, self-service tools to reduce operational toil Supporting infrastructure observability through metrics, tracing, and alerting Working closely with product teams to foster a culture of reliability engineering About you: 4+ years in a Platform/Site Reliability Engineering or similar More ❯
Knutsford, Cheshire, United Kingdom Hybrid / WFH Options
Trust In Soda
Observability & Telemetry Lead Location: Knutsford (Hybrid/On-site 2 days a week) Contract Duration: Until 31st December 2025 Rate: £350/day (Inside IR35) Security Clearance: Candidate must be eligible for BPSS Role Overview: We are seeking a skilled Observability and Telemetry Lead to support the continued evolution of Enterprise-as-a-Service (EaaS) capabilities. The ideal candidate will … bring strong technical expertise across monitoring, infrastructure observability, and diagnostics, with the ability to work across both Linux and Windows environments. Essential Skills: Programming: Proven experience in PHP or Python Databases: Strong knowledge of Oracle and other relational databases Operating Systems: Comfortable working in both Linux and Windows environments Monitoring Tools: Hands-on experience with tools such as: AppDynamics ITRS … infrastructure and containerized environments: OpenShift, Docker, Kubernetes Knowledge of Cloud service models: IaaS, PaaS, SaaS Concepts in infrastructure virtualization This is an excellent opportunity to contribute to an enterprise observability strategy within a dynamic, large-scale technical environment. If you're passionate about system health, telemetry, and performance monitoring - and meet the eligibility criteria - we encourage you to apply. More ❯
in implementing good practice with regards to accessibility (Keyboard support, screen readers, form usability) Knowledge of various front-end architectural patterns E2E Testing experience (Cypress/Playwright) Experience with Observability as a practice (logging, GA tagging, TrackJS, App Insights) If you would be interested please apply below! INDMANS More ❯
fosters innovation, and delivers exceptional user interactions delivering robust internal developer platform (IDP) capabilities, strengthening CI/CD pipelines, enabling on-demand environments, and scaling platform foundations such as observability, security, and FinOps - while adhering to best practices in DevOps and modern software delivery What we expect from you Drive the development of a comprehensive IDP (e.g., based on Backstage … on-demand environments for development, QA, and staging through Infrastructure-as-Code and container orchestration. Support multi-tenancy and environment rationalization to reduce duplication and inefficiency. Define and implement observability standards, including logging, metrics, tracing, and alerting. Use tools like New Relic, Prometheus, and Grafana, alongside building custom instrumentation for key platform services. Drive incident readiness and operational resilience … tools. Proven success in building and operating developer platforms and enablement frameworks. Experience with cloud-native technologies, Kubernetes, and Infrastructure as Code (Terraform, Helm, etc.). Strong understanding of observability tooling (especially New Relic, Prometheus, Grafana) and incident response best practices. Familiarity with FinOps, platform cost tracking, and infrastructure efficiency techniques. Excellent communication, leadership, and stakeholder management skills. Attract, hire More ❯
Liverpool, Lancashire, United Kingdom Hybrid / WFH Options
Very Group
hands-on role in designing, building, and maintaining the tools and frameworks that enable our teams to deliver high-quality, reliable, and secure solutions. You'll work across Testing, Observability, and Security, helping squads embed quality automation and monitoring into their products from the ground up. You'll collaborate closely with Engineers, Tech Leads, Architects, and the wider QA community … to drive best practices in reliability engineering. You'll also champion the use of our observability platform and promote a culture of continuous improvement, automation, and technical excellence. Key responsibilities Work with Engineers, Tech Leads, Engineering Managers, Architects, and the QA community to deliver high-quality solutions. Design and build solutions that balance commercial needs with the QA roadmap. Actively … ownership of non-functional requirements around performance, security, and scalability. Drive best practices in real-time logging, monitoring, and alerting. Actively promote the use of The Very Group's observability platform. Create and maintain threat models for Performance Unit products. Continuously enhance CI processes to improve deployment efficiency. Identify and mitigate risks, obstacles, and issues impacting technical delivery. Adhere to More ❯
will have proven experience in the field, focusing on kernel development and cluster automation (build, OS/Kubernetes upgrade and decommission). You will also drive the implementation of observability practices to monitor, troubleshoot, and ensure the reliability of our infrastructure at scale. What you will accomplish: Design, develop, and maintain a stable, high-performance Linux operating system optimized for …/BPF-based network segmentation and service mesh solutions. Collaborate with cross-functional teams to validate, adopt, and integrate optimized Linux OS distributions across diverse infrastructure environments. Implement robust observability frameworks to monitor system health, ensure performance, and support proactive issue resolution at scale. What you will bring: Bachelor's or Master's degree in Computer Science, Engineering, or a More ❯
across global technology teams to design and implement monitoring solutions for the firm's core Line of Business ("LOB") applications and vital infrastructure, while providing input into monitoring and observability platform technical design and architectural decisions and changes, including the design and implementation of new monitoring systems integrations. This role will be located in our Manchester office. Please note that this role … may be eligible for a flexible working schedule that allows for a hybrid and in-office presence. Responsibilities & Qualifications Other key responsibilities include: Demonstrating understanding of Monitoring and Observability tools and core concepts Demonstrating understanding of Observability frameworks and tools that ingest telemetry data from multiple sources Ensuring platform health and stability Providing support to and acting as the main … and the NOC team Promoting the enterprise monitoring service through stakeholder engagement We'd love to hear from you if you: Display expert subject matter knowledge on Monitoring and Observability tools for critical Infrastructure and LOB services/applications Demonstrate good knowledge of working with Observability tools to leverage telemetry data (logs, metrics and trace data) to provide insight into More ❯