Observability Jobs in Cambridgeshire

8 of 8 Observability Jobs in Cambridgeshire

Senior Software Engineer - Observability Platform (Golang / Kubernetes)

Cambridge, Cambridgeshire, United Kingdom
Roku, Inc
that process massive amounts of data? Do you thrive on designing and implementing innovative solutions that give engineering teams actionable insights? Are you excited to advance open-source observability at massive scale? Join us to extend (open-source) observability tools and build new capabilities that help teams manage data better and gain actionable insights.

About the Team
The … Observability team is part of Roku's Cloud Technology Infrastructure organisation and plays a critical role in our platform. We are a high-performing, fast-moving international team that thrives on ownership, effective communication, and delivering impactful engineering solutions. Our mission is to advance Roku's observability platform, which operates at an impressive scale, ingesting terabytes of data daily … and fast iteration, and we emphasise solving meaningful engineering problems, collaboration, and continuous improvement of how we work as a team.

About the Role
You will work on core observability systems (metrics, logs, traces) while also developing robust data pipelines and storage solutions optimised for high throughput, performance, and cost. You'll leverage technologies such as time-series databases, columnar …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

Cambridge, East Anglia, United Kingdom
Hybrid / WFH Options
Speechmatics
postmortems and ensuring the same incident doesn't happen twice. Managing and improving GitOps release workflows and CI/CD pipelines. Monitoring system performance and troubleshooting production environments. Implementing observability improvements using OpenTelemetry tooling. Automating processes to reduce manual effort and create self-healing systems. Taking part in the on-call rota for production systems, which has a generous daily pay … dive deep into new technologies; you thrive on learning as you go. Prior experience with on-call rotations and incident response is a plus. Familiarity with OpenTelemetry and related observability tooling is advantageous. We encourage you to apply even if you do not feel you match all of the requirements exactly. The list of requirements is intended to show the …

Site Reliability Engineering Manager

Peterborough, England, United Kingdom
Sanderson
lead performance testing and chaos engineering initiatives, and embed reliability best practices across engineering, DevOps, and infrastructure teams. This is a senior, strategic leadership role focused on system excellence, observability, and continuous improvement. Ideal Candidate: Proven experience leading Performance Engineering, Reliability, or SRE functions. Deep expertise in performance testing methodologies (load, stress, spike, soak). Strong hands-on background with LoadRunner … strategy across critical platforms and services. Oversee load, stress, and chaos testing initiatives to ensure systems perform and recover under real-world conditions. Define and drive best practices for observability, monitoring, and APM adoption using tools like Dynatrace. Drive incident reduction, faster recovery (MTTR), and continuous reliability improvements. Champion a culture of performance ownership, ensuring teams build with scalability, stability …

Site Reliability Engineering Manager

Cambridge, East Anglia, United Kingdom
Sanderson
lead performance testing and chaos engineering initiatives, and embed reliability best practices across engineering, DevOps, and infrastructure teams. This is a senior, strategic leadership role focused on system excellence, observability, and continuous improvement. Ideal Candidate: Proven experience leading Performance Engineering, Reliability, or SRE functions. Deep expertise in performance testing methodologies (load, stress, spike, soak). Strong hands-on background with LoadRunner … strategy across critical platforms and services. Oversee load, stress, and chaos testing initiatives to ensure systems perform and recover under real-world conditions. Define and drive best practices for observability, monitoring, and APM adoption using tools like Dynatrace. Drive incident reduction, faster recovery (MTTR), and continuous reliability improvements. Champion a culture of performance ownership, ensuring teams build with scalability, stability …

Site Reliability Engineering Manager

Peterborough, East Anglia, United Kingdom
Sanderson
lead performance testing and chaos engineering initiatives, and embed reliability best practices across engineering, DevOps, and infrastructure teams. This is a senior, strategic leadership role focused on system excellence, observability, and continuous improvement. Ideal Candidate: Proven experience leading Performance Engineering, Reliability, or SRE functions. Deep expertise in performance testing methodologies (load, stress, spike, soak). Strong hands-on background with LoadRunner … strategy across critical platforms and services. Oversee load, stress, and chaos testing initiatives to ensure systems perform and recover under real-world conditions. Define and drive best practices for observability, monitoring, and APM adoption using tools like Dynatrace. Drive incident reduction, faster recovery (MTTR), and continuous reliability improvements. Champion a culture of performance ownership, ensuring teams build with scalability, stability …

Director - Performance and Reliability

Cambridgeshire, England, United Kingdom
Sanderson
lead performance testing and chaos engineering initiatives, and embed reliability best practices across engineering, DevOps, and infrastructure teams. This is a senior, strategic leadership role focused on system excellence, observability, and continuous improvement. Ideal Candidate: Proven experience leading Performance Engineering, Reliability, or SRE functions. Deep expertise in performance testing methodologies (load, stress, spike, soak). Strong hands-on background with LoadRunner … strategy across critical platforms and services. Oversee load, stress, and chaos testing initiatives to ensure systems perform and recover under real-world conditions. Define and drive best practices for observability, monitoring, and APM adoption using tools like Dynatrace. Drive incident reduction, faster recovery (MTTR), and continuous reliability improvements. Champion a culture of performance ownership, ensuring teams build with scalability, stability …
Employment Type: Full-Time
Salary: £84,000 - £95,000 per annum, negotiable, including benefits

Director - Performance and Reliability

Cambridgeshire, East Anglia, United Kingdom
Sanderson Recruitment
lead performance testing and chaos engineering initiatives, and embed reliability best practices across engineering, DevOps, and infrastructure teams. This is a senior, strategic leadership role focused on system excellence, observability, and continuous improvement. Ideal Candidate: Proven experience leading Performance Engineering, Reliability, or SRE functions. Deep expertise in performance testing methodologies (load, stress, spike, soak). Strong hands-on background with LoadRunner … strategy across critical platforms and services. Oversee load, stress, and chaos testing initiatives to ensure systems perform and recover under real-world conditions. Define and drive best practices for observability, monitoring, and APM adoption using tools like Dynatrace. Drive incident reduction, faster recovery (MTTR), and continuous reliability improvements. Champion a culture of performance ownership, ensuring teams build with scalability, stability …
Employment Type: Permanent
Salary: £95,000

Performance and Reliability Manager

Peterborough, Cambridgeshire, England, United Kingdom
Hybrid / WFH Options
Noir
Performance & Reliability Director - Software House - Peterborough/Hybrid (Key skills: Performance Engineering, Reliability Engineering, SRE, Load Testing, Observability, Chaos Testing, Cloud Platforms, Microservices, Leadership, CI/CD, APM Tools). Are you a technology leader passionate about driving performance, scalability, and reliability across complex software platforms? Do you thrive in high-growth environments where innovation, engineering excellence, and resilience are core … lifecycle. You'll oversee system profiling, capacity planning, and test strategies, ensuring every release meets the highest standards for speed, scalability, and reliability. You'll drive the adoption of observability and monitoring frameworks, leveraging platforms like Datadog and Dynatrace to build a proactive performance culture. You'll champion continuous improvement, implement chaos testing programmes, and ensure teams deliver fault-tolerant …
Employment Type: Full-Time
Salary: £80,000 - £95,000 per annum