Permanent Observability Jobs in Cambridge

3 of 3 Permanent Observability Jobs in Cambridge

Senior Software Engineer - Observability Platform (Golang / Kubernetes)

Cambridge, Cambridgeshire, United Kingdom
Roku, Inc
that process massive amounts of data? Do you thrive on designing and implementing innovative solutions to empower engineering teams with actionable insights? Are you excited to advance open-source observability at a massive scale? Join us to extend (open source) observability tools and build new capabilities that help teams manage data better and get actionable insights. About the Team The … Observability team is part of Roku's Cloud Technology Infrastructure organisation and plays a critical role in our platform. We are a high-performing, fast-moving international team that thrives on ownership, effective communication, and delivering impactful engineering solutions. Our mission is to advance Roku's observability platform, which operates at an impressive scale by ingesting terabytes of data daily … and fast iteration, and we emphasise solving meaningful engineering problems, collaboration, and continuous improvement of how we work as a team. About the Role You will work on core observability systems (metrics, logs, traces) while also developing robust data pipelines and storage solutions optimized for high throughput, performance, and cost. You'll leverage technologies such as time-series databases, columnar …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

Cambridge, East Anglia, United Kingdom
Hybrid / WFH Options
Speechmatics
postmortems and ensuring the same incident doesn't happen twice. Managing and improving GitOps release workflows and CI/CD pipelines. Monitoring system performance and troubleshooting production environments. Implementing observability improvements using OpenTelemetry tooling. Automating processes to reduce manual effort and create self-healing systems. Taking part in an on-call rota for production systems that has a generous daily pay … dive deep into new technologies; you thrive on learning as you go. Prior experience with on-call rotations and incident response is a plus. Familiarity with OpenTelemetry and related observability tooling is advantageous. We encourage you to apply even if you do not feel you match all of the requirements exactly. The list of requirements is intended to show the …

Site Reliability Engineering Manager

Cambridge, East Anglia, United Kingdom
Sanderson
lead performance testing and chaos engineering initiatives, and embed reliability best practices across engineering, DevOps, and infrastructure teams. This is a senior, strategic leadership role focused on system excellence, observability, and continuous improvement. Ideal Candidate: Proven experience leading Performance Engineering, Reliability, or SRE functions Deep expertise in performance testing methodologies (load, stress, spike, soak) Strong hands-on background with LoadRunner … strategy across critical platforms and services Oversee load, stress, and chaos testing initiatives to ensure systems perform and recover under real-world conditions Define and drive best practices for observability, monitoring, and APM adoption using tools like Dynatrace Drive incident reduction, faster recovery (MTTR), and continuous reliability improvements Champion a culture of performance ownership, ensuring teams build with scalability, stability …