Permanent Observability Jobs in the East of England

14 of 14 Permanent Observability Jobs in the East of England

Software Developer

Ipswich, Suffolk, England, United Kingdom
PCIpal
APIs • Experience of writing performance-critical code • Experience of using Git or similar to track changes • Experience of both the full .NET Framework and .NET Core • Experience of using observability systems such as Elastic APM or Datadog to track and diagnose issues in production • A solid understanding of security principles and secure coding including OWASP Top 10 Nice to haves …
Employment Type: Full-Time
Salary: Competitive salary

Senior Software Engineer - Observability Platform (Golang / Kubernetes)

Cambridge, Cambridgeshire, United Kingdom
Roku, Inc
that process massive amounts of data? Do you thrive on designing and implementing innovative solutions to empower engineering teams with actionable insights? Are you excited to advance open-source observability at a massive scale? Join us to extend (open source) observability tools and build new capabilities that help teams manage data better and get actionable insights. About the Team The … Observability team is part of Roku's Cloud Technology Infrastructure organisation and plays a critical role in our platform. We are a high-performing, fast-moving international team that thrives on ownership, effective communication, and delivering impactful engineering solutions. Our mission is to advance Roku's observability platform, which operates at an impressive scale by ingesting terabytes of data daily … and fast iteration, and we emphasise solving meaningful engineering problems, collaboration, and continuous improvement of how we work as a team. About the Role You will work on core observability systems (metrics, logs, traces) while also developing robust data pipelines and storage solutions optimized for high throughput, performance, and cost. You'll leverage technologies such as time-series databases, columnar …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

Cambridge, East Anglia, United Kingdom
Hybrid / WFH Options
Speechmatics
postmortems and ensuring the same incident doesn't happen twice. Managing and improving GitOps release workflows and CI/CD pipelines. Monitoring system performance and troubleshooting production environments. Implementing observability improvements using OpenTelemetry tooling. Automating processes to reduce manual effort and create self-healing systems. Taking part in an on-call rota for production systems that has a generous daily pay … dive deep into new technologies; you thrive on learning as you go. Prior experience with on-call rotations and incident response is a plus. Familiarity with OpenTelemetry and related observability tooling is advantageous. We encourage you to apply even if you do not feel you match all of the requirements exactly. The list of requirements is intended to show the …

Director, Infrastructure & Security Operations

Chelmsford, Essex, United Kingdom
Hybrid / WFH Options
Brooks Automation, Inc
infrastructure and security services, ensuring operational excellence and incident response readiness. Partner with the CISO to shape long-term strategy and roadmap for secure, resilient IT services. Drive automation, observability, and scalability across the infrastructure and security stack. Serve as a key escalation point for technical troubleshooting and security event resolution. Guide vendor selection, contract negotiations, and service-level adherence …
Employment Type: Permanent
Salary: GBP Annual

Services AI Data Solution Principal (Services Technical PreSales), based London

Wheathampstead, Hertfordshire, United Kingdom
Dell
AI platform designs leveraging Dell's product and partner ecosystem e.g. NVAIE, Run.ai, H2O.ai, ClearML, OpenShift, etc. Provide expert guidance on modern data stack components: data quality, metadata management, observability, data products, feature stores, with governance and Dell's maturity model frameworks. Stay current on emerging AI and associated Data Management technologies. Actively contribute field feedback to Dell's product …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineering Manager

Peterborough, England, United Kingdom
Sanderson
lead performance testing and chaos engineering initiatives, and embed reliability best practices across engineering, DevOps, and infrastructure teams. This is a senior, strategic leadership role focused on system excellence, observability, and continuous improvement. Ideal Candidate: Proven experience leading Performance Engineering, Reliability, or SRE functions Deep expertise in performance testing methodologies (load, stress, spike, soak) Strong hands-on background with LoadRunner … strategy across critical platforms and services Oversee load, stress, and chaos testing initiatives to ensure systems perform and recover under real-world conditions Define and drive best practices for observability, monitoring, and APM adoption using tools like Dynatrace Drive incident reduction, faster recovery (MTTR), and continuous reliability improvements Champion a culture of performance ownership, ensuring teams build with scalability, stability …

Site Reliability Engineering Manager

Peterborough, East Anglia, United Kingdom
Sanderson
lead performance testing and chaos engineering initiatives, and embed reliability best practices across engineering, DevOps, and infrastructure teams. This is a senior, strategic leadership role focused on system excellence, observability, and continuous improvement. Ideal Candidate: Proven experience leading Performance Engineering, Reliability, or SRE functions Deep expertise in performance testing methodologies (load, stress, spike, soak) Strong hands-on background with LoadRunner … strategy across critical platforms and services Oversee load, stress, and chaos testing initiatives to ensure systems perform and recover under real-world conditions Define and drive best practices for observability, monitoring, and APM adoption using tools like Dynatrace Drive incident reduction, faster recovery (MTTR), and continuous reliability improvements Champion a culture of performance ownership, ensuring teams build with scalability, stability …

Site Reliability Engineering Manager

Cambridge, East Anglia, United Kingdom
Sanderson
lead performance testing and chaos engineering initiatives, and embed reliability best practices across engineering, DevOps, and infrastructure teams. This is a senior, strategic leadership role focused on system excellence, observability, and continuous improvement. Ideal Candidate: Proven experience leading Performance Engineering, Reliability, or SRE functions Deep expertise in performance testing methodologies (load, stress, spike, soak) Strong hands-on background with LoadRunner … strategy across critical platforms and services Oversee load, stress, and chaos testing initiatives to ensure systems perform and recover under real-world conditions Define and drive best practices for observability, monitoring, and APM adoption using tools like Dynatrace Drive incident reduction, faster recovery (MTTR), and continuous reliability improvements Champion a culture of performance ownership, ensuring teams build with scalability, stability …

Director - Performance and Reliability

Cambridgeshire, England, United Kingdom
Sanderson
lead performance testing and chaos engineering initiatives, and embed reliability best practices across engineering, DevOps, and infrastructure teams. This is a senior, strategic leadership role focused on system excellence, observability, and continuous improvement. Ideal Candidate: Proven experience leading Performance Engineering, Reliability, or SRE functions Deep expertise in performance testing methodologies (load, stress, spike, soak) Strong hands-on background with LoadRunner … strategy across critical platforms and services Oversee load, stress, and chaos testing initiatives to ensure systems perform and recover under real-world conditions Define and drive best practices for observability, monitoring, and APM adoption using tools like Dynatrace Drive incident reduction, faster recovery (MTTR), and continuous reliability improvements Champion a culture of performance ownership, ensuring teams build with scalability, stability …
Employment Type: Full-Time
Salary: £84,000 - £95,000 per annum, Negotiable, Inc benefits

Director - Performance and Reliability

Cambridgeshire, East Anglia, United Kingdom
Sanderson Recruitment
lead performance testing and chaos engineering initiatives, and embed reliability best practices across engineering, DevOps, and infrastructure teams. This is a senior, strategic leadership role focused on system excellence, observability, and continuous improvement. Ideal Candidate: Proven experience leading Performance Engineering, Reliability, or SRE functions Deep expertise in performance testing methodologies (load, stress, spike, soak) Strong hands-on background with LoadRunner … strategy across critical platforms and services Oversee load, stress, and chaos testing initiatives to ensure systems perform and recover under real-world conditions Define and drive best practices for observability, monitoring, and APM adoption using tools like Dynatrace Drive incident reduction, faster recovery (MTTR), and continuous reliability improvements Champion a culture of performance ownership, ensuring teams build with scalability, stability …
Employment Type: Permanent
Salary: £95,000

Machine Learning Engineer

Hertfordshire, East Anglia, United Kingdom
Hybrid / WFH Options
Rightmove
scientists to take models from development to production-grade systems, ensuring scalability, reproducibility, and robustness. Automating feature engineering and data pipeline processes, ensuring reproducibility and auditability. Implementing monitoring and observability to detect drift, bias, and performance degradation, and setting up rollback/recovery processes. Using MLOps tools (e.g., Vertex Pipelines, Kubeflow, Weights & Biases) for experiment tracking, model registry, and automated … distributed systems). 3+ years of experience as an ML Engineer, MLOps Engineer, Data Engineer, or similar, in a larger-scale, production-focused environment. Hands-on with model monitoring, observability, and retraining pipelines. Exposure to feature stores, registries, and experimentation frameworks. Familiarity with business-driven metrics and experience balancing ML performance with commercial goals. Experience with generative AI and LLM …

Performance and Reliability Manager

Peterborough, Cambridgeshire, England, United Kingdom
Hybrid / WFH Options
Noir
Performance & Reliability Director - Software House - Peterborough/Hybrid (Key skills: Performance Engineering, Reliability Engineering, SRE, Load Testing, Observability, Chaos Testing, Cloud Platforms, Microservices, Leadership, CI/CD, APM Tools) Are you a technology leader passionate about driving performance, scalability, and reliability across complex software platforms? Do you thrive in high-growth environments where innovation, engineering excellence, and resilience are core … lifecycle. You'll oversee system profiling, capacity planning, and test strategies - ensuring every release meets the highest standards for speed, scalability, and reliability. You'll drive the adoption of observability and monitoring frameworks, leveraging platforms like Datadog and Dynatrace to build a proactive performance culture. You'll champion continuous improvement, implement chaos testing programmes, and ensure teams deliver fault-tolerant …
Employment Type: Full-Time
Salary: £80,000 - £95,000 per annum

Software Engineering Lead - £90,000 + Equity

St. Albans, Hertfordshire, England, United Kingdom
Method Resourcing
Software Engineering Lead £90,000 + Equity Method is working with a purpose-driven technology company on a multi-year transformation to rebuild its core platform into a modern, event-driven microservices architecture. Their mission is to improve safety, efficiency …
Employment Type: Full-Time
Salary: Salary negotiable

Software Engineering Lead - £90,000 + Equity

St. Albans, Hertfordshire, South East, United Kingdom
Method-Resourcing
Software Engineering Lead £90,000 + Equity Method is working with a purpose-driven technology company on a multi-year transformation to rebuild its core platform into a modern, event-driven microservices architecture. Their mission is to improve safety, efficiency …
Employment Type: Permanent

Observability, the East of England
10th Percentile: £70,300
25th Percentile: £81,250
Median: £92,500
75th Percentile: £100,000