Head of Performance & Reliability Engineering
Full-Time - Hybrid (3 days in Cambridgeshire)
Up to £95,000 + Bonus
This is an exceptional opportunity to join a major organisation at a pivotal stage in their digital transformation. As Head of Performance & Reliability Engineering, you'll shape strategy, lead performance testing and chaos engineering initiatives, and embed reliability … best practices across engineering, DevOps, and infrastructure teams. This is a senior, strategic leadership role focused on system excellence, observability, and continuous improvement.
Ideal Candidate:
Proven experience leading Performance Engineering, Reliability, or SRE functions
Deep expertise in performance testing methodologies (load, stress, spike, soak)
Strong hands-on background with LoadRunner and Dynatrace (plus tools such as NeoLoad, k6 … or JMeter)
Skilled in chaos engineering, resilience testing, and system scalability
Experience defining and managing SLOs, SLIs, and error budgets
Strong knowledge of distributed systems, cloud platforms (AWS, Azure, or GCP), and microservice architectures
Proven ability to influence senior stakeholders and embed performance and reliability into the SDLC
Inspirational leader with experience building, mentoring, and enabling high-performing …
Henley-on-Thames, Oxfordshire, United Kingdom Hybrid/Remote Options
Invesco Real Estate
Volunteering days
Enhanced parental leave
Life insurance
Your Role:
Lead the technical assessment, architecture, and continuous enhancement of technology risk controls across hybrid and cloud-native environments. Leverage advanced engineering practices, automation, and analytics to proactively identify, quantify, and mitigate risks, embedding a culture of technical excellence and risk accountability.
What you will be doing:
Architect and implement robust … technology risk controls and assessments using advanced engineering techniques, including chaos engineering, automated fault injection, and adversarial simulations, across cloud (AWS, Azure, GCP) and on-premises platforms. Design and operationalize real-time Key Risk Indicators (KRIs) by integrating telemetry from SIEM (e.g., Splunk, Sentinel), CSPM (e.g., Prisma Cloud, Wiz), EDR, and workload protection platforms. Develop analytics pipelines for early … workflows (e.g., SOAR, IaC-based remediation) to ensure timely, effective, and sustainable outcomes. Develop and maintain integrated, actionable risk dashboards and reporting using Power BI, custom APIs, and data engineering best practices. Partner with engineering, DevOps, and SRE teams to embed risk controls into CI/CD pipelines, deliver technical training, and drive adoption of secure-by-design …
Strong expertise in implementing Site Reliability Engineering (SRE) principles. Advanced knowledge of establishing observability using Dynatrace & Datadog (primary skills). Proficiency in automation & scripting using Python & Ansible (primary skills). Strong experience with cloud platforms AWS & Azure (primary skills). Solid understanding of containerization and orchestration tools like Docker and Kubernetes. Proficiency in cloud-native distributed systems … microservices architecture. Exposure to AI/ML techniques for predictive analytics and automated problem resolution. Familiarity with CI/CD pipelines & enabling automated release & deployment engineering solutions. Good to have: experience with chaos engineering tools like Gremlin or Chaos Monkey and implementing automation frameworks for resilience tracking. Ability to manage and prioritize multiple projects in a … fast-paced environment. Strong interpersonal and communication skills to work effectively across teams. Excellent problem-solving, analytical thinking, and adaptability. Strategic mindset balancing engineering excellence with business priorities.