Knutsford, Cheshire, United Kingdom Hybrid / WFH Options
Octopus Computer Associates
Role Overview: We are seeking a highly capable Security Engineer to join a focused team developing a telemetry pipeline MVP. This role requires deep technical expertise in containerized environments, observability tooling, and secure infrastructure design. The ideal candidate will ensure that security is embedded across the pipeline architecture, from deployment to data flow, while collaborating closely with DevOps and development … risk analysis for the telemetry pipeline. Collaborate with DevOps engineers to embed security into infrastructure-as-code and deployment workflows. Monitor and respond to security events and alerts from observability platforms. Maintain documentation of security architecture, policies, and incident response procedures. Required Skills & Experience: Strong hands-on experience with Kubernetes and OpenShift in secure production environments. Proficiency in GitLab and …
london, slough and london (city of london), south east england, united kingdom Hybrid / WFH Options
Paradigm Talent
and desktop applications. Design and implement distributed systems for async processing, ML workflows, and asset pipelines. Own authentication, billing, and subscription systems, ensuring reliability and a seamless user experience. Drive observability, performance tuning, and deployment automation across the stack. Collaborate closely with product, frontend, and ML teams to deliver features that delight and scale. You should have 5+ years of experience … with cloud-native infrastructure (AWS, GCP, or Azure). Familiarity with auth, billing, or subscription systems. Background in 3D graphics, creative tooling, or ML pipelines. Knowledge of observability tools like Grafana, Prometheus, or OpenTelemetry. This is a rare opportunity to join an early-stage team backed by leading deep-tech investors, building the foundation of a platform that …
explanations, citations) clear and accessible. Architecture: Shape a modular, scalable platform on AWS (ECS), separating ingestion, retrieval, reasoning, and delivery. Quality & reliability: Ensure reliability through testing, CI/CD, observability (metrics/tracing for LLM and retrieval paths), and performance optimisation. Collaboration: Partner with product and leadership teams, mentor peers, and play a role in shaping technical direction. Innovation: Explore … to have: Experience with rerankers (e.g., cross-encoders), hybrid retrieval (SQL + vectors), query expansion, or lightweight knowledge graphs. Familiarity with LLM evaluation tooling (LangChain, LlamaIndex, OpenAI Evals) and observability for cost, relevance, and latency. Background in B2B data products or fintech. Applicants must be based in the UK with full right to work.
and user-focused. Mentor junior engineers, providing guidance on coding practices and problem-solving. Leverage AI across the SDLC to improve delivery (e.g. code quality, test coverage, release speed, observability). Evaluate trade-offs of AI-driven solutions and collaborate with Product/Design/Tech Lead to ensure AI use supports user and business value. Share learnings about effective … OpenSearch/Elasticsearch. Familiarity with GraphRAG or experience building knowledge graphs. Familiarity with the latest generative AI developments, such as LLM architectures, fine-tuning strategies, and agentic workflows. Experience with observability tooling for distributed AI systems. Understanding of data ingestion and transformation pipelines supporting vector and knowledge graph stores. Proven ability to own feature delivery end-to-end. Strong front-end …
Bracknell, Berkshire, South East, United Kingdom Hybrid / WFH Options
John Lewis Head Office
the team's checks, your role in the team will be to mentor others in testing practice; coach them to adopt and improve their quality approaches, including deployment approaches and observability; review and contribute to the team's codebase and pipeline configuration; and help the team with their system of work, from first business need to monitoring services in production. At all times … performance, resource usage, variable bandwidth, device compatibility, accessibility, etc.) and advising on how these risks should be mitigated. Understanding operational and non-functional requirements (such as resilience, performance and observability) and how solutions are implemented and tested. Desirable skills/experience you may have: Bitrise/GitLab CI; GraphQL; Backend for Frontend (BFF) patterns; microservice architectures; experience of cloud infrastructure …
london (city of london), south east england, united kingdom
oryxsearch.io
Senior Software Engineer II – ML Platform & GenAI (relocation to Dubai). Location: Dubai, United Arab Emirates. As part of the Machine Learning Platform & Generative AI Applications team, this role sits at the intersection of engineering, data science, and product development. The …
Glasgow, Lanarkshire, Scotland, United Kingdom Hybrid / WFH Options
Virgin Money
from risks and prevent attacks. We have a new and exciting role within our IT Service Management function and are looking for an experienced Lead Technical Consultant who specialises in Observability and will report directly to the Head of Service Management. What you'll be doing: • Act as the key consumer & gatekeeper of operational Observability across the estate, ensuring the data … is not only visible but actionable, which in turn will drive real-time operational excellence. • Own, shape & develop Observability in practice, i.e. day to day, for the 24/7 Operations teams. • Be the strategic lead in driving the continuous enhancement & maintenance of our monitoring systems. • Utilise your deep knowledge of monitoring tools and best practices. • Help … shape Observability to lead the diagnosis and resolution of incidents, ensuring that creative solutions are identified or escalated. • Work closely with senior stakeholders to manage risks and influence key decisions. • Take charge of technical decision-making and develop our team's capabilities. • Translate strategy into actionable plans and communicate effectively with your team. • Provide guidance and mentorship, ensuring a unified approach …
+ Bonus (24%) + Benefits. Location: 2 days per week in the West London office (potential flexibility). Key Responsibilities: Work on AWS-first infrastructure, modernising legacy systems and embedding observability, automation, and CI/CD practices. Play a key role in a business-wide shift to SRE and DevSecOps ways of working. Influence and coach teams, define SLOs/SLIs … and drive a proactive reliability and performance culture. Experience required: 3-5 years' experience as an SRE. Strong understanding of SRE principles (monitoring, observability, etc.). Hands-on experience with AWS, Terraform, and CI/CD pipelines. Experience with containerisation (Kubernetes or Docker) and monitoring tools (Datadog, Splunk, etc.). Strong communicator able to evangelise SRE practices across teams. This is …
lead performance testing and chaos engineering initiatives, and embed reliability best practices across engineering, DevOps, and infrastructure teams. This is a senior, strategic leadership role focused on system excellence, observability, and continuous improvement. Ideal Candidate: Proven experience leading Performance Engineering, Reliability, or SRE functions. Deep expertise in performance testing methodologies (load, stress, spike, soak). Strong hands-on background with LoadRunner … strategy across critical platforms and services. Oversee load, stress, and chaos testing initiatives to ensure systems perform and recover under real-world conditions. Define and drive best practices for observability, monitoring, and APM adoption using tools like Dynatrace. Drive incident reduction, faster recovery (MTTR), and continuous reliability improvements. Champion a culture of performance ownership, ensuring teams build with scalability, stability …
Employment Type: Full-Time
Salary: £84,000 - £95,000 per annum, Negotiable, Inc benefits
london, south east england; hertfordshire, east anglia; buckinghamshire, south east england, united kingdom Hybrid / WFH Options
Rightmove
scientists to take models from development to production-grade systems, ensuring scalability, reproducibility, and robustness. Automating feature engineering and data pipeline processes, ensuring reproducibility and auditability. Implementing monitoring and observability to detect drift, bias, and performance degradation, and setting up rollback/recovery processes. Using MLOps tools (e.g., Vertex Pipelines, Kubeflow, Weights & Biases) for experiment tracking, model registry, and automated … distributed systems). 3+ years of experience as an ML Engineer, MLOps Engineer, Data Engineer, or similar, in a larger-scale, production-focused environment. Hands-on experience with model monitoring, observability, and retraining pipelines. Exposure to feature stores, registries, and experimentation frameworks. Familiarity with business-driven metrics and experience balancing ML performance with commercial goals. Experience with generative AI and LLM …
Manchester, Lancashire, England, United Kingdom Hybrid / WFH Options
Eutopia Solutions ltd
availability, performance, vulnerability, and compliance monitoring, you will operate, administer, and engineer monitoring solutions within this complex environment. You will also contribute to technical design and architectural decisions and changes for the monitoring and observability platform. This includes the design and implementation of new monitoring system integrations. This is an exciting time to join them as they continue … their journey with Azure and look to build out automated functions within the operation. They are looking for an individual with proven experience of a range of observability tools, both on-prem and in the cloud, and a good understanding of observability frameworks. Offering excellent ongoing professional development opportunities, they are keen to engage with professionals who will embrace this. … with 2 days a week required onsite. Key skills and experience: Experience of working in a similar role within a large-scale corporate environment. Proven knowledge of working with observability tools to evaluate application/system state and provide a healthy and stable platform. A firm and proven understanding of monitoring and observability tools and core concepts. Experience of working …
Peterborough, Cambridgeshire, England, United Kingdom Hybrid / WFH Options
Noir
Performance & Reliability Director - Software House - Peterborough/Hybrid (Key skills: Performance Engineering, Reliability Engineering, SRE, Load Testing, Observability, Chaos Testing, Cloud Platforms, Microservices, Leadership, CI/CD, APM Tools) Are you a technology leader passionate about driving performance, scalability, and reliability across complex software platforms? Do you thrive in high-growth environments where innovation, engineering excellence, and resilience are core … lifecycle. You'll oversee system profiling, capacity planning, and test strategies, ensuring every release meets the highest standards for speed, scalability, and reliability. You'll drive the adoption of observability and monitoring frameworks, leveraging platforms like Datadog and Dynatrace to build a proactive performance culture. You'll champion continuous improvement, implement chaos testing programmes, and ensure teams deliver fault-tolerant …
london (city of london), slough and london, south east england, united kingdom Hybrid / WFH Options
Paradigm Talent
Role: Staff Software Engineer (Python | Backend | Infrastructure). Location: Hybrid, 2-3 days in the London office. Compensation: Up to £170,000 + equity. We’re working with a frontier AI lab pushing the boundaries of computational biology, combining machine learning, cloud …