AIOps / LLM Operations

Deploying AI is only the beginning. Organisations increasingly require operational leaders capable of ensuring AI systems remain secure, observable, reliable and effective once in production.

Role Overview

  • Establish AI operational processes and standards
  • Monitor model, agent and workflow performance
  • Define observability, incident management and support frameworks
  • Manage AI deployment, release and rollback processes
  • Support security, compliance and operational resilience requirements
  • Drive continuous improvement across AI systems
  • Keep AI operating costs under control

Tools & Technologies Required

  • MLflow, Weights & Biases or similar experiment-tracking tools
  • Monitoring and observability platforms
  • CI/CD pipelines
  • Cloud infrastructure
  • Logging and performance management tools
  • Security and governance frameworks

Nice to Have

  • Site Reliability Engineering (SRE) background
  • MLOps experience
  • Cloud architecture knowledge
  • Experience supporting regulated environments

About You

  • Strong operational mindset
  • Calm and methodical under pressure
  • Focused on reliability and resilience
  • Able to balance speed with control
  • Comfortable working across engineering, security and governance teams

Benefits & Perks

  • Amongst the best around

Job Details

Company
Diagonal recruitment
Location
London Area, United Kingdom
Posted