Principal Data Scientist

You will be part of a team designing and building a GenAI virtual agent to support customers and employees across multiple channels. You will build and run LLM-powered agentic experiences, owning their design, orchestration, MLOps, and continuous improvement.

  • Design & build client-specific GenAI/LLM virtual agents
  • Enable the orchestration, management, and execution of AI-powered interactions through purpose-built AI agents
  • Design, build, and maintain robust LLM-powered processing workflows
  • Develop cutting-edge testing suites and bespoke LLM performance metrics
  • Craft context-aware, multi-channel self-service experiences
  • CI/CD for ML/LLM: automated build/train/validate/deploy pipelines for chatbots and agent services
  • IaC: Infrastructure as Code (Terraform/CloudFormation) to provision scalable cloud infrastructure for training and real-time inference
  • Observability: monitoring, drift detection, hallucination detection, SLOs, and alerting for model and service health
  • Serving at scale: containerised, auto-scaling (e.g., Kubernetes) with low-latency inference
  • Data & model versioning; maintain a central model registry with lineage and rollback
  • Workflow automation across the ML lifecycle (data ingestion → retraining → deployment)
  • Deliver a live performance dashboard (intent accuracy, latency, error rates) and a documented retraining strategy
  • Lead and foster creativity around frameworks/models; collaborate closely with product, engineering, and client stakeholders

Qualifications / Experience

  • Relevant primary degree; an MSc or PhD is preferred
  • Proven expertise in mathematics and classical ML algorithms, plus deep knowledge of LLMs (prompting, fine-tuning, RAG/tool use, evaluation)
  • Hands-on experience with AWS and Azure services for data/ML (e.g., Bedrock/SageMaker, Azure OpenAI/Azure ML)
  • Strong engineering: Python, APIs, containers, Git; CI/CD (GitHub Actions/Azure DevOps), IaC (Terraform/CloudFormation)
  • Scalable serving infrastructure: experience delivering a containerised, auto-scaling environment (e.g., Kubernetes) that serves the chatbot model with low latency
  • Workflow automation: experience automating the end-to-end machine learning lifecycle, from data ingestion and preprocessing to model retraining and deployment
  • Live performance dashboard: experience building real-time dashboards that display key model metrics such as intent accuracy, response latency, and error rates
  • Centralised model registry: experience maintaining a versioned repository of trained models, their performance metrics, and associated training data
  • Documented retraining strategy: experience designing an automated workflow, with supporting documentation, for periodically retraining the model on new data
  • Experience with Kubernetes, inference optimisation, caching, vector stores, and model registries
  • Clear communication, stakeholder management, and a habit of writing crisp technical docs and runbooks

Personal Attributes

  • Personal integrity, stakeholder management, project management, Agile methodologies, automation, and data visualisation and analysis

Company
ISx4
Location
United Kingdom, UK
Posted