Principal Data Scientist
You will be part of a team designing and building a GenAI virtual agent to support customers and employees across multiple channels. You will build and run LLM-powered agentic experiences, owning their design, orchestration, MLOps, and continuous improvement.
- Design & build client-specific GenAI/LLM virtual agents
- Enable the orchestration, management, and execution of AI-powered interactions through purpose-built AI agents
- Design, build, and maintain robust LLM-powered processing workflows
- Develop cutting-edge test suites built around bespoke LLM performance metrics
- Craft context-aware, multi-channel self-service experiences
- CI/CD for ML/LLM: automated build/train/validate/deploy pipelines for chatbots and agent services
- Infrastructure as Code (IaC): Terraform/CloudFormation to provision scalable cloud infrastructure for training and real-time inference
- Observability: monitoring, drift detection, hallucination detection, SLOs, and alerting for model and service health
- Serving at scale: containerised, auto-scaling (e.g., Kubernetes) with low-latency inference
- Data & model versioning; maintain a central model registry with lineage and rollback
- Workflow automation across the ML lifecycle (data ingestion → retraining → deployment)
- Deliver a live performance dashboard (intent accuracy, latency, error rates) and a documented retraining strategy
- Lead and foster creativity around frameworks/models; collaborate closely with product, engineering, and client stakeholders
Qualifications / Experience
- Relevant primary degree; MSc or PhD preferred
- Proven expertise in mathematics and classical ML algorithms, plus deep knowledge of LLMs (prompting, fine-tuning, RAG/tool use, evaluation)
- Hands-on with AWS and Azure services for data/ML (e.g., Bedrock/SageMaker, Azure OpenAI/Azure ML)
- Strong engineering: Python, APIs, containers, Git; CI/CD (GitHub Actions/Azure DevOps), IaC (Terraform/CloudFormation)
- Scalable Serving Infrastructure: A containerised, auto-scaling environment (e.g., using Kubernetes) to serve the chatbot model with low latency
- Workflow Automation: Automation of the end-to-end machine learning lifecycle, from data ingestion and preprocessing to model retraining and deployment
- Live Performance Dashboard: A real-time dashboard displaying key model metrics such as intent accuracy, response latency, and error rates
- Centralised Model Registry: A versioned repository for all trained models, their performance metrics, and associated training data
- Documented Retraining Strategy: An automated workflow and documentation outlining the process for periodically retraining the model on new data
- Experience with Kubernetes, inference optimisation, caching, vector stores, and model registries
- Clear communication, stakeholder management, and a habit of writing crisp technical docs and runbooks
Personal Attributes
- Personal integrity, stakeholder management, project management, agile methodologies, automation, data visualisation and analysis
- Company: ISx4
- Location: United Kingdom, UK
- Posted