MLOps Architect - AWS
About Quantiphi:
Quantiphi is an award-winning, AI-First global digital engineering company that helps the world’s leading Fortune 1000 organizations transform bold ideas into measurable business impact. We go beyond building innovative AI technologies—we solve the problems that matter most to our clients.
Since our founding in 2013, Quantiphi has built a proven track record of turning complex challenges into meaningful outcomes across industries.
Headquartered in Boston, with more than 4,000 professionals worldwide, we partner with global enterprises to deliver large-scale digital, cloud, and AI-driven transformation. #SolvingWhatMatters.
We are an Elite and Premier partner to Google Cloud, AWS, NVIDIA, Snowflake, and other leading technology platforms, and our work has been recognized across the industry, including:
- 21 Google Cloud Partner of the Year awards in the past 10 years
- 3 AWS AI/ML Partner of the Year awards
- 3 NVIDIA Partner of the Year awards
- 3 Snowflake Partner of the Year awards
- Rated Leaders by Gartner, Forrester, IDC, ISG, Everest Group and other leading analyst firms
Quantiphi delivers first-in-class AI solutions across Life Sciences, Healthcare, Banking, Financial Services, CPG, Manufacturing, Energy, High-Tech, Telecommunications, and other industries, powered by cutting-edge Generative AI and Agentic AI accelerators.
We are also proud to be certified as a Great Place to Work—reflecting our commitment to our people and our culture.
For more details, visit our website or LinkedIn page.
Role: MLOps Architect - AWS
Experience Level: 8+ years
Employment Type: Full Time
Location: Remote (UK)
What you will do:
- We are seeking an experienced MLOps/AIOps Architect who can drive end-to-end implementation of MLOps strategy and also contribute broadly across other enterprise AI/ML programs. This role demands a strong architectural mindset, hands-on technical depth, and the ability to design scalable, cloud-native machine learning operations across traditional ML and modern LLM workflows.
- The ideal candidate will bring experience with SageMaker-based MLOps pipelines, evaluation of equivalent tooling stacks, hybrid MLOps/LLMOps automation, CI/CD orchestration, governance, monitoring, and production-grade scalability patterns.
Key Responsibilities:
- Architect and implement the MLOps strategy for the programme, ensuring alignment with the project proposal and delivery roadmap.
- Design and own enterprise-grade ML/LLM pipelines covering model training, validation, deployment, versioning, monitoring, and CI/CD automation.
- Build container-oriented ML platforms (EKS-first) while evaluating alternative orchestration tools with similar capabilities (Kubeflow, SageMaker, MLflow, Airflow, etc.).
- Implement hybrid MLOps and LLMOps workflows, including prompt versioning and governance, evaluation frameworks, and monitoring for LLM-based systems.
- Serve as a technical authority across multiple internal and customer projects, contributing architectural patterns, best practices, and reusable frameworks.
- Enable observability, monitoring, drift detection, lineage tracking, and auditability across ML/LLM systems.
- Define and implement standards for model deployment, monitoring, governance, and automation to ensure production-grade reliability and scalability.
- Collaborate with cross-functional teams — data engineering, platform, DevOps, and client stakeholders — to deliver production-ready ML solutions.
- Ensure all solutions meet security, governance, and compliance requirements, particularly for cloud services, Kubernetes workloads, and MLOps tooling.
- Conduct architecture reviews, troubleshoot complex ML system issues, and guide teams through implementation across cloud-native ML platforms.
- Mentor engineers and provide guidance on modern MLOps tools, platform capabilities, and best practices.
Basic Qualifications (BQs):
- Experience working in ML/AI engineering or MLOps roles with strong architecture exposure.
- Proven leadership in defining and executing enterprise-grade MLOps strategy.
- Demonstrated success in driving the adoption of enterprise-grade MLOps platforms with client data science teams.
- Strong expertise in the AWS cloud-native ML stack, including SageMaker (primary), EKS, Lambda, API Gateway, and CI/CD (CodeBuild/CodePipeline or equivalent).
- Hands-on experience with at least one major MLOps toolset and awareness of alternatives: MLflow, Kubeflow, SageMaker Pipelines, Airflow, BentoML, KServe, Seldon.
- Deep understanding of the end-to-end ML lifecycle: data ingestion, feature engineering, training, evaluation, model registry, packaging, deployment, CI/CD, drift detection, monitoring, and governance.
- Experience implementing or supporting LLMOps pipelines, including prompt versioning, evaluation metrics, and automation frameworks.
- Strong experience with AWS SageMaker (Pipelines, Feature Store, Model Registry, Model Monitor).
- Experience implementing ML CI/CD pipelines including automated training, testing, validation, model promotion, and endpoint deployment.
- Experience with Infrastructure as Code (IaC) tools and CI/CD pipelines.
- Experience with Kubernetes-based development.
- Experience with feature engineering pipelines and Feature Store management.
- Understanding of lineage tracking: training data snapshots, feature versions, code versioning, metadata tracking, and reproducibility.
- Hands-on experience with Amazon Bedrock and Amazon Bedrock AgentCore.
- Experience with CloudWatch, SageMaker Model Monitor, Prometheus/Grafana.
- Strong foundation in Python and cloud-native development patterns.
- Solid understanding of security best practices, IAM, secrets management, and artifact governance.
Other Qualifications (OQs):
- Experience with vector databases, RAG pipelines, or multi-agent AI systems.
- Exposure to DevOps and infrastructure-as-code (Terraform, Helm, CDK).
- Hands-on understanding of model drift detection, A/B testing, canary rollouts, and blue-green deployments.
- Familiarity with observability stacks (Prometheus, Grafana, CloudWatch, OpenTelemetry).
- SQL and data transformation experience using Snowflake, Databricks, or Spark.
- Ability to translate business goals into scalable AI/ML platform designs.
- Strong communication and cross-team collaboration skills.
- Ability to guide engineering teams through technical uncertainty and design choices.
What is in it for you:
- Join one of the world’s fastest-growing AI-first digital engineering companies and make a real impact at scale.
- Lead and collaborate with a high-energy team of talented, driven individuals solving complex, meaningful challenges.
- Work with Fortune 500 companies and disruptive innovators in a research-driven environment with 60+ patents.
- Stay ahead of the curve by gaining hands-on experience with cutting-edge AI, ML, data, and cloud technologies while continuously upskilling.