MLOps Architect - AWS
About Quantiphi:
- Quantiphi is an award-winning, AI-First digital engineering and consulting company focused on delivering high-impact Services and Solutions that help organizations solve what truly matters. We partner with enterprises to reimagine their businesses through intelligent, scalable, and transformative AI—driving measurable outcomes at the very core of their operations.
- Since our founding in 2013, Quantiphi has tackled some of the world’s most complex business challenges by combining deep industry expertise, disciplined cloud and data engineering practices, and cutting-edge applied AI research. Our work is rooted in delivering accelerated, quantifiable business value—not just technology for technology’s sake.
- Headquartered in Boston, Quantiphi is a global organization with 4,000+ professionals serving clients across key industry verticals, including BFSI, Healthcare & Life Sciences, CPG, MFG, and TME. As an Elite and Premier partner to leading cloud and AI platforms such as NVIDIA, Google Cloud, AWS, and Snowflake, we build and deliver enterprise-grade AI services and solutions that create real-world impact.
We have been recognized with:
- 3x AWS AI/ML award wins.
- 3x NVIDIA Partner of the Year titles.
- Recognized as a Leader by Gartner, Forrester, IDC, ISG, Everest Group, and other leading analyst and independent research firms.
- We offer first-in-class industry solutions across Healthcare, Financial Services, Consumer Goods, Manufacturing, and more, powered by cutting-edge Generative AI and Agentic AI accelerators.
- We have been certified as a Great Place to Work for the third year in a row (2021, 2022, and 2023).
Be part of a trailblazing team that’s shaping the future of AI, ML, and cloud innovation. Your next big opportunity starts here!
For more details, visit our Website or LinkedIn Page.
Role: MLOps Architect - AWS
Experience Level: 8+ years
Employment type: Full Time
Location: Remote (UK)
Description:
- We are seeking an experienced MLOps/AIOps Architect who can drive end-to-end implementation of MLOps strategy and also contribute broadly across other enterprise AI/ML programs. This role demands a strong architectural mindset, hands-on technical depth, and the ability to design scalable, cloud-native machine learning operations across traditional ML and modern LLM workflows.
- The ideal candidate will bring experience with SageMaker-based MLOps pipelines, evaluation of equivalent tooling stacks, hybrid MLOps/LLMOps automation, CI/CD orchestration, governance, monitoring, automation, and production-grade scalability patterns.
Key Responsibilities:
- Architect and implement the MLOps strategy for the programme, ensuring alignment with the project proposal and delivery roadmap.
- Design and own enterprise-grade ML/LLM pipelines covering model training, validation, deployment, versioning, monitoring, and CI/CD automation.
- Build container-oriented ML platforms (EKS-first) while evaluating alternative orchestration tools with similar capabilities (Kubeflow, SageMaker, MLflow, Airflow, etc.).
- Implement hybrid MLOps + LLMOps workflows, including prompt/version governance, evaluation frameworks, and monitoring for LLM-based systems.
- Serve as a technical authority across multiple internal and customer projects, contributing architectural patterns, best practices, and reusable frameworks.
- Enable observability, monitoring, drift detection, lineage tracking, and auditability across ML/LLM systems.
- Define and implement standards for model deployment, monitoring, governance, and automation to ensure production-grade reliability and scalability.
- Collaborate with cross-functional teams — data engineering, platform, DevOps, and client stakeholders — to deliver production-ready ML solutions.
- Ensure all solutions adhere to security, governance, and compliance expectations, particularly around handling cloud services, Kubernetes workloads, and MLOps tools.
- Conduct architecture reviews, troubleshoot complex ML system issues, and guide teams through implementation across cloud-native ML platforms.
- Mentor engineers and provide guidance on modern MLOps tools, platform capabilities, and best practices.
Skills:
- Experience working in ML/AI engineering or MLOps roles with strong architecture exposure.
- Proven leadership in defining and executing enterprise-grade MLOps strategy.
- Demonstrated success in driving the adoption of enterprise-grade MLOps platforms with client data science teams.
- Strong expertise in the AWS cloud-native ML stack, including: SageMaker (primary), EKS, Lambda, API Gateway, and CI/CD (CodeBuild/CodePipeline or equivalent).
- Hands-on experience with at least one major MLOps toolset and awareness of alternatives: MLflow, Kubeflow, SageMaker Pipelines, Airflow, BentoML, KServe, Seldon.
- Deep understanding of the end-to-end model lifecycle: data ingestion, feature engineering, training, evaluation, model packaging, registry, deployment, CI/CD, drift detection, monitoring, and governance.
- Experience implementing or supporting LLMOps pipelines, including prompt versioning, evaluation metrics, and automation frameworks.
- Strong experience with AWS SageMaker (Pipelines, Feature Store, Model Registry, Model Monitor).
- Experience implementing ML CI/CD pipelines including automated training, testing, validation, model promotion, and endpoint deployment.
- Experience with Infrastructure as Code (IaC) tools and CI/CD pipelines.
- Experience with Kubernetes-based development.
- Experience with feature engineering pipelines and Feature Store management.
- Understanding of lineage tracking: training data snapshots, feature versions, code versioning, metadata tracking, and reproducibility.
- Hands-on experience with Amazon Bedrock and the Bedrock AgentCore service.
- Experience with CloudWatch, SageMaker Model Monitor, Prometheus/Grafana.
- Strong foundation in Python and cloud-native development patterns.
- Solid understanding of security best practices, IAM, secrets management, and artifact governance.
Nice to have:
- Experience with vector databases, RAG pipelines, or multi-agent AI systems.
- Exposure to DevOps and infrastructure-as-code (Terraform, Helm, CDK).
- Hands-on understanding of model drift detection, A/B testing, canary rollouts, and blue-green deployments.
- Familiarity with observability stacks (Prometheus, Grafana, CloudWatch, OpenTelemetry).
- SQL and data transformation experience using Snowflake, Databricks, Spark.
- Ability to translate business goals into scalable AI/ML platform designs.
- Strong communication and cross-team collaboration skills.
- Ability to guide engineering teams through technical uncertainty and design choices.
What is in it for you:
- Make an impact at one of the world’s fastest-growing AI-first digital engineering companies.
- Upskill and discover your potential as you solve complex challenges in cutting-edge areas of technology alongside passionate, talented colleagues.
- Work where innovation happens: collaborate with disruptive innovators in a research-focused organization with 60+ patents filed across various disciplines.
- Stay ahead of the curve—immerse yourself in breakthrough AI, ML, data, and cloud technologies and gain exposure working with Fortune 500 companies.