MLOps Engineer
Role Summary
We are seeking a highly skilled MLOps Engineer to focus on the deployment, monitoring, and maintenance of machine learning models in production environments. This role is platform-focused and does not involve model development or end-user support. The successful candidate will ensure the reliability, scalability, and performance of ML platforms while managing API endpoints and deployment workflows.
Key Responsibilities
Platform Operations & Monitoring
- Monitor ML model endpoints and platform health using tools such as Grafana and Domino Data Lab
- Respond to incidents and alerts; perform code fixes and manage changes via ServiceNow
- Liaise with Domino Data Lab support to resolve platform-related issues
Model Deployment
- Deploy and maintain ML models in production environments
- Ensure models integrate seamlessly into automated pipelines
- Maintain reliability, version control, and governance standards
Pipeline Maintenance
- Collaborate with Data Scientists and Engineers to ensure a smooth handoff to production
- Maintain and optimize ML pipelines for stability and scalability
- Improve pipeline performance, resource efficiency, and automation
Automation & Tooling
- Implement automation for deployment and monitoring
- Contribute to continuous platform improvements
Required Skills & Experience
- Strong Python programming experience
- Proven experience deploying and monitoring ML models in production
- Understanding of model evaluation metrics, data drift, overfitting, and feature importance
- Experience with AWS services such as S3 and Redshift
- Hands-on experience with Grafana for monitoring
- Familiarity with Domino Data Lab (desirable)
- Strong knowledge of CI/CD, version control, Docker, and Kubernetes
- Excellent troubleshooting and incident management skills
- Strong stakeholder communication skills