help shape how platform engineering is done as the team continues to scale. Tech stack: AWS (core services: EC2, RDS, S3, IAM, etc.); Configuration Management: Ansible; Monitoring and Observability: Grafana, Prometheus; Kubernetes (building and managing production clusters); Terraform (IaC provisioning); Python or Java (scripting, automation); GitHub Actions (CI/CD pipelines). What They’re Looking For: Experience in AWS cloud …
technical decisions and help shape how platform engineering is done as the team continues to scale. Tech stack: AWS (core services: EC2, RDS, S3, IAM, etc.); Monitoring and Observability: Grafana, Prometheus; Kubernetes (building and managing production clusters); Terraform (IaC provisioning); Python, Bash or Go (scripting, automation); GitHub Actions (CI/CD pipelines). What They’re Looking For: Experience in AWS …
help shape how platform engineering is done as the team continues to scale. Tech stack: AWS (core services: EC2, RDS, S3, IAM, etc.); Configuration Management: Ansible; Monitoring and Observability: Grafana, Prometheus; Kubernetes (building and managing production clusters); Terraform (IaC provisioning); GitHub Actions (CI/CD pipelines). What They’re Looking For: Experience in AWS cloud infrastructure (ideally in a regulated …
City of London, London, United Kingdom Hybrid / WFH Options
Huxley Associates
or ARM templates. Hands-on experience with CI/CD pipelines (e.g., Bitbucket, Azure DevOps). API Gateway, Azure API Management (APIM), Azure Application Gateway. Monitoring tools such as Prometheus, Grafana, and Azure Monitor. Understanding of secure multi-region deployments and network segmentation. Remote Working: Expected to be in the office 1 to 2 days a week, with additional days depending …
API endpoints and overseeing model deployment workflows to ensure seamless integration and scalability. Key Responsibilities: Platform Operations & Monitoring • Monitor ML model endpoints and overall platform health using tools like Grafana and Domino Data Lab. • Respond to incidents and alerts, perform code fixes, manage incidents internally, and manage changes through ServiceNow. • Interface directly with Domino Data Lab support to resolve model … monitoring. • Working knowledge of core data science concepts, such as model evaluation metrics, overfitting, data drift, and feature importance. • Proficiency in AWS services (such as S3, Redshift, etc.). • Experience with Grafana for monitoring and alerting. • Hands-on experience with the Domino Data Lab platform is a plus. • Solid understanding of CI/CD pipelines, version control, containerization, and orchestration. • Ability to communicate …
London (City of London), South East England, United Kingdom
HCLTech
API endpoints and overseeing model deployment workflows to ensure seamless integration and scalability. Key Responsibilities: Platform Operations & Monitoring • Monitor ML model endpoints and overall platform health using tools like Grafana and Domino Data Lab. • Respond to incidents and alerts, perform code fixes, manage incidents internally, and manage changes through ServiceNow. • Interface directly with Domino Data Lab support to resolve model … monitoring. • Working knowledge of core data science concepts, such as model evaluation metrics, overfitting, data drift, and feature importance. • Proficiency in AWS services (such as S3, Redshift, etc.). • Experience with Grafana for monitoring and alerting. • Hands-on experience with the Domino Data Lab platform is a plus. • Solid understanding of CI/CD pipelines, version control, containerization, and orchestration. • Ability to communicate …