and Active Directory. Experience with disaster recovery and redundancy strategies in both cloud and on-premises environments. Proficiency with leading monitoring tools, such as Datadog, Splunk , Prometheus, Grafana, ELK Stack, and New Relic. Programming expertise, especially in systems programming languages (e.g., Java, Kotlin, Scala) and databases (e.g., SQL Server, PostgreSQL More ❯
Security Best Practices: IAM, MFA, data encryption, firewall configurations. Programming/Scripting: Python, Terraform, or similar languages. Event-Driven Architectures: Kafka. Monitoring and Logging: Datadog, ELK Stack, Prometheus, etc. Experience in agile methodologies and DevOps practices. Location: Hybrid. Office located in London. (Hayes area). Office presence required: Yes. Frequency More ❯
an object-oriented language (preferably Java, .NET or C++) Expert+ level Linux administration, scripting, and troubleshooting Demonstratable knowledge of Observability tools (New Relic, Splunk, DataDog) Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route53, Fargate, ALB/NLB distributions, etc) Extensive experience with cloud More ❯
an object-oriented language (preferably Java, .NET or C++) Expert+ level Linux administration, scripting, and troubleshooting Demonstratable knowledge of Observability tools (New Relic, Splunk, DataDog) Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route53, Fargate, ALB/NLB distributions, etc) Extensive experience with cloud More ❯
pipelines and container technologies like Docker and Kubernetes. Deep understanding of networking, distributed systems, and databases. Expertise in monitoring and observability tools such as DataDog, Prometheus, Grafana, ELK stack, or Splunk. Excellent communication skills and a meticulous approach to problem-solving. Desirable Experience: Familiarity with Azure. Experience working in the More ❯
pipelines and container technologies like Docker and Kubernetes. Deep understanding of networking, distributed systems, and databases. Expertise in monitoring and observability tools such as DataDog, Prometheus, Grafana, ELK stack, or Splunk. Excellent communication skills and a meticulous approach to problem-solving. Desirable Experience: Familiarity with Azure. Experience working in the More ❯
london, south east england, United Kingdom Hybrid / WFH Options
Annapurna
pipelines and container technologies like Docker and Kubernetes. Deep understanding of networking, distributed systems, and databases. Expertise in monitoring and observability tools such as DataDog, Prometheus, Grafana, ELK stack, or Splunk. Excellent communication skills and a meticulous approach to problem-solving. Desirable Experience: Familiarity with Azure. Experience working in the More ❯
pipelines, and be confident scripting in Python, C# or similar scripting languages. You’ll also be comfortable working with monitoring and performance tools like Datadog or Prometheus, and ideally, you’ll have worked in a fast-moving SaaS or product-led business before. Bonus points if you’ve helped shape More ❯
pipelines, and be confident scripting in Python, C# or similar scripting languages. You’ll also be comfortable working with monitoring and performance tools like Datadog or Prometheus, and ideally, you’ll have worked in a fast-moving SaaS or product-led business before. Bonus points if you’ve helped shape More ❯
london, south east england, United Kingdom Hybrid / WFH Options
Noir
pipelines, and be confident scripting in Python, C# or similar scripting languages. You’ll also be comfortable working with monitoring and performance tools like Datadog or Prometheus, and ideally, you’ll have worked in a fast-moving SaaS or product-led business before. Bonus points if you’ve helped shape More ❯
Proficient in cloud platforms (AWS, Azure, GCP) and modern DevOps tooling (e.g., Terraform, Jenkins, Kubernetes). Hands-on with observability and monitoring tools (e.g., DataDog, Azure Monitor, AppDynamics). Expert in cyber security practices, identity management, encryption, and secure API development. Familiarity with compliance frameworks such as GDPR and PCI More ❯
london, south east england, United Kingdom Hybrid / WFH Options
Merlin Entertainments
Proficient in cloud platforms (AWS, Azure, GCP) and modern DevOps tooling (e.g., Terraform, Jenkins, Kubernetes). Hands-on with observability and monitoring tools (e.g., DataDog, Azure Monitor, AppDynamics). Expert in cyber security practices, identity management, encryption, and secure API development. Familiarity with compliance frameworks such as GDPR and PCI More ❯
secret management tools (e.g., HashiCorp Vault, Azure Key Vault) and SSO/authentication systems (e.g., Okta). Observability: Hands-on experience with platforms like DataDog, Grafana, or Azure Monitor. Networking: Strong understanding of networking principles, DNS, and related technologies. CI/CD: Skilled in creating and maintaining CI/CD More ❯
CI/CD best practices and tools (e.g. GitHub Actions, Jenkins, CodePipeline) Exposure to monitoring and observability tools for ML systems (e.g. Prometheus, Grafana, DataDog, WhyLabs, Evidently, etc.) Experience in building parallelised or distributed model inference pipelines Nice-to-Have Skills Familiarity with feature stores and model registries (e.g. Feast More ❯
services using Docker and Kubernetes, optimised for cloud deployment (Azure preferred). Implement model and pipeline monitoring using tools such as Prometheus, Grafana, or Datadog, ensuring performance and observability. Collaborate with DevOps to maintain and improve infrastructure scalability, reliability, and cost-efficiency. Design, build and maintain internal ML tools to More ❯
must. Program at a high level in at least one language such as: Java, C#, Javascript, Python or Ruby. Integration experience with PagerDuty, ServiceNow, Datadog, CloudWatch. Good understanding of Site Reliability Engineering (SRE) philosophies, technologies, platforms and tools, SLO management, incident resolution, and automation; Ability to explain technical concepts in More ❯
AWS (EC2, EKS, ECS, Fargate, Lambda, CloudFormation, Load Balancers, CloudWatch) and equivalents in Azure and GCP. Familiarity with observability tools such as Kibana, Grafana, Datadog, NewRelic, and others. Proficiency in RegEx, Lucene, and PromQL. Leadership & Onboarding: Proven experience leading technical teams focused on observability solutions and customer onboarding. Ability to More ❯
AWS (EC2, EKS, ECS, Fargate, Lambda, CloudFormation, Load Balancers, CloudWatch) and equivalents in Azure and GCP. Familiarity with observability tools such as Kibana, Grafana, Datadog, NewRelic, and others. Proficiency in RegEx, Lucene, and PromQL. Leadership & Onboarding: Proven experience leading technical teams focused on observability solutions and customer onboarding. Ability to More ❯
london, south east england, united kingdom Hybrid / WFH Options
ITR Partners
AWS (EC2, EKS, ECS, Fargate, Lambda, CloudFormation, Load Balancers, CloudWatch) and equivalents in Azure and GCP. Familiarity with observability tools such as Kibana, Grafana, Datadog, NewRelic, and others. Proficiency in RegEx, Lucene, and PromQL. Leadership & Onboarding: Proven experience leading technical teams focused on observability solutions and customer onboarding. Ability to More ❯
Python, Groovy, Golang, Powershell Familiarity with SDLC tools such as Jira, Confluence, Bitbucket, Nexus, Zephyr Monitoring and logging expertise with Grafana, Prometheus, Splunk, Dynatrace, Datadog Nice to have: Security capabilities including SCA, SAST, DAST If interested and qualified, please submit your CV for consideration. McGregor Boyall is an equal opportunity More ❯
Hands-on experience with security practices like vulnerability scanning, encryption, authentication, and secrets management (Vault, Key Management Service). Experience with SIEM platforms (Splunk, Datadog, or equivalent) for monitoring and threat detection. You thrive when working as part of a team, are comfortable in a fast-paced environment, have excellent More ❯
Find the latest job opportunities in AI and tech. RunPod offers GPU cloud computing for AI/ML, providing secure and community cloud options, on-demand and spot pods, and serverless GPU scaling. The flexibility of remote work with an More ❯