secret management tools (e.g., HashiCorp Vault, Azure Key Vault) and SSO/authentication systems (e.g., Okta). Observability: Hands-on experience with platforms such as Datadog, Grafana, or Azure Monitor. Networking: Strong understanding of networking principles, DNS, and related technologies. CI/CD: Skilled in creating and maintaining CI/CD …
strategy while delivering incremental value. Technical Debt Management – Experience identifying and remediating inefficient architectures. Observability & Performance Optimization – Familiarity with monitoring and logging tools (e.g., Datadog, Splunk, Prometheus, New Relic). Stakeholder Management – Ability to engage with senior leadership, product managers, and engineering teams. Metrics-Driven Decision Making – Familiarity with engineering …
utilization - Strong understanding of network fundamentals (DNS, DHCP, TCP/IP, routing, load balancing, load shedding) and experience with monitoring frameworks (such as CloudWatch, Datadog, Grafana, Elastic, or similar) - Experience scripting operating system tasks in Bash, Python, etc., and with Infrastructure as Code (such as CDK, CloudFormation, Puppet, Chef, Ansible …
messaging tools (Kafka, Kinesis, Redis), and cloud infrastructure technologies (AWS, Docker, Kubernetes, Terraform). Strong understanding of CI/CD pipelines, observability tools (e.g., Datadog), and Agile and Lean methodologies. Demonstrated ability to adapt to new technologies, align technical decisions with business goals, and champion quality engineering through testable code …
Kinesis, DynamoDB, and Lambda. Proficiency in CI/CD tools, particularly Jenkins and Spinnaker. Familiarity with monitoring and observability tools such as CloudWatch and Datadog. Strong understanding of security best practices in cloud environments. Preferred Qualifications: In addition to the required qualifications, the following skills and experiences are highly desirable …
services using Docker and Kubernetes, optimised for cloud deployment (Azure preferred). Implement model and pipeline monitoring using tools such as Prometheus, Grafana, or Datadog, ensuring performance and observability. Collaborate with DevOps to maintain and improve infrastructure scalability, reliability, and cost-efficiency. Design, build and maintain internal ML tools to …
CI/CD best practices and tools (e.g. GitHub Actions, Jenkins, CodePipeline). Exposure to monitoring and observability tools for ML systems (e.g. Prometheus, Grafana, Datadog, WhyLabs, Evidently). Experience in building parallelised or distributed model inference pipelines. Nice-to-Have Skills: Familiarity with feature stores and model registries (e.g. Feast …
AWS certification (AWS Certified Solutions Architect, Developer, or DevOps Engineer). Experience with monitoring and logging solutions such as CloudWatch, New Relic, or Datadog.
Find the latest job opportunities in AI and tech. RunPod offers GPU cloud computing for AI/ML, providing secure and community cloud options, on-demand and spot pods, and serverless GPU scaling. The flexibility of remote work with an …
Python, Groovy, Golang, PowerShell. Familiarity with SDLC tools such as Jira, Confluence, Bitbucket, Nexus, and Zephyr. Monitoring and logging expertise with Grafana, Prometheus, Splunk, Dynatrace, and Datadog. Nice to have: security capabilities including SCA, SAST, and DAST. If interested and qualified, please submit your CV for consideration. McGregor Boyall is an equal opportunity …
Proficiency in SQL and data analytics tools (e.g., Sigma, Snowflake). Experience with the FIX protocol and market data analysis. Proficient in AWS, Kubernetes, monitoring tools (Datadog, Prometheus, Grafana), and automation frameworks (Terraform, Ansible, Pulumi). For more information, please apply with a relevant CV.
define, version, and manage infrastructure as code across multiple environments. GitHub Actions & OIDC – build and maintain automated CI/CD pipelines with secure authentication. Datadog, Prometheus, or similar – implement logging, metrics, and alerting for robust observability. The interim CTO is keen to hear your recommendations on tooling and implementation …
London, South East England, United Kingdom Hybrid / WFH Options
Prism Digital
AWS (EC2, EKS, ECS, Fargate, Lambda, CloudFormation, Load Balancers, CloudWatch) and equivalents in Azure and GCP. Familiarity with observability tools such as Kibana, Grafana, Datadog, New Relic, and similar. Proficiency in RegEx, Lucene, and PromQL. Leadership & Onboarding: Proven experience leading technical teams focused on observability solutions and customer onboarding. Ability to …
London, South East England, United Kingdom Hybrid / WFH Options
ITR Partners
skills, with an ability to drive SRE adoption. A strong understanding of SQL, PHP, Kubernetes, and CI/CD. Observability product experience (e.g., New Relic, Datadog). Strong facilitation and servant leadership skills. Ability to work both independently and as part of a team. Ability to work under pressure and be highly …
Hands-on experience with security practices like vulnerability scanning, encryption, authentication, and secrets management (Vault, Key Management Service). Experience with SIEM platforms (Splunk, Datadog, or equivalent) for monitoring and threat detection. You thrive when working as part of a team, are comfortable in a fast-paced environment, have excellent …
based container orchestration platforms (AWS EKS). Skilled in working with Infrastructure as Code; Terraform required. Proficiency in setting up or integrating with observability tools (e.g., Datadog, CloudWatch, X-Ray). Previous experience troubleshooting and debugging public cloud infrastructure (AWS). Working proficiency with RDS databases and cache engines. Experience with …