AWS, Azure, or GCP. Manage infrastructure as code using tools like Terraform. Monitor and maintain production systems using tools such as Prometheus, Grafana, or Datadog. Collaborate with development and QA teams to improve deployment processes and system reliability. Contribute to incident response, troubleshooting, and root cause analysis. Requirements: Approximately
GCP. Background knowledge and hands-on practice in observability, specifically experience working with one or more of the following tools: Kibana, OpenSearch, Grafana, Datadog, Sumo Logic, New Relic, AppDynamics, Dynatrace, Prometheus, Logz.io, SignalFx, Instana, Splunk, Honeycomb, Jaeger. Hands-on experience with Infrastructure as Code (Terraform/Ansible). Hands
environments. Understanding of cloud security best practices and encryption. Certifications in Azure or other cloud platforms. Use of observability and logging platforms such as Datadog, App Insights, or Splunk. Experience with AKS (Azure Kubernetes Service) and infrastructure as code (e.g., Terraform, Bicep, ARM templates). Development background (.NET/C#
City of London, London, United Kingdom Hybrid / WFH Options
Canada Life Group (UK) Ltd (The)
Observability: Designing, implementing and day-to-day use of logging and monitoring tools to capture data for alerting and issue identification and resolution using Datadog, App Insights or similar tools. Designing applications and infrastructure for observability, security, and reliability. Networking & Security: Monitor and enhance network performance, ensuring high levels of
QBRs. ✅ What You Bring: Strong hands-on experience with cloud platforms (AWS, GCP, Azure) and DevOps tooling. Familiarity with observability stacks like Grafana, Prometheus, Datadog, Splunk, Kibana, etc. Experience with technical integrations (OpenTelemetry, Fluentd, Fluent Bit, Filebeat, etc.). Skilled in troubleshooting Kubernetes and containerised environments. Strong communication skills, able to engage
City of London, Greater London, UK Hybrid / WFH Options
Zettafleet
in cloud platforms (e.g., AWS, GCP or Azure), an understanding of containerisation (e.g., Docker), infrastructure-as-code software (e.g., Terraform), and observability platforms (e.g., Datadog or Grafana). Leadership: A track record of leading complex projects. Problem solving: Strong analytical problem-solving skills and attention to detail. You have the
City of London, Greater London, UK Hybrid / WFH Options
Annapurna
pipelines and container technologies like Docker and Kubernetes. Deep understanding of networking, distributed systems, and databases. Expertise in monitoring and observability tools such as Datadog, Prometheus, Grafana, ELK stack, or Splunk. Excellent communication skills and a meticulous approach to problem-solving. Desirable Experience: Familiarity with Azure. Experience working in the
environment (3 days a week onsite in London). Experience with Terraform, Kubernetes, or CI/CD pipelines. Familiarity with observability tooling (e.g. Prometheus, Grafana, Datadog). Experience mentoring or leading other engineers.
using AWS services (SNS, SQS, EventBridge). Knowledge of GraphQL, WebSockets, or real-time data streaming. Exposure to DevOps and observability practices (e.g., Prometheus, Datadog, AWS CloudWatch, OpenTelemetry). Prior experience in leading distributed engineering teams.
for data extraction and troubleshooting purposes. Experience with using and troubleshooting programming interfaces, especially REST APIs and WebSockets. Experience with monitoring tools (Grafana, Datadog). Experience working with crypto and blockchain (DLT). Familiarity with common engineering development workflows and tools (e.g. Jira, Confluence, GitHub, Scrum, etc.). Familiarity with scaling, monitoring
City of London, Greater London, UK Hybrid / WFH Options
ITR Partners
AWS (EC2, EKS, ECS, Fargate, Lambda, CloudFormation, Load Balancers, CloudWatch) and equivalents in Azure and GCP. Familiarity with observability tools such as Kibana, Grafana, Datadog, New Relic, and others. Proficiency in RegEx, Lucene, and PromQL. Leadership & Onboarding: Proven experience leading technical teams focused on observability solutions and customer onboarding. Ability to
experience in Cloud DevOps (e.g. AWS, Azure, GCP), including services like EKS, ECS, Lambda, etc. Proficiency with observability platforms such as Grafana, Kibana, Prometheus, Datadog, Splunk, or similar. Strong knowledge of RegEx, Lucene, PromQL. Proven track record of leading technical teams and owning the end-to-end onboarding journey.
integration (e.g., Contentful, Sanity, Strapi). Testing: Automated testing (Jest, Mocha, Selenium), TDD, BDD, integration and load testing. Observability: Logging, tracing, monitoring tools (e.g., Sentry, Datadog, New Relic). System Architecture: Designing scalable, secure, and maintainable web architectures. Please apply online today if you have the relevant skills and experience for the
with: Strong experience supporting technical products in a customer-facing capacity. Deep understanding of cloud-native technologies and modern observability stacks such as Grafana, Datadog, Splunk or similar. A hands-on mindset and the ability to work comfortably across Kubernetes, microservices, and comparable environments. Beyond technical skills, they value clear
software tools for intelligence collection and analysis. Create systems to analyze cryptocurrency behavior on both the clear and dark web. Implement observability tools (e.g., Datadog) to monitor and resolve issues. Mentor team members and promote engineering best practices. Contribute to the team’s technical strategy and decision-making. Ideal Candidate
operating and supporting bespoke trading platforms. Commercial experience deploying applications into AKS clusters. Experience operating one or more of Kafka, Redis, Atlassian Suite, Elastic, Datadog, etc. Sponsorship cannot be offered for this role. Apply below with an up-to-date CV to set up an initial call.
City of London, Greater London, UK Hybrid / WFH Options
Trust In SODA
systems. Ideally they would also have good knowledge of: Containerisation (Kubernetes); Relational Databases (PostgreSQL, SQL); Data Warehousing (Snowflake, RDS); Cloud (AWS); IaC (Terraform); Monitoring (Datadog). In return they would be offering: An employee equity incentive scheme; Flexible/Remote working; 25 days’ holiday (+bday off + option to buy or
organization such as SLOs/SLIs and toil measurement. Implement best practices for building successful monitoring and alerting systems. Experience with observability platforms like Datadog and OpenTelemetry is desired. You will work closely with engineering/development teams to design, build, and maintain systems and help them decide on
City of London, Greater London, UK Hybrid / WFH Options
Cpl
goes beyond traditional SRE – you’ll champion best practices across product teams, drive observability strategy, and work hands-on with cutting-edge tools like Datadog and AWS. Key Responsibilities: Lead the SRE function and promote observability-first thinking across development and operations teams. Define and implement the observability roadmap across … product domains in collaboration with the client. Be hands-on with Datadog for infrastructure and application-level monitoring. Guide and review daily operations and improvements across observability platforms. Partner with engineering squads to deliver on observability requirements in an agile, demand-led way. Core Skills & Experience: Proven experience as a … hands-on SRE Engineer. Deep understanding of observability and monitoring practices. Practical experience with Datadog (or similar observability platforms). Strong DevOps toolchain knowledge: GitHub, GitHub Actions, Jenkins, CodeQL, Nexus, CloudFormation, Terraform. Solid cloud engineering skills, especially with AWS (EC2, ELB, ECS, S3, CloudTrail, Config, Lambda, VPC, EFS). Desirable
with our product. We are backed by Tier 1 VCs and Angel Investors, including Index Ventures, co-founders of Hugging Face, CEO of Datadog, and product experts from DeepMind and OpenAI. We are now growing the team to reach our ambitions. We hire the best and reward accordingly. Compensation