Remote Datadog Job Vacancies

1 to 25 of 398 Remote Datadog Jobs

Fullstack Engineer, Observability & SRE - (Remote)

Baltimore, Maryland, United States
Hybrid / WFH Options
Archesys Inc
Ability to work independently and collaboratively in a fast-paced, dynamic environment. Nice to Have: AWS Certifications (e.g., Solutions Architect, DevOps Engineer). Experience with other observability tools (e.g., Datadog, New Relic, OpenTelemetry). Knowledge of distributed tracing concepts and tools (e.g., Jaeger, Tempo). Experience with machine learning for anomaly detection in time-series data. Contributions to open-source
Employment Type: Permanent
Salary: USD Annual
Posted:

Site Reliability Engineer

London, England, United Kingdom
Hybrid / WFH Options
Spectrum IT Recruitment
Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo. Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck. Experience using configuration management platforms like Ansible, Puppet, or Chef. Professional certifications in cloud DevOps, such as AWS Certified DevOps Engineer or Google Cloud Professional DevOps Engineer
Posted:

DevOps/Site Reliability Engineer, Junior/Mid/Senior (m/f/*)

United Kingdom
Hybrid / WFH Options
Crane Venture Partners
Who we are: We are a London tech startup on the lookout for bright, motivated and self-driven individuals to join the team. Who you are: You are a DevOps/Site Reliability Engineer with experience managing complex infrastructure and
Employment Type: Permanent
Salary: GBP Annual
Posted:

Mid-Senior DevOps / Site Reliability Engineer (m/f/*)

London, England, United Kingdom
Hybrid / WFH Options
Quaisr Limited
DevOps/Site Reliability Engineer, Junior/Mid/Senior (m/f/*). We are a London tech startup on the lookout for bright, motivated and self-driven individuals to join the team. Who you are: You are a
Posted:

Site Reliability Engineer III

Chicago, Illinois, United States
Hybrid / WFH Options
Ahold Delhaize
and track service level objectives (SLOs) and service level indicators (SLIs). Build and manage microservices-based platforms leveraging Spring Boot, Java, Tomcat, and Redis. Monitor production environments using Datadog and proactively address performance and reliability issues. Perform root cause analysis and lead post-incident reviews to drive continual improvement. Manage CI/CD pipelines and deployment automation using GitHub … or Go. Proven experience with Spring Boot, Tomcat, Redis, and microservices architecture. Hands-on experience in managing Linux environments, particularly Ubuntu. Proficiency with observability stacks and performance monitoring using Datadog, Prometheus, and ELK. Deep understanding of containerization and orchestration using Docker, Kubernetes, and ArgoCD. Experience managing event-driven systems using Kafka. Expertise in IaC and automation using Terraform and GitHub
Employment Type: Permanent
Salary: USD Annual
Posted:

Senior DevOps Engineer

Cambridge, Cambridgeshire, United Kingdom
Hybrid / WFH Options
Arm Limited
/green & canary releases, and automated rollbacks. Proficiency with Docker, Kubernetes, and related cloud-native orchestration patterns. Proven track record building dashboards and visualizations across platforms such as Grafana, Datadog, and AWS. Experience with instrumentation tools like Prometheus and managing time-series stores such as Graphite and VictoriaMetrics. Solid understanding of networking, security, and compliance in cloud environments. Excellent written
Employment Type: Permanent
Salary: GBP Annual
Posted:

Site Reliability Engineer

Hampshire, UK
Hybrid / WFH Options
Spectrum IT Recruitment
Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo. Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck. Experience using configuration management platforms like Ansible, Puppet, or Chef. Professional certifications in cloud DevOps, such as AWS Certified DevOps Engineer or Google Cloud Professional DevOps Engineer
Posted:

Site Reliability Engineer

Southampton, Hampshire, South East, United Kingdom
Hybrid / WFH Options
Spectrum IT Recruitment Limited
Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo. Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck. Experience using configuration management platforms like Ansible, Puppet, or Chef. Professional certifications in cloud DevOps, such as AWS Certified DevOps Engineer or Google Cloud Professional DevOps Engineer
Employment Type: Permanent, Work From Home
Posted:

Site Reliability Engineer

Portsmouth, England, United Kingdom
Hybrid / WFH Options
Spectrum IT Recruitment
Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo. Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck. Experience using configuration management platforms like Ansible, Puppet, or Chef. Professional certifications in cloud DevOps, such as AWS Certified DevOps Engineer or Google Cloud Professional DevOps Engineer
Posted:

Site Reliability Engineer

London, England, United Kingdom
Hybrid / WFH Options
ZipRecruiter
Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo. Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck. Experience using configuration management platforms like Ansible, Puppet, or Chef. Professional certifications in cloud DevOps, such as AWS Certified DevOps Engineer or Google Cloud Professional DevOps Engineer
Posted:

Site Reliability Engineer (SRE) - Weekend Coverage

London, England, United Kingdom
Hybrid / WFH Options
Elwood Technologies
closely with engineering teams to design and deploy scalable, fault-tolerant infrastructure solutions on AWS or GCP. Improve observability by utilizing monitoring, logging, and alerting systems (e.g., CloudWatch, Datadog). Lead post-incident reviews, contribute to the continuous improvement of system reliability and follow up on strategic fixes. Develop and update runbooks, incident response playbooks, and documentation. Work closely … love it if you have experience of some or all of the following: Experience with client-impact triage, working cross-functionally with account managers or product teams. Proficiency with Datadog or similar observability platforms. Knowledge of serverless architectures (e.g., AWS Lambda, GCP Cloud Functions). Familiarity with RDBMS and NoSQL databases, such as RDS, CloudSQL, DynamoDB. Prior experience in fintech
Posted:

Senior Site Reliability Engineer

California, United States
Hybrid / WFH Options
ZEFR
Mesh: Istio. CI/CD & Automation: CI/CD Pipelines: GitHub Actions. GitOps/Continuous Delivery: Argo CD. Primary Scripting/Automation Language: Python. Observability & Monitoring: Monitoring & Alerting: Prometheus, Datadog, PagerDuty. Telemetry Standards: OpenTelemetry. Application & Data Ecosystem (Supporting): Application Languages/Frameworks: Python, FastAPI, Flask, Node.js, React. Data Streaming: Apache Kafka. Data Processing/Transformation: Pandas, DBT. Workflow Orchestration: Apache … Knowledge of IaC and configuration management tools (Terraform, OpenTofu, Crossplane, Pulumi, Ansible, CloudFormation). Strong problem-solving experience, focusing on automation. Production experience with Monitoring and Observability tools (Prometheus, Grafana, Datadog, Thanos, New Relic, OpenTelemetry). Understanding of Cloud Networking concepts (Mesh Networking, NAT, Load Balancers, SSL Certificates and TLS termination, API Gateways, proxies, etc.). Strong written and verbal communication, organization
Employment Type: Permanent
Salary: USD Annual
Posted:

Platform Site Reliability Engineer

Boston, Massachusetts, United States
Hybrid / WFH Options
Nexthink
topologies (VPC, Virtual Subnets, NACLs, NSG, ILB, ELB, etc.) and storage (S3, EBS, Azure Files, etc.). Monitor system health, application performance, and user-facing SLAs using tools like Datadog, Prometheus, Grafana. Be a main actor in improving incident response practices and help reduce mean time to detect (MTTD) and recover (MTTR). Experience in coordinating teams and persons to … programming or scripting skills (Python, Go, Bash). Experience with CI/CD pipelines (e.g., GitHub Actions, GitLab CI, ArgoCD). Experience with observability stacks (Prometheus, ELK/EFK, Datadog, etc.). Comfort with being part of a rotating on-call schedule, including handling critical incidents and conducting post-incident reviews. Strong system-level troubleshooting skills and a proactive mindset
Employment Type: Permanent
Salary: USD Annual
Posted:

Platform Site Reliability Engineer

Colorado Springs, Colorado, United States
Hybrid / WFH Options
Nexthink
topologies (VPC, Virtual Subnets, NACLs, NSG, ILB, ELB, etc.) and storage (S3, EBS, Azure Files, etc.). Monitor system health, application performance, and user-facing SLAs using tools like Datadog, Prometheus, Grafana. Be a main actor in improving incident response practices and help reduce mean time to detect (MTTD) and recover (MTTR). Experience in coordinating teams and persons to … programming or scripting skills (Python, Go, Bash). Experience with CI/CD pipelines (e.g., GitHub Actions, GitLab CI, ArgoCD). Experience with observability stacks (Prometheus, ELK/EFK, Datadog, etc.). Comfort with being part of a rotating on-call schedule, including handling critical incidents and conducting post-incident reviews. Strong system-level troubleshooting skills and a proactive mindset
Employment Type: Permanent
Salary: USD Annual
Posted:

Site Reliability Engineer (SRE)

Atlanta, Georgia, United States
Hybrid / WFH Options
Zencon Group
best practices. Preferred Qualifications: AWS certifications (e.g., AWS Certified DevOps Engineer, Solutions Architect). Experience in hybrid cloud environments or enterprise-scale distributed systems. Familiarity with other observability tools like Datadog, Prometheus, or Grafana. Experience with incident management and SRE metrics (SLIs, SLOs, error budgets
Employment Type: Permanent
Salary: USD Annual
Posted:

Senior Site Reliability Engineer

England, United Kingdom
Hybrid / WFH Options
Stratospherec Limited
such as Azure, AWS or GCP. Proficiency using Infrastructure as Code (IaC) tools such as Terraform (preferred), Ansible, or CloudFormation. Experience with monitoring, observability and logging tools such as Datadog, Prometheus, Grafana, or similar. Proven track record of maintaining highly-available and performant production environments. Ability to identify and implement effective mitigation strategies and operational playbooks. Useful/Bonus Skills
Posted:

Senior Site Reliability Engineer

London, England, United Kingdom
Hybrid / WFH Options
Stratospherec Limited
such as Azure, AWS or GCP. Proficiency using Infrastructure as Code (IaC) tools such as Terraform (preferred), Ansible, or CloudFormation. Experience with monitoring, observability and logging tools such as Datadog, Prometheus, Grafana, or similar. Proven track record of maintaining highly-available and performant production environments. Ability to identify and implement effective mitigation strategies and operational playbooks. Useful/Bonus Skills
Posted:

Senior Site Reliability Engineer

San Diego, California, United States
Hybrid / WFH Options
PlayStation Global
Control. Nice to have: Experience with hosting and CDN technologies like Akamai and Cloudflare. Experience with cyber security, threat detection and mitigation with Akamai. Monitoring and alerting solutions including Datadog, Prometheus and Grafana. Logging and log aggregation solutions like Splunk, Elasticsearch and AWS CloudWatch Logs. Tracing & debugging at various levels, including container, network, storage, compute. Certifications in Linux, AWS, Docker
Employment Type: Permanent
Salary: USD Annual
Posted:

Platform Engineer

London, England, United Kingdom
Hybrid / WFH Options
Magentus Group
CI, or similar). Experience with scripting or programming languages (Python, Go, Bash, etc.). Understanding of networking, security principles, and best practices. Knowledge of observability tools such as Datadog, Prometheus, Grafana, etc. Desired Attributes: Strong problem-solving skills with a proactive approach to improving systems and processes. Excellent communication and collaboration skills, able to work effectively with cross-functional
Posted:

Platform Engineer

Manchester, England, United Kingdom
Hybrid / WFH Options
Magentus Group
CI, or similar). Experience with scripting or programming languages (Python, Go, Bash, etc.). Understanding of networking, security principles, and best practices. Knowledge of observability tools such as Datadog, Prometheus, Grafana, etc. Desired Attributes: Strong problem-solving skills with a proactive approach to improving systems and processes. Excellent communication and collaboration skills, able to work effectively with cross-functional
Posted:

Senior Software Engineer

London, England, United Kingdom
Hybrid / WFH Options
Octopus Legacy
Python web frameworks such as Flask or FastAPI. Experience optimising applications for cloud performance, cost-efficiency, and scalability. Hands-on experience with monitoring and logging tools (e.g., AWS CloudWatch, Datadog, ELK stack). An understanding of lean software development principles and practices focused on delivering value quickly. A passion for mentoring and sharing knowledge, contributing to a culture of continuous
Posted:

Site Reliability Engineer - Observability

Bessemer, Alabama, United States
Hybrid / WFH Options
Regions Bank
technologies: Prior experience supporting hybrid environments with one or more Cloud providers (AWS, Azure). Observability: Prior experience implementing one or more Commercial Observability/APM solutions (Dynatrace, New Relic, Datadog, AppDynamics, Honeycomb). Monitoring and Logging: Solid familiarity with Splunk, Elastic, OpenSearch, Prometheus, Grafana. Implementing Site Reliability Engineering (SRE) principles. SLO/SLI. Experience troubleshooting and resolving issues with critical business
Employment Type: Permanent
Salary: USD Annual
Posted:

Infrastructure Engineer

London, England, United Kingdom
Hybrid / WFH Options
Keyrock
EKS, K3s, or self-managed). Proficiency in scripting with Python, Bash, or Go. Experience with Infrastructure as Code (Terraform, CloudFormation, Ansible). Familiarity with observability tools (Prometheus, Grafana, Datadog, ELK). Solid understanding of networking (VPC, Load Balancers, DNS, Firewalls). Experience with DevOps, CI/CD, and GitOps practices. Experience with high-performance, low-latency systems. Familiarity with
Posted:

Site Reliability Engineer - Observability

Birmingham, Alabama, United States
Hybrid / WFH Options
Regions Bank
technologies: Prior experience supporting hybrid environments with one or more Cloud providers (AWS, Azure). Observability: Prior experience implementing one or more Commercial Observability/APM solutions (Dynatrace, New Relic, Datadog, AppDynamics, Honeycomb). Monitoring and Logging: Solid familiarity with Splunk, Elastic, OpenSearch, Prometheus, Grafana. Implementing Site Reliability Engineering (SRE) principles. SLO/SLI. Experience troubleshooting and resolving issues with critical business
Employment Type: Permanent
Salary: USD Annual
Posted:

Site Reliability Engineer - Observability

Orlando, Florida, United States
Hybrid / WFH Options
Regions Bank
technologies: Prior experience supporting hybrid environments with one or more Cloud providers (AWS, Azure). Observability: Prior experience implementing one or more Commercial Observability/APM solutions (Dynatrace, New Relic, Datadog, AppDynamics, Honeycomb). Monitoring and Logging: Solid familiarity with Splunk, Elastic, OpenSearch, Prometheus, Grafana. Implementing Site Reliability Engineering (SRE) principles. SLO/SLI. Experience troubleshooting and resolving issues with critical business
Employment Type: Permanent
Salary: USD Annual
Posted:

Datadog
10th Percentile: £48,250
25th Percentile: £65,000
Median: £75,000
75th Percentile: £87,500
90th Percentile: £97,500