Remote Datadog Job Vacancies

1 to 25 of 268 Remote Datadog Jobs

Fullstack Engineer Observability SRE Remote

Baltimore, Maryland, United States
Hybrid / WFH Options
Archesys Inc
Ability to work independently and collaboratively in a fast-paced, dynamic environment. Nice to Have: AWS Certifications (e.g., Solutions Architect, DevOps Engineer). Experience with other observability tools (e.g., Datadog, New Relic, OpenTelemetry). Knowledge of distributed tracing concepts and tools (e.g., Jaeger, Tempo). Experience with machine learning for anomaly detection in time-series data. Contributions to open-source …
Employment Type: Permanent
Salary: USD Annual

DevOps/Site Reliability Engineer, Junior/Mid/Senior (m/f/ )

United Kingdom
Hybrid / WFH Options
Crane Venture Partners
Who we are: We are a London tech startup on the lookout for bright, motivated and self-driven individuals to join the team. Who you are: You are a DevOps/Site Reliability Engineer with experience managing complex infrastructure and …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer III

Chicago, Illinois, United States
Hybrid / WFH Options
Ahold Delhaize
and track service level objectives (SLOs) and service level indicators (SLIs). Build and manage microservices-based platforms leveraging Spring Boot, Java, Tomcat, and Redis. Monitor production environments using Datadog and proactively address performance and reliability issues. Perform root cause analysis and lead post-incident reviews to drive continual improvement. Manage CI/CD pipelines and deployment automation using GitHub … or Go. Proven experience with Spring Boot, Tomcat, Redis, and microservices architecture. Hands-on experience in managing Linux environments, particularly Ubuntu. Proficiency with observability stacks and performance monitoring using Datadog, Prometheus, and ELK. Deep understanding of containerization and orchestration using Docker, Kubernetes, and ArgoCD. Experience managing event-driven systems using Kafka. Expertise in IaC and automation using Terraform and GitHub …
Employment Type: Permanent
Salary: USD Annual

Senior DevOps Engineer

Cambridge, Cambridgeshire, United Kingdom
Hybrid / WFH Options
Arm Limited
/green & canary releases, and automated rollbacks. Proficiency with Docker, Kubernetes, and related cloud-native orchestration patterns. Proven track record building dashboards and visualizations across platforms such as Grafana, Datadog, and AWS. Experience with instrumentation tools like Prometheus and managing time-series stores such as Graphite and VictoriaMetrics. Solid understanding of networking, security, and compliance in cloud environments. Excellent written …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

Southampton, Hampshire, South East, United Kingdom
Hybrid / WFH Options
Spectrum It Recruitment Limited
Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo. Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck. Experience using configuration management platforms like Ansible, Puppet, or Chef. Professional certifications in cloud DevOps, such as AWS Certified DevOps Engineer or Google Cloud Professional DevOps Engineer …
Employment Type: Permanent, Work From Home

Sr. Site Reliability Engineer, Product Reliability Engineering - Middleware

Austin, Texas, United States
Hybrid / WFH Options
Visa
Understanding of Linux/Unix systems, networking, cloud platforms (AWS, Azure, GCP), containerization (Kubernetes, Docker), and infrastructure-as-code tools (Terraform, Ansible). Proficiency with monitoring tools (Prometheus, Grafana, Datadog, etc.), logging systems (ELK stack, Splunk), and tracing tools (Jaeger, Zipkin). Proven track record of automating complex tasks and processes to improve efficiency and reliability using Python, Go, Java …
Employment Type: Permanent
Salary: USD Annual

Senior Site Reliability Engineer

California, United States
Hybrid / WFH Options
ZEFR
Mesh: Istio. CI/CD & Automation: CI/CD Pipelines: GitHub Actions; GitOps/Continuous Delivery: Argo CD; Primary Scripting/Automation Language: Python. Observability & Monitoring: Monitoring & Alerting: Prometheus, Datadog, PagerDuty; Telemetry Standards: OpenTelemetry. Application & Data Ecosystem (Supporting): Application Languages/Frameworks: Python, FastAPI, Flask, Node.js, React; Data Streaming: Apache Kafka; Data Processing/Transformation: Pandas, DBT; Workflow Orchestration: Apache … Knowledge of IaC and configuration management tools (Terraform, OpenTofu, Crossplane, Pulumi, Ansible, CloudFormation). Strong problem-solving experience, focusing on automation. Production experience with Monitoring and Observability tools (Prometheus, Grafana, Datadog, Thanos, New Relic, OpenTelemetry). Understanding of Cloud Networking concepts (Mesh Networking, NAT, Load Balancers, SSL Certificates and TLS termination, API Gateways, proxies, etc.). Strong written and verbal communication, organization …
Employment Type: Permanent
Salary: USD Annual

Platform Site Reliability Engineer

Boston, Massachusetts, United States
Hybrid / WFH Options
Nexthink
topologies (VPC, Virtual Subnets, NACLs, NSG, ILB, ELB, etc.) and storage (S3, EBS, Azure Files, etc.). Monitor system health, application performance, and user-facing SLAs using tools like Datadog, Prometheus, Grafana. Take a leading role in improving incident response practices and help reduce mean time to detect (MTTD) and recover (MTTR). Experience in coordinating teams and individuals to … programming or scripting skills (Python, Go, Bash). Experience with CI/CD pipelines (e.g., GitHub Actions, GitLab CI, ArgoCD). Experience with observability stacks (Prometheus, ELK/EFK, Datadog, etc.). Comfort with being part of a rotating on-call schedule, including handling critical incidents and conducting post-incident reviews. Strong system-level troubleshooting skills and a proactive mindset …
Employment Type: Permanent
Salary: USD Annual

Platform Site Reliability Engineer

Colorado Springs, Colorado, United States
Hybrid / WFH Options
Nexthink
topologies (VPC, Virtual Subnets, NACLs, NSG, ILB, ELB, etc.) and storage (S3, EBS, Azure Files, etc.). Monitor system health, application performance, and user-facing SLAs using tools like Datadog, Prometheus, Grafana. Take a leading role in improving incident response practices and help reduce mean time to detect (MTTD) and recover (MTTR). Experience in coordinating teams and individuals to … programming or scripting skills (Python, Go, Bash). Experience with CI/CD pipelines (e.g., GitHub Actions, GitLab CI, ArgoCD). Experience with observability stacks (Prometheus, ELK/EFK, Datadog, etc.). Comfort with being part of a rotating on-call schedule, including handling critical incidents and conducting post-incident reviews. Strong system-level troubleshooting skills and a proactive mindset …
Employment Type: Permanent
Salary: USD Annual

Site Reliability Engineer (SRE)

Atlanta, Georgia, United States
Hybrid / WFH Options
Zencon Group
best practices. Preferred Qualifications: AWS certifications (e.g., AWS Certified DevOps Engineer, Solutions Architect). Experience in hybrid cloud environments or enterprise-scale distributed systems. Familiarity with other observability tools like Datadog, Prometheus, or Grafana. Experience with incident management and SRE metrics (SLIs, SLOs, error budgets …
Employment Type: Permanent
Salary: USD Annual

Senior Site Reliability Engineer

England, United Kingdom
Hybrid / WFH Options
Stratospherec Limited
such as Azure, AWS or GCP. Proficiency using Infrastructure as Code (IaC) tools such as Terraform (preferred), Ansible, or CloudFormation. Experience with monitoring, observability and logging tools such as Datadog, Prometheus, Grafana, or similar. Proven track record of maintaining highly available and performant production environments. Ability to identify and implement effective mitigation strategies and operational playbooks. Useful/Bonus Skills …

Senior Site Reliability Engineer

London, England, United Kingdom
Hybrid / WFH Options
Stratospherec Limited
such as Azure, AWS or GCP. Proficiency using Infrastructure as Code (IaC) tools such as Terraform (preferred), Ansible, or CloudFormation. Experience with monitoring, observability and logging tools such as Datadog, Prometheus, Grafana, or similar. Proven track record of maintaining highly available and performant production environments. Ability to identify and implement effective mitigation strategies and operational playbooks. Useful/Bonus Skills …

Senior Site Reliability Engineer

San Diego, California, United States
Hybrid / WFH Options
Sony Interactive Entertainment
Control. Nice to have: Experience with hosting and CDN technologies like Akamai and Cloudflare. Experience with cyber security, threat detection and mitigation with Akamai. Monitoring and alerting solutions including Datadog, Prometheus and Grafana. Logging and log aggregation solutions like Splunk, Elasticsearch and AWS CloudWatch Logs. Tracing and debugging at various levels, including container, network, storage and compute. Certifications in Linux, AWS, Docker …
Employment Type: Permanent
Salary: USD Annual

Site Reliability Engineer - Observability

Bessemer, Alabama, United States
Hybrid / WFH Options
Regions Bank
technologies: Prior experience supporting hybrid environments with one or more Cloud providers (AWS, Azure). Observability: Prior experience implementing one or more Commercial Observability/APM solutions (Dynatrace, New Relic, Datadog, AppDynamics, Honeycomb). Monitoring and Logging: Solid familiarity with Splunk, Elastic, OpenSearch, Prometheus, Grafana. Implementing Site Reliability Engineering (SRE) principles. SLO/SLI. Experience troubleshooting and resolving issues with critical business …
Employment Type: Permanent
Salary: USD Annual

Site Reliability Engineer - Observability

Birmingham, Alabama, United States
Hybrid / WFH Options
Regions Bank
technologies: Prior experience supporting hybrid environments with one or more Cloud providers (AWS, Azure). Observability: Prior experience implementing one or more Commercial Observability/APM solutions (Dynatrace, New Relic, Datadog, AppDynamics, Honeycomb). Monitoring and Logging: Solid familiarity with Splunk, Elastic, OpenSearch, Prometheus, Grafana. Implementing Site Reliability Engineering (SRE) principles. SLO/SLI. Experience troubleshooting and resolving issues with critical business …
Employment Type: Permanent
Salary: USD Annual

Site Reliability Engineer - Observability

Salt Lake City, Utah, United States
Hybrid / WFH Options
Regions Bank
technologies: Prior experience supporting hybrid environments with one or more Cloud providers (AWS, Azure). Observability: Prior experience implementing one or more Commercial Observability/APM solutions (Dynatrace, New Relic, Datadog, AppDynamics, Honeycomb). Monitoring and Logging: Solid familiarity with Splunk, Elastic, OpenSearch, Prometheus, Grafana. Implementing Site Reliability Engineering (SRE) principles. SLO/SLI. Experience troubleshooting and resolving issues with critical business …
Employment Type: Permanent
Salary: USD Annual

Site Reliability Engineer - Observability

Nashville, Tennessee, United States
Hybrid / WFH Options
Regions Bank
technologies: Prior experience supporting hybrid environments with one or more Cloud providers (AWS, Azure). Observability: Prior experience implementing one or more Commercial Observability/APM solutions (Dynatrace, New Relic, Datadog, AppDynamics, Honeycomb). Monitoring and Logging: Solid familiarity with Splunk, Elastic, OpenSearch, Prometheus, Grafana. Implementing Site Reliability Engineering (SRE) principles. SLO/SLI. Experience troubleshooting and resolving issues with critical business …
Employment Type: Permanent
Salary: USD Annual

Site Reliability Engineer - Observability

Dallas, Texas, United States
Hybrid / WFH Options
Regions Bank
technologies: Prior experience supporting hybrid environments with one or more Cloud providers (AWS, Azure). Observability: Prior experience implementing one or more Commercial Observability/APM solutions (Dynatrace, New Relic, Datadog, AppDynamics, Honeycomb). Monitoring and Logging: Solid familiarity with Splunk, Elastic, OpenSearch, Prometheus, Grafana. Implementing Site Reliability Engineering (SRE) principles. SLO/SLI. Experience troubleshooting and resolving issues with critical business …
Employment Type: Permanent
Salary: USD Annual

Site Reliability Engineer - Observability

Houston, Texas, United States
Hybrid / WFH Options
Regions Bank
technologies: Prior experience supporting hybrid environments with one or more Cloud providers (AWS, Azure). Observability: Prior experience implementing one or more Commercial Observability/APM solutions (Dynatrace, New Relic, Datadog, AppDynamics, Honeycomb). Monitoring and Logging: Solid familiarity with Splunk, Elastic, OpenSearch, Prometheus, Grafana. Implementing Site Reliability Engineering (SRE) principles. SLO/SLI. Experience troubleshooting and resolving issues with critical business …
Employment Type: Permanent
Salary: USD Annual

Site Reliability Engineer - Observability

Denver, Colorado, United States
Hybrid / WFH Options
Regions Bank
technologies: Prior experience supporting hybrid environments with one or more Cloud providers (AWS, Azure). Observability: Prior experience implementing one or more Commercial Observability/APM solutions (Dynatrace, New Relic, Datadog, AppDynamics, Honeycomb). Monitoring and Logging: Solid familiarity with Splunk, Elastic, OpenSearch, Prometheus, Grafana. Implementing Site Reliability Engineering (SRE) principles. SLO/SLI. Experience troubleshooting and resolving issues with critical business …
Employment Type: Permanent
Salary: USD Annual

Site Reliability Engineer - Observability

Tampa, Florida, United States
Hybrid / WFH Options
Regions Bank
technologies: Prior experience supporting hybrid environments with one or more Cloud providers (AWS, Azure). Observability: Prior experience implementing one or more Commercial Observability/APM solutions (Dynatrace, New Relic, Datadog, AppDynamics, Honeycomb). Monitoring and Logging: Solid familiarity with Splunk, Elastic, OpenSearch, Prometheus, Grafana. Implementing Site Reliability Engineering (SRE) principles. SLO/SLI. Experience troubleshooting and resolving issues with critical business …
Employment Type: Permanent
Salary: USD Annual

Site Reliability Engineer - Observability

Atlanta, Georgia, United States
Hybrid / WFH Options
Regions Bank
technologies: Prior experience supporting hybrid environments with one or more Cloud providers (AWS, Azure). Observability: Prior experience implementing one or more Commercial Observability/APM solutions (Dynatrace, New Relic, Datadog, AppDynamics, Honeycomb). Monitoring and Logging: Solid familiarity with Splunk, Elastic, OpenSearch, Prometheus, Grafana. Implementing Site Reliability Engineering (SRE) principles. SLO/SLI. Experience troubleshooting and resolving issues with critical business …
Employment Type: Permanent
Salary: USD Annual

Site Reliability Engineer - Observability

Orlando, Florida, United States
Hybrid / WFH Options
Regions Bank
technologies: Prior experience supporting hybrid environments with one or more Cloud providers (AWS, Azure). Observability: Prior experience implementing one or more Commercial Observability/APM solutions (Dynatrace, New Relic, Datadog, AppDynamics, Honeycomb). Monitoring and Logging: Solid familiarity with Splunk, Elastic, OpenSearch, Prometheus, Grafana. Implementing Site Reliability Engineering (SRE) principles. SLO/SLI. Experience troubleshooting and resolving issues with critical business …
Employment Type: Permanent
Salary: USD Annual

Site Reliability Engineer - Observability

Charlotte, North Carolina, United States
Hybrid / WFH Options
Regions Bank
technologies: Prior experience supporting hybrid environments with one or more Cloud providers (AWS, Azure). Observability: Prior experience implementing one or more Commercial Observability/APM solutions (Dynatrace, New Relic, Datadog, AppDynamics, Honeycomb). Monitoring and Logging: Solid familiarity with Splunk, Elastic, OpenSearch, Prometheus, Grafana. Implementing Site Reliability Engineering (SRE) principles. SLO/SLI. Experience troubleshooting and resolving issues with critical business …
Employment Type: Permanent
Salary: USD Annual

Principal Solutions Architect

London, United Kingdom
Hybrid / WFH Options
Parser Limited
and other relevant tools. Security Best Practices: IAM, MFA, data encryption, firewall configurations. Programming/Scripting: Python, Terraform, or similar languages. Event-Driven Architectures: Kafka. Monitoring and Logging: Datadog, ELK Stack, Prometheus, etc. Experience in agile methodologies and DevOps practices. Location: Hybrid. Office located in London (Hayes area). Office presence required: Yes. Frequency: 2-3 times a week at …
Employment Type: Permanent
Salary: GBP Annual
Datadog
10th Percentile: £48,250
25th Percentile: £65,000
Median: £75,000
75th Percentile: £87,500
90th Percentile: £97,500