languages such as Python, Bash, or PowerShell for automation, environment configuration, and system management. Monitoring and Logging Tools: Experience with monitoring and logging tools like Prometheus, Grafana, AWS CloudWatch, Datadog, ELK Stack, or Splunk. Security Best Practices: Strong understanding of security concepts, including encryption, access control, vulnerability management, and compliance standards (e.g., NIST, FISMA, or CIS). Problem-Solving Skills
pipelines (GitHub Actions, GitLab CI, Azure DevOps, Jenkins). Experience with configuration management tools such as Chef/Puppet. Strong proficiency in scripting/programming (Python, Go, or similar). Experience with observability platforms (Datadog, New Relic, Prometheus/Grafana). Knowledge of microservices architecture and service mesh technologies. Understanding of security best practices and compliance frameworks. Comfortable with asynchronous collaboration tools (Slack, Teams). Agile mindset.
Azure) and related services (e.g., EC2, S3, Lambda, Kubernetes). Experience with containerization and orchestration technologies like Docker and Kubernetes. Proficiency with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Dynatrace, ELK Stack). Strong understanding of networking fundamentals (DNS, HTTP, TCP/IP), load balancing, and CDNs. Experience with CI/CD tools (e.g., Jenkins, GitLab CI, CircleCI) and
an infrastructure/DevOps perspective. Experience working with relational and NoSQL databases, including PostgreSQL, RDBMS platforms, and DynamoDB. Proficiency with observability and monitoring tools such as Splunk, New Relic, Datadog, AWS DevOps Guru, and AWS Forecast. Ability to monitor, manage, and optimize system resources and performance. Strong problem-solving, analytical, and communication skills with the ability to work independently and
testing, and incident management. Hands-on experience with Databricks, MLflow, or similar ML/ETL platforms is a plus. Bonus: Experience with container orchestration (Kubernetes) and observability tools like Datadog, Prometheus, or Grafana. Passion for building tools and platforms that empower teams and improve developer velocity. Excitement, passion, and curiosity about our mission of connecting the world's health data
or incident response. Knowledge of networking fundamentals and APIs. Excellent problem-solving and communication skills. Nice to Have: Experience with containerization (Docker, Kubernetes). Exposure to monitoring tools (Grafana, Datadog). Cloud certifications or security accreditations. Understanding of Agile methodologies. Interest in automation, security testing, or threat detection. To find out more about Computer Futures, please visit www.computerfutures.com.
in containerization: Strong skills with Docker and Kubernetes. Hands-on experience with IaC: Terraform is a plus. Excellent scripting skills: Python or Bash. Experience with monitoring tools: Prometheus, Grafana, Datadog, or similar. A proactive, problem-solving mindset and a passion for building scalable, reliable systems. Nice to haves: MLOps: Automate the end-to-end ML lifecycle, including model versioning, training
understanding of Linux/Unix systems, networking protocols, certificate management, secret management, system design, cloud platforms (AWS, Azure, GCP), and containerization (Kubernetes, Docker). • Proficiency with monitoring tools (Prometheus, Grafana, Datadog, etc.), logging systems (ELK stack, Splunk), and tracing tools (Jaeger, Zipkin). • Proficiency in infrastructure-as-code tools such as Terraform and Ansible. • Hands-on experience with CI/CD
Operations: Manage and optimize cloud environments (AWS, Azure, GCP), ensuring high availability and cost efficiency. Monitoring & Observability: Implement and maintain monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, ELK, Datadog). Security & Compliance: Enforce security best practices and ensure compliance with industry standards (e.g., SOC 2). Mentorship: Provide technical leadership and mentorship to DevOps engineers and other team members.
Kubernetes). Solid understanding of infrastructure-as-code (e.g., Terraform, Ansible). Strong knowledge of Linux systems, networking, and systems performance tuning. Experience with monitoring and observability stacks (e.g., Prometheus, Grafana, Datadog, ELK, OpenTelemetry). Proficiency with CI/CD tools and pipelines (e.g., GitHub Actions, ArgoCD). Ability to debug complex systems and automate solutions in scripting languages (Python, Bash, etc.). Excellent
Kubernetes, Docker. Knowledge of networking fundamentals (TCP/IP, DNS, load balancing). Proficiency in Linux/Unix administration and scripting (Python, Bash, or similar). Experience with monitoring tools (Prometheus, Grafana, Datadog). Familiarity with containerization (Docker, Kubernetes) and cloud services. Experience with CI/CD systems (Jenkins, GitHub Actions, GitLab CI). Strong analytical and problem-solving skills. Knowledge of security practices (IAM
GitHub Actions, CircleCI, etc.) and infrastructure as code (Terraform, CloudFormation). Proficiency in Docker and container orchestration (ECS or Kubernetes). Familiarity with monitoring, alerting, and logging (Prometheus, Grafana, Datadog, etc.). Experience securing systems and managing secrets, permissions, and network policies. Strong communication skills and comfort working remotely in a fast-moving, product-focused environment. Prior experience in high
analytics and anomaly detection systems using advanced machine learning techniques and large language models. Architect cloud-native microservices and APIs that integrate seamlessly with major observability platforms (Splunk, Elastic, Datadog, New Relic). Implement robust monitoring, alerting, and observability solutions for distributed systems operating at enterprise scale. Collaborate with Product and DevOps teams to translate customer requirements into technical solutions. Optimize
analytics and anomaly detection systems using advanced machine learning techniques and large language models. Design cloud-native microservices and APIs that integrate seamlessly with major observability platforms (Splunk, Elastic, Datadog, New Relic). Establish robust monitoring, alerting, and observability solutions for distributed systems operating at enterprise scale. Lead cross-functional technical initiatives, collaborating with Product, Data Science, and DevOps teams to
Edinburgh, Midlothian, United Kingdom Hybrid/Remote Options
Aberdeen
tech talks to share knowledge and promote adoption of tools and practices. About the Candidate: The ideal candidate will possess the following: Experience with observability tools (e.g., Grafana, Prometheus, Datadog). Background in DevOps, SRE, or platform engineering with a security-first mindset. Strong programming skills in languages such as .NET, JavaScript, Python, or similar. Experience with CI/CD
Terraform, Pulumi, or similar tools. Collaborate with engineering teams to improve deployment workflows, observability, and performance monitoring. Set up and manage logging, alerting, and monitoring frameworks (Prometheus, Grafana, ELK, Datadog, etc.). Champion security best practices, including secrets management and vulnerability assessments. Drive automation across environments to reduce manual effort and increase reliability. What We're Looking For: 4+ years
San Francisco, California, United States Hybrid/Remote Options
Lambda
compliance and improve efficiency and productivity. Participate in on-call rotations and provide support for incident response and resolution. Implement and integrate logging and metrics across platforms such as Datadog, Prometheus, OpenTelemetry, Grafana, Sumo Logic, etc. You: 7+ years of experience in Site Reliability Engineering, DevOps, or a similar role. Strong understanding of modern AI infrastructure, from GPU architectures to hardware
Coaching/coordination without authority: can unblock others with hands-on guidance. Understanding of system administration in Linux (and possibly Windows) environments. Proficiency with monitoring and observability tools (e.g., Datadog, PagerDuty, CloudWatch). Proficiency with Bash and Python. Proficiency with infrastructure-as-code (e.g., Terraform, CloudFormation). Experience with version control software (Git preferred). Experience implementing CI/CD (e.g., GitHub Actions
complex technical ideas to diverse audiences and influence technical decisions. Nice-to-Have: Experience implementing a monorepo and an IDP using Backstage. A strong background in observability tools (e.g., Prometheus, Grafana, Datadog, OpenTelemetry). Experience with service mesh technologies (e.g., Istio, Linkerd). Contributions to open-source developer tools or platform technologies. A reasonable estimate of the base salary compensation range is
Atlanta, Georgia, United States Hybrid/Remote Options
Qgenda
applications in AWS using services such as Lambda, EC2, ECS, S3, SNS, SQS, RDS, Redshift, and ElastiCache. Experience with logging and with creating dashboards and alerts using observability tools such as Datadog and Amazon CloudWatch. Strong understanding of networking and DNS. Familiarity with configuration management and infrastructure as code (IaC) tools such as Terraform. Firm understanding of and experience with Agile and Scrum
Site Reliability Engineering (SRE) principles, including SLOs, error budgets, toil reduction, and blameless culture. Expertise in designing, implementing, and managing observability platforms for cloud-native environments (e.g., Prometheus, Grafana, Datadog, ELK stack, OpenTelemetry, Jaeger). Proficiency in at least one programming/scripting language (e.g., Python, Go) for automation and tool development. Extensive hands-on experience with cloud platforms (AWS
Cambridge, Cambridgeshire, England, United Kingdom
Computer Futures
security principles, threat detection, or incident response. Strong problem-solving skills and willingness to learn. Nice to Have: Exposure to containerization (Docker, Kubernetes). Knowledge of monitoring tools (Grafana, Datadog). Experience with SIEM/SOC tools or security automation. Cloud certifications or security training (AWS, GCP, Azure, or similar).
experience you'll bring: 5+ years of experience in SRE, Observability, or DevOps roles, with leadership responsibilities. Proven expertise with Application Performance Monitoring (APM) tools such as New Relic, Datadog, AppDynamics, or Dynatrace. Hands-on experience with OpenTelemetry (OTel) for distributed tracing and observability instrumentation. Strong proficiency in Infrastructure as Code (IaC) using Terraform. Solid understanding of cloud platforms including
code tools (Terraform, CloudFormation) and container orchestration platforms (Docker, ECS/Fargate). 2+ years of hands-on experience with Linux system administration, shell scripting, and monitoring tools (New Relic, Datadog). 1+ years of hands-on experience with AI coding assistants, AI-powered development workflows, and agentic automation solutions (Cursor, GitHub Copilot, Claude Code, etc.). Experience with CI/CD pipeline