languages such as Python, Bash, or PowerShell for automation, environment configuration, and system management. Monitoring and Logging Tools: Experience with monitoring and logging tools like Prometheus, Grafana, AWS CloudWatch, Datadog, ELK Stack, or Splunk. Security Best Practices: Strong understanding of security concepts, including encryption, access control, vulnerability management, and compliance standards (e.g., NIST, FISMA, or CIS). Problem-Solving Skills More ❯
pipelines (GitHub Actions, GitLab CI, Azure DevOps, Jenkins) Experience with configuration management tools such as Chef/Puppet Strong proficiency in scripting/programming (Python, Go, or similar) Experience with observability platforms (Datadog, New Relic, Prometheus/Grafana) Knowledge of microservices architecture and service mesh technologies Understanding of security best practices and compliance frameworks Comfortable with asynchronous collaboration tools (Slack, Teams) Agile mindset More ❯
Azure) and related services (e.g., EC2, S3, Lambda, Kubernetes). Experience with containerization and orchestration technologies like Docker and Kubernetes. Proficiency with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Dynatrace, ELK Stack). Strong understanding of networking fundamentals (DNS, HTTP, TCP/IP), load balancing, and CDNs. Experience with CI/CD tools (e.g., Jenkins, GitLab CI, CircleCI) and More ❯
an infrastructure/DevOps perspective. Experience working with relational and NoSQL databases, including PostgreSQL, RDBMS platforms, and DynamoDB. Proficiency with observability and monitoring tools such as Splunk, New Relic, Datadog, AWS DevOps Guru, and AWS Forecast. Ability to monitor, manage, and optimize system resources and performance. Strong problem-solving, analytical, and communication skills with the ability to work independently and More ❯
testing, and incident management. Hands-on experience with Databricks, MLflow, or similar ML/ETL platforms is a plus. Bonus: Experience with container orchestration (Kubernetes) and observability tools like Datadog, Prometheus, or Grafana. Passion for building tools and platforms that empower teams and improve developer velocity. Excitement, passion and curiosity about our mission of connecting the world's health data More ❯
in containerization: Strong skills with Docker and Kubernetes Hands-on experience with IaC: Terraform is a plus Excellent scripting skills: Python or Bash Experience with monitoring tools: Prometheus, Grafana, Datadog, or similar A proactive, problem-solving mindset and a passion for building scalable, reliable systems Nice to haves: MLOps: Automate the end-to-end ML lifecycle, including model versioning, training More ❯
with strong debugging and code optimization skills. Hands-on experience with IaC tools - especially Terraform. Extensive CI/CD pipeline design & management experience. Familiarity with observability platforms (Prometheus, Coralogix, Datadog, etc.). Strong understanding of cloud platforms (AWS, Azure) and containerization (Docker, Kubernetes). Ability to troubleshoot complex issues across the full stack - from code to infrastructure. Excellent communication and More ❯
Kubernetes (EKS), Kafka (MSK), Java, Spring Framework, Python, and AWS services. Strong plus if you are a database wiz. Expertise in monitoring and observability tools like Prometheus, Grafana, Honeycomb, Datadog, OpenTelemetry, New Relic, or similar tools to measure system health and performance. Programming and scripting experience in languages such as Python, Go, Bash, or other relevant languages used in More ❯
understanding of Linux/Unix systems, networking protocols, certificate management, secret management, system design, cloud platforms (AWS, Azure, GCP), and containerization (Kubernetes, Docker). • Proficiency with monitoring tools (Prometheus, Grafana, Datadog, etc.), logging systems (ELK stack, Splunk), and tracing tools (Jaeger, Zipkin). • Proficiency in infrastructure-as-code tools such as Terraform and Ansible. • Hands-on experience with CI/CD More ❯
Operations: Manage and optimize cloud environments (AWS, Azure, GCP), ensuring high availability and cost efficiency. Monitoring & Observability: Implement and maintain monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, ELK, Datadog). Security & Compliance: Enforce security best practices and ensure compliance with industry standards (e.g., SOC 2). Mentorship: Provide technical leadership and mentorship to DevOps engineers and other team members. More ❯
Kubernetes, Docker Knowledge of networking fundamentals (TCP/IP, DNS, load balancing) Proficiency in Linux/Unix administration, scripting (Python, Bash, or similar) Experience with monitoring tools (Prometheus, Grafana, Datadog) Familiarity with containerization (Docker, Kubernetes) and cloud services. Experience with CI/CD systems (Jenkins, GitHub Actions, GitLab CI) Strong analytical and problem-solving skills. Knowledge of security practices (IAM More ❯
analytics and anomaly detection systems using advanced machine learning techniques and large language models Architect cloud-native microservices and APIs that integrate seamlessly with major observability platforms (Splunk, Elastic, Datadog, New Relic) Implement robust monitoring, alerting, and observability solutions for distributed systems operating at enterprise scale Collaborate with Product and DevOps teams to translate customer requirements into technical solutions Optimize More ❯
Edinburgh, Midlothian, United Kingdom Hybrid/Remote Options
Aberdeen
tech talks to share knowledge and promote adoption of tools and practices. About the Candidate The ideal candidate will possess the following: Experience with observability tools (e.g., Grafana, Prometheus, Datadog). Background in DevOps, SRE, or platform engineering with a security-first mindset. Strong programming skills in languages such as .NET, JavaScript, Python or similar. Experience with CI/CD More ❯
Terraform, Pulumi, or similar tools. Collaborate with engineering teams to improve deployment workflows, observability, and performance monitoring. Set up and manage logging, alerting, and monitoring frameworks (Prometheus, Grafana, ELK, Datadog, etc.). Champion security best practices, including secrets management and vulnerability assessments. Drive automation across environments to reduce manual effort and increase reliability. What We're Looking For 4+ years More ❯
YAML, JSON Build Tools: Maven, Gradle, NPM, Bazel, Go Databases: RDS, SQL, MySQL, Postgres, Redshift, MongoDB, DynamoDB Security Scans: SAST, Secrets, Container, DAST, Xray, Prisma Cloud Logging and Monitoring: Datadog, Splunk, AppDynamics, ELK, Grafana About PROLIM Corporation PROLIM is a leading provider of end-to-end IT, PLM and Engineering Services and Solutions for Global 1000 companies. They understand More ❯
Coaching/coordination without authority: can unblock others with hands-on guidance. Understanding of system administration in Linux (and possibly Windows) environments Proficiency with monitoring and observability tools (e.g., Datadog, PagerDuty, CloudWatch) Proficiency with Bash and Python Proficiency with infrastructure-as-code (e.g., Terraform, CloudFormation) Experience with Version Control Software (Git preferred) Experience implementing CI/CD (e.g., GitHub Actions More ❯
Atlanta, Georgia, United States Hybrid/Remote Options
Qgenda
applications in AWS using services such as Lambda, EC2, ECS, S3, SNS, SQS, RDS, Redshift, and ElastiCache Experience with logging, creating dashboards, and alerts using observability tools such as Datadog and Amazon CloudWatch Strong understanding of networking and DNS Familiarity with configuration management and infrastructure as code (IaC) tools such as Terraform Firm understanding and experience with Agile and Scrum More ❯
Site Reliability Engineering (SRE) principles, including SLOs, error budgets, toil reduction, and blameless culture. Expertise in designing, implementing, and managing observability platforms for cloud-native environments (e.g., Prometheus, Grafana, Datadog, ELK stack, OpenTelemetry, Jaeger). Proficiency in at least one programming/scripting language (e.g., Python, Go) for automation and tool development. Extensive hands-on experience with cloud platforms (AWS More ❯
code tools (Terraform, CloudFormation) and container orchestration platforms (Docker, ECS/Fargate) 2+ years of hands-on experience with Linux system administration, shell scripting, and monitoring tools (New Relic, Datadog) 1+ years of hands-on experience with AI coding assistants, AI-powered development workflows, and agentic automation solutions (Cursor, GitHub Copilot, Claude Code, etc.) Experience with CI/CD pipeline More ❯
Denver, Colorado, United States Hybrid/Remote Options
Cleerly
in AWS security, encryption, and backup practices, including compliance with frameworks such as SOC 2, HIPAA, and HITRUST. Manage monitoring and log analysis using tools like CloudWatch, CloudTrail, GuardDuty, Datadog, and Sentry. Collaborate with application teams to gather requirements and deliver secure, scalable migration paths using AWS services like CloudFront, ECS, EC2, EKS, ElastiCache, Aurora, DynamoDB, SQS, SNS, Step Functions More ❯
London, South East England, United Kingdom Hybrid/Remote Options
Mott MacDonald
region deployment. Strong proficiency and current experience in React, TypeScript, Python and database systems (SQL + NoSQL). Experience with performance monitoring and logging tools, including CloudWatch, Sentry, or Datadog, to ensure application stability, performance optimisation, and effective issue resolution. Experience managing or mentoring engineering teams, including cross-functional collaboration. Understanding of secure architecture, API design, and performance optimisation. Experience More ❯
/CD pipelines (e.g., Jenkins, TeamCity, Concourse). Familiarity with web/application servers such as NGINX, Apache, or JBoss. Exposure to monitoring and logging tools (ELK, Nagios, Splunk, Datadog, New Relic, etc.). Understanding of security and identity management (OAuth2, SSO, ADFS, Keycloak, etc.). Experience with version control systems (Git, Bitbucket, Subversion). Working knowledge of database technologies More ❯
design (REST, GraphQL) Experience with containerization (Docker, Kubernetes) and cloud-native development patterns DevOps & SRE Practices Experience implementing CI/CD pipelines and DevOps methodologies Knowledge of infrastructure monitoring (Datadog), log aggregation, and incident management Understanding of SLO/SLA definition and observability best practices Strategic & Business Acumen Ability to align technical initiatives with business objectives and articulate ROI Experience More ❯
solutions via Terraform, Helm, and/or ArgoCD Experience managing CI/CD pipelines (e.g., GitHub Actions, CircleCI, Codefresh) Experience with system observability tools (e.g., ELK Stack, Prometheus, Grafana, Datadog) Excellent communication skills, with the ability to convey complex security topics in layman's terms. Proficiency in scripting and programming languages relevant to security tasks. Preferred Qualifications & Skills Understanding of More ❯
or Google Cloud (we are an AWS shop). Solid understanding of containerization and orchestration technologies such as Docker and Kubernetes. Experience with monitoring and observability tools such as Datadog, Prometheus, Grafana, or similar. Knowledge of infrastructure-as-code (IaC) tools such as Terraform or CloudFormation. Familiarity with security best practices and tools for infrastructure and application security. Excellent problem More ❯