pipelines (GitHub Actions, GitLab CI, Azure DevOps, Jenkins) Experience with configuration management tools such as Chef/Puppet Strong proficiency in scripting/programming (Python, Go, or similar) Experience with observability platforms (Datadog, New Relic, Prometheus/Grafana) Knowledge of microservices architecture and service mesh technologies Understanding of security best practices and compliance frameworks Comfortable with asynchronous collaboration tools (Slack, Teams) Agile mindset
testing, and incident management. Hands-on experience with Databricks, MLflow, or similar ML/ETL platforms is a plus. Bonus: Experience with container orchestration (Kubernetes) and observability tools like Datadog, Prometheus, or Grafana. Passion for building tools and platforms that empower teams and improve developer velocity. Excitement, passion and curiosity about our mission of connecting the world's health data
Operations: Manage and optimize cloud environments (AWS, Azure, GCP), ensuring high availability and cost efficiency. Monitoring & Observability: Implement and maintain monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, ELK, Datadog). Security & Compliance: Enforce security best practices and ensure compliance with industry standards (e.g., SOC 2). Mentorship: Provide technical leadership and mentorship to DevOps engineers and other team members.
Kubernetes) Solid understanding of infrastructure-as-code (e.g., Terraform, Ansible) Strong knowledge of Linux systems, networking, and systems performance tuning Experience with monitoring and observability stacks (e.g., Prometheus, Grafana, Datadog, ELK, OpenTelemetry) Proficiency with CI/CD tools and pipelines (e.g., GitHub Actions, ArgoCD, etc.) Ability to debug complex systems and automate solutions in scripting languages (Python, Bash, etc.) Excellent
Kubernetes, Docker Knowledge of networking fundamentals (TCP/IP, DNS, load balancing) Proficiency in Linux/Unix administration, scripting (Python, Bash, or similar) Experience with monitoring tools (Prometheus, Grafana, Datadog) Familiarity with containerization (Docker, Kubernetes) and cloud services. Experience with CI/CD systems (Jenkins, GitHub Actions, GitLab CI) Strong analytical and problem-solving skills. Knowledge of security practices (IAM
analytics and anomaly detection systems using advanced machine learning techniques and large language models Design cloud-native microservices and APIs that integrate seamlessly with major observability platforms (Splunk, Elastic, Datadog, New Relic) Establish robust monitoring, alerting, and observability solutions for distributed systems operating at enterprise scale Lead cross-functional technical initiatives, collaborating with Product, Data Science, and DevOps teams to
Edinburgh, Midlothian, United Kingdom Hybrid/Remote Options
Aberdeen
tech talks to share knowledge and promote adoption of tools and practices. About the Candidate The ideal candidate will possess the following: Experience with observability tools (e.g., Grafana, Prometheus, Datadog). Background in DevOps, SRE, or platform engineering with a security-first mindset. Strong programming skills in languages such as .NET, JavaScript, Python, or similar. Experience with CI/CD
YAML, JSON Build Tools: Maven, Gradle, NPM, Bazel, Go Databases: RDS, SQL, MySQL, Postgres, Redshift, MongoDB, DynamoDB Security Scans: SAST, Secrets, Container, DAST, Xray, Prisma Cloud Logging and Monitoring: Datadog, Splunk, AppDynamics, ELK, Grafana About PROLIM Corporation PROLIM is a leading provider of end-to-end IT, PLM and Engineering Services and Solutions for Global 1000 companies. They understand
Atlanta, Georgia, United States Hybrid/Remote Options
Qgenda
applications in AWS using services such as Lambda, EC2, ECS, S3, SNS, SQS, RDS, Redshift, and Elasticache Experience with logging, creating dashboards, and alerts using observability tools such as Datadog and Amazon CloudWatch Strong understanding of networking and DNS Familiarity with configuration management and infrastructure as code (IaC) tools such as Terraform Firm understanding and experience with Agile and Scrum
Denver, Colorado, United States Hybrid/Remote Options
Cleerly
in AWS security, encryption, and backup practices, including compliance with frameworks such as SOC 2, HIPAA, and HITRUST. Manage monitoring and log analysis using tools like CloudWatch, CloudTrail, GuardDuty, Datadog, and Sentry. Collaborate with application teams to gather requirements and deliver secure, scalable migration paths using AWS services like CloudFront, ECS, EC2, EKS, ElastiCache, Aurora, DynamoDB, SQS, SNS, Step Functions
e.g., AWS, Azure, Google Cloud) and containerization technologies (e.g., Docker, Kubernetes). Familiarity with data processing frameworks (e.g., Apache Kafka, Apache Spark) and IT monitoring tools (e.g., Prometheus, Grafana, Datadog, Splunk). Deep understanding of distributed systems architecture, microservices, and their operational challenges. Demonstrated ability to translate business requirements and operational pain points into technical specifications and deliver robust AIOps
London, South East England, United Kingdom Hybrid/Remote Options
Mott MacDonald
region deployment. Strong proficiency and current experience in React, TypeScript, Python and database systems (SQL + NoSQL). Experience with performance monitoring and logging tools, including CloudWatch, Sentry, or Datadog, to ensure application stability, performance optimisation, and effective issue resolution Experience managing or mentoring engineering teams, including cross-functional collaboration. Understanding of secure architecture, API design, and performance optimisation. Experience
/CD pipelines (e.g., Jenkins, TeamCity, Concourse). Familiarity with web/application servers such as NGINX, Apache, or JBoss. Exposure to monitoring and logging tools (ELK, Nagios, Splunk, Datadog, New Relic, etc.). Understanding of security and identity management (OAuth2, SSO, ADFS, Keycloak, etc.). Experience with version control systems (Git, Bitbucket, Subversion). Working knowledge of database technologies
design (REST, GraphQL) Experience with containerization (Docker, Kubernetes) and cloud-native development patterns DevOps & SRE Practices Experience implementing CI/CD pipelines and DevOps methodologies Knowledge of infrastructure monitoring (Datadog), log aggregation, and incident management Understanding of SLO/SLA definition and observability best practices Strategic & Business Acumen Ability to align technical initiatives with business objectives and articulate ROI Experience
KPIs (observability, alerting, SLAs) Hands-on experience with CI/CD, containerization and orchestration tools (Docker, Kubernetes) Knowledge of monitoring, logging, alerting and observability tools (Prometheus, Grafana, ELK Stack, Datadog) Familiarity with infrastructure as code tools like Terraform or CloudFormation Proficiency in scripting languages (Python, Go, Bash) and knowledge of software development best practices Strong understanding of networking, security, and
Strong programming skills in Python and SQL. Hands-on experience with containerization (Docker), orchestration (Kubernetes), and virtualization. Familiarity with CI/CD tools and observability platforms (e.g., Prometheus, Grafana, Datadog). Knowledge of system security practices, identity/access management, and data encryption. Experience with regulatory compliance frameworks (HIPAA, SOC 2). Certifications such as AWS Certified Solutions Architect or
complex issues to senior stakeholders and technical teams. Implementation of highly available and reliable systems, using multi-AZ and multi-region approaches Expertise with monitoring and observability tools (e.g., SolarWinds, Datadog, Azure/AWS native tools) Expertise with SLI/SLO management tools such as ServiceNow Expertise with incident ticketing and change management systems such as ServiceNow or Ivanti Expertise with automated
platforms) that supports the different platform services. Develop comprehensive monitoring solutions to provide full visibility into different platform components using tools and services such as Kubernetes, Prometheus, Grafana, ELK, Datadog, New Relic and other similar tools. Identify and troubleshoot any bottlenecks, availability and performance issues at multiple layers of deployment, from hardware, operating environment, software, network, and application. Evaluate performance
in the knowledge of programming languages, relational databases, and NoSQL databases Experience building infrastructure as code using AWS CloudFormation or similar scripting techniques Familiarity with monitoring tool suites like Datadog, Sumo Logic, New Relic, and Nagios Strong practical Linux-based systems administration skills and scripting experience in a cloud-based environment Experience with Agile Scrum practices a plus Desired Skills
have a bias towards secure solutions that follow engineering best practice. Technologies: Python, Aurora PostgreSQL, AWS infrastructure (EC2, S3, RDS, Redshift, etc.), Kubernetes, Docker, Terraform, CI/CD, observability tooling (e.g., Datadog, Prometheus, Sumo Logic), OpenSearch, and Linux. This is a remote, US-based role. Responsibilities: Construct infrastructure as code. Develop and enforce best practices across configurations while preventing drift between Terraform configurations
/CD tooling such as Jenkins, GitHub, Bitbucket, ArgoCD, Artifactory, Azure DevOps in a large-scale environment 3+ years of experience managing observability tooling such as Grafana, Prometheus, Splunk, Datadog, New Relic, Dynatrace, Sentry, etc. in a large-scale environment Advanced understanding of YAML, JSON, HTML, XML. 2+ years of work experience supporting relational and non-relational databases such as MySQL, MongoDB
Bash Experience with containerization (e.g., Docker) or IaC (e.g., Terraform, Helm, CloudFormation) An eagerness to follow modern engineering practices and learn from others Familiarity with observability tools such as Datadog, New Relic, Grafana, Prometheus, ELK, or OpenTelemetry Understanding of core networking concepts (DNS, HTTP/S, Load Balancing, etc.) A collaborative mindset with clear communication skills Willing to ask questions
Knowledge of containerization technologies (e.g., Docker, Kubernetes) (good to have). Strong understanding of networking, security, and system administration (3-5 yrs). Familiarity with monitoring tools such as Dynatrace/Datadog/Splunk Familiarity with Agile development methodologies. Soft Skills: Excellent problem-solving and analytical skills. Strong communication and teamwork abilities. Ability to work independently Regards, Manoj Derex Technologies INC Contact
on experience with AWS services (EC2, RDS, EKS, VPC, IAM) Strong experience in infrastructure automation using Terraform or CloudFormation Proven experience managing and optimizing Expertise in monitoring tools (CloudWatch, Datadog or equivalent) Experience with containerization (Docker/Kubernetes) Knowledge of networking concepts and security best practices Experience handling production incidents and on-call responsibilities Strong scripting skills (Bash, Shell script
and integration (preferably using Go but not essential). Knowledge of OpenShift Containerisation, RHEL 6, 7, and 8, Docker and Kubernetes. Experience with monitoring systems, e.g., ELK, Nagios, New Relic, Datadog, Splunk, etc. Working knowledge of digital delivery processes and methodologies. Working knowledge of Atlassian Toolset. Knowledge of JavaScript frontend frameworks. Understanding of front-end technologies, such as HTML5 and CSS3.