or large-scale environments. Strong expertise in cloud platforms (AWS, GCP, Azure) and container orchestration tools (Kubernetes, Docker). Deep knowledge of monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk). Proficiency in programming or scripting languages (e.g., Python, Go, Bash). Experience with incident management, post-mortems, and implementing preventative measures. Solid understanding of networking, databases, and More ❯
Quality Tools: Docker (OCI Images), GitHub Actions, Gradle, Jenkins (legacy, moving towards GitHub Actions), Maven, SonarCloud. Data: Elasticsearch, MongoDB, MySQL, Neo4j. IaC: Ansible, Terraform. Languages: Java, Python, TypeScript. Monitoring: Grafana, Prometheus. Misc: Apache (legacy, moving towards AWS CloudFront/API Gateway), Git (GitHub), Linux (Ubuntu), RabbitMQ. We are looking to start or make more use of the following AWS services More ❯
Columbia, Maryland, United States Hybrid / WFH Options
Codescratch LLC
development tool suites. Preferred Skills and Experience: Experience with Docker and Kubernetes Experience with Hadoop Experience with Spark Experience with Accumulo Experience monitoring application performance with metrics (Prometheus, InfluxDB, Grafana) and logs with ELK Stack (ElasticSearch, Logstash, Kibana) Experience with asynchronous messaging systems (RabbitMQ, Apache Kafka, etc.) Location: Columbia Annex, MD (60%+ telework) Salary Range: $115,000 - $200,000.00 More ❯
standard software development tool suites. Preferred Skills and Experience: Experience with Docker and Kubernetes Experience with Virtual Machines Experience with Networking Experience monitoring application performance with metrics (Prometheus, InfluxDB, Grafana) and logs with ELK Stack (ElasticSearch, Logstash, Kibana) Have, or obtain Security+ certification or equivalent DoD 8570 IAT II certification Location Fort Eisenhower, GA (Appx 50% hybrid telework) Salary Range More ❯
Sonatype Nexus Knowledge and working experience of containerising application components including writing Dockerfiles and deploying to Kubernetes Deep understanding of pipelines as code Observability concepts and tooling; OpenSearch, Cribl, Grafana, Prometheus, CloudWatch Experience of working with agile teams Job Band & Level: Manager/7 Not The Perfect Fit? Concerned that you may not meet the criteria precisely? At TP ICAP More ❯
London, England, United Kingdom Hybrid / WFH Options
Appvia
product teams to iterate on their applications in AWS Setting best practices and policies, especially around microservice architecture Developing, maintaining and operating complex operational tooling (e.g. Kubernetes, OpenSearch, Prometheus, Grafana, GitHub or equivalent alternative technologies) Assessing customer technical capabilities and upskilling for reduced friction and increased platform adoption Enhancing operational reliability and scalability of existing products POC'ing new ideas More ❯
our long-running services and analytics in C#. We use Airflow for workflow management, Kafka for data pipelines, Bitbucket for source control, Jenkins for continuous integration, ELK for logs, Grafana, Prometheus & InfluxDB for metrics, Docker and Kubernetes for containerisation, OpenStack for our private cloud, Ansible and Terraform for architecture automation, and Slack for internal communication. We heavily utilise ArcticDB ( https More ❯
City of London, England, United Kingdom Hybrid / WFH Options
VE3
AWS DevOps & System Administrator role at VE3, City of London, England, United Kingdom. Posted 1 week ago. More ❯
London, England, United Kingdom Hybrid / WFH Options
VE3
AWS DevOps & System Administrator, London Client: VE3 Location: London, United Kingdom Job Category: Other - EU work permit required: Yes Job Reference: bb464544d87a Job Views: 8 More ❯
that enable continuous integration, continuous deployment, and efficient infrastructure management. Your expertise in using cutting-edge technologies such as GitHub, Azure, Kubernetes, ACR, YAML, Terraform, Azure DevOps, Helm, Prometheus, Grafana, PowerShell, and Jira will be crucial in driving innovation and ensuring the reliability of our software delivery pipeline. Key Responsibilities: Infrastructure Automation and Configuration Management: Design, build, and maintain infrastructure … and scale containerized applications. Work with Docker and Azure Container Registry (ACR) to create and manage container images efficiently. Monitoring and Performance Management: Implement monitoring solutions using Prometheus and Grafana to track system performance, application health, and resource utilization. Set up alerts and notifications to promptly respond to potential issues. Security and Compliance: Collaborate with the security team to implement … Helm for managing Kubernetes applications. Experience with implementing and maintaining CI/CD pipelines using Azure DevOps or similar tools. Knowledge of monitoring and logging solutions like Prometheus and Grafana, or similar tools. Strong scripting skills, especially in PowerShell, for automating tasks and configurations. Excellent problem-solving skills and the ability to work well in a fast-paced, collaborative environment. More ❯
estates, contributing to their operational success through proactive insight and incident prevention. What you'll do Design, implement, and manage observability solutions using industry-leading tools such as Dynatrace (primary), Grafana, and Splunk Collect and analyse telemetry data (metrics, logs, traces, events) to diagnose and resolve system and application performance issues Integrate monitoring platforms with ITSM tools (e.g. ServiceNow) and CI … present monitoring solutions and technical designs Proactively identify and highlight risks that could impact solution success What you'll need Strong experience deploying and managing observability platforms including Dynatrace, Grafana, and/or Splunk Deep understanding of telemetry signal analysis and performance monitoring Experience integrating observability tools with ITSM platforms and DevOps toolchains Ability to troubleshoot complex infrastructure and application More ❯
London, England, United Kingdom Hybrid / WFH Options
Keyrock
availability and security. Automation & CI/CD: Implement and manage CI/CD pipelines for efficient deployment, testing, and monitoring. Observability & Monitoring: Develop monitoring solutions with tools like Prometheus, Grafana, ELK stack to enhance system reliability. Security & Compliance: Apply best practices for cloud security, IAM policies, and compliance standards (SOC2, ISO 27001). Incident Response & Performance Optimization: Troubleshoot issues, perform … experience (EKS, K3s, or self-managed). Proficiency in scripting with Python, Bash, or Go. Experience with Infrastructure as Code (Terraform, CloudFormation, Ansible). Familiarity with observability tools (Prometheus, Grafana, Datadog, ELK). Solid understanding of networking (VPC, Load Balancers, DNS, Firewalls). Experience with DevOps, CI/CD, and GitOps practices. Experience with high-performance, low-latency systems. Familiarity More ❯
APIs Experience with Git Source Control System Position Desired Skills Experience with Messaging Frameworks such as Kafka, ActiveMQ, and RabbitMQ Experience with tools used for metrics visualization such as Grafana and Kibana Experience with containerization technologies such as Docker Experience with the Atlassian Tool Suite (JIRA, Confluence More ❯
for microservices and Kafka-related applications using tools like Drone Automate infrastructure provisioning using Terraform or Infrastructure-as-Code tools Build and maintain monitoring and alerting systems using Prometheus, Grafana, or AWS native monitoring tools like CloudWatch Collaborate with development and DevOps teams to design MSK and Kubernetes-based solutions Troubleshoot complex issues related to Kafka and container orchestration. Document … as Code (IaC) tools like Terraform Knowledge of container build and deployment automation using CI/CD pipelines Experience in observability tools for both MSK and Kubernetes, including Prometheus, Grafana, and AWS CloudWatch for metrics and logs Deep understanding of Kafka and Kubernetes security practices, including network policies and IAM roles Experience with Vault Strong analytical and troubleshooting skills Ability More ❯
Tottenham, England, United Kingdom Hybrid / WFH Options
Gamingtec
determine the root cause of the incident; Infrastructure as Code: Terraform, Helm, Ansible, (optional) Werf; Linux administration and container orchestration (K8s) skills; Experience with monitoring/observability stacks: Prometheus, Grafana, ELK, Loki, etc.; Strong understanding of TCP/IP, DNS and load balancers; Familiarity with incident response, postmortems, and blameless culture; Availability to work between 5 PM and 8 AM … systems (e.g., live games, sports betting, KYC, payments); Manage and support highly available, scalable infrastructure (K8s, cloud and bare metal); Implement and manage monitoring, logging, and alerting (e.g., Prometheus, Grafana, Loki, ELK); Automate deployments and operations using CI/CD pipelines (Jenkins, ArgoCD, Helm, etc.); Conduct post-incident reviews, define action items, and reduce mean time to recovery (MTTR); Participate More ❯
Portsmouth, England, United Kingdom Hybrid / WFH Options
Trust In SODA
life cycle. Infrastructure-as-code Bash Delivery methods and techniques, including agile scrum experience. Desirable Skills: RedHat OpenShift HashiCorp (such as Terraform, Packer, Vault) Ansible Observability (such as Prometheus, Grafana, Splunk) Containerised services (such as Postgres, Redis, Kafka, Keycloak, ELK) Experience of doing all the above at OS or S level YAML based pipelines. Immutable infrastructure Experience with MOD delivery More ❯
Azure is a plus). Deep understanding of Container Orchestration technologies such as Kubernetes and Docker. Proficiency in monitoring and logging tools including: Datadog, Splunk, Dynatrace, AppDynamics, Prometheus, Grafana, ELK Stack, CloudWatch, Gremlin, ThousandEyes. Experience with Terraform, Jenkins, GitLab CI, PostgreSQL, Redis, and Kong API Gateway. Solid understanding of networking, security best practices, and infrastructure automation. More ❯
Sonatype Nexus Knowledge and working experience of containerising application components including writing Dockerfiles and deploying to Kubernetes Deep understanding of pipelines as code Observability concepts and tooling; OpenSearch, Cribl, Grafana, Prometheus, CloudWatch More ❯
of building and maintaining CI/CD pipelines using the likes of GitLab, Jenkins, CircleCI, CodeBuild etc. Familiarity with scripting (Bash or Python). Monitoring and alerting tools - Prometheus, Grafana or Splunk, ELK. We're looking for someone who wants to progress their career into the DevOps arena. Submit your CV now to be considered. IND_PC1 Carbon60, Lorien & SRG More ❯
experience with AWS cloud infrastructure Deep understanding of IaC tools: Terraform, Packer, CloudFormation Proven leadership in multidisciplinary delivery teams Skills in Databases: MongoDB/Atlas; Messaging: Kafka; Observability: Prometheus, Grafana, Splunk Experience working in a DevOps environment with a focus on CI/CD pipelines Experience designing, implementing, securing, and supporting Unix/Linux platforms (preferably RHEL/CentOS) with More ❯