administration, at an advanced level (Ubuntu/RHEL). Working knowledge of at least one scripting language, e.g. Python, Bash. Monitoring and logging of systems using tools like Prometheus, Grafana, or ELK. Experience with IaC tools (ideally Ansible). Working knowledge of version control methodologies and practices. Desirable: Experience with air-gapped environments. Understanding of cross-domain solutions. Working knowledge…
Harewood, England, United Kingdom Hybrid / WFH Options
Assured Data Protection
/Confluence or other project management tools. - MySQL/SQL Databases & Elasticsearch & Redis - Cloud computing (Azure/AWS/GCP) - RabbitMQ - Linux - REST/GraphQL - Ansible - Prometheus/Grafana/Alertmanager. What We Offer: Hybrid working options for flexibility. Regular team-building and off-site company events. A dynamic, inclusive, and collaborative work environment. At Assured Data Protection we…
London, England, United Kingdom Hybrid / WFH Options
ZILO™
solving skills and attention to detail. Strong communication and teamwork abilities. Knowledgeable in Java profiling and the JVM memory model. Preferred Qualifications: Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack). Familiarity with Agile methodologies and DevOps practices. Benefits: Enhanced leave - 38 days inclusive of 8 UK Public Holidays. Private Health Care including family cover. Life Assurance – 5x…
infrastructure. Other duties as needed. About You: 5+ years' experience in Site Reliability Engineer roles. Expert+ level Linux administration, scripting, and troubleshooting. Demonstrable knowledge of observability tools (Prometheus/Grafana, New Relic, Splunk, Datadog). Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route53, Fargate, ALB/NLB distributions, etc.). Extensive experience with cloud automation…
DevOps principles and practices, including CI/CD, infrastructure as code, and monitoring. Hands-on experience with common DevOps tools such as Jenkins, Tekton, uDeploy, Harness, Git, Kubernetes, Prometheus, Grafana, or similar. Experience with GenAI tools and platforms is a plus. Proficiency in scripting languages like Python or Bash for automation and tooling. Excellent problem-solving and analytical skills, with…
team success. Strong Linux systems administration skills. Infrastructure as Code tools like Terraform and Ansible. Basic knowledge of VLAN networking and Bash scripting. Familiarity with monitoring tools like Prometheus, Grafana, and Sensu Go. Experience with Kubernetes, OpenShift, and KubeVirt. Workplace Options: This position is onsite or hybrid/flex as desired. While on-site, you will be a part of…
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Arm Limited
other cloud providers (e.g. GCP, Azure). Strong understanding of key security technologies and protocols such as TLS, OAuth and SPIFFE. Observability, alerting, metrics collection and visualisation (e.g. Prometheus, Grafana, Elasticsearch, Dynatrace). "Nice To Have" Skills and Experience: We would be even more impressed if you are passionate about the following: Cluster processes performance tuning and optimisation. Migrating on…
London, England, United Kingdom Hybrid / WFH Options
One World GTM
Reliability Engineering. Mentor engineers, advocate for DevOps culture, and drive improvements across development, security, and operations. Stay ahead of the curve by adopting the latest AWS services, observability tools (Grafana), and Kubernetes-based architectures. Implement security best practices and governance controls to protect critical systems and data. Work closely with cross-functional teams to support fast, efficient, and seamless software…
Manchester, England, United Kingdom Hybrid / WFH Options
Sectigo
Experience with Configuration Management and Infrastructure as Code tools (Ansible, Puppet, Terraform preferred). Good understanding of container technology (Docker, Kubernetes preferred). Experience with monitoring tools (Prometheus, Grafana, Nagios, or similar) and alerting systems. Experience with non-cloud infrastructure. Experience running a large-scale 24/7 production environment. Experience with distributed data processing, databases, and large-scale…
a given technical discipline (e.g., AWS, Kubernetes, etc.). Experience in observability, such as white- and black-box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others. Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform. Ability to contribute to large and collaborative teams by presenting information…
troubleshooting and problem-solving skills • A passion for learning new technologies and innovation. Desirable: • Certifications in Amazon Web Services (e.g. Solutions Architect, Developer), Google Cloud or Azure • Amazon Managed Grafana • JetBrains TeamCity • Google Apps Script • Agile Development. Together, as owners, let's turn meaningful insights into action. Life at CGI is rooted in ownership, teamwork, respect and belonging.
Washington, Washington DC, United States Hybrid / WFH Options
ESimplicity Inc
in relational and non-relational databases like SQL, MySQL, NoSQL, PostgreSQL, MongoDB or similar. Experience with Redis. Experience with benchmarking, monitoring, and performance engineering applications with tools such as Grafana, Sentry, and Prometheus. Knowledge of AuthN and AuthZ systems, including Active Directory, Okta, and AWS IAM Policies/Roles using attribute-based access controls. Knowledge of automated end-to-end…
distributed delivery models. Additional skills that are a plus: Programming languages such as Scala, Rust, Go, Angular, React, Kotlin. Database management with PostgreSQL. Experience with Elasticsearch, observability tools like Grafana and Prometheus. What this role can offer: Opportunity to deepen understanding of AI and Data Science applications. Mentorship and support from colleagues to apply your talents. Career growth and development…
London, England, United Kingdom Hybrid / WFH Options
Ten Lifestyle Group
Experience with cloud platforms (AWS, GCP, Azure) and infrastructure-as-code (Terraform). Familiarity and hands-on experience with DevOps practices (CI/CD, Docker, K8s) and observability tools (Prometheus, Grafana, Datadog). Experience in distributed systems and scaling. Knowledge and hands-on experience with multiple data stores (both SQL and NoSQL). Desired experience in building agentic workflows (e.g., autonomous…
pre-sales activities. Requirements (Skills/Experience): Observability and SRE Practices: In-depth understanding of observability and Site Reliability Engineering practices. Familiarity with tools in the LGTM stack (Loki, Grafana, Tempo, Mimir) or equivalent observability platforms. Containerisation: Strong experience building and managing containerised applications, effectively leveraging container orchestration platforms such as Kubernetes. Cloud Expertise: Demonstrable ability to architect and implement…
London, England, United Kingdom Hybrid / WFH Options
Ten Lifestyle Group
Experience with cloud platforms (AWS, GCP, Azure) and infrastructure-as-code (Terraform etc.). Hands-on experience with DevOps/Infra tooling (CI/CD, Docker, K8s) and observability (Prometheus, Grafana, Datadog etc.). Experience building distributed systems. Knowledge and hands-on experience with multiple datastores (both SQL and NoSQL). Desired experience in building agents and workflows (e.g. autonomous systems or multi…
Please apply via the Civil Service Jobs link for your application to be considered. We're looking for outstanding Lead DevOps Engineers to develop, build and maintain our flagship service, Universal Credit, and our Working Age Benefits. Universal Credit is…
We’re seeking an experienced contractor to support the delivery of observability solutions for a new, large-scale infrastructure environment. This role focuses on developing insightful and automated Grafana dashboards, with a strong emphasis on data integration and actionable telemetry. Required Skills: Excellent, concise communication skills - essential for collaborating with technical teams to shape observability outputs. Deep experience with Grafana … Python for automation and integration. Strong collaboration skills to work with cross-functional engineering teams. Experience working in Linux-based environments. Bonus/Nice-to-Have Skills: Experience deploying Grafana instances via code (provisioning dashboards, datasources). Familiarity with OpenTelemetry, metric instrumentation, and telemetry pipelines. Background in data center environments, infrastructure monitoring, or SRE practices. Exposure to CI/CD…
such as Terraform or CloudFormation. Monitor and troubleshoot production systems to identify and resolve issues proactively. Develop and maintain monitoring and logging systems using tools such as Prometheus and Grafana. Implement and maintain security and compliance policies across all systems and environments. Research and evaluate new tools and technologies to improve DevOps processes. Collaborate with development teams to design and … with IaC tools such as Terraform and CloudFormation. Knowledge of AWS cloud platform (e.g., EKS). Strong Linux systems administration skills. Familiarity with monitoring and logging tools like Prometheus and Grafana. Experience with scripting languages such as Python, Ruby, or Bash. Excellent communication and collaboration skills. Working with us is about: Joining a motivated and professional team. Working in a modern…
a move? Get in touch and apply today! Responsibilities: Respond rapidly to critical AWS incidents, identify root causes, and deploy automated hotfixes. Lead the setup and integration of the Prometheus-Grafana observability stack. Refactor and modernize deployment pipelines using GitHub Actions and Kubernetes. Maintain robust monitoring, alerting, and CI/CD systems. Skills/Must have: Strong hands-on experience with … AWS (e.g. EC2, EKS, CloudWatch, Lambda). Background in incident, change, and problem management; comfortable with on-call rotations. Expertise in Prometheus, Grafana, and Splunk; solid knowledge of PromQL. Proficient in scripting/programming (Python, Go, Bash, SQL). Salary: £500 per day
software brokers across cloud and on-prem platforms. Responding to production incidents and working on root cause analysis and long-term fixes. Monitoring system health and performance with Prometheus, Grafana, and custom dashboards. Optimising Solace across WAN environments for secure, low-latency message delivery. Partnering with development and support teams to troubleshoot integration and message flow issues. Driving capacity planning … re Looking For: 3+ years’ hands-on experience with Solace PubSub+ in a production environment. Strong knowledge of WAN-based distributed systems and networking fundamentals. Experience with Prometheus and Grafana for observability and alerting. Confident in Linux/Unix systems and scripting (Bash, Python, etc.). Excellent problem-solving instincts and attention to detail. Strong communicator who works well across technical…