Leeds, England, United Kingdom Hybrid / WFH Options
ZipRecruiter
platform modernisation Mentor and lead a small team of engineers Align DevOps capabilities with the wider business Champion DevEx, reliability, and security Embed operational excellence and incident response Promote observability and performance optimisation Lead DevOps Engineer Requirements Proven technical and some leader/mentoring experience Cloud-native expertise (any cloud provider is fine: GCP, AWS or Azure) Knowledge of GitLab CI
Bradford, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
platform modernisation Mentor and lead a small team of engineers Align DevOps capabilities with the wider business Champion DevEx, reliability, and security Embed operational excellence and incident response Promote observability and performance optimisation Proven technical and some leader/mentoring experience Cloud-native expertise (any cloud provider is fine: GCP, AWS or Azure) Knowledge of GitLab CI/CD, Terraform
and firewalls. Experience with load balancers (F5, HAProxy, Nginx) and network monitoring tools. Experience in DNS management and troubleshooting. Experience in network security best practices. Proficiency in monitoring and observability tools (Prometheus, Grafana, Splunk). Proficiency in at least one scripting language (Python, Bash) for automation. Experience with CI/CD pipeline management and DevOps practices. Strong understanding of disaster
London, England, United Kingdom Hybrid / WFH Options
InterQuest Group
Go Significant experience with AWS cloud infrastructure Deep understanding of IaC tools: Terraform, Packer, CloudFormation Proven leadership in multidisciplinary delivery teams Skills in Databases: MongoDB/Atlas, Messaging: Kafka, Observability: Prometheus, Grafana, Splunk Experience working in a DevOps environment with Continuous Integration & Deployment Designing, implementing, securing, and supporting Unix/Linux based platforms Developing solutions using scripting languages (Bash, Python
a fast-paced, dynamic environment. Previous experience working on large App/Data migration engagements. Cloud Platforms and Technology Experience Core Skills: GCP – Networking, Security tool/Best Practices Observability - Operations suite, Logging, Monitoring, Alerting. Additional Skills: Good understanding of Linux OS. Bash, Scripting, Automation, Ansible, Networking, Security. Hands-on experience with DevOps Principles and Tools. Hands-on with Terraform
built on Solace PubSub+, ensuring high availability, optimal performance, and reliability across production and non-production environments. This includes working on incident response, capacity planning, WAN optimization, and system observability using tools like Prometheus and Grafana. Key Responsibilities: Administer and maintain Solace PubSub+ appliances and software brokers across environments (on-prem and cloud). Provide production support for messaging
DevOps & Automation Create and manage automation pipelines for deployments. Implement Infrastructure as Code (IaC) using tools such as Terraform or Ansible. Monitor and enhance system performance using logging and observability tools. Develop automation solutions for provisioning, scaling, and maintenance. Support containerization efforts with Docker/Kubernetes where applicable. Networking & System Administration Configure and maintain network infrastructure, including firewalls, VLANs, and
Go Significant experience with AWS cloud infrastructure Deep understanding of IaC tools: Terraform, Packer, CloudFormation Proven leadership in multidisciplinary delivery teams Skills in Databases: MongoDB/Atlas, Messaging: Kafka, Observability: Prometheus, Grafana, Splunk Experience of working in a DevOps environment - favouring and implementing Continuous Integration & Deployment over manual processes. Experience of designing, implementing, securing and supporting Unix/Linux based
Southampton, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
/CD pipelines, ensuring rapid and reliable code delivery. Support microservices architecture, focusing on latency-sensitive and high-availability services. Monitor system performance, conduct root cause analysis, and implement observability best practices (metrics, logging, tracing). Harden infrastructure and deployments with infrastructure as code (Terraform/CDK/CloudFormation). Lead incident response, system reliability efforts, and infrastructure scalability initiatives.
to L3 networking Programming languages, such as C#, Python, Perl, Java, C++ CI/CD tools such as Azure DevOps, GitHub Actions, GitLab, Jenkins, TeamCity Scripting languages such as PowerShell, Bash Observability/Monitoring: Prometheus, Grafana, Splunk Containerisation tools such as Docker, K8S, OpenShift, EC, containers Hosting technologies such as IIS, nginx, Apache, App Service, LightSail Analytical and creative approach to problem
tools (e.g., Jenkins, GitLab, GitHub Actions, CircleCI) and orchestration technologies (e.g., Kubernetes, Docker). Proficiency in scripting and programming languages (e.g., Python, Bash, Go). Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog). Solid understanding of security best practices, compliance standards, and DevSecOps. Proven ability to manage and deliver complex projects on time and within budget.
machine learning models and analytical services. Implement and enforce security best practices across cloud and network environments. Troubleshoot deployment and performance issues across multiple environments. Set up and maintain observability tools for logging, monitoring, and alerting (e.g., Prometheus, Grafana, Loki). Contribute to internal tooling to streamline development, testing, and operations workflows. Stay current with DevOps trends and recommend improvements
London, England, United Kingdom Hybrid / WFH Options
amber labs
working in Agile teams using tools like Git, Jira, and Confluence Eligible for SC and NPPV3 clearance Desirable: Container orchestration with Kubernetes HashiCorp tools: Vault, Consul, Packer Monitoring and observability with Grafana, Prometheus, or similar Familiarity with cloud networking, VPCs, NAT Gateways, security groups, etc. Personal Attributes: Proactive and self-driven with a passion for technology Strong problem-solving mindset
and Helm for infrastructure automation. - Familiarity with Jenkins and integrating CI/CD tools. - Strong understanding of networking, security standard methodologies, and cloud governance. - Experience with logging, monitoring, and observability tools. - Excellent problem-solving, communication, and collaboration skills. Bonus Points If You Have: - Certifications in AWS (AWS Certified DevOps Engineer, Solutions Architect) or Azure (Azure DevOps Engineer, Solutions Architect).
delivery practices through tooling and coaching. Provide architectural input on how platform choices impact software delivery and operability. Join wider Application Development squads to accelerate delivery of key projects. Observability, Site Reliability Operate production workloads with an SRE mindset: measure reliability, define SLOs, and reduce toil through automation. Lead initiatives to reduce operational toil and enhance system resilience through automation.
systems in high-growth or large-scale environments. Strong expertise in cloud platforms (AWS, GCP, Azure) and container orchestration tools (Kubernetes, Docker). Deep knowledge of monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk). Proficiency in programming or scripting languages (e.g., Python, Go, Bash). Experience with incident management, post-mortems, and implementing preventative measures. Solid understanding
and scaling. Implement Containerisation and Orchestration - Containerise applications with Docker and deploy using Kubernetes, ECS, or similar. Manage Helm charts or Kustomize templates and enforce container security standards. Drive Observability and Operational Readiness - Implement monitoring, logging, and alerting with tools like Prometheus, Grafana, ELK, or Datadog. Create dashboards and promote the adoption of SLOs and error budgets. Embed Security and
meet business needs and objectives. Develop a baseline monitoring and tooling concept for cloud to address the need for compliance infrastructure reporting within agile deliveries as part of our Observability strategy. Develop concepts and tools for chargeback and showback (Financial Instrumentation) in a multicloud context. Implement and mature a cloud forecasting and capacity management solution for the enterprise. Collaborate with
About the Role Forter is seeking a Senior Software Engineer to join our Observability team. This role offers the opportunity to work at the intersection of software development and platform engineering, contributing to the tools, systems, and practices that improve visibility, reliability, and operational excellence across our engineering organisation. This position is ideally suited for experienced software engineers who … are passionate about building high-quality systems and are interested in expanding their expertise in observability, distributed systems, and developer experience. You will help design, build and maintain systems that empower engineers across Forter to monitor, understand, and troubleshoot their services more effectively. Our observability team is responsible for delivering scalable and user-friendly solutions to over 150 engineers working … re focused on enabling rapid incident detection and resolution, improving our reliability posture, and supporting a culture of continuous improvement. What you'll be doing: Design, build, and maintain observability tools and infrastructure that help our engineers provide actionable insights into the performance and reliability of Forter's systems. Collaborate with other engineers and teams to enhance the developer experience
team. This role is ideal for someone passionate about service reliability, scalability, and performance. As an SRE, you will collaborate with development and operations teams to automate infrastructure, enhance observability, and reduce manual processes (toil) to improve overall system health. Key Responsibilities: Design, build, and maintain scalable, resilient systems and services. Automate routine tasks and eliminate manual effort using scripting … with development teams to ensure best practices for deployment, monitoring, and performance tuning. Drive incident management processes, root cause analysis, and continuous improvement of system reliability. Maintain and improve observability using monitoring and logging tools. Optimize cloud infrastructure usage and costs. Primary Skills & Experience: Strong hands-on experience with cloud platforms, especially AWS (experience with GCP or Azure is a
London, England, United Kingdom Hybrid / WFH Options
Northrop Grumman
Platforms Engineer to join our Data Platforms Infrastructure team. This role is pivotal in supporting and maintaining our Elastic Cloud Enterprise (ECE) environment, designing and deploying Elastic solutions for Observability and Search, and ensuring the highest standards of security, privacy, and compliance. The ideal candidate will have a strong background in Elasticsearch, a keen understanding of cloud architecture, and a … upgrades to ensure optimal performance and security. - Troubleshoot and resolve issues related to Elasticsearch clusters, indices, and ECE infrastructure. Design and Deployment: - Design scalable and efficient Elasticsearch solutions for Observability and Search use cases. - Implement best practices for data indexing, storage, and retrieval. - Work with stakeholders to understand requirements and translate them into technical solutions. Security, Privacy, and Compliance: - Implement … that enhance the Elasticsearch platform. Your Experience: - Extensive experience working with Elasticsearch, including ECE. - Strong understanding of Elasticsearch architecture, components, and APIs. - Experience with designing and deploying Elasticsearch for Observability and Search use cases. - Knowledge of security best practices and compliance requirements (e.g., GDPR, HIPAA). - Proficiency in scripting and automation (e.g., Python, Bash, Ansible). - Familiarity with cloud platforms
multiple stakeholders including development teams to implement and maintain reliable and scalable systems while adhering to industry best practices and security standards. Responsibilities and Impact: Design, implement, and maintain observability solutions to track system health and performance. Analyze observability data to identify and troubleshoot potential issues proactively. Develop and implement alerts and notifications for critical events. Collaborate with development teams … in Computer Science, Information Technology, or a related field. 5+ years of experience as a Site Reliability Engineer or equivalent in a similar role. Proficient in application and infrastructure observability, Splunk OpenTelemetry preferred Experienced in production environments running in AWS Comfortable with Infrastructure as Code, Terraform is preferred Comfortable with CI/CD pipelines such as GitHub Actions, Azure DevOps