leeds, west yorkshire, yorkshire and the humber, United Kingdom Hybrid / WFH Options
Future Talent Group
culture by leading knowledge-sharing sessions and supporting issue resolution. Skills: Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise (EKS, SQS More ❯
leeds, west yorkshire, yorkshire and the humber, United Kingdom Hybrid / WFH Options
Inspirec
Orchestrate and manage containerized applications using Docker, supporting streamlined deployment and environment consistency across development and production. Implement comprehensive monitoring and alerting solutions with Prometheus, Grafana, and AlertManager to proactively identify and resolve system performance issues. Champion DevOps best practices in automation, security, and agile delivery to drive continuous improvement More ❯
leeds, west yorkshire, yorkshire and the humber, United Kingdom Hybrid / WFH Options
Fruition Group
is fine: GCP, AWS or Azure) Knowledge of GitLab CI/CD, Terraform, Ansible Experience in Kubernetes, Docker, SRE and IaC principles Monitoring with Prometheus, Grafana, etc. Any scripting experience will be a bonus What's in it for me? Competitive salary up to £90k + bonus Hybrid working More ❯
leeds, west yorkshire, yorkshire and the humber, United Kingdom Hybrid / WFH Options
CATCHES
skills and a track record of cross-team collaboration. Nice to have: Kubernetes expertise (GKE/AKS/EKS) and container-native observability stacks (Prometheus/Grafana). NoSQL experience (Firestore, Cosmos DB, DynamoDB, MongoDB). Experience with game-backend scales, real-time services or hybrid cloud/bare-metal More ❯
leeds, west yorkshire, yorkshire and the humber, United Kingdom Hybrid / WFH Options
Beazley Security
and cloud environments. Experience with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI, CircleCI). Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack). Strong problem-solving and analytical skills. Excellent communication and collaboration skills. Experience with version control systems (e.g., Git). Experience working More ❯
leeds, west yorkshire, yorkshire and the humber, United Kingdom
Vallum Associates
etc. -Experience with data serialization formats like Avro, JSON, Protobuf. -Proficient in Java, Scala, or Python for Kafka-based development. -Familiarity with monitoring tools (Prometheus, Grafana, Confluent Control Center, etc.). -Understanding of networking, security (SSL/SASL), and data governance. -Experience with CI/CD pipelines and containerization (Docker More ❯
leeds, west yorkshire, yorkshire and the humber, United Kingdom Hybrid / WFH Options
Growing Start up
/CD pipelines with GitLab and GitHub Actions Containerising with Docker and applying best practices for security and performance Monitoring and alerting using Datadog, Prometheus, and Grafana Debugging complex systems using tools like strace, dtrace, and beyond Supporting a tech stack that includes Rust, Python, Go, C++, Java, and more More ❯
leeds, west yorkshire, yorkshire and the humber, United Kingdom
Luupli
e.g., RabbitMQ, Kafka). Deep understanding of API design and best practices (REST, gRPC). Experience with CI/CD pipelines, monitoring tools (e.g., Prometheus, Grafana), and logging systems (e.g., ELK stack). Strong problem-solving, organizational, and communication skills. Preferred: Experience with distributed systems, event-driven architectures, and More ❯
leeds, west yorkshire, yorkshire and the humber, United Kingdom Hybrid / WFH Options
Durlston Partners
is a plus), infrastructure-as-code, and CI/CD tooling Strong scripting and automation experience in Python and Bash Familiarity with observability stacks (Prometheus, OpenTelemetry, eBPF) Cloud infrastructure experience (AWS/GCP/Azure), with attention to IAM and software supply chain security Curious, persistent, and comfortable experimenting at More ❯
leeds, west yorkshire, yorkshire and the humber, United Kingdom
Fruition Group
Hands-on experience with AWS, Kubernetes, Docker, and modern CI/CD pipelines Familiarity with infrastructure-as-code (e.g., Terraform) and observability tooling (e.g., Prometheus, Grafana) Comfortable working on distributed systems and improving developer workflows A product mindset and a collaborative approach to problem-solving Experience with Kafka, gRPC, or More ❯
leeds, west yorkshire, yorkshire and the humber, United Kingdom Hybrid / WFH Options
Fruition Group
delivery. Lead deployment strategies and ensure smooth feature rollouts with minimal downtime. Define and manage monitoring, logging, and telemetry using tools like AWS CloudWatch, Prometheus, and Datadog. Lead incident response and production troubleshooting with a proactive and preventative mindset. Drive automation initiatives with tools like GitLab CI, Terraform/OpenTofu, Ansible More ❯
leeds, west yorkshire, yorkshire and the humber, United Kingdom
SoCode Recruitment
API Management and DevOps Pipelines and AWS including EKS, Lambda and CloudFormation. Infrastructure as Code and GitOps: Terraform, Bicep, Pulumi, ArgoCD and FluxCD. Observability: Prometheus, Grafana, OpenTelemetry and Datadog. Security and Compliance: HashiCorp Vault, Azure Key Vault, AWS KMS, OPA Gatekeeper and Drata or similar. AI Coding Tools: GitHub Copilot More ❯
leeds, west yorkshire, yorkshire and the humber, United Kingdom Hybrid / WFH Options
Ocho
data engineering tools such as Airflow, Pandas, or Spark Exposure to serverless architectures using AWS Lambda Familiarity with monitoring and logging tools (e.g. CloudWatch, Prometheus) Previous experience working in regulated or high-availability environments Location & Flexibility: This role can be fully remote, with optional visits to a UK-based office More ❯
leeds, west yorkshire, yorkshire and the humber, United Kingdom Hybrid / WFH Options
WMtech
GenAI, LLMs, and multimodal systems Architecture: Microservices, RESTful APIs, async programming Infrastructure: Docker, Terraform, GitHub Actions, GCP (preferred) Datastores: MongoDB, Redis Monitoring/Tooling: Prometheus, Grafana, Sentry The role is remote with occasional travel Ready to lead and build with purpose? If you're excited by the idea of applying More ❯
leeds, west yorkshire, yorkshire and the humber, United Kingdom Hybrid / WFH Options
Nscale
HPC container runtimes (e.g., Singularity, Apptainer). Exposure to provisioning and automation tools (e.g., Ansible, PXE, Terraform). Experience with monitoring tools such as Prometheus, Grafana, and DCGM. Understanding of GPU/accelerator toolchains like CUDA or ROCm. A proactive, customer-first mindset with strong communication skills. Ability to work More ❯
leeds, west yorkshire, yorkshire and the humber, United Kingdom Hybrid / WFH Options
Few&Far
and observability tools Bonus Points For Contributions to open-source projects Contributions to an AI product ⚙️ Tech Stack: Golang, GCP, microservices, Kubernetes, Kafka, MongoDB, Prometheus If scalability, security, databases and performance is your thing, looking for high ownership and impact - this role is for you! Please apply with an up More ❯
leeds, west yorkshire, yorkshire and the humber, United Kingdom Hybrid / WFH Options
KPMG UK
GCP) Knowledge of database systems and models. Ability to use a wide variety of open-source technologies. Experience with logging/monitoring tools (Datadog, Stackdriver, Prometheus, etc.). Knowledge of test automation frameworks. To discuss this or wider Technology roles with our recruitment team, all you need to do is apply, create More ❯
facing applications Strong analytical and troubleshooting skills; confident in interpreting logs and audit trails Hands-on experience with monitoring tools such as Grafana, ELK, Prometheus, or similar Experience configuring alerting systems and remediating issues using automation or scripts Strong understanding of incident and problem management processes in production environments Familiarity More ❯
leeds, west yorkshire, yorkshire and the humber, United Kingdom
Advanced Resource Managers
data flows and integration processes. Familiarity with containerization and orchestration tools such as Docker and Kubernetes. Experience with monitoring and alerting tools such as Prometheus, Grafana, or ELK for data infrastructure Knowledge of security practices for handling sensitive data, including encryption, anonymization, and access control. Familiarity with data governance, data More ❯
leeds, west yorkshire, yorkshire and the humber, United Kingdom
SmartSearch
office attendance. VARIED DAY TO DAY RESPONSIBILITIES Ensuring system reliability, performance, and scalability through monitoring and automation Building and maintaining observability solutions using Grafana, Prometheus, Loki, OpenTelemetry Proactively identifying and resolving performance bottlenecks and infrastructure issues Automating infrastructure provisioning, configuration management, and deployments Implementing effective logging, monitoring, and alerting strategies … service-level objectives (SLOs) Experience designing and implementing robust observability, monitoring and logging solutions Strong proficiency with observability and monitoring tools such as Grafana, Prometheus, and Loki Strong experience with distributed tracing and telemetry tools such as OpenTelemetry An understanding of cloud networking architecture and load balancing techniques Experience with More ❯
Leeds, England, United Kingdom Hybrid / WFH Options
Fruition Group
supporting scalable platform infrastructure with tools like Docker, Kubernetes, cloud platforms (AWS, Azure, or GCP background is welcome), Infrastructure as Code (Terraform, Pulumi, etc.), Prometheus, and Grafana. Key Skills & Responsibilities: Drive the technical vision and architectural direction within the team Design, implement, and maintain robust CI/CD pipelines … Lead on the use of Infrastructure as Code for environment provisioning and configuration Champion observability best practices using Prometheus and Grafana Collaborate across multiple internal teams and stakeholders Foster a culture of autonomy, innovation, and continuous improvement Lead by example with a hands-on approach and clear technical guidance Salary More ❯
leeds, west yorkshire, yorkshire and the humber, United Kingdom Hybrid / WFH Options
Harrington Starr
infrastructure. They’re looking to bring on a Site Reliability Engineer with deep experience in observability. If you’ve worked with tools like Prometheus in AWS, supported development teams with tracing and performance insights, and thrive in a high-scale, distributed environment - this could be a great next step. … What You’ll Be Doing: Managing and improving observability tools like Prometheus, Grafana, and CloudWatch Helping product teams with tracing and monitoring to improve performance and reliability Defining and improving SLIs/SLOs, automating tasks, and reducing operational noise Working with AWS (EKS, EC2, Lambda, RDS), Terraform, and CI/… CD tools What They’re Looking For: Experience in SRE or DevOps roles in a production environment Strong knowledge of observability tools, especially Prometheus in AWS Experience with tracing, metrics, and logs to support development teams Skills in Python or Go, and a good understanding of AWS and Kubernetes What More ❯
leeds, west yorkshire, yorkshire and the humber, United Kingdom Hybrid / WFH Options
Uniting Cloud
and Fargate). Driving SRE best practices: SLIs/SLOs, error budgets, reducing toil, and improving observability. Using (and hopefully enjoying!) tools like Datadog, Prometheus, Grafana, and Nix to support your work. What we’re looking for: Strong experience with AWS, Terraform, Docker, and container orchestration (ECS/Fargate). … Good understanding of CI/CD pipelines and DevOps workflows. Solid grasp of SRE principles – SLIs, SLOs, error budgets, observability, etc. Familiarity with Datadog, Prometheus, Grafana, or similar tools. Experience with Nix is a plus (or curiosity to learn it). Bonus if you’ve worked with Azure, GCP, or More ❯
leeds, west yorkshire, yorkshire and the humber, United Kingdom Hybrid / WFH Options
Stealth iT Consulting
and Service Level Objectives (SLOs), ensuring reliability and performance. An understanding of Microservices & container orchestration Strong Observability & Monitoring experience (preferably tools such as Dynatrace, Prometheus or OpenTelemetry) Experience delivering DevOps/SRE Best Practices and cost optimisation proposals Experience in Multi-Cloud, Security & Governance for Cloud Engineering and Operations would … be desired Key Responsibilities: Apply SRE principles effectively (experience with Dynatrace, Prometheus, and OpenTelemetry is a bonus). Define and implement Service Level Indicators (SLIs) and Service Level Objectives (SLOs) in collaboration with development teams. Develop dashboards and configure alerts to monitor system health in real time. Enhance Kubernetes More ❯