London, England, United Kingdom Hybrid / WFH Options
Spectrum IT Recruitment
Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus. Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo. Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck. Experience using configuration management platforms like Ansible, Puppet …
Southampton, Hampshire, South East, United Kingdom Hybrid / WFH Options
Spectrum IT Recruitment Limited
Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus. Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo. Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck. Experience using configuration management platforms like Ansible, Puppet …
London, England, United Kingdom Hybrid / WFH Options
ZipRecruiter
Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus. Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo. Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck. Experience using configuration management platforms like Ansible, Puppet …
in automating data processing tasks. Experience with CI/CD tools (GitHub Actions, Jenkins, AWS CodePipeline), and integrating data-centric workflows. Familiarity with monitoring and logging tools (e.g., Prometheus, Loki, Grafana) in application and data-intensive environments. Proficiency in Configuration Management tools (Chef, Puppet, Ansible) and data orchestration tools (e.g., Airflow, Prefect). Strong background in containerization using Docker …
Terraform, AWS CDK, or CloudFormation to automate cloud resource provisioning, enabling consistent and repeatable infrastructure deployments. Monitoring & Observability: Implement monitoring, logging, and alerting solutions using tools like Prometheus, Grafana, Loki, Datadog, or CloudWatch to ensure system health and performance. Security & Compliance: Implement security best practices for cloud infrastructure, including IAM policies, security groups, and VPC configurations, to ensure compliance … Experience with CI/CD tools such as Jenkins, GitLab CI, or AWS CodePipeline for automated deployment and testing. Familiarity with monitoring and logging tools such as Prometheus, Grafana, Loki, or Datadog. Strong understanding of cloud security best practices and IAM management. Excellent problem-solving and troubleshooting skills with the ability to resolve complex infrastructure and application issues. Strong …
in migrating monolithic applications into microservices architectures. In-depth Linux/Unix experience, emphasizing system performance tuning and automation. Familiarity with monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Loki, OTel, ELK stack) to ensure system reliability and performance. Experience in developing and working with backend application technologies (e.g., Express, Django). Benefits we offer: 23 days’ holiday + …
best practices across cloud and network environments. Troubleshoot deployment and performance issues across multiple environments. Set up and maintain observability tools for logging, monitoring, and alerting (e.g., Prometheus, Grafana, Loki). Contribute to internal tooling to streamline development, testing, and operations workflows. Stay current with DevOps trends and recommend improvements to tools and processes. Required Qualifications: Bachelor's degree … to multi-cloud or hybrid cloud architectures. Tech Stack: Cloud: AWS, OCI; ZTN: Cloudflare; Application: Kong (API Gateway), Java Spring Boot, Python, Go, TypeScript; Monitoring: Prometheus Stack (Prometheus, Grafana, Loki); Compute: ECS, EC2, Lambda; Frontend: S3, CloudFront; Data: Glue, S3, PostgreSQL; CI/CD: GitHub Actions; IaC: Terraform, AWS SAM. Why Join Us? At Intelmatix, you’ll work on …
Ilkley office for occasional office attendance. VARIED DAY TO DAY RESPONSIBILITIES: Ensuring system reliability, performance, and scalability through monitoring and automation; Building and maintaining observability solutions using Grafana, Prometheus, Loki, OpenTelemetry; Proactively identifying and resolving performance bottlenecks and infrastructure issues; Automating infrastructure provisioning, configuration management, and deployments; Implementing effective logging, monitoring, and alerting strategies; Managing incident response and post … error budgets, and service-level objectives (SLOs); Experience designing and implementing robust observability, monitoring and logging solutions; Strong proficiency with observability and monitoring tools such as Grafana, Prometheus, and Loki; Strong experience with distributed tracing and telemetry tools such as OpenTelemetry; An understanding of cloud networking architecture and load balancing techniques; Experience with container orchestration platforms like Kubernetes; Proficiency …
on set targets will be expected. VARIED DAY TO DAY RESPONSIBILITIES: Ensuring system reliability, performance, and scalability through monitoring and automation; Building and maintaining observability solutions using Grafana, Prometheus, Loki, OpenTelemetry; Proactively identifying and resolving performance bottlenecks and infrastructure issues; Automating infrastructure provisioning, configuration management, and deployments; Implementing effective logging, monitoring, and alerting strategies; Managing incident response and post … error budgets, and service-level objectives (SLOs); Experience designing and implementing robust observability, monitoring and logging solutions; Strong proficiency with observability and monitoring tools such as Grafana, Prometheus, and Loki; Strong experience with distributed tracing and telemetry tools such as OpenTelemetry; An understanding of cloud networking architecture and load balancing techniques; Experience with container orchestration platforms like Kubernetes; Proficiency …
a strong background and experience in the following: Observability and SRE Practices: In-depth understanding of observability and Site Reliability Engineering practices. Familiarity with tools in the LGTM stack (Loki, Grafana, Tempo, Mimir) or equivalent observability platforms. Containerisation: Strong experience building and managing containerised applications, effectively leveraging container orchestration platforms such as Kubernetes. Cloud Expertise: Demonstrable ability to architect …
Operators). Cloud platforms (Azure, AWS). CI/CD pipelines (Azure DevOps, Bitbucket Pipelines, GitHub Actions). GitOps (e.g., ArgoCD, FluxCD). Monitoring, logging, and alerting (Prometheus, Grafana, Loki). Scripting and automation (Python, Bash, etc.). Exposure to the following would be beneficial: Multi-cloud or hybrid-cloud architectures. Security practices for cloud-native platforms. Cost optimisation …
The Trading Infrastructure team is building a high-performance, front-to-back Trading Platform that supports multi-asset trading. The platform is designed to handle financial instruments with low-latency execution, robust risk controls, and seamless integration across trading, risk …
working knowledge of public cloud patterns (AWS/EKS, Azure/AKS); container tools (Kubernetes, Docker); pipeline tools (Jenkins, Ansible, Terraform); ancillary (Gatekeeper, SonarQube, HashiCorp Vault); logging and monitoring (Loki, Prometheus, Grafana, Splunk, Dynatrace); scripting (Python, Bash), Go programming language. Corporate Security Responsibility: All activities involving access to Mastercard assets, information, and networks come with an inherent risk to …
London, England, United Kingdom Hybrid / WFH Options
Circadia Health
Experience orchestrating GPU/AI workloads, MLOps, or large‐language‐model serving. Knowledge of edge/IoT deployments and over‐the‐air update strategies. Exposure to observability stacks (OpenTelemetry, Loki) and security tooling (Falco, Aqua, Wiz). What We Offer: Base salary £100,000 – £170,000 plus meaningful equity. Gym membership. Comprehensive health, dental & vision coverage (UK & global travel …
We are seeking a skilled and proactive Kubernetes Platform Engineer to provide configuration and maintenance support for Kubernetes clusters for our high-availability environments for the EMEA region. Working closely with development, operations, infrastructure and security teams, you will help …
London, England, United Kingdom Hybrid / WFH Options
Bright Purple
and build a new cloud-native IaC platform. Develop software using technologies such as Docker Compose, Terraform, Kubernetes (K8s), Python, and Go. Provision and orchestrate open-source services including Loki, Redis, Grafana, Authentik, Netbird, among others. Design and implement CI/CD pipelines to streamline deployment processes. Initially focus on AWS environments, with the goal of creating a solution …
code tools (e.g., Terraform, Helm, Bash, Python). • Solid understanding of microservices, zero-trust security, mTLS, RBAC, and network policies. • Experience with CI/CD tools, logging (e.g., Fluentd, Loki), and monitoring (e.g., Prometheus, Grafana). About us: Ascendion is a global, leading provider of AI-first software engineering services, delivering transformative solutions across North America, APAC, and Europe. …
code tools (e.g., Terraform, Helm, Bash, Python). Solid understanding of microservices, zero-trust security, mTLS, RBAC, and network policies. Experience with CI/CD tools, logging (e.g., Fluentd, Loki), and monitoring (e.g., Prometheus, Grafana). About us: Ascendion is a global, leading provider of AI-first software engineering services, delivering transformative solutions across North America, APAC, and Europe. …
Demonstrated expertise in the process of containerization for applications and their subsequent orchestration within Kubernetes environments. Experience working on at least one monitoring/observability stack (Datadog, ELK, Splunk, Loki, Grafana). Strong knowledge of Unix or Linux. Strong communication skills to collaborate with various stakeholders. Able to work independently in a fast-paced environment. Detail-oriented, organized, demonstrating …