London, England, United Kingdom Hybrid / WFH Options
Spectrum IT Recruitment
Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus. Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo. Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck. Experience using configuration management platforms like Ansible, Puppet
Southampton, Hampshire, South East, United Kingdom Hybrid / WFH Options
Spectrum It Recruitment Limited
Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus. Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo. Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck. Experience using configuration management platforms like Ansible, Puppet
Portsmouth, England, United Kingdom Hybrid / WFH Options
Spectrum IT Recruitment
Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus. Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo. Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck. Experience using configuration management platforms like Ansible, Puppet
London, England, United Kingdom Hybrid / WFH Options
ZipRecruiter
Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus. Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo. Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck. Experience using configuration management platforms like Ansible, Puppet
in automating data processing tasks. Experience with CI/CD tools (GitHub Actions, Jenkins, AWS CodePipeline) and integrating data-centric workflows. Familiarity with monitoring and logging tools (e.g., Prometheus, Loki, Grafana) in application and data-intensive environments. Proficiency in configuration management tools (Chef, Puppet, Ansible) and data orchestration tools (e.g., Airflow, Prefect). Strong background in containerization using Docker
Baltimore, Maryland, United States Hybrid / WFH Options
Archesys Inc
maintaining complex Grafana dashboards. Strong proficiency in at least one backend programming language (e.g., Python, Go, Java, Node.js). Extensive experience with various data sources for Grafana (e.g., Prometheus, Loki, Splunk, SQL databases, CloudWatch). Deep hands-on experience with AWS cloud services, including but not limited to EC2, ECS/EKS, Lambda, S3, RDS, CloudWatch, Kinesis, DynamoDB. Proven
Terraform, AWS CDK, or CloudFormation to automate cloud resource provisioning, enabling consistent and repeatable infrastructure deployments. Monitoring & Observability: Implement monitoring, logging, and alerting solutions using tools like Prometheus, Grafana, Loki, Datadog, or CloudWatch to ensure system health and performance. Security & Compliance: Implement security best practices for cloud infrastructure, including IAM policies, security groups, and VPC configurations, to ensure compliance … Experience with CI/CD tools such as Jenkins, GitLab CI, or AWS CodePipeline for automated deployment and testing. Familiarity with monitoring and logging tools such as Prometheus, Grafana, Loki, or Datadog. Strong understanding of cloud security best practices and IAM management. Excellent problem-solving and troubleshooting skills with the ability to resolve complex infrastructure and application issues. Strong
in migrating monolithic applications into microservices architectures. In-depth Linux/Unix experience, emphasizing system performance tuning and automation. Familiarity with monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Loki, OTel, ELK stack) to ensure system reliability and performance. Experience in developing and working with backend applications technologies (e.g. Express, Django). Benefits we offer: 23 days’ holiday +
best practices across cloud and network environments. Troubleshoot deployment and performance issues across multiple environments. Set up and maintain observability tools for logging, monitoring, and alerting (e.g., Prometheus, Grafana, Loki). Contribute to internal tooling to streamline development, testing, and operations workflows. Stay current with DevOps trends and recommend improvements to tools and processes. Required Qualifications: Bachelor's degree … to multi-cloud or hybrid cloud architectures. Tech Stack: Cloud: AWS, OCI; ZTN: Cloudflare; Application: Kong (API Gateway), Java Spring Boot, Python, Go, TypeScript; Monitoring: Prometheus Stack (Prometheus, Grafana, Loki); Compute: ECS, EC2, Lambda; Frontend: S3, CloudFront; Data: Glue, S3, PostgreSQL; CI/CD: GitHub Actions; IaC: Terraform, AWS SAM. Why Join Us? At Intelmatix, you’ll work on
Ilkley office for occasional office attendance. VARIED DAY TO DAY RESPONSIBILITIES: Ensuring system reliability, performance, and scalability through monitoring and automation. Building and maintaining observability solutions using Grafana, Prometheus, Loki, OpenTelemetry. Proactively identifying and resolving performance bottlenecks and infrastructure issues. Automating infrastructure provisioning, configuration management, and deployments. Implementing effective logging, monitoring, and alerting strategies. Managing incident response and post … error budgets, and service-level objectives (SLOs). Experience designing and implementing robust observability, monitoring and logging solutions. Strong proficiency with observability and monitoring tools such as Grafana, Prometheus, and Loki. Strong experience with distributed tracing and telemetry tools such as OpenTelemetry. An understanding of cloud networking architecture and load balancing techniques. Experience with container orchestration platforms like Kubernetes. Proficiency
on set targets will be expected. VARIED DAY TO DAY RESPONSIBILITIES: Ensuring system reliability, performance, and scalability through monitoring and automation. Building and maintaining observability solutions using Grafana, Prometheus, Loki, OpenTelemetry. Proactively identifying and resolving performance bottlenecks and infrastructure issues. Automating infrastructure provisioning, configuration management, and deployments. Implementing effective logging, monitoring, and alerting strategies. Managing incident response and post … error budgets, and service-level objectives (SLOs). Experience designing and implementing robust observability, monitoring and logging solutions. Strong proficiency with observability and monitoring tools such as Grafana, Prometheus, and Loki. Strong experience with distributed tracing and telemetry tools such as OpenTelemetry. An understanding of cloud networking architecture and load balancing techniques. Experience with container orchestration platforms like Kubernetes. Proficiency
GCP). CI/CD pipelines (Azure DevOps, GoCD, Jenkins). Version control systems (Git, Bitbucket, GitHub). Configuration management (Chef, Ansible, Puppet). Container orchestration (Kubernetes, AKS). Logging and monitoring (New Relic, Grafana, Loki, Splunk). Nice to have: Experience as a DevOps Engineer in the healthcare industry. Object-oriented programming languages and design patterns. Familiarity with HIPAA requirements and ePHI security considerations. Knowledge
Experience in setting up and maintaining continuous integration and continuous deployment pipelines with tools such as GitHub Actions. Experience setting up and administering monitoring tools such as Prometheus, Grafana, Loki, etc. Production experience building and maintaining cloud infrastructure. Strong understanding of Kubernetes, specifically AKS and Helm. Strong understanding of Terraform use cases and best practices. Experience setting up production
a strong background and experience in the following: Observability and SRE Practices: In-depth understanding of observability and Site Reliability Engineering practices. Familiarity with tools in the LGTM stack (Loki, Grafana, Tempo, Mimir) or equivalent observability platforms. Containerisation: Strong experience building and managing containerised applications, effectively leveraging container orchestration platforms such as Kubernetes. Cloud Expertise: Demonstrable ability to architect
Operators). Cloud platforms (Azure, AWS). CI/CD pipelines (Azure DevOps, Bitbucket Pipelines, GitHub Actions). GitOps (e.g., ArgoCD, FluxCD). Monitoring, logging, and alerting (Prometheus, Grafana, Loki). Scripting and automation (Python, Bash, etc.). Exposure to the following would be beneficial: Multi-cloud or hybrid-cloud architectures. Security practices for cloud-native platforms. Cost optimisation
The Trading Infrastructure team is building a high-performance, front-to-back Trading Platform that supports multi-asset trading. The platform is designed to handle financial instruments with low-latency execution, robust risk controls, and seamless integration across trading, risk
working knowledge of public cloud patterns (AWS/EKS, Azure/AKS); container tools (Kubernetes, Docker); pipeline tools (Jenkins, Ansible, Terraform); ancillary (Gatekeeper, SonarQube, HashiCorp Vault); logging and monitoring (Loki, Prometheus, Grafana, Splunk, Dynatrace); scripting (Python, Bash); Go programming language. Corporate Security Responsibility: All activities involving access to Mastercard assets, information, and networks come with an inherent risk to
disaster recovery initiatives. Working knowledge of cloud-native storage solutions such as Longhorn. Strong Linux administration skills, particularly with RHEL environments. Experience implementing comprehensive observability solutions using Prometheus, Grafana, Loki, and related tools. Ability to establish and enforce security policies through tools like Open Policy Agent. Knowledge of identity management solutions such as Keycloak. Experience managing artifact repositories including
CD systems (Jenkins, Atlassian Bitbucket Cloud, GitLab, Azure DevOps); Experience with GitOps tools (ArgoCD, Flux); Knowledge of the network stack; Experience with virtualization systems; Experience using logging systems: Loki, ELK stack; Understanding of basic software development processes; Knowledge and practice with applying DevSecOps methodologies; Knowledge of JSON, XML, and YAML formats; Experience with Git; Understanding of building tools for
London, England, United Kingdom Hybrid / WFH Options
Circadia Health
Experience orchestrating GPU/AI workloads, MLOps, or large‐language‐model serving. Knowledge of edge/IoT deployments and over‐the‐air update strategies. Exposure to observability stacks (OpenTelemetry, Loki) and security tooling (Falco, Aqua, Wiz). What We Offer: Base salary £100,000 – £170,000 plus meaningful equity. Gym membership. Comprehensive health, dental & vision coverage (UK & global travel
CI, etc.). Manage and scale infrastructure with DigitalOcean, with occasional use of AWS. Write and enforce Infrastructure-as-Code (Terraform, CloudFormation). Monitor environments with tools like Grafana, Prometheus, and Loki. Automate logging, backups, and environment provisioning. Secure infrastructure with firewalls, secret management (Vault), and Cloudflare integrations. Ensure compliance alignment (ISO 27001, GDPR, SOC2, HIPAA). Streamline deployments for high-performance … Actions, GitLab CI, Jenkins). Proficiency with DigitalOcean (primary) and/or AWS (secondary). Strong experience with Docker, Kubernetes, and Linux environments. Familiarity with monitoring/logging systems: Grafana, Prometheus, Loki. Experience with Infrastructure-as-Code (Terraform, Ansible). Bonus: knowledge of Cloudflare, CDN setup, DNS and SSL/TLS. Scripting skills: Bash, Python, or Go. You work independently, think in