automation, infrastructure provisioning and tooling to enhance development efficiency. You will manage Platform Reliability and Infrastructure, ensuring a reliable and stable platform. You will oversee YouLend's Security and Observability frameworks, focusing on platform security, maintaining observability, and providing dashboards for developers to monitor service health. The ideal candidate is someone who has successfully built and scaled platform architectures, led … the ability to work across technical and non-technical teams. Excellent communication skills, with the ability to translate complex technical concepts to business stakeholders. Operational Focus: Expertise in platform observability, monitoring, incident management, and creating highly reliable systems. Experience implementing SLAs, SLOs, and SLIs is a plus. Security & Compliance: In-depth understanding of platform security, data privacy, and regulatory compliance
Stoxx's GCP platform infrastructure. Ensure the platform's scalability, reliability, and efficiency meet business and client requirements. Develop, build and support a robust CI/CD pipeline and observability stack. Be the go-to person for the most critical Platform issues, leading cross-functional teams where necessary, to deliver best-in-class engineering solutions. Drive continuous improvement initiatives to … Experience working in a global or multinational team setting. Strong documentation, communication and collaboration skills. Proven ability to drive innovation and continuous improvement initiatives. Focus on simplicity, automation and observability. Expertise in Python, GitHub Actions, Apigee, Airflow. Expertise in Observability tooling such as Prometheus/Grafana, ELK, Splunk or similar. Bachelor's or Master's degree in Computer Science or
You’ll Be Responsible For: As a Senior SRE, you’ll lead initiatives that: Ensure availability, latency, and performance of mission-critical systems across cloud and hybrid environments. Architect observability solutions (monitoring, logging, alerting) that detect and prevent failures before they impact users. Own and improve incident response workflows, including runbooks, communications, and root cause analysis. Define and enforce SLIs … using tools such as Azure DevOps, GitHub Actions, Jenkins, or GitLab. Lead the design and delivery of resilient, scalable infrastructure using IaC (Terraform, Bicep, etc.). Develop automation and observability tooling that enables fast feedback loops and minimal manual intervention. Strategic & Advisory: Define infrastructure architecture to support fault-tolerant applications. Collaborate with developers, architects, and product teams to embed reliability
London, England, United Kingdom Hybrid / WFH Options
Ikerian
scalable AWS cloud environments and services. Manage and prioritise tasks in the cloud infrastructure backlog to address immediate needs and plan long-term improvements. Set up infrastructure monitoring and observability solutions, proactively addressing availability, performance or security issues. Assess new technologies, systems, and services for production readiness, ensuring seamless and stable integration. Prepare and maintain documentation on cloud processes, procedures … CI/CD pipelines and tools, including GitLab (preferred), GitHub Actions, Jenkins, etc. Basic understanding of cloud networking concepts, including VPC, Subnets, and Load Balancing. Familiarity with monitoring and observability tools for cloud environments, such as Grafana, Prometheus, OpenSearch, and the ELK stack. Strong analytical and problem-solving skills, with a proactive approach to challenges. A genuine interest in staying
London, England, United Kingdom Hybrid / WFH Options
CFP Energy
and enhance CI/CD pipelines, infrastructure/app templates, and automation workflows. Explore and integrate emerging technologies to evolve our platform offerings and support developer needs. Fine-tune observability tools to resolve issues quickly and deliver actionable alerts to the right people. Infrastructure as Code (IaC): Proven experience with cloud infrastructure automation (Terraform and Azure preferred). Kubernetes: Proficiency … GitOps workflows and Helm charts. Security: Hands-on experience with token/secret management tools (e.g., HashiCorp Vault, Azure Key Vault) and SSO/authentication systems (e.g., Okta). Observability: Hands-on experience with platforms like DataDog, Grafana, or Azure Monitor. Networking: Strong understanding of networking principles, DNS, and related technologies. CI/CD: Skilled in creating and maintaining CI
Architect for Scale & Resilience: Make critical decisions on system design and performance to support a growing platform with increasing complexity and scale. Elevate Operational Maturity: Lead improvements to monitoring, observability, and developer workflows - ensuring backend systems are resilient and teams can ship confidently. Embed Security by Design: Take responsibility for backend security posture, ensuring systems meet best practices and compliance … and SQS. Infrastructure as Code: Experience with Terraform or similar tools for infrastructure automation. High-Throughput Systems: Strong experience in real production projects handling large-scale data flows. Monitoring & Observability: Proficiency in tools like Datadog, Prometheus, and Grafana. Security & Networking: Solid understanding of networking principles, security best practices, and cloud security. Agile & Fast-Paced Environments: Experience in agile teams, working
with cross-functional teams for requirements. Review code to maintain quality and provide constructive feedback. Manage CI/CD pipelines for automated deployments and reliability. Monitor system health with observability tools and address issues proactively. Engage with stakeholders for alignment on project goals and updates. Research new technologies to improve the Snowplow ecosystem. We’d Love to Hear From You … data processing pipelines. Experience with Kubernetes, particularly in the context of data processing workflows. Knowledge of Snowplow products and services. Experience with data analytics platforms and tools. Expertise with observability tools like Grafana and Sentry. What We Offer You in Return: A competitive package, including share options. Flexible working. A generous holiday allowance no matter where you are in the
London, England, United Kingdom Hybrid / WFH Options
Elwood Technologies
environment. Automate manual processes and workflows, reducing operational overhead. Work closely with engineering teams to design and deploy scalable, fault-tolerant infrastructure solutions on AWS or GCP. Improve observability by utilizing monitoring, logging, and alerting systems (e.g., CloudWatch, Datadog). Lead post-incident reviews, contribute to the continuous improvement of system reliability and follow up on strategic fixes. Develop … you have experience of some or all of the following: Experience with client-impact triage, working cross-functionally with account managers or product teams. Proficiency with Datadog or similar observability platforms. Knowledge of serverless architectures (e.g., AWS Lambda, GCP Cloud Functions). Familiarity with RDBMS and NoSQL databases, such as RDS, CloudSQL, DynamoDB. Prior experience in fintech, trading platforms, or
and advocating for the best solutions that improve developer productivity and system efficiency. Infrastructure Automation & Management: Use Terraform/OpenTofu and automation frameworks to provision and manage infrastructure. Monitoring & Observability: Configure and utilise observability tools like Datadog for performance monitoring, alerting, and visualisation, ensuring system reliability and quick identification of issues. Performance Optimisation: Continuously monitor the performance of the tools
London, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
and will help clients adopt modern DevOps practices with a strong emphasis on automation, self-service, and operational excellence. Tech You'll Use: Terraform & GitHub Actions CI/CD, observability tooling (Grafana, Prometheus), containerisation (Docker). What You'll Be Doing: Designing and implementing secure, resilient AWS infrastructure. Building CI/CD pipelines and reusable deployment patterns. Advising on cloud-native
For: 3+ years’ hands-on experience with Solace PubSub+ in a production environment. Strong knowledge of WAN-based distributed systems and networking fundamentals. Experience with Prometheus and Grafana for observability and alerting. Confident in Linux/Unix systems and scripting (Bash, Python, etc.). Excellent problem-solving instincts and attention to detail. Strong communicator who works well across technical teams. Bonus
governance compliance. Utilize AWS, containerization (e.g., Docker), and Infrastructure as Code tools like Terraform and Ansible for performance and cost optimization. Implement best practices in DevOps and DevSecOps, including observability, security, networking, API integration, and disaster recovery. Mentor junior engineers and contribute to technical leadership, preferably with experience in broadcast workflows, audio/video streaming, and Agile methodologies. Key Requirements
CloudFormation or ARM templates. Scripting & Automation - Proficient in PowerShell, Bash, or Python. Infrastructure as Code (IaC) - Hands-on experience with Terraform, Bicep, or ARM. Certified: Terraform Associate preferred. Monitoring & Observability - Familiarity with tools like Azure Monitor, AWS CloudWatch, Prometheus, Grafana. Security & Compliance - Strong understanding of IAM, cloud security, compliance frameworks. For immediate consideration apply now!
Burton-On-Trent, Staffordshire, West Midlands, United Kingdom
Amtis Professional Ltd
CloudFormation or ARM templates. Scripting & Automation - Proficient in PowerShell, Bash, or Python. Infrastructure as Code (IaC) - Hands-on experience with Terraform, Bicep, or ARM. Certified: Terraform Associate preferred. Monitoring & Observability - Familiarity with tools like Azure Monitor, AWS CloudWatch, Prometheus, Grafana. Security & Compliance - Strong understanding of IAM, cloud security, compliance frameworks. For immediate consideration apply now
Infrastructure as Code and automation (e.g., CloudFormation, Terraform, Ansible, Python, Bash) 3) DevOps pipelines, CI/CD tooling, and containerization (e.g., GitLab, Jenkins, Docker, Kubernetes) 4) Monitoring and observability in production environments (e.g., CloudWatch, Splunk, Prometheus) 5) Security, cost optimization, and disaster recovery in cloud environments Ideal Experience: 1) Experience in managing live production workloads in AWS 5) Experience deploying
and postmortems to learn from system failures and prevent recurrence. Participate in on-call rotations and respond to incidents, minimising downtime and customer impact. Continuously improve deployment, configuration, and observability processes. Qualifications: Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience. Strong experience with Linux/Unix systems administration. Proficient in scripting and programming languages
Edinburgh, Scotland, United Kingdom Hybrid / WFH Options
JR United Kingdom
ideally with Terraform or CloudFormation. Hands-on experience with CI/CD pipelines and automation tooling. Background in containerisation and orchestration – e.g., Docker, Kubernetes. Familiarity with monitoring, alerting, and observability tools (e.g., Prometheus, Grafana, CloudWatch). Proven ability to troubleshoot and resolve complex infrastructure issues. Experience working in cross-functional engineering teams, ideally in a DevOps or SRE capacity. Strong
Cambourne, Cambridgeshire, United Kingdom Hybrid / WFH Options
Remotestar
strong track record of building and maintaining highly reliable infrastructure and services. Expertise in incident management, including incident response, resolution, and post-mortem analysis. Proficiency in monitoring, alerting, and observability tools such as Prometheus, Grafana, the ELK stack, or Datadog. Experience with cloud platforms such as AWS, Azure, or GCP, including infrastructure as code tools like Terraform or CloudFormation. Strong scripting
London, England, United Kingdom Hybrid / WFH Options
Arcus Search
and will help clients adopt modern DevOps practices with a strong emphasis on automation, self-service, and operational excellence. Tech You'll Use: Terraform & GitHub Actions CI/CD, observability tooling (Grafana, Prometheus), containerisation (Docker). What You'll Be Doing: Designing and implementing secure, resilient AWS infrastructure. Building CI/CD pipelines and reusable deployment patterns. Advising on cloud-native
London, England, United Kingdom Hybrid / WFH Options
Anson McCade Pty
to automate provisioning. • Deploy and manage Kubernetes solutions, including AKS, EKS, and OpenShift. • Implement DevSecOps practices, integrating CI/CD pipelines and security controls. • Optimize cloud environments using FinOps, observability tooling, and SRE methodologies. • Work closely with Cloud Architects, Engineers, and Business Leaders to build scalable, high-performance platforms. • Enhance networking and security capabilities across hybrid cloud environments. The ideal
some experience in a mentorship or managerial position. Strong knowledge of cloud platforms (AWS, GCP, Azure) and modern infrastructure technologies (Kubernetes, Docker, Terraform). Expertise in monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk). Proficiency in at least one programming or scripting language (e.g., Python, Go, Bash). Deep understanding of networking, databases, and distributed systems. Strong