London, South East, England, United Kingdom Hybrid / WFH Options
Become
…Azure, or GCP) and containerisation (e.g., Docker, Kubernetes). Experience with Infrastructure as Code tools (e.g., Terraform, Ansible, CloudFormation). Familiarity with monitoring, logging, and observability tools (e.g., Prometheus, Grafana, ELK, Datadog). Experience working in regulated environments such as banking, fintech, or insurance. Prior experience working in or contributing to a Centre of Excellence team. Strong scripting skills (e.g., Bash, Python) and …
…and other relevant tools. Security Best Practices: IAM, MFA, data encryption, firewall configurations. Programming/Scripting: Python, Terraform, or similar languages. Event-Driven Architectures: Kafka. Monitoring and Logging: Datadog, ELK Stack, Prometheus, etc. Experience in agile methodologies and DevOps practices. Location: Hybrid. Office located in London (Hayes area). Office presence required: Yes. Frequency: 2-3 times a week at …
Proficiency in scripting and automation using Python, Bash, or Go. Experience with Infrastructure as Code (Terraform, CloudFormation, or Ansible). Familiarity with monitoring, logging, and observability tools (Prometheus, Grafana, Datadog, ELK, etc.). Strong understanding of networking concepts (VPC, Load Balancers, DNS, Firewalls). Experience with DevOps methodologies, CI/CD pipelines, and GitOps practices. Experience with high-performance and …
…needed. About You: 5+ years' experience in Site Reliability Engineer roles. Expert+ level Linux administration, scripting, and troubleshooting. Demonstrable knowledge of Observability tools (Prometheus/Grafana, New Relic, Splunk, Datadog). Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route 53, Fargate, ALB/NLB distributions, etc.). Extensive experience with cloud automation and infrastructure-as-code …
…CI/CD tools such as GitLab CI, CircleCI, GitHub Actions, and GitOps using ArgoCD or FluxCD. Troubleshooting and debugging microservices and serverless applications using observability tooling such as Splunk and Datadog. Managing ephemeral secrets and credentials using HashiCorp Vault. Managing least-privileged access to cloud resources using TPAM solutions such as HashiCorp Boundary. Bonus points for experience with: production experience architecting …
…roles. 3+ years' experience with an object-oriented language (preferably Java, .NET or C++). Expert+ level Linux administration, scripting, and troubleshooting. Demonstrable knowledge of Observability tools (New Relic, Splunk, Datadog). Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route 53, Fargate, ALB/NLB distributions, etc.). Extensive experience with cloud automation and infrastructure-as-code …
…roads to help teams get their apps up and running quickly in a consistent manner. Event-Driven: We share data through an event-driven system powered by MSK. Observability: Datadog is used for comprehensive logging and monitoring. Databases: We use a combination of MongoDB and AWS relational databases. Automation and CI/CD: Deployments are highly automated using Jenkins pipelines and GitHub …
…of the React framework, related patterns and best practices. Good understanding of UI/UX best practices and considerations. Understanding of front-end observability with tools like Sentry, LogRocket, Datadog, or New Relic. Experience with CI/CD pipelines and tools such as GitHub Actions and ArgoCD. Awareness of common front-end security risks (e.g., XSS, CSRF). Passion for writing clean, modular, scalable …
…stack; Experience with AWS Cloud services; Experience with Bash or Python scripting; Experience with CI/CD tools such as GitLab CI; Familiar with application performance monitoring tools like Datadog and New Relic; Familiar with Docker orchestrators such as Amazon ECS or Kubernetes; Familiar with Git; Ability to solve issues with clear methods while knowing when to take intuitive leaps. Nice …
…Kubernetes layers. Orchestrate post-incident reviews: document findings, define mitigation plans, and drive tickets to resolution. Reliability Engineering & Automation: Develop and maintain robust observability for front-end components by integrating Datadog. Define SLIs/SLOs for page load times, Time to Interactive, and error rates; build alerting that balances sensitivity with noise reduction. Automate deployments via CI/CD …
…stage environments preferred. Nice to Have: Experience scaling engineering orgs across multiple geographies or domains (e.g., front-end, back-end, infrastructure). Familiarity with tools like Linear, Asana, GitHub, Datadog, DORA metrics, or similar performance/observability platforms. Background in organisational change management or engineering program management. What you can expect from us: Competitive salary with substantial incentive schemes. Generous …
…level production incidents. The Person: 5+ years in SRE, DevOps, or infrastructure engineering. Strong experience with AWS, EKS/Kubernetes, and Terraform. Familiar with Kafka and observability tools like Datadog or Grafana. Able to troubleshoot issues across infrastructure and application layers. Reference number: BBBH259300. To apply for this role or to be considered for further roles, please click "Apply …
City of London, London, United Kingdom Hybrid / WFH Options
Rise Technical Recruitment
…level production incidents. The Person: 5+ years in SRE, DevOps, or infrastructure engineering. Strong experience with AWS, EKS/Kubernetes, and Terraform. Familiar with Kafka and observability tools like Datadog or Grafana. Able to troubleshoot issues across infrastructure and application layers. Reference number: BBBH(phone number removed). To apply for this role or to be considered for further roles …
Employment Type: Permanent
Salary: £80,000 - £90,000/annum, plus 38 days' holiday, healthcare, and pension
London, South East, England, United Kingdom Hybrid / WFH Options
Rise Technical Recruitment Limited
…level production incidents. The Person: 5+ years in SRE, DevOps, or infrastructure engineering. Strong experience with AWS, EKS/Kubernetes, and Terraform. Familiar with Kafka and observability tools like Datadog or Grafana. Able to troubleshoot issues across infrastructure and application layers. Reference number: BBBH259300. To apply for this role or to be considered for further roles, please click "Apply …
North West London, London, United Kingdom Hybrid / WFH Options
ByteHire
…of infrastructure setup and management. Exposure to designing or building distributed systems, preferably in a cloud environment. Company Tech Stack: PHP, Laravel, ReactJS, TypeScript, Inertia, WordPress, MySQL, Redis, Elasticsearch, Datadog, AWS, Terraform, Docker. Benefits: Hybrid working, 1-2 days per week in the London office. Collaborate directly with the founding team and take ownership of product features. Be part of …
…and technical perspectives. Experience mentoring engineers through pairing, code reviews, and knowledge-sharing. Familiarity with CI/CD pipelines, automated testing strategies, and observability tools (e.g., GitHub Actions, Sentry, Datadog). A mindset geared toward experimentation, measurement, and continuous improvement, especially within growth-driven product teams. Nice to Have: Previous experience working in a start-up/scale-up environment. …
…up to browser extensions and web applications. Develop software to analyse and interpret cryptocurrency usage behaviours and trends on the clear and dark web. Implement observability mechanisms (we use Datadog) to detect problems in your environment(s), and run the associated business processes to resolve them. Work with the existing engineers on your team to foster their growth and development, and …
…management skills, with the ability to lead through influence. Experience in scaling teams across different domains or geographies is a strong plus. Familiarity with tools such as GitHub, Asana, Datadog, Linear, and DORA metrics is desirable. A background in organizational change or transformation initiatives is an advantage. Competitive salary with substantial performance-based incentives. Generous Long-Term Incentive Plan (LTIP …
…stack: TypeScript, React, GraphQL, Postgres, React Native, Terraform, AWS, and GitHub Actions. Apply advanced knowledge of algorithms, data structures, and design patterns. Utilise expert debugging skills with tools like Datadog, ensuring robust error handling. Collaborate and communicate: Foster clear, effective communication within the engineering team and across the business. Actively engage in discussions, provide technical and pastoral support, and drive …
City of Westminster, London, United Kingdom Hybrid / WFH Options
Track24 Limited
…InfoSec team to maintain security best practices. Containerisation & Orchestration: Deploy and manage containerised applications using Docker and container orchestration tools. Observability & Monitoring: Provision and maintain observability platforms such as Datadog, Splunk, or New Relic to gain monitoring and performance insights. Incident Management: Establish and oversee monitoring and incident management processes to ensure system reliability. Site Reliability Engineering (SRE): Perform SRE …
Central London, London, United Kingdom Hybrid / WFH Options
Eligo Recruitment Limited
…Bring: Strong experience with GCP, Terraform, and Infrastructure-as-Code. Deep knowledge of cloud networking, security automation, and compliance standards. Proficiency in CI/CD pipelines, monitoring tools (Grafana, Datadog), and scripting. A collaborative mindset with excellent communication and mentoring skills. Why Join? Shape a next-gen AI infrastructure with autonomy and purpose. Hybrid working with regular meetups in our …