DevOps & SRE Practices Experience implementing CI/CD pipelines and DevOps methodologies Knowledge of infrastructure monitoring (Datadog), log aggregation, and incident management Understanding of SLO/SLA definition and observability best practices Strategic & Business Acumen Ability to align technical initiatives with business objectives and articulate ROI Experience creating technical roadmaps and conducting cost-benefit analyses Track record presenting to C …
using tools such as Terraform or CloudFormation. * Implement CI/CD pipelines, enabling continuous integration and continuous deployment for mission-critical applications. * Monitor system performance, availability, and security, implementing observability best practices. * Work in an Agile environment, engaging with stakeholders to understand requirements and deliver iterative improvements. Your skills and experience Essential: * Experience deploying and managing cloud infrastructure on AWS …
City of London, London, United Kingdom Hybrid/Remote Options
Client Server
supporting gameplay, user management, platform and content management systems, collaborating with product and game teams to ensure alignment of features with backend architecture and with DevOps to ensure uptime, observability and deployment reliability. This is a senior role where you'll take ownership of complex systems and proactively address potential performance and scalability bottlenecks. Location/WFH: You can work …
City of London, London, United Kingdom Hybrid/Remote Options
83zero Limited
of experimentation, curiosity, and bold thinking. The Role As Dev/Ops Lead, you'll: Architect and optimise a high-scale, cloud-native PaaS. Champion CI/CD, automation, observability, and reliability. Design and maintain secure, performant public APIs. Build real-time, distributed systems on AWS (Lambda, DynamoDB, Kinesis, ECS/EKS). Mentor a cross-functional team and elevate …
City of London, London, United Kingdom Hybrid/Remote Options
Hargreaves Lansdown
our services. About You Experience building and deploying services with Java and Spring Boot. Comfort working in a cloud-native environment - Kubernetes (EKS), containers, scaling etc. An understanding of observability, using tools like Prometheus and Grafana to keep services healthy and understand usage patterns. Familiarity with some AWS services and how to integrate them into modern applications. A keen focus …
treating pods as first-class citizens in a highly available, BGP-based leaf/spine fabric Lead systems administration on Ubuntu and RHEL systems, tuning OS configurations for performance, observability, and compliance Develop services like DNS-based discovery and egress gateways for smart routing of financial traffic Automate infrastructure using Terraform, Ansible, Git, and CI/CD tooling What we …
London (City of London), South East England, United Kingdom Hybrid/Remote Options
Gravitee
Helm Charts Cloud experience (AWS and/or Azure) Even better if you also have skills across: Certificate management (ZeroSSL, Let's Encrypt) Argo Workflows & ArgoCD Continuous Delivery tooling Observability tools (Grafana, Prometheus) ESSENTIAL SKILLS The right candidate will possess at least the following skills, if not more: 3+ years of professional experience in infrastructure management Fluent with creating and …
City of London, Greater London, UK Hybrid/Remote Options
IG Group
shifts between hands-on coding (building tools, automation, and infrastructure) and incident response, performance optimisation, and operational excellence. What you'll do System Reliability & Performance Implement comprehensive monitoring and observability using OpenTelemetry standards Identify single points of failure in distributed systems Analyse system performance across OS and network layers, identifying resource utilisation patterns and bottlenecks to optimise efficiency Define and …
City of London, London, England, United Kingdom Hybrid/Remote Options
Lorien
and implement robust testing (unit/integration/contract). Collaborate closely with trading, risk, and operations to refine requirements and ship increments in Agile sprints. Harden production with observability (logging/metrics/tracing), CI/CD, and secure-by-design patterns. Own features end-to-end, from design and documentation to deployment and support. What you'll bring …
with Product, Data Science, and Operations teams Mentor developers, promote best practices, and improve engineering workflows Shape technical strategy and contribute to long-term system improvements Drive code quality, observability, and resiliency across services Tech Stack Frontend: React, JavaScript/TypeScript Backend: Python (FastAPI, Flask, or Django), ideally with geospatial data processing Cloud: AWS (Lambda, ECS, RDS, S3, API Gateway …
City, London, United Kingdom Hybrid/Remote Options
Sky
a cross-functional agile team, using source control with appropriate branching and commit strategies. Experience setting up, configuring and maintaining all parts of the development process, e.g. tracing, monitoring, observability, automated CI/CD pipelines, release tooling, APIs and backend integrations. Experience refactoring complex codebases, reducing technical debt and improving code quality over time. Experience taking ownership of complex projects …
other internal teams to fully understand client requirements and deliver tailored technical solutions. Design and implement scalable, future-proof architectures for new third-party connectors and integrations. Enhance system observability by improving diagnostics, logging, and tracing to aid technical support teams in resolving issues swiftly. Oversee the ongoing development and management of the public API, covering REST and event streaming …
DevOps, infrastructure, and platform engineering. Tech Stack Cloud: AWS (EC2, RDS, S3, IAM, CloudWatch, Lambda) Infrastructure as Code: Terraform Containerisation & Orchestration: Docker, Kubernetes (EKS), Helm Configuration Management: Ansible Monitoring & Observability: Grafana, Prometheus CI/CD: GitHub Actions Automation & Scripting: Python, Bash, Go or Java What We’re Looking For Proven experience running AWS cloud infrastructure in a production or regulated … financial) environment. Hands-on experience managing Kubernetes clusters (preferably EKS). Strong understanding of Infrastructure as Code using Terraform. Familiarity with monitoring and observability stacks such as Prometheus and Grafana. Experience building and maintaining CI/CD pipelines (GitHub Actions or similar). Strong scripting or automation skills using Python, Bash, Go or Java. A collaborative mindset, comfortable working …
AWS (Core Services – EC2, RDS, S3, IAM, Lambda, CloudWatch) Infrastructure as Code: Terraform Containerisation & Orchestration: Docker, Kubernetes (EKS), Helm Configuration Management: Ansible CI/CD Pipelines: GitHub Actions Monitoring & Observability: Grafana, Prometheus Scripting/Automation: Python or Java What We’re Looking For Proven experience managing and scaling AWS cloud environments, ideally supporting live software products or high-traffic platforms. … Strong background in Terraform and Infrastructure as Code best practices. Practical experience with Kubernetes (EKS) in production. Familiarity with monitoring and observability tools such as Grafana and Prometheus. Hands-on experience building CI/CD pipelines (GitHub Actions, Jenkins, CircleCI, etc.). Solid scripting and automation experience using Python or Java. A collaborative engineer who enjoys working closely with …
Monitor and optimise network performance across cloud and on-premise environments Troubleshoot and resolve connectivity issues quickly and effectively Automate network configuration using Terraform, PowerShell and Azure CLI Maintain observability using Azure Monitor, Log Analytics and Network Watcher Ensure deployments align with security and compliance standards Produce technical documentation and support knowledge sharing Required Experience: Strong hands-on experience with …
City of London, London, United Kingdom Hybrid/Remote Options
ARC IT Recruitment Ltd
/MTTR via automation, clear SLAs, and robust RCAs/post-mortems. Safer, faster releases (blue/green, canary, feature flags) in partnership with Trading, Quant, and Engineering. Mature observability (logs/metrics/traces), capacity planning, and performance tuning for low-latency flows. Strong production hygiene and controls aligned to MiFID II/MAR/best-ex. Leadership of …
This job is with MSCI, an inclusive employer and a member of myGwork – the largest global platform for the LGBTQ+ business community. Please do not contact the recruiter directly. Your Team Responsibilities: At MSCI, technology powers our global insights. We …
kernel bypass. Ideally experience with Solarflare with onload and Mellanox with VMA. System hardware/OS tuning and performance troubleshooting. Experience with Automation tooling: Chef or Ansible Experience with observability stack Strong knowledge of server networking configuration.
City, London, United Kingdom Hybrid/Remote Options
Adecco
/CD workflows. *Solid understanding of Kubernetes operations on AWS (EKS), including cluster scaling and deployment automation. *Proficiency in Linux administration, networking fundamentals, and cloud security principles. *Familiarity with observability stacks such as Prometheus, Grafana, and Loki, with structured alerting practices. *Knowledge of database operations, including migrations, high availability, backups, and disaster recovery strategies. *Skilled in automation and scripting using … rolling, and canary approaches. Strengthen platform resilience by improving autoscaling, high availability, and eliminating single points of failure. Work closely with SRE and Security teams to enhance monitoring and observability through Prometheus, Grafana, and CloudWatch. Embed security best practices into every layer of the platform, covering IAM, secrets management, WAF, and compliance. Drive cost efficiency and performance improvements through proactive …
London (City of London), South East England, United Kingdom
Log my Care
the team. Solve challenging problems: Proactively identify root causes, implement durable solutions, and share learnings that help others solve problems more effectively. Be data-driven: Use product metrics and observability tools to guide decisions during discovery and development, and deliver work that demonstrably improves product or business metrics. Deliver at pace: Improve team velocity by unblocking others, streamlining workflows, and …
City of London, London, United Kingdom Hybrid/Remote Options
Hargreaves Lansdown
/UX architecture. Comfortable guiding teams through design implementation, collaborating with product and design using tools like Figma. Familiar with cloud-native environments (AWS, Docker, Kubernetes) and observability tools like Prometheus and Grafana. Champions quality and security, embedding testing and scanning into development pipelines. Passionate about mentoring engineers, conducting code reviews, and fostering a culture of continuous … Android) JavaScript/HTML/CSS Figma/Git Testing frameworks: Jest, Cypress, XCTest, Espresso CI/CD pipelines: GitHub Actions, CircleCI, Bitrise Cloud-native architecture: AWS, Docker, Kubernetes Observability tools Interview Process 3 Stage Interview Stage 1 - Discussion with our Hiring Manager (30 mins): A chance to talk with our Hiring Manager in more detail about the role, our tech …
to translate complex business requirements into data-driven solutions. Write production-grade SQL and ensure data quality through testing, documentation, and version control. Promote best practices around data reliability, observability, and maintainability. (Optional but valued) Contribute to Infrastructure as Code and CI/CD pipelines (e.g., Terraform, GitHub Actions). Skills & Experience 5+ years of experience in data-focused roles … other data visualisation tools. Familiarity with orchestration tools such as Airflow, Prefect, or Dagster. Understanding of CI/CD practices in data and analytics engineering. Knowledge of data governance, observability, and security best practices in cloud environments.
root cause analysis, lessons learnt and post actions Champion stability and resilience across the trading platforms Ensure new systems are aligned with best practices Drive improvements and alignment in observability and monitoring tools, improving MTTD and MTTR Produce analysis on SRE function performance Provide guidance, recommendations and hands-on support to teams, promoting SRE best practices Develop and maintain a … roadmap for continuous improvement of support and observability Maintain personal/professional development to meet the changing demands of the role, including all relevant regulatory and legislative training When dealing with all customers, clients, or colleagues, ensure that we provide a clear, fair and consistent high-quality service that presents a professional and positive image of CMC Markets Take all … cloud/on prem environment 7 years' experience in IT operational roles working with highly reliable systems Experience in modern development methodologies and languages Proficiency in implementing and managing observability tools Knowledge of automation tools and methodologies Strong understanding and application of ITIL processes Excellent oral, written and presentation skills Strong stakeholder management and influencing skills Proactive attitude towards learning …
City of London, London, United Kingdom Hybrid/Remote Options
Bloc Recruitment
engineering teams Guide system design for complex, cloud-native, containerised environments (Kubernetes, Terraform, Helm, Flux) Own the evolution of our data model and processing platforms Embed best practices in observability, security, and testing across the organisation Anticipate dependencies and unblock teams before issues arise Partner with product and business leaders to align technology with strategic goals Mentor engineers and technical … Expertise in modern backend architectures (Node.js, Python, Go) and fluency with frontend ecosystems (React, TypeScript) Deep experience with cloud-native infrastructure (Terraform, Kubernetes, Helm, Flux) Strong grasp of security, observability, and operational excellence Proven ability to influence and guide without formal authority Track record of mentoring and developing high-performing engineering teams Why You'll Love It Here Join a …