execution of disaster recovery tests, and seek to automate these activities where possible. Cover the on-call schedule when production support is required outside working hours. Participate in enhancing product observability and telemetry, and support modernization. Brainstorm ideas to simplify and streamline infrastructure by working closely with infrastructure and SRE teams. Required qualifications, capabilities and skills: Knowledge of Python/Unix Shell …
often, and embrace hands-on problem-solving; maturing projects as they become foundational parts of the company's infrastructure, whether that means writing resilient, test-driven code, designing for observability, or building systems that can scale and recover gracefully. You’ll have the space to experiment and the responsibility to stabilise when it counts. You’ll work across AWS and … and CI/CD pipelines in a cloud-native environment. - Database Familiarity: Skilled in both SQL and NoSQL (PostgreSQL, DynamoDB, OpenSearch, or equivalents), using ORMs like Django or SQLAlchemy. - Observability & Monitoring: Comfortable using tools like CloudWatch, X-Ray, and structured logging to keep systems running smoothly. - Mindset: Curious, collaborative, and proactive - you enjoy solving problems hands-on and aren’t …
City of London, London, United Kingdom Hybrid / WFH Options
Anson McCade
architectures across Azure, AWS, and Google Cloud Leading platform engineering squads using DevSecOps, Kubernetes, and automation tooling Enabling edge and private cloud capabilities (e.g., Azure Stack, AWS Outposts) Implementing observability and governance tooling to support modern operations Supporting Agile and product-based delivery using SRE, CI/CD, and Infrastructure as Code Advising clients on architecture optimisation, security, cost control More ❯
Fi authentication systems, CRMs and partnered PropTech tools Continually hone and perfect our homegrown DevOps and CI/CD processes by further developing GitHub Actions pipelines, Terraform definitions and observability integrations. Ensure quality & reliability: establish testing best practices (unit, integration, end-to-end), conduct code reviews and demand high quality standards Shape and refine our cloud-native platform to optimise More ❯
City of London, London, United Kingdom Hybrid / WFH Options
Kubrick Group
platform initiatives Provide technical oversight and delivery governance, ensuring scalability, reliability, and security across platforms Champion modern engineering practices including DevSecOps, infrastructure-as-code, CI/CD automation, and observability Align global delivery teams across time zones and regions, ensuring seamless collaboration and value-driven outcomes Personal Specification: 10+ years in software/platform engineering, with significant senior delivery or More ❯
digital trends, challenges, solutions, market dynamics, competition, and peer group activities. Understanding and ability to articulate the vision for modern engineering (e.g., agile, cloud-native, DevOps), and operations (e.g., observability, automated response, SRE etc.), and articulate a path toward a target operating model (people, process, and tools). Required Skills Leadership: Strong leadership skills are essential for guiding teams to More ❯
City of London, London, United Kingdom Hybrid / WFH Options
DGH Recruitment
managing cloud infrastructures, with expertise in Infrastructure as Code (IaC), particularly using Terraform, proficiency in designing and implementing CI/CD pipelines, and a deep understanding of monitoring and observability practices. Core responsibilities: - Architect, deploy, and manage Azure-based infrastructure to ensure high availability, scalability, and security. - Develop and maintain Infrastructure as Code (IaC) using Terraform for automated and consistent … Code (IaC) tools, especially Terraform. - Experience in designing and managing CI/CD pipelines using tools such as Azure DevOps, Jenkins, or AWS CodePipeline. - Strong understanding of monitoring and observability tools and practices, including experience with Azure Monitor, SCOM, SolarWinds or similar technologies. Senior Azure Infrastructure Engineer (Azure/Terraform/IaC/CI/CD/AWS) In accordance More ❯
City of London, London, United Kingdom Hybrid / WFH Options
Tec Partners
focus on security, resilience, and continuous improvement. Key Responsibilities: Manage and maintain Elastic Cloud Enterprise (ECE) environments, ensuring high availability and performance. Design and deploy scalable Elasticsearch solutions for Observability and Search use cases. Implement robust security, privacy, and compliance controls across Elasticsearch systems. Optimise system configurations and queries to enhance performance and reduce latency. Collaborate with cross-functional teams More ❯
Infrastructure Architect | Lead the Future of Cloud & Hybrid IT London - Hybrid 🚀 Ready to architect the future? As an Infrastructure Architect, you’ll be at the forefront of cloud transformation, leading high-impact projects that bridge the gap between cutting-edge …
SR2 | Socially Responsible Recruitment | Certified B Corporation™
roadmaps for platform, infrastructure, and AI/ML tooling. Act as primary product partner to engineering, SRE, and data science teams. Lead initiatives that boost developer experience, system observability, and engineering efficiency. Foster a culture of enablement, documentation, and adoption for internal tooling and platforms. Required Skills and Experience: Strong technical foundation (e.g. CS degree, former engineering experience … success in building and scaling core platforms, cloud infrastructure, developer tooling, and API services. Deep, practical knowledge of AWS, distributed systems, CI/CD, IaC, DevOps principles, and observability tooling. Direct collaboration with engineering/SRE teams on internal tooling and shared services. Track record of delivering internal products that boost developer workflows, reliability, or deployment velocity. … with AI/ML platforms, MLOps, and deploying machine learning models into production environments. Core Focus Areas: Core Platform Engineering, Developer Experience (DevEx/DX), Engineering Productivity, DevOps Tooling, Observability Solutions, Platform as a Service (PaaS), Infrastructure as a Service (IaaS), Cloud-Native Architecture. If you're passionate about building foundational platforms and tools that empower world-class engineering teams …
systems. Architect with Intention: Ensure solutions align with architectural principles, security best practices, and compliance requirements. Champion Quality & Reliability: Contribute to code reviews, test automation, CI/CD, and observability tools to deliver high-quality, maintainable software. What They’re Looking For: 5+ years of professional software development experience, with a strong focus on cloud-native architecture Proficiency with Microsoft More ❯
City of London, London, England, United Kingdom Hybrid / WFH Options
QA
is seeking a dedicated DevOps Engineer Apprentice to bolster their NHS project team. In this role, the chosen candidate will be instrumental in enhancing the incident management protocols, advancing observability and monitoring strategies, and refining CI/CD practices within the AWS ecosystem. Responsibilities: Collaborating with cross-functional teams to ensure smooth and reliable incident management using Jira and ServiceNow. Developing … and implement observability and monitoring solutions to ensure high system availability and performance. Contributing to maintaining and improving CI/CD pipelines, ensuring efficient code integration and deployment on AWS. Supporting the design and execution of automated test strategies to enhance the quality and security of cloud-based applications. The successful candidate must have: Experience with AWS cloud services and management tools. Familiarity with …
/CD pipelines for modern web applications Familiarity with infrastructure-as-code tools such as Terraform Understanding of security best practices in web infrastructure and application delivery Exposure to observability tooling and techniques (e.g., Prometheus, Grafana, structured logging) Confident in debugging and resolving issues in complex distributed or web-based systems A product mindset and collaborative approach to improving how …
systems. Architect with Intention: Ensure solutions align with architectural principles, security best practices, and compliance requirements. Champion Quality & Reliability: Contribute to code reviews, test automation, CI/CD, and observability tools to deliver high-quality, maintainable software. What They’re Looking For: 3+ years of professional software development experience, with a strong focus on cloud-native architecture Proficiency with Microsoft More ❯
teams to execute effectively. DataOps Enablement and Optimization: Drive the adoption of modern DataOps principles to streamline engineering workflows. Partner with platform teams to establish CI/CD pipelines and observability standards that improve operational efficiency, reliability, and speed across data pipelines. Data Governance and Quality Assurance: Embed governance, security, and data quality practices into engineering workflows. Define guardrails and reference …
City of London, London, England, United Kingdom Hybrid / WFH Options
Client Server Ltd
real-time operations, resilience and extensibility. You'll collaborate with engineers across the full stack to integrate backend services with Identity and Access control frameworks such as Keycloak, apply observability best practices and contribute to system architecture and codebase quality through reviews and mentoring. Location/WFH: You can work from home most of the time, meeting up with colleagues … have strong Spring Boot, Kafka and event driven microservices experience with high throughput You have leadership, mentoring and coaching skills You have a strong understanding of system-level concerns: observability, availability, resilience You have experience working within security-conscious, regulated, or mission-critical domains You are proficient working with CI/CD pipelines, Infrastructure-as-Code principles and containerised environments More ❯
City Of Westminster, London, United Kingdom Hybrid / WFH Options
Track24 Limited
ISO and SOC compliance standards while collaborating with the InfoSec team to maintain security best practices. Containerisation & Orchestration: Deploy and manage containerised applications using Docker and other orchestration tools. Observability & Monitoring: Provision and maintain observability platforms such as Datadog, Splunk, or New Relic to gain monitoring and performance insights. Incident Management: Establish and oversee monitoring and incident management processes to …
City of London, London, United Kingdom Hybrid / WFH Options
Develop
data, integration layers, and authentication modules Ensure secure, scalable deployment using Azure cloud-native tools Build and support systems using PostgreSQL, Java, and Spring Boot Integrate and monitor using observability tools like Datadog and BigPanda Collaborate closely with architects, DevOps, and security teams across the full SDLC Core Skills & Technologies Strong backend development in Java with Spring Boot Cloud migration … experience, particularly Azure Lift-and-Shift Familiarity with cloud infrastructure and deployment pipelines Exposure to PostgreSQL, authentication/security patterns Monitoring/observability tooling: Datadog, BigPanda Apply now to be considered.
City of London, London, United Kingdom Hybrid / WFH Options
Anson McCade
or Autogen. Proven track record designing and deploying agentic and generative AI prototypes. Deep understanding of semantic search, vector databases, and memory management strategies. Familiarity with cloud AI tools, observability platforms, and performance optimisation. This is an opportunity to work at the forefront of AI innovation, where your work will directly shape how next-generation systems interact, reason, and assist. More ❯
specialism in vulnerability management Self-starter, able to work in technical detail and motivate a diverse group of stakeholders to build sponsorship for significant and impactful change Desired: Establishing observability platforms Capabilities adjacent to exposure/vulnerability management (e.g. cyber security asset management, attack surface management, etc.) Pragmatic application of zero-trust philosophies Cloud based security (GCP, AWS and …
Data Operations Manager. Duration: 6 months. We are seeking a dynamic and driven Data Operations Manager to lead a team of data engineers. You will oversee the daily operations of our data infrastructure and ensure the accuracy, availability, and security …
the stack, but proficiency with Python and Django is necessary, and ideally some exposure to front-end engineering. Frontend solutions are built for both web and mobile platforms. For observability, Datadog is used for monitoring and alerting, and CI/CD pipelines are managed through GitLab to automate testing and deployment workflows. We're looking for someone with a strong product …
City of London, London, United Kingdom Hybrid / WFH Options
Uniting Ambition
insight delivery, and robust reporting. Governance & Operations Implement best practices in data governance, quality, privacy, and compliance (e.g. GDPR, ISO 27001). Monitor product usage and platform performance using observability tools and analytics. Apply data driven insights to inform feature development and improve user experience. Skills & Experience Required 5+ years of experience in product management or technical product delivery, with More ❯