execution of disaster recovery tests, and seek to automate these activities where possible. Cover the on-call schedule when production support is required outside of working hours. Participate in enhancing product observability and telemetry, and support modernization. Brainstorm ideas to simplify and streamline infrastructure by working closely with infrastructure and SRE teams. Required qualifications, capabilities and skills: Knowledge of Python/Unix Shell …
service mesh solutions across our distributed systems. In this role, you will lead the design and operation of Kong Mesh (based on Kuma) for managing microservices communication, security, and observability at scale. You’ll play a crucial role in defining service-to-service architecture and ensuring platform reliability, scalability, and security. Key Responsibilities: • Lead the design and deployment of Kong … Mesh across our environments (on-prem and cloud). • Define and enforce best practices for service mesh architecture, traffic routing, zero-trust security, observability, and policy enforcement. • Collaborate with infrastructure, security, and development teams to integrate Kong Mesh with CI/CD, monitoring, and logging solutions. • Develop custom policies, plugins, and automation scripts to enhance Kong Mesh capabilities. • Monitor mesh …
City of London, London, United Kingdom Hybrid / WFH Options
Tec Partners
focus on security, resilience, and continuous improvement. Key Responsibilities: Manage and maintain Elastic Cloud Enterprise (ECE) environments, ensuring high availability and performance. Design and deploy scalable Elasticsearch solutions for Observability and Search use cases. Implement robust security, privacy, and compliance controls across Elasticsearch systems. Optimise system configurations and queries to enhance performance and reduce latency. Collaborate with cross-functional teams …
Fi authentication systems, CRMs and partnered PropTech tools. Continually hone and perfect our homegrown DevOps and CI/CD processes by further developing GitHub Actions pipelines, Terraform definitions and observability integrations. Ensure quality & reliability: establish testing best practices (unit, integration, end-to-end), conduct code reviews and demand high quality standards. Shape and refine our cloud-native platform to optimise …
digital trends, challenges, solutions, market dynamics, competition, and peer group activities. Understanding and ability to articulate the vision for modern engineering (e.g., agile, cloud-native, DevOps) and operations (e.g., observability, automated response, SRE, etc.), and articulate a path toward a target operating model (people, process, and tools). Required Skills Leadership: Strong leadership skills are essential for guiding teams to …
and CI/CD pipelines. Experience supporting real-time trading applications and proficiency in scripting and automation (Python, Bash, PowerShell). Knowledge of messaging middleware (e.g., Solace, 29West) and observability platforms (e.g., ITRS Geneos, Prometheus). Excellent communication skills and comfort working with Linux systems and hybrid infrastructure. Benefits: Flexible working options between office and home. Exposure to global production …
teams to execute effectively. DataOps Enablement and Optimization: Drive the adoption of modern DataOps principles to streamline engineering workflows. Partner with platform teams to establish CI/CD pipelines and observability standards that improve operational efficiency, reliability, and speed across data pipelines. Data Governance and Quality Assurance: Embed governance, security, and data quality practices into engineering workflows. Define guardrails and reference …
City of London, London, England, United Kingdom Hybrid / WFH Options
Client Server Ltd
real-time operations, resilience and extensibility. You'll collaborate with engineers across the full stack to integrate backend services with Identity and Access control frameworks such as Keycloak, apply observability best practices and contribute to system architecture and codebase quality through reviews and mentoring. Location/WFH: You can work from home most of the time, meeting up with colleagues … have strong Spring Boot, Kafka and event-driven microservices experience with high throughput You have leadership, mentoring and coaching skills You have a strong understanding of system-level concerns: observability, availability, resilience You have experience working within security-conscious, regulated, or mission-critical domains You are proficient working with CI/CD pipelines, Infrastructure-as-Code principles and containerised environments …
City Of Westminster, London, United Kingdom Hybrid / WFH Options
Track24 Limited
ISO and SOC compliance standards while collaborating with the InfoSec team to maintain security best practices. Containerisation & Orchestration: Deploy and manage containerised applications using Docker and other orchestration tools. Observability & Monitoring: Provision and maintain observability platforms such as DataDog, Splunk, or New Relic to gain monitoring and performance insights. Incident Management: Establish and oversee monitoring and incident management processes to …
City of London, London, United Kingdom Hybrid / WFH Options
Anson McCade
experience, etc. Proven track record designing and deploying agentic and generative AI prototypes. Deep understanding of semantic search, vector databases, and memory management strategies. Familiarity with cloud AI tools, observability platforms, and performance optimisation. This is an opportunity to work at the forefront of AI innovation, where your work will directly shape how next-generation systems interact, reason, and assist.
Your new company: This is a major global bank with an office in Central London. Your new role: You will be working in a team supporting AWS native databases, supporting other existing products and improving observability. As well as working …
City of London, London, United Kingdom Hybrid / WFH Options
Develop
data, integration layers, and authentication modules Ensure secure, scalable deployment using Azure cloud-native tools Build and support systems using PostgreSQL, Java, and Spring Boot Integrate and monitor using observability tools like Datadog and BigPanda Collaborate closely with architects, DevOps, and security teams across the full SDLC Core Skills & Technologies Strong backend development in Java with Spring Boot Cloud migration … experience, particularly Azure Lift-and-Shift Familiarity with cloud infrastructure and deployment pipelines Exposure to PostgreSQL, authentication/security patterns Monitoring/observability tooling: Datadog, BigPanda Apply now to be considered.
specialism in vulnerability management. Self-starter, able to work in technical detail and motivate a diverse group of stakeholders to build sponsorship for significant and impactful change. Desired: Establishing observability platforms. Capabilities adjacent to exposure/vulnerability management (i.e. cyber security asset management, attack surface management, etc.). Pragmatic application of zero-trust philosophies. Cloud-based security (GCP, AWS and …
the stack, but proficiency with Python and Django is necessary, and ideally some exposure to front-end engineering. Frontend solutions are built for both web and mobile platforms. For observability, DataDog is used for monitoring and alerting, and CI/CD pipelines are managed through GitLab to automate testing and deployment workflows. We're looking for someone with a strong product …