ways of working practices). Championing the Engineering ways of working and implementing them in your day-to-day work and across the team. Be a key player in the reliability, availability and scalability of software systems. Requirements: Strong experience with backend, frontend and mobile development using Node, React/React Native and PHP. Experience with project management tools such as … IaC tooling such as Terraform. Have experience with tracking analytics, implementing feature flags, measuring the impact of feature delivery and using analytics to make data-informed decisions. Have worked in high-risk environments, managing critical system workloads with high-availability and uptime requirements. Have experience with Elixir application development. This role is perfect for someone who thrives in
are seeking a Software Engineer specializing in Account Lifecycle and Access Management (IAM). In this position, you will play a pivotal role in designing, building, and maintaining scalable, high-performance systems. Leveraging JavaScript/TypeScript, Node.js, SQL and NoSQL cloud-native databases, and AWS, you will develop innovative solutions for user, group, and membership management while ensuring the … customer expectations. Develop and maintain enterprise-grade IAM solutions, focusing on user, group, and membership management. Work with Infrastructure as Code (Terraform, AWS CloudFormation, Kubernetes, etc.) to deliver scalable, high-availability applications. Build and maintain high-quality enterprise software using TypeScript, REST APIs, and JSON. About You: 3+ years of experience in frontend engineering and deploying cloud
responsible for blending Site Reliability Engineering (SRE), DevOps, and traditional operations models to build a next-generation Reliability Engineering function. This role ensures end-to-end automation at scale, 24x7 operational excellence, and high availability across all of BCG, including BCG Core, BCG X, and Consulting Team (CT) worldwide. The leader will drive strategic planning, execution, and optimization of global IT infrastructure, cloud operations, and service management while … BCG business units. Manage network reliability, compute platforms, and cloud-native services across AWS, Azure, and GCP. Scale Infrastructure as Code (IaC), automated provisioning, and cloud workload optimization. Drive edge computing, containerized workloads, and high-performance computing strategies. Implement AI-driven monitoring, self-healing automation, and full-stack observability. IT Service Management & Operational Excellence: Mandate and assure the adoption of IT Service Management (ITSM) processes … ensuring standardized, efficient, and effective service delivery. Establish SRE-based operational metrics, including SLOs, SLIs, and error budgets. Oversee incident response, problem resolution, and root cause analysis with AI-driven remediation. Ensure high availability, performance, and security compliance for all enterprise services. Develop a follow-the-sun operational support model, ensuring 24x7 resilience and uptime across all of BCG. Optimize incident, change, and capacity management, ensuring alignment
software platforms that enable our clients to process, analyze, and visualize large-scale geospatial and sensor data for critical decision-making. Our mission is to deliver innovative, scalable, and high-performance platforms that help organizations across various industries, including defense, national security, environmental monitoring, and urban planning, unlock the full potential of their data. We are seeking a highly … big data, performing advanced data analytics, and enabling geospatial intelligence features. You will work with cross-functional teams, including data engineers, geospatial analysts, and product managers, to create a high-performance platform that processes massive datasets, integrates complex geospatial data, and offers real-time or near-real-time insights. You will have the opportunity to work on cutting-edge … Key Responsibilities: Platform Development & Optimization: Design, develop, and optimize the core software platform to handle large-scale geospatial datasets, integrate big data sources, and support advanced data analytics. Ensure high availability, reliability, and performance of platform components. Big Data Architecture: Build and maintain big data architectures and data pipelines to efficiently process large volumes of geospatial and sensor
we'd love to talk to you. This is a unique opportunity to shape, direct and build better financial services for all UK businesses, whilst being part of a high-growth tech company. Backed by leading global VCs, the company brings together seasoned payment, banking and tech industry professionals who are aiming to redefine the market that they operate … TypeScript and Node.js applications; AWS cloud-native and serverless architecture. You will collaborate with other stakeholders and manage suppliers to fulfil business requirements through system enhancements and maintenance. Ensuring high availability in production systems is a critical responsibility of this role. To excel in this role, you should be: a people-oriented leader who is easy to work … with, adaptable and flexible while maintaining high standards. Supported by global teams of engineers, QAs, SREs, and DevOps professionals. Comfortable managing a mix of direct reports and shared teams. Skills & Experiences: AWS technologies (e.g. ECS, DynamoDB, Lambda, Aurora, SQS, SNS, VPC, PrivateLink, etc.); multi-threading & socket programming; Java + Spring applications; architectures: Event Sourcing, CQRS, Polyglot Database, Traditional
real-time trading demands, improve market efficiency, and support advanced trading strategies. You'll be a technical advocate for excellence and lead by example in fostering a culture of high standards, agility, and innovation within the team. The role is based in London and the team is international. WHAT YOU WILL DO: Design and optimize low-latency trading systems … effectively with both technical and non-technical stakeholders, ensuring clear alignment between engineering, business teams (trading, quants), and leadership, especially when discussing complex technical solutions or business goals. Ensure high availability, reliability, and scalability of trading systems while maintaining a sharp focus on performance and testing. Drive technical decision-making and contribute to high-level architecture discussions … algorithms to ensure maximum profitability, minimal risk, and fast execution, adapting quickly to volatile market conditions. WHAT YOU WILL NEED: Extensive experience in Rust and/or C++, building high-performance, low-latency systems in complex environments such as cryptocurrency trading or financial services. Leadership experience with a proven track record of leading by example, advocating for technical excellence
the reliability and scalability of our production systems. Key Responsibilities: Design, implement, and manage AWS cloud infrastructure. Develop and maintain automation scripts and tooling. Support production systems and ensure high availability and performance. Implement observability and monitoring solutions. Collaborate closely with the PBS (Platform/Backend Services) team. Contribute to infrastructure as code (IaC) and DevOps best practices.
South West London, London, England, United Kingdom
Oscar Technology
evolution. Build and optimise CI/CD pipelines (GitHub Actions, Azure DevOps, Jenkins). Implement robust monitoring and alerting solutions (CloudWatch, Azure Monitor, Grafana, ELK). Own incident response processes, ensuring high availability and rapid resolution. Collaborate with stakeholders to communicate solutions and technical trade-offs clearly. Ideal Experience: 3-5 years of SRE or DevOps experience across AWS and Azure
Directory Services, Azure AD. Strong experience in upgrading and migrating ForgeRock CIAM, IDM, AM, and DS. Good knowledge of deploying ForgeRock CIAM, IDM, AM, and DS in high-availability environments and enabling clustering. Hands-on experience with setting up various components of ForgeRock CIAM, IDM, AM, and DS, such as the user store, provisioning store, admin console, provisioning server
Kubernetes, as well as Azure Cloud, Azure SQL Database, and Oracle Database with PL/SQL. What tasks await you? Administration, configuration, and implementation of databases in a demanding high-availability environment (Oracle); planning and provisioning of new databases; automation of existing processes; performing database migrations, patches, and upgrades; consulting and support for projects; ensuring that storage and
taking responsibility for your services and the technology within them. These roles fit into squads that are building brand-new parts of our payments platform, focusing on high availability, cloud-native, and microservice concepts. You'll get to work as the Senior Engineer in your squad, leading discussions around technical direction and systems design, as well
is at the heart of everything we do. If this sounds exciting to you, please read on. We are seeking an experienced Senior Software Development Manager to lead our Availability Engineering team within Prime Video. This team is responsible for developing and maintaining our observability platform, incident management systems, and resiliency programs. Key job responsibilities: - Manage a high-performing team of software engineers, program managers, data scientists, and incident responders focused on improving the availability and resilience of Prime Video - Oversee the development and evolution of our observability platform, which enables analysis of logs, traces, and other telemetry at scale to rapidly triage and resolve issues - Implement observability and incident management solutions, including the use of … escalation paths, and post-incident review - Drive initiatives to improve the overall resilience and fault-tolerance of the Prime Video platform - Partner closely with other engineering leaders to ensure availability and reliability goals are met - Hire, develop, and retain top technical talent for the Availability Engineering team. A day in the life: 1. Team Management: - Hold 1-on
teams to deliver large-scale projects with cross-team dependencies. Collaborate with peer teams to deliver solutions that meet industry standards and customer expectations. Maintain and operate services at high scale, participating in scheduled on-call rotations to ensure reliability. Develop and maintain enterprise-grade IAM solutions, focusing on user, group, and membership management. Implement identity synchronization and lifecycle … management solutions using SCIM and other relevant standards. Work with Infrastructure as Code (Terraform, AWS CloudFormation, Kubernetes, etc.) to deliver scalable, high-availability applications. Design and implement robust access control models using OAuth, OpenID Connect (OIDC), SAML 2.0, and other protocols. Build and maintain high-quality enterprise software using TypeScript, REST APIs, and JSON. About You: 3+
drive IT operational excellence, manage security risks, focus on continual service improvement, drive transformational delivery projects, and work effectively with internal stakeholders and third-party vendors to deliver high-quality global IT services. Working in line with the Architecture-defined IT principle of a "buy before build" environment, the individual will need to ensure that outsourced and cloud … removal of technical debt. Manage, enhance, and optimise the organisation's use of Microsoft 365 and Azure cloud platforms, enabling the migration of legacy solutions to native cloud services and ensuring high availability and performance. Oversee cloud-based SaaS, PaaS, and IaaS solutions, ensuring seamless integration with business applications. Develop and implement cloud-first operational best practices, leveraging automation, infrastructure
also valid Puppet, SaltStack, Ansible). Understanding of API documentation and implementation, and of supporting customers who use these APIs. Expert knowledge of HTTP (including RESTful services). Best-practice understanding of hardware, virtualization, clustering, high availability, disaster recovery and security. Basic IP networking skills. It is also desirable for you to be experienced in or familiar with the following (these will become part of
systems. Implement .NET-based microservices with strong observability and integration with data platforms. Develop custom ETL pipelines using AWS, Python, and MySQL. Implement governance, lineage, and monitoring to ensure high availability and traceability. AI & Advanced Analytics Integration: Collaborate with AI/ML teams to enable model training pipelines with robust and reliable data access. Leverage metadata and structured
Ensure application security, including OAuth 2.0, OIDC, CORS, CSRF, and cookie management. Write and maintain unit and functional tests using frameworks like Cypress. Optimize back-end performance and ensure high availability. Work with SQL and NoSQL databases. Monitor and troubleshoot applications using tools like Splunk, Stackdriver, etc. Required Skills & Experience: Proven experience developing with Node.js. Strong understanding of Microservices
Maintain Scalable Data Warehousing Solutions: Design, build, and maintain robust ELT pipelines and transformation workflows. Model and maintain curated data layers to support reporting, analytics, and decision-making. Ensure high availability, scalability, and performance of data warehouse systems (cloud-based, e.g., Redshift). Develop & Manage Data Products: Collaborate with business and domain experts to define and deliver high
Understanding of SCIM for user provisioning and identity management, as well as SAML 2.0 and Single Sign-On protocols. Ability to effectively implement and troubleshoot SSO and SCIM integrations. High-Quality Code: A proven record of writing clean, testable, and maintainable code that meets rigorous standards of software quality. A developer dedicated to enhancing the code base. Problem-Solving … Scalability: Strong problem-solving skills with the capability to develop scalable and durable features in high-availability environments. Adaptability & Communication: The ability to flourish in a fast-paced, dynamic environment, complemented by excellent communication skills that support both internal teams and external customers. Capable of effectively conveying complex concepts to both technical and non-technical audiences. Ambition & Drive
optimise a complex SQL estate in a global environment? If you're seeking a role where your expertise directly supports critical business systems, and you're passionate about performance, availability, and data security, this could be your next career move. The Opportunity: We’re working with a prestigious global law firm looking to add a skilled Database Administrator to … innovation and cross-border collaboration. Work within a modern, professional IT environment with access to up-to-date SQL Server technology (2019/2022). Flexible and agile working practices. High visibility across multiple teams and business areas. Inclusive, supportive culture that values wellbeing and development. Exposure to complex systems and high-availability architecture across an international network … Key Responsibilities & Tech Stack: Ensure the availability, performance, and security of enterprise SQL Server databases. Proactively monitor systems, implement backup and recovery processes, and optimise performance. Support integration and development using tools like T-SQL, SSIS, SSRS, SSAS, Power BI. Maintain and develop utilities and scripts to support automation and standardisation. Contribute to strategic projects including upgrades, migrations and
solutions that support our business applications and analytics platforms. Reporting into the VP of Architecture, you will work with engineering, data science, and DevOps teams to ensure data integrity, performance, and availability across the organization. What you'll do: Design and implement scalable database architectures for transactional workloads. Develop data models, schemas, and storage strategies that align with business requirements and … engineers to improve queries, indexing strategies, and data access patterns. Lead database migration, replication, and backup/recovery strategies across cloud and on-prem environments. Monitor database performance, ensuring high availability and disaster recovery readiness. Recommend new database technologies and platforms. Provide technical leadership and mentorship to database administrators and data engineers. About Experian: Experian is a global
London, South East, England, United Kingdom Hybrid / WFH Options
Picture
take ownership of enterprise-level cloud projects in a global environment? If you're a Cloud Engineer who thrives on designing scalable infrastructure, leading by example, and delivering secure, high-availability cloud solutions, this role is your next big challenge. We’re working with a top-tier legal services organisation that is investing heavily in its tech platforms … Monitor, SolarWinds). Work with DevOps and Security teams to enforce cloud governance and zero-trust models. Stay close to the technology: supporting the business, guiding junior engineers, and troubleshooting high-level incidents when needed. Who We're Looking For: This role suits someone with strong hands-on cloud engineering experience and a proactive mindset. You should be: Confident working
capabilities of groundbreaking AI technologies to benefit humanity in a safe and reliable way. Responsibilities: Develop appropriate Service Level Objectives for large language model serving and training systems, balancing availability/latency with development velocity. Design and implement monitoring systems including availability, latency and other salient metrics. Assist in the design and implementation of high-availability language model serving infrastructure capable of handling the needs of millions of external customers and high-traffic internal workloads. Develop and manage automated failover and recovery systems for model serving deployments across multiple regions and cloud providers. Lead incident response for critical AI services, ensuring rapid recovery and systematic improvements from each incident. Build and maintain cost optimization … model serving, batch inference, and training pipelines. Have proven experience implementing and maintaining SLO/SLA frameworks for business-critical services. Are comfortable working with both traditional metrics (latency, availability) and AI-specific metrics (model performance, training convergence). Have experience with chaos engineering and systematic resilience testing. Can effectively bridge the gap between ML engineers and infrastructure teams. Have