able to: Contribute to every part of our system, ranging from code and tests to infrastructure changes. Ensure the stability of our system by implementing and improving monitoring and observability tools. Write resilient code that is well tested. Be curious - not just the code, but the architecture of our platforms and everything that enables the business to thrive. Gain expertise … the rest of the organisation, and almost all of Lyst engineering engages with us on a regular basis. We care about robustness and integrity in our pipelines and use observability tools to monitor. Experience in developing robust and secure software solutions and data pipelines. Effective communication skills, comfortable working with technical and non-technical individuals and teams. Proficiency in developing … within public cloud technologies and architecture (preferably AWS exp). Experience with containers (Docker) and container orchestration. Experience with Infrastructure as Code (we use Terraform). Experience utilising monitoring, observability and logging tools. Experience with Git, GitOps, GitHub Actions. Exposure or experience with cloud data warehouse/data platforms (we use Snowflake). Things that matter to us: You have a
for the creation, implementation, and continuous improvement of BCG's modern, fully automated SACM function. As the beating heart of IT, this system will serve as the backbone for observability, service reliability, release and change management, and infrastructure management. The leader will drive the automation and governance of BCG's configuration management database (CMDB), integrating it with SRE, ITSM, and … Establish the CMDB as a real-time, trusted system of record for configuration items across cloud, on-prem, and hybrid environments. Embed SACM capabilities into core IT processes including observability, incident response, service management, and architecture governance. Champion automation, transparency, and traceability of all infrastructure, software, and asset relationships. Automation & Integration: Build and operate a fully automated CMDB with bi … reduce risk and accelerate safe deployments. Operational Excellence & SRE Alignment: Apply SRE principles to ensure reliability, performance, and resilience of the SACM platform. Embed SACM into 24x7 operations and observability platforms to support real-time decision-making. Support incident prevention, root cause analysis, and continuous improvement through data-driven insights. Define and enforce service level objectives (SLOs) and key performance
at scale, leveraging AWS Organizations, Landing Zones, and multi-account best practices. Develop and maintain Infrastructure as Code solutions using Terraform, CloudFormation, and AWS CDK. Champion security, compliance, and observability by integrating services like AWS Security Hub, GuardDuty, and Inspector. Design CI/CD pipelines to enable seamless deployments and self-service models for customers. Innovate with AWS Networking, KMS … Proficiency in Python, Go, or similar languages for automation and scripting. Expert-level knowledge of AWS Networking, TLS, and security best practices. Experience with container orchestration (Kubernetes, EKS) and observability tools (Grafana, ELK). A passion for innovation, problem-solving, and delivering high-impact solutions. Experience leading/managing junior engineers. Significant experience with Control Tower and deploying landing zones.
at scale, leveraging AWS Organizations, Landing Zones, and multi-account best practices. Develop and maintain Infrastructure as Code solutions using Terraform, CloudFormation, and AWS CDK. Champion security, compliance, and observability by integrating services like AWS Security Hub, GuardDuty, and Inspector. Design CI/CD pipelines to enable seamless deployments and self-service models for customers. Innovate with AWS Networking, KMS … architectures and multi-account AWS setups. Extensive experience with AWS Organizations. Expert-level knowledge of AWS Networking, TLS, and security best practices. Experience with container orchestration (Kubernetes, EKS) and observability tools (Grafana, ELK). A passion for innovation, problem-solving, and delivering high-impact solutions. Working with Control Tower and Landing Zones. Why Work For Us? Competitive base salary up
Lead Developer 6 Months Hybrid - 1/3 days a month in office, either London or Bristol £750 Overview: Working within an agile digital delivery team developing and supporting a mission-critical application for the UK client, with instances hosted
Senior Java Engineer - Product Engineering B2B SaaS Insurtech Up to £110,000 per annum plus bonus and excellent pension London - 2 days a week Java Spring Boot AWS Kubernetes Event-Driven Architecture Senior Java Engineer - We have been exclusively engaged
up and harden RAG pipelines (indexing, retrieval policies, grounding, guardrails) and agent frameworks. Take basic infra ownership on GCP (or AWS/Azure): networking, autoscaling, CI/CD, IaC, observability, and cost tuning. Participate in on‐call for your area and drive root‐cause analysis with crisp follow‐ups. 15% Collaborate Pair with back‐end & front‐end to wire extractors … evals; hands‐on with time‐series analysis (forecasting, change‐point, drift). Cloud & ops: Basic infra ownership on GCP (or AWS/Azure): networking, autoscaling, CI/CD, IaC, observability, and cost control. Communication: You explain results clearly, align stakeholders, and write crisp docs. Bonus points DevOps wizardry; GPU/accelerator experience. Multimodal pipelines (text + voice + screenshots).
City of London, London, United Kingdom Hybrid / WFH Options
Anecdote
We are seeking a highly experienced Splunk ITSI Expert with 10+ years in observability to enhance our monitoring and analytics capabilities. Key Responsibilities: Design and implement advanced monitoring strategies using Splunk IT Service Intelligence (ITSI). Create service models, define KPIs, and build glass tables to visualize key business services. Utilize Splunk ES for security event monitoring and correlation searches. … Automate tasks and integrate systems using Python, Shell, or Perl scripting. Perform root cause analysis and anomaly detection by analyzing complex log data. Requirements: 10+ years experience in observability, with deep expertise in Splunk, especially ITSI. Proficiency in Scripting (Shell/PowerShell/Python). Strong understanding of Load Balancers such as F5, Netscaler, and AWS ELB. Hands-on experience
a key role in scaling and supporting our data systems, which leverage a modern AWS stack and Snowflake. This is a high-impact role with direct influence over reliability, observability, and the DevOps maturity of our data engineering function. Key Responsibilities Platform Ownership Own and manage the data platform infrastructure built on AWS services (EventBridge, Lambda, EC2, MWAA, S3). … Snowflake, and support its integration into the broader data ecosystem. Infrastructure and System Reliability Ensure platform reliability, availability, and scalability across environments. Design and maintain robust monitoring, alerting, and observability frameworks to reduce MTTR and improve visibility. Lead and manage initiatives related to data lineage, platform health, and alert hygiene. CI/CD and Automation Enhance and expand our CI … and operating production data platforms within AWS. Strong understanding of AWS core services: EventBridge, Lambda, EC2, S3, and MWAA (Managed Workflows for Apache Airflow). Experience with infrastructure reliability, observability tooling, and platform automation. Solid experience with CI/CD pipelines, preferably Bitbucket Pipelines. Familiarity with Snowflake administration and deployment practices. Comfortable working through ambiguity and in cross-functional, collaborative
South West London, London, United Kingdom Hybrid / WFH Options
John Lewis & Partners
the team's checks, your role in the team will be to mentor others in testing practice; coach them to adopt and improve their quality approaches including deployment approaches and observability; review and contribute to the team's codebase and pipeline configuration; help the team with their system of work from first business need to monitoring services in production. At all times … performance, resource usage, variable bandwidth, device compatibility, accessibility etc.) and advising on how these risks should be mitigated. Understanding operational and non-functional requirements (such as resilience, performance and observability) and how solutions are implemented and tested. Desirable skills/experience you may have Bitrise/Gitlab CI GraphQL Backend for Frontend (BFF) patterns Microservice Architectures Experience of cloud infrastructure
Who are we? Hi! We are Ravelin! We're a fraud detection company using advanced machine learning and network analysis technology to solve big problems. Our goal is to make online transactions safer and help our clients feel confident serving
Prime Video wide processes. - Communicate results and insights to both technical and non-technical audiences, including through presentations and written reports. About the team Our team owns Prime Video observability features for development teams. We consume PBs of logs daily which feed into multiple observability features focussed on reducing the customer impact time. In 2025, we are expanding our remit … to consume data from more sources to provide more holistic observability for our development teams. BASIC QUALIFICATIONS - Master's degree in engineering, technology, computer science, machine learning, robotics, operations research, statistics, mathematics or equivalent quantitative field - Experience programming in Java, C++, Python or related language - Experience with neural deep learning methods and machine learning - Experience in building machine learning models
Product Quality and Support Strategist, Alerting About The Position Coralogix is a modern, full-stack observability platform transforming how businesses process and understand their data. Our unique architecture powers in-stream analytics without reliance on expensive indexing or hot storage. We specialize in comprehensive monitoring of logs, metrics, traces, and security events with features such as APM, RUM, SIEM, Kubernetes … monitoring, and more, enhancing operational efficiency and reducing observability spending by up to 70%. We seek a Quality and Support Strategist professional who ensures that the Coralogix Alerting and Incident Management Platform and Process exceed the quality and reliability standards, establish a competitive edge, and prevent failures, profit loss, or work stoppages. You will be responsible for enhancing customer … team members are encouraged to challenge the status quo and contribute to our shared mission. If you thrive in dynamic environments and are eager to shape the future of observability solutions, we'd love to hear from you. Coralogix is an equal opportunity employer and encourages applicants from all backgrounds to apply.
to deployment and monitoring, balancing cutting-edge techniques with pragmatism to deliver measurable impact. • Apply strong software engineering principles, such as modularity, testing, code reviews, CI/CD and observability, to ensure AI systems are reliable, maintainable, production-ready and can be readily adapted to future developments. • Choose the right approach for the problem at hand, evaluating classical ML and … focused teams, collaborating with designers, engineers, and PMs, to scope and ship AI features iteratively. • Ability to reason about system behavior end-to-end, including model performance, latency, and observability, and how these impact user experience. • Clear, structured communicator, comfortable documenting and defending architectural decisions and engaging in thoughtful technical debate. Not required, but it's a plus if you
Lead Full Stack Engineer London (Hybrid 1x per week) Salary: Up to £100k + Benefits About Us Our client is an Insurtech Unicorn looking to expand their engineering team. With the opportunity to work on existing products which have generated
Job Title: AWS Engineer (OpenSearch) Work Location: London, UK (Hybrid) Job Description: We are looking for an experienced AWS Observability Engineer specializing in the Elasticsearch (ELS) Stack to design, implement, and optimize observability solutions across our cloud infrastructure. The ideal candidate will have hands-on experience in building OpenSearch infrastructure. Expertise in AWS services, log management, metric monitoring, and …/OpenSearch infrastructure, configuring platform and applications to send logs to OpenSearch. Expertise in the ELS Stack (Elasticsearch, Logstash, Kibana). Designing centralized logging, building Kibana dashboards, optimizing observability with AWS services (CloudWatch, CloudTrail, S3), and automating workflows using Python or Terraform. Configure platforms/applications to forward logs to OpenSearch. Optimize log ingestion pipelines and troubleshoot issues. Collaborate
to support low latency applications. * Practical and working knowledge of IP networking and data flow within distributed systems. * Experience integrating ELK with packet capture/analysis tools to enhance observability of real-time systems. * Hands-on expertise in data pipeline creation, ingestion strategies, and performance tuning of Logstash and Beats for scalable telemetry. * Knowledge of Corvil and/or Pico … a deep understanding of the demands of high-frequency and algorithmic trading environments. * Working knowledge of security best practices, including RBAC, TLS, and audit logging in Elasticsearch. * Experience with observability platforms such as ITRS Geneos and their integration with ELK is a strong plus. * Comfortable with multi-site replication, cross-cluster search, and disaster recovery configurations for global deployment. * Strong … understanding of Linux systems, containers, and cloud-native observability stacks. * Organized, self-driven, and able to manage priorities in a dynamic, high-performance trading environment.
with the Enterprise Monitoring & Alerting (EMAS) team to deliver a transformative initiative aimed at maximising Dynatrace capabilities. We are looking for a skilled Dynatrace Admin/Consultant to enable observability across complex, hybrid cloud environments. Essential Skills: To thrive in this role, you must demonstrate extensive experience in designing and configuring within Dynatrace, including: Application Performance Monitoring Proficiency in Dynatrace … assets for monitoring. Work with EMAS to analyse Dynatrace coverage of these critical assets. Identify opportunities for enhancement in monitoring configurations across crucial applications. Review roles and responsibilities concerning observability and propose improvements focused on Operational Resilience. Contribute to establishing an automated end-to-end business flow for key business processes within the Dynatrace toolset. Ensure optimal alerting configurations in … collaboration with Application Stewards and SREs. Participate in workshops with third-party software suppliers to review observability standards. Bonus Points: Skills in correlating events across the full stack for root cause analysis. Key Attributes: Ability to manage competing priorities in a fast-paced environment. Flexibility and a pragmatic approach to problem-solving. A delivery-oriented mindset coupled with a can
+ Bonus Are you a forward-thinking Engineering Leader with a deep understanding of software engineering, cloud infrastructure, and SRE principles? Do you have a sharp eye for automation, observability, and leading technical teams through digital transformation at scale? If so, this could be the perfect opportunity to elevate your career at the forefront of banking innovation. This is a … Experience building and leading teams of SREs and Engineers - both onshore and offshore Expertise in cloud migration to AWS, distributed architecture, and open telemetry Strong exposure to SRE practices, observability frameworks, and automated monitoring Confident managing vulnerability, resiliency, and chaos engineering practices Track record of transforming large-scale platforms and delivering customer-centric tech Comfortable getting close to the code … hands-on involvement expected) What you'll be doing: Leading platform design, observability strategy, and automation-first service delivery Overseeing hosting migration of the Mortgages portfolio to the cloud Owning vulnerability and incident response models across critical services Collaborating with Architecture, Security, and Engineering Leaders to shape roadmaps Building and growing a high-performing team across critical banking domains Ensuring
this role combines technical leadership with hands-on engineering across a modern stack: AWS (Lambda, Step Functions, DynamoDB, Postgres, CDK), TypeScript, React, Next.js, Jest, Playwright, CI/CD and observability tools. You'll help establish and grow the Core Services team, building scalable architecture, developer tooling, and platform services that power multiple product squads. Expect around 70-80% coding and … Manager, Technical Lead, Senior Full Stack Engineer, Staff Engineer, AWS Lambda, AWS Step Functions, AWS DynamoDB, AWS Postgres, AWS CDK, TypeScript, React, Next.js, Jest, Playwright, CI/CD, DevOps, Observability, Monitoring, SaaS, HRTech, Hybrid Jobs London, Remote, Scale-up, Scalable Architecture.
deliver impact. Write clean, testable, and maintainable code with a focus on developer-driven quality. Apply strong CS fundamentals to design scalable, reliable, and efficient systems. Contribute to monitoring, observability, and performance optimisation of production systems. Work closely with multidisciplinary teams, sharing knowledge and solving problems collaboratively. Adapt quickly to changing priorities, delivering high-quality results at pace. What We … at scale. Full-stack experience or deep expertise in backend development. Strong CS fundamentals and fluency in Python. Experience owning and operating complex systems in production. Clear grasp of observability, monitoring, and performance tuning. Commitment to writing high-quality, testable code and improving engineering practices. Relevant degree (e.g. Computer Science, Mathematics, Engineering or similar). Pragmatic, collaborative mindset with strong