london (city of london), south east england, united kingdom
Amber Labs
Actions, MLflow, ZenML, or similar). Deep understanding of containerisation and orchestration tools (Docker, Kubernetes). Desirable: Experience deploying AI inference engines (vLLM, Ray Serve, Triton). Familiarity with observability tools for LLMs (TruLens, Helicone, LangSmith). Understanding of AI safety and reliability frameworks (Guardrails AI). This is an exciting opportunity to help define the infrastructure powering the next
level test solutions and automation frameworks using Python, Terraform, and modern cloud-native practices. Contribute to the platform's CI/CD pipeline by integrating automated testing, resilience checks, and observability hooks at every stage. Lead initiatives that drive testability, platform resilience, and validation as code across all layers of the ML platform stack. Collaborate with engineering, MLOps, and infrastructure teams
London, South East, England, United Kingdom Hybrid / WFH Options
Salt Search
non-functional requirements. Deep understanding of microservices architecture, cloud-native applications, and API development. Experience with distributed systems - managing workloads at scale using modern practices for availability, performance, and observability. Knowledge and experience with Test-Driven Development (TDD) and automated testing frameworks. Excellent collaboration and communication skills, with a track record of working effectively in Agile environments. Familiarity with public
backend projects. • Knowledge of container orchestration (e.g., Kubernetes). • Experience with mobile application development (Android/iOS). • Knowledge of C# or other backend languages. • Familiarity with monitoring and observability tools (Grafana, Prometheus, etc.). • Experience with AI-assisted development tools (e.g., Copilot, ChatGPT integrations). Attributes & Behaviours • Clear, professional communication with customers and colleagues. • Strong problem-solving and troubleshooting
solutions. Work with AWS cloud-native services (Lambda, Step Functions, DynamoDB) to develop efficient cloud-based applications. Ensure CI/CD best practices, contributing to GitLab pipelines, automation, and observability improvements. Integrate AI-powered tools (e.g., GitHub Copilot) to enhance development workflows. Drive continuous improvement in performance, security, and maintainability. Support cross-squad collaboration, ensuring architectural consistency and code reusability.
tools. Strong hands-on technical skills in automation (Jenkins, GitLab, Docker, Kubernetes, shell scripting, etc.). Strong hands-on technical skills in automation, infrastructure as code, logging, monitoring and observability, infrastructure configuration, scripting languages and applications. Experience working in the cloud ecosystem: building and deploying workloads on cloud, preferably AWS. Associate Certification (preferable) in AWS Development and/or Architect or
Farnborough, Hampshire, England, United Kingdom Hybrid / WFH Options
Sopra Steria
scale secure cloud service. Domain orchestration. Developing workflows and tooling to automate processes and operations. Ensuring routine tasks are consistent, repeatable and scalable. Provisioning, managing and optimising infrastructure. Maintaining observability of the platform. Responding to alerts and incidents to ensure the availability of systems, interoperability and applications. Completing root cause analysis. Continually improving systems and processes to improve the efficiency
Kubernetes. Familiarity with SQL and NoSQL databases (Cassandra, Postgres), ideally combined with data collaboration platforms (Snowflake, Databricks). Strong DevOps mindset with experience in CI/CD pipelines, monitoring, and observability tools (Grafana or equivalent). Exposure to analytics, reporting, and BI tools such as Apache Superset, Lightdash or OpenSearch. Willingness to work across the stack by contributing to API development
City of London, London, United Kingdom Hybrid / WFH Options
TreasurySpring
queuing technologies, e.g. RabbitMQ. Experience of REST and/or GraphQL APIs. Knowledge of the core AWS services, e.g. EC2/ECS, RDS, S3. Experience using Datadog or similar observability tools. Knowledge of containerisation: Docker, Kubernetes, AWS Fargate, etc. Any experience of front-end or full-stack development using TypeScript & React. Experience building software for financial services and/or investment
london, south east england, united kingdom Hybrid / WFH Options
TreasurySpring
queuing technologies, e.g. RabbitMQ. Experience of REST and/or GraphQL APIs. Knowledge of the core AWS services, e.g. EC2/ECS, RDS, S3. Experience using Datadog or similar observability tools. Knowledge of containerisation: Docker, Kubernetes, AWS Fargate, etc. Any experience of front-end or full-stack development using TypeScript & React. Experience building software for financial services and/or investment
slough, south east england, united kingdom Hybrid / WFH Options
TreasurySpring
queuing technologies, e.g. RabbitMQ. Experience of REST and/or GraphQL APIs. Knowledge of the core AWS services, e.g. EC2/ECS, RDS, S3. Experience using Datadog or similar observability tools. Knowledge of containerisation: Docker, Kubernetes, AWS Fargate, etc. Any experience of front-end or full-stack development using TypeScript & React. Experience building software for financial services and/or investment
london (city of london), south east england, united kingdom Hybrid / WFH Options
TreasurySpring
queuing technologies, e.g. RabbitMQ. Experience of REST and/or GraphQL APIs. Knowledge of the core AWS services, e.g. EC2/ECS, RDS, S3. Experience using Datadog or similar observability tools. Knowledge of containerisation: Docker, Kubernetes, AWS Fargate, etc. Any experience of front-end or full-stack development using TypeScript & React. Experience building software for financial services and/or investment
bradford, yorkshire and the humber, united kingdom
Accenture
services/message buses and other architectural elements. Deploy these applications to the cloud using features such as containers, leveraging CI/CD to support this process, backed with good observability when running them in production. Ensure quality through the creation of documentation and the use of unit/integration/contract testing, with consideration of security/performance requirements. We
and budgeting. 3) Platform Development & Process Management: Co-develop with the systematic team and IT a scalable data/quant architecture (data pipelines, model services, APIs) and embed SRE practices (observability, resilience, cost efficiency). Lead automation of portfolio alignment and sustainability reporting; maintain production health, troubleshoot incidents, and drive continuous improvement. 4) Client & Stakeholder Delivery: Address day-to-day queries
Bristol, Gloucestershire, United Kingdom Hybrid / WFH Options
Hargreaves Lansdown PLC
Java 11+, Spring Boot, RDBMS and SQL. Experience with unit, integration, and end-to-end testing tools and practices. Experience with CI/CD and Trunk Based Development. Advocate for observability, experienced in monitoring, logging, and tracing to ensure system reliability and performance. Awareness of website performance implications, best practices and other non-functional requirements. Proficient in collaborative code reviews, technical
City of London, London, United Kingdom Hybrid / WFH Options
Hargreaves Lansdown
Figma and Vercel is a big plus. Capable of writing clean, maintainable and well-tested code. Comfortable working in on-prem and cloud-native environments with an interest in observability, using tools like Prometheus and Grafana to keep services healthy and maintainable. Familiarity with AWS services and how to integrate them into modern applications. A keen focus on quality and
services. About You: Solid experience building and deploying services with Java and Spring Boot. Comfort working in a cloud-native environment - Kubernetes (EKS), containers, scaling etc. An interest in observability, using tools like Prometheus and Grafana to keep services healthy and understand usage patterns. Familiarity with AWS services and how to integrate them into modern applications. A keen focus on
not static and must evolve over time as technology and standards change. You are not afraid to dive deep - writing code, defining standards around CI/CD, maximizing automation, observability and supportability whilst making sure solutions are cost effective. A confident communicator, you will lead with data when collaborating with stakeholders. You will lead by example and mentor more junior
scalability and reduce manual intervention. Operational Security, SRE & Assurance: Ensure security platforms are resilient, continuously monitored, and designed for 24x7 support and incident response readiness. Embed security telemetry and observability to enable proactive threat detection and automated response. Apply SRE principles to improve reliability, performance, and maintainability of security services. Lead platform health, patching automation, and vulnerability remediation workflows. Define
innovation, LSEG is a place where everyone can grow, develop and fulfil their potential with meaningful careers. ROLE SUMMARY: LSEG is seeking a skilled and forward-thinking Telemetry and Observability Engineer to join our growing Network Engineering function. This role is critical to defining and implementing our observability strategy, ensuring our systems are measurable, reliable, and continuously improving. You will … work at the intersection of software engineering, SRE, and platform operations, helping teams across LSEG gain actionable insights from telemetry data and maximise the value of our observability tooling. WHAT YOU'LL BE DOING: Design and implement scalable telemetry pipelines for metrics, logs, traces, and events across distributed systems. Develop and maintain observability standards, NMS tooling, dashboards, alerting frameworks, and … SLOs in collaboration with product and platform teams. Champion best practices in instrumentation, monitoring, and incident response across engineering teams. Integrate and optimise observability tools (e.g., OpenTelemetry, Prometheus, Grafana, Splunk, Elastic, etc.) within the NPS ecosystem. Collaborate cross-functionally to ensure observability is embedded into the SDLC and CI/CD pipelines. Drive adoption of observability platforms through enablement, documentation
containers, Kubernetes, serverless architectures). CI/CD, cloud automation, and AI-powered scripting/plugins (e.g., GitHub Copilot). Enterprise networking concepts (VPC, VPN, hybrid connectivity). Monitoring, observability, and cost optimization. Soft Skills: Excellent communication with technical and business stakeholders. Proven leadership and mentoring of cross-functional teams. Strong problem-solving in high-stakes, regulated environments. Proactive in