Observability Jobs in the City of London

251 to 275 of 289 Observability Jobs in the City of London

Senior TypeScript Back-End Engineer

London (City of London), South East England, United Kingdom
Wave Talent
design, build, and support extensible, low-maintenance back-end services. Partner with Product, Design, Operations, and Growth to prioritise customer-facing and internal problems that drive value. Champion security, observability, and reliability best practices. Mentor teammates and help cultivate a healthy, innovative engineering culture. Tech Stack: Core: TypeScript/JavaScript (server-side) Cloud & Infra: AWS (cloud-based architectures), Docker, CI …/CD; IaC such as Terraform or CloudFormation Quality & Ops: Security and observability tooling/best practices Bonus exposure: React on the front end (nice to have) What we’re looking for We value skill and impact over strict year counts. You should have: Strong TypeScript/JavaScript fundamentals and experience building high-traffic server-side web applications. Solid understanding … of cloud-based application architecture (preferably AWS). Hands-on experience with Docker and CI/CD tooling. Practical grasp of security and observability best practices. Clear, collaborative communication and leadership skills. Why join? Work on meaningful problems in a fun, healthy, productive environment. Competitive package: £80k–£100k + Bonus, up to 10% employer pension, 28 days holiday (plus bank …

AWS Cloud Engineer

City of London, London, United Kingdom
Hybrid / WFH Options
Advanced Resource Managers
manage and support a customer’s AWS and Data platform To be technical hands on Provide Incident and problem management on the AWS IaaS and PaaS Platform Monitoring and observability of system and platform performance Collaboration with development and build teams on application and platform deployments and changes Involvement in the resolution of Incidents and problems in an efficient and … timely manner Actively monitor an AWS platform and components for technical issues Implement and improve on existing monitoring and observability solution To be involved in the resolution of technical incident tickets Assist in the root cause analysis of incidents Assist with improving efficiency and processes within the team Examining traces and logs Escalate incidents and problems to the appropriate teams …

GenAI Engineer

City of London, London, United Kingdom
Clarity (formerly Anecdote)
up and harden RAG pipelines (indexing, retrieval policies, grounding, guardrails) and agent frameworks. Take basic infra ownership on GCP (or AWS/Azure): networking, autoscaling, CI/CD, IaC, observability, and cost tuning. Participate in on‐call for your area and drive root‐cause analysis with crisp follow‐ups. 15% Collaborate Pair with back‐end & front‐end to wire extractors … evals; hands‐on with time‐series analysis (forecasting, change‐point, drift). Cloud & ops: Basic infra ownership on GCP (or AWS/Azure): networking, autoscaling, CI/CD, IaC, observability, and cost control. Communication: You explain results clearly, align stakeholders, and write crisp docs. Bonus points DevOps wizardry; GPU/accelerator experience. Multimodal pipelines (text + voice + screenshots). …

Staff Site Reliability Engineer - Observability

City of London, London, United Kingdom
Hybrid / WFH Options
Motive Group
Senior/Staff Site Reliability Engineer - Observability | London (Hybrid) If you care deeply about building and operating world-class infrastructure for AI at scale, this one’s worth your time. We’re working with a company that builds the backbone powering some of the most demanding AI workloads on the planet. Think large-scale GPU clusters, global telemetry systems, and … distributed training environments used by leading research and enterprise teams. They’re looking for a Senior or Staff SRE with deep experience in observability at massive scale - someone who’s tuned Prometheus/Mimir, Loki, or Tempo clusters beyond 100M+ series or 10TB/day logs, and who thrives in highly technical, fast-moving environments. You’ll be working on … Designing and scaling observability for globally distributed GPU infrastructure Building automation that cuts operational toil and improves reliability Partnering with platform and infrastructure teams to deliver true visibility across complex AI systems If you’ve built or operated telemetry stacks for large-scale, GPU-heavy, or multi-tenant environments - and want to work on cutting-edge problems in a business …

Site Reliability Engineer - AWS - Grafana - CloudWatch - ELK - UK Remote

City of London, London, United Kingdom
Hybrid / WFH Options
Opus Recruitment Solutions
AWS | GCP | SRE | Site Reliability Engineer | Terraform | CloudFormation | ECS | ELK | Elasticsearch | Logstash | Kibana | CloudWatch | Grafana | Windows | Observability Are you looking for a genuinely Remote opportunity? Somewhere you're part of something bigger, working on a global product within a close-knit SRE team? I've partnered with a WebApp that provides end-to-end event management for some of the … planet's biggest artists, and they're now looking for an SRE: someone who knows their way around classic Observability with Grafana, the ELK stack, and cost optimisation for the product as they continue scaling. Working across the globe, their multi-tenanted AWS environments require someone who is able to reverse-engineer product faults or run post-incident audits to ensure future … like to hear more, send over a CV to robin.shaw@opusrs.com or apply! …

Senior Data Engineer

City of London, London, United Kingdom
Hybrid / WFH Options
Identify Solutions
the past year and aggressive expansion across the UK, US, and EU, the company is scaling at pace. Data is the backbone: from APIs and pipelines to governance and observability, their data platform directly powers customer-facing products and AI-driven insights. They’re now hiring a Senior Data Engineer to own and shape this platform, building scalable, production-grade … systems that become the foundation for global brands. Why join? ✨ Greenfield impact – inherit a live but early platform, define best practice across structure, testing, observability, and governance. ✨ Direct product impact – your APIs, pipelines, and orchestration power the platform that 1,000+ brands rely on every day. ✨ AI at the core – work on infrastructure that enables machine learning and intelligent decision … doing: API strategy & development – own and scale FastAPI endpoints that deliver real-time access to platform data. Data pipeline development – build ingestion and replication pipelines with best-in-class observability, latency, and resilience. Platform technical vision – influence architecture and orchestration, shaping how the business handles data at scale. Data quality & governance – embed testing, freshness, lineage, and monitoring to ensure reliability …

DevOps Engineer

City of London, London, United Kingdom
Tribus
frameworks that support thousands of real-time processes across global markets. This isn’t a maintenance role - it’s an opportunity to modernise the firm’s CI/CD, observability, and runtime environments from the ground up. What you’ll be doing: Engineering and optimising CI/CD pipelines and container orchestration at scale Modernising Linux-based deployment and runtime … low-latency environment What we’re looking for: 5+ years’ experience in DevOps, Systems, or Platform Engineering Deep knowledge of Linux, Python, and shell scripting Proven experience with Kubernetes, observability tooling, and CI/CD frameworks Strong grasp of distributed systems and performance tuning Trading, Crypto or Hedge Fund Experience Experience working in low-latency, high-frequency trading environments Why …

Staff Software Engineer

City of London, London, United Kingdom
Burns Sheehan
to solve complex challenges. Drive innovation around cloud-native technologies and platform automation. Balance strategic vision with ~30% hands-on coding and design work. Promote best practice in reliability, observability, and scalability. The Ideal Staff Software Engineer Proven experience operating at Staff+ level within a fast-paced engineering organisation. Strong background in cloud platforms (AWS or GCP) and deep knowledge … ability to build operators. Strong coding skills in Golang, Java, or C#, with experience in distributed systems. Demonstrated leadership across multiple squads and technical roadmaps. Expertise in operational excellence: observability, reliability, automation. This is an outstanding opportunity for a Staff Software Engineer to join a rapidly scaling company where you’ll play a pivotal role in shaping the technical foundations of …

Principal Platform Engineer | Fintech | London | Up to £180k + Equity

City of London, London, United Kingdom
Maze
at Tier 1 banks. We're looking for a Principal Platform Engineer to drive the infrastructure behind mission-critical systems: think active-active, five-nines uptime, and real-time observability at global scale. What You'll Do: Own platform architecture for our next-gen ledger infrastructure Scale multi-region Kubernetes environments across cloud & on-prem Harden distributed systems (Kafka, Redis … CockroachDB) for global banking workloads Lead our AI-powered SRE approach: observability, remediation, and auto-response Enforce zero-trust, multi-tenant security and compliance (SOC2, ISO 27001) Define IaC foundations (Terraform, GitOps, Helm) What We're Looking For: Expert with Kubernetes and Distributed Systems Experience building production infrastructure at scale (multi-region, high-availability) Extensive experience building both on-prem …

Principal Site Reliability Engineer | Stealth Fintech | London | Up to 180k + Equity

City of London, Greater London, UK
Maze
at Tier 1 banks. We're looking for a Principal Platform Engineer to drive the infrastructure behind mission-critical systems: think active-active, five-nines uptime, and real-time observability at global scale. What You'll Do: Own platform architecture for our next-gen ledger infrastructure Scale multi-region Kubernetes environments across cloud & on-prem Harden distributed systems (Kafka, Redis … CockroachDB) for global banking workloads Lead our AI-powered SRE approach: observability, remediation, and auto-response Enforce zero-trust, multi-tenant security and compliance (SOC2, ISO 27001) Define IaC foundations (Terraform, GitOps, Helm) What We're Looking For: Expert with Kubernetes and Distributed Systems Experience building production infrastructure at scale (multi-region, high-availability) Extensive experience building both on-prem …
Employment Type: Part-time

Data Engineer

City of London, London, United Kingdom
83zero Ltd
to translate complex business requirements into data-driven solutions. Write production-grade SQL and ensure data quality through testing, documentation, and version control. Promote best practices around data reliability, observability, and maintainability. (Optional but valued) Contribute to Infrastructure as Code and CI/CD pipelines (e.g., Terraform, GitHub Actions). Skills & Experience 5+ years of experience in data-focused roles … other data visualisation tools. Familiarity with orchestration tools such as Airflow, Prefect, or Dagster. Understanding of CI/CD practices in data and analytics engineering. Knowledge of data governance, observability, and security best practices in cloud environments. …
Employment Type: Permanent
Salary: £90,000 - £100,000/annum

Data Engineer

London (City of London), South East England, United Kingdom
83data
to translate complex business requirements into data-driven solutions. Write production-grade SQL and ensure data quality through testing, documentation, and version control. Promote best practices around data reliability, observability, and maintainability. (Optional but valued) Contribute to Infrastructure as Code and CI/CD pipelines (e.g., Terraform, GitHub Actions). Skills & Experience 5+ years of experience in data-focused roles … other data visualisation tools. Familiarity with orchestration tools such as Airflow, Prefect, or Dagster. Understanding of CI/CD practices in data and analytics engineering. Knowledge of data governance, observability, and security best practices in cloud environments. …

Reliability & Operations Engineer

City of London, London, United Kingdom
Heart Mind Talent
Heart Mind Talent are partnering with Verified Global to hire a Reliability & Operations Engineer based in Central London. Verified Global builds cutting-edge algorithms to flip the odds in sports betting. Every hour, millions of fans place sub-optimal bets. …

Staff Backend Engineer (AI Lab | £170,000)

City of London, London, United Kingdom
Hybrid / WFH Options
Paradigm Talent
Role: Staff Software Engineer (Python | Backend | Infrastructure) Location: Hybrid - 2-3 days in London Office Compensation: Up to £170,000 + equity We’re working with a frontier AI lab pushing the boundaries of computational biology, combining machine learning, cloud …

Staff Software Engineer

City of London, London, United Kingdom
SR2 | Socially Responsible Recruitment | Certified B Corporation™
Rust is a bonus) and experience designing distributed systems, REST APIs, and microservices Data & Infrastructure: Experience with Kafka or similar streaming technologies, PostgreSQL, and time-series data management DevOps & Observability: Familiar with Docker, monitoring, and observability tools Bonus Points For: Prior experience with energy markets, trading systems, industrial control systems, or IoT environments If you’re passionate about applying your …

Observability, the City of London
10th Percentile: £73,125
25th Percentile: £73,750
Median: £85,000
75th Percentile: £105,000