Observability Jobs in the City of London

151 to 175 of 293 Observability Jobs in the City of London

Machine Learning Engineer (Conversational AI)

london (city of london), south east england, united kingdom
Hybrid / WFH Options
Amber Labs
environments. Excellent communication skills and a strong interest in the application of AI in public services. Desirable: Experience with multi-agent orchestration (LangGraph, AutoGen, CrewAI). Familiarity with AI observability tools (TruLens, Helicone). Awareness of AI safety and reliability frameworks (Guardrails AI). Experience working in government or public sector digital projects.

Lead Data Engineer

City of London, London, United Kingdom
Hybrid / WFH Options
Immersum
Hands-on experience with cloud platforms (AWS preferred). Excellent problem-solving skills and a “get stuff done” attitude. Nice to Have: Experience with Grafana or similar monitoring/observability tools. API endpoint design & maintenance. Prior experience in fast-scaling startups or international data systems. Why Join Us: Work directly with leadership (CEO and core team) to influence the company

Site Reliability Engineer

City of London, London, United Kingdom
REVYBE IT RECRUITMENT LIMITED
their core software products. Expect a collaborative engineering culture, modern cloud-native stack, and plenty of freedom to influence tooling, architecture, and reliability practices. If you're passionate about automation, observability, and designing systems that just don't fail, this is the perfect environment for you. Tech Stack Cloud: AWS (EC2, RDS, S3, IAM, Lambda, CloudWatch) Containerisation & Orchestration: Docker, Kubernetes (EKS) Infrastructure … as Code: Terraform Configuration Management: Ansible Monitoring & Observability: Prometheus, Grafana, ELK Stack CI/CD: GitHub Actions Scripting & Automation: Python, Bash, or Go What You'll Be Doing Designing and maintaining reliable, scalable, and secure infrastructure for production systems. Automating operational tasks and improving system efficiency. Implementing observability tooling to monitor system health, performance, and capacity. Working closely with development teams … how reliability and performance are engineered at scale. Work with talented developers and DevOps engineers in a collaborative environment. AWS | Site Reliability | SRE | Cloud | Kubernetes | Terraform | CI/CD | Observability | Python | Go | Automation Click APPLY NOW to be considered for this position! Follow ReVybe IT Recruitment to stay up to date with the latest Cloud, Platform & SRE opportunities.
Employment Type: Permanent
Salary: £75,000

DevOps Engineer

City of London, London, United Kingdom
Venture Up
GitLab CI, GitHub Actions, Jenkins, or similar) Proficiency in scripting languages (Python, Bash, or Ruby) Experience with containerization technologies (Docker, and ideally Kubernetes or ECS) Knowledge of monitoring and observability tools (Prometheus, Grafana, CloudWatch, or similar) Understanding of networking concepts, security best practices, and IAM management Experience with configuration management and automation tools Strong problem-solving skills and ability to … and optimising cloud spend to ensure cost efficiency Owning CI/CD pipelines, improving speed and reliability of deployments Providing internal tooling and automation to support development teams Improving observability through logging, metrics, and tracing Ensuring high availability, scalability, and disaster recovery planning Managing security, IAM policies, and compliance best practices Working closely with developers to support deployments and testing

Senior Platform Engineer

City Of London, England, United Kingdom
develop
Apply strong networking knowledge to optimise performance, security, and reliability. Ensure compliance with financial services regulations and internal security policies. Contribute to CI/CD pipelines and cloud-native observability solutions. Explore and integrate emerging technologies including AI/LLM-based solutions to enhance automation and operational efficiency. Key Skills & Experience Essential: Strong hands-on experience with AWS (EC2, VPC … or working with AI/LLM solutions Familiarity with Terraform, Ansible, GitLab CI/CD, or similar tools Exposure to financial services or other highly regulated industries Experience with observability stacks (Prometheus, Grafana, ELK, etc.

Linux Production Engineer

City of London, London, United Kingdom
Autonomai Recruitment
distributed systems Contribute to ongoing improvements in reliability, latency, and scalability Qualifications: Linux expertise with a solid understanding of networking and containerisation Proficiency in at least Python Experience with observability tooling Proven track record in designing and maintaining highly distributed systems Apply now for a confidential chat

Cloud Engineer - Azure

City of London, London, United Kingdom
Vallum Associates
Azure Container Instances), and ACA (Azure Container Apps). Create and maintain comprehensive documentation on newly implemented features suitable for an enterprise environment. Design and implement robust monitoring and observability tools to track container performance and health. Automate testing processes by utilizing public cloud elasticity and ephemeral resources, ensuring streamlined operations and reduced manual efforts. Contribute to the software development

Software Engineer

City of London, London, United Kingdom
Hybrid / WFH Options
SR2 | Socially Responsible Recruitment | Certified B Corporation™
ideally Python, Rust is a bonus Experience with distributed systems, REST APIs, and microservices Knowledge of Kafka (or similar), PostgreSQL, and time-series data Familiar with Docker, monitoring, and observability tools ✅ Experience in a startup or scale-up, collaborating closely with engineers in a fast-moving environment Bonus points if you’ve worked in energy markets, trading systems, industrial control

Python Software Engineer

City of London, London, United Kingdom
Durlston Partners
and maintain systems for real-time market data, trade execution, and reconciliation. Optimise performance and scalability across key trading infrastructure components. Partner with cross-functional teams to improve tooling, observability, and automation. Deliver robust, production-ready solutions in a fast-paced environment. Requirements: Min 2+ years of Python experience in a professional setting (HFT, trading, fintech, or other high-performance

GenAI Solution Architect

City of London, London, United Kingdom
Capgemini
Establish best practices for prompt engineering, model safety, bias mitigation, and responsible AI. Ensure compliance with data privacy regulations (GDPR, HIPAA, etc.) and internal governance policies. Define monitoring and observability strategies for GenAI systems in production. Stakeholder Engagement: Translate business requirements into technical specifications and solution blueprints. Present architectural decisions and trade-offs to technical and non-technical stakeholders. Support

Platform Engineer - AWS

City Of London, England, United Kingdom
Hybrid / WFH Options
Harrington Starr
enhance CI/CD pipelines using Git and Python. Implement Infrastructure as Code practices using Terraform. Manage containerised environments with Docker or Kubernetes. Collaborate with dev teams to improve observability, deployment processes, and platform reliability. Build observability and monitoring solutions using Grafana, integrating key metrics to support proactive platform operations. Create and enforce internal standards for infrastructure, CI/CD

Linux on Z / zSystems Engineer

City of London, London, United Kingdom
Cognizant
automation tools: Ansible, Terraform or equivalent. Understanding of mainframe hardware/IO concepts (channel subsystem, FICON, zSeries fundamentals) and basic z/OS interplay. Good knowledge of monitoring/observability (CA WatchTower) and incident management (ITIL practices).

Vice President of Engineering

City of London, London, United Kingdom
Understanding Recruitment
the platform architecture and reliability roadmap across Python services and a React app Design the workflow engine with state, retries, idempotency, audit, and multi-tenant isolation Set SLOs and observability, manage incidents, and make on-call boring Build and lead a high-performing team. Org design, hiring, mentoring, standards, and SDLC Partner with founders and customers to turn outcomes into

CTO

City of London, London, United Kingdom
Hybrid / WFH Options
Albany Growth
or equivalent). Experience designing systems that serve large user bases (high concurrency, high throughput, resilient APIs). Strong cloud & platform experience (AWS or equivalent), CI/CD, and observability practices. Excellent communicator, able to translate technical trade-offs to product and non-technical stakeholders and win alignment. Comfortable operating across multiple teams and time zones; pragmatic, delivery-focused and

Observability, the City of London
10th Percentile: £73,125
25th Percentile: £73,750
Median: £85,000
75th Percentile: £105,000