426 to 450 of 485 Permanent Observability Jobs

Python Developer

Hiring Organisation
Arcus Search
Location
City of London, London, United Kingdom
Python Developer - Observability Engineering London | Hybrid | Perm I’m working with a leading quantitative research and trading firm looking to expand their Observability Engineering team. This team sits at the centre of engineering productivity, owning the systems that allow teams to produce, move and consume telemetry at scale. The focus … making observability seamless across a large, high-performance environment handling cloud-level volumes of data. The role • Build and extend observability tooling across telemetry pipelines and backend systems • Develop and maintain OpenTelemetry collectors, SDKs and exporters • Define and promote “golden paths” for instrumentation across a wide range of services • Work ...

Lead Dev Ops Engineer

Hiring Organisation
Birketts LLP
Location
Ipswich, Suffolk, England, United Kingdom
Employment Type
Full-Time
Salary
Competitive salary
secret-handling patterns aligned to Birketts expectations Implement and enforce PR/branch policies and release controls to reduce variability and operational risk Platform observability and operational readiness Provide and evolve platform observability foundations: monitoring, logging, metrics, dashboards and alerting (using the agreed toolset) Define and improve incident response ...

Lead Platform Engineer

Hiring Organisation
REVYBE IT RECRUITMENT LIMITED
Location
City of London, London, United Kingdom
Employment Type
Permanent, Work From Home
Lead Platform Engineer - FinTech £110,000 + Bonus (£15k+) Central London - Hybrid working, 2/3 days per week in the office We're working with a highly successful FinTech business in Central London who are looking to hire a Lead Platform Engineer to help shape the future of their infrastructure and platform strategy. This is a high-impact role within a growing engineering team where you'll have the opportunity to influence architectural decisions, mentor engineers, and remain deeply hands-on with modern infrastructure tooling. The company builds all its software in-house and has been investing heavily in its platform, observability, and cloud capabilities as they continue to scale. The Opportunity: You'll join as the Lead Platform Engineer, working closely with engineering leadership to drive improvements across infrastructure, reliability, and developer experience. This role sits at the intersection of hands-on engineering, mentoring, and strategy. You'll guide platform direction while continuing to build and improve the infrastructure that powers the business. You'll also mentor one platform engineer, helping them grow while ensuring the team continues delivering high-quality infrastructure and automation. Environment: The platform currently operates in a hybrid environment: ~60% on-premise infrastructure ~40% Microsoft Azure The long-term strategy is focused on modernising the platform, improving observability, and evolving cloud capabilities, making this an excellent opportunity for someone who enjoys building and shaping systems.
Tech Stack: You'll be working across a modern DevOps and platform stack including: Kubernetes Terraform Hybrid cloud infrastructure (on-premise + Azure …/CD & Automation GitHub Actions Python Azure Services Azure Kubernetes Service (AKS) Azure Virtual Machines Azure Virtual Networks Azure Load Balancer Azure Application Gateway Azure Storage Accounts Azure Blob Storage Azure Key Vault Azure Monitor Azure Log Analytics Azure Active Directory Azure Container Registry Azure DNS Azure DevOps integrations Observability Logging, monitoring, and tracing across distributed systems Building meaningful telemetry and platform visibility What you'll be doing: Leading the evolution of the company's platform and infrastructure strategy Designing and improving hybrid Azure + on-premise environments Driving Kubernetes platform improvements Building automation with Terraform and Python Improving observability and monitoring across systems Mentoring a Platform Engineer and helping shape platform best practices Working closely with engineering teams to improve developer experience and reliability Why this role is exciting: Huge impact on the future platform architecture Opportunity to shape the hybrid cloud strategy Combination of technical leadership and hands-on engineering Modern DevOps tooling and cloud technologies Direct influence on platform reliability and scalability Package: Salary: Up to £110,000 Bonus: 15k+ ...

Lead Software Engineer

Hiring Organisation
5V Video
Location
City of London, London, United Kingdom
+ AWS (Lambda, API Gateway, S3, DynamoDB) Handling event-driven architectures (Kafka, SNS/SQS, etc.) Driving system design decisions across distributed systems Improving observability, reliability, and performance in production Debugging complex issues and leading resolution across teams Staying hands-on while setting technical direction and standards Tech Stack Python … Lambda, API Gateway, S3, DynamoDB, IAM) Event-driven systems (Kafka, SNS/SQS) CI/CD (Concourse, Git workflows) Databases (Postgres, DynamoDB, Couchbase) Observability (Prometheus, Grafana, CloudWatch) What You’ll Bring Strong backend engineering experience (Python preferred) Proven experience building distributed systems at scale Deep understanding of microservices + event ...

Site Reliability Engineer

Hiring Organisation
Halian | Managed Services, Recruitment Agency & Contract Staffing
Location
United Kingdom
improvements Own and refine SLIs, SLOs, and error budgets Reduce operational toil through automation Deep-dive Linux debugging, performance tuning, and systems analysis Strengthen observability, monitoring, and alerting Provide technical leadership to a small SRE/engineering group Improve and manage on‐call processes (PagerDuty, OpsGenie, etc.) Collaborate with development … experience Hands‐on incident management and postmortems Experience mentoring or leading a small technical team Scripting/automation with Python, Go, or Bash Strong observability skills (Datadog, Prometheus, Grafana, CloudWatch) Why This Role Appeals to Real SREs You’ll be solving actual SRE problems: reliability, incidents, resilience, uptime ...

Site Reliability Engineering (SRE) Manager

Hiring Organisation
Halian Technology Limited
Location
United Kingdom
Employment Type
Permanent, Work From Home
improvements Own and refine SLIs, SLOs, and error budgets Reduce operational toil through automation Deep-dive Linux debugging, performance tuning, and systems analysis Strengthen observability, monitoring, and alerting Provide technical leadership to a small SRE/engineering group Improve and manage on-call processes (PagerDuty, OpsGenie, etc.) Collaborate with development … experience Hands-on incident management and postmortems Experience mentoring or leading a small technical team Scripting/automation with Python, Go, or Bash Strong observability skills (Datadog, Prometheus, Grafana, CloudWatch) Why This Role Appeals to Real SREs You'll be solving actual SRE problems: reliability, incidents, resilience, uptime You'll guide ...

Lead DevOps Engineer (Azure)

Hiring Organisation
Reed Technology
Location
East Anglia, United Kingdom
Employment Type
Permanent
Salary
£75,000
pipeline templates, PR/branch policies, approvals and gated releases * Creating 'golden path' delivery patterns so teams can deploy without bespoke pipelines Operational readiness & observability * Defining monitoring, logging, alerting and dashboards * Improving incident response, runbooks and recovery processes * Shaping DR and operational processes (no on-call at present) Ways …/CD engineering experience * Experience implementing governance, security guardrails and delivery controls * Comfortable operating without an existing DevOps team Desirable * Azure Policy at scale * Observability, SRE or platform engineering practices * Container/AKS experience * Cost governance and showback/chargeback experience Why this role? * Opportunity to own and shape DevOps ...

Cloud Security and Platform Engineer

Hiring Organisation
RealityMine
Location
Trafford Park, England, United Kingdom
mainly focused on AWS, with growing involvement in other cloud and SaaS platforms. You’ll improve existing environments—managing identity and access, governance, security, observability, and lifecycle—by reducing risks, eliminating unsafe configurations, validating ownership, and ensuring the cloud estate is clearly governed and auditable. You will take an active … role in improving RealityMine’s security posture by improving and operating security scanning, improving monitoring and observability, and ensuring risks, vulnerabilities, and end of life components are identified and addressed in a timely and pragmatic way. You will also develop automation used to support security and operational hygiene, reducing manual ...

Forward Deployed Engineer

Hiring Organisation
Novatus Global
Location
City of London, London, United Kingdom
Novatus Global is a Series B scale-up RegTech SaaS provider and boutique advisory firm, helping financial institutions manage their most complex regulatory requirements. We combine deep consulting expertise with cutting-edge SaaS solutions, enabling ...

Cloud Advisory - Agentic Focused Architecture Consultant

Hiring Organisation
Accenture
Location
London Area, United Kingdom
where GenAI and Agentic play a role. Champion system performance, resilience, and efficiency: Proactively identifying and addressing consumption and scalability challenges. Champion full stack observability using modern full stack observability, SRE and AIOps. Manage & Mentor: Lead teams of architects and engineers, providing technical coaching, career counselling, performance management, and coaching ...

Data Platform Solution Architect

Hiring Organisation
MarkJames 🌍
Location
Essex, England, United Kingdom
Define and implement data lakehouse solutions using Apache Iceberg and S3 Lead performance tuning across Snowflake, Airflow, and Iceberg environments Ensure platform reliability, observability, and scalability Drive adoption of cloud-native design patterns and best practices Collaborate with engineering, DevOps, and business stakeholders Requirements Strong experience in Solution Architecture … architectures (Iceberg preferred) Expertise in performance tuning and optimisation Nice to Have CI/CD and DevOps practices Terraform/Infrastructure as Code Monitoring & observability tools (APM) Data governance & catalog tools Cloud security best practices Data modelling and ingestion frameworks ...

Lead Site Reliability Engineer

Hiring Organisation
McGregor Boyall
Location
Leeds, West Yorkshire, England, United Kingdom
Employment Type
Full-Time
Salary
£90,000 - £105,000 per annum
they migrate services to the Cloud. Work with Product Owners and Engineering Leads to balance feature delivery with system reliability, performance and health. Use observability tooling, performance metrics and SRE principles to proactively identify issues and reduce operational toil. Implement Incident and problem management practices, ensuring strong root cause analysis … Technical Skills required: Strong cloud engineering background, ideally across Azure and GCP. Experience building or operating large-scale, resilient cloud platforms. Deep understanding of observability tooling (metrics, logs, traces). Hands-on experience with modern SRE practices: SLOs/SLIs Error budgets Automation to reduce toil Production readiness and robust ...

Senior SRE (Java)

Hiring Organisation
Morgan McKinley
Location
City of London, London, England, United Kingdom
Employment Type
Full-Time
Salary
Salary negotiable
Software-First Approach to Reliability I am currently partnering with a major FTSE 100 FinTech company that is undergoing a massive modernisation and observability overhaul. They aren't looking for a traditional, infrastructure-heavy SRE; they need a Senior Java Developer who has recently transitioned into the SRE space. … Foundation: 5+ years of Java development experience with a deep understanding of JVM internals. The SRE Pivot: Recent experience in a Site Reliability or Observability role, with hands-on knowledge of OpenTelemetry, Jaeger, or similar tracing tools. The Mindset: A strong philosophy on what makes a "good ...

AI Architect

Hiring Organisation
Stackstudio Digital Ltd
Location
United Kingdom
Employment Type
Permanent
into high value solutions Enforce IAM least privilege with IAM Conditions, organisation policies, and scoped service accounts; integrate BeyondCorp for zero trust access Operationalise observability using Cloud Logging, Cloud Monitoring, Error Reporting, Trace, and Profiler; build model/LLM telemetry dashboards and alerts Identify the right AI/ML frameworks … patterns, vector databases, embeddings, and prompt/guardrail engineering Desirable Skills/Knowledge/Experience Knowledge of MLOps/AgentOps, CI/CD, and observability Strong understanding of regulated financial services environments Proven experience implementing AI risk controls, model governance, and auditability Ensure alignment with FCA, PRA, data privacy, model ...

Gen AI Engineer

Hiring Organisation
Wave Group
Location
England, United Kingdom
applications in production environments Evidence of debugging real issues such as incorrect outputs, latency spikes, retrieval failures or agent misbehaviour Experience with monitoring and observability of LLM systems, for example Langfuse, Prometheus, Grafana, OpenTelemetry or similar Strong understanding of RAG systems, retrieval pipelines and evaluation workflows Experience with agentic frameworks … application and infrastructure layers Multimodal experience across text and image or video is beneficial Tech stack Python, AWS, LangGraph, LangChain, vector databases, evaluation tooling, observability platforms, Docker Why join Small, senior team with high ownership Systems already in production with real customers Bi-weekly shipping cycles with fast feedback loops ...

Founding Engineer

Hiring Organisation
Omnam Investment Group
Location
London Area, United Kingdom
environments Lead integrations with external systems and support early data onboarding Establish engineering standards, tooling, documentation, and technical processes from the start Set up observability, monitoring, and performance systems Jump in wherever needed, from quick scripts and data cleaning to debugging production issues What You Bring 5+ years of engineering … with backend frameworks (FastAPI, Django, Node.js, Rails, etc.) Strong SQL, data modeling, and database design knowledge Familiarity with IaC, containers, CI/CD, and observability tools Bonus: experience in ETL, or hospitality/proptech/real-estate technology Why Join Us We work together in the heart of London ...

Site Reliability Engineer - Observability

Hiring Organisation
N26 GmbH
Location
Berlin, Germany
Employment Type
Permanent
Salary
EUR Annual
About the opportunity We are seeking a Site Reliability Engineer to join the Observability group inside our Platform Engineering domain. Platform Engineering's goal is to provide easy to use, self-service platforms to enable other segments to easily build, deploy and monitor their business applications. And Observability's role ...

LEAD TECHNICAL ARCHITECT - GLOBAL SAAS/AI PLATFORM

Hiring Organisation
Clarity Resourcing (UK) LLP
Location
United Kingdom
Employment Type
Permanent
Salary
GBP Annual
integrations Real Time systems/telecoms architecture (highly desirable) Salesforce/CRM integration at architectural level AI/ML architecture (integration, pipelines, platform design) Observability, monitoring, resilience engineering Architectural governance and decision frameworks Strong documentation and system design communication EXPERIENCE REQUIRED 8+ years software engineering (Back End/platform focus … Lead architecture reviews and guide engineering decisions Act as escalation point for complex cross-platform challenges Establish architecture governance, documentation, and standards Improve resilience, observability, and operational maturity Lead evolution of AI capabilities across the platform Mentor engineers and elevate technical capability across teams PERSONAL ATTRIBUTES Proactive, ownership-driven mindset ...

Senior Java Engineer (reliability & observability)

Hiring Organisation
GCS
Location
Northampton, Northamptonshire, United Kingdom
Employment Type
Permanent
Salary
£45000 - £60000/annum
Boot development experience in high-throughput systems Deep understanding of event-driven and messaging architectures (Kafka, JMS, AMQP or similar) Experience engineering reliability and observability at scale (monitoring, tracing, SLIs/SLOs) Desirable Skills: Experience building notification delivery infrastructure (webhooks, push, SMS) Awareness of the payments domain, including processing flows ...

Production Engineer - DevOps skills (Lisbon or Porto)

Hiring Organisation
Lùkla
Location
Lisboa, Portugal
Employment Type
Permanent
Salary
EUR Annual
scalable environments. If you are passionate about automation, cloud, and continuous system improvement, this opportunity is for you. Responsibilities: Ensure the stability, performance, and observability of production systems Implement and manage monitoring and observability solutions (e.g., Dynatrace) Automate operational processes through scripts and playbooks Work with orchestration and scheduling tools … infrastructures Collaborate with cross-functional teams in an agile environment Requirements: Technical skills Experience in DevOps/Production Engineering (minimum 2 years) Knowledge of: Observability (e.g., Dynatrace) Terraform OpenShift/Cloud environments Schedulers (CFT, AutoSys) Automation with: Python (scripting) Ansible (ability to create playbooks from scratch) Soft Skills Strong communication ...

SRE Lead (Banking/Financial)

Hiring Organisation
Ascendion
Location
City of London, London, United Kingdom
across production systems. Key Responsibilities: Lead the SRE function across the engineering organisation and drive operational excellence across production systems. Define and implement the observability and monitoring strategy, including dashboards, alerting, SLOs, SLAs, and error budgets. Establish comprehensive monitoring coverage to ensure visibility into system health, infrastructure, and business-critical … engineering teams. Manage incident response processes, including on-call management and post-incident reviews. Collaborate with product and engineering teams to build reliability and observability into new systems. Monitor UI behaviour and end-to-end system performance, not just infrastructure metrics. Essential Skills & Experience: Proven experience as an SRE Lead ...

ML & AI - Engineers/Architect/Lead

Hiring Organisation
KBC Technologies Group
Location
England, United Kingdom
version control, and ensuring production-ready AI systems. You’ll also play a key role in integrating AI/LLM agents with strong observability and rollback mechanisms. Location: Leeds/Manchester Client: IT End Client: Banking domain Work Mode: Hybrid Contract: Inside IR35 Salary: Market Standards Key Responsibilities … workflows Manage model versioning and release processes Monitor inference cost, latency, and model drift Safely integrate AI/LLM agents into production systems Implement observability, alerting, and rollback mechanisms Experience Levels We’re hiring across multiple seniority levels: Senior Developer: 3–6 years (ML/AI Engineering) Lead Engineer ...

Rust Engineer

Hiring Organisation
Huxley Associates
Location
London, United Kingdom
Employment Type
Permanent
Salary
£150000 - £180000/annum
from systems that actually matter. ETrading, you will build the infrastructure that sits between our traders and the market - execution paths, data pipelines, and observability tooling that power trillions in annual notional volume. When a system performs at 3am under peak load, you will be one of the reasons why. … kernel bypass awareness (DPDK, io_uring) Distributed messaging and event streaming: Kafka, NATS, or equivalent; ordering guarantees, exactly-once semantics, consumer group management Production observability: metrics (Prometheus/OpenTelemetry), distributed tracing, structured logging, and alert design CI/CD pipeline design including benchmarking gates, automated performance regression detection, and reproducible ...

Site Reliability Engineer (SRE)

Hiring Organisation
UA Consulting
Location
City of London, London, United Kingdom
Employment Type
Permanent
Salary
£75,000
platform. Key Responsibilities Partner with development teams to define and manage SLOs/SLIs, and use error budgets to guide engineering decisions. Enhance observability ensuring metrics, logs, and tracing are in place to detect and fix issues proactively. Lead cost optimisation initiatives: monitor spend, rightsize workloads, tune autoscaling, and drive … with Kubernetes (on-prem and AWS EKS). Proven track record defining and working with SLOs/SLIs in production environments. Deep understanding of observability (metrics, logging, tracing, telemetry ...

Data Engineer

Hiring Organisation
Tieto
Location
Lisboa, Portugal
Employment Type
Permanent
Salary
EUR Annual
evaluating live data flows, identifying inefficiencies, and improving overall data quality and signal clarity. Working alongside cross-functional teams (data, AI, QA, DevOps, observability), you'll help define which data truly adds value and ensure the platform scales effectively within an Azure and Microsoft Fabric environment. If you enjoy bringing … with ML teams to prepare datasets and support feature development Monitor and analyze production data to improve performance and reduce noise Help define data observability strategies and meaningful metrics Collaborate with multidisciplinary teams across engineering and operations What we're looking for Around 3-5 years of experience in data ...