476 to 500 of 561 Observability Jobs

Site Reliability Engineer / SRE

Hiring Organisation
Partnerscale
Location
Manchester, Lancashire, England, United Kingdom
Employment Type
Full-Time
Salary
£55,000 - £65,000 per annum
Site Reliability Engineer to join a well-funded, settled SRE team, working across multiple engineering squads to build tooling and automation, improve observability practices and drive a culture of continuous improvement. It's a great opportunity to do genuinely impactful work within a business that invests heavily in its people … Experience with IaC tools including Ansible or Terraform Background in a large-scale, 24/7 enterprise environment Interest in Platform Engineering and modern observability practices If you're a passionate SRE looking for a step up into a well-resourced, fast-paced environment, apply now. Keywords: Site Reliability Engineer ...

Senior Site Reliability Engineer (Public Cloud)

Hiring Organisation
Head Resourcing
Location
Edinburgh, Midlothian, Scotland, United Kingdom
Employment Type
Full-Time
Salary
£70,000 - £85,000 per annum
experienced Senior Site Reliability Engineer to join the team. This is real SRE work: reducing toil, building automation, improving system reliability and observability, and supporting large-scale cloud environments across Azure and GCP. The Role You'll be part of a unified SRE team supporting multiple cloud teams, working … Reliability, performance and observability across Azure/GCP Automation to reduce repeat incidents, tickets, and manual processes Improving SLOs, SLIs, error budgets and platform health Building and maintaining Terraform modules, GitHub pipelines and IaC Supporting app teams as they migrate large workloads to cloud 1-in-4 on-call (enhanced ...

Platform Engineer

Hiring Organisation
Fruition Group
Location
United Kingdom
Employment Type
Permanent
efficient deployments. The Role As Platform Engineer, you'll be responsible for designing, automating, and optimising cloud infrastructure while helping to improve platform reliability, observability, and security. You'll also contribute to incident management, continuous improvement initiatives, and the development of engineering standards that support long-term scalability. Key Responsibilities … infrastructure and pipeline architecture. Collaborate with data teams on infrastructure supporting large datasets and event-driven systems. Help maintain strong standards in data reliability, observability, and governance. System Reliability & Incident Management Monitor, troubleshoot, and resolve production issues across platform systems. Contribute to incident response, root cause analysis, and preventative improvements. ...

GCP Data Engineer - London

Hiring Organisation
Reed
Location
City of London, London, England, United Kingdom
Employment Type
Contractor
Contract Rate
Salary negotiable
pipeline frameworks. Embed data quality checks by default, including schema validation, completeness, freshness, thresholds, and automated alerting. Enhance end-to-end pipeline resilience, monitoring, observability, failure handling, and recovery mechanisms. Integrate AI/ML features to boost reliability, anomaly detection, and operational efficiency. Collaborate closely with Data Product teams, Analytics …/App Engine. Familiarity with CI/CD & DevOps practices, including automated testing and infrastructure as code. Experience in implementing data quality frameworks, observability tooling, and production monitoring patterns. Proven ability to build reusable data pipeline templates for large-scale, multi-domain platforms. Experience in enterprise data transformation programmes with ...

Azure DevOps Platform Engineer Remote Outside IR35

Hiring Organisation
Interact Consulting Limited
Location
Manchester, North West, United Kingdom
Employment Type
Contract, Work From Home
Building and managing cloud infrastructure using Terraform (IaC) Supporting platform development across Azure environments Driving best practices in DevOps, security, and reliability Contributing to observability and SRE principles Collaborating closely with engineers to deliver resilient, scalable solutions What we're looking for Strong Azure experience (Azure certification required) HashiCorp Terraform … Code Kubernetes certification (required) Solid understanding of DevOps practices and platform engineering Strong awareness of security best practices in cloud environments Familiarity with observability and SRE concepts Why join? £500 per day, outside IR35 Fully remote (UK-based) Work with a highly respected, mission-driven health tech company Be part ...

Senior Front-end Developer (AI-First SaaS Platform)

Hiring Organisation
Keepnet
Location
United Kingdom
live in production, Keepnet runs autonomous systems that plan, build, and operate security awareness and human risk workflows, supported by strong guardrails, auditability, and observability. We’re looking for a Senior Front-end Developer who wants to build production-grade web experiences that power these systems … Create resilient UX for async workflows (jobs, queues, long-running tasks): polling, retries, idempotent actions, progress states, and error recovery. Improve and maintain frontend observability (client-side logging, metrics, tracing where applicable; tools like Sentry) to prevent incidents rather than react to them. Write and maintain automated tests across levels ...

Data Architect (DV)

Hiring Organisation
Anson Mccade
Location
Manchester, North West, United Kingdom
Employment Type
Permanent, Work From Home
clients translate strategic business needs into scalable, resilient, and secure solutions. You will work across cloud and multi-platform architectures, ensuring data governance, security, observability, and cost efficiency are embedded into every design. The Data Architect role is based in a hybrid model, with a minimum of two days … Architect As a Data Architect, you will: Define end-to-end data architecture for complex programmes, including ingestion, orchestration, governance, security, cost-optimisation, and observability Architect and implement multi-cloud, data lake, and data warehouse platforms Design scalable data pipelines, integration workflows, and analytics solutions Apply ML/AI frameworks ...

Data Architect (DV)

Hiring Organisation
Anson Mccade
Location
London, United Kingdom
Employment Type
Permanent, Work From Home
clients translate strategic business needs into scalable, resilient, and secure solutions. You will work across cloud and multi-platform architectures, ensuring data governance, security, observability, and cost efficiency are embedded into every design. The Data Architect role is based in a hybrid model, with a minimum of two days … Architect As a Data Architect, you will: Define end-to-end data architecture for complex programmes, including ingestion, orchestration, governance, security, cost-optimisation, and observability Architect and implement multi-cloud, data lake, and data warehouse platforms Design scalable data pipelines, integration workflows, and analytics solutions Apply ML/AI frameworks ...

Data Architect (DV)

Hiring Organisation
Anson Mccade
Location
Bristol, Avon, South West, United Kingdom
Employment Type
Permanent, Work From Home
clients translate strategic business needs into scalable, resilient, and secure solutions. You will work across cloud and multi-platform architectures, ensuring data governance, security, observability, and cost efficiency are embedded into every design. The Data Architect role is based in a hybrid model, with a minimum of two days … Architect As a Data Architect, you will: Define end-to-end data architecture for complex programmes, including ingestion, orchestration, governance, security, cost-optimisation, and observability Architect and implement multi-cloud, data lake, and data warehouse platforms Design scalable data pipelines, integration workflows, and analytics solutions Apply ML/AI frameworks ...

Support Engineer

Hiring Organisation
Ordnance Survey
Location
Southampton, Hampshire, England, United Kingdom
Employment Type
Full-Time
Salary
£43,918 - £51,238 per annum
improvements to service performance, including automating deployments, right-sizing systems, and extending monitoring and alerting capabilities Safeguarding critical services by continually assessing and improving observability, resilience and security Investigating and resolving root cause issues, identifying why failures occur, and working with subject matter experts when necessary to fully resolve problems … technologies and best practice - ideally in Azure Infrastructure-as-Code - ideally using Bicep A track record of continually identifying and implementing service improvements or observability Experience of coaching and mentoring other team members and providing consultancy to other teams Additionally, you will provide expert technical consultancy to enable the business ...

Agentic RAG Engineer/Architect - London (Contract)

Hiring Organisation
FUTURUS FINANCIAL RECRUITMENT LTD
Location
London Area, United Kingdom
only retrieve data the requesting user is authorised to see. Implement PII detection, data classification, and audit trails for every retrieval operation. 10. Observability & Performance — Instrument the RAG layer with comprehensive tracing: query decomposition traces, retrieval latency per source, relevance scores, token usage, cache hit rates. Optimise for sub-second … Document processing - Unstructured.io, LlamaParse, Apache Tika ● LLM providers - OpenAI (GPT-4+), Anthropic (Claude), Azure OpenAI ● Languages - Python, Rust ● Evaluation - RAGAS, custom evaluation harnesses, LangSmith ● Observability - OpenTelemetry, LangSmith/LangFuse, Grafana ● Infrastructure - Kubernetes (AKS) Qualifications ● Bachelor's or Master's degree in Computer Science, Information Retrieval, Computational Linguistics, Data Science ...

Artificial Intelligence Engineer

Hiring Organisation
Harnham
Location
Anaheim, California, United States
Employment Type
Permanent
Salary
USD Annual
AI Engineer 1) Organization Overview (Concise & Neutral) A fast-growing, oncology-focused organization is reinventing how clinical trials operate by integrating them tightly with real-world clinical practice. Cross-disciplinary teams across healthcare, engineering, AI ...

Cloud Architect

Hiring Organisation
Ultima
Location
United Kingdom
Job Description: Cloud Architect – Azure, DevOps, Terraform (with Technical Account Management Focus) Position: Cloud Architect Location: Remote (UK-based) Type: Full-time We are seeking a skilled and client-focused Cloud Architect with deep expertise ...

AI Architect

Hiring Organisation
Stackstudio Digital Ltd
Location
London, United Kingdom
Employment Type
Permanent
Salary
£85,000
Role/Job title: AI Architect Mode of working: London - 3 days onsite Type of Employment: Permanent The Role: As an Artificial Intelligence Architect, you will lead client-facing initiatives at the forefront of Generative ...

Senior Fullstack Engineer (Backend)

Hiring Organisation
Ronald James
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
£60,000 - £95,000 per annum
Senior Fullstack Engineer (Backend) Hybrid/Remote - has to be based in London Role Overview We’re looking for a Senior Full Stack (Backend) Engineer who thrives on building clean, scalable systems quickly. You’ll ...

Production PM

Hiring Organisation
act digital
Location
Lisboa, Portugal
Employment Type
Permanent
Salary
EUR Annual
List of expected mission assistance: • In this context, it is necessary to support: # Project management on time and within budget and quality of deliverables: # Projects must meet the Group's reference standards and ...

Software Integration Apprentice

Hiring Organisation
SPIRAX-SARCO LIMITED
Location
9-15 Runnings Road, Kingsditch Trading Estate, Cheltenham, England, United Kingdom
Employment Type
Higher Apprenticeship
Salary
£25,000 a year
This is a unique opportunity to gain hands-on experience in enterprise integration, data analytics and software design. Working alongside experienced professionals to support the design, development, testing, and monitoring of integration solutions across our ...

Python Developer

Hiring Organisation
Arcus Search
Location
City of London, London, United Kingdom
Python Developer - Observability Engineering London | Hybrid | Perm I’m working with a leading quantitative research and trading firm looking to expand their Observability Engineering team. This team sits at the centre of engineering productivity, owning the systems that allow teams to produce, move and consume telemetry at scale. The focus … making observability seamless across a large, high-performance environment handling cloud-level volumes of data. The role • Build and extend observability tooling across telemetry pipelines and backend systems • Develop and maintain OpenTelemetry collectors, SDKs and exporters • Define and promote “golden paths” for instrumentation across a wide range of services • Work ...

Lead Dev Ops Engineer

Hiring Organisation
Birketts LLP
Location
Ipswich, Suffolk, England, United Kingdom
Employment Type
Full-Time
Salary
Competitive salary
secret-handling patterns aligned to Birketts expectations Implement and enforce PR/branch policies and release controls to reduce variability and operational risk Platform observability and operational readiness Provide and evolve platform observability foundations: monitoring, logging, metrics, dashboards and alerting (using the agreed toolset) Define and improve incident response ...

SRE - Site Reliability Engineer

Hiring Organisation
Randstad Technologies Recruitment
Location
London, United Kingdom
Employment Type
Contract
Contract Rate
£55 - £62/hour
Senior Site Reliability Engineer (Observability) Location: London/UK (Remote) Contract: 12 Months Initial Day rate: £55 Per Hour - £62 Per Hour Inside IR35 Job Overview We are looking for a Senior Site Reliability Engineer with strong experience in Observability, Monitoring and Distributed Systems to support large-scale cloud infrastructure … focuses on building and scaling monitoring, logging and alerting platforms to ensure high availability and performance of cloud services. Responsibilities Design, deploy and scale observability platforms Manage and scale Prometheus monitoring systems Deploy and maintain large Elasticsearch clusters Build and maintain data pipelines using Kafka Develop alerting and monitoring frameworks ...

Lead Platform Engineer

Hiring Organisation
REVYBE IT RECRUITMENT LIMITED
Location
City of London, London, United Kingdom
Employment Type
Permanent, Work From Home
Lead Platform Engineer - FinTech £110,000 + Bonus (£15k+) Central London - Hybrid working, 2/3 days per week in the office We're working with a highly successful FinTech business in Central London who are looking to hire a Lead Platform Engineer to help shape the future of their infrastructure and platform strategy. This is a high-impact role within a growing engineering team where you'll have the opportunity to influence architectural decisions, mentor engineers, and remain deeply hands-on with modern infrastructure tooling. The company builds all its software in-house and has been investing heavily in its platform, observability, and cloud capabilities as they continue to scale. The Opportunity: You'll join as the Lead Platform Engineer, working closely with engineering leadership to drive improvements across infrastructure, reliability, and developer experience. This role sits at the intersection of hands-on engineering, mentoring, and strategy. You'll guide platform direction while continuing to build and improve the infrastructure that powers the business. You'll also mentor one platform engineer, helping them grow while ensuring the team continues delivering high-quality infrastructure and automation. Environment: The platform currently operates in a hybrid environment: ~60% on-premise infrastructure ~40% Microsoft Azure The long-term strategy is focused on modernising the platform, improving observability, and evolving cloud capabilities, making this an excellent opportunity for someone who enjoys building and shaping systems.
Tech Stack: You'll be working across a modern DevOps and platform stack including: Kubernetes Terraform Hybrid cloud infrastructure (on-premise + Azure …/CD & Automation GitHub Actions Python Azure Services Azure Kubernetes Service (AKS) Azure Virtual Machines Azure Virtual Networks Azure Load Balancer Azure Application Gateway Azure Storage Accounts Azure Blob Storage Azure Key Vault Azure Monitor Azure Log Analytics Azure Active Directory Azure Container Registry Azure DNS Azure DevOps integrations Observability Logging, monitoring, and tracing across distributed systems Building meaningful telemetry and platform visibility What you'll be doing: Leading the evolution of the company's platform and infrastructure strategy Designing and improving hybrid Azure + on-premise environments Driving Kubernetes platform improvements Building automation with Terraform and Python Improving observability and monitoring across systems Mentoring a Platform Engineer and helping shape platform best practices Working closely with engineering teams to improve developer experience and reliability Why this role is exciting: Huge impact on the future platform architecture Opportunity to shape the hybrid cloud strategy Combination of technical leadership and hands-on engineering Modern DevOps tooling and cloud technologies Direct influence on platform reliability and scalability Package: Salary: Up to £110,000 Bonus: 15k+ ...

Lead Software Engineer

Hiring Organisation
5V Video
Location
City of London, London, United Kingdom
+ AWS (Lambda, API Gateway, S3, DynamoDB) Handling event-driven architectures (Kafka, SNS/SQS, etc.) Driving system design decisions across distributed systems Improving observability, reliability, and performance in production Debugging complex issues and leading resolution across teams Staying hands-on while setting technical direction and standards Tech Stack Python … Lambda, API Gateway, S3, DynamoDB, IAM) Event-driven systems (Kafka, SNS/SQS) CI/CD (Concourse, Git workflows) Databases (Postgres, DynamoDB, Couchbase) Observability (Prometheus, Grafana, CloudWatch) What You’ll Bring Strong backend engineering experience (Python preferred) Proven experience building distributed systems at scale Deep understanding of microservices + event ...

Site Reliability Engineer

Hiring Organisation
Halian | Managed Services, Recruitment Agency & Contract Staffing
Location
United Kingdom
improvements Own and refine SLIs, SLOs, and error budgets Reduce operational toil through automation Deep-dive Linux debugging, performance tuning, and systems analysis Strengthen observability, monitoring, and alerting Provide technical leadership to a small SRE/engineering group Improve and manage on‐call processes (PagerDuty, OpsGenie, etc.) Collaborate with development … experience Hands‐on incident management and postmortems Experience mentoring or leading a small technical team Scripting/automation with Python, Go, or Bash Strong observability skills (Datadog, Prometheus, Grafana, CloudWatch) Why This Role Appeals to Real SREs You’ll be solving actual SRE problems: reliability, incidents, resilience, uptime ...

Site Reliability Engineering (SRE) Manager

Hiring Organisation
Halian Technology Limited
Location
United Kingdom
Employment Type
Permanent, Work From Home
improvements Own and refine SLIs, SLOs, and error budgets Reduce operational toil through automation Deep-dive Linux debugging, performance tuning, and systems analysis Strengthen observability, monitoring, and alerting Provide technical leadership to a small SRE/engineering group Improve and manage on-call processes (PagerDuty, OpsGenie, etc.) Collaborate with development … experience Hands-on incident management and postmortems Experience mentoring or leading a small technical team Scripting/automation with Python, Go, or Bash Strong observability skills (Datadog, Prometheus, Grafana, CloudWatch) Why This Role Appeals to Real SREs You'll be solving actual SRE problems: reliability, incidents, resilience, uptime You'll guide ...

Platform Engineer (Outside IR35) - MOD SC

Hiring Organisation
Talent Locker
Location
Farnborough, Hampshire, South East, United Kingdom
Employment Type
Contract
Contract Rate
£475 - £500 per day
secure platforms. Key Responsibilities Develop and enhance platform services across hybrid environments Improve and standardise automated deployment and CI/CD pipelines Strengthen observability, monitoring, and proactive operations Support incident response, troubleshooting, and service improvements Provide guidance on platform patterns, tooling, and best practices Contribute to architectural decisions and technical … managing cloud infrastructure OpenShift & Kubernetes - configuring clusters, building containers, and managing repositories CI/CD pipelines - building and improving automated deployment processes Monitoring & Observability - implementing proactive monitoring across platforms Automation of platform components - ensuring reliable, repeatable operations Secure by Design - experience delivering platforms aligned with MOD security standards MOD Applications ...