Observability Job Vacancies

151 to 175 of 2,196 Observability Jobs

VP of Platform Engineering

London, United Kingdom
YouLend
automation, infrastructure provisioning and tooling to enhance development efficiency. You will manage Platform Reliability and Infrastructure ensuring a reliable and stable platform. You will oversee YouLend's Security and Observability frameworks, focusing on platform security, maintaining observability, and providing dashboards for developers to monitor service health. The ideal candidate is someone who has successfully built and scaled platform architectures, led … the ability to work across technical and non-technical teams. Excellent communication skills, with the ability to translate complex technical concepts to business stakeholders. Operational Focus: Expertise in platform observability, monitoring, incident management, and creating highly reliable systems. Experience implementing SLAs, SLOs, and SLIs is a plus. Security & Compliance: In-depth understanding of platform security, data privacy, and regulatory compliance …
Employment Type: Permanent
Salary: GBP Annual
Posted:

Principal Platform Engineer

London, England, United Kingdom
STOXX
Stoxx's GCP platform infrastructure Ensure the platform's scalability, reliability, and efficiency meets business and client requirements Develop, build and support a robust CI/CD pipeline and observability stack Be the go-to person for the most critical Platform issues, leading cross-functional teams where necessary, to deliver best-in-class engineering solutions. Drive continuous improvement initiatives to … Experience working in a global or multinational team setting Strong documentation, communication and collaboration skills Proven ability to drive innovation and continuous improvement initiatives Focus on simplicity, automation and observability Expertise in Python, GitHub Actions, Apigee, Airflow Expertise in Observability tooling such as Prometheus/Grafana, ELK, Splunk or similar Bachelor's or Master's degree in Computer Science or …
Posted:

Senior Site Reliability Engineer (SRE)

Wokingham, England, United Kingdom
Leap29
You’ll Be Responsible For As a Senior SRE, you’ll lead initiatives that: Ensure availability, latency, and performance of mission-critical systems across cloud and hybrid environments. Architect observability solutions (monitoring, logging, alerting) that detect and prevent failures before they impact users. Own and improve incident response workflows, including runbooks, communications, and root cause analysis. Define and enforce SLIs … using tools such as Azure DevOps, GitHub Actions, Jenkins, or GitLab. Lead the design and delivery of resilient, scalable infrastructure using IaC (Terraform, Bicep, etc.). Develop automation and observability tooling that enables fast feedback loops and minimal manual intervention. Strategic & Advisory Define infrastructure architecture to support fault-tolerant applications. Collaborate with developers, architects, and product teams to embed reliability …
Posted:

Cloud Engineer (Full time - Remote Europe)

London, England, United Kingdom
Hybrid / WFH Options
Ikerian
scalable AWS cloud environments and services. Manage and prioritise tasks in the cloud infrastructure backlog to address immediate needs and plan long-term improvements. Set up infrastructure monitoring and observability solutions, proactively addressing availability, performance or security issues. Assess new technologies, systems, and services for production readiness, ensuring seamless and stable integration. Prepare and maintain documentation on cloud processes, procedures … CI/CD pipelines and tools, including GitLab (preferred), GitHub Actions, Jenkins, etc. Basic understanding of cloud networking concepts, including VPC, Subnets, and Load Balancing. Familiarity with monitoring and observability tools for cloud environments, such as Grafana, Prometheus, OpenSearch, and the ELK stack. Strong analytical and problem-solving skills, with a proactive approach to challenges. A genuine interest in staying …
Posted:

Senior DevOps Engineer

London, England, United Kingdom
Hybrid / WFH Options
CFP Energy
and enhance CI/CD pipelines, infrastructure/app templates, and automation workflows. Explore and integrate emerging technologies to evolve our platform offerings and support developer needs. Fine-tune observability tools to resolve issues quickly and deliver actionable alerts to the right people. Infrastructure as Code (IaC): Proven experience with cloud infrastructure automation (Terraform and Azure preferred). Kubernetes: Proficiency … GitOps workflows and Helm charts. Security: Hands-on experience with token/secret management tools (e.g., HashiCorp Vault, Azure Key Vault) and SSO/authentication systems (e.g., Okta). Observability: Hands-on experience with platforms like DataDog, Grafana, or Azure Monitor. Networking: Strong understanding of networking principles, DNS, and related technologies. CI/CD: Skilled in creating and maintaining CI …
Posted:

Lead Backend Engineer (m/f/d)

München, Bayern, Germany
Hybrid / WFH Options
Peter Park System GmbH
Architect for Scale & Resilience: Make critical decisions on system design and performance to support a growing platform with increasing complexity and scale. Elevate Operational Maturity: Lead improvements to monitoring, observability, and developer workflows - ensuring backend systems are resilient and teams can ship confidently. Embed Security by Design: Take responsibility for backend security posture, ensuring systems meet best practices and compliance … and SQS. Infrastructure as Code: Experience with Terraform or similar tools for infrastructure automation. High-Throughput Systems: Strong experience in real production projects handling large-scale data flows. Monitoring & Observability: Proficiency in tools like Datadog, Prometheus, and Grafana. Security & Networking: Solid understanding of networking principles, security best practices, and cloud security. Agile & Fast-Paced Environments: Experience in agile teams, working …
Employment Type: Permanent
Salary: EUR Annual
Posted:

Senior Software Engineer – Real-Time Data Applications

London, England, United Kingdom
Snowplow
with cross-functional teams for requirements Review code to maintain quality and provide constructive feedback Manage CI/CD pipelines for automated deployments and reliability Monitor system health with observability tools and address issues proactively Engage with stakeholders for alignment on project goals and updates Research new technologies to improve the Snowplow ecosystem We’d Love to Hear From You … data processing pipelines Experience with Kubernetes, particularly in the context of data processing workflows Knowledge of Snowplow products and services Experience with data analytics platforms and tools Expertise with observability tools like Grafana and Sentry What We Offer You in Return: A competitive package, including share options Flexible working A generous holiday allowance no matter where you are in the …
Posted:

Site Reliability Engineer (SRE) - Weekend Coverage

London, England, United Kingdom
Hybrid / WFH Options
Elwood Technologies
environment. Automate manual processes and workflows, reducing operational overhead. Work closely with engineering teams to design and deploy scalable, fault-tolerant infrastructure solutions on AWS or GCP. Improve observability by utilizing monitoring, logging, and alerting systems (e.g., CloudWatch, Datadog). Lead post-incident reviews, contribute to the continuous improvement of system reliability and follow up on strategic fixes. Develop … you have experience of some or all of the following: Experience with client-impact triage, working cross-functionally with account managers or product teams. Proficiency with Datadog or similar observability platforms. Knowledge of serverless architectures (e.g., AWS Lambda, GCP Cloud Functions). Familiarity with RDBMS and NoSQL databases, such as RDS, CloudSQL, DynamoDB. Prior experience in fintech, trading platforms, or …
Posted:

DevOps Engineer

London, England, United Kingdom
LSEG
and advocating for the best solutions that improve developer productivity and system efficiency. Infrastructure Automation & Management: Use Terraform/OpenTofu and automation frameworks to provision and manage infrastructure. Monitoring & Observability: Configure and utilise observability tools like Datadog for performance monitoring, alerting, and visualisation, ensuring system reliability and quick identification of issues. Performance Optimisation: Continuously monitor the performance of the tools …
Posted:

Senior AWS Platform Engineer

London Area, United Kingdom
Hybrid / WFH Options
Arcus Search
practices with a strong emphasis on automation, self-service, and operational excellence. Tech You'll Use: Azure & AWS (production experience) Kubernetes (EKS preferred) Terraform & GitHub Actions CI/CD, observability tooling (Grafana, Prometheus), containerisation (Docker) What You'll Be Doing: Designing and implementing secure, resilient AWS infrastructure Building CI/CD pipelines and reusable deployment patterns Advising on cloud-native …
Posted:

Senior AWS Platform Engineer

City of London, London, United Kingdom
Hybrid / WFH Options
Arcus Search
practices with a strong emphasis on automation, self-service, and operational excellence. Tech You'll Use: Azure & AWS (production experience) Kubernetes (EKS preferred) Terraform & GitHub Actions CI/CD, observability tooling (Grafana, Prometheus), containerisation (Docker) What You'll Be Doing: Designing and implementing secure, resilient AWS infrastructure Building CI/CD pipelines and reusable deployment patterns Advising on cloud-native …
Posted:

Senior AWS Platform Engineer

London, England, United Kingdom
Hybrid / WFH Options
JR United Kingdom
and will help clients adopt modern DevOps practices with a strong emphasis on automation, self-service, and operational excellence. Tech You'll Use: Terraform & GitHub Actions CI/CD, observability tooling (Grafana, Prometheus), containerisation (Docker) What You'll Be Doing: Designing and implementing secure, resilient AWS infrastructure Building CI/CD pipelines and reusable deployment patterns Advising on cloud-native …
Posted:

Messaging Administrator - Solace

London Area, United Kingdom
Marlin Selection Recruitment
For: 3+ years’ hands-on experience with Solace PubSub+ in a production environment Strong knowledge of WAN-based distributed systems and networking fundamentals Experience with Prometheus and Grafana for observability and alerting Confident in Linux/Unix systems and scripting (Bash, Python, etc.) Excellent problem-solving instincts and attention to detail Strong communicator who works well across technical teams Bonus …
Posted:

Messaging Administrator - Solace

City of London, London, United Kingdom
Marlin Selection Recruitment
For: 3+ years’ hands-on experience with Solace PubSub+ in a production environment Strong knowledge of WAN-based distributed systems and networking fundamentals Experience with Prometheus and Grafana for observability and alerting Confident in Linux/Unix systems and scripting (Bash, Python, etc.) Excellent problem-solving instincts and attention to detail Strong communicator who works well across technical teams Bonus …
Posted:

Messaging Administrator - Solace

East London, London, United Kingdom
Marlin Selection
For: 3+ years’ hands-on experience with Solace PubSub+ in a production environment Strong knowledge of WAN-based distributed systems and networking fundamentals Experience with Prometheus and Grafana for observability and alerting Confident in Linux/Unix systems and scripting (Bash, Python, etc.) Excellent problem-solving instincts and attention to detail Strong communicator who works well across technical teams Bonus …
Employment Type: Permanent
Posted:

Senior Cloud Engineer

London, England, United Kingdom
JR United Kingdom
governance compliance. Utilize AWS, containerization (e.g., Docker), and Infrastructure as Code tools like Terraform and Ansible for performance and cost optimization. Implement best practices in DevOps and DevSecOps, including observability, security, networking, API integration, and disaster recovery. Mentor junior engineers and contribute to technical leadership, preferably with experience in broadcast workflows, audio/video streaming, and Agile methodologies. Key Requirements …
Posted:

DevOps Engineer

Branston, England, United Kingdom
Amtis Professional Ltd
CloudFormation or ARM templates Scripting & Automation - Proficient in PowerShell, Bash, or Python Infrastructure as Code (IaC) - Hands-on experience with Terraform, Bicep, or ARM Certified: Terraform Associate preferred Monitoring & Observability - Familiarity with tools like Azure Monitor, AWS CloudWatch, Prometheus, Grafana Security & Compliance - Strong understanding of IAM, cloud security, compliance frameworks For immediate consideration apply now!
Posted:

DevOps Engineer

Burton-On-Trent, Staffordshire, West Midlands, United Kingdom
Amtis Professional Ltd
CloudFormation or ARM templates Scripting & Automation - Proficient in PowerShell, Bash, or Python Infrastructure as Code (IaC) - Hands-on experience with Terraform, Bicep, or ARM Certified: Terraform Associate preferred Monitoring & Observability - Familiarity with tools like Azure Monitor, AWS CloudWatch, Prometheus, Grafana Security & Compliance - Strong understanding of IAM, cloud security, compliance frameworks For immediate consideration apply now …
Employment Type: Permanent
Salary: £60,000
Posted:

Cloud Developer with Security Clearance

Washington, Washington DC, United States
Linchpin Software
Infrastructure as Code and automation (e.g., CloudFormation, Terraform, Ansible, Python, Bash) 3) DevOps pipelines, CI/CD tooling, and containerization (e.g., GitLab, Jenkins, Docker, Kubernetes) 4) Monitoring and observability in production environments (e.g., CloudWatch, Splunk, Prometheus) 5) Security, cost optimization, and disaster recovery in cloud environments Ideal Experience: 1) Experience in managing live production workloads in AWS 5) Experience deploying …
Employment Type: Permanent
Salary: USD Annual
Posted:

Site Reliability Engineer

London, England, United Kingdom
SS&C Technologies Holdings
and postmortems to learn from system failures and prevent recurrence. Participate in on-call rotations and respond to incidents, minimising downtime and customer impact. Continuously improve deployment, configuration, and observability processes. Qualifications: Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience. Strong experience with Linux/Unix systems administration. Proficient in scripting and programming languages …
Posted:

Site Reliability Engineer

Edinburgh, Scotland, United Kingdom
Hybrid / WFH Options
JR United Kingdom
ideally with Terraform or CloudFormation. Hands-on experience with CI/CD pipelines and automation tooling. Background in containerisation and orchestration – e.g., Docker, Kubernetes. Familiarity with monitoring, alerting, and observability tools (e.g., Prometheus, Grafana, CloudWatch). Proven ability to troubleshoot and resolve complex infrastructure issues. Experience working in cross-functional engineering teams, ideally in a DevOps or SRE capacity. Strong …
Posted:

Remote Senior Site Reliability Engineer Manager (Remote)

Cambourne, Cambridgeshire, United Kingdom
Hybrid / WFH Options
Remotestar
strong track record of building and maintaining highly reliable infrastructure and services. Expertise in incident management, including incident response, resolution, and post-mortem analysis. Proficiency in monitoring, alerting, and observability tools such as Prometheus, Grafana, ELK stack or Datadog. Experience with cloud platforms such as AWS, Azure, or GCP, including infrastructure as code tools like Terraform or CloudFormation. Strong scripting …
Employment Type: Permanent
Salary: GBP Annual
Posted:

Senior AWS Platform Engineer

London, England, United Kingdom
Hybrid / WFH Options
Arcus Search
and will help clients adopt modern DevOps practices with a strong emphasis on automation, self-service, and operational excellence. Tech You'll Use: Terraform & GitHub Actions CI/CD, observability tooling (Grafana, Prometheus), containerisation (Docker) What You'll Be Doing: Designing and implementing secure, resilient AWS infrastructure Building CI/CD pipelines and reusable deployment patterns Advising on cloud-native …
Posted:

Platform Engineer

London, England, United Kingdom
Hybrid / WFH Options
Anson McCade Pty
to automate provisioning. • Deploy and manage Kubernetes solutions, including AKS, EKS, and OpenShift. • Implement DevSecOps practices, integrating CI/CD pipelines and security controls. • Optimize cloud environments using FinOps, observability tooling, and SRE methodologies. • Work closely with Cloud Architects, Engineers, and Business Leaders to build scalable, high-performance platforms. • Enhance networking and security capabilities across hybrid cloud environments. The ideal …
Posted:

Manager, SRE

London, England, United Kingdom
Choreograph
some experience in a mentorship or managerial position. Strong knowledge of cloud platforms (AWS, GCP, Azure) and modern infrastructure technologies (Kubernetes, Docker, Terraform). Expertise in monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk). Proficiency in at least one programming or scripting language (e.g., Python, Go, Bash). Deep understanding of networking, databases, and distributed systems. Strong …
Posted:

Observability
10th Percentile: £57,500
25th Percentile: £65,000
Median: £80,000
75th Percentile: £97,500
90th Percentile: £120,000