Observability Job Vacancies

101 to 125 of 2,194 Observability Jobs

Senior Platform Engineer | London, UK

London, England, United Kingdom
YouLend
CI/CD pipelines (Jenkins, GitHub Actions) Define and enforce platform standards across environments (dev, staging, prod) Collaborate with developers and DevOps on deployment tooling and security Enable platform observability using tools like Datadog, Prometheus, and CloudWatch Maintain Helm charts and Terraform modules for shared infrastructure Contribute to onboarding documentation and platform adoption practices Participate in incident response and postmortem … containerisation using Docker and secure image management Scripting or programming experience in Bash, Python, or TypeScript Strong understanding of GitOps practices and infrastructure lifecycle management Desirable Skills Experience with observability tooling (Datadog, Prometheus, Fluent Bit) Knowledge of admission controllers, OPA/Gatekeeper (optional for governance) Familiarity with cloud cost optimisation and Kubernetes scaling strategies Exposure to security scanning tools (tfsec …

Infrastructure Engineer

London, England, United Kingdom
Hybrid / WFH Options
Keyrock
and optimize Kubernetes clusters for containerized applications, ensuring high availability and security. Automation & CI/CD: Implement and manage CI/CD pipelines for efficient deployment, testing, and monitoring. Observability & Monitoring: Develop monitoring solutions with tools like Prometheus, Grafana, ELK stack to enhance system reliability. Security & Compliance: Apply best practices for cloud security, IAM policies, and compliance standards (SOC2, ISO … Hands-on Kubernetes experience (EKS, K3s, or self-managed). Proficiency in scripting with Python, Bash, or Go. Experience with Infrastructure as Code (Terraform, CloudFormation, Ansible). Familiarity with observability tools (Prometheus, Grafana, Datadog, ELK). Solid understanding of networking (VPC, Load Balancers, DNS, Firewalls). Experience with DevOps, CI/CD, and GitOps practices. Experience with high-performance, low …

Principal SRE Engineer

London, South East, England, United Kingdom
Robert Walters
incidents using data-driven decision making to minimise downtime and financial impact while leading root cause analysis and conducting blameless post-mortems. Enhance application health monitoring by implementing robust observability solutions and automating manual processes to improve system resilience. Drive cost optimisation initiatives and manage capacity resources to ensure efficient and scalable operations across all FX trading platforms. Collaborate with … Deep technical expertise in Linux/Unix systems administration combined with strong SQL skills and proficiency in scripting languages such as Python or Java. Demonstrated experience with monitoring and observability tools including Prometheus, Grafana, Splunk, Geneos, OpenTelemetry or Corvil is highly desirable. Familiarity with cloud platforms as well as containerisation technologies like Kubernetes or Docker alongside CI/CD pipeline …
Employment Type: Full-Time
Salary: £110,000 - £125,000 per annum

Build and Release Engineer

London, England, United Kingdom
Hybrid / WFH Options
Scopely
collaboration and stakeholder management Development of automation tools and processes targeting reproducibility of procedures and development efficiency Monitoring, auditing and reporting of the Build systems and processes, by incorporating observability and alerts throughout the CI/CD lifecycle and infrastructure Participate in code reviews and development processes related to CI/CD pipelines and automation tools to improve the effectiveness of engineering team members What We … processes, including CI/CD best practices, specifically for Unity 3D games Professional experience and high proficiency in programming languages and scripting for automation (e.g. Python, Bash) Experience with observability tools (ELK, Grafana, Prometheus, Datadog) to monitor and alert on CI/CD stability Experience with version control systems, such as Git, and build management tools such as Jenkins, GitLab, Maven or Gradle …

Lead/Principal Python Engineer for Generative AI Backend Development

London, England, United Kingdom
Hybrid / WFH Options
Trimble
orchestrate LLM-based agents. Working with RAG frameworks: Use techniques such as chunking, hybrid search, query translation, similarity search, vector DBs, evaluation metrics, and ANN algorithms. Monitoring performance: Using observability services such as Datadog and Databricks for LLM Observability and analytics. Keep track of latest research: Given that this is a fast evolving field, it’s important to keep track …

DevOps Engineer (UK or Canada)

London, England, United Kingdom
TrustFlight
operational challenges of supporting SaaS platforms at scale. Demonstrated application of security best practices and DevSecOps principles across infrastructure and deployment lifecycles. Experience applying modern AI tools to enhance observability, operational workflows, or support processes—paired with a solid understanding of their capabilities and limitations. Deep understanding of containerization, orchestration, and virtualization technologies, including Kubernetes, Docker, and related tools. Proficiency … you stand out Experience with GCP or multi-cloud environments. Exposure to GitOps workflows and tools like ArgoCD or Kustomize. Knowledge of .NET applications in cloud settings. Familiarity with observability stacks (e.g., Grafana, ELK, Prometheus). Understanding of compliance frameworks like SOC 2 or ISO 27001. Use of AI tools for enhancing operational efficiency. Experience with SIEM integration and incident …

Expert Manager Software Engineer

München, Bayern, Germany
Bain & Company
client developers on modern tooling and DevOps/cloud-native practices, ensuring sustainable ownership after Bain's engagement. Advance cloud-native & DevOps adoption. Champion containerization, infrastructure-as-code, automated observability and secure-by-design principles to improve scalability, reliability and security. Contribute to communities of practice. Share lessons learned and emerging technology trends through internal forums, brown-bag sessions and … Django, .NET Core or Java Spring Boot, including the design of RESTful and GraphQL/gRPC APIs. 3-4 years architecting and operating micro-service ecosystems, emphasizing service discovery, observability, CI/CD automation and blue/green or canary deployments. Cloud-native delivery on AWS, Azure or GCP - adept with managed services, serverless patterns and infrastructure-as-code (Terraform …
Employment Type: Permanent
Salary: EUR Annual

Senior FX Production Support Engineer

City of London, London, United Kingdom
Radley James
services environment Strong technical skills in Linux/Unix systems, SQL, and scripting Strong experience with a programming language such as Python, Java, etc. Strong experience with monitoring and observability tools (Prometheus, Grafana, Splunk, Geneos, OpenTelemetry, Corvil) Familiarity with cloud platforms, containerization (e.g., Kubernetes, Docker), and CI/CD (continuous integration/continuous delivery) pipelines Strong understanding of the trade lifecycle …

Senior FX Production Support Engineer

London Area, United Kingdom
Radley James
services environment Strong technical skills in Linux/Unix systems, SQL, and scripting Strong experience with a programming language such as Python, Java, etc. Strong experience with monitoring and observability tools (Prometheus, Grafana, Splunk, Geneos, OpenTelemetry, Corvil) Familiarity with cloud platforms, containerization (e.g., Kubernetes, Docker), and CI/CD (continuous integration/continuous delivery) pipelines Strong understanding of the trade lifecycle …

DevOps Engineer

Portsmouth, England, United Kingdom
Hybrid / WFH Options
Trust In SODA
through the entire development life cycle. Infrastructure-as-code Bash Delivery methods and techniques, including agile scrum experience. Desirable Skills: Red Hat OpenShift HashiCorp (such as Terraform, Packer, Vault) Ansible Observability (such as Prometheus, Grafana, Splunk) Containerised services (such as Postgres, Redis, Kafka, Keycloak, ELK) Experience of doing all the above at OS or S level YAML based pipelines. Immutable infrastructure …

Head of Infrastructure & Cloud

Watford, England, United Kingdom
Halian Technology Limited
managing CI/CD pipelines, Docker containers, and security-first deployment pipelines. Implement high-availability systems and disaster recovery for business continuity across time zones and territories. Maintain system observability and monitoring to proactively identify issues and optimize system health. Ensure compliance with security standards and data privacy regulations across regions. Manage third-party vendors, licenses, and infrastructure budgets. Required …

DevOps Engineer with Security Clearance

Moorestown, New Jersey, United States
Elite Government Strategy
experience leading enterprise backup and disaster recovery initiatives. Working knowledge of cloud-native storage solutions such as Longhorn. Strong Linux administration skills, particularly with RHEL environments. Experience implementing comprehensive observability solutions using Prometheus, Grafana, Loki, and related tools. Ability to establish and enforce security policies through tools like Open Policy Agent. Knowledge of identity management solutions such as Keycloak. Experience …
Employment Type: Permanent
Salary: USD Annual

Senior Site Reliability Engineer – FinTech / Global Payments – London HQ / Remote First

London, England, United Kingdom
Hybrid / WFH Options
Future Talent Group
resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Strong Linux and networking fundamentals (TCP, DNS, TLS, HTTP …

Site Reliability Engineer

London, England, United Kingdom
SS&C Technologies
and postmortems to learn from system failures and prevent recurrence. Participate in on-call rotations and respond to incidents, minimising downtime and customer impact. Continuously improve deployment, configuration, and observability processes. Qualifications: Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience. Strong experience with Linux/Unix systems administration. Proficient in scripting and programming languages …

Senior Network Security Engineer

London, United Kingdom
CFP Energy (UK) Ltd
e.g., Slackbots and integrations) to streamline IT operations and business processes. Monitoring and Maintenance: Manage and maintain network security systems through system patches and periodic maintenance tasks. Establish comprehensive observability and proactive issue-resolution strategies using tools like SNMP, Syslog, NetFlow, Elasticsearch (ELK Stack), and Grafana. Collaboration and Communication: Work with CyberEnergia teams to identify functional needs, develop secure architectures, and …
Employment Type: Permanent
Salary: GBP Annual

Senior DevOps Engineer (SC Cleared)

London, England, United Kingdom
JR United Kingdom
Experience working in Agile teams using Git, Jira, and Confluence Eligible for SC and NPPV3 clearance Container orchestration with Kubernetes Experience with HashiCorp tools: Vault, Consul, Packer Monitoring and observability with Grafana, Prometheus, or similar Knowledge of cloud networking, VPCs, NAT Gateways, security groups, etc. Personal Attributes: Proactive, self-driven, and passionate about technology Strong problem-solving skills Collaborative team …

Site Reliability Engineer III

Glasgow, Scotland, United Kingdom
JPMorgan Chase & Co
recognize roadblocks and demonstrates interest in learning technology that facilitates innovation Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, Terraform Experience in at least one observability tool such as Dynatrace, Datadog, New Relic, CloudWatch, AppDynamics, Splunk. Preferred Qualification Experience a plus in common SRE toolchains: Grafana, Prometheus, Elasticsearch, Kibana, Jaeger.

Applications Support Senior Analyst - AVP (Belfast)

Belfast, Northern Ireland, United Kingdom
Citigroup Inc
the business succeed. Provide timely and effective technical support for end users of a designated set of DevOps tools, encompassing traditional tools (e.g., CI/CD platforms, monitoring and observability tools, source code management systems) and GenAI-powered tools. Troubleshoot and resolve complex technical issues involving in-depth analysis of logs, configurations, and system behaviour. Proactively monitor the health, performance, and …

DV Cleared DevOps Engineer

Bristol, Gloucestershire, United Kingdom
Hybrid / WFH Options
Curo Resourcing Ltd
domain adjacent technologies/services, such as: Docker, OpenShift, Kubernetes etc. Infrastructure as Code and CI/CD paradigms and systems such as: Ansible, Terraform, Jenkins, Bamboo, Concourse etc. Observability - SRE Big Data solutions (ecosystems) and technologies such as: Apache Spark and the Hadoop Ecosystem Excellent knowledge of YAML or similar languages The following Technical Skills & Experience would be desirable …
Employment Type: Permanent
Salary: GBP Annual

Cloud Infrastructure Engineer

City of London, London, United Kingdom
Ultralytics
architectures, as described by thought leaders like Martin Fowler. Hands-on experience building and maintaining complex CI/CD pipelines, preferably with GitHub Actions. Familiarity with monitoring and observability tools (e.g., Prometheus, Grafana, Google Cloud's operations suite). A solid understanding of networking principles and cloud security best practices. Experience with other cloud platforms like Amazon Web Services …

Cloud Infrastructure Engineer

London Area, United Kingdom
Ultralytics
architectures, as described by thought leaders like Martin Fowler. Hands-on experience building and maintaining complex CI/CD pipelines, preferably with GitHub Actions. Familiarity with monitoring and observability tools (e.g., Prometheus, Grafana, Google Cloud's operations suite). A solid understanding of networking principles and cloud security best practices. Experience with other cloud platforms like Amazon Web Services …

Senior Platform Engineer - DevEx

London, England, United Kingdom
Hybrid / WFH Options
9fin
as possible. Designing and implementing a developer portal (e.g. Backstage) to provide a service catalog to the engineering team, and authoring many other useful DevOps plugins. Contributing to observability best practices and providing key SLI/SLO metric reporting, so that the engineering team can balance velocity and reliability. Develop inner/open source projects to help provide a …

FX Production Engineer

London, England, United Kingdom
Hybrid / WFH Options
Deutsche Bank
services environment Strong technical skills in Linux/Unix systems, SQL, and scripting Strong experience with a programming language such as Python, Java, etc. Strong experience with monitoring and observability tools (Prometheus, Grafana, Splunk, Geneos, OpenTelemetry, Corvil) Familiarity with cloud platforms, containerization (e.g., Kubernetes, Docker), and CI/CD (continuous integration/continuous delivery) pipelines Strong understanding of the trade lifecycle …

Staff Site Reliability Engineer

Belfast, Northern Ireland, United Kingdom
Hybrid / WFH Options
CME Group Inc
both independently and collaboratively. Key Responsibilities: Collaborate with senior SREs and Product engineering teams to monitor, maintain, and troubleshoot our Markets systems. Collaborate with Product teams to continuously improve observability and alerting of our applications to enable data-driven business decisions, faster issue detection and incident resolution. Take accountability for delivery of moderately complex features. Lead technical discussions for own …

Cloud Infrastructure Engineer

London, England, United Kingdom
Ultralytics
architectures, as described by thought leaders like Martin Fowler. Hands-on experience building and maintaining complex CI/CD pipelines, preferably with GitHub Actions. Familiarity with monitoring and observability tools (e.g., Prometheus, Grafana, Google Cloud's operations suite). A solid understanding of networking principles and cloud security best practices. Experience with other cloud platforms like Amazon Web Services …

Observability
10th Percentile: £57,500
25th Percentile: £65,000
Median: £80,000
75th Percentile: £97,500
90th Percentile: £120,000