Permanent Observability Job Vacancies

51 to 75 of 677 Permanent Observability Jobs

Senior AWS Engineer

Manchester, North West
Hybrid / WFH Options
BAE Systems
or DevOps Expertise in microservices and API design Docker, and container runtime platforms such as Kubernetes, EKS, ECS etc Strong understanding of operational concepts on AWS, particularly monitoring and observability, FinOps Utilising CI/CD tools, such as Bamboo, Jenkins, TeamCity, Bitbucket, in order to streamline delivery of new features and fixes Continual testing of code using Automated Testing Frameworks …
Employment Type: Permanent

Senior Production Support Engineer

London, United Kingdom
TP ICAP Group
CI/CD pipelines, infrastructure as code (IaC), and automated testing. Experience with industry-standard monitoring tools (ITRS or similar) Proficiency in managing Kubernetes clusters, including deployment, scaling, storage, observability, and lifecycle management Understanding of financial regulations and reporting requirements in Europe such as MiFID II Person Profile The role will suit someone who relishes the prospect of supporting an …
Employment Type: Permanent
Salary: GBP Annual

Senior Site Reliability Engineer

London, United Kingdom
Hybrid / WFH Options
Stott and May
Tuesdays, Thursdays WFH) Pay: negotiable, inside IR35 We're looking for an experienced DevOps Engineer to join our team on a contract basis, with a focus on AWS infrastructure, observability tooling, and CI/CD automation. This is a hands-on role supporting high-availability systems, rapid deployments, and production incident response. Key Responsibilities - Manage and monitor AWS infrastructure for … performance and security - Respond to production incidents, perform root cause analysis, and implement fixes - Maintain observability tools (Prometheus, Grafana, Splunk) and write PromQL queries - Improve and operate CI/CD pipelines using GitHub Actions and Kubernetes - Automate infrastructure tasks with Python, Bash, Go or SQL - Work with Git-based workflows for infrastructure as code - Troubleshoot Kubernetes workloads and containerised services …
Employment Type: Permanent
Salary: GBP Annual

Head of SRE and Production Engineering (London)

London, UK
SS&C Technologies
reliability of cloud and hybrid infrastructure powering some of the most critical client-facing applications in financial services. You will be the strategic and operational leader for platform reliability, observability, incident response, CI/CD modernisation, and developer productivity. You will drive automation, lead with metrics, and build systems and teams that proactively address issues before they impact clients. Key … and services. Implement a comprehensive incident management lifecycle (on-call, escalation, RCA, blameless postmortems). Reduce Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) through automated observability, alerting, and playbooks. CI/CD and Platform Engineering Oversee the development and evolution of CI/CD pipelines for all GIDS products using GitHub Actions, ArgoCD, TeamCity, Octopus Deploy … and GitOps principles. Integrate static and dynamic code analysis, vulnerability scanning, artifact promotion, and release gating into the SDLC. Ensure pipeline scalability and governance while maintaining developer velocity. Observability & Troubleshooting Lead the implementation and usage of modern observability stacks (e.g., OpenTelemetry, Prometheus, Grafana, Splunk, Datadog). Establish SLOs, SLIs, and error budgets with product and engineering teams. Drive root cause …
Employment Type: Full-time

Principal SRE Engineer

London, South East, England, United Kingdom
Robert Walters
incidents using data-driven decision making to minimise downtime and financial impact while leading root cause analysis and conducting blameless post-mortems. Enhance application health monitoring by implementing robust observability solutions and automating manual processes to improve system resilience. Drive cost optimisation initiatives and manage capacity resources to ensure efficient and scalable operations across all FX trading platforms. Collaborate with … Deep technical expertise in Linux/Unix systems administration combined with strong SQL skills and proficiency in scripting languages such as Python or Java. Demonstrated experience with monitoring and observability tools including Prometheus, Grafana, Splunk, Geneos, OpenTelemetry or Corvil is highly desirable. Familiarity with cloud platforms as well as containerisation technologies like Kubernetes or Docker alongside CI/CD pipeline …
Employment Type: Full-Time
Salary: £110,000 - £125,000 per annum

Software Engineer (BE) - Banking Lab

Edinburgh, United Kingdom
Lloyds Banking Group
/IBM MQ). DevOps Principles: Understanding of DevOps principles and infrastructure as code tools (i.e., Terraform). Performance Tuning: Background in performance tuning, profiling, and monitoring Java applications. Observability and Monitoring: Solid experience with Observability and Monitoring tools (i.e., Splunk/Dynatrace). Leadership and Mentoring: Experience mentoring junior developers or leading small engineering teams. About working for us …
Employment Type: Permanent
Salary: GBP Annual

Software Engineer (BE) - Banking Lab

Edinburgh, United Kingdom
Hybrid / WFH Options
Lloyds Bank plc
/IBM MQ). DevOps Principles: Understanding of DevOps principles and infrastructure as code tools (i.e., Terraform). Performance Tuning: Background in performance tuning, profiling, and monitoring Java applications. Observability and Monitoring: Solid experience with Observability and Monitoring tools (i.e., Splunk/Dynatrace). Leadership and Mentoring: Experience mentoring junior developers or leading small engineering teams. About working for us …
Employment Type: Permanent
Salary: GBP Annual

Expert Manager Software Engineer

München, Bayern, Germany
Bain & Company
client developers on modern tooling and DevOps/cloud-native practices, ensuring sustainable ownership after Bain's engagement. Advance cloud-native & DevOps adoption. Champion containerization, infrastructure-as-code, automated observability and secure-by-design principles to improve scalability, reliability and security. Contribute to communities of practice. Share lessons learned and emerging technology trends through internal forums, brown-bag sessions and … Django, .NET Core or Java Spring Boot, including the design of RESTful and GraphQL/gRPC APIs. 3-4 years architecting and operating micro-service ecosystems, emphasizing service discovery, observability, CI/CD automation and blue/green or canary deployments. Cloud-native delivery on AWS, Azure or GCP - adept with managed services, serverless patterns and infrastructure-as-code (Terraform …
Employment Type: Permanent
Salary: EUR Annual

DevOps Engineer with Security Clearance

San Diego, California, United States
Elite Government Strategy
experience leading enterprise backup and disaster recovery initiatives. Working knowledge of cloud-native storage solutions such as Longhorn. Strong Linux administration skills, particularly with RHEL environments. Experience implementing comprehensive observability solutions using Prometheus, Grafana, Loki, and related tools. Ability to establish and enforce security policies through tools like Open Policy Agent. Knowledge of identity management solutions such as Keycloak. Experience …
Employment Type: Permanent
Salary: USD Annual

Senior Network Security Engineer

London, United Kingdom
CFP Energy (UK) Ltd
e.g., Slackbots and integrations) to streamline IT operations and business processes. Monitoring and Maintenance: Manage and maintain network security systems through system patches and periodic maintenance tasks. Establish comprehensive observability and proactive issue-resolution strategies using tools like SNMP, Syslog, Netflow, Elasticsearch (ELK Stack), and Grafana. Collaboration and Communication: Work with CyberEnergia teams to identify functional needs, develop secure architectures, and …
Employment Type: Permanent
Salary: GBP Annual

DV Cleared DevOps Engineer

Bristol, Gloucestershire, United Kingdom
Hybrid / WFH Options
Curo Resourcing Ltd
domain adjacent technologies/services, such as: Docker, OpenShift, Kubernetes etc. Infrastructure as Code and CI/CD paradigms and systems such as: Ansible, Terraform, Jenkins, Bamboo, Concourse etc. Observability - SRE Big Data solutions (ecosystems) and technologies such as: Apache Spark and the Hadoop Ecosystem Excellent knowledge of YAML or similar languages The following Technical Skills & Experience would be desirable …
Employment Type: Permanent
Salary: GBP Annual

Senior GenAI Infrastructure Engineer

London, England, United Kingdom
Hybrid / WFH Options
BBC
as-Code with AWS CDK, CloudFormation to provision and manage cloud environments. Build and maintain CI/CD pipelines using GitHub Actions, AWS CodePipeline, CodeBuild, Jenkins. Integrate monitoring and observability tools such as AWS CloudWatch, Prometheus, Grafana for infrastructure and model health tracking. Ensure software quality through Test-Driven Development (TDD), unit testing frameworks (e.g., pytest, unittest), and automated integration …

Software Engineer - Pensions, ISA & Investments

Bristol, Avon, South West, United Kingdom
Hybrid / WFH Options
Hargreaves Lansdown
Experience with unit, integration, and end to end testing tools and practices (e.g. Jest, Cypress, Backstop, Playwright). Experience with CI/CD and Trunk Based Development. Experience with observability tools and practices, including monitoring, logging, and tracing to ensure system reliability and performance. Understanding of Microservices & principles of RESTful API development, including structuring, documenting, versioning, testing and stubbing/ …
Employment Type: Permanent, Part Time
Salary: £65,000

Senior Data Engineer

London, United Kingdom
Hybrid / WFH Options
VivaCity
mentoring engineers and collaborating with stakeholders. Proven ability to resolve technical incidents in unfamiliar production systems. Technical and process documentation champion. Experience of operationally managing production software components, including observability, logging, metrics, error reporting, debugging, and live incident management. Your time will be spent roughly as follows: 60% - Proactive technical work (e.g. migrating DB hosting provider, new message bus system …
Employment Type: Permanent
Salary: GBP Annual

AWS Engineer

London
Hybrid / WFH Options
BAE Systems
or DevOps Expertise in microservices and API design Docker, and container runtime platforms such as Kubernetes, EKS, ECS etc Strong understanding of operational concepts on AWS, particularly monitoring and observability, FinOps Utilising CI/CD tools, such as Bamboo, Jenkins, TeamCity, Bitbucket, in order to streamline delivery of new features and fixes Continual testing of code using Automated Testing Frameworks …
Employment Type: Permanent

AWS Engineer

Manchester, North West
Hybrid / WFH Options
BAE Systems
or DevOps Expertise in microservices and API design Docker, and container runtime platforms such as Kubernetes, EKS, ECS etc Strong understanding of operational concepts on AWS, particularly monitoring and observability, FinOps Utilising CI/CD tools, such as Bamboo, Jenkins, TeamCity, Bitbucket, in order to streamline delivery of new features and fixes Continual testing of code using Automated Testing Frameworks …
Employment Type: Permanent

AWS DevOps Engineer

Bristol, Gloucestershire, United Kingdom
Hybrid / WFH Options
Leidos
and managing backup, recovery, and disaster recovery strategies to ensure data protection and business continuity Ability to implement robust monitoring and logging solutions e.g., CloudWatch, to ensure system reliability, observability, and proactive incident response Comfortable working in Agile development teams, translating business requirements into technical solutions, and actively participating in sprint planning, retrospectives, and daily stand-ups Capability to design …
Employment Type: Permanent
Salary: GBP Annual

Senior GenAI Infrastructure Engineer

United Kingdom
Hybrid / WFH Options
BBC Group and Public Services
as-Code with AWS CDK, CloudFormation to provision and manage cloud environments. Build and maintain CI/CD pipelines using GitHub Actions, AWS CodePipeline, CodeBuild, Jenkins. Integrate monitoring and observability tools such as AWS CloudWatch, Prometheus, Grafana for infrastructure and model health tracking. Ensure software quality through Test-Driven Development (TDD), unit testing frameworks (e.g., pytest, unittest), and automated integration …
Employment Type: Permanent
Salary: GBP Annual

Sr. MLOps/GenAI Infrastructure Engineer

United Kingdom
Hybrid / WFH Options
BBC Group and Public Services
as-Code with AWS CDK, CloudFormation to provision and manage cloud environments. Build and maintain CI/CD pipelines using GitHub Actions, AWS CodePipeline, CodeBuild, Jenkins. Integrate monitoring and observability tools such as AWS CloudWatch, Prometheus, Grafana for infrastructure and model health tracking. Ensure software quality through Test-Driven Development (TDD), unit testing frameworks (e.g., pytest, unittest), and automated integration …
Employment Type: Permanent
Salary: GBP Annual

Cloud Observability Engineer

United Kingdom
Barclays
Join us as a Cloud Observability Engineer at Barclays, where you will lead our enterprise observability strategy across multi-cloud environments. This senior role combines technical leadership with team management, driving operational excellence while architecting resilient solutions and mentoring high-performing teams. To be successful as a Cloud Observability Engineer, you should have experience with The ability to lead and … scale technical teams in multi-faceted governance environments AWS/Azure cloud platforms and enterprise observability tools (Elastic, Grafana, Splunk, DataDog, or similar) SRE/DevOps methodologies with Python proficiency for automation and infrastructure-as-code practices Some other highly valued skills may include AWS or Azure cloud certifications Experience implementing AI-driven observability and AIOps solutions Background in large …
Employment Type: Permanent
Salary: GBP Annual

Technical Account Manager

London, United Kingdom
Coralogix, inc
us on our journey to revolutionize observability. In 2023, Dun & Bradstreet ranked Coralogix as one of the best tech startups to work for. Coralogix is a modern, full-stack observability platform transforming how businesses process and understand their data. Our unique architecture powers in-stream analytics without reliance on expensive indexing or hot storage. We specialize in comprehensive monitoring of … logs, metrics, trace and security events with features such as APM, RUM, SIEM, Kubernetes monitoring and more, all enhancing operational efficiency and reducing observability spend by up to 70%. Technical Account Managers in Coralogix are key in our effort to meet our customers' expectations and help them utilize their observability and security data in the most efficient way … looking for hard-working, sharp, and humble professionals with proven technical customer-facing experience. Our Technical Account Managers are trusted advisors and advise our customers on their monitoring, security & observability journey. This role embodies the critical intersection of very high technical expertise and a focus on customer satisfaction, renewal and expansion. Technical Account Managers are senior-level roles and are …
Employment Type: Permanent
Salary: GBP Annual

Senior Cloud / Platform Engineer

Belfast, United Kingdom
Hybrid / WFH Options
Kadence Limited
operations. Manage and enhance our container orchestration stack using Kubernetes (EKS) and Docker. Develop and maintain robust, scalable CI/CD pipelines with Jenkins, GitHub Actions, and ArgoCD. Strengthen observability across the platform through effective monitoring, logging, and alerting (AWS services, Grafana, etc.). Contribute to platform security through infrastructure hardening, role-based access controls, and infrastructure as code (Terraform … CI/CD pipelines using Jenkins, GitHub Actions, and/or ArgoCD. Familiarity with infrastructure as code practices using Terraform, CloudFormation, or similar tools. A solid grasp of system observability, monitoring, and alerting practices (CloudWatch, Grafana, or equivalent). Exposure to platform security principles including identity/access management, secrets handling, and environment isolation. Strong scripting and automation skills (e.g. … Database: MySQL (Aurora DB), Redis (ElastiCache), MongoDB (AWS DocumentDB). Cloud & DevOps: AWS (20+ services), Kubernetes (EKS), Docker, Infrastructure as Code (CloudFormation, Terraform), CI/CD (Jenkins, GitHub Actions), Observability (AWS, Grafana). Development tools: GitHub, Jira, Notion, ChatGPT, Gemini, LangChain, AI-native IDEs (Cursor, JetBrains), LLM-powered internal tools. Test automation: Cypress (E2E), Postman (API), Jest (frontend unit …
Employment Type: Permanent
Salary: GBP Annual

Lead Backend Engineer (m/f/d)

München, Bayern, Germany
Hybrid / WFH Options
Peter Park System GmbH
Architect for Scale & Resilience: Make critical decisions on system design and performance to support a growing platform with increasing complexity and scale. Elevate Operational Maturity: Lead improvements to monitoring, observability, and developer workflows - ensuring backend systems are resilient and teams can ship confidently. Embed Security by Design: Take responsibility for backend security posture, ensuring systems meet best practices and compliance … and SQS. Infrastructure as Code: Experience with Terraform or similar tools for infrastructure automation. High-Throughput Systems: Strong experience in real production projects handling large-scale data flows. Monitoring & Observability: Proficiency in tools like Datadog, Prometheus, and Grafana. Security & Networking: Solid understanding of networking principles, security best practices, and cloud security. Agile & Fast-Paced Environments: Experience in agile teams, working …
Employment Type: Permanent
Salary: EUR Annual

Messaging Administrator - Solace

East London, London, United Kingdom
Marlin Selection
For: 3+ years hands-on experience with Solace PubSub+ in a production environment Strong knowledge of WAN-based distributed systems and networking fundamentals Experience with Prometheus and Grafana for observability and alerting Confident in Linux/Unix systems and scripting (Bash, Python, etc.) Excellent problem-solving instincts and attention to detail Strong communicator who works well across technical teams Bonus …
Employment Type: Permanent

DevOps Engineer

Burton-On-Trent, Staffordshire, West Midlands, United Kingdom
Amtis Professional Ltd
CloudFormation or ARM templates Scripting & Automation - Proficient in PowerShell, Bash, or Python Infrastructure as Code (IaC) - Hands-on experience with Terraform, Bicep, or ARM Certified: Terraform Associate preferred Monitoring & Observability - Familiarity with tools like Azure Monitor, AWS CloudWatch, Prometheus, Grafana Security & Compliance - Strong understanding of IAM, cloud security, compliance frameworks For immediate consideration apply now
Employment Type: Permanent
Salary: £60,000

Observability
10th Percentile: £57,500
25th Percentile: £65,000
Median: £80,000
75th Percentile: £97,500
90th Percentile: £120,000