reliability of cloud and hybrid infrastructure powering some of the most critical client-facing applications in financial services. You will be the strategic and operational leader for platform reliability, observability, incident response, CI/CD modernisation, and developer productivity. Why Join SS&C GIDS? Lead mission-critical infrastructure for a globally recognised financial technology provider. Influence the technical direction of … and services. Implement a comprehensive incident management lifecycle (on-call, escalation, RCA, blameless postmortems). Reduce Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) through automated observability, alerting, and playbooks. CI/CD and Platform Engineering: Oversee the development and evolution of CI/CD pipelines for all GIDS products using GitHub Actions, ArgoCD, TeamCity, Octopus Deploy … and GitOps principles. Integrate static and dynamic code analysis, vulnerability scanning, artifact promotion, and release gating into the SDLC. Ensure pipeline scalability and governance while maintaining developer velocity. Observability & Troubleshooting: Lead the implementation and usage of modern observability stacks (e.g., OpenTelemetry, Prometheus, Grafana, Splunk, Datadog). Establish SLOs, SLIs, and error budgets with product and engineering teams. Drive root cause
Site Reliability Engineer (SRE) who will be responsible for ensuring the reliability, scalability, and performance of our cloud infrastructure and applications. This role focuses on maintaining and improving system observability, automating operations, and enhancing deployment practices to support business-critical services. Reporting directly to the Lead Site Reliability Engineer, you will be expected to work independently while collaborating closely with … learning and improving performance based on set targets will be expected. VARIED DAY TO DAY RESPONSIBILITIES Ensuring system reliability, performance, and scalability through monitoring and automation Building and maintaining observability solutions using Grafana, Prometheus, Loki, OpenTelemetry Proactively identifying and resolving performance bottlenecks and infrastructure issues Automating infrastructure provisioning, configuration management, and deployments Implementing effective logging, monitoring, and alerting strategies Managing … efficiency WHAT ARE WE LOOKING FOR IN A CANDIDATE? Experience with SRE principles, such as incident management, error budgets, and service-level objectives (SLOs) Experience designing and implementing robust observability, monitoring and logging solutions Strong proficiency with observability and monitoring tools such as Grafana, Prometheus, and Loki Strong experience with distributed tracing and telemetry tools such as OpenTelemetry An understanding
skills to identify and solve issues quickly to avoid financial losses. Partner with product engineering teams to ensure the AI/ML systems are reliable and high-performing. Develop observability, security, automation and FinOps tools and orchestration. Provide strategic technology leadership by defining and evaluating standards and architecture for reliability, observability and automation frameworks. Build strong cross-functional relationships … as (e.g., Python, Java Spring Boot, .NET, etc.) Deep knowledge of software applications and technical processes with emerging depth in one or more technical disciplines Proficiency and experience in observability practices such as white-box and black-box monitoring, SLO alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, etc. Proficiency in continuous integration and continuous delivery tools … capabilities, and skills Prior experience working in AI, ML, or data engineering. Expertise in container orchestration/Kubernetes. Prior experience developing automation frameworks/AIOps. Prior experience building observability and telemetry tools. About Us J.P. Morgan is a global leader in financial services, providing strategic advice and products to the world’s most prominent corporations, governments, wealthy individuals and
London, England, United Kingdom Hybrid / WFH Options
Stott and May
Tuesdays, Thursdays WFH) Pay: negotiable, inside IR35 We're looking for an experienced DevOps Engineer to join our team on a contract basis, with a focus on AWS infrastructure, observability tooling, and CI/CD automation. This is a hands-on role supporting high-availability systems, rapid deployments, and production incident response. Key Responsibilities Manage and monitor AWS infrastructure for … performance and security Respond to production incidents, perform root cause analysis, and implement fixes Maintain observability tools (Prometheus, Grafana, Splunk) and write PromQL queries Improve and operate CI/CD pipelines using GitHub Actions and Kubernetes Automate infrastructure tasks with Python, Bash, Go or SQL Work with Git-based workflows for infrastructure as code Troubleshoot Kubernetes workloads and containerised services
throughput applications Develop and refine automation solutions using Ansible, Python, and Terraform Troubleshoot hardware, networking, and performance issues in various environments Deploy monitoring and log aggregation tools to improve observability Collaborate with teams to identify bottlenecks and deploy scalable, automated solutions What We're Looking For: 6+ years of Linux system administration and engineering experience in performance-critical environments Proficiency … in Python and Bash scripting, with hands-on Ansible experience Familiarity with observability tools like Prometheus, Grafana, and ELK Infrastructure-as-code experience with Terraform and CI/CD pipelines Proven ability to resolve complex system-level issues and performance challenges Knowledge of container orchestration tools (Docker/containers, Kubernetes) Experience with GPU server deployments Exposure to AWS services and
London, England, United Kingdom Hybrid / WFH Options
Keyrock
and optimize Kubernetes clusters for containerized applications, ensuring high availability and security. Automation & CI/CD: Implement and manage CI/CD pipelines for efficient deployment, testing, and monitoring. Observability & Monitoring: Develop monitoring solutions with tools like Prometheus, Grafana, and the ELK stack to enhance system reliability. Security & Compliance: Apply best practices for cloud security, IAM policies, and compliance standards (SOC 2, ISO … Hands-on Kubernetes experience (EKS, K3s, or self-managed). Proficiency in scripting with Python, Bash, or Go. Experience with Infrastructure as Code (Terraform, CloudFormation, Ansible). Familiarity with observability tools (Prometheus, Grafana, Datadog, ELK). Solid understanding of networking (VPCs, load balancers, DNS, firewalls). Experience with DevOps, CI/CD, and GitOps practices. Experience with high-performance, low
London, England, United Kingdom Hybrid / WFH Options
Scopely
collaboration and stakeholder management. Development of automation tools and processes that improve the reproducibility of procedures and development efficiency. Monitoring, auditing and reporting on the build systems and processes by incorporating observability and alerting throughout the CI/CD lifecycle and infrastructure. Participation in code reviews and in development processes related to CI/CD pipelines and automation tools, to improve the effectiveness of engineering team members. What We … processes, including CI/CD best practices, specifically for Unity 3D games. Professional experience and high proficiency in programming and scripting languages for automation (e.g., Python, Bash). Experience with observability tools (ELK, Grafana, Prometheus, Datadog) to monitor and alert on CI/CD stability. Experience with version control systems, such as Git, and build management tools such as Jenkins, GitLab, Maven or Gradle.
operational challenges of supporting SaaS platforms at scale. Demonstrated application of security best practices and DevSecOps principles across infrastructure and deployment lifecycles. Experience applying modern AI tools to enhance observability, operational workflows, or support processes—paired with a solid understanding of their capabilities and limitations. Deep understanding of containerization, orchestration, and virtualization technologies, including Kubernetes, Docker, and related tools. Proficiency … you stand out Experience with GCP or multi-cloud environments. Exposure to GitOps workflows and tools like ArgoCD or Kustomize. Knowledge of .NET applications in cloud settings. Familiarity with observability stacks (e.g., Grafana, ELK, Prometheus). Understanding of compliance frameworks like SOC 2 or ISO 27001. Use of AI tools for enhancing operational efficiency. Experience with SIEM integration and incident
Portsmouth, England, United Kingdom Hybrid / WFH Options
Trust In SODA
through the entire development life cycle. Infrastructure as code. Bash. Delivery methods and techniques, including Agile Scrum experience. Desirable Skills: Red Hat OpenShift; HashiCorp tools (such as Terraform, Packer, Vault); Ansible; observability (such as Prometheus, Grafana, Splunk); containerised services (such as Postgres, Redis, Kafka, Keycloak, ELK); experience of doing all the above at OS or S level; YAML-based pipelines; immutable infrastructure
London, England, United Kingdom Hybrid / WFH Options
Future Talent Group
resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills: Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTel, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Strong Linux and networking fundamentals (TCP, DNS, TLS, HTTP
GCP cloud platforms Working knowledge of CI/CD tooling and practices (GitHub Actions, Jenkins, etc.) Experience with Infrastructure as Code (Terraform, CloudFormation) preferred Understanding of monitoring, logging, and observability tools Solid grasp of software development best practices (testing, code quality, documentation) Experience with modern frontend frameworks (React, Vue, Angular) a plus Background with AI/ML systems integration preferred
and postmortems to learn from system failures and prevent recurrence. Participate in on-call rotations and respond to incidents, minimising downtime and customer impact. Continuously improve deployment, configuration, and observability processes. Qualifications: Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience. Strong experience with Linux/Unix systems administration. Proficient in scripting and programming languages
e.g., Slackbots and integrations) to streamline IT operations and business processes. Monitoring and Maintenance: Manage and maintain network security systems through system patches and periodic maintenance tasks. Establish comprehensive observability and proactive issue-resolution strategies using tools like SNMP, Syslog, NetFlow, Elasticsearch (ELK Stack), and Grafana. Collaboration and Communication: Work with Cyber Energia teams to identify functional needs, develop secure architectures, and
Experience working in Agile teams using Git, Jira, and Confluence. Eligible for SC and NPPV3 clearance. Container orchestration with Kubernetes. Experience with HashiCorp tools: Vault, Consul, Packer. Monitoring and observability with Grafana, Prometheus, or similar. Knowledge of cloud networking, VPCs, NAT Gateways, security groups, etc. Personal Attributes: Proactive, self-driven, and passionate about technology. Strong problem-solving skills. Collaborative team
Bristol, Gloucestershire, United Kingdom Hybrid / WFH Options
Curo Resourcing Ltd
domain-adjacent technologies/services, such as Docker, OpenShift, Kubernetes, etc. Infrastructure as Code and CI/CD paradigms and systems, such as Ansible, Terraform, Jenkins, Bamboo, Concourse, etc. Observability - SRE. Big Data solutions (ecosystems) and technologies, such as Apache Spark and the Hadoop ecosystem. Excellent knowledge of YAML or similar languages. The following Technical Skills & Experience would be desirable
architectures, as described by thought leaders like Martin Fowler. Hands-on experience building and maintaining complex CI/CD pipelines, preferably with GitHub Actions. Familiarity with monitoring and observability tools (e.g., Prometheus, Grafana, Google Cloud's operations suite). A solid understanding of networking principles and cloud security best practices. Experience with other cloud platforms like Amazon Web Services
London, England, United Kingdom Hybrid / WFH Options
Wallet in Telegram
balancers (we use Nginx/Traefik, AWS ELB/NLB) Skilled in container orchestration using Docker and Kubernetes Experience with CI/CD processes, specifically with GitLab Knowledge of observability tools like Prometheus/VictoriaMetrics, Grafana, and ELK/EFK/OpenSearch Experience with Infrastructure as Code (IaC) using Ansible and Terraform Scripting abilities in Shell and Python English proficiency
Slack bots and integrations) to streamline IT operations and business processes. Monitoring and Maintenance: Manage and maintain network security systems through system patches and periodic maintenance tasks. Establish comprehensive observability and proactive issue-resolution strategies using tools like SNMP, Syslog, NetFlow, Elasticsearch (ELK Stack), and Grafana. Collaboration and Communication: Work with Cyber Energia teams to identify functional needs, develop secure
London, England, United Kingdom Hybrid / WFH Options
BBC
Code with AWS CDK and CloudFormation to provision and manage cloud environments. Build and maintain CI/CD pipelines using GitHub Actions, AWS CodePipeline, CodeBuild, and Jenkins. Integrate monitoring and observability tools such as AWS CloudWatch, Prometheus, and Grafana for infrastructure and model health tracking. Ensure software quality through Test-Driven Development (TDD), unit testing frameworks (e.g., pytest, unittest), and automated integration
London, England, United Kingdom Hybrid / WFH Options
9fin
as possible. Designing and implementing a developer portal (e.g., Backstage) to provide a service catalog to the engineering team, and authoring many other useful DevOps plugins. Contributing to observability best practices and providing key SLI/SLO metric reporting, so that the engineering team can balance velocity and reliability. Developing inner-source/open-source projects to help provide a
automated deployments Familiarity with Helm charts Experience with Infrastructure as Code (IaC) tools like Terraform Knowledge of container build and deployment automation using CI/CD pipelines Experience with observability tools for both MSK and Kubernetes, including Prometheus, Grafana, and AWS CloudWatch for metrics and logs Deep understanding of Kafka and Kubernetes security practices, including network policies and IAM roles