London, England, United Kingdom Hybrid / WFH Options
Nordcloud group
languages such as C#, Python, Perl, Java, C++. Experience with CI/CD tools like Azure DevOps, GitHub Actions, GitLab, Jenkins, TeamCity. Scripting skills in PowerShell, Bash. Familiarity with observability and monitoring tools such as Prometheus, Grafana, Splunk. Experience with containerization tools like Docker, Kubernetes, OpenShift, EC2 containers. Analytical and creative problem-solving skills. We encourage you to apply, even…
networking concepts such as IP addressing, subnetting, VLAN configuration, and their practical application in Linux-based environments. Scripting & Automation: Proficiency in Python and Bash scripting; experience with Ansible required. Observability: Experience with monitoring and log aggregation tools (e.g., Prometheus, Grafana, ELK). DevOps Tooling: Experience with Terraform, Git, CI/CD, and infrastructure-as-code practices. Problem Solving: Proven ability…
London, England, United Kingdom Hybrid / WFH Options
Nordcloud
Patterns for Development. Programming languages such as C#, Python, Perl, Java, C++. CI/CD tools such as Azure DevOps, GitHub Actions, GitLab, Jenkins, TeamCity. Scripting languages such as PowerShell, Bash. Observability/Monitoring: Prometheus, Grafana, Splunk. Containerisation tools such as Docker, K8s, OpenShift, EC, containers. Analytical and creative approach to problem solving. We encourage you to apply, even if you don…
London, England, United Kingdom Hybrid / WFH Options
Parity Technologies
Excellence: Contribute to Parity’s blockchain node operations, improving the reliability of the Polkadot network by managing test and benchmark networks in the cloud and on-prem. Enhance our observability initiatives by operating mainnet nodes for the Polkadot and Kusama Relay Chains and system parachains, gathering crucial operational data for monitoring and incident response. Infrastructure Solutions: Conceptualize and build innovative infrastructure…
remote teams and distributed delivery models. Additional skills that are a plus: Programming languages and frameworks such as Scala, Rust, Go, Angular, React, Kotlin. Database management with PostgreSQL. Experience with Elasticsearch and observability tools like Grafana and Prometheus. What this role can offer: Opportunity to deepen understanding of AI and Data Science applications. Mentorship and support from colleagues to apply your talents. Career…
Experience working in Agile teams using tools like Git, Jira, and Confluence. Eligible for SC and NPPV3 clearance. Container orchestration with Kubernetes. HashiCorp tools: Vault, Consul, Packer. Monitoring and observability with Grafana, Prometheus, or similar. Familiarity with cloud networking, VPCs, NAT Gateways, security groups, etc. Personal Attributes: Proactive and self-driven with a passion for technology. Strong problem-solving mindset…
recognize roadblocks and demonstrates interest in learning technology that facilitates innovation. Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, Terraform. Experience in at least one observability tool such as Dynatrace, Datadog, New Relic, CloudWatch, AppDynamics, Splunk. Qualification: experience with common SRE toolchains (Grafana, Prometheus, Elasticsearch, Kibana, Jaeger) is a plus. About Us: J.P. Morgan is a global…
teams to build secure, scalable, and cost-efficient cloud solutions. You will be provided with access to cutting-edge cloud technologies, including AWS serverless computing, Kubernetes orchestration, AI-driven observability, and security automation, keeping you at the forefront of innovation. Your responsibilities: Implement and manage highly available, scalable, and secure applications hosted on AWS Cloud, leveraging multi-region deployment strategies…
London, England, United Kingdom Hybrid / WFH Options
Capgemini
using tools such as Terraform or CloudFormation. • Implement CI/CD pipelines, enabling continuous integration and continuous deployment for mission-critical applications. • Monitor system performance, availability, and security, implementing observability best practices. • Work in an Agile environment, engaging with stakeholders to understand requirements and deliver iterative improvements. You can bring your whole self to work. At Capgemini, building an inclusive…
IT workflows. Your responsibilities will also include developing CI/CD pipelines tailored for IT infrastructure, enhancing deployment efficiency, and integrating robust network security measures. You will establish comprehensive observability and proactive issue resolution strategies. We are seeking individuals passionate about network automation, security, and scalable IT solutions that enhance both campus and cloud network operations. You should possess extensive…
or DevOps. Expertise in microservices and API design. Docker and container runtime platforms such as Kubernetes, EKS, ECS, etc. Strong understanding of operational concepts on AWS, particularly monitoring and observability, and FinOps. Utilising CI/CD tools such as Bamboo, Jenkins, TeamCity, Bitbucket to streamline delivery of new features and fixes. Continual testing of code using automated testing frameworks…
CI/CD pipelines, infrastructure as code (IaC), and automated testing. Experience with industry-standard monitoring tools (ITRS or similar). Proficiency in managing Kubernetes clusters, including deployment, scaling, storage, observability, and lifecycle management. Understanding of financial regulations and reporting requirements in Europe such as MiFID II. Person Profile: The role will suit someone who relishes the prospect of supporting an…
the firm. You will also be responsible for designing, monitoring, and scaling platform infrastructure, and automating everyday tasks with scripting and configuration management tools. As a member of the Observability stream, you will be responsible for ensuring the high quality of the monitoring, tracing and alerting infrastructure in the company. You will have the opportunity to shape both the architecture … similar discipline. 4+ years of experience in DevOps, SRE, or platform engineering roles. Experience with software development (at least Python and Git; Golang or Rust additionally appreciated). Experience with an observability stack such as Prometheus, VictoriaMetrics, Vector, Elastic stack, Grafana, and Alertmanager. Experience with operating highly distributed applications at scale in Kubernetes. Experience with system administration and troubleshooting (Bash, Linux, Containerization…
AKA "Mr America" Technical Account Manager - DevOps Specialist, London - Hybrid (2 days per week in office) · Full-time · Senior. About the company: My client is rebuilding the path to observability using a real-time streaming analytics pipeline that provides monitoring, visualization, and alerting capabilities without the burden of indexing. By enabling users to define different data pipelines per use case … we provide deep observability and security insights, at an infinite scale, for less than half the cost. About the position: Technical Account Managers at my client are key in our effort to meet our customers’ expectations and help them utilize their observability and security data in the most efficient way possible. We are looking for hard-working, sharp, and … humble professionals with proven technical customer-facing experience. Their Technical Account Managers are trusted advisors who consult with their customers on their monitoring, security & observability journey. This role embodies the critical intersection of very high technical expertise and a focus on customer satisfaction, renewal and expansion. Technical Account Managers are senior-level roles and are expected to professionally and accurately solve…
reliability of cloud and hybrid infrastructure powering some of the most critical client-facing applications in financial services. You will be the strategic and operational leader for platform reliability, observability, incident response, CI/CD modernisation, and developer productivity. Why Join SS&C GIDS? Lead mission-critical infrastructure for a globally recognised financial technology provider. Influence the technical direction of … and services. Implement a comprehensive incident management lifecycle (on-call, escalation, RCA, blameless postmortems). Reduce Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) through automated observability, alerting, and playbooks. CI/CD and Platform Engineering: Oversee the development and evolution of CI/CD pipelines for all GIDS products using GitHub Actions, Argo CD, TeamCity, Octopus Deploy … and GitOps principles. Integrate static and dynamic code analysis, vulnerability scanning, artifact promotion, and release gating into the SDLC. Ensure pipeline scalability and governance while maintaining developer velocity. Observability & Troubleshooting: Lead the implementation and usage of modern observability stacks (e.g., OpenTelemetry, Prometheus, Grafana, Splunk, Datadog). Establish SLOs, SLIs, and error budgets with product and engineering teams. Drive root cause…
Site Reliability Engineer (SRE) who will be responsible for ensuring the reliability, scalability, and performance of our cloud infrastructure and applications. This role focuses on maintaining and improving system observability, automating operations, and enhancing deployment practices to support business-critical services. Reporting directly to the Lead Site Reliability Engineer, you will be expected to work independently while collaborating closely with … learning and improving performance based on set targets will be expected. VARIED DAY-TO-DAY RESPONSIBILITIES: Ensuring system reliability, performance, and scalability through monitoring and automation. Building and maintaining observability solutions using Grafana, Prometheus, Loki, OpenTelemetry. Proactively identifying and resolving performance bottlenecks and infrastructure issues. Automating infrastructure provisioning, configuration management, and deployments. Implementing effective logging, monitoring, and alerting strategies. Managing … efficiency. WHAT ARE WE LOOKING FOR IN A CANDIDATE? Experience with SRE principles, such as incident management, error budgets, and service-level objectives (SLOs). Experience designing and implementing robust observability, monitoring and logging solutions. Strong proficiency with observability and monitoring tools such as Grafana, Prometheus, and Loki. Strong experience with distributed tracing and telemetry tools such as OpenTelemetry. An understanding…
testers and operations to automate builds, deployment and release of applications running in the cloud and on-premises. Provide guidance on industry best practices for software deployment, development, and observability. Engineer tooling to implement those practices. Assist with, and where appropriate architect, solutions using containerisation and serverless technologies. Drive automation for environment management, logging and monitoring. Engage with vendors and service … stack CI/CD, GitLab, Jenkins, Sonatype Nexus. Knowledge and working experience of containerising application components, including writing Dockerfiles and deploying to Kubernetes. Deep understanding of pipelines as code. Observability concepts and tooling: OpenSearch, Cribl, Grafana, Prometheus, CloudWatch.
London, England, United Kingdom Hybrid / WFH Options
Stott and May
Tuesdays, Thursdays WFH) Pay: negotiable, inside IR35. We're looking for an experienced DevOps Engineer to join our team on a contract basis, with a focus on AWS infrastructure, observability tooling, and CI/CD automation. This is a hands-on role supporting high-availability systems, rapid deployments, and production incident response. Key Responsibilities: Manage and monitor AWS infrastructure for … performance and security. Respond to production incidents, perform root cause analysis, and implement fixes. Maintain observability tools (Prometheus, Grafana, Splunk) and write PromQL queries. Improve and operate CI/CD pipelines using GitHub Actions and Kubernetes. Automate infrastructure tasks with Python, Bash, Go or SQL. Work with Git-based workflows for infrastructure as code. Troubleshoot Kubernetes workloads and containerised services…
throughput applications. Develop and refine automation solutions using Ansible, Python, and Terraform. Troubleshoot hardware, networking, and performance issues in various environments. Deploy monitoring and log aggregation tools to improve observability. Collaborate with teams to identify bottlenecks and deploy scalable, automated solutions. What We're Looking For: 6+ years of Linux system administration and engineering experience in performance-critical environments. Proficiency … in Python and Bash scripting, with hands-on Ansible experience. Familiarity with observability tools like Prometheus, Grafana, and ELK. Infrastructure-as-code experience with Terraform and CI/CD pipelines. Proven ability to resolve complex system-level issues and performance challenges. Knowledge of container orchestration tools (Docker/containers, Kubernetes). Experience with GPU server deployments. Exposure to AWS services and…
City of London, London, United Kingdom Hybrid / WFH Options
Vertus Partners
and scalability of a real-time trading environment used by both internal and external clients. While production support remains an important aspect, this position is heavily weighted toward improving observability, driving proactive engineering practices, and developing tooling to eliminate repetitive manual tasks. You'll collaborate closely with developers, traders, and global colleagues to make meaningful changes to how the environment … is monitored, managed, and scaled. Key Responsibilities: Lead the development of automation and monitoring solutions to improve system resilience and eliminate recurring manual work. Own and evolve observability practices using tools like Prometheus, Grafana, Splunk, Geneos, Corvil, etc. Engage directly with senior traders and engineers to troubleshoot complex trading system issues and improve end-to-end workflows. Drive post-incident…
CI/CD pipelines (Jenkins, GitHub Actions). Define and enforce platform standards across environments (dev, staging, prod). Collaborate with developers and DevOps on deployment tooling and security. Enable platform observability using tools like Datadog, Prometheus, and CloudWatch. Maintain Helm charts and Terraform modules for shared infrastructure. Contribute to onboarding documentation and platform adoption practices. Participate in incident response and postmortem … containerisation using Docker and secure image management. Scripting or programming experience in Bash, Python, or TypeScript. Strong understanding of GitOps practices and infrastructure lifecycle management. Desirable Skills: Experience with observability tooling (Datadog, Prometheus, Fluent Bit). Knowledge of admission controllers, OPA/Gatekeeper (optional for governance). Familiarity with cloud cost optimisation and Kubernetes scaling strategies. Exposure to security scanning tools (tfsec…