automation. Strong knowledge of CI/CD tooling, IaC, and cloud-native technologies. Advanced scripting (Bash, Python) and automation experience. Skilled in monitoring and observability tools (e.g., Prometheus, Grafana, ELK). Strong problem-solving, communication, and leadership skills. Familiarity with and experience of CI/CD tools: Jenkins, GitLab CI. Infrastructure …
Jenkins, Drone); serverless technologies (e.g. AWS Lambda, Google Cloud Functions, Azure Functions); containerisation technologies (e.g. Docker, Kubernetes, OpenShift); tools for logging, monitoring, alerting and observability (e.g. ELK, Splunk, Prometheus, Grafana); working knowledge of operating systems, including CLI experience and deploying and configuring application or web servers. We are currently operating a …
and governance policies to ensure compliance and risk mitigation. Monitoring & Logging: Experience with Azure Monitor, Application Insights, Log Analytics, and Prometheus/Grafana for observability and performance monitoring. Scripting & Automation: Strong scripting skills in PowerShell, Bash, and Python, along with automation frameworks like Ansible. Collaboration & Problem-Solving: Ability to …
Liverpool, Lancashire, United Kingdom Hybrid / WFH Options
Acorn Group
Flux). CI/CD - Skilled in building and managing pipelines using Azure DevOps, GitHub Actions, etc. Monitoring - Experience with Prometheus, Grafana, and other observability tools. Application Stack - Familiarity with .NET, Node.js, React, and web server technologies like Nginx. Relevant certifications or the ability to demonstrate equivalent experience, such as …
deployment targets. Proficiency in scripting and automation (e.g., Python, Bash, Go) for developing and maintaining automation tools and pipeline scripts. Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Stackdriver) as they apply to monitoring pipeline performance and deployed application health signals for automation feedback. Solid understanding of security practices …
Develop a baseline monitoring and tooling concept for cloud to address the need for compliance infrastructure reporting within agile deliveries as part of our Observability strategy. Develop concepts and tools for chargeback and showback (Financial Instrumentation) in a multicloud context. Implement and mature a cloud forecasting and capacity management solution …
CD Pipeline Development: Develop and maintain robust CI/CD pipelines for continuous integration and deployment of ML models and related infrastructure. Monitoring and Observability: Build and maintain comprehensive monitoring and alerting systems for our ML infrastructure and models, leveraging tools like Datadog to ensure system health and performance. Collaboration …
tools like Git, Jira, and Confluence. Eligible for SC and NPPV3 clearance. Desirable: container orchestration with Kubernetes; HashiCorp tools (Vault, Consul, Packer); monitoring and observability with Grafana, Prometheus, or similar; familiarity with cloud networking, VPCs, NAT Gateways, security groups, etc. Personal Attributes: Proactive and self-driven with a passion for …
London, South East England, United Kingdom Hybrid / WFH Options
Amber Labs
tools like Git, Jira, and Confluence. Eligible for SC and NPPV3 clearance. Desirable: container orchestration with Kubernetes; HashiCorp tools (Vault, Consul, Packer); monitoring and observability with Grafana, Prometheus, or similar; familiarity with cloud networking, VPCs, NAT Gateways, security groups, etc. Personal Attributes: Proactive and self-driven with a passion for …
Slough, South East England, United Kingdom Hybrid / WFH Options
Amber Labs
tools like Git, Jira, and Confluence. Eligible for SC and NPPV3 clearance. Desirable: container orchestration with Kubernetes; HashiCorp tools (Vault, Consul, Packer); monitoring and observability with Grafana, Prometheus, or similar; familiarity with cloud networking, VPCs, NAT Gateways, security groups, etc. Personal Attributes: Proactive and self-driven with a passion for …
activities. Awareness of cloud infrastructure principles (e.g. AWS, GCP or OCI); an understanding of the basic principles of secure software delivery is a plus. Familiarity with observability tools like Grafana or Prometheus, and an understanding of the importance of giving the correct visibility to our platforms and environments. We highly value ownership and initiative with …
create CI/CD pipelines for everything. Maintain end-to-end security, ensuring projects meet best practices and Thomson Reuters standards. Maintain and grow observability and monitor all aspects of our infrastructure. Work closely with product, development, operations and support teams; guide them towards best practices, share knowledge, and improve …
Bradford, Yorkshire and the Humber, United Kingdom Hybrid / WFH Options
Fruition Group
a small team of engineers. Align DevOps capabilities with the wider business. Champion DevEx, reliability, and security. Embed operational excellence and incident response. Promote observability and performance optimisation. Lead DevOps Engineer Requirements: Proven technical and some leadership/mentoring experience. Cloud-native expertise (any cloud provider is fine: GCP, AWS …
Leeds, West Yorkshire, Yorkshire and the Humber, United Kingdom Hybrid / WFH Options
Fruition Group
a small team of engineers. Align DevOps capabilities with the wider business. Champion DevEx, reliability, and security. Embed operational excellence and incident response. Promote observability and performance optimisation. Lead DevOps Engineer Requirements: Proven technical and some leadership/mentoring experience. Cloud-native expertise (any cloud provider is fine: GCP, AWS …
tools, such as Terraform, CloudFormation, ARM, or Pulumi. Expertise in building secure applications and infrastructure, with strong knowledge of security practices. SRE skills, including observability and telemetry monitoring. Hands-on experience with the HashiCorp Suite (Packer, Terraform, Vault, Vagrant, Consul). Experience in containerisation using Docker, Kubernetes, OpenShift, and Helm. …
high availability, optimal performance, and reliability across production and non-production environments. This includes working on incident response, capacity planning, WAN optimization, and system observability using tools like Prometheus and Grafana. Key Responsibilities: Administer and maintain Solace PubSub+ appliances and software brokers across environments (on-prem and cloud). …
engineering team to support microservices architecture, with a focus on latency-sensitive and high-availability services. ● Monitor system performance, conduct root cause analysis, and implement observability best practices (metrics, logging, tracing). ● Harden infrastructure and deployments with infrastructure as code (Terraform/CDK/CloudFormation). ● Lead incident response, system reliability …
Slough, South East England, United Kingdom Hybrid / WFH Options
Aimhire
engineering team to support microservices architecture, with a focus on latency-sensitive and high-availability services. ● Monitor system performance, conduct root cause analysis, and implement observability best practices (metrics, logging, tracing). ● Harden infrastructure and deployments with infrastructure as code (Terraform/CDK/CloudFormation). ● Lead incident response, system reliability …
London, South East England, United Kingdom Hybrid / WFH Options
Aimhire
engineering team to support microservices architecture, with a focus on latency-sensitive and high-availability services. ● Monitor system performance, conduct root cause analysis, and implement observability best practices (metrics, logging, tracing). ● Harden infrastructure and deployments with infrastructure as code (Terraform/CDK/CloudFormation). ● Lead incident response, system reliability …