or Argo Workflows) for containerized microservices, ML model training, and inference workloads. Integrate automated testing, security scans, and policy checks into the release process. Observability & Reliability Engineering: Implement comprehensive monitoring, logging, and tracing stacks (Prometheus/Grafana, Loki, ELK, OpenTelemetry). Define SLO/SLA dashboards; lead incident response, root…
framework to support the development and operations of web applications. Desirable Skills: Serverless & Microservices: Experience with AWS Lambda, Azure Functions, and event-driven architectures. Observability & Monitoring: Familiarity with monitoring tools like Splunk, Datadog, or New Relic for enhanced visibility and observability. Networking: Knowledge of VPCs, VPNs, and load balancing in…
and governance policies to ensure compliance and risk mitigation. Monitoring & Logging: Experience with Azure Monitor, Application Insights, Log Analytics, and Prometheus/Grafana for observability and performance monitoring. Scripting & Automation: Strong scripting skills in PowerShell, Bash, and Python, along with automation frameworks like Ansible. Collaboration & Problem-Solving: Ability to…
Develop a baseline monitoring and tooling concept for cloud to address the need for compliance infrastructure reporting within agile deliveries as part of our Observability strategy. Develop concepts and tools for chargeback and showback (Financial Instrumentation) in a multicloud context. Implement and mature a cloud forecasting and capacity management solution…
available. We combine problem-solving skills with software and systems engineering to take a proactive approach in building fault-tolerant and secure systems, improving observability and zealously automating away toil. In this role you will: Use your site reliability expertise to design, operate and support Preqin's infrastructure, middleware and…
London, South East England, United Kingdom Hybrid / WFH Options
Premier Group
GitLab CI/Jenkins). Automate deployments and monitoring for multiple environments. Implement Infrastructure as Code using Terraform. Manage containerised environments with Docker & Kubernetes. Enhance observability with tools like Prometheus, Grafana, and Datadog. Collaborate closely with developers, testers, and platform teams. 🧰 Tech Stack You'll Use: Cloud: AWS (core services: EC2…
secure applications and infrastructure. Strong communication skills, with the ability to convey and/or understand complex technical concepts clearly and concisely. SRE skills including observability and telemetry monitoring. HashiCorp Suite (Packer, Terraform, Vault, Vagrant, Consul). Containerisation using Docker, Kubernetes, OpenShift & Helm. Programming skills using languages such as Python, Go, Java…
tools, such as Terraform, CloudFormation, ARM, or Pulumi. Expertise in building secure applications and infrastructure, with strong knowledge of security practices. SRE skills, including observability and telemetry monitoring. Hands-on experience with the HashiCorp Suite (Packer, Terraform, Vault, Vagrant, Consul). Experience in containerisation using Docker, Kubernetes, OpenShift, and Helm.…
balancers (F5, HAProxy, Nginx) and network monitoring tools. Experience in DNS management and troubleshooting. Experience in network security best practices. Proficiency in monitoring and observability tools (Prometheus, Grafana, Splunk). Proficiency in at least one scripting language (Python, Bash) for automation. Experience with CI/CD pipeline management and DevOps…
London, South East England, United Kingdom Hybrid / WFH Options
LHH
or CloudFormation. Implement CI/CD pipelines, enabling continuous integration and continuous deployment for mission-critical applications. Monitor system performance, availability, and security, implementing observability best practices. Work in an Agile environment, engaging with stakeholders to understand requirements and deliver iterative improvements. Your skills and experience Essential: Experience deploying and…
London, South East England, United Kingdom Hybrid / WFH Options
Digital Skills ltd
level experience in AWS Networking/TCP/Firewalls/Certs. Advanced proficiency with containers and container orchestration tools such as Docker and Kubernetes. Observability champion, experience in designing and building monitoring and logging tools such as CloudWatch, ELK, and Grafana. Strong scripting skills in Bash, JavaScript or similar. Knowledge…
high availability, optimal performance, and reliability across production and non-production environments. This includes working on incident response, capacity planning, WAN optimization, and system observability using tools like Prometheus and Grafana. Key Responsibilities: Administer and maintain Solace PubSub+ appliances and software brokers across environments (on-prem and cloud).…
London, South East England, United Kingdom Hybrid / WFH Options
Albert Bow
of identity management systems such as Azure B2C or Okta. Familiarity with PostgreSQL or other relational databases at the infrastructure level. Operational experience with observability tools like Prometheus and Grafana. A security-conscious mindset and awareness of cloud architecture best practices. Interest in growing into a leadership or mentoring role…
to enhance security, compliance, and incident response. This role provides access to cutting-edge cloud technologies, including AWS serverless computing, Kubernetes orchestration, AI-driven observability, and security automation, keeping you at the forefront of innovation. Your responsibilities: Implement and manage highly available, scalable, and secure applications hosted on AWS…
Cloud Automation & Tooling (SAT) Team drives automation, security, and compliance for Sovereign Cloud across AWS, Azure, and OpenStack, leveraging IaC, CI/CD, and observability, and develops the Operations Control Plane (OCP), which orchestrates provisioning, monitoring, and lifecycle management, integrating with our SAP internal tools like SPC, CRM, and cloud automation…
end experience with React or similar frameworks is a plus. Collaborate with the team to implement, configure, and manage comprehensive monitoring, logging, alerting, and observability solutions - advocating for security best practices. Deploy, manage, operate, and scale applications and services on AWS - whilst troubleshooting performance issues across the stack. Collaborative, agile…