or Argo Workflows) for containerized microservices, ML model training, and inference workloads. Integrate automated testing, security scans, and policy checks into the release process. Observability & Reliability Engineering Implement comprehensive monitoring, logging, and tracing stacks (Prometheus/Grafana, Loki, ELK, OpenTelemetry). Define SLOs/SLA dashboards; lead incident response, root More ❯
and governance policies to ensure compliance and risk mitigation. Monitoring & Logging: Experience with Azure Monitor, Application Insights, Log Analytics, and Prometheus/Grafana for observability and performance monitoring. Scripting & Automation: Strong scripting skills in PowerShell, Bash, and Python, along with automation frameworks like Ansible. Collaboration & Problem-Solving: Ability to More ❯
level experience in AWS Networking/TCP/Firewalls/Certs Advanced proficiency with containers and container orchestration tools such as Docker and Kubernetes Observability champion, experience in designing and building monitoring and logging tools such as CloudWatch, ELK, and Grafana Strong scripting skills in Bash, JavaScript or similar Knowledge More ❯
london, south east england, United Kingdom Hybrid / WFH Options
Digital Skills ltd
level experience in AWS Networking/TCP/Firewalls/Certs Advanced proficiency with containers and container orchestration tools such as Docker and Kubernetes Observability champion, experience in designing and building monitoring and logging tools such as CloudWatch, ELK, and Grafana Strong scripting skills in Bash, JavaScript or similar Knowledge More ❯
high availability, optimal performance, and reliability across production and non-production environments. This includes working on incident response, capacity planning, WAN optimization, and system observability using tools like Prometheus and Grafana. Key Responsibilities: Administer and maintain Solace PubSub+ appliances and software brokers across environments (on-prem and cloud). More ❯
available. We combine problem-solving skills with software and systems engineering to take a proactive approach in building fault-tolerant and secure systems, improving observability and zealously automating away toil. In this role you will: Use your site reliability expertise to design, operate and support Preqin's infrastructure, middleware and More ❯
tools, such as Terraform, CloudFormation, ARM, or Pulumi. Expertise in building secure applications and infrastructure, with strong knowledge of security practices. SRE skills, including observability and telemetry monitoring. Hands-on experience with the HashiCorp Suite (Packer, Terraform, Vault, Vagrant, Consul). Experience in containerisation using Docker, Kubernetes, OpenShift, and Helm. More ❯
Cloud Automation & Tooling (SAT) Team drives automation, security, and compliance for Sovereign Cloud across AWS, Azure, and OpenStack, leveraging IaC, CI/CD, and observability. The team also develops the Operations Control Plane (OCP), which orchestrates provisioning, monitoring, and lifecycle management, integrating with our internal SAP tools like SPC, CRM, and cloud automation More ❯
end experience with React or similar frameworks is a plus. Collaborate with the team to implement, configure, and manage comprehensive monitoring, logging, alerting, and observability solutions - advocating for security best practices. Deploy, manage, operate, and scale applications and services on AWS - whilst troubleshooting performance issues across the stack. Collaborative, agile More ❯
or CloudFormation. Implement CI/CD pipelines, enabling continuous integration and continuous deployment for mission-critical applications. Monitor system performance, availability, and security, implementing observability best practices. Work in an Agile environment, engaging with stakeholders to understand requirements and deliver iterative improvements. Your skills and experience Essential: Experience deploying and More ❯
london, south east england, United Kingdom Hybrid / WFH Options
LHH
or CloudFormation. Implement CI/CD pipelines, enabling continuous integration and continuous deployment for mission-critical applications. Monitor system performance, availability, and security, implementing observability best practices. Work in an Agile environment, engaging with stakeholders to understand requirements and deliver iterative improvements. Your skills and experience Essential: Experience deploying and More ❯
streamline IT operations and business processes. Monitoring and Maintenance: Manage and maintain network security systems through system patches and periodic maintenance tasks. Establish comprehensive observability and proactive issue-resolution strategies using tools like SNMP, Syslog, NetFlow, Elasticsearch (ELK Stack), and Grafana. Collaboration and Communication: Work with CyberEnergia teams to identify functional More ❯
frameworks (e.g., Hibernate), messaging tools (Kafka, Kinesis, Redis), and cloud infrastructure technologies (AWS, Docker, Kubernetes, Terraform). Strong understanding of CI/CD pipelines, observability tools (e.g., DataDog), and Agile and Lean methodologies. Demonstrated ability to adapt to new technologies, align technical decisions with business goals, and champion quality engineering More ❯
Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise More ❯
City of London, London, United Kingdom Hybrid / WFH Options
Future Talent Group
Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise More ❯
East London, London, United Kingdom Hybrid / WFH Options
Future Talent Group
Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise More ❯
Central London / West End, London, United Kingdom Hybrid / WFH Options
Future Talent Group
Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise More ❯
london, south east england, United Kingdom Hybrid / WFH Options
Future Talent Group
Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise More ❯