networking security (NSGs, ASGs), and governance policies to ensure compliance and risk mitigation. Monitoring & Logging : Experience with Azure Monitor, Application Insights, Log Analytics, and Prometheus/Grafana for observability and performance monitoring. Scripting & Automation : Strong scripting skills in PowerShell, Bash, and Python , along with automation frameworks like Ansible . Collaboration More ❯
Kubernetes, Mesosphere etc. • Configuration Management – Ansible, Chef, Puppet etc. • Cloud – AWS preferred; multi clould experience ie with Azure, GCP etc. highly desirable • Monitoring – ELK, Prometheus, Splunk etc. • Experience in one of the following scripting language: Java, Bash, Python, Powershell, Golang, etc. • Experience working with Linux and/or Windows systems More ❯
e.g. AWS Lambda, Google Cloud Functions, Azure Functions) Containerisation technologies (e.g. Docker, Kubernetes, OpenShift) Tools for logging, monitoring, alerting and observability (e.g. ELK, Splunk, Prometheus, Grafana) Working knowledge of operating systems including CLI experience, deploying and configurating application or web servers We are currently operating a discretionary hybrid working model More ❯
Solid understanding of CI/CD pipelines and tools (e.g., Github CI, GitLab CI, CircleCI, Jenkins). Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack). Strong scripting skills (e.g., Bash, Python) for automation tasks. Excellent problem-solving skills and attention to detail. Strong communication and collaboration More ❯
Solid understanding of CI/CD pipelines and tools (e.g., Github CI, GitLab CI, CircleCI, Jenkins). Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack). Strong scripting skills (e.g., Bash, Python) for automation tasks. Excellent problem-solving skills and attention to detail. Strong communication and collaboration More ❯
in scripting and automation (e.g., Python, Bash, Go) for developing and maintaining automation tools and pipeline scripts. Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Stackdriver) as they apply to monitoring pipeline performance and deployed application health signals for automation feedback. Solid understanding of security practices in CI/ More ❯
own end to end delivery of solutions. Expert knowledge of SRE fundamentals and a commitment to best practice Fluency with common observability tooling like Prometheus, Grafana, OTEL and Cloudwatch Experience analysing and building data telemetry, querying (PromQL), modelling, pipelines and dashboards to provide concise, focused insights and alerts for distributed More ❯
AWS cloud services to implement highly efficient architecture. Ability to analyze infrastructure and implement security best practices. Experience with infrastructure monitoring tools like Nagios, Prometheus, Grafana. Expertise in containerization platforms like Docker and container orchestration platforms like Kubernetes and Rancher. Familiarity with infrastructure as code tools such as Terraform, CloudFormation More ❯
Jira , and Confluence Eligible for SC and NPPV3 clearance Desirable: Container orchestration with Kubernetes HashiCorp tools: Vault , Consul , Packer Monitoring and observability with Grafana , Prometheus , or similar Familiarity with cloud networking, VPCs, NAT Gateways, security groups, etc. Personal Attributes: Proactive and self-driven with a passion for technology Strong problem More ❯
culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, Cloudwatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise (EKS, SQS More ❯
culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, Cloudwatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise (EKS, SQS More ❯
City of London, London, United Kingdom Hybrid / WFH Options
Future Talent Group
culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, Cloudwatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise (EKS, SQS More ❯
East London, London, United Kingdom Hybrid / WFH Options
Future Talent Group
culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, Cloudwatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise (EKS, SQS More ❯
culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, Cloudwatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise (EKS, SQS More ❯
Central London / West End, London, United Kingdom Hybrid / WFH Options
Future Talent Group
culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, Cloudwatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise (EKS, SQS More ❯
Alerting: Implement comprehensive monitoring, logging, and alerting systems to proactively identify and address performance issues, errors, and security threats. Use tools like Azure Monitor, Prometheus, Grafana, or similar to collect and analyse metrics, logs, and traces. Configure alerts and notifications to ensure timely responses to critical events. Security & Compliance: Implement More ❯
Alerting: Implement comprehensive monitoring, logging, and alerting systems to proactively identify and address performance issues, errors, and security threats. Use tools like Azure Monitor, Prometheus, Grafana, or similar to collect and analyse metrics, logs, and traces. Configure alerts and notifications to ensure timely responses to critical events. Security & Compliance: Implement More ❯
in Observability, specifically experience working with one or more of the following tools - Kibana, Open-Search, Grafana, Datadog, Sumo Logic, New Relic, AppDynamics, Dynatrace, Prometheus, Logz.io, SignalFX, Instana, Splunk, Honeycomb, Jaeger Hands-on experience with Infrastructure as a Code (Terraform/Ansible) Hands-on experience in technical integrations (OpenTelemetry/ More ❯
in Observability, specifically experience working with one or more of the following tools - Kibana, Open-Search, Grafana, Datadog, Sumo Logic, New Relic, AppDynamics, Dynatrace, Prometheus, Logz.io, SignalFX, Instana, Splunk, Honeycomb, Jaeger Hands-on experience with Infrastructure as a Code (Terraform/Ansible) Hands-on experience in technical integrations (OpenTelemetry/ More ❯
on practice in Observability, specifically experience working with one or more of the following tools - Kibana, Open-Search, Grafana, Datadog, Sumologic, NewRelic, AppDynamics, Dynatrace, Prometheus, Logz.io, SignalFX, Instana, Splunk, Honeycomb, Jaeger Hands-on experience with Infrastructure as a Code (Terraform/Ansible) Hands-on experience in technical integrations (OpenTelemetry/ More ❯
in Java and C++. We use Airflow for workflow management, Kafka for data pipelines, Bitbucket for source control, Jenkins for continuous integration, Grafana + Prometheus for metrics collection, ELK for log shipping and monitoring, Docker and Kubernetes for containerisation, OpenStack for our private cloud, Ansible and Terraform for architecture automation More ❯
and QBRs. ✅ What You Bring: Strong hands-on experience with cloud platforms (AWS, GCP, Azure) and DevOps tooling Familiarity with observability stacks like Grafana, Prometheus, Datadog, Splunk, Kibana, etc. Experience with technical integrations (OpenTelemetry, Fluentd, Fluentbit, Filebeat, etc.) Skilled in troubleshooting Kubernetes and containerised environments Strong communication skills — able to More ❯
and QBRs. ✅ What You Bring: Strong hands-on experience with cloud platforms (AWS, GCP, Azure) and DevOps tooling Familiarity with observability stacks like Grafana, Prometheus, Datadog, Splunk, Kibana, etc. Experience with technical integrations (OpenTelemetry, Fluentd, Fluentbit, Filebeat, etc.) Skilled in troubleshooting Kubernetes and containerised environments Strong communication skills — able to More ❯
Implement best practices for cluster security, service mesh (e.g., Istio ), autoscaling, node pools, and network policies. Monitor and troubleshoot cluster performance using Cloud Operations , Prometheus/Grafana , and logging via Cloud Logging or ELK . Work closely with application teams to containerize workloads, optimize Dockerfiles, and ensure production readiness. Drive More ❯