Site Reliability Engineer
Site Reliability Engineer - JP Morgan - Bournemouth - 6-month contract - Hybrid (4 days in the office) - PAYE
We are seeking a Site Reliability Engineer with strong Kubernetes, observability, and telemetry skills to join JP Morgan in Bournemouth on an initial 6-month contract to help build a greenfield Kubernetes-based application.
As a Site Reliability Engineer, you will architect and maintain highly reliable, scalable containerized platforms using Kubernetes and Docker. You will be responsible for building robust observability solutions, implementing SRE best practices, and ensuring optimal performance of cloud-native applications. Through infrastructure as code, container orchestration, and comprehensive monitoring strategies, you will drive reliability improvements and operational excellence. You will be a significant contributor to your team, sharing your expertise in container platforms, observability tooling, and site reliability engineering principles.
What You'll Do
- Design, implement, and maintain scalable Kubernetes clusters and containerized deployments across multiple environments, ensuring high availability, security, and efficiency, including Docker optimization.
- Build and enhance observability platforms using Prometheus, Grafana, Tempo, Dynatrace, and Splunk for comprehensive monitoring, alerting, telemetry, SLIs/SLOs, and proactive incident prevention.
- Implement and manage CI/CD pipelines and GitOps workflows with Jenkins, GitLab, ArgoCD, and Flux to automate deployments in Kubernetes environments.
- Develop infrastructure as code (IaC) using Terraform and Helm to automate provisioning, configuration, and reliability improvements like capacity planning and disaster recovery.
- Troubleshoot complex issues in containerized environments, covering networking, storage, resource management, and application performance, and collaborate across teams on SRE enhancements.
- Write automation scripts in Python, Go, or Bash; champion SRE best practices including error budgets, blameless postmortems, and chaos engineering to reduce toil and boost operational efficiency.
What You Bring
- Hold formal SRE training or certification, with strong hands-on experience applying SRE concepts in Kubernetes environments.
- Demonstrate strong Kubernetes expertise, including cluster administration, workload management, CNI networking, CSI storage, and security policies.
- Excel in Docker containerization, optimizing Dockerfiles, managing registries, and implementing security best practices.
- Possess deep observability and telemetry expertise with Prometheus, Grafana, Tempo, and Dynatrace/Splunk for dashboards, alerts, and SLO-based monitoring.
- Have experience with CI/CD pipelines and GitOps (Jenkins, GitLab, ArgoCD, Flux/Spinnaker) and with IaC (Terraform, Helm, Kustomize/CloudFormation).
- Show proficiency in cloud platforms (AWS EKS, Azure AKS, GCP GKE), advanced networking (DNS, load balancing, Istio/Linkerd, ingress/network policies), and Python/Go/Bash scripting, along with proactive problem-solving and cross-team collaboration.
What's Next
If you are ready to take the next step, apply now! Successful applicants will be contacted directly by a recruiter to discuss the role further.
We are committed to creating an inclusive recruitment experience. If you require support or adjustments to the recruitment process, our Adjustment Concierge Service is here to help. Please feel free to contact us at (see below) to discuss how we can support you.
This position is being recruited on behalf of our client through our Outsourcing service line. Resource Solutions Limited, trading as Robert Walters, acts as an employment business and agency, partnering with top organizations to help them find the best talent. We welcome applications from all candidates and are committed to providing equal opportunities.