Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Smart DCC
Develop automated test suites for data pipelines, ensuring data quality and transformation integrity. Monitoring & Performance Optimization: Monitor data pipelines with tools like Prometheus and Datadog to ensure optimal performance and health. Proactively implement anomaly detection and optimize system performance and resource allocation. Collaborate with cross-functional teams to align DataOps …
Leeds, West Yorkshire, Yorkshire and the Humber, United Kingdom Hybrid / WFH Options
KPMG UK
AWS, Azure, GCP) Knowledge of database systems and models. Ability to use a wide variety of open-source technologies. Experience with logging/monitoring tools (Datadog, StackDriver, Prometheus, etc.). Knowledge of test automation frameworks. To discuss this or wider Technology roles with our recruitment team, all you need to do is …
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Arm Limited
issues in server and cloud contexts with a deep understanding of computer architecture. Familiarity with performance profiling and monitoring tools (e.g., VTune, New Relic, Datadog, AppDynamics, Grafana, Prometheus). A passion and history of writing and sharing technical knowledge. Proficient in both high- and low-level programming, preferably with C++ …
Leeds, West Yorkshire, Yorkshire and the Humber, United Kingdom Hybrid / WFH Options
Fruition Group
Job Title: Senior Site Reliability Engineer (SRE) Location: Leeds (Hybrid - c. 1-2 days per week) Salary: £60,000 - £80,000 + benefits Why Apply? This is a fantastic opportunity for a seasoned Senior Site Reliability Engineer to take a …
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Embarcaderomediagroup
security scanning and progressive delivery Supporting AKS clusters and Azure services (SQL, Cosmos DB, ADF, Functions, Logic Apps, etc.) Improving monitoring and alerting with Datadog, Grafana, ELK, and proactive failure detection Participating in the on-call rota and leading incident response workflows and blameless postmortems Coaching engineers, upskilling teams, and … Code skills with Terraform (v1.7+) Experience with CI/CD pipelines, GitOps, and automation tools (PowerShell, Bash) Familiarity with observability and incident tools like Datadog, ELK, and synthetic monitoring Solid understanding of networking (TCP/IP, Load Balancing, DNS, Routing) Good knowledge of DevSecOps practices - including security scanning, IAM, and … Trivy, tfsec) integrated into pipelines A proactive approach to problem-solving, documentation, and coaching Additional bonus skills include experience with Azure governance tools, advanced Datadog capabilities, Kubernetes autoscaling solutions, GitOps workflows, automated cost dashboards, compliance frameworks, and internal platform development. What You Can Expect: Competitive salary …
Liverpool, North West England, United Kingdom Hybrid / WFH Options
Outsource UK
using alert systems like BigPanda, taking ownership of alerts and troubleshooting them with the aid of Runbooks and SOPs. Basic knowledge in Splunk and Datadog complements their ability to analyze and monitor system performance, while understanding API concepts and applications enhances their technical capabilities. Their ITIL experience covers Incident Management …/or other event management systems/taking ownership of alerts and troubleshooting them using Runbooks/SOPs Basic Knowledge in Splunk/Datadog Basic Knowledge in API concepts and applications ITIL knowledge Preferred Experience/Education: Three to five years of relevant experience A degree from a four …