Go Significant experience with AWS cloud infrastructure Deep understanding of IaC tools: Terraform, Packer, CloudFormation Proven leadership in multidisciplinary delivery teams Skills in Databases: MongoDB/Atlas, Messaging: Kafka, Observability: Prometheus, Grafana, Splunk Experience of working in a DevOps environment - favouring and implementing Continuous Integration & Deployment over manual processes. Experience of designing, implementing, securing and supporting Unix/Linux based More ❯
will involve designing robust software solutions that enhance system performance while ensuring high availability for critical applications. You will work hand-in-hand with product engineering teams to improve observability tools and telemetry systems, driving forward automation initiatives that reduce manual intervention. By participating in incident management processes (facilitating transparent communication with stakeholders and leading blameless post-mortems) you will … a focus on automating these activities wherever possible. * Provide on-call support during production incidents outside standard working hours as required by the business needs. * Contribute to enhancing product observability and telemetry by supporting ongoing modernisation efforts within the infrastructure. * Collaborate closely with engineering teams to brainstorm ideas that simplify infrastructure management and streamline SRE practices. What you bring: * Proficiency More ❯
South West London, London, England, United Kingdom
Oscar Technology
experienced Site Reliability Engineer (SRE) to join them on a 6-month contract (outside IR35). You'll be leading efforts across AWS and Azure Cloud environments, focusing on automation, observability, infrastructure as code and performance at scale. Stakeholder engagement and strong communication are essential in this role, so if you've been in a start-up/smaller team- this … scripting (Python, Bash, PowerShell), and cloud architecture Comfortable with containerisation and orchestration (Docker, Kubernetes) Understanding of networking, DNS, IAM, and load balancing in cloud environments Hands-on experience with observability tooling and production-level troubleshooting If this sounds like you, it's a great opportunity so apply now! Site Reliability Engineer - AWS/Azure | Outside IR35 | £450-500/day More ❯
Site Reliability/DevOps Engineer London - 5 Days Onsite Up to £550 per day (Umbrella, Inside IR35) 12-Month Contract Must hold live and transferable DV Clearance Are you passionate about reliability, automation, and supporting mission-critical systems? Join this More ❯
Site Reliability/DevOps Engineer London - 5 Days Onsite Up to £550 per day (Umbrella, Inside IR35) 12-Month Contract Must hold live and transferable DV Clearance Are you passionate about reliability, automation, and supporting mission-critical systems? Join this More ❯
Job Title: Senior SRE - Site Reliability Engineering for Observability Location: London (Mostly Remote | 1 Day/Week in Office) Pay Rate: £50 - £62 per hour (Inside IR35) Contract Duration: Initial 12 Months Working Hours: 11:00 AM - 7:00 PM About the Role We're looking for a Senior Site Reliability Engineer (SRE) to join a high-impact Observability team … monitoring and logging platforms that ensure service reliability, performance, and visibility. If you're passionate about distributed systems, high-throughput data pipelines, and enabling engineering teams with top-tier observability tooling, this is the role for you. What You'll Be Doing Designing and operating observability platforms (logging, monitoring, alerting) at scale. Managing large, high-performance Elasticsearch clusters and Prometheus … deployments. Building scalable data pipelines using Kafka to process millions of events per second. Developing tools, APIs, and dashboards to enable self-service observability for engineering teams. Automating infrastructure using Terraform and configuration with Ansible. Participating in on-call rotations to ensure platform uptime and responsiveness. What We're Looking For 5+ years of experience in SRE/DevOps More ❯
Solihull, West Midlands, England, United Kingdom Hybrid / WFH Options
Sanderson
as Code (IaC) using Terraform, Vagrant, and related tools. Build and maintain secure CI/CD pipelines using Jenkins, Groovy scripting, and other automation tools. Enable robust monitoring and observability through Grafana, Prometheus, Alertmanager, and related tools. Apply DevSecOps practices, integrating tools like SonarQube, ClamAV, and MS Defender into delivery pipelines. Essential Skills & Experience: 10+ years of hands-on More ❯
design and architecture through to production deployment and support. You'll work closely with experienced engineers and domain experts to deliver mission-critical services with a strong focus on scalability, observability (DataDog), and quality. You'll also contribute to architectural design, sequence diagrams, and flow mapping, ensuring robust documentation and testing standards are met. This is a full Agile environment, and you More ❯
and architecture through to production deployment and support. You'll work closely with experienced engineers and domain experts to deliver mission-critical services with a strong focus on scalability, observability (DataDog), and quality. You'll also contribute to architectural design, sequence diagrams, and flow mapping, ensuring robust documentation and testing standards are met. This is a full Agile environment, and More ❯
Cambridge (onsite travel required) Job Type: 12-Month Contract (Inside IR35) Experience Level: Mid to Senior Level Role Overview We are seeking an experienced Dynatrace Consultant to join our Observability Team on a 12-month engagement. This role is critical in driving the adoption and integration of Dynatrace across a complex enterprise environment. You will work closely with platform teams … application owners, and DevOps engineers to enable full observability, implement best practices, and ensure successful platform rollout as part of our new Center of Excellence initiative. Key Responsibilities Provide technical consulting and enablement to internal engineering teams for effective use of Dynatrace. Build dashboards, alerts, and service flow mappings aligned with application performance needs. Develop and optimize Dynatrace Query Language … DQL) queries for actionable insights. Support observability design and migration from tools such as Prometheus, Grafana, and AWS CloudWatch to Dynatrace. Advise on RBAC models, data access strategies, and security best practices for multi-team environments. Design monitoring strategies for Kubernetes workloads in hybrid cloud/on-prem environments. Promote observability-as-code using tools like Terraform and GitLab for More ❯
AWS services at the DevOps Engineer level Incident, change & problem management experience. This role is heavily operational-oriented, including on-call requirements Strong background in setup & operation of enterprise observability tooling, specifically Prometheus, Grafana and Splunk, including usage of PromQL Proficient in one or more languages of Python, Go, Bash, SQL Familiar with GitHub/GitOps/container orchestration/ More ❯
support) What You Bring: Strong Java (streams, lambdas, concurrency) and front-end skills with React.js Deep knowledge of multithreaded, distributed systems and asynchronous architecture Experience with JVM tuning and observability tools (Prometheus, Elastic, etc.) TDD, CI/CD, and agile delivery experience Ability to deliver from design to deployment Bonus Points: Experience in Front Office, Risk, or Pricing within investment More ❯
service applications in production Support development of microservices with focus on security, scalability and reliability Ensure best practice in code quality (TDD, SOLID principles, peer reviews) Support API architecture, observability, and service monitoring Participate in a 24/7 on-call rota (Level 3 support) Required Experience: Proven experience leading engineering teams on complex digital projects Excellent knowledge of at … JavaScript/Vue.js & PHP/Drupal & WordPress (database tuning, CDN, caching) Knowledge of AWS DevOps principles & Docker, Terraform, Kubernetes, Helm, Git C#, Java (Spring Boot, JPA/Hibernate), REST APIs, observability & monitoring, queue technologies & security History working on building new, evolving, high availability microservices with data integrity Strong understanding of test methodologies: JUnit, TDD, Integration Tests & E2E Experience working with relational More ❯
London, South East, England, United Kingdom Hybrid / WFH Options
Morgan Hunt Recruitment
Postgres) Implement OCR, NLP, and ML for document analysis and automation risk assessment Lead R&D spikes and validate system improvements through robust data analysis Ensure code quality, testing, observability, and non-functional compliance (security, UX, performance) Coach team members and contribute to Agile delivery practices Essential Skills Strong commercial experience with Python, TypeScript, spaCy, and AWS (serverless) Background in More ❯
insight, and proactive incident management. Key Responsibilities: Translate high-level monitoring non-functional requirements (NFRs) into actionable configurations across tools such as Splunk, Dynatrace, and AppDynamics. Deliver full-stack observability solutions, including application-aware network performance monitoring (NPM), synthetics, log analytics, and infrastructure metrics. Provide live support for monitoring technologies and assist with live service support, including key business events More ❯
that drive Business Intelligence and service assurance. Translate high-level monitoring non-functional requirements (NFRs) into actionable configurations across tools such as Splunk, Dynatrace, and AppDynamics. Deliver full-stack observability solutions, including application-aware network performance monitoring (NPM), synthetics, log analytics, and infrastructure metrics. Provide live support for monitoring technologies and assist with live service support, including key business events More ❯
City of London, London, United Kingdom Hybrid / WFH Options
Develop
data, integration layers, and authentication modules Ensure secure, scalable deployment using Azure cloud-native tools Build and support systems using PostgreSQL, Java, and Spring Boot Integrate and monitor using observability tools like Datadog and BigPanda Collaborate closely with architects, DevOps, and security teams across the full SDLC Core Skills & Technologies Strong backend development in Java with Spring Boot Cloud migration … experience, particularly Azure Lift-and-Shift Familiarity with cloud infrastructure and deployment pipelines Exposure to PostgreSQL, authentication/security patterns Monitoring/observability tooling: Datadog, BigPanda Apply now to be considered. More ❯
Telford, Shropshire, United Kingdom Hybrid / WFH Options
Stealth IT Consulting
insight, and proactive incident management. Key Responsibilities Translate high-level monitoring non-functional requirements (NFRs) into actionable configurations across tools such as Splunk, Dynatrace, and AppDynamics. Deliver full-stack observability solutions, including application-aware network performance monitoring (NPM), synthetics, log analytics, and infrastructure metrics. Provide live support for monitoring technologies and assist with live service support, including key business events More ❯
insight, and proactive incident management. Key Responsibilities Translate high-level monitoring non-functional requirements (NFRs) into actionable configurations across tools such as Splunk, Dynatrace, and AppDynamics. Deliver full-stack observability solutions, including application-aware network performance monitoring (NPM), synthetics, log analytics, and infrastructure metrics. Provide live support for monitoring technologies and assist with live service support, including key business events More ❯
Wellington, Shropshire, United Kingdom Hybrid / WFH Options
Experis
proactive incident management. Key Skills/requirements Translate high-level monitoring non-functional requirements (NFRs) into actionable configurations across tools such as Splunk, Dynatrace, and AppDynamics. Deliver full-stack observability solutions, including application-aware network performance monitoring (NPM), synthetics, log analytics, and infrastructure metrics. Provide live support for monitoring technologies and assist with live service support, including key business events More ❯
Telford, Shropshire, West Midlands, United Kingdom Hybrid / WFH Options
Experis
proactive incident management. Key Skills/requirements Translate high-level monitoring non-functional requirements (NFRs) into actionable configurations across tools such as Splunk, Dynatrace, and AppDynamics. Deliver full-stack observability solutions, including application-aware network performance monitoring (NPM), synthetics, log analytics, and infrastructure metrics. Provide live support for monitoring technologies and assist with live service support, including key business events More ❯
Manchester, Lancashire, England, United Kingdom Hybrid / WFH Options
Lorien
growing to meet our business needs. What you'll lead: Shape and evolve the backend technical architecture to support product scale and complexity Identify and drive improvements in performance, observability, and infrastructure Lead the design of domain models aligned with evolving business needs Be a go-to person for backend excellence, and improve code quality Engineering centric requirement definition (user More ❯
insight, and proactive incident management. Key Responsibilities Translate high-level monitoring non-functional requirements (NFRs) into actionable configurations across tools such as Splunk, Dynatrace, and AppDynamics. Deliver full-stack observability solutions, including application-aware network performance monitoring (NPM), synthetics, log analytics, and infrastructure metrics. Provide live support for monitoring technologies and assist with live service support, including key business events More ❯
insight, and proactive incident management. Key Responsibilities: * Translate high-level monitoring non-functional requirements (NFRs) into actionable configurations across tools such as Splunk, Dynatrace, and AppDynamics. * Deliver full-stack observability solutions, including application-aware network performance monitoring (NPM), synthetics, log analytics, and infrastructure metrics. * Provide live support for monitoring technologies and assist with live service support, including key business events More ❯