will involve designing robust software solutions that enhance system performance while ensuring high availability for critical applications. You will work hand-in-hand with product engineering teams to improve observability tools and telemetry systems, driving forward automation initiatives that reduce manual intervention. By participating in incident management processes (facilitating transparent communication with stakeholders and leading blameless post-mortems), you will … a focus on automating these activities wherever possible.
* Provide on-call support during production incidents outside standard working hours as required by business needs.
* Contribute to enhancing product observability and telemetry by supporting ongoing modernisation efforts within the infrastructure.
* Collaborate closely with engineering teams to brainstorm ideas that simplify infrastructure management and streamline SRE practices.
What you bring:
* Proficiency …
South West London, London, England, United Kingdom
Oscar Technology
experienced Site Reliability Engineer (SRE) to join them on a 6-month contract (outside IR35). You'll be leading efforts across AWS and Azure Cloud environments, focusing on automation, observability, infrastructure as code and performance at scale. Stakeholder engagement and strong communication are essential in this role, so if you've been in a start-up/smaller team- this … scripting (Python, Bash, PowerShell), and cloud architecture Comfortable with containerisation and orchestration (Docker, Kubernetes) Understanding of networking, DNS, IAM, and load balancing in cloud environments Hands-on experience with observability tooling and production-level troubleshooting If this sounds like you, it's a great opportunity, so apply now! Site Reliability Engineer - AWS/Azure | Outside IR35 | £450-500/day
Site Reliability/DevOps Engineer London - 5 Days Onsite Up to £550 per day (Umbrella, Inside IR35) 12-Month Contract Must hold live and transferable DV Clearance Are you passionate about reliability, automation, and supporting mission-critical systems? Join this …
Job Title: Senior SRE - Site Reliability Engineering for Observability Location: London (Mostly Remote | 1 Day/Week in Office) Pay Rate: £50 - £62 per hour (Inside IR35) Contract Duration: Initial 12 Months Working Hours: 11:00 AM - 7:00 PM About the Role We're looking for a Senior Site Reliability Engineer (SRE) to join a high-impact Observability team … monitoring and logging platforms that ensure service reliability, performance, and visibility. If you're passionate about distributed systems, high-throughput data pipelines, and enabling engineering teams with top-tier observability tooling, this is the role for you. What You'll Be Doing Designing and operating observability platforms (logging, monitoring, alerting) at scale. Managing large, high-performance Elasticsearch clusters and Prometheus … deployments. Building scalable data pipelines using Kafka to process millions of events per second. Developing tools, APIs, and dashboards to enable self-service observability for engineering teams. Automating infrastructure using Terraform and configuration with Ansible. Participating in on-call rotations to ensure platform uptime and responsiveness. What We're Looking For 5+ years of experience in SRE/DevOps …
to make an impact. As a Platform Engineer, you'll help design, build, and support the infrastructure and tooling that underpins critical systems - from CI/CD pipelines and observability tooling to service deployment and runtime environments. You'll be part of a high-trust team that values clean code, quick iteration, and leaving things better than you found them. … or Python for building internal tooling and services Hands-on experience with AWS, Kubernetes, Docker, and modern CI/CD pipelines Familiarity with infrastructure-as-code (e.g., Terraform) and observability tooling (e.g., Prometheus, Grafana) Comfortable working on distributed systems and improving developer workflows A product mindset and a collaborative approach to problem-solving Experience with Kafka, gRPC, or open-source …
AWS services at the DevOps Engineer level Incident, change & problem management experience. This role is heavily operations-oriented, including on-call requirements Strong background in setup & operation of enterprise observability tooling, specifically Prometheus, Grafana and Splunk, including usage of PromQL Proficient in one or more of Python, Go, Bash, SQL Familiar with GitHub/GitOps/container orchestration/…
support) What You Bring: Strong Java (streams, lambdas, concurrency) and front-end skills with React.js Deep knowledge of multithreaded, distributed systems and asynchronous architecture Experience with JVM tuning and observability tools (Prometheus, Elastic, etc.) TDD, CI/CD, and agile delivery experience Ability to deliver from design to deployment Bonus Points: Experience in Front Office, Risk, or Pricing within investment …
and CI/CD pipelines. Experience supporting real-time trading applications and proficiency in scripting and automation (Python, Bash, PowerShell). Knowledge of messaging middleware (e.g., Solace, 29West) and observability platforms (e.g., ITRS Geneos, Prometheus). Excellent communication skills and comfortable working with Linux systems and hybrid infrastructure. Benefits: Flexible working options between office and home. Exposure to global production …
London, South East, England, United Kingdom Hybrid / WFH Options
Morgan Hunt Recruitment
Postgres) Implement OCR, NLP, and ML for document analysis and automation risk assessment Lead R&D spikes and validate system improvements through robust data analysis Ensure code quality, testing, observability, and non-functional compliance (security, UX, performance) Coach team members and contribute to Agile delivery practices Essential Skills Strong commercial experience with Python, TypeScript, SpaCy, and AWS (serverless) Background in …
City of London, London, United Kingdom Hybrid / WFH Options
Develop
data, integration layers, and authentication modules Ensure secure, scalable deployment using Azure cloud-native tools Build and support systems using PostgreSQL, Java, and Spring Boot Integrate and monitor using observability tools like Datadog and BigPanda Collaborate closely with architects, DevOps, and security teams across the full SDLC Core Skills & Technologies Strong backend development in Java with Spring Boot Cloud migration … experience, particularly Azure Lift-and-Shift Familiarity with cloud infrastructure and deployment pipelines Exposure to PostgreSQL, authentication/security patterns Monitoring/observability tooling: Datadog, BigPanda Apply now to be considered.
and Python code for use in Databricks Contributing to architectural decisions around pipeline scalability and performance Supporting the integration of diverse data sources into the platform Ensuring data quality, observability, and cost-efficiency KEY SKILLS AND REQUIREMENTS Strong experience with DBT, Airflow, and Databricks Advanced SQL and solid Python scripting skills Solid understanding of modern data engineering best practices Ability …
specialism in vulnerability management Self-starter, able to work in technical detail and motivate a diverse group of stakeholders to build sponsorship for significant and impactful change Desired: Establishing observability platforms Capabilities adjacent to exposure/vulnerability management (i.e. cyber security asset management, attack surface management, etc.) Pragmatic application of zero-trust philosophies Cloud-based security (GCP, AWS and …
Data Operations Manager: We are seeking a dynamic and driven Data Operations Manager to lead a team of data engineers. You will oversee the daily operations of our data infrastructure and ensure the accuracy, availability, and security of data across …