London, England, United Kingdom Hybrid / WFH Options
Bright Purple
native Infrastructure-as-Code (IaC) solutions from the ground up? Our client is seeking a talented and motivated Senior Software Engineer to lead the development of our next-generation observability platform. THIS IS NOT A DEVOPS ROLE. Responsibilities: Collaborate within a dynamic software engineering team to architect and build a new cloud-native IaC platform. Develop software using technologies such
Glasgow, Scotland, United Kingdom Hybrid / WFH Options
iO Associates - UK/EU
applications into various environments. Objectives of this role: Develop and manage CI/CD pipelines to automate deployment processes. Monitor system performance and troubleshoot issues using CloudWatch and other observability tools. Manage and optimise Kafka clusters for real-time data streaming. Oversee and maintain containerized workloads using EKS (Kubernetes on AWS). Support data infrastructure, including Amazon Redshift for analytics
Architect and optimize Generative AI applications using Large Language Models and agent frameworks. Working with RAG frameworks: Utilize techniques like chunking, hybrid search, and vector databases. Monitoring Performance: Use observability tools such as Datadog and Databricks. Research & Innovation: Keep abreast of advancements in fine-tuning, RLHF, and prompt engineering. System Design & Architecture: Translate requirements into scalable, modern architectures. Testing & Integration
in Azure. Proficiency with containerization and orchestration tools like Docker, Kubernetes, AKS, and Helm. Programming skills in Python, Java, PowerShell, or Go, with understanding of REST APIs. Experience with observability tools such as Datadog, Prometheus, Splunk, Elasticsearch, Grafana, Azure Monitor. Experience with CI/CD tools like Git, Terraform, Jenkins. Azure cloud expertise in mission-critical environments. Additional qualifications Azure
this role, you will assist in upgrading the Elastic DP estate to Kubernetes, moving away from obsolete technology (Cloudera), upgrading to RHEL 8, and contributing to improving stability and observability of the platform. You will provide advanced analytics tooling and services for modeling analytics, working across continuous integration, development, build, and deployment using automation and cloud technologies to support the
London, England, United Kingdom Hybrid / WFH Options
Anson McCade
to define technical roadmaps, service models, and adoption strategies • Shape CI/CD pipelines and infrastructure-as-code with tools like Terraform, GitHub, and Jenkins • Design cloud-native operations, observability, and performance monitoring frameworks • Contribute to best practices in platform engineering, security, and cost optimization (FinOps) Required Experience: • 6+ years in cloud platform architecture and solution design • Deep knowledge of
of CI/CD pipelines and version control systems (GitLab, GitHub). Advanced experience with Kubernetes, including cluster management (EKS or similar) and deploying applications with Helm. Familiarity with observability tools and building real-time monitoring solutions. Strong understanding of Linux systems (RHEL, Amazon Linux 2) and cloud platforms (AWS services like EC2, S3, RDS, Lambda). Additional Skills: Proficiency
code (Terraform, Ansible) Messaging and streaming (Kafka, Kinesis) Docker, Kubernetes Scripting (Shell, PowerShell) Programming (Python, Go, C#) Web services, REST APIs Databases and storage systems CI/CD pipelines Observability and monitoring tools Join us to turn insights into action. At CGI, ownership, teamwork, respect, and belonging are fundamental. From day one, you are an owner, shaping our strategy and
Bristol, Gloucestershire, United Kingdom Hybrid / WFH Options
Twinstream Limited
Socials & Events Cycle to Work Scheme & Life Assurance Key Responsibilities of the Site Reliability Engineer: Work closely with engineers and sysadmins to increase performance and reduce toil Advance system observability, monitoring and alerting Automate, troubleshoot, and proactively resolve issues before they escalate Improve development environments to meet delivery and quality targets Research and evaluate tools and platforms to support scale
BS1, Bristol, City of Bristol, United Kingdom Hybrid / WFH Options
Twinstream Limited
Socials & Events Cycle to Work Scheme & Life Assurance Key Responsibilities of the Site Reliability Engineer: Work closely with engineers and sysadmins to increase performance and reduce toil Advance system observability, monitoring and alerting Automate, troubleshoot, and proactively resolve issues before they escalate Improve development environments to meet delivery and quality targets Research and evaluate tools and platforms to support scale
Employment Type: Permanent
Salary: £80,000 - £110,000/annum Hybrid, Great Benefits
Bristol, Avon, South West, United Kingdom Hybrid / WFH Options
Twinstream Limited
Socials & Events Cycle to Work Scheme & Life Assurance Key Responsibilities of the Site Reliability Engineer: Work closely with engineers and sysadmins to increase performance and reduce toil Advance system observability, monitoring and alerting Automate, troubleshoot, and proactively resolve issues before they escalate Improve development environments to meet delivery and quality targets Research and evaluate tools and platforms to support scale
multi-tenant SaaS or large enterprise application. Certifications: AWS Certified Solutions Architect, Google Professional Cloud Architect, Azure Solutions Architect Expert. Experience in data architecture, AI/ML integration, and observability frameworks.
as Docker, Kubernetes, AKS and Helm Proficient in at least one programming language such as Python, Java, PowerShell, or Go along with good understanding of REST APIs Experience in observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Datadog, Prometheus, Splunk, Elasticsearch, Grafana, Azure monitoring and others Experience with
Wise, Red Bull, GetYourGuide, and Aesop trust us to supercharge their corporate travel. We are seeking a skilled Site Reliability Engineer (SRE) with experience in AWS, Serverless, Monitoring, and Observability to join our team. Responsibilities Design, build and maintain scalable and reliable cloud infrastructure in AWS Monitor and manage the performance, reliability, and security of our systems Implement and improve
London, England, United Kingdom Hybrid / WFH Options
Deel
our valuation to $12B. There’s never been a more exciting time to join Deel, the international payroll and compliance market leader. Responsibilities: Design and manage various monitoring and observability tools to efficiently troubleshoot and resolve production issues. Oversee the management of our Active Directory infrastructure and ensure the smooth operation of Windows operating systems supporting internal applications. Supervise our
support high-performance computing workloads and scalable services. Collaborate with R&D teams to provision and manage compute environments for model training and experimentation. Maintain/monitor systems, implement observability solutions (e.g., logging, metrics, tracing), and proactively resolve infrastructure issues. Manage CI/CD pipelines for rapid, reliable deployment of services and models. Ensure high availability, disaster recovery, and robust
Terraform). Experience in software development in general, with skills in a high-level language (e.g., Python, JavaScript, TypeScript, Java) and familiarity with modern development practices Understanding of Cloud Observability, Monitoring, and Tracing tools (Datadog, CloudWatch, Jaeger, ELK) and how best to leverage them to support effective MTTR and mitigate high CFR Our UK benefits: Stock Options Annual Performance Bonus or
ecosystem Implement modern identity solutions using Entra SSO Automate infrastructure deployment using Terraform and Azure DevOps Maintain high-availability web hosting services for marketing campaigns Lead monitoring and observability initiatives Optimize cloud resources for cost-effectiveness Provide technical leadership and mentoring to the team What You Will Bring To The Role You're not just technically proficient - you're a
such as Python, Java Spring Boot, Unix Shell. Deep knowledge of software applications and technical processes with emerging depth in one or more technical disciplines Proficiency and experience in observability such as white and black box monitoring, SLO alerting, and telemetry collection using tools such as Grafana, Geneos, Dynatrace, Prometheus, Datadog, Splunk, etc. Proficiency in continuous integration and continuous delivery
London, England, United Kingdom Hybrid / WFH Options
Ten Lifestyle Group
cost optimisation). Experience with cloud platforms (AWS, GCP, Azure) and infrastructure-as-code (Terraform). Familiarity and hands-on experience with DevOps practices (CI/CD, Docker, K8s) and observability tools (Prometheus, Grafana, Datadog). Experience in distributed systems and scaling. Knowledge and hands-on experience with multiple data stores (both SQL and NoSQL). Desired experience in building agentic
CD pipelines for data transformation code and infrastructure; Configure and optimise database resources for performance and cost; Implement proper security controls and multi-tenant isolation; Contribute to monitoring and observability solutions; Troubleshoot and resolve complex data infrastructure issues. Client Implementation Support Adapt standard pipelines for client-specific requirements; Assist with data migration and integration for new clients; Support UAT and
Enterprise level Cloud & DevOps standards and best practices in the areas of cloud infrastructure, infrastructure as code, and DevOps toolchain Provide thought leadership, design and implementation roadmap for a Continuous Observability platform to maintain overall services & infrastructure health along with automated remediation and disaster recovery failover capabilities Team Leadership: Leadership & Mentoring: As an Azure Sr Architect, you will be responsible to