London (City of London), South East England, United Kingdom
rmg digital
Jira, TeamCity. Expert-level knowledge of DevOps tools such as Bitbucket/GitHub, SonarQube, CAST, and TeamCity/Jenkins/Azure DevOps. Expert-level knowledge of telemetry and observability platforms such as the ELK stack, Grafana, Kibana, Azure Application Insights, and AWS CloudWatch. Scripting languages, preferably Python and PowerShell. Database technologies, preferably MS SQL Server and PostgreSQL. Infrastructure as code: AWS …
London (City of London), South East England, United Kingdom
HCLTech
Apache Airflow for orchestrating complex data workflows and ensuring reliable execution. Understanding of cloud security and governance practices, including IAM, KMS, and data access policies. Experience with monitoring and observability tools such as CloudWatch. Experience working in Agile/Scrum environments, including sprint planning, retrospectives, and backlog grooming. Good to have: exposure to Azure data services such as Azure …
City of London, London, United Kingdom Hybrid / WFH Options
Talent
pipelines, monitoring, and infrastructure provisioning. Collaborate with developers and engineers to streamline deployments and workflows. Manage AWS services effectively and efficiently. Promote best practices in Infrastructure as Code (IaC), observability, and DevSecOps. Experience & Skills Required: Active SC Clearance (mandatory requirement). Strong hands-on experience with AWS services. Proficiency in Terraform and IaC principles. Solid understanding of CI/CD pipelines …
City of London, London, United Kingdom Hybrid / WFH Options
Hlx Technology
neuroscience, and clinical datasets. Build a unified feature store to serve ML training and downstream biological analysis. Develop scalable storage, ingestion, and validation systems with a focus on robustness, observability, and versioning. Collaborate with ML researchers and biologists to translate raw data into actionable insights and high-quality training data. Scale distributed systems using Kubernetes, Terraform, and orchestration tools such …
CI/CD pipelines with Azure DevOps, ensuring robust version control, testing, and seamless deployment.
* Monitor production ML systems for performance, data drift, and anomalies using Azure Monitor or other observability tools.
* Schedule and automate model retraining pipelines to maintain performance over time.
3. Data Engineering & Preprocessing
* Develop and maintain scalable ETL/ELT data pipelines using Azure Data Factory, Data …
City of London, London, United Kingdom Hybrid / WFH Options
Rise Technical Recruitment Limited
data is delivered on time and without failure. The ideal candidate will have strong experience working with streaming and batch data systems, a solid understanding of monitoring and observability, and hands-on experience with AWS, Apache Flink, Kafka, and Python. This is a fantastic opportunity to step into an SRE role focused on data reliability in a modern …
teams to operationalize models and ship ML-powered features into production. Continuously assess and iterate on production models, balancing long-term ML strategy with tactical improvements. Champion code quality, observability, and resilience within their ML systems through reviews and hands-on contributions. Help shape their internal ML standards and practices, ensuring they stay ahead of industry advancements. Offer technical mentorship …
F# are welcome). Proven track record of building and scaling distributed backend systems. Solid understanding of infrastructure-as-code and cloud orchestration (AWS, Terraform, Docker). Familiarity with queue management, observability tooling, and shipping in fast-paced environments. Awareness of GenAI and prompt engineering, or a keen interest in developing expertise in this area. A self-starter attitude, with a strong …
edge devices. Deploying machine learning models to production. Optimizing the platform runtime for maximum performance; this is largely C++ code, with parts of the pipeline running on GPU. Building observability and telemetry. This is a five-day-a-week, in-office role. Qualifications: 3+ years of experience writing production software in C++ and Python. … of experience building applications processing …
and help shape how platform engineering is done as the team continues to scale. Tech stack: AWS (core services: EC2, RDS, S3, IAM, etc.); configuration management: Ansible; monitoring and observability: Grafana, Prometheus; Kubernetes (building and managing production clusters); Terraform (IaC provisioning); Python or Java (scripting, automation); GitHub Actions (CI/CD pipelines). What They're Looking For: Experience in AWS cloud infrastructure (ideally in a regulated or high-traffic environment). Previous experience working with monitoring and observability tools. Hands-on Kubernetes know-how, specifically with EKS. Solid IaC experience with Terraform. Experience with containerisation (Docker, Helm) and CI/CD (GitHub Actions or similar). Solid scripting/automation experience with Python or Java. A good communicator who enjoys working collaboratively …
My client is transforming observability with a modern, full-stack platform that delivers logs, metrics, traces, and security monitoring, cutting costs by up to 70% while boosting efficiency. They are looking for a Lead SRE to own and elevate their Alerting & Incident Management platform. You'll be the driving force behind reliability, customer satisfaction, and product excellence, ensuring smooth …
London (City of London), South East England, United Kingdom
Harnham
performing team of ML engineers. Combine ML with physics-based risk models (flooding, tropical cyclones, wildfires) to deliver grounded, high-impact solutions. Establish gold-standard practices for evaluation, deployment, observability, and maintainability in ML model development. Turn complex technical challenges into clear business outcomes for colleagues and customers. Requirements: MSc or PhD degree in Computer Science, Artificial Intelligence, Mathematics, Statistics …
London (City of London), South East England, United Kingdom
Hlx Life Sciences
in biotech, pharma, or AI-driven drug discovery. Experience in both large organisations (with structured processes and metrics) and smaller/startup environments (delivering with limited resources). Knowledge of observability and reliability practices for product platforms. Security or compliance experience. Why Join? Be part of a world-class, AI-first research environment shaping the future of drug discovery. Work on …
and help shape how platform engineering is done as the team continues to scale. Tech stack: AWS (core services: EC2, RDS, S3, IAM, etc.); configuration management: Ansible; monitoring and observability: Grafana, Prometheus; Kubernetes (building and managing production clusters); Terraform (IaC provisioning); GitHub Actions (CI/CD pipelines). What They're Looking For: Experience in AWS cloud infrastructure (ideally in a regulated or high-traffic environment). Previous experience working with monitoring and observability tools. Hands-on Kubernetes know-how, specifically with EKS. Solid IaC experience with Terraform. Experience with containerisation (Docker, Helm) and CI/CD (GitHub Actions or similar). A good communicator who enjoys working collaboratively across product and engineering. The client is willing to take someone who doesn't …
City of London, London, United Kingdom Hybrid / WFH Options
Anecdote
up and harden RAG pipelines (indexing, retrieval policies, grounding, guardrails) and agent frameworks. Take basic infra ownership on GCP (or AWS/Azure): networking, autoscaling, CI/CD, IaC, observability, and cost tuning. Participate in on-call for your area and drive root-cause analysis with crisp follow-ups. 15% Collaborate: Pair with back-end & front-end to wire extractors … evals; hands-on with time-series analysis (forecasting, change-point, drift). Cloud & ops: Basic infra ownership on GCP (or AWS/Azure): networking, autoscaling, CI/CD, IaC, observability, and cost control. Communication: You explain results clearly, align stakeholders, and write crisp docs. Bonus points: DevOps wizardry; GPU/accelerator experience. Multimodal pipelines (text + voice + screenshots). …
We are seeking a highly experienced Splunk ITSI Expert with 10+ years in observability to enhance our monitoring and analytics capabilities. Key Responsibilities: Design and implement advanced monitoring strategies using Splunk IT Service Intelligence (ITSI). Create service models, define KPIs, and build glass tables to visualize key business services. Use Splunk ES for security event monitoring and correlation searches. … Automate tasks and integrate systems using Python, Shell, or Perl scripting. Perform root cause analysis and anomaly detection by analyzing complex log data. Requirements: 10+ years of experience in observability, with deep expertise in Splunk, especially ITSI. Proficiency in scripting (Shell/PowerShell/Python). Strong understanding of load balancers such as F5, NetScaler, and AWS ELB. Hands-on experience …