storage environments that power a global trading platform. The successful candidate will be involved in every layer of the technology stack, from hardware and operating systems to automation and observability, while gaining exposure to how a world-class investment firm manages its technology infrastructure. Key Responsibilities Manage a distributed compute environment and several petabyte-scale storage systems Install, configure, and … software development practices (version control, agile methodologies) Familiarity with infrastructure automation and configuration management tools (Chef, Puppet, or Ansible) Exposure to distributed storage systems and related protocols Experience with observability and monitoring tools (Elasticsearch, Logstash, Kibana, Datadog, Prometheus, Grafana) Strong written and verbal communication skills Demonstrated ability to learn quickly and adapt to evolving technologies Ability to work effectively in
manage and support a customer's AWS and data platform. Be technically hands-on. Provide incident and problem management on the AWS IaaS and PaaS platform. Monitor and observe system and platform performance. Collaborate with development and build teams on application and platform deployments and changes. Resolve incidents and problems in an efficient and … timely manner. Actively monitor an AWS platform and its components for technical issues. Implement and improve the existing monitoring and observability solution. Help resolve technical incident tickets. Assist in the root cause analysis of incidents. Assist with improving efficiency and processes within the team. Examine traces and logs. Escalate incidents and problems to the appropriate teams.
records). Write performant SQL for data transformations, ETL workflows, and analytical use cases. Contribute to discussions on architecture and design, focusing on scalability, cost, reliability, and performance. Improve observability, testing, and overall system robustness. Participate in incident reviews and continuous improvement initiatives within the squad. Tech You’ll Work With Python (primary language) SQL Large-scale data workflows (ETL … handles large data volumes effectively. You contribute to improving data pipelines, performance, and system reliability. You participate actively in design discussions, planning, and squad rituals. You help strengthen testing, observability, and operational excellence. You continually learn and take on more ownership as part of a tight, high-performing squad.
RBAC, PIM), and ensure secure authentication (SAML/OAuth, MFA). Support CI/CD pipelines via Azure DevOps or GitHub Actions, troubleshoot builds, and manage YAML configurations. Implement observability best practices using Azure Monitor, Log Analytics, Application Insights, and dashboards (KQL and Datadog experience desirable). Ensure compliance and security through Microsoft Defender for Cloud, Azure Policy, Key Vault … in Terraform and Ansible for automation and infrastructure management. Deep technical understanding of networking, identity, and security within the Azure ecosystem. Strong exposure to CI/CD, monitoring, and observability tools. Experience supporting financial services or highly regulated environments is advantageous. How to Apply If your experience aligns with the requirements above, please apply with an updated CV.
up and harden RAG pipelines (indexing, retrieval policies, grounding, guardrails) and agent frameworks. Take basic infra ownership on GCP (or AWS/Azure): networking, autoscaling, CI/CD, IaC, observability, and cost tuning. Participate in on‐call for your area and drive root‐cause analysis with crisp follow‐ups. 15% Collaborate Pair with back‐end & front‐end to wire extractors … evals; hands‐on with time‐series analysis (forecasting, change‐point, drift). Cloud & ops: Basic infra ownership on GCP (or AWS/Azure): networking, autoscaling, CI/CD, IaC, observability, and cost control. Communication: You explain results clearly, align stakeholders, and write crisp docs. Bonus points DevOps wizardry; GPU/accelerator experience. Multimodal pipelines (text + voice + screenshots).
Buckinghamshire, South East England, United Kingdom Hybrid/Remote Options
Rightmove
scientists to take models from development to production-grade systems, ensuring scalability, reproducibility, and robustness. Automating feature engineering and data pipeline processes, ensuring reproducibility and auditability. Implementing monitoring and observability to detect drift, bias, and performance degradation, and setting up rollback/recovery processes. Using MLOps tools (e.g., Vertex Pipelines, Kubeflow, Weights & Biases) for experiment tracking, model registry, and automated … distributed systems). 3+ years of experience as an ML Engineer, MLOps Engineer, Data Engineer, or similar, in a larger-scale, production-focused environment. Hands-on with model monitoring, observability, and retraining pipelines. Exposure to feature stores, registries, and experimentation frameworks. Familiarity with business-driven metrics and experience balancing ML performance with commercial goals. Experience with generative AI and LLM
Central London, London, United Kingdom Hybrid/Remote Options
Halian Technology Limited
A leading fintech company is seeking a Lead AppSec Engineer to join their established team. You'll be instrumental in embedding security into every stage of the software development lifecycle, guiding engineers, shaping best practices, and driving secure, scalable solutions across our
Farnborough, Hampshire, South East, United Kingdom Hybrid/Remote Options
Spectrum IT Recruitment Limited
Senior DevOps Engineer - AWS/Azure Government Transformation Projects (AWS/Azure/DevOps) Location: Winchester, Hampshire, Hybrid Our client is a cloud-first digital consultancy, founded over 10 years ago and trusted by government, policing, and public sector organisations
Your new company Join a fast-growing tech start-up that's recently expanded into multiple new markets and earned recognition as one of the UK's most exciting technology businesses. With a proactive, fail-fast culture, this is a
A fast-growing technology business is developing advanced software for accounting, payroll, tax, and practice management. With a strong engineering foundation and a clear commercial vision, the company is now expanding its focus on artificial intelligence to transform how professional
If you need support in completing the application or if you require a different format of this document, please get in touch at UKI.recruitment@tcs.com or call TCS London Office number 02031552100/+44 204 520 2575 with the
A leading fintech company is seeking a Lead AppSec Engineer to join their established team. You'll be instrumental in embedding security into every stage of the software development lifecycle, guiding engineers, shaping best practices, and driving secure, scalable solutions across our platform. Key
act? This is a chance to design and deliver agentic AI systems on Azure that automate real business workflows through tool use, retrieval, and reasoning, with the reliability and observability of true production engineering. In this position you’ll take ownership of designing and scaling end-to-end agentic solutions on Azure, combining LLMs, APIs, and orchestration frameworks to deliver … Productionise on Azure using AI Foundry/OpenAI, Azure ML, Functions, Event Grid/Service Bus, and Kubernetes. Build LLMOps pipelines for evaluation, monitoring, safety, and cost control. Define observability standards across prompts, tools, and data flows. Establish governance patterns, safety, privacy, and auditability. Stay hands-on with critical code paths while guiding architecture and best practice. 🧠 Required Skills/
experience building technology 0→1, owning systems end-to-end, and working close to the metal. They will operate across everything from bare-metal Linux to modern build and observability stacks. Linux Platform Engineer – Trading Infrastructure Overview The firm is seeking a Linux Platform Engineer to join a small, high-impact engineering group supporting ML/AI-driven trading. … latency. Contribute to kernel-level debugging and system improvements. Automate Linux fleet builds, creating consistent, reproducible systems. Manage Kubernetes cluster infrastructure, networking, and container orchestration. Enhance observability. Analyze and optimize networking across the full TCP/IP stack. Investigate core dumps, memory bottlenecks, and CPU performance issues across distributed systems. Develop Python tooling for internal automation
Surrey, England, United Kingdom Hybrid/Remote Options
La Fosse
scale. This is a pivotal, visible role reporting directly to the CTO. The Opportunity You’ll shape the operational strategy and modernise how the platform is managed, driving reliability, observability, automation, and cost efficiency. You’ll manage the Head of DevOps and work closely with Product, Engineering and Finance to ensure the platform is secure, resilient, scalable and commercially efficient. … IT, Security and Platform Operations Reliability, performance and availability of a cloud-native SaaS platform (AWS/serverless) Cost-to-Serve ownership and cloud cost visibility/optimisation Maturing observability, incident management & operational governance Uplifting DevOps engineering practices and platform automation, use of AI Vendor & outsourced IT partner management Supporting a high-change organisation scaling for enterprise success What You
lead performance testing and chaos engineering initiatives, and embed reliability best practices across engineering, DevOps, and infrastructure teams. This is a senior, strategic leadership role focused on system excellence, observability, and continuous improvement. Ideal Candidate: Proven experience leading Performance Engineering, Reliability, or SRE functions Deep expertise in performance testing methodologies (load, stress, spike, soak) Strong hands-on background with LoadRunner … strategy across critical platforms and services Oversee load, stress, and chaos testing initiatives to ensure systems perform and recover under real-world conditions Define and drive best practices for observability, monitoring, and APM adoption using tools like Dynatrace Drive incident reduction, faster recovery (MTTR), and continuous reliability improvements Champion a culture of performance ownership, ensuring teams build with scalability, stability
Employment Type: Full-Time
Salary: £84,000 - £95,000 per annum, Negotiable, Inc benefits
data sources using advanced web scraping and reverse-engineering techniques. Developing and maintaining low-latency, real-time data feeds to support internal systems and strategies. Improving internal visibility and observability tooling to help diagnose integration issues and identify improvements. Contributing across the full lifecycle of your work: design, development, testing, review, deployment, and ongoing support. Working within an agile, flexible … a rotational basis. Tech Stack Languages: Python (3.10+), plus TypeScript/JavaScript for frontend work, and occasional Go for infrastructure tasks. Messaging: RabbitMQ, Kafka Storage: PostgreSQL, Redis Environment: Linux Observability: OpenTelemetry, Prometheus, Grafana, Zabbix Requirements Must-haves Strong software development experience, especially with Python. A degree in Computer Science or another numerical discipline. Clear communication skills, able to explain technical
Day You'll Be: Design and build reliable backend systems and infrastructure tooling Use TDD to write high-quality, maintainable code and build out automated test suites Own reliability, observability, and performance of key services Collaborate with clients to understand requirements, debug issues, and propose solutions Drive improvements to system architecture, automation, and deployment processes Mentor junior developers and contribute … in writing and on calls Desirable Skills & Experience: Experience owning backend systems in production environments Experience with Cloud Platforms AWS or GCP Infrastructure-as-code, CI/CD, and observability tooling Experience scaling systems under sustained load Contributions to internal tooling or open source Experience with large datasets and machine learning models Impact You'll Make: What's In It
handling, JWK publishing, and SSO connection setup. Utilising Infrastructure as Code (Terraform) and CI/CD (GitHub Actions) to manage Auth0 configuration and ensure safe, repeatable deployments. Implementing comprehensive observability for authentication paths with structured logs, monitoring dashboards, alerts, and SLOs. Collaborating closely with product, engineering, and support teams on migration timelines, communications, and incident response. This role's for … and identity configurations, including secure secrets management. Solid understanding of core AWS services relevant to modern authentication patterns, such as API Gateway, Lambda authorisers, and CloudWatch. A commitment to observability, with hands-on experience implementing structured logging, dashboards, and SLOs for critical services. Excellent collaboration skills, demonstrated through participation in design reviews, pairing, and writing clear technical documentation (e.g., runbooks