Azure Databricks Engineer
This is a rare opportunity to apply serious data engineering in a domain where latency, correctness, and reliability carry direct commercial weight.
Requirements
* 6+ years of data engineering in production environments; Python expertise - idiomatic, well-tested, production-grade code, not notebook scripts
* ETL/ELT pipeline design and implementation at scale; orchestration with Airflow, Prefect, or equivalent; reliability-first mindset including backfill, retry, and exactly-once semantics
* Azure data platform - Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Data Lake Storage; infrastructure as code for data workloads (Terraform or Bicep)
* Databricks - Delta Lake, Unity Catalog, job cluster vs interactive cluster trade-offs, cost-aware compute management, Spark job optimisation
* Relational databases: PostgreSQL at production scale - query optimisation, indexing strategies, table partitioning, replication, schema design for both OLTP and analytical workloads
* MongoDB - document modelling, aggregation pipelines, indexing strategy, replica sets; clear judgment on when document vs relational storage is the right architectural call
* Containerisation: Docker and Kubernetes-based deployment of data workloads; reproducible, environment-agnostic data infrastructure
* Data modelling for analytical workloads - dimensional modelling, data vault, or equivalent; schema evolution, slowly changing dimensions, and downstream impact analysis
* Stream and batch processing patterns; late data handling, watermarking, and backfill strategies; throughput vs latency trade-offs in pipeline design (see the watermarking sketch after this list)
* Production data observability - data lineage, quality checks, SLA monitoring, alerting on freshness and completeness; treating data correctness as a first-class concern (see the freshness-check sketch after this list)
* CI/CD for data infrastructure - version-controlled pipelines, automated data quality testing, reproducible and auditable deploys
* Ability to work directly with quant researchers, risk managers, and traders - translating business requirements into reliable, well-documented data products
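
To give a flavour of the stream-processing bullet above, here is a minimal PySpark Structured Streaming sketch of the watermarking pattern. Everything in it is illustrative: the Delta paths, the `event_time`/`instrument_id`/`notional` schema, and the window sizes are hypothetical placeholders, not a description of our actual pipelines.

```python
# Illustrative only: hypothetical paths, schema, and window sizes.
# Assumes a Databricks/Delta runtime where the "delta" format is available.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("late-data-demo").getOrCreate()

trades = spark.readStream.format("delta").load("/lake/bronze/trades")

# Tolerate events arriving up to 10 minutes late; later arrivals are dropped
# rather than silently mutating windows that have already been finalised.
per_minute = (
    trades
    .withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "1 minute"), "instrument_id")
    .agg(
        F.count("*").alias("trade_count"),
        F.sum("notional").alias("gross_notional"),
    )
)

# Append mode only emits a window once the watermark has passed its end,
# which is what makes downstream consumption deterministic.
query = (
    per_minute.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/lake/_checkpoints/trades_per_minute")
    .start("/lake/silver/trades_per_minute")
)
```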
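And for the observability bullet, a sketch of the kind of freshness check we mean, assuming a Delta table carrying an `ingested_at` timestamp stored as naive UTC; the table path and the 15-minute SLA are hypothetical.

```python
# Hypothetical freshness check: table path, column name, and SLA are
# placeholders. Assumes `ingested_at` is stored as naive UTC.
import datetime as dt
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("freshness-check").getOrCreate()

FRESHNESS_SLA = dt.timedelta(minutes=15)

latest = (
    spark.read.format("delta")
    .load("/lake/silver/trades_per_minute")
    .agg(F.max("ingested_at").alias("latest"))
    .collect()[0]["latest"]
)

if latest is None:
    raise RuntimeError("Freshness check failed: table is empty")

lag = dt.datetime.utcnow() - latest
if lag > FRESHNESS_SLA:
    # In production this would page on-call via the alerting stack;
    # failing loudly is the minimum acceptable behaviour.
    raise RuntimeError(f"Freshness SLA breached: lag={lag}, SLA={FRESHNESS_SLA}")
```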
Nice to Have
* Financial markets data - market data feeds (Bloomberg, Refinitiv), tick data, trade history, reference data, or instrument master management
* Apache Spark or Flink for large-scale stream and batch processing beyond the Databricks ecosystem
* dbt or equivalent SQL transformation layer; experience building and maintaining dbt projects in a production data warehouse
* Event streaming with Kafka or Confluent Platform - topic design, consumer group management, exactly-once delivery guarantees (see the transactional-producer sketch after this list)
* OLAP-optimised stores - ClickHouse, DuckDB, or equivalent; understanding of columnar storage and vectorised query execution
* Energy, commodities, or broader financial markets domain knowledge
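
On the exactly-once point under Kafka above: a hedged sketch of the transactional-producer pattern using the confluent-kafka Python client. Broker address, transactional id, topic, and payloads are all hypothetical placeholders.

```python
# Hypothetical values throughout: broker, transactional.id, topic, payloads.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "broker:9092",
    # A stable transactional.id is what lets the broker fence zombie
    # producers and provide exactly-once semantics across restarts.
    "transactional.id": "trades-loader-1",
})

producer.init_transactions()
producer.begin_transaction()
try:
    for key, payload in [("EURUSD", b"<tick bytes>"), ("GBPUSD", b"<tick bytes>")]:
        producer.produce("trades", key=key, value=payload)
    # Either every message in the batch becomes visible to read-committed
    # consumers, or none of them do.
    producer.commit_transaction()
except Exception:
    producer.abort_transaction()
    raise
```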
What We're Looking For
You treat data as a product, not a side effect. You know what it takes to make a pipeline trustworthy - not just running, but observable, tested, and recoverable when something upstream changes at 3am. You think in systems: schema evolution, lineage, freshness SLAs, and the downstream impact of every modelling decision. At ETrading, that data is the foundation of billion-dollar trading decisions. You are the reason it is right.
To find out more about Huxley, please visit (url removed)
Huxley, a trading division of SThree Partnership LLP is acting as an Employment Business in relation to this vacancy | Registered office | 8 Bishopsgate, London, EC2N 4BQ, United Kingdom | Partnership Number | OC(phone number removed) England and Wales