privacy, and security, ensuring our AI systems are developed and used responsibly and ethically. Tooling the Future: Get hands-on with cutting-edge technologies like Hugging Face, PyTorch, TensorFlow, Apache Spark, Apache Airflow, and other modern data and ML frameworks. Collaborate and Lead: Partner closely with ML Engineers, Data Scientists, and Researchers to understand their data needs … their data, compute, and storage services. Programming Prowess: Strong programming skills in Python and SQL are essential. Big Data Ecosystem Expertise: Hands-on experience with big data technologies like Apache Spark, Kafka, and data orchestration tools such as Apache Airflow or Prefect. ML Data Acumen: Solid understanding of data requirements for machine learning models, including feature engineering
Manchester, Lancashire, England, United Kingdom Hybrid / WFH Options
Searchability
position, you'll develop and maintain a mix of real-time and batch ETL processes, ensuring accuracy, integrity, and scalability across vast datasets. You'll work with Python, SQL, Apache Spark, and AWS services such as EMR, Athena, and Lambda to deliver robust, high-performance solutions. You'll also play a key role in optimising data pipeline architecture, supporting … Proven experience as a Data Engineer, with Python & SQL expertise. Familiarity with AWS services (or equivalent cloud platforms). Experience with large-scale datasets and ETL pipeline development. Knowledge of Apache Spark (Scala or Python) beneficial. Understanding of agile development practices, CI/CD, and automated testing. Strong problem-solving and analytical skills. Positive team player with excellent communication … required skills) your application to our client in conjunction with this vacancy only. KEY SKILLS: Data Engineer/Python/SQL/AWS/ETL/Data Pipelines/Apache Spark/EMR/Athena/Lambda/Big Data/Manchester/Hybrid Working
extract data from diverse sources, transform it into usable formats, and load it into data warehouses, data lakes or lakehouses. Big Data Technologies: Utilize big data technologies such as Spark, Kafka, and Flink for distributed data processing and analytics. Cloud Platforms: Deploy and manage data solutions on cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP), leveraging … SQL for data manipulation and scripting. Strong understanding of data modelling concepts and techniques, including relational and dimensional modelling. Experience in big data technologies and frameworks such as Databricks, Spark, Kafka, and Flink. Experience in using modern data architectures, such as lakehouse. Experience with CI/CD pipelines, version control systems like Git, and containerization (e.g., Docker). Experience … with ETL tools and technologies such as Apache Airflow, Informatica, or Talend. Strong understanding of data governance and best practices in data management. Experience with cloud platforms and services such as AWS, Azure, or GCP for deploying and managing data solutions. Strong problem-solving and analytical skills with the ability to diagnose and resolve complex data-related issues. SQL
extract data from diverse sources, transform it into usable formats, and load it into data warehouses, data lakes or lakehouses. Big Data Technologies: Utilize big data technologies such as Spark, Kafka, and Flink for distributed data processing and analytics. Cloud Platforms: Deploy and manage data solutions on cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP), leveraging … SQL for data manipulation and scripting. Strong understanding of data modelling concepts and techniques, including relational and dimensional modelling. Experience in big data technologies and frameworks such as Databricks, Spark, Kafka, and Flink. Experience in using modern data architectures, such as lakehouse. Experience with CI/CD pipelines and version control systems like Git. Knowledge of ETL tools and … technologies such as Apache Airflow, Informatica, or Talend. Knowledge of data governance and best practices in data management. Familiarity with cloud platforms and services such as AWS, Azure, or GCP for deploying and managing data solutions. Strong problem-solving and analytical skills with the ability to diagnose and resolve complex data-related issues. SQL (for database management and querying
data-based insights, collaborating closely with stakeholders. Passionately discover hidden solutions in large datasets to enhance business outcomes. Design, develop, and maintain data processing pipelines using Cloudera technologies, including Apache Hadoop, Apache Spark, Apache Hive, and Python. Collaborate with data engineers and scientists to translate data requirements into technical specifications. Develop and maintain frameworks for efficient
in either Python or Scala. Working knowledge of two or more common Cloud ecosystems (AWS, Azure, GCP) with expertise in at least one. Deep experience with distributed computing with Apache Spark and knowledge of Spark runtime internals. Familiarity with CI/CD for production deployments. Working knowledge of MLOps. Design and deployment of performant end-to-end … Platform to unify and democratize data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe and was founded by the original creators of Lakehouse, Apache Spark, Delta Lake and MLflow. To learn more, follow Databricks on Twitter, LinkedIn and Facebook. Benefits: At Databricks, we strive to provide comprehensive benefits and perks that
services such as S3, Glue, Lambda, Redshift, EMR, Kinesis, and more, covering data pipelines, warehousing, and lakehouse architectures. Drive the migration of legacy data workflows to Lakehouse architectures, leveraging Apache Iceberg to enable unified analytics and scalable data management. Operate as a subject matter expert across multiple data projects, providing strategic guidance on best practices in design, development, and … in designing and implementing scalable data engineering solutions. Bring extensive experience in software architecture and solution design, ensuring robust and future-proof systems. Hold specialised proficiency in Python and Apache Spark, enabling efficient processing of large-scale data workloads. Demonstrate the ability to set technical direction, uphold high standards for code quality, and optimise performance in data-intensive … of continuous learning and innovation. Extensive background in software architecture and solution design, with deep expertise in microservices, distributed systems, and cloud-native architectures. Advanced proficiency in Python and Apache Spark, with a strong focus on ETL data processing and scalable data engineering workflows. In-depth technical knowledge of AWS data services, with hands-on experience implementing data
and/or demonstrated competence in OLTP systems along with one of Azure, AWS or GCP cloud providers. Demonstrated competence in the Lakehouse architecture including hands-on experience with Apache Spark, Python and SQL. Excellent communication skills, both written and verbal. Experience in pre-sales selling highly desired. About Databricks: Databricks is the data and AI company. More … Platform to unify and democratize data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe and was founded by the original creators of Lakehouse, Apache Spark, Delta Lake and MLflow. To learn more, follow Databricks on Twitter, LinkedIn and Facebook. Benefits: At Databricks, we strive to provide comprehensive benefits and perks that
and streaming data pipelines. Azure Purview or equivalent for data governance and lineage tracking. Experience with data integration, MDM, governance, and data quality tools. Hands-on experience with Apache Spark, Python, SQL, and Scala for data processing. Strong understanding of Azure networking, security, and IAM, including Azure Private Link, VNETs, Managed Identities, and RBAC. Deep knowledge … for scalable data lakes. Azure Purview or equivalent for data governance and lineage tracking. Experience with data integration, MDM, governance, and data quality tools. Hands-on experience with Apache Spark, Python, SQL, and Scala for data processing. Strong understanding of Azure networking, security, and IAM, including Azure Private Link, VNETs, Managed Identities, and RBAC. Deep knowledge
focused data team responsible for building and optimising scalable, production-grade data pipelines and infrastructure. Key Responsibilities: Design and implement robust, scalable ETL/ELT pipelines using Databricks and Apache Spark. Ingest, transform, and manage large volumes of data from diverse sources. Collaborate with analysts, data scientists, and business stakeholders to deliver clean, accessible datasets. Ensure high performance … practices. Work with cloud-native tools and services (preferably Azure). Required Skills & Experience: Proven experience as a Data Engineer on cloud-based projects. Strong hands-on skills with Databricks, Apache Spark, and Python or Scala. Proficient in SQL and working with large-scale data environments. Experience with Delta Lake, Azure Data Lake, or similar technologies. Familiarity with version
London, South East, England, United Kingdom Hybrid / WFH Options
Randstad Technologies
scalable data pipelines, specifically using the Hadoop ecosystem and related tools. The role will focus on designing, building and maintaining scalable data pipelines using big data Hadoop ecosystems and Apache Spark for large datasets. A key responsibility is to analyse infrastructure logs and operational data to derive insights, demonstrating a strong understanding of both data processing and the … underlying systems. The successful candidate should have the following key skills: experience with Open Data Platform; hands-on experience with Python for scripting; Apache Spark; prior experience of building ETL pipelines; data modelling. 6 Months Contract - Remote Working - £300 to £350 a day Inside IR35. If you are an experienced Hadoop engineer looking for a new role then
company covering the entire data transformation from architecture to implementation. Beyond delivering solutions, we also provide data & AI training and enablement. We are backed by Databricks - the creators of Apache Spark, and act as a delivery partner and training provider for them in Europe. Additionally, we are Microsoft Gold Partners in delivering cloud migration and data architecture on …
issues for non-technical audiences. Self-motivated and able to work independently. Preferred Qualifications: Background in investment banking or financial services. Hands-on experience with Hive, Impala, and the Spark ecosystem (e.g., HDFS, Apache Spark, Spark-SQL, UDFs, Sqoop). Proven experience building and optimizing big data pipelines, architectures, and data sets.
Belfast, County Antrim, Northern Ireland, United Kingdom
Hays
Criteria 5+ years of experience in Technical Data Analysis. Proficiency in SQL, Python, and Spark. Experience within an investment banking or financial services environment. Exposure to Hive, Impala, and Spark ecosystem technologies (e.g. HDFS, Apache Spark, Spark-SQL, UDF, Sqoop). Experience building and optimizing Big Data pipelines, architectures, and data sets. Familiarity with Hadoop and
Role Title: Infrastructure/Platform Engineer - Apache. Duration: 9 Months. Location: Remote. Rate: £ - Umbrella only. Would you like to join a global leader in consulting, technology services and digital transformation? Our client is at the forefront of innovation to address the entire breadth of opportunities in the evolving world of cloud, digital and platforms. Role purpose/summary: Refactor … prototype Spark jobs into production-quality components, ensuring scalability, test coverage, and integration readiness. Package Spark workloads for deployment via Docker/Kubernetes and integrate with orchestration systems (e.g., Airflow, custom schedulers). Work with platform engineers to embed Spark jobs into InfoSum's platform APIs and data pipelines. Troubleshoot job failures, memory and resource issues, and … execution anomalies across various runtime environments. Optimize Spark job performance and advise on best practices to reduce cloud compute and storage costs. Guide engineering teams on choosing the right execution strategies across AWS, GCP, and Azure. Provide subject matter expertise on using AWS Glue for ETL workloads and integration with S3 and other AWS-native services. Implement observability tooling
Synechron is looking for a skilled Machine Learning Developer with expertise in Spark ML to work with a leading financial organisation on a global programme of work. The role involves predictive modeling and deploying training and inference pipelines on distributed systems such as Hadoop. The ideal candidate will design, implement, and optimise machine learning solutions for large-scale data … processing and predictive analytics. Role: Develop and implement machine learning models using Spark ML for predictive analytics. Design and optimise training and inference pipelines for distributed systems (e.g., Hadoop). Process and analyse large-scale datasets to extract meaningful insights and features. Collaborate with data engineers to ensure seamless integration of ML workflows with data pipelines. Evaluate model performance and … time and batch inference. Monitor and troubleshoot deployed models to ensure reliability and performance. Stay updated with advancements in machine learning frameworks and distributed computing technologies. Experience: Proficiency in Apache Spark and Spark MLlib for machine learning tasks. Strong understanding of predictive modeling techniques (e.g., regression, classification, clustering). Experience with distributed systems like Hadoop for data storage
Skills: Proven expertise in designing, building, and operating data pipelines, warehouses, and scalable data architectures. Deep hands-on experience with modern data stacks. Our tech includes Python, SQL, Snowflake, Apache Iceberg, AWS S3, PostgresDB, Airflow, dbt, and Apache Spark, deployed via AWS, Docker, and Terraform. Experience with similar technologies is essential. Coaching & Growth Mindset: Passion for developing
working and understanding the tradeoffs of at least one of the following Data Lake table/file formats: Delta Lake, Parquet, Iceberg, Hudi. Previous hands-on expertise with Spark. Experience working with containerisation technologies - Docker, Kubernetes. Streaming Knowledge: Experience with Kafka/Flink or other streaming ecosystems, with a solid understanding of their components. DevOps experience building CI
end tech specs and modular architectures for ML frameworks in complex problem spaces in collaboration with product teams. Experience with large scale, distributed data processing frameworks/tools like Apache Beam, Apache Spark, and cloud platforms like GCP or AWS. Experience with technologies such as Kubernetes, Ray is a plus. Experience troubleshooting model training and deployment across
platform components. Big Data Architecture: Build and maintain big data architectures and data pipelines to efficiently process large volumes of geospatial and sensor data. Leverage technologies such as Hadoop, Apache Spark, and Kafka to ensure scalability, fault tolerance, and speed. Geospatial Data Integration: Develop systems that integrate geospatial data from a variety of sources (e.g., satellite imagery, remote … driven applications. Familiarity with geospatial data formats (e.g., GeoJSON, Shapefiles, KML) and tools (e.g., PostGIS, GDAL, GeoServer). Technical Skills: Expertise in big data frameworks and technologies (e.g., Hadoop, Spark, Kafka, Flink) for processing large datasets. Proficiency in programming languages such as Python, Java, or Scala, with a focus on big data frameworks and APIs. Experience with cloud services … or related field. Experience with data visualization tools and libraries (e.g., Tableau, D3.js, Mapbox, Leaflet) for displaying geospatial insights and analytics. Familiarity with real-time stream processing frameworks (e.g., Apache Flink, Kafka Streams). Experience with geospatial data processing libraries (e.g., GDAL, Shapely, Fiona). Background in defense, national security, or environmental monitoring applications is a plus. Compensation and
two of the following: Python, SQL, Java. Commercial experience in client-facing projects is a plus, especially within multi-disciplinary teams. Deep knowledge of database technologies: Distributed systems (e.g., Spark, Hadoop, EMR), RDBMS (e.g., SQL Server, Oracle, PostgreSQL, MySQL), NoSQL (e.g., MongoDB, Cassandra, DynamoDB, Neo4j). Solid understanding of software engineering best practices - code reviews, testing frameworks, CI/CD
West London, London, United Kingdom Hybrid / WFH Options
Young's Employment Services Ltd
a Senior Data Engineer, Tech Lead, Data Engineering Manager etc. Proven success with modern data infrastructure: distributed systems, batch and streaming pipelines. Hands-on knowledge of tools such as Apache Spark, Kafka, Databricks, DBT or similar. Experience building, defining, and owning data models, data lakes, and data warehouses. Programming proficiency in Python, PySpark, Scala or Java. Experience operating