graph data traversal capabilities built upon Apache Gremlin. (Preferred) • Experience building and operating high-performance data processing pipelines using Lambda, Step Functions and PySpark on EMR infrastructure. (Preferred) • Experience working with enterprise services used for Data Management, including the enterprise catalog service (and associated APIs), and …
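As a rough illustration of the Gremlin traversal skill this listing asks for, here is a minimal gremlinpython sketch; the endpoint, vertex labels and property names are hypothetical, not taken from the posting:

```python
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Hypothetical Gremlin Server / Neptune endpoint
conn = DriverRemoteConnection("wss://example-graph:8182/gremlin", "g")
g = traversal().withRemote(conn)

# Traverse from an account vertex to the names of entities it owns
names = g.V().has("account", "id", "a-123").out("owns").values("name").toList()
print(names)
conn.close()
```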
frameworks and data governance practices, with an emphasis on scalability and compliance in research environments. Enterprise exposure to data engineering tools and products (Spark, PySpark, BigQuery, Pub/Sub) with an understanding of product/market fit for internal stakeholders. Familiarity with cloud computing environments, including but not limited to …
ADF, Synapse, SQL, ADB, etc. Should be strong in Databricks notebooks development for data ingestion, validation, transformation and metric build. Should be strong in PySpark and SQL. Should be strong in ADF pipeline development, data orchestration techniques, monitoring and troubleshooting. Should be strong in stored procedure development. Good knowledge …
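For context, a Databricks notebook cell covering the ingestion-validation-transformation flow described above might look like the following sketch; `spark` is the session Databricks provides in notebooks, and the storage path and table names are hypothetical:

```python
from pyspark.sql import functions as F

# Ingest raw CSVs from a hypothetical ADLS landing zone
raw = (spark.read.format("csv")
       .option("header", "true")
       .load("abfss://landing@example.dfs.core.windows.net/sales/"))

# Validate: drop rows missing the business key
valid = raw.filter(F.col("order_id").isNotNull())

# Transform and build a simple metric, then persist as a Delta table
metrics = (valid.groupBy("region")
           .agg(F.sum(F.col("amount").cast("double")).alias("total_sales")))
metrics.write.format("delta").mode("overwrite").saveAsTable("reporting.sales_by_region")
```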
GitHub. Understanding of how to build and run containerized applications (Docker, Helm). Familiarity with, or a working understanding of, big data search tools (Airflow, PySpark, Trino, OpenSearch, Elastic, etc.) …
requirements and mission outcomes. Requirements Minimum Qualifications: - 3-5 years of professional experience in Python development. - Strong experience with data processing libraries such as PySpark, Pandas, and NumPy. - Proficiency in API development using Python libraries such as FastAPI. - Hands-on experience with unit testing frameworks including PyTest and mocking …
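To make the FastAPI-plus-PyTest requirement concrete, here is a minimal, self-contained sketch; the endpoint and data are invented for illustration:

```python
from fastapi import FastAPI
from fastapi.testclient import TestClient
import pandas as pd

app = FastAPI()

# Hypothetical in-memory dataset standing in for real pipeline output
df = pd.DataFrame({"id": [1, 2], "value": [10.5, 20.0]})

@app.get("/records/{record_id}")
def read_record(record_id: int):
    # Return matching rows as a list of dicts
    return df[df["id"] == record_id].to_dict(orient="records")

def test_read_record():
    client = TestClient(app)
    resp = client.get("/records/1")
    assert resp.status_code == 200
    assert resp.json()[0]["value"] == 10.5
```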
East London, London, United Kingdom Hybrid / WFH Options
McGregor Boyall Associates Limited
Science, Data Science, Mathematics, or related field. 5+ years of experience in ML modeling, ranking, or recommendation systems. Proficiency in Python, SQL, Spark, PySpark, TensorFlow. Strong knowledge of LLM algorithms and training techniques. Experience deploying models in production environments. Nice to Have: Experience in GenAI/…
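As one concrete example of the recommendation-system work this role mentions, a collaborative-filtering baseline in PySpark might look like the sketch below; the interaction data and column names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("recs-sketch").getOrCreate()

# Hypothetical (user, item, rating) interactions
ratings = spark.createDataFrame(
    [(0, 10, 4.0), (0, 11, 1.0), (1, 10, 5.0), (1, 12, 3.0)],
    ["user_id", "item_id", "rating"],
)

als = ALS(userCol="user_id", itemCol="item_id", ratingCol="rating",
          rank=8, maxIter=5, coldStartStrategy="drop")
model = als.fit(ratings)
model.recommendForAllUsers(3).show()
```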
systems. Strong expertise in ML/DL/LLM algorithms, model architectures, and training techniques. Proficiency in programming languages such as Python, SQL, Spark, PySpark, TensorFlow, or equivalent analytical/model-building tools. Familiarity with tools and technologies related to LLMs. Ability to work independently while also thriving in …
of both data pipelines and complex ontology development Shall have demonstrable extensive knowledge of distributed computing frameworks (e.g., Spark) and Foundry programming languages (Python, PySpark, SQL, TypeScript) Foundry Data Engineer certification required Minimum Clearance Required to Start: Top Secret SCI. This position is part of our Federal Solutions team.
queries for huge datasets. Has a solid understanding of blockchain ecosystem elements like DeFi, Exchanges, Wallets, Smart Contracts, mixers and privacy services. Databricks and PySpark • Analysing blockchain data • Building and maintaining data pipelines • Deploying machine learning models • Use of graph analytics and graph neural networks. If this sounds like …
Catalogue etc.). Proven experience designing high volume, live data streaming solutions using Azure DLT (Delta Live Tables). Expert with Apache Spark and PySpark (ability to review quality of code and debug issues). Experience with Qlik Replicate to move data from on-prem to the cloud. Background …
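For reference, a streaming pipeline using Delta Live Tables of the kind described above could be sketched as follows; the source path, table names and expectation rule are assumptions, and `spark` is supplied by the DLT runtime:

```python
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Bronze: raw events auto-loaded from cloud storage")
def bronze_events():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/raw/events"))  # hypothetical landing path

@dlt.table(comment="Silver: validated events")
@dlt.expect_or_drop("valid_id", "event_id IS NOT NULL")
def silver_events():
    return dlt.read_stream("bronze_events").select(col("event_id"), col("payload"))
```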
essential skills: Typical Data Engineering Experience required (3+ yrs): Strong knowledge and experience: Azure Data Factory and Synapse data solution provision • Power BI • Python • PySpark (Preference will be given to those who hold relevant certifications) Proficient in SQL. Knowledge of Terraform. Ability to develop and deliver complex visualisation, reporting …
in programming languages and data structures such as SAS, Python, R, SQL is key. With a Python background, particularly familiarity with pandas/polars/pyspark and pytest; understanding of OOP principles; git version control; knowledge of the following frameworks a plus: pydantic, pandera, sphinx. Additionally, experience in any or all …
Cardiff, South Glamorgan, Wales, United Kingdom Hybrid / WFH Options
Yolk Recruitment
Azure-based platforms (Synapse, Data Factory, Databricks) • Familiarity with data regulations (GDPR, FCA) and SMCR environments • Bonus points for experience with Python, R, or PySpark. Why You Should Apply: Executive-level visibility and the chance to lead a high-impact transformation • Full budget ownership with freedom to shape systems …
suit an experienced Director, Senior Manager or Lead of Data Engineering. Technical Expertise: A strong working knowledge of technologies including Python, SQL, PySpark and SAS. Education: Bachelor’s degree in business administration, IT, Data Science, or related field (master’s preferred). Further to this, any professional …
requirements. Preferred Skills and Experience: Databricks • Azure Data Factory • Data Lakehouse Medallion architecture • Microsoft Azure • T-SQL Development (MS SQL Server 2005 onwards) • Python, PySpark. Experience of the following systems would also be advantageous: Azure DevOps • MDS • Kimball Dimensional Modelling Methodology • Power BI • Unity Catalog • Microsoft Fabric. Experience of …
Experience in CI/CD and automation. Knowledge of Identity and Access Management (IAM), networking, and security. Affinity with programming languages such as Python, PySpark, SQL, and Bash. Experience with containerized solutions such as Docker and Kubernetes. You are currently living in the Netherlands and have a valid work …
on platforms such as AWS, GCP, and Azure. Extensive hands-on experience with cloud-based AI/ML solutions and programming languages (e.g., Python, PySpark), data modelling, and microservices. Proficient in LLM orchestration on platforms such as OpenAI on Azure, AWS Bedrock, GCP Vertex AI, or Gemini AI. Serve …
AI Platforms: Google Cloud Platform, Amazon Web Services, Microsoft Azure, Databricks. Experience in one or more of the listed languages or packages: Python, R, PySpark, Scala, Power BI, Tableau. WHAT YOU'LL LOVE ABOUT WORKING HERE? Data Science Consulting brings an inventive quantitative approach to our clients' biggest business and …
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Capgemini
AI Platforms: Google Cloud Platform, Amazon Web Services, Microsoft Azure, Databricks. Experience in one or more of the listed languages or packages: Python, R, PySpark, Scala, Power BI, Tableau. Proven experience in successfully delivering multiple complex data rich workstreams in parallel to supporting wider strategic ambitions and supporting others in …
Newbury, Berkshire, United Kingdom Hybrid / WFH Options
Intuita - Vacancies
effectiveness, including Azure DevOps. Considerable experience designing and building operationally efficient pipelines, utilising core Azure components such as Azure Data Factory, Azure Databricks and PySpark. Proven experience in modelling data through a medallion-based architecture, with curated dimensional models in the gold layer built for analytical use. Strong …
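As a sketch of the medallion pattern this role describes (curated dimensional models in the gold layer), the following PySpark example is illustrative only, with hypothetical silver and gold table names:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical cleansed silver table
silver = spark.read.table("silver.orders")

# Gold layer: a dimension and a fact table curated for analytical use
dim_customer = (silver
                .select("customer_id", "customer_name", "country")
                .dropDuplicates(["customer_id"]))
fact_orders = (silver
               .groupBy("customer_id", F.to_date("order_ts").alias("order_date"))
               .agg(F.sum("order_total").alias("daily_total")))

dim_customer.write.format("delta").mode("overwrite").saveAsTable("gold.dim_customer")
fact_orders.write.format("delta").mode("overwrite").saveAsTable("gold.fact_orders")
```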
and access controls. Monitor and optimize performance of data workflows using CloudWatch, AWS Step Functions, and performance tuning techniques. Automate data processes using Python, PySpark, SQL, or AWS SDKs. Collaborate with cross-functional teams to support AI/ML, analytics, and business intelligence initiatives. Maintain and enhance CI/… a cloud environment. Required Skills & Qualifications: 5+ years of experience in data engineering with a strong focus on AWS cloud technologies. Proficiency in Python, PySpark, SQL, and AWS Glue for ETL development. Hands-on experience with AWS data services, including Redshift, Athena, Glue, EMR, and Kinesis. Strong knowledge of …
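A minimal sketch of the Python/PySpark automation this posting describes, using AWS Glue's PySpark API; the catalog database, table and S3 bucket names are hypothetical:

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read from the Glue Data Catalog (hypothetical database/table)
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="analytics", table_name="raw_orders")

# Deduplicate and filter with plain PySpark, then land curated Parquet on S3
df = dyf.toDF().dropDuplicates(["order_id"]).filter("order_total > 0")
df.write.mode("overwrite").parquet("s3://example-bucket/curated/orders/")
```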
Future Talent Pool - GCP Data Engineer, London, hybrid role - digital Google Cloud transformation programme. Proficiency in programming languages such as Python, PySpark and Java to develop ETL processes for data ingestion & preparation • SparkSQL • Cloud Run, Dataflow, Cloud Storage • GCP BigQuery • Google Cloud Platform • Data Studio • Unix/Linux platform • Version control tools …
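For illustration, a PySpark ETL step against BigQuery might look like the sketch below; it assumes the spark-bigquery connector is on the classpath (as on Dataproc), and the project, dataset and bucket names are invented:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gcp-etl-sketch").getOrCreate()

# Ingest from a hypothetical BigQuery table
raw = spark.read.format("bigquery").option("table", "my-project.raw.events").load()

# Prepare: deduplicate and drop rows without a timestamp
prepared = raw.dropDuplicates(["event_id"]).where("event_ts IS NOT NULL")

# Write back to BigQuery via a temporary GCS bucket
(prepared.write.format("bigquery")
 .option("table", "my-project.curated.events")
 .option("temporaryGcsBucket", "example-temp-bucket")
 .mode("overwrite")
 .save())
```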
with proficiency in designing and implementing CI/CD pipelines in cloud environments. Excellent practical expertise in performance tuning and system optimisation. Experience with PySpark and Azure Databricks for distributed data processing and large-scale data analysis. Proven experience with web frameworks, including knowledge of Django and experience with …