Expertise in data warehousing, data modelling, and data integration. Experience in MLOps and machine learning pipelines. Proficiency in SQL and data manipulation languages. Experience with big data platforms (including Apache Arrow, Apache Spark, Apache Iceberg, and ClickHouse) and cloud-based infrastructure on AWS. Education & Qualifications: Bachelor's or Master's degree in Computer Science, Engineering, or a related …
extract data from diverse sources, transform it into usable formats, and load it into data warehouses, data lakes or lakehouses. Big Data Technologies: Utilize big data technologies such as Spark, Kafka, and Flink for distributed data processing and analytics. Cloud Platforms: Deploy and manage data solutions on cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP), leveraging … SQL for data manipulation and scripting. Strong understanding of data modelling concepts and techniques, including relational and dimensional modelling. Experience in big data technologies and frameworks such as Databricks, Spark, Kafka, and Flink. Experience in using modern data architectures, such as the lakehouse. Experience with CI/CD pipelines and version control systems like Git. Knowledge of ETL tools and … technologies such as Apache Airflow, Informatica, or Talend. Knowledge of data governance and best practices in data management. Familiarity with cloud platforms and services such as AWS, Azure, or GCP for deploying and managing data solutions. Strong problem-solving and analytical skills with the ability to diagnose and resolve complex data-related issues. SQL (for database management and querying …
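To make the extract-transform-load flow described in the posting above concrete, here is a minimal batch ETL sketch in PySpark; the bucket paths and column names are illustrative assumptions, not details from the posting.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: read raw CSV from a hypothetical landing zone.
raw = spark.read.option("header", True).csv("s3://example-bucket/landing/orders.csv")

# Transform: cast types, drop incomplete rows, derive a partition column.
clean = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .dropna(subset=["order_id", "amount"])
       .withColumn("order_date", F.to_date("created_at"))
)

# Load: write partitioned Parquet into the warehouse/lake layer.
clean.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/warehouse/orders/"
)
```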
in either Python or Scala. Working knowledge of two or more common Cloud ecosystems (AWS, Azure, GCP) with expertise in at least one. Deep experience with distributed computing with Apache Spark and knowledge of Spark runtime internals. Familiarity with CI/CD for production deployments. Working knowledge of MLOps. Design and deployment of performant end-to-end … Platform to unify and democratize data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of the lakehouse, Apache Spark, Delta Lake and MLflow. To learn more, follow Databricks on Twitter, LinkedIn and Facebook. Benefits: At Databricks, we strive to provide comprehensive benefits and perks that …
Azure, AWS, GCP) Hands-on experience with SQL, Data Pipelines, Data Orchestration and Integration Tools. Experience in data platforms on premises/cloud using technologies such as: Hadoop, Kafka, Apache Spark, Apache Flink, object, relational and NoSQL data stores. Hands-on experience with big data application development and cloud data warehousing (e.g. Hadoop, Spark, Redshift, Snowflake …
technologies: Azure, AWS, GCP, Snowflake, Databricks. Must Have: Hands-on experience on at least 2 hyperscalers (GCP/AWS/Azure platforms), specifically in Big Data processing services (Apache Spark, Beam or equivalent). In-depth knowledge of key technologies like BigQuery/Redshift/Synapse, Pub/Sub/Kinesis/MQ/Event Hubs … skills. A minimum of 5 years' experience in a similar role. Ability to lead and mentor the architects. Mandatory Skills [at least 2 hyperscalers]: GCP, AWS, Azure; Big Data; Apache Spark/Beam; BigQuery/Redshift/Synapse; Pub/Sub/Kinesis/MQ/Event Hubs; Kafka; Dataflow/Airflow/ADF. Designing Databricks-based solutions for …
/or teaching technical concepts to non-technical and technical audiences alike. Passion for collaboration, life-long learning, and driving business value through ML. [Preferred] Experience working with Databricks & Apache Spark to process large-scale distributed datasets. About Databricks: Databricks is the data and AI company. More than 10,000 organizations worldwide including Comcast, Condé Nast, Grammarly, and … Platform to unify and democratize data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of the lakehouse, Apache Spark, Delta Lake and MLflow. To learn more, follow Databricks on Twitter, LinkedIn and Facebook. Benefits: At Databricks, we strive to provide comprehensive benefits and perks that …
/or teaching technical concepts to non-technical and technical audiences alike. Passion for collaboration, life-long learning, and driving business value through ML. Preferred: Experience working with Databricks & Apache Spark to process large-scale distributed datasets. As a client-facing role, travel may be necessary to support meetings and engagements. About Databricks: Databricks is the data and … Platform to unify and democratize data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of the lakehouse, Apache Spark, Delta Lake and MLflow. To learn more, follow Databricks on Twitter, LinkedIn and Facebook. Benefits: At Databricks, we strive to provide comprehensive benefits and perks that …
company covering the entire data transformation from architecture to implementation. Beyond delivering solutions, we also provide data & AI training and enablement. We are backed by Databricks - the creators of Apache Spark - and act as a delivery partner and training provider for them in Europe. Additionally, we are Microsoft Gold Partners in delivering cloud migration and data architecture on …
Synechron is looking for a skilled Machine Learning Developer with expertise in Spark ML to work with a leading financial organisation on a global programme of work. The role involves predictive modeling and deploying training and inference pipelines on distributed systems such as Hadoop. The ideal candidate will design, implement, and optimise machine learning solutions for large-scale data … processing and predictive analytics. Role: Develop and implement machine learning models using Spark ML for predictive analytics. Design and optimise training and inference pipelines for distributed systems (e.g., Hadoop). Process and analyse large-scale datasets to extract meaningful insights and features. Collaborate with data engineers to ensure seamless integration of ML workflows with data pipelines. Evaluate model performance and … time and batch inference. Monitor and troubleshoot deployed models to ensure reliability and performance. Stay updated with advancements in machine learning frameworks and distributed computing technologies. Experience: Proficiency in Apache Spark and Spark MLlib for machine learning tasks. Strong understanding of predictive modeling techniques (e.g., regression, classification, clustering). Experience with distributed systems like Hadoop for data storage …
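By way of illustration of the Spark ML responsibilities listed above - training a predictive model and persisting it for batch inference - here is a minimal PySpark sketch; the input path, feature columns f1-f3, and label column are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("spark-ml-sketch").getOrCreate()

# Hypothetical training data on HDFS with columns f1, f2, f3 and a binary label.
df = spark.read.parquet("hdfs:///data/training/features.parquet")

pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label"),
])

train, test = df.randomSplit([0.8, 0.2], seed=42)
model = pipeline.fit(train)

# Evaluate on the held-out split (area under ROC by default).
auc = BinaryClassificationEvaluator(labelCol="label").evaluate(model.transform(test))
print(f"AUC: {auc:.3f}")

# Persist the fitted pipeline so a separate inference job can reuse it.
model.write().overwrite().save("hdfs:///models/lr_pipeline")
```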
Skills: Proven expertise in designing, building, and operating data pipelines, warehouses, and scalable data architectures. Deep hands-on experience with modern data stacks. Our tech includes Python, SQL, Snowflake, Apache Iceberg, AWS S3, PostgreSQL, Airflow, dbt, and Apache Spark, deployed via AWS, Docker, and Terraform. Experience with similar technologies is essential. Coaching & Growth Mindset: Passion for developing …
platform components. Big Data Architecture: Build and maintain big data architectures and data pipelines to efficiently process large volumes of geospatial and sensor data. Leverage technologies such as Hadoop, Apache Spark, and Kafka to ensure scalability, fault tolerance, and speed. Geospatial Data Integration: Develop systems that integrate geospatial data from a variety of sources (e.g., satellite imagery, remote … driven applications. Familiarity with geospatial data formats (e.g., GeoJSON, Shapefiles, KML) and tools (e.g., PostGIS, GDAL, GeoServer). Technical Skills: Expertise in big data frameworks and technologies (e.g., Hadoop, Spark, Kafka, Flink) for processing large datasets. Proficiency in programming languages such as Python, Java, or Scala, with a focus on big data frameworks and APIs. Experience with cloud services … or related field. Experience with data visualization tools and libraries (e.g., Tableau, D3.js, Mapbox, Leaflet) for displaying geospatial insights and analytics. Familiarity with real-time stream processing frameworks (e.g., Apache Flink, Kafka Streams). Experience with geospatial data processing libraries (e.g., GDAL, Shapely, Fiona). Background in defense, national security, or environmental monitoring applications is a plus. Compensation and …
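As a small illustration of the geospatial tooling this posting names (Fiona and Shapely), the sketch below filters features from a hypothetical sensors.geojson file against an illustrative bounding box; both names are assumptions.

```python
import fiona
from shapely.geometry import shape, box

# Illustrative area of interest: lon/lat bounds roughly around London.
area_of_interest = box(-0.5, 51.3, 0.3, 51.7)

# Read vector features and keep those falling inside the area of interest.
with fiona.open("sensors.geojson") as src:
    inside = [
        feat["properties"]
        for feat in src
        if shape(feat["geometry"]).within(area_of_interest)
    ]

print(f"{len(inside)} features fall inside the area of interest")
```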
two of the following: Python, SQL, Java. Commercial experience in client-facing projects is a plus, especially within multi-disciplinary teams. Deep knowledge of database technologies: distributed systems (e.g., Spark, Hadoop, EMR); RDBMS (e.g., SQL Server, Oracle, PostgreSQL, MySQL); NoSQL (e.g., MongoDB, Cassandra, DynamoDB, Neo4j). Solid understanding of software engineering best practices - code reviews, testing frameworks, CI/CD …
and managing machine learning models and infrastructure. Data Management Knowledge: Understanding of data management principles, including experience with databases (SQL and NoSQL) and familiarity with big data frameworks like Apache Spark or Hadoop. Knowledge of data ingestion, storage, and management is essential. Monitoring and Logging Tools: Experience with monitoring and logging tools to track system performance and model …
Cleared: Required. Essential Skills & Experience: 10+ years of experience in data engineering, with at least 3+ years of hands-on experience with Azure Databricks. Strong proficiency in Python and Spark (PySpark) or Scala. Deep understanding of data warehousing principles, data modelling techniques, and data integration patterns. Extensive experience with Azure data services, including Azure Data Factory, Azure Blob Storage …
SageMaker, GCP AI Platform, Azure ML, or equivalent). Solid understanding of data-engineering concepts: SQL/NoSQL, data pipelines (Airflow, Prefect, or similar), and batch/streaming frameworks (Spark, Kafka). Leadership & Communication: Proven ability to lead cross-functional teams in ambiguous startup settings. Exceptional written and verbal communication skills; able to explain complex concepts to both technical and …
of Relational Databases and Data Warehousing concepts. Experience of Enterprise ETL tools such as Informatica, Talend, DataStage or Alteryx. Project experience using any of the following technologies: Hadoop, Spark, Scala, Oracle, Pega, Salesforce. Cross- and multi-platform experience. Team building and leading. You must be: Willing to work on client sites, potentially for extended periods. Willing to travel …
Maths or similar Science or Engineering discipline. Strong Python and other programming skills (Java and/or Scala desirable). Strong SQL background. Some exposure to big data technologies (Hadoop, Spark, Presto, etc.) NICE TO HAVES OR EXCITED TO LEARN: Some experience designing, building and maintaining SQL databases (and/or NoSQL). Some experience with designing efficient physical data models …
analytics, content management systems (CMS), subscription platforms, ad tech, and social media. Proven ability to automate and optimise data workflows, using modern ETL/ELT tools (e.g., Airflow, dbt, Apache Spark) to ensure timely and reliable delivery of data. Experience building robust data models and reporting layers to support performance dashboards, user engagement analytics, ad revenue tracking, and … with 5+ years of hands-on experience in a data engineering role. Tools & Technologies: Databases: Proficient in relational SQL databases. Workflow Management Tools: Experience with orchestration platforms such as Apache Airflow. Programming Languages: Skilled in one or more of the following languages, i.e.: Python, Java, Scala. Cloud Infrastructure: Strong understanding of cloud infrastructure such as GCP and tools within …
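To illustrate the orchestration skills this posting asks for, here is a minimal Apache Airflow DAG; the task bodies are hypothetical placeholders, and the schedule= argument assumes Airflow 2.4+ (older releases use schedule_interval=).

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder callables; a real pipeline would trigger dbt runs, Spark jobs, etc.
def extract():
    print("pull raw events from CMS / ad-tech sources")

def transform():
    print("build the reporting-layer models")

with DAG(
    dag_id="daily_reporting_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # Airflow 2.4+; use schedule_interval on older versions
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t1 >> t2  # run transform only after extract succeeds
```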
web analytics, content management systems (CMS), subscription platforms, ad tech, and social media. Ability to automate and optimise data workflows, using modern ETL/ELT tools (e.g., Airflow, dbt, Apache Spark) to ensure timely and reliable delivery of data. Experience building robust data models and reporting layers to support performance dashboards, user engagement analytics, ad revenue tracking, and … with 2+ years of hands-on experience in a data engineering role. Tools & Technologies: Databases: Proficient in relational SQL databases. Workflow Management Tools: Exposure to orchestration platforms such as Apache Airflow. Programming Languages: Skilled in one or more of the following languages, i.e.: Python, Java, Scala. Cloud Infrastructure: Understanding of cloud infrastructure such as GCP and tools within the …
Programming Mastery: Advanced skills in Python or another major language; writing clean, testable, production-grade ETL code at scale. Modern Data Pipelines: Experience with batch and streaming frameworks (e.g., Apache Spark, Flink, Kafka Streams, Beam), including orchestration via Airflow, Prefect or Dagster. Data Modeling & Schema Management: Demonstrated expertise in designing, evolving, and documenting schemas (OLAP/OLTP, dimensional …
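For the streaming side of the frameworks listed above, a minimal Spark Structured Streaming sketch reading from Kafka; the broker address and topic name are assumptions, and the job needs the spark-sql-kafka connector package on its classpath.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("stream-sketch").getOrCreate()

# Subscribe to a hypothetical Kafka topic of raw events.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Kafka values arrive as bytes; decode, then count events per 1-minute window.
counts = (
    events.select(F.col("value").cast("string").alias("payload"), "timestamp")
    .withWatermark("timestamp", "5 minutes")
    .groupBy(F.window("timestamp", "1 minute"))
    .count()
)

# Stream windowed counts to the console sink for demonstration purposes.
query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```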
architecture principles, including data modeling, data warehousing, data integration, and data governance. Databricks Expertise: They have hands-on experience with the Databricks platform, including its various components such as Spark, Delta Lake, MLflow, and Databricks SQL. They are proficient in using Databricks for various data engineering and data science tasks. Cloud Platform Proficiency: They are familiar with cloud platforms …
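A minimal sketch of the Delta Lake component mentioned above, using the open-source delta-spark package; the table path and sample data are illustrative, and the session configs follow the standard Delta Lake setup.

```python
from pyspark.sql import SparkSession

# Standard session configs for open-source Delta Lake (requires delta-spark).
spark = (
    SparkSession.builder.appName("delta-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

# Delta layers ACID transactions and versioning on top of Parquet files.
df.write.format("delta").mode("overwrite").save("/tmp/delta/users")

# Read the current table, then an earlier version via time travel.
spark.read.format("delta").load("/tmp/delta/users").show()
spark.read.format("delta").option("versionAsOf", 0).load("/tmp/delta/users").show()
```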
Scala) Extensive experience with cloud platforms (AWS, GCP, or Azure). Experience with: data warehousing and lake architectures; ETL/ELT pipeline development; SQL and NoSQL databases; distributed computing frameworks (Spark, Kinesis, etc.). Software development best practices including CI/CD, TDD and version control. Strong understanding of data modelling and system architecture. Excellent problem-solving and analytical skills. Whilst …
using RDBMS, NoSQL and Big Data technologies. Data visualization: tools like Tableau. Big data: Hadoop ecosystem, distributions like Cloudera/Hortonworks, Pig and Hive. Data processing frameworks: Spark & Spark Streaming. Hands-on experience with multiple databases like PostgreSQL, Snowflake, Oracle, MS SQL Server, NoSQL (HBase/Cassandra, MongoDB). Experience in cloud data ecosystem - AWS, Azure …