pipeline.
Requirements
Minimum Qualifications:
- 7+ years of professional experience in Python development and data processing.
- Deep expertise with Python data processing libraries such as PySpark, Pandas, and NumPy.
- Strong experience with API development using FastAPI or similar frameworks.
- Proficiency in test-driven development using PyTest and mocking libraries (a minimal sketch follows this list).
- Advanced …
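As a rough illustration of the FastAPI and PyTest skills listed above, a minimal sketch might look like the following. The endpoint, names, and test are assumptions for illustration, not details from the posting.

```python
# Hypothetical sketch: a minimal FastAPI endpoint exercised by a PyTest test.
from fastapi import FastAPI
from fastapi.testclient import TestClient

app = FastAPI()

@app.get("/health")
def health() -> dict:
    # Simple liveness endpoint; a real service would also check downstream dependencies.
    return {"status": "ok"}

client = TestClient(app)

def test_health_returns_ok():
    # PyTest discovers and runs functions prefixed with `test_`.
    response = client.get("/health")
    assert response.status_code == 200
    assert response.json() == {"status": "ok"}
```

In a test-driven workflow the test would typically be written first, with external dependencies replaced via `unittest.mock` or `pytest-mock`.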
will be at the forefront of processing ETL on new and varied data sets and working with Big Data at scale, using Spark/PySpark to handle petabytes of data.
Key Responsibilities:
• Customer Liaison: Develop a deep understanding of the customer's current and upcoming mission needs, generating requirements …
this position. Qualifications: To be considered for this role, you must have: Active Secret clearance or higher (required). Strong proficiency in Python, SQL, PySpark, and Java. At least 5 years of hands-on experience in pipeline engineering or data engineering roles. Demonstrated ability to perform root cause analysis …
Chantilly (both locations are in close proximity to exits of Rt 28).
Required Skills:
- Extract, Transform and Load (ETL) tools and processes
- Python, PySpark, PyTorch
- AWS
- SQL
- APIs
- Linux
- Geospatial tools/data
Desired Skills:
- Agile experience delivering on agile teams (participates in scrum and PI Planning)
- Docker …
National Geospatial-Intelligence Agency (NGA) to process ETL on new and varied data sets, and work with Big Data at scale, using Spark/PySpark to handle petabytes of data. As an immediate liaison with the customer, you will have a deep understanding of their current and upcoming mission …
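For context, a PySpark ETL job of the kind described above is typically structured as read, transform, and partitioned write. The sketch below is illustrative only; the paths, columns, and storage layout are assumptions, not details from the posting.

```python
# Illustrative PySpark ETL sketch (source/target paths and columns are assumed).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-example").getOrCreate()

# Extract: read raw JSON events from object storage.
raw = spark.read.json("s3://example-bucket/raw/events/")

# Transform: drop rows without a timestamp and derive a partition column.
cleaned = (
    raw.filter(F.col("event_time").isNotNull())
       .withColumn("event_date", F.to_date("event_time"))
)

# Load: write partitioned Parquet so downstream jobs can prune by date.
cleaned.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/events/"
)
```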
required. A master's degree in a related field and 10+ years' relevant experience. Data Architecture experience. Well-versed in Python and Spark/PySpark. Experience building and managing data pipelines and using data analytics tools such as Databricks, Qlik, and Power BI. Experience working via the terminal/command …
does not apply) At least 1 year of experience in big data technologies
Preferred Qualifications:
- 5+ years of experience in application development including Python, PySpark, or Scala
- 2+ years of experience with AWS
- 3+ years of experience with distributed data/computing tools (MapReduce, Hadoop, Hive, EMR, Kafka, Spark, Gurobi …
experience in coding languages, e.g. Python, C++, etc. (Python preferred). Proficiency in database technologies, e.g. SQL, NoSQL, and Big Data technologies, e.g. PySpark, Hive, etc. Experience working with structured and unstructured data, e.g. text, PDFs, JPGs, call recordings, video, etc. Knowledge of machine learning modelling techniques and …
Skills:
- Experience integrating Large Language Models and developing conversational AI solutions.
- Familiarity with deep learning libraries (Keras, PyTorch, TensorFlow).
- Experience with data processing tools (PySpark, Hive) and databases (MongoDB, SQL, NoSQL).
- NLP expertise (NLTK, spaCy), especially with unstructured financial data.
- Understanding of financial applications and terminology.
Education: BS or …
using Azure services including Azure Databricks, Azure Data Factory, Delta Lake, Azure Data Lake Storage (ADLS), and Power BI. Solid hands-on experience with Azure Databricks, including PySpark and Spark SQL coding (must have). Very good knowledge of data warehousing skills including dimensional modeling, slowly changing dimension patterns, and time travel. …
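To ground the Delta Lake patterns mentioned above (merging changes into a dimension table and querying earlier versions via time travel), here is a hedged sketch. The table name, key column, and version number are hypothetical; a full slowly changing dimension (Type 2) implementation would additionally manage effective dates and current-row flags.

```python
# Hypothetical Databricks/Delta sketch: upsert into a dimension table, then time travel.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # preconfigured on Databricks

updates = spark.read.parquet("/mnt/staging/customer_updates")  # assumed staging path
target = DeltaTable.forName(spark, "dim_customer")             # assumed table name

# Upsert: update rows with matching keys, insert the rest (a building block for SCD logic).
(target.alias("t")
       .merge(updates.alias("s"), "t.customer_id = s.customer_id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())

# Time travel: read the table as of an earlier version for audits or backfills.
previous = spark.sql("SELECT * FROM dim_customer VERSION AS OF 5")
```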
for seamless data workflows. Collaborate with cross-functional teams to ensure data integrity, security, and accessibility. Key Skills & Experience: Strong programming skills in Python (PySpark). Hands-on experience with Azure Data Services (Azure Data Factory, Databricks, Synapse, Data Lakes, etc.). Experience with CI/CD pipelines for …
London, England, United Kingdom Hybrid / WFH Options
Insight Global
transformation logic, and build of all Power BI dashboards, including testing, optimization & integration to data sources. Good exposure to all elements of data engineering (PySpark, Lakehouses, Kafka, etc.). Experience building reports from streaming data. Strong understanding of CI/CD pipelines. Financial and/or trading exposure, particularly energy …
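Building reports from streaming data, as described above, usually means landing a Kafka topic into a lakehouse table that the BI layer then reads. The sketch below is a rough, assumed setup; the broker, topic, paths, and table names are invented, and the Kafka source also requires the spark-sql-kafka connector on the cluster.

```python
# Hypothetical Structured Streaming sketch: Kafka topic -> Delta table for BI reporting.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("streaming-example").getOrCreate()

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker address
         .option("subscribe", "trades")                     # assumed topic
         .load()
)

# Kafka values arrive as bytes; cast to string before parsing downstream.
parsed = events.select(F.col("value").cast("string").alias("payload"))

query = (
    parsed.writeStream.format("delta")
          .option("checkpointLocation", "/mnt/checkpoints/trades")  # assumed path
          .toTable("bronze_trades")                                 # assumed table
)
```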
experience or PhD with 5-7 years of relevant experience. Experience in research.
Data Engineering Experience (DE Path):
- ETL
- Big Data Experience and Tooling (PySpark, Databricks)
- Python Testing Frameworks
- Data Validation and Data Quality Frameworks
- Data Handling (SQL & NoSQL)
- Feature Engineering
- Chunking
- Document Ingestion
- Graph Data Structures (Neo4j)
- CI …
Ensure compliance with GDPR and other data regulations when handling sensitive information. Support the stability and performance of enterprise data platforms. Requirements: Proficient with PySpark, Delta Lake, Unity Catalog and Python (including unit and integration testing). Deep understanding of software development principles (SOLID, testing, CI/CD, version …
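The pairing of PySpark with unit and integration testing mentioned above is commonly handled by keeping transformations as plain functions and exercising them against a local SparkSession. This is a hypothetical sketch; the function, columns, and VAT rate are invented for illustration.

```python
# Hypothetical PySpark unit test: a pure transformation checked with PyTest.
import pytest
from pyspark.sql import SparkSession, functions as F

def add_vat(df, rate: float = 0.2):
    # Transformation under test: derive a gross amount from a net amount.
    return df.withColumn("gross", F.col("net") * (1 + rate))

@pytest.fixture(scope="session")
def spark():
    # Small local session; fast enough for unit tests, no cluster needed.
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()

def test_add_vat(spark):
    df = spark.createDataFrame([(100.0,)], ["net"])
    result = add_vat(df).collect()[0]
    assert result["gross"] == pytest.approx(120.0)
```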
preferably GCP. Expertise in event-driven data integrations and click-stream ingestion. Proven ability in stakeholder management and project leadership. Proficiency in SQL, Python, and PySpark. Solid background in data pipeline orchestration, data access, and retention tooling. Demonstrable impact on infrastructure scalability and data privacy initiatives. A collaborative spirit, innovative …
to optimize relevant workflow components for advanced artificial intelligence applications. The Contractor shall lead work to optimize cloud-based computing technologies, such as leveraging PySpark, distributed computation, and model training/inference, and integrate solutions into relevant delivery mechanisms or partner systems. The Contractor shall build tools and scripts …
of GenAI models. Familiarity with prompt engineering and model optimization techniques. Contributions to open-source projects in the MLOps or GenAI space. Familiarity with PySpark for distributed data processing. We are dedicated to building a diverse, inclusive, and authentic workplace, so if you're excited about this role but …
frameworks and data governance practices, with an emphasis on scalability and compliance in research environments. Enterprise exposure to data engineering tools and products (Spark, PySpark, BigQuery, Pub/Sub) with an understanding of product/market fit for internal stakeholders. Familiarity with cloud computing environments, including but not limited …