governance. Convert between tactical data formats, e.g., XML, JSON, GeoJSON, Cursor-on-Target (CoT), JREAP-C, Link-16. Implement automation using cron jobs and data orchestration frameworks (Apache Airflow, Kafka, AWS Step Functions). Architect and maintain scalable data stores (e.g., Apache Iceberg, Elasticsearch) with schema evolution and deduplication. Enable real-time data indexing and … Data Links. Experience in natural language processing (NLP) and regular expressions (RegEx) Experience implementing cron jobs Experience in data pipeline orchestration, ETL, and orchestration tools (e.g., Kafka, Apache Airflow, AWS Step Functions) Focused experience implementing and monitoring data stores (e.g., Apache Iceberg, Elasticsearch, data lake management) Experience with schema evolution, dataflow and governance Experience More ❯
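Purely as an illustration of the format-conversion work this listing describes, here is a minimal sketch that turns a simplified Cursor-on-Target XML event into a GeoJSON Feature. The element and attribute names follow the commonly published CoT point/event shape but are assumptions here, not taken from any programme's schema.

```python
# Minimal sketch: converting a (simplified) Cursor-on-Target XML event into a
# GeoJSON Feature using only the Python standard library.
import json
import xml.etree.ElementTree as ET

cot_xml = """<event uid="EX-001" type="a-f-G" time="2024-01-01T00:00:00Z">
  <point lat="51.5074" lon="-0.1278" hae="35.0" ce="10.0" le="10.0"/>
</event>"""

def cot_to_geojson(xml_text: str) -> dict:
    event = ET.fromstring(xml_text)
    point = event.find("point")
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            # GeoJSON ordering is [longitude, latitude, elevation]
            "coordinates": [float(point.get("lon")),
                            float(point.get("lat")),
                            float(point.get("hae"))],
        },
        "properties": {
            "uid": event.get("uid"),
            "cot_type": event.get("type"),
            "time": event.get("time"),
        },
    }

print(json.dumps(cot_to_geojson(cot_xml), indent=2))
```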
City of London, London, United Kingdom Hybrid / WFH Options
Tenth Revolution Group
Be Doing You'll be a key contributor to the development of a next-generation data platform, with responsibilities including: Designing and implementing scalable data pipelines using Python and Apache Spark Building and orchestrating workflows using AWS services such as Glue, Lambda, S3, and EMR Serverless Applying best practices in software engineering: CI/CD, version control, automated testing … and modular design Supporting the development of a lakehouse architecture using Apache Iceberg Collaborating with product and business teams to deliver data-driven solutions Embedding observability and quality checks into data workflows Participating in code reviews, pair programming, and architectural discussions Gaining domain knowledge in financial data and sharing insights with the team What They're Looking For … for experience with type hints, linters, and testing frameworks like pytest) Solid understanding of data engineering fundamentals: ETL/ELT, schema evolution, batch processing Experience or strong interest in Apache Spark for distributed data processing Familiarity with AWS data tools (e.g., S3, Glue, Lambda, EMR) Strong communication skills and a collaborative mindset Comfortable working in Agile environments and engaging More ❯
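As a sketch of the kind of pipeline step such roles describe — batch-writing a Spark DataFrame into an Apache Iceberg table — the snippet below uses Spark's DataFrameWriterV2 API. The catalog, namespace, and table names are placeholders, and it assumes a Spark session already configured with an Iceberg catalog and the Iceberg runtime on the classpath.

```python
# Minimal sketch: writing a DataFrame to an Iceberg table with PySpark.
# Assumes a catalog named "lake" is configured as an Iceberg catalog elsewhere
# (e.g. via spark-defaults or the SparkSession builder); names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("iceberg-write-sketch").getOrCreate()

orders = spark.createDataFrame(
    [(1, "2024-01-01", 120.50), (2, "2024-01-02", 87.25)],
    ["order_id", "order_date", "amount"],
)

# DataFrameWriterV2 (Spark 3+): create or replace a partitioned Iceberg table.
(orders.writeTo("lake.sales.orders")
       .using("iceberg")
       .partitionedBy(col("order_date"))
       .createOrReplace())
```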
Arlington, Virginia, United States Hybrid / WFH Options
STR
the following software/tools: Big Data tools: e.g. Hadoop, Spark, Kafka, ElasticSearch AWS: Athena, RDB, AWS credentials from Cloud Practitioner to Solutions Architect Data Lakes: e.g. Delta Lake, Apache Hudi, Apache Iceberg Distributed SQL interfaces: e.g. Apache Hive, Presto/Trino, Spark Data pipeline and workflow management tools: e.g. Luigi, Airflow Dashboard frontends: e.g. Grafana More ❯
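To make the workflow-management requirement concrete, a minimal Apache Airflow DAG might look like the sketch below. The DAG id, task names, and schedule are arbitrary placeholders, and the task logic is stubbed out.

```python
# Minimal sketch of an Airflow DAG: one daily extract task feeding a load task.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull data from the source system")

def load():
    print("write data to the lake/warehouse")

with DAG(
    dag_id="example_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # cron expressions such as "0 6 * * *" also work
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task
```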
APIs, access control, and auditing Experience with DevOps pipelines Experience using the following software/tools: Big Data tools: e.g. Hadoop, Spark, Kafka, ElasticSearch Data Lakes: e.g. Delta Lake, Apache Hudi, Apache Iceberg Distributed Data Warehouse Frontends: e.g. Apache Hive, Presto Data pipeline and workflow management tools: e.g. Luigi, Airflow Dashboard frontends: e.g. Grafana, Kibana Stream More ❯
able to work across the full data cycle. - Proven experience working with AWS data technologies (S3, Redshift, Glue, Lambda, Lake Formation, CloudFormation), GitHub, CI/CD - Coding experience in Apache Spark, Iceberg or Python (Pandas) - Experience in change and release management. - Experience in data warehouse design and data modelling - Experience managing data migration projects. - Cloud data platform development … the AWS services like Redshift, Lambda, S3, Step Functions, Batch, CloudFormation, Lake Formation, CodeBuild, CI/CD, GitHub, IAM, SQS, SNS, Aurora DB - Good experience with dbt, Apache Iceberg, Docker, Microsoft BI stack (nice to have) - Experience in data warehouse design (Kimball and lakehouse, medallion and data vault) is a definite preference as is knowledge More ❯
in Kubernetes, your work will empower analysts, data scientists, and leadership with the insights they need, when they need them. If you're fluent in tools like Spark, Trino, Iceberg, and Python, and you thrive in high-security environments, this could be your next mission-critical opportunity. In This Role, You'll: • Design, build, and maintain secure, scalable data … pipelines and services • Ingest, transform, and model structured and unstructured data for analytics and ML • Work with technologies like Apache Spark, Apache Iceberg, Trino, NiFi, OpenSearch, and AWS EMR • Ensure data integrity, lineage, and security across the entire lifecycle • Collaborate with DevOps to deploy containerized data solutions using Kubernetes • Support Agile delivery, version control, and data governance More ❯
Chantilly, Virginia, United States Hybrid / WFH Options
The DarkStar Group
rather huge and includes Python (Pandas, numpy, scipy, scikit-learn, standard libraries, etc.), Python packages that wrap Machine Learning (packages for NLP, Object Detection, etc.), Linux, AWS/C2S, Apache NiFi, Spark, PySpark, Hadoop, Kafka, ElasticSearch, Solr, Kibana, Neo4j, MariaDB, Postgres, Docker, Puppet, and many others. Work on this program takes place in Chantilly, VA, McLean, VA and in … standards. Develop and deliver documentation for each project including ETL mappings, code use guide, code location and access instructions. Design and optimize Data Pipelines using tools such as Spark, Apache Iceberg, Trino, OpenSearch, EMR cloud services, NiFi and Kubernetes containers Ensure the pedigree and provenance of the data is maintained such that the access to data is protected More ❯
MySQL Exposure to Docker, Kubernetes, AWS, Helm, Terraform, Vault, Grafana, ELK Stack, New Relic Relevant experience in the maintenance of data APIs and data lake architectures, including experience with Apache Iceberg, Trino/Presto, ClickHouse, Snowflake, BigQuery. Master's degree in Computer Science or Engineering-related field Equal Opportunity Employer As an Equal Opportunity Employer, qualified applicants will More ❯
Falls Church, Virginia, United States Hybrid / WFH Options
Rackner
years of software engineering experience (backend, API, or full-stack) - Python, Java, or C# expertise - Experience with REST APIs (FastAPI, AWS Lambda), OpenAPI, and data pipelines (dbt, Airflow, Spark, Iceberg) - Knowledge of FHIR, OMOP, HL7, CDA, and federal compliance frameworks Bonus Experience: - DHA, VA, or federal healthcare IT programs - OCR/NLP/AI-ML workflows - AWS GovCloud (IL5 More ❯
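As an illustration of the REST-API requirement above, a minimal FastAPI sketch is shown below. The endpoint path, record shape, and in-memory store are invented for illustration and are not drawn from any of these programmes.

```python
# Minimal FastAPI sketch: a read-only endpoint returning a validated record.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="example-data-api")

class Patient(BaseModel):
    patient_id: str
    status: str

# Placeholder in-memory store standing in for a real backend.
_FAKE_STORE = {"p-001": Patient(patient_id="p-001", status="active")}

@app.get("/patients/{patient_id}", response_model=Patient)
def get_patient(patient_id: str) -> Patient:
    record = _FAKE_STORE.get(patient_id)
    if record is None:
        raise HTTPException(status_code=404, detail="patient not found")
    return record
```

Served locally with something like `uvicorn module_name:app`, this would also expose an auto-generated OpenAPI schema at /docs.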
Terraform and Kubernetes is a plus! A genuine excitement for significantly scaling large data systems Technologies we use (experience not required): AWS serverless architectures Kubernetes Spark Flink Databricks Parquet, Iceberg, Delta Lake, Paimon Terraform GitHub including GitHub Actions Java PostgreSQL About Chainalysis Blockchain technology is powering a growing wave of innovation. Businesses and governments around the world are using More ❯
standards. Develop and deliver documentation for each project including ETL mappings, code use guide, code location and access instructions. Design and optimize Data Pipelines using tools such as Spark, Apache Iceberg, Trino, OpenSearch, EMR cloud services, NiFi and Kubernetes containers Ensure the pedigree and provenance of the data is maintained such that the access to data is protected More ❯
dbt, Talend, or FiveTran. Strong understanding of data governance and regulatory compliance frameworks such as PCI-DSS, GDPR, SOX, and CPRA. Bonus: Experience with Lakehouse technologies such as Delta Lake, Apache Iceberg, or Hudi. Daily Duties: Design, develop, and maintain scalable data pipelines that support enterprise reporting, regulatory compliance, and analytics. Implement data validation frameworks to ensure high accuracy More ❯
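As a sketch of the data-validation idea mentioned in this listing, a lightweight batch check in plain pandas might look like the following. The column names, rules, and thresholds are hypothetical; a dedicated validation framework would typically express them declaratively.

```python
# Minimal sketch of a data-validation step: reject a batch that violates simple
# rules before it is loaded downstream. Column names and rules are invented.
import pandas as pd

def validate_transactions(df: pd.DataFrame) -> list[str]:
    errors = []
    if df["transaction_id"].duplicated().any():
        errors.append("duplicate transaction_id values found")
    if df["amount"].lt(0).any():
        errors.append("negative amounts found")
    if df["posted_at"].isna().any():
        errors.append("missing posted_at timestamps")
    return errors

batch = pd.DataFrame(
    {"transaction_id": [1, 2, 2],
     "amount": [10.0, -5.0, 7.5],
     "posted_at": pd.to_datetime(["2024-01-01", None, "2024-01-02"])}
)
problems = validate_transactions(batch)
if problems:
    raise ValueError("validation failed: " + "; ".join(problems))
```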
such as Snowflake. Strong expertise in SQL, including development, optimization, and performance tuning for large-scale data environments. Working knowledge of Python would be a plus. Working knowledge of Apache Iceberg is an asset. Experience with Palantir Foundry is advantageous, and knowledge of Ontology concepts is a plus. At AIG, we value in-person collaboration as a vital part More ❯
innovation through advanced analytics and research-based problem solving. To be successful you should have: 10 years hands-on experience in AWS data engineering technologies, including Glue, PySpark, Athena, Iceberg, Databricks, Lake Formation, and other standard data engineering tools. Previous experience in implementing best practices for data engineering, including data governance, data quality, and data security. Proficiency in data More ❯
standards. Develop and deliver documentation for each project, including ETL mappings, code use guides, code location and access instructions. • Design and optimize Data Pipelines using tools such as Spark, Apache Iceberg, Trino, OpenSearch, EMR cloud services, NiFi and Kubernetes containers • Ensure the pedigree and provenance of the data is maintained such that the access to data is protected More ❯
processing. • Implement offline pipelines using Dagster (or Airflow) for batch processing. • Parse and process binary message formats from multiple sensor and partner data sources. • Build data warehouses using Postgres, Apache Iceberg, Parquet, and S3. • Design and maintain data models optimized for high-performance queries. • Validate, normalize, and ensure correctness and trustworthiness of all data sources. Required Experience: • Active More ❯
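For the binary-message parsing mentioned here, a minimal sketch using Python's struct module is shown below. The fixed-width record layout (a little-endian id, epoch timestamp, latitude, longitude) is entirely hypothetical; real sensor and partner formats would differ.

```python
# Minimal sketch: unpacking fixed-width binary records with the struct module.
import struct

RECORD_FORMAT = "<IQdd"  # uint32 id, uint64 epoch_ms, float64 lat, float64 lon
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)

def parse_records(payload: bytes):
    # Walk the buffer one fixed-width record at a time.
    for offset in range(0, len(payload) - RECORD_SIZE + 1, RECORD_SIZE):
        rec_id, epoch_ms, lat, lon = struct.unpack_from(RECORD_FORMAT, payload, offset)
        yield {"id": rec_id, "epoch_ms": epoch_ms, "lat": lat, "lon": lon}

# Round-trip example with a single packed record.
sample = struct.pack(RECORD_FORMAT, 42, 1_700_000_000_000, 38.88, -77.10)
print(list(parse_records(sample)))
```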
and coding standards. Develop and deliver documentation for each project, including ETL mappings, user guides, and access instructions. Design and optimize scalable data pipelines using technologies such as Spark, Apache Iceberg, Trino, OpenSearch, EMR cloud services, NiFi, and Kubernetes containers. Ensure data lineage, pedigree, and security are maintained throughout the lifecycle. Clean and preprocess datasets to enable advanced More ❯
sets. Collaborate with data scientists to deploy machine learning models. Contribute to strategy, planning, and continuous improvement. Required Experience: Hands-on experience with AWS data tools: Glue, PySpark, Athena, Iceberg, Lake Formation. Strong Python and SQL skills for data processing and analysis. Deep understanding of data governance, quality, and security. Knowledge of market data and its business applications. Desirable More ❯
Manchester, Lancashire, England, United Kingdom Hybrid / WFH Options
Lorien
passionate about building scalable, cloud-native data platforms. You'll be a key player in a growing team, helping to shape the future of data infrastructure using AWS, PySpark, Iceberg, and more. From designing high-performance pipelines to supporting a full-scale migration from SQL Server to AWS, this role offers the chance to work on real-time data More ❯