governance. Convert between tactical data formats, e.g., XML, JSON, GeoJSON, Cursor-on-Target (CoT), JREAP-C, Link-16. Implement automation using cron jobs and data orchestration frameworks (Apache Airflow, Kafka, AWS Step Functions). Architect and maintain scalable data stores (e.g., Apache Iceberg, Elasticsearch) with schema evolution and deduplication. Enable real-time data indexing and … Data Links. Experience in natural language processing (NLP) and regular expressions (regex). Experience implementing cron jobs. Experience with data pipeline orchestration, ETL, and orchestration tools (e.g., Kafka, Apache Airflow, AWS Step Functions). Focused experience implementing and monitoring data stores (e.g., Apache Iceberg, Elasticsearch, data lake management). Experience with schema evolution, data flow, and governance …
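As an illustration of the tactical format conversion described above, here is a minimal sketch that maps a simplified, hypothetical Cursor-on-Target event onto a GeoJSON Feature; real CoT messages carry many more attributes and sub-elements than shown here.

```python
# Illustrative sketch: convert a simplified, hypothetical CoT <event> into a
# GeoJSON Feature. Only a point geometry and a few attributes are handled.
import json
import xml.etree.ElementTree as ET

def cot_to_geojson(cot_xml: str) -> dict:
    """Map a CoT <event> carrying a <point> element onto a GeoJSON Feature."""
    event = ET.fromstring(cot_xml)
    point = event.find("point")
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            # GeoJSON orders coordinates as [longitude, latitude]
            "coordinates": [float(point.get("lon")), float(point.get("lat"))],
        },
        "properties": {
            "uid": event.get("uid"),
            "type": event.get("type"),
            "time": event.get("time"),
        },
    }

sample = (
    '<event uid="TRACK-1" type="a-f-G" time="2024-01-01T00:00:00Z">'
    '<point lat="51.50" lon="-0.12" hae="0"/></event>'
)
print(json.dumps(cot_to_geojson(sample), indent=2))
```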
City of London, London, United Kingdom Hybrid / WFH Options
Tenth Revolution Group
Be Doing: You'll be a key contributor to the development of a next-generation data platform, with responsibilities including: designing and implementing scalable data pipelines using Python and Apache Spark; building and orchestrating workflows using AWS services such as Glue, Lambda, S3, and EMR Serverless; applying best practices in software engineering: CI/CD, version control, automated testing … and modular design; supporting the development of a lakehouse architecture using Apache Iceberg; collaborating with product and business teams to deliver data-driven solutions; embedding observability and quality checks into data workflows; participating in code reviews, pair programming, and architectural discussions; and gaining domain knowledge in financial data and sharing insights with the team. What They're Looking For: … for experience with type hints, linters, and testing frameworks like pytest); solid understanding of data engineering fundamentals: ETL/ELT, schema evolution, batch processing; experience or strong interest in Apache Spark for distributed data processing; familiarity with AWS data tools (e.g., S3, Glue, Lambda, EMR); strong communication skills and a collaborative mindset; comfortable working in Agile environments and engaging …
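For the pipeline and lakehouse responsibilities listed above, a minimal PySpark sketch of a batch load into an Apache Iceberg table might look like the following; the bucket, catalog, and table names are placeholders, and the Iceberg catalog is assumed to already be configured on the Spark session.

```python
# Minimal sketch of a batch pipeline writing to an Iceberg table with PySpark.
# Paths and table identifiers are placeholders; the Iceberg catalog must be
# configured on the session (spark.sql.catalog.* settings) for writeTo to work.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("trades-daily-load").getOrCreate()

raw = spark.read.json("s3://example-bucket/raw/trades/2024-06-01/")

cleaned = (
    raw.dropDuplicates(["trade_id"])                       # deduplicate on business key
       .withColumn("trade_date", F.to_date("executed_at"))  # derive a partition column
       .filter(F.col("notional") > 0)                       # basic quality gate
)

# DataFrameWriterV2 (Spark 3+): append into an existing Iceberg table
cleaned.writeTo("glue_catalog.finance.trades").append()
```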
Arlington, Virginia, United States Hybrid / WFH Options
STR
the following software/tools: Big Data tools, e.g., Hadoop, Spark, Kafka, Elasticsearch; AWS: Athena, RDB, and AWS credentials from Cloud Practitioner to Solutions Architect; data lakes, e.g., Delta Lake, Apache Hudi, Apache Iceberg; distributed SQL interfaces, e.g., Apache Hive, Presto/Trino, Spark; data pipeline and workflow management tools, e.g., Luigi, Airflow; dashboard frontends, e.g., Grafana …
APIs, access control, and auditing. Experience with DevOps pipelines. Experience using the following software/tools: Big Data tools, e.g., Hadoop, Spark, Kafka, Elasticsearch; data lakes, e.g., Delta Lake, Apache Hudi, Apache Iceberg; distributed data warehouse frontends, e.g., Apache Hive, Presto; data pipeline and workflow management tools, e.g., Luigi, Airflow; dashboard frontends, e.g., Grafana, Kibana; stream …
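To illustrate the workflow-management tooling named in these requirements, here is a minimal Airflow DAG sketch; the task bodies are placeholders, and the schedule parameter shown assumes Airflow 2.4 or later (older releases use schedule_interval).

```python
# Illustrative Airflow DAG: a three-step extract/transform/load chain.
# Task contents are placeholders; the callables are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull source data")

def transform():
    print("clean and enrich")

def load():
    print("write to the data store")

with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # cron-style scheduling also works, e.g. "0 2 * * *"
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3   # run the tasks in sequence
```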
able to work across the full data cycle. - Proven experience working with AWS data technologies (S3, Redshift, Glue, Lambda, Lake Formation, CloudFormation), GitHub, CI/CD - Coding experience in Apache Spark, Iceberg, or Python (Pandas) - Experience in change and release management. - Experience in data warehouse design and data modelling - Experience managing data migration projects. - Cloud data platform development … AWS services such as Redshift, Lambda, S3, Step Functions, Batch, CloudFormation, Lake Formation, CodeBuild, CI/CD, GitHub, IAM, SQS, SNS, Aurora DB - Good experience with dbt, Apache Iceberg, Docker, Microsoft BI stack (nice to have) - Experience in data warehouse design (Kimball and lakehouse, medallion and data vault) is a definite preference, as is knowledge …
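Since schema evolution recurs throughout these requirements, here is a small illustrative sketch of Iceberg's in-place schema evolution issued through Spark SQL; it assumes an Iceberg catalog named glue_catalog with the Iceberg runtime and SQL extensions configured, and the table and column names are hypothetical.

```python
# Sketch of Iceberg schema evolution from a Spark session. These statements
# change table metadata only; existing data files are not rewritten.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-evolution-demo").getOrCreate()

# Add a column: readers see NULLs for rows written before the change.
spark.sql("ALTER TABLE glue_catalog.finance.trades ADD COLUMN venue STRING")

# Rename a column and widen a numeric type (e.g. float -> double).
spark.sql("ALTER TABLE glue_catalog.finance.trades RENAME COLUMN qty TO quantity")
spark.sql("ALTER TABLE glue_catalog.finance.trades ALTER COLUMN notional TYPE DOUBLE")
```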
in Kubernetes, your work will empower analysts, data scientists, and leadership with the insights they need, when they need them. If you're fluent in tools like Spark, Trino, Iceberg, and Python, and you thrive in high-security environments, this could be your next mission-critical opportunity. In This Role, You'll: • Design, build, and maintain secure, scalable data pipelines and services • Ingest, transform, and model structured and unstructured data for analytics and ML • Work with technologies like Apache Spark, Apache Iceberg, Trino, NiFi, OpenSearch, and AWS EMR • Ensure data integrity, lineage, and security across the entire lifecycle • Collaborate with DevOps to deploy containerized data solutions using Kubernetes • Support Agile delivery, version control, and data governance …
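As a sketch of the ingest-and-index work described above, the following consumes JSON events from Kafka and indexes them into OpenSearch using the kafka-python and opensearch-py clients; the topic, index, and host names are placeholders.

```python
# Illustrative real-time indexing loop: Kafka topic -> OpenSearch index.
import json

from kafka import KafkaConsumer          # kafka-python
from opensearchpy import OpenSearch      # opensearch-py

consumer = KafkaConsumer(
    "sensor-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

for message in consumer:
    doc = message.value
    # Use a stable document id so replays overwrite rather than duplicate.
    client.index(index="sensor-events", id=doc.get("event_id"), body=doc)
```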
Chantilly, Virginia, United States Hybrid / WFH Options
The DarkStar Group
rather huge and includes Python (Pandas, NumPy, SciPy, scikit-learn, standard libraries, etc.), Python packages that wrap Machine Learning (packages for NLP, Object Detection, etc.), Linux, AWS/C2S, Apache NiFi, Spark, PySpark, Hadoop, Kafka, Elasticsearch, Solr, Kibana, Neo4j, MariaDB, Postgres, Docker, Puppet, and many others. Work on this program takes place in Chantilly, VA, McLean, VA and in … standards. Develop and deliver documentation for each project, including ETL mappings, a code use guide, code location, and access instructions. Design and optimize data pipelines using tools such as Spark, Apache Iceberg, Trino, OpenSearch, EMR cloud services, NiFi, and Kubernetes containers. Ensure the pedigree and provenance of the data is maintained such that access to the data is protected …
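To illustrate the Pandas and regular-expression side of this stack, here is a small, self-contained sketch that extracts structured fields from free-text records with a regex; the log format shown is invented for the example.

```python
# Illustrative ETL step with pandas: pull structured fields out of free-text
# log lines using a regular expression with named capture groups.
import pandas as pd

df = pd.DataFrame({
    "raw": [
        "2024-06-01T12:00:00Z level=INFO msg=login user=alice",
        "2024-06-01T12:00:05Z level=WARN msg=retry user=bob",
    ]
})

pattern = r"^(?P<timestamp>\S+) level=(?P<level>\w+) msg=(?P<msg>\S+) user=(?P<user>\w+)$"
parsed = df["raw"].str.extract(pattern)            # one column per named group
parsed["timestamp"] = pd.to_datetime(parsed["timestamp"])

print(parsed.dtypes)
print(parsed.head())
```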
MySQL. Exposure to Docker, Kubernetes, AWS, Helm, Terraform, Vault, Grafana, ELK Stack, New Relic. Relevant experience in the maintenance of data APIs and data lake architectures, including experience with Apache Iceberg, Trino/Presto, ClickHouse, Snowflake, BigQuery. Master's degree in Computer Science or an Engineering-related field. As an Equal Opportunity Employer, qualified applicants will …
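As an example of working against a data lake through Trino/Presto, a minimal query from Python with the trino client could look like this; the host, catalog, schema, and table names are placeholders.

```python
# Illustrative Trino query over an Iceberg-backed table using the trino client.
import trino

conn = trino.dbapi.connect(
    host="trino.example.internal",   # placeholder coordinator host
    port=8080,
    user="analytics",
    catalog="iceberg",
    schema="marts",
)
cur = conn.cursor()
cur.execute("""
    SELECT event_date, count(*) AS events
    FROM page_views
    WHERE event_date >= DATE '2024-06-01'
    GROUP BY event_date
    ORDER BY event_date
""")
for event_date, events in cur.fetchall():
    print(event_date, events)
```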
Falls Church, Virginia, United States Hybrid / WFH Options
Rackner
years of software engineering experience (backend, API, or full-stack) - Python, Java, or C# expertise - Experience with REST APIs (FastAPI, AWS Lambda), OpenAPI, and data pipelines (dbt, Airflow, Spark, Iceberg) - Knowledge of FHIR, OMOP, HL7, CDA, and federal compliance frameworks Bonus Experience: - DHA, VA, or federal healthcare IT programs - OCR/NLP/AI-ML workflows - AWS GovCloud (IL5 …
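For the REST API requirement, here is a minimal FastAPI sketch; FastAPI generates the OpenAPI schema automatically, and the Patient model shown is a simplified, hypothetical stand-in rather than a full FHIR resource.

```python
# Minimal FastAPI data-serving endpoint; the interactive OpenAPI docs are
# served at /docs. The model and in-memory "database" are illustrative only.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="patient-api")

class Patient(BaseModel):
    id: str
    family_name: str
    given_name: str
    birth_date: str

FAKE_DB = {
    "p1": Patient(id="p1", family_name="Doe", given_name="Jane", birth_date="1980-01-01"),
}

@app.get("/patients/{patient_id}", response_model=Patient)
def read_patient(patient_id: str) -> Patient:
    patient = FAKE_DB.get(patient_id)
    if patient is None:
        raise HTTPException(status_code=404, detail="Patient not found")
    return patient
```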
Terraform and Kubernetes is a plus! A genuine excitement for significantly scaling large data systems. Technologies we use (experience not required): AWS serverless architectures, Kubernetes, Spark, Flink, Databricks, Parquet, Iceberg, Delta Lake, Paimon, Terraform, GitHub including GitHub Actions, Java, PostgreSQL. About Chainalysis Blockchain technology is powering a growing wave of innovation. Businesses and governments around the world are using …
standards. Develop and deliver documentation for each project, including ETL mappings, a code use guide, code location, and access instructions. Design and optimize data pipelines using tools such as Spark, Apache Iceberg, Trino, OpenSearch, EMR cloud services, NiFi, and Kubernetes containers. Ensure the pedigree and provenance of the data is maintained such that access to the data is protected …
dbt, Talend, or Fivetran. Strong understanding of data governance and regulatory compliance frameworks such as PCI-DSS, GDPR, SOX, and CPRA. Bonus: experience with lakehouse technologies such as Delta Lake, Apache Iceberg, or Hudi. Daily Duties: Design, develop, and maintain scalable data pipelines that support enterprise reporting, regulatory compliance, and analytics. Implement data validation frameworks to ensure high accuracy …
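As an illustration of the data-validation frameworks mentioned in the duties above, a lightweight check in pandas might look like the following; the column names and rules are illustrative.

```python
# Sketch of a lightweight data-validation step returning a list of failures.
import pandas as pd

def validate(df: pd.DataFrame) -> list[str]:
    """Return human-readable validation failures (an empty list means pass)."""
    failures = []
    if df["transaction_id"].duplicated().any():
        failures.append("duplicate transaction_id values")
    if df["amount"].isna().any():
        failures.append("null amounts")
    if (df["amount"] < 0).any():
        failures.append("negative amounts")
    return failures

df = pd.DataFrame({"transaction_id": [1, 2, 2], "amount": [10.0, None, -5.0]})
print(validate(df))
# ['duplicate transaction_id values', 'null amounts', 'negative amounts']
```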
such as Snowflake. Strong expertise in SQL, including development, optimization, and performance tuning for large-scale data environments. Working knowledge of Python would be a plus. Working knowledge of Apache Iceberg is an asset. Experience with Palantir Foundry is advantageous, and knowledge of Ontology concepts is a plus. At AIG, we value in-person collaboration as a vital part …
innovation through advanced analytics and research-based problem solving. To be successful you should have: 10 years of hands-on experience in AWS data engineering technologies, including Glue, PySpark, Athena, Iceberg, Databricks, Lake Formation, and other standard data engineering tools. Previous experience in implementing best practices for data engineering, including data governance, data quality, and data security. Proficiency in data …
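A minimal AWS Glue PySpark job of the kind implied by these requirements is sketched below; the database, table, and bucket names are placeholders, and the script assumes it runs inside Glue where the awsglue libraries are available.

```python
# Illustrative AWS Glue PySpark job: read from the Glue Data Catalog, apply a
# simple transform, and write Parquet to S3.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Placeholder catalog database and table
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="analytics", table_name="orders_raw"
)
df = dyf.toDF().dropDuplicates(["order_id"]).filter("order_total > 0")

df.write.mode("overwrite").parquet("s3://example-bucket/curated/orders/")
job.commit()
```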
both defect and performance issue resolution. Proven, results-oriented individual with a delivery focus in a high-velocity, high-quality environment. Experience with the Hadoop ecosystem, including Spark, Iceberg, YuniKorn, etc. Basic Qualifications: A Master's degree in Computer Science, or a bachelor's degree with significant proven experience, is required. Strong Java development experience. Good Linux and networking knowledge …
and deliver critical systems with impact. Proficiency in Java/Python, CI/CD, and containerized environments. Hands-on expertise in tools like Kafka/Flink, Spark, Delta/Iceberg, Kubernetes, and NoSQL/columnar stores. Experience with streaming and batch data platforms. Strong foundation in algorithms and distributed design. BS/MS in CS or equivalent experience. The base …
running such systems in production. Strong coding skills in Java/Python and familiarity with CI/CD. Hands-on with some of: Kafka/Flink, Spark, Delta/Iceberg, Kubernetes, NoSQL/columnar stores. Proven ability to work independently, make sound tradeoffs, and deliver quality outcomes with minimal supervision. Solid debugging, performance analysis, and system design skills. Nice …
or equivalent) and in infra-as-code, CI/CD, and containerized environments. Deep, hands-on expertise in the internals of several of the following: Kafka/Flink, Spark, Delta/Iceberg, GraphQL/REST APIs, RDBMS/NoSQL, Kubernetes, Airflow. Experience building both streaming and batch data platforms, improving reliability, quality, and developer velocity. Demonstrated ability to mentor senior engineers …
standards. Develop and deliver documentation for each project, including ETL mappings, code use guides, code location and access instructions. • Design and optimize data pipelines using tools such as Spark, Apache Iceberg, Trino, OpenSearch, EMR cloud services, NiFi and Kubernetes containers • Ensure the pedigree and provenance of the data is maintained such that access to the data is protected …
processing. • Implement offline pipelines using Dagster (or Airflow) for batch processing. • Parse and process binary message formats from multiple sensor and partner data sources. • Build data warehouses using Postgres, Apache Iceberg, Parquet, and S3. • Design and maintain data models optimized for high-performance queries. • Validate, normalize, and ensure the correctness and trustworthiness of all data sources. Required Experience: • Active …
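To illustrate parsing binary message formats, here is a small sketch using only the Python standard library; the 16-byte layout is hypothetical, standing in for a real sensor or partner message specification.

```python
# Sketch of parsing a fixed-layout binary message with struct.
# Hypothetical layout (big-endian): uint32 track id, uint32 epoch seconds,
# float32 latitude, float32 longitude.
import struct
from typing import NamedTuple

class TrackMessage(NamedTuple):
    track_id: int
    timestamp: int
    lat: float
    lon: float

FORMAT = ">IIff"                  # big-endian: uint32, uint32, float32, float32
SIZE = struct.calcsize(FORMAT)    # 16 bytes

def parse_track(payload: bytes) -> TrackMessage:
    return TrackMessage(*struct.unpack(FORMAT, payload[:SIZE]))

# Round-trip a sample message to show the layout.
sample = struct.pack(FORMAT, 42, 1717243200, 51.5, -0.12)
print(parse_track(sample))
```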
and coding standards. Develop and deliver documentation for each project, including ETL mappings, user guides, and access instructions. Design and optimize scalable data pipelines using technologies such as Spark, Apache Iceberg, Trino, OpenSearch, EMR cloud services, NiFi, and Kubernetes containers. Ensure data lineage, pedigree, and security are maintained throughout the lifecycle. Clean and preprocess datasets to enable advanced …