Software Data Engineer
Location: Remote
Job Type: Full-Time
Salary: 77K
About the Role
We are seeking an experienced and highly motivated Data Engineer to join our growing team. In this role, you will be responsible for designing, developing, and maintaining scalable data platforms and pipelines that support business intelligence, analytics, machine learning, and operational reporting initiatives.
You will work closely with data analysts, software engineers, architects, and business stakeholders to deliver robust, high-performance data solutions in a cloud-native AWS environment. The ideal candidate has strong expertise in PySpark, Python, Apache Airflow, AWS services, Terraform, and modern DevOps practices.
SC clearance, or eligibility for SC clearance, is desirable.
Key Responsibilities
Data Engineering & Pipeline Development
- Design, develop, and maintain scalable, reliable, and efficient data pipelines using PySpark and Python.
- Build high-volume batch and real-time data processing solutions capable of handling large-scale datasets.
- Develop, optimize, and monitor ETL/ELT workflows to ensure data quality, consistency, and availability.
- Implement data transformation, cleansing, enrichment, and validation processes.
- Troubleshoot and resolve data pipeline failures, bottlenecks, and performance issues.
Workflow Orchestration
- Design and manage complex workflows using Apache Airflow.
- Create and maintain DAGs with robust scheduling, dependency management, alerting, and recovery mechanisms.
- Monitor workflow execution and proactively address failures or performance concerns.
- Implement workflow best practices to ensure reliability and maintainability.
Cloud Data Architecture (AWS)
- Architect and implement cloud-native data solutions on AWS.
- Develop scalable and secure data platforms leveraging:
  - Amazon S3
  - Amazon Redshift
  - AWS Glue
  - AWS Lambda
  - Amazon EMR
  - API Gateway
  - Amazon CloudWatch
  - AWS IAM
- Ensure adherence to security, governance, and compliance standards.
- Optimize cloud resources for performance and cost efficiency.
Infrastructure as Code
- Provision and manage AWS infrastructure using Terraform.
- Develop reusable Terraform modules and templates.
- Implement infrastructure automation to support development, testing, and production environments.
- Maintain version-controlled infrastructure and deployment processes.
DevOps & CI/CD
- Design and maintain CI/CD pipelines using GitHub Actions.
- Automate testing, deployment, monitoring, and infrastructure updates.
- Support continuous integration and continuous delivery best practices.
- Collaborate with engineering teams to improve deployment reliability and efficiency.
Required Skills & Experience
- Strong experience with Python and PySpark.
- Hands-on expertise with Apache Airflow.
- Extensive experience working with AWS cloud services.
- Strong knowledge of Amazon Redshift, AWS Glue, S3, Lambda, EMR, API Gateway, CloudWatch, and IAM.
- Experience with Terraform and Infrastructure as Code (IaC).
- Proficiency with Git, GitHub Actions, and CI/CD pipelines.
- Solid understanding of distributed data processing and Spark optimization.
- Experience designing scalable data architectures and data models.
- Strong SQL skills and understanding of data warehousing concepts.
- Excellent troubleshooting, analytical, and problem-solving abilities.
- Strong communication and collaboration skills.
If you are passionate about building scalable data platforms and solving complex data challenges, we would love to hear from you.