Senior Data Engineer
The International Rescue Committee (IRC) responds to the world's worst humanitarian crises, helping to restore health, safety, education, economic wellbeing, and power to people devastated by conflict and disaster. Founded in 1933 at the call of Albert Einstein, the IRC is one of the world's largest international humanitarian non-governmental organizations (INGOs), at work in more than 40 countries and 29 U.S. cities helping people to survive, reclaim control of their future, and strengthen their communities. A force for humanity, IRC employees deliver lasting impact by restoring safety, dignity, and hope to millions. If you're a solutions-driven, passionate change-maker, come join us in positively impacting the lives of millions of people worldwide for a better future.

IRC UK
IRC UK is part of the IRC global network, which has its global headquarters in New York. Our team in the UK works to raise profile, deliver policy and practice change, and increase funding to help restore health, safety, education, economic wellbeing, and power to people devastated by conflict and disaster. Since 2021, IRC UK has also provided integration services directly to refugees in England, a programme that is growing rapidly. In Europe, the IRC also has offices in Berlin, Bonn, Brussels, Geneva, and Stockholm.

The Purpose of the Role
The External Relations (ER) department was created in February 2020 and comprises three main but complementary functions: Private Fundraising, Communications, and Policy & Advocacy. The ER department is three years into a five-year, ground-breaking and ambitious global strategy that will improve IRC’s ability to ‘punch above its weight’ in private income, advocacy, and brand awareness. The main objective of the department is to enable this organization of more than 12,000 staff to have the resources needed to continue serving 18 million people worldwide in places affected by war and disaster, shape the humanitarian sector by influencing key policies and reforms, and build and grow IRC’s reputation.

We are seeking a skilled and versatile Data Engineer to join our dynamic analytics team, which includes data scientists and analysts. In this role, you will apply your expertise in analytics engineering, machine learning operations (ML Ops), and infrastructure design and deployment to build, maintain, and optimize the systems and tools that support our data pipelines, machine learning workflows, and business intelligence reporting. You will play an active role in scaling IRC’s internal data capabilities as the volume and complexity of our data and ML models grow and business needs evolve.

Major Responsibilities
- Support the entire workflow of the ER data model: data pipeline development, ELT performance, timely loading of data sets, and maintenance of data models via the use of monitoring, testing, and automation.
- Collaborate with analysts, data scientists, and ER stakeholders to identify opportunities to develop well-defined, integrated, production-quality, and re-usable data models in SQL using dbt, ensuring data quality.
- Collaborate with data scientists to build and automate end-to-end ML pipelines, from data preparation to model deployment and monitoring, including designing, implementing, and maintaining MLflow-based workflows for model tracking, versioning, and deployment (see the sketch after this list).
- Apply software engineering practices when creating new data models to ensure data quality & standardization across our pipelines, and ML and BI tools.
- Employ comprehensive testing and documentation practices.
- Drive clear requirements documentation and contribute to code review.
- Identify and execute internal process improvements, including re-designing infrastructure for greater scalability and automating manual processes.
- Act as a technical expert to the rest of the ER analytics team to mentor analysts and improve analytics engineering as a practice across all ER analytics (query development, extending data models, software development practices, Power BI data modeling governance, ML Ops).
- Contribute to continuously clarifying, simplifying, and otherwise improving the conceptual foundations of ER Analytics data models; develop and maintain conceptual data model artifacts including readme-level documentation, model diagrams, prototypes, change notices, et cetera.
- Collaborate with the engineering team, analysts, and business users to implement new ELT pipelines, data infrastructure improvements, and the integration of new ER and cross-IRC data sets and other data consumption assets.
- Partner with the Associate Director, Analytics Engineering to evaluate data stack improvements.
- Support other analytics tasks as needed.
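To give a flavour of the MLflow-based workflow mentioned above, here is a minimal sketch that trains a model, tracks its parameters and metrics, and registers a version for deployment. The experiment name, model name, and toy dataset are hypothetical illustrations, not actual IRC systems.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical experiment name; a real one would map to an ER analytics use case.
mlflow.set_experiment("er-donor-propensity")

# Toy stand-in data for illustration only.
X, y = make_classification(n_samples=1_000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    params = {"n_estimators": 100, "max_depth": 5}
    model = RandomForestClassifier(**params, random_state=42).fit(X_train, y_train)

    # Track parameters and metrics so runs are comparable and reproducible.
    mlflow.log_params(params)
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))

    # Register the trained model so deployments pull a versioned artifact.
    mlflow.sklearn.log_model(
        model, "model", registered_model_name="er-donor-propensity"
    )
```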
Job Requirements
- Curiosity to explore complex and ambiguous problems and deliver structured analytics solutions
- 4+ years working in the field of data and analytics
- 2+ years of professional experience manipulating large-scale data using both Python and SQL (nested data structure manipulation, window functions, query optimization, data partitioning techniques; see the sketch after this list)
- Strong experience with data pipeline management technologies (e.g. Airflow, dbt), dependency checking, schema design, and dimensional data modeling
- Strong experience with ML model management tools, such as MLflow
- 2+ years of experience with cloud-based data warehouses (Snowflake, Databricks, BigQuery, Redshift, Azure Synapse)
- Knowledgeable and passionate about the “modern data stack”
- Strong adherence to DataOps best practices, including version control (e.g., Git/GitHub) and data testing
- Independent worker with strong attention to detail & commitment to a high standard of work product
- Ability to communicate technical concepts to non-technical stakeholders and translate business needs into technical requirements
- Desire to work in a multi-cultural environment and collaborate with people from different backgrounds and experiences
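As a concrete example of the window-function and partitioning work called out above, here is a small pandas sketch of per-donor running totals. The column names and figures are illustrative only, not real IRC data; the same logic maps directly to SQL's SUM(...) OVER (PARTITION BY ... ORDER BY ...) and LAG(...).

```python
import pandas as pd

# Illustrative gift-level records; donor IDs and amounts are made up.
gifts = pd.DataFrame({
    "donor_id": [1, 1, 1, 2, 2],
    "gift_date": pd.to_datetime(
        ["2023-01-05", "2023-03-10", "2023-06-01", "2023-02-20", "2023-04-15"]
    ),
    "amount": [50.0, 75.0, 100.0, 20.0, 35.0],
}).sort_values(["donor_id", "gift_date"])

# Window-style calculations partitioned by donor: the pandas equivalent of
# SUM(amount) OVER (PARTITION BY donor_id ORDER BY gift_date) and LAG(amount).
grp = gifts.groupby("donor_id")
gifts["running_total"] = grp["amount"].cumsum()
gifts["prev_amount"] = grp["amount"].shift(1)
gifts["gift_number"] = grp.cumcount() + 1

print(gifts)
```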
Preferred Qualifications
- Familiarity with Salesforce or similar CRM technology
- Experience owning dbt in a high-growth org, including deploying capabilities such as utils, packages, tests, snapshots, and incremental tables
- Experience in Snowflake and Databricks
- Exposure to Microsoft BI tooling: Power BI, Power Query, and the DAX/MDX scripting languages
- Understanding of infrastructure-as-code (Terraform, CloudFormation) and CI/CD pipelines for ML/AI workflows
- Experience with distributed data processing frameworks such as Apache Spark or Apache Kafka is a plus (a PySpark sketch follows this list)
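For the distributed processing mentioned in the last point, here is a minimal PySpark sketch. The storage paths and columns are hypothetical; it repeats the running-total window from the pandas example above, but executed across a cluster over partitioned Parquet data.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("er-gift-running-totals").getOrCreate()

# Hypothetical input: gift records exported as Parquet files.
gifts = spark.read.parquet("s3://example-bucket/er/gifts/")

# The same per-donor running total as the pandas sketch, computed in a
# distributed window so it scales to large data volumes.
w = Window.partitionBy("donor_id").orderBy("gift_date")
totals = gifts.withColumn("running_total", F.sum("amount").over(w))

totals.write.mode("overwrite").parquet("s3://example-bucket/er/gift_running_totals/")
```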
Working Environment
- Standard office working environment.
- This role may require working remotely full- or part-time; part-time remote employees may be required to share workspace.
Interview Process
- Screening call (online)
- First-round panel interview (online), including an assessment/test
- Second-round panel interview (online), including a presentation task
- Final role and expectations alignment call with the Senior Director (online)