Data Engineering Projects

Automated data pipelines and ETL systems for complex, multi-source datasets, built during my work at the Leonard C. Cooper Jnr International Trade Center.

ERS-Cooper: Automated Global Trade Data Pipeline

A Python-based ETL system that synchronizes trade datasets from UN Comtrade, the USDA, WTO, World Bank, and IMF. It features a modular architecture, automated error handling, and data normalization.

The Challenge

Agricultural trade researchers at NC A&T were spending 20+ hours per week manually downloading, cleaning, and merging data from multiple international sources.

Manual data entry errors
Inconsistent country names
Time-consuming downloads
Difficulty tracking updates
No version control
Limited reproducibility

Modular Architecture

Dedicated scripts for each data source
Standardized error handling
Comprehensive logging system
Environment variable management
Data validation checks
Incremental update capability
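
As a rough illustration of how per-source scripts can share error handling, logging, validation, and environment-based settings, here is a minimal sketch. It is not the project's actual code: the names run_source, validate, get_setting, DEMO_API_KEY, and both fetch_demo_* functions are hypothetical stand-ins, and the records are placeholder values.

```python
import logging
import os
from typing import Any, Callable, Dict, List

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(name)s %(levelname)s %(message)s")
logger = logging.getLogger("pipeline")

Record = Dict[str, Any]


def get_setting(name: str, default: str = "") -> str:
    """Read a setting (e.g. an API key) from the environment, not from code."""
    return os.environ.get(name, default)


def validate(records: List[Record], required: List[str]) -> List[Record]:
    """Keep only records where every required field is present and non-empty."""
    valid = [r for r in records
             if all(r.get(k) not in (None, "") for k in required)]
    dropped = len(records) - len(valid)
    if dropped:
        logger.warning("dropped %d invalid record(s)", dropped)
    return valid


def run_source(name: str, fetch: Callable[[], List[Record]],
               required: List[str]) -> List[Record]:
    """Run one source's fetch step with shared logging and error handling."""
    try:
        records = validate(fetch(), required)
        logger.info("%s: %d record(s) loaded", name, len(records))
        return records
    except Exception:
        # One failing source is logged and skipped; the others still run.
        logger.exception("%s: fetch failed, skipping this source", name)
        return []


# Hypothetical fetchers standing in for the dedicated per-source scripts.
def fetch_demo_ok() -> List[Record]:
    _api_key = get_setting("DEMO_API_KEY")  # hypothetical variable name
    return [
        {"country": "Ghana", "year": 2020, "value": 1.5},  # placeholder data
        {"country": "", "year": 2020, "value": 2.0},       # fails validation
    ]


def fetch_demo_broken() -> List[Record]:
    raise ConnectionError("simulated network failure")


loaded = run_source("demo_ok", fetch_demo_ok, ["country", "year", "value"])
failed = run_source("demo_broken", fetch_demo_broken, ["country"])
```

Wrapping every source in the same run_source call keeps failure behavior uniform: a broken source produces a logged traceback and an empty result instead of stopping the whole run.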

Multi-Source Integration

UN Comtrade API: trade flow data
USDA FAS Database: agricultural exports/imports
World Bank API: GDP, population, exchange rates
IMF Data Portal: economic indicators
WTO Trade Statistics: trade agreements, tariffs

Measurable Impact

90% reduction in data processing time
Zero manual data entry errors
Real-time data updates enabled
Supported graduate-level research
200+ countries covered
10+ years of historical data
CAES Showcase of Excellence Award

Supporting Academic Research

This pipeline was the foundation of my master's thesis: “Quantifying the Impact of U.S. Free Trade Agreements on Agricultural Exports”

CAES Showcase of Excellence Award (2025)
10+ years of data
200+ countries
15+ trade agreements
5+ data sources

Additional Data Projects

Country Reference Data Synchronization

Automated system for normalizing country names and codes across multiple international data sources with 99.5% matching accuracy.

Python, Pandas, FuzzyWuzzy, APIs
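
The general idea of matching inconsistent country names to a single code can be sketched as follows. To stay dependency-free, this example swaps in difflib from the Python standard library rather than FuzzyWuzzy, so it illustrates the technique rather than the project's implementation. The reference table, alias list, cutoff value, and the function names normalize and match_country are all assumptions made for the example.

```python
import difflib
from typing import Dict, Optional

# Tiny illustrative reference table; a real one would cover every country.
ISO3_BY_NAME: Dict[str, str] = {
    "united states": "USA",
    "south korea": "KOR",
    "vietnam": "VNM",
    "cote d'ivoire": "CIV",
}

# Known spellings that are too different for fuzzy matching to catch.
ALIASES: Dict[str, str] = {
    "united states of america": "united states",
    "korea, rep.": "south korea",
    "viet nam": "vietnam",
}


def normalize(name: str) -> str:
    """Lowercase, collapse whitespace, then apply known aliases."""
    key = " ".join(name.lower().split())
    return ALIASES.get(key, key)


def match_country(name: str, cutoff: float = 0.85) -> Optional[str]:
    """Return an ISO3 code for a source-specific name, or None if unsure."""
    key = normalize(name)
    if key in ISO3_BY_NAME:  # exact match after normalization
        return ISO3_BY_NAME[key]
    # Fall back to the closest reference name above the similarity cutoff.
    close = difflib.get_close_matches(key, list(ISO3_BY_NAME), n=1, cutoff=cutoff)
    return ISO3_BY_NAME[close[0]] if close else None


codes = [match_country(n) for n in ("Korea, Rep.", "Vietnm", "Atlantis")]
```

Returning None for names below the cutoff, instead of forcing the nearest match, leaves uncertain cases available for manual review.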

Macroeconomic Data Integration

Automated extraction and integration of GDP, exchange rates, and population data from World Bank and IMF covering 200+ countries over 20+ years.

Python, Pandas, PostgreSQL, APIs
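
Integrating indicators from several sources usually comes down to joining them on a shared country and year key. The sketch below shows that join using only the standard library; with Pandas the equivalent would be an outer DataFrame.merge on the same two columns. The function name merge_indicators, the field names, and the sample numbers are placeholders invented for the example, not real figures or the project's schema, and the database loading step is not shown.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]
Key = Tuple[str, int]


def merge_indicators(*sources: Iterable[Row]) -> List[Row]:
    """Outer-join rows from several sources on (iso3, year)."""
    merged: Dict[Key, Row] = defaultdict(dict)
    for rows in sources:
        for row in rows:
            # Coerce key types so "2020" and 2020 land in the same record.
            key = (str(row["iso3"]), int(row["year"]))
            merged[key].update({**row, "iso3": key[0], "year": key[1]})
    # Sort by country code, then year, for a stable output order.
    return [merged[k] for k in sorted(merged)]


# Placeholder values only; these are not real statistics.
gdp_rows = [
    {"iso3": "USA", "year": 2020, "gdp_usd": 1.0},
    {"iso3": "GHA", "year": 2020, "gdp_usd": 2.0},
]
population_rows = [
    {"iso3": "USA", "year": "2020", "population": 10},
    {"iso3": "USA", "year": 2021, "population": 11},
]

combined = merge_indicators(gdp_rows, population_rows)
```

A country-year that appears in only one source is kept with just the fields that source provides, so gaps in coverage stay visible instead of silently dropping rows.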

Interested in Data Engineering?

I'm passionate about building robust, scalable data systems that enable research and decision-making.