Forecasting • Causal Inference • Python / R / SQL / Stata
End-to-end quantitative projects in forecasting, causal inference, and applied economics. Each project includes full methodology, code, and documented reasoning.
Between 2019 and 2025, electricity demand in Texas grew by roughly 27 percent. In PJM and MISO — the two largest electricity markets in the country — demand was essentially flat. That divergence coincides with the largest surge in data center investment in American history.
This project asks how much of that divergence can be explained by data center investment, and whether the existing investment pipeline is large enough to stress the grid in ways that current planning models are not built to anticipate.
The core problem is a data gap. Every major U.S. electricity market maintains an internal queue of data center developers requesting large grid connections. None of them publishes it. What is public is the generation-side queue: filings made by power plant developers seeking to connect new supply. This project uses those filings as a proxy for where data center load is being built, on the premise that the generation is being requested primarily to serve that load.
The method is four layers, not one: panel regression, synthetic control, difference-in-differences with low-exposure control regions, and a narrative check against known data center opening dates. No single method is clean enough to carry the result alone. Where all four point the same direction despite failing differently, that convergence is the evidence.
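Of the four layers, the difference-in-differences comparison is the easiest to show in miniature. The sketch below is a minimal 2x2 version under stated assumptions: the group labels, the `did_estimate` helper, and every number are invented for illustration, and the project's actual specification is not reproduced here.

```python
from statistics import mean

def did_estimate(rows):
    """2x2 difference-in-differences:
    (treated post - treated pre) - (control post - control pre)."""
    def cell(group, period):
        return mean(r["y"] for r in rows
                    if r["group"] == group and r["period"] == period)

    treated_change = cell("treated", "post") - cell("treated", "pre")
    control_change = cell("control", "post") - cell("control", "pre")
    return treated_change - control_change

# Hypothetical demand index values (not project data).
rows = [
    {"group": "treated", "period": "pre",  "y": 100.0},
    {"group": "treated", "period": "pre",  "y": 102.0},
    {"group": "treated", "period": "post", "y": 118.0},
    {"group": "treated", "period": "post", "y": 122.0},
    {"group": "control", "period": "pre",  "y": 100.0},
    {"group": "control", "period": "pre",  "y": 100.0},
    {"group": "control", "period": "post", "y": 103.0},
    {"group": "control", "period": "post", "y": 105.0},
]

effect = did_estimate(rows)
print(effect)  # 15.0: treated rose by 19, control by 4
```

The estimate is only credible if the two groups would have moved in parallel without the treatment, which is one reason no single layer is asked to carry the result.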
The headline result comes from minimum hourly demand: the lowest point electricity consumption reaches in a given month, typically around 3 a.m. on a mild spring morning. That floor has been rising in Texas in a way that weather cannot explain. Data centers run continuously, so their load signature shows up precisely there. The synthetic control produces a 34.8 index-point gap between actual minimum demand in ERCOT, the grid operator for most of Texas, and what a counterfactual built from low-exposure regions predicts it would have reached without the data center surge.
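The demand floor itself is simple to compute from hourly load data. The sketch below assumes hourly `(timestamp, load_mw)` pairs and a series indexed to a base month, which the index-point figure suggests but the text does not spell out. The function names, the choice of base month, and the sample values are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def monthly_minimum(hourly):
    """Map 'YYYY-MM' to the lowest hourly load observed in that month."""
    floors = defaultdict(lambda: float("inf"))
    for timestamp, load_mw in hourly:
        key = timestamp.strftime("%Y-%m")
        floors[key] = min(floors[key], load_mw)
    return dict(floors)

def index_to_base(floors, base_month):
    """Rescale the monthly floors so that base_month equals 100."""
    base = floors[base_month]
    return {month: 100.0 * value / base for month, value in floors.items()}

# Invented readings: an overnight trough and an afternoon peak in two Aprils.
hourly = [
    (datetime(2019, 4, 1, 3), 30000.0),
    (datetime(2019, 4, 1, 17), 45000.0),
    (datetime(2025, 4, 1, 3), 39000.0),
    (datetime(2025, 4, 1, 17), 52000.0),
]

floors = monthly_minimum(hourly)
floor_index = index_to_base(floors, "2019-04")
print(floor_index)  # {'2019-04': 100.0, '2025-04': 130.0}
```

Taking the minimum rather than the mean is what isolates always-on load: peaks driven by weather drop out, and only the consumption that never switches off remains.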
California serves as a check. Same method, same time period, a grid that also received significant data center investment. No structural break detected, no credible gap. The approach finds a signal where the institutional story predicts one and finds nothing where it does not.
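The synthetic control comparison applied to ERCOT and repeated for California can be sketched as follows. This is a minimal illustration, not the project's code: the donor series, the treated series, and the function names are hypothetical, and projected gradient descent is just one simple way to find donor weights that are non-negative and sum to one.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # sorted descending
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def fit_synthetic_weights(treated_pre, donors_pre, n_iter=5000):
    """Donor weights minimising squared pre-period error on the simplex."""
    # Weights sum to 1, so subtracting a per-period constant from every
    # series leaves the residual unchanged; centring keeps the step size sane.
    centre = donors_pre.mean(axis=1, keepdims=True)
    X = donors_pre - centre
    y = treated_pre - centre[:, 0]
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant
    w = np.full(X.shape[1], 1.0 / X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ w - y)
        w = project_to_simplex(w - step * grad)
    return w

# Invented index series; columns are three low-exposure donor regions.
donors_pre = np.array([[100, 100, 100], [101, 99, 102], [103, 100, 101],
                       [102, 101, 104], [104, 100, 103], [105, 102, 106]], float)
donors_post = np.array([[106, 101, 107], [107, 102, 108], [108, 103, 110]], float)
treated_pre = np.array([100.0, 100.0, 101.5, 101.5, 102.0, 103.5])
treated_post = np.array([120.0, 125.0, 130.0])

weights = fit_synthetic_weights(treated_pre, donors_pre)
synthetic_post = donors_post @ weights        # counterfactual path
gap = treated_post - synthetic_post           # actual minus counterfactual
```

A placebo region run through the same two functions should produce a gap close to zero, which is the role California plays here.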
A three-model forecasting pipeline — ARIMA, Prophet, and XGBoost — extends the analysis through 2027 using ERCOT zone-level data. The gap between the baseline forecast and the model that incorporates queue filings measures the informational value of the proxy. The uncertainty that remains after incorporating it measures what better public data would reduce.
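The baseline-versus-augmented comparison can be illustrated with a much simpler model than the ones named above. The sketch below swaps in ordinary least squares for ARIMA, Prophet, and XGBoost purely to keep the example short. The variable names, the `ols_forecast` and `design` helpers, and all of the data are invented, and the future queue values are assumed to be known from filings already made.

```python
import numpy as np

def ols_forecast(X_train, y_train, X_future):
    """Fit ordinary least squares on the training rows, predict future rows."""
    coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return X_future @ coef

def design(t, queue=None):
    """Design matrix: intercept, time trend, and optionally queue filings."""
    cols = [np.ones_like(t), t]
    if queue is not None:
        cols.append(queue)
    return np.column_stack(cols)

# Invented data: period index, cumulative queue filings (MW), demand index.
t_train = np.arange(8, dtype=float)
t_future = np.arange(8, 12, dtype=float)
queue_train = np.array([0, 0, 0, 100, 200, 400, 700, 1100], dtype=float)
queue_future = np.array([1600, 2200, 2900, 3700], dtype=float)
demand_train = 50.0 + 1.0 * t_train + 0.01 * queue_train

# Baseline sees only the time trend; the augmented model also sees the queue.
baseline = ols_forecast(design(t_train), demand_train, design(t_future))
augmented = ols_forecast(design(t_train, queue_train), demand_train,
                         design(t_future, queue_future))

# The difference between the two paths is what the proxy contributes.
forecast_gap = augmented - baseline
```

In this toy setup the queue series explains demand exactly, so the augmented model leaves no residual uncertainty. With real data the augmented forecast would still carry error, and that remainder is the part better public data could reduce.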
Full paper and SSRN link • GitHub • Interactive Dashboard
March 2026 • Synthetic Control • DiD • Panel Regression • ARIMA • XGBoost • Python • Stata
SQL analysis of 100,000 orders — growth trends, customer retention, delivery performance, and review score drivers. Built with DuckDB and Python.
March 2026 • SQL • Python • DuckDB
Econometric evaluation of a municipal bag tax using a clean DiD design — estimating causal effects on bag usage, retailer behavior, and environmental outcomes.
March 2026 • Econometrics • DiD • Python