What you can build (together)
The ecosystem is built on shared interfaces and reusable components: a unified data layer, common instrument definitions, and standardized APIs for pricing, risk, optimization, backtesting, and explainable reporting.
Shared repo framework
Standard interfaces (the glue)
Groups should code to shared, minimal function signatures.
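As an illustration only, assuming Python with `typing.Protocol`, a minimal set of shared signatures might look like the sketch below. Every name in it (`OptionSpec`, `VolSurface`, `Pricer`, `PortfolioOptimizer`, `FlatSurface`) is hypothetical and should be replaced by whatever the groups agree on.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable

import numpy as np


@dataclass(frozen=True)
class OptionSpec:
    """Hypothetical shared instrument definition used by every module."""
    strike: float
    expiry: float            # time to expiry in years
    is_call: bool
    is_american: bool = False


@runtime_checkable
class VolSurface(Protocol):
    """Anything that can return an implied vol for a (strike, expiry) pair."""
    def iv(self, K: float, T: float) -> float: ...


@runtime_checkable
class Pricer(Protocol):
    """Anything that can price an OptionSpec given spot, rate, and a surface."""
    def price(self, option: OptionSpec, spot: float, rate: float,
              surface: VolSurface) -> float: ...


@runtime_checkable
class PortfolioOptimizer(Protocol):
    """Maps a (n_scenarios, n_assets) return matrix to portfolio weights."""
    def weights(self, scenario_returns: np.ndarray) -> np.ndarray: ...


class FlatSurface:
    """Trivial constant-vol surface, useful as a test double for other groups."""
    def __init__(self, sigma: float):
        self.sigma = sigma

    def iv(self, K: float, T: float) -> float:
        return self.sigma
```

Because `Protocol` uses structural typing, `FlatSurface` satisfies `VolSurface` without inheriting from it, so a pricing group could test against a stub before the real surface module exists.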
Suggested folder layout
One repo. Each group owns one module folder + tests + a demo notebook.
Project briefs (quick scan)
1) Volatility Surface + Calibration
Build an implied-vol surface from public option chains; calibrate simple models; expose iv(K,T).
2) American Option Pricing Engines
Implement binomial-tree and LSMC pricers for American puts/calls; compare accuracy vs compute.
3) Dynamic Hedging + P&L Attribution
Backtest delta hedging with transaction costs; analyze hedging error distributions and drivers.
4) Portfolio Optimization under Tail Risk
Mean–variance vs CVaR optimization; rolling backtests; constraint handling; solver interface.
5) Factor & Macro Modeling
CAPM + multifactor regressions with macro series; exposure analysis; expected return forecasts.
6) Strategy Module: Pairs Trading with Risk Controls
Design a pairs framework with costs, sizing, and risk limits; validate robustness.
7) GenAI “Copilot” for Explainable Analytics
Use open LLMs to generate faithful reports grounded in computed outputs + guardrails.
Open-source data sources we will consider
Students should not depend on proprietary datasets. These are the recommended public sources:
- Yahoo Finance via yfinance: equities/ETFs, OHLCV, some corporate actions, and option chains.
- FRED (Federal Reserve Economic Data): macro series (rates, inflation, unemployment, spreads).
- Stooq: free historical price data (often used as a backup source for equities/indices/FX).
- World Bank / IMF public datasets: macro indicators (optional for macro-heavy projects).
- ECB / other central bank public feeds: policy rates and selected macro series (optional).
- Open-source libraries (not data sources, but critical): NumPy/pandas/statsmodels; CVXPY/SciPy for optimization; QuantLib for pricing engines; Matplotlib/Plotly for visualization.
- Open-source LLMs via Hugging Face: small-to-mid models suitable for CPU inference; optional quantized models.
Note: availability of option chains and depth can vary by ticker/date; groups should implement a fallback plan (switch underlyings / cache snapshots / simplify the surface).
Detailed project statements
Each group must deliver: (1) a runnable module, (2) a demo notebook, (3) tests for at least 3 invariants, and (4) a short report explaining assumptions, validation, and limitations.
Project 1 — Volatility Surface + Calibration Module
Goal: Convert public option chain data into a usable implied-volatility surface and provide calibration tools for simple models.
- Inputs: option chains (strikes, expiries, mid prices), underlying price, risk-free proxy.
- Core tasks:
- Compute implied volatility per (K, T) using a robust root-finder.
- Build an interpolated surface iv(K,T) with basic smoothing/regularization.
- Calibrate at least one parametric model (e.g., constant vol baseline; optional Heston/local-vol proxy).
- Quantify fit errors (pricing error, IV error) and stability across dates.
- Optimization component: parameter calibration via least squares with regularization.
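- Validation: put–call parity checks, monotonicity in strike (call prices non-increasing, put prices non-decreasing), and other static-arbitrage sanity checks (e.g., convexity in strike).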
- Deliverables: surface builder API, calibration notebook, error diagnostics plots.
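The implied-vol step can be sketched as below, assuming Black–Scholes on a non-dividend-paying underlying and SciPy's `brentq` as the robust root-finder; `bs_price`, `implied_vol`, and `parity_residual` are hypothetical helper names. Quotes outside no-arbitrage bounds return `None` so they can be flagged instead of forced into the fit. Note that listed US equity options (as returned by yfinance) are American-style, so European parity only holds approximately for them.

```python
import math

from scipy.optimize import brentq
from scipy.stats import norm


def bs_price(S, K, T, r, sigma, is_call=True):
    """Black-Scholes price of a European option on a non-dividend-paying asset."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    if is_call:
        return S * norm.cdf(d1) - K * math.exp(-r * T) * norm.cdf(d2)
    return K * math.exp(-r * T) * norm.cdf(-d2) - S * norm.cdf(-d1)


def implied_vol(price, S, K, T, r, is_call=True, lo=1e-6, hi=5.0):
    """Invert bs_price with Brent's method; return None when no solution is bracketed."""
    df_K = K * math.exp(-r * T)
    lower = max(S - df_K, 0.0) if is_call else max(df_K - S, 0.0)
    upper = S if is_call else df_K
    if not (lower < price < upper):
        return None  # quote violates no-arbitrage bounds: flag it instead of fitting it
    f = lambda s: bs_price(S, K, T, r, s, is_call) - price
    if f(lo) * f(hi) > 0:
        return None  # root lies outside [lo, hi]
    return brentq(f, lo, hi, xtol=1e-10)


def parity_residual(call, put, S, K, T, r):
    """European put-call parity residual C - P - (S - K e^{-rT}); near 0 for clean quotes."""
    return call - put - (S - K * math.exp(-r * T))
```

For S = K = 100, T = 1, r = 5% and 20% vol, `bs_price` gives roughly 10.45 for the call, and `implied_vol` recovers 0.2 from that price.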
Project 2 — American Option Pricing (Trees + LSMC)
Goal: Provide discrete-time pricing engines for American options and compare methods.
- Methods: Cox–Ross–Rubinstein (CRR) or Jarrow–Rudd (JR) binomial tree; Least Squares Monte Carlo (LSMC, Longstaff–Schwartz).
- Core tasks:
- Implement American put pricing and exercise boundary extraction from the tree.
- Implement LSMC with configurable basis functions and regression choices.
- Run convergence experiments (tree steps vs MC paths; basis complexity).
- ML component: regression basis selection and hyperparameter study.
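- Validation: compare to European prices (American ≥ European in all cases; the American put carries an early-exercise premium, while an American call on a non-dividend-paying underlying equals the European call).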
- Deliverables: pricer API, comparison notebook, runtime/accuracy tradeoff table.
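A minimal CRR tree for the put, assuming no dividends, might look like the following; `crr_put` is a hypothetical name. Setting `american=False` gives the European price on the same tree, which makes the American ≥ European check direct, and the per-step critical price gives the exercise boundary.

```python
import numpy as np


def crr_put(S0, K, T, r, sigma, steps, american=True):
    """Price a put on a Cox-Ross-Rubinstein tree (no dividends).

    Returns (price, boundary): boundary[i] is the highest stock price at step i
    where immediate exercise beats continuation (NaN if none, or if european).
    """
    dt = T / steps
    u = np.exp(sigma * np.sqrt(dt))
    d = 1.0 / u
    disc = np.exp(-r * dt)
    p = (np.exp(r * dt) - d) / (u - d)          # risk-neutral up probability
    j = np.arange(steps + 1)                      # number of up moves at maturity
    V = np.maximum(K - S0 * u**j * d**(steps - j), 0.0)
    boundary = np.full(steps, np.nan)
    for i in range(steps - 1, -1, -1):
        j = np.arange(i + 1)
        S = S0 * u**j * d**(i - j)
        # Node (i, j) has up child (i+1, j+1) and down child (i+1, j)
        cont = disc * (p * V[1:i + 2] + (1 - p) * V[:i + 1])
        if american:
            exercise = np.maximum(K - S, 0.0)
            early = exercise > cont
            if early.any():
                boundary[i] = S[early].max()
            V = np.maximum(cont, exercise)
        else:
            V = cont
    return float(V[0]), boundary
```

Running it for increasing `steps` (e.g., 50, 100, 500) against wall-clock time is one way to fill the runtime/accuracy table.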
Project 3 — Dynamic Hedging + Hedging Error Attribution
What are model Greeks? Greeks (delta, gamma, vega, theta, rho) are partial derivatives of an option's price with respect to market factors. Delta measures price sensitivity to the underlying; gamma measures delta's sensitivity; vega measures sensitivity to volatility; theta measures time decay; rho measures rate sensitivity. In hedging, Greeks guide position sizing and rebalancing frequency.
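For the Black–Scholes baseline used in this project, the Greeks have closed forms. A minimal sketch, assuming a European option on a non-dividend-paying asset; `bs_greeks` is a hypothetical helper name, theta is per year, and vega/rho are per unit (1.00 = 100 percentage points) change.

```python
import math

from scipy.stats import norm


def bs_greeks(S, K, T, r, sigma, is_call=True):
    """Closed-form Black-Scholes Greeks for a European option (no dividends)."""
    sqrtT = math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrtT)
    d2 = d1 - sigma * sqrtT
    pdf = norm.pdf(d1)
    disc = math.exp(-r * T)
    gamma = pdf / (S * sigma * sqrtT)        # same for calls and puts
    vega = S * pdf * sqrtT                   # same for calls and puts
    if is_call:
        delta = norm.cdf(d1)
        theta = -S * pdf * sigma / (2 * sqrtT) - r * K * disc * norm.cdf(d2)
        rho = K * T * disc * norm.cdf(d2)
    else:
        delta = norm.cdf(d1) - 1.0
        theta = -S * pdf * sigma / (2 * sqrtT) + r * K * disc * norm.cdf(-d2)
        rho = -K * T * disc * norm.cdf(-d2)
    return {"delta": delta, "gamma": gamma, "vega": vega, "theta": theta, "rho": rho}
```

For an at-the-money one-year call with r = 5% and 20% vol, delta is about 0.637 and vega about 37.5 per unit of vol.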
Goal: Build a hedging backtester that connects model Greeks to realized P&L.
- Core tasks:
- Implement delta hedging for a European option (baseline) using BS delta.
- Support discrete re-hedging (daily/weekly) and transaction cost models (fixed bps per trade).
- Compute hedging P&L and decompose drivers (rebalancing frequency, vol misspecification, jumps/regimes).
- Optimization component: choose re-hedge frequency to minimize tail risk of hedging error under cost.
- Risk component: hedging error VaR/CVaR over backtest windows.
- Deliverables: strategy module + demo notebook showing hedging error distributions and conclusions.
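The baseline backtest can be sketched on simulated paths before moving to historical data. This is a minimal sketch under loudly stated assumptions: GBM paths with risk-neutral drift, constant rate, no dividends, proportional costs of `cost_bps` on traded notional, and no cost to unwind at expiry. Setting `sigma_true != sigma_model` probes vol misspecification; `hedge_pnl` and `var_cvar` are hypothetical names.

```python
import numpy as np
from scipy.stats import norm


def bs_call(S, K, tau, r, sigma):
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    return S * norm.cdf(d1) - K * np.exp(-r * tau) * norm.cdf(d1 - sigma * np.sqrt(tau))


def bs_call_delta(S, K, tau, r, sigma):
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    return norm.cdf(d1)


def hedge_pnl(S0=100.0, K=100.0, T=0.25, r=0.02, sigma_true=0.2, sigma_model=0.2,
              n_steps=63, rehedge_every=1, cost_bps=0.0, n_paths=10_000, seed=0):
    """Sell one call at the model price, delta-hedge it, return terminal P&L per path."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    log_steps = (r - 0.5 * sigma_true**2) * dt + sigma_true * np.sqrt(dt) * z
    S = S0 * np.exp(np.hstack([np.zeros((n_paths, 1)), np.cumsum(log_steps, axis=1)]))
    c = cost_bps * 1e-4
    delta = bs_call_delta(S[:, 0], K, T, r, sigma_model)
    # Premium received, minus initial stock purchase and its transaction cost
    cash = bs_call(S0, K, T, r, sigma_model) - delta * S[:, 0] - c * np.abs(delta) * S[:, 0]
    for k in range(1, n_steps):
        cash = cash * np.exp(r * dt)                 # accrue interest on the cash account
        if k % rehedge_every == 0:
            new_delta = bs_call_delta(S[:, k], K, T - k * dt, r, sigma_model)
            trade = new_delta - delta
            cash -= trade * S[:, k] + c * np.abs(trade) * S[:, k]
            delta = new_delta
    cash = cash * np.exp(r * dt)
    # Liquidate the stock position and pay the option payoff at expiry
    return cash + delta * S[:, -1] - np.maximum(S[:, -1] - K, 0.0)


def var_cvar(pnl, alpha=0.95):
    """Empirical VaR and CVaR of the hedging loss (loss = -P&L)."""
    losses = -np.asarray(pnl)
    var = np.quantile(losses, alpha)
    return var, losses[losses >= var].mean()
```

Sweeping `rehedge_every` and `cost_bps` and recording `var_cvar` for each setting gives the frequency-versus-cost trade-off the optimization component asks for.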
Project 4 — Portfolio Optimization: Mean–Variance vs CVaR
Goal: Provide an optimization layer with consistent APIs and compare risk objectives.
- Core tasks:
- Implement mean–variance optimizer with constraints (long-only, weight caps, turnover constraints optional).
- Implement CVaR optimizer using scenario returns (LP form) and comparable constraints.
- Rolling-window backtest and evaluation (vol, drawdown, tail loss, turnover, costs).
- Optimization component: convex optimization + constraint handling.
- Validation: sanity checks (weights sum to 1, constraint satisfaction, stable solutions).
- Deliverables: optimizer API, comparative notebook, discussion: when CVaR helps/hurts.
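The CVaR optimizer's LP form is the Rockafellar–Uryasev formulation. A minimal sketch using SciPy's `linprog` with the HiGHS solver (CVXPY would work equally well), assuming long-only weights, a per-asset cap, and an optional minimum expected return; `min_cvar_weights` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import linprog


def min_cvar_weights(R, alpha=0.95, w_max=1.0, target_return=None):
    """Long-only minimum-CVaR weights via the Rockafellar-Uryasev linear program.

    R has shape (n_scenarios, n_assets); the loss in scenario s is -R[s] @ w.
    Returns (weights, optimal CVaR of the loss at level alpha).
    """
    S, n = R.shape
    # Decision vector x = [w (n), zeta (1), u (S)]; zeta ends up at the VaR level.
    c = np.concatenate([np.zeros(n), [1.0], np.full(S, 1.0 / ((1.0 - alpha) * S))])
    # u_s >= -R[s] @ w - zeta, written as  -R[s] @ w - zeta - u_s <= 0
    A_ub = np.hstack([-R, -np.ones((S, 1)), -np.eye(S)])
    b_ub = np.zeros(S)
    if target_return is not None:
        # mean(R) @ w >= target, written as  -mean(R) @ w <= -target
        row = np.concatenate([-R.mean(axis=0), np.zeros(1 + S)])
        A_ub = np.vstack([A_ub, row])
        b_ub = np.append(b_ub, -target_return)
    A_eq = np.concatenate([np.ones(n), np.zeros(1 + S)])[None, :]   # fully invested
    bounds = [(0.0, w_max)] * n + [(None, None)] + [(0.0, None)] * S
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(f"CVaR LP failed: {res.message}")
    return res.x[:n], res.fun
```

Feeding the same scenario matrix `R` to the mean–variance optimizer keeps the comparison like-for-like; a turnover limit could be added with extra auxiliary variables in the same LP.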
Project 5 — Factor & Macro Modeling (CAPM + Multifactor)
Goal: Provide factor exposure estimation and simple expected return modeling using public macro data.
- Core tasks:
- Estimate CAPM beta/alpha for a basket of assets; rolling stability analysis.
- Extend to multifactor regression using macro series (FRED) + market proxies.
- Out-of-sample evaluation (walk-forward): forecast errors and economic usefulness (tilts/overlays).
- ML component: regularized regression (ridge/lasso) + model selection.
- Deliverables: factor API, notebook with exposure plots and a clear bias/overfitting discussion.
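CAPM estimation reduces to an OLS regression of excess returns. A minimal sketch, assuming returns are already expressed in excess of a risk-free proxy (e.g., a short-rate series from FRED, chosen by the group); `capm_alpha_beta` and `rolling_beta` are hypothetical names.

```python
import numpy as np
import pandas as pd


def capm_alpha_beta(asset_excess, market_excess):
    """OLS fit of r_i - r_f = alpha + beta * (r_m - r_f) + eps; returns (alpha, beta)."""
    m = np.asarray(market_excess, dtype=float)
    X = np.column_stack([np.ones_like(m), m])
    coef, *_ = np.linalg.lstsq(X, np.asarray(asset_excess, dtype=float), rcond=None)
    return coef[0], coef[1]


def rolling_beta(asset_excess: pd.Series, market_excess: pd.Series, window: int = 60) -> pd.Series:
    """Trailing-window beta = Cov(r_i, r_m) / Var(r_m); NaN until the window fills."""
    cov = asset_excess.rolling(window).cov(market_excess)
    var = market_excess.rolling(window).var()
    return cov / var
```

The rolling estimate only uses data up to each date, so it can be used directly in a walk-forward evaluation without look-ahead; the multifactor extension adds macro columns to `X`.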
Project 6 — Pairs Trading Strategy Framework with Risk Controls
What is a pairs framework? A pairs strategy trades two assets whose prices have historically moved together (ideally cointegrated, not merely correlated) when their spread temporarily widens. It goes long the underperformer and short the outperformer, betting that the spread mean-reverts. Risk controls keep position sizes rational and cap losses. Success depends on finding stable pairs, reliable signals, and robust execution after costs.
Goal: Build a strategy module that stresses integration: data → signals → sizing → risk → backtest.
- Core tasks:
- Select/identify pairs (sector ETFs, similar equities); compute spread and stationarity diagnostics.
- Signal generation (z-score mean reversion) + position sizing (vol targeting).
- Add risk controls (max leverage, stop-loss/time stop, VaR limits) and transaction costs.
- Evaluate robustness: subperiods, parameter sweeps, stress periods.
- Optimization component: tune entry/exit thresholds and sizing under constraints via walk-forward (time-series) cross-validation.
- Deliverables: strategy API, backtest notebook, honest “does it survive costs & risk?” conclusions.
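The spread and z-score signal can be sketched as follows, assuming a hedge ratio estimated by OLS on log prices over a separate formation window and a simple entry/exit state machine; `hedge_ratio`, `zscore_positions`, `entry_z`, and `exit_z` are hypothetical names. Sizing, stop-losses, time stops, and costs are deliberately left out of this sketch.

```python
import numpy as np
import pandas as pd


def hedge_ratio(y_form: pd.Series, x_form: pd.Series) -> float:
    """OLS slope of log(y) on log(x), estimated on a formation window only."""
    return float(np.polyfit(np.log(x_form), np.log(y_form), 1)[0])


def zscore_positions(spread: pd.Series, lookback=60, entry_z=2.0, exit_z=0.5):
    """Rolling z-score of the spread and a simple entry/exit state machine.

    +1 = long spread (long y, short beta*x), -1 = short spread, 0 = flat.
    Positions are decided on bar t's close, so lag them one bar before computing P&L.
    """
    z = (spread - spread.rolling(lookback).mean()) / spread.rolling(lookback).std()
    pos = np.zeros(len(z))
    state = 0
    for t, zt in enumerate(z.to_numpy()):
        if np.isnan(zt):
            state = 0                      # not enough history yet
        elif state == 0 and zt > entry_z:
            state = -1                     # spread rich: short y, long beta*x
        elif state == 0 and zt < -entry_z:
            state = 1                      # spread cheap: long y, short beta*x
        elif state != 0 and abs(zt) < exit_z:
            state = 0                      # spread has reverted: close out
        pos[t] = state
    return z, pd.Series(pos, index=spread.index)
```

In use, the spread would be `np.log(y) - beta * np.log(x)` on the trading window, with `beta` taken from `hedge_ratio` on the preceding formation window to avoid look-ahead.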
Project 7 — GenAI Copilot for Explainable Quant Analytics (Open Models)
Goal: Produce an explainability layer that generates faithful reports grounded in computed outputs.
- Core tasks:
- Design templates and “formula cards” so the model references computed numbers (not invented math).
- Build a small pipeline: user query → run analytics → generate report with citations to computed fields.
- Implement guardrails: refuse on missing inputs; unit tests for key computations; hallucination checks.
- Evaluate quality with a rubric: faithfulness, completeness, clarity, consistency.
- GenAI component: open-source models via Hugging Face + light evaluation harness.
- Deliverables: demo notebook or simple app that produces a final “portfolio + options + risk” report.
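Two of the guardrails (refusing on missing inputs, and a hallucination check for numbers) can be sketched without any model in the loop. This is a minimal sketch assuming the computed outputs arrive as a flat dict of floats and that every number in the generated report must match one of them within a small relative tolerance; percentages are converted to fractions first. Function names are hypothetical, and numbers such as dates would need to be whitelisted or added to the dict.

```python
import math
import re

# Numbers not embedded in identifiers such as "P1"; optional thousands separators and a trailing %.
NUM_RE = re.compile(r"(?<![\w.])-?\d+(?:,\d{3})*(?:\.\d+)?%?")


def extract_numbers(text):
    """Pull numeric literals out of generated text; percentages become fractions."""
    values = []
    for tok in NUM_RE.findall(text):
        v = float(tok.rstrip("%").replace(",", ""))
        values.append(v / 100 if tok.endswith("%") else v)
    return values


def require_inputs(computed, required):
    """Guardrail: refuse to generate a report when any required field is missing."""
    missing = [k for k in required if computed.get(k) is None]
    if missing:
        raise ValueError(f"refusing to report, missing inputs: {missing}")


def unsupported_numbers(report, computed, rel_tol=0.005):
    """Return numbers in the report that match no computed field within rel_tol."""
    allowed = [float(v) for v in computed.values() if v is not None]
    return [v for v in extract_numbers(report)
            if not any(math.isclose(v, a, rel_tol=rel_tol, abs_tol=1e-12) for a in allowed)]
```

A non-empty result from `unsupported_numbers` can fail the faithfulness criterion of the rubric automatically, before any human grading.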
Integration requirement (recommended)
To ensure a true ecosystem, require a final integrated demo notebook that chains the modules:
- Build vol surface (P1) → price options (P1/P2) → hedge (P3)
- Construct a portfolio overlay with CVaR constraints (P4)
- Explain exposures via factors/macro (P5)
- Add a small “alpha sleeve” via pairs strategy (P6)
- Generate a final narrative report with copilot (P7)