Problem Description
Solution
Here is a complete, runnable QuantConnect algorithm that demonstrates how to calculate the annualized historical volatility of a stock using NumPy.
Strategy Logic
- Data Ingestion: We subscribe to SPY with Daily resolution.
- Historical Data: In the on_data method, we request the last 30 days of closing price data.
- Log Returns: We calculate the natural logarithm of the daily price ratios ($\ln(\frac{P_t}{P_{t-1}})$). Log returns are preferred in quantitative finance over simple percentage returns because they are time-additive.
- Standard Deviation: We use np.std to find the volatility of these returns.
- Annualization: We multiply the daily volatility by the square root of 252 (the approximate number of trading days in a year) to get the annualized volatility.
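Stripped of the QuantConnect scaffolding, the steps above can be sketched with plain NumPy on a small hypothetical price series (the prices below are made up for illustration, not real SPY data):

```python
import numpy as np

# Hypothetical daily closing prices (illustrative values only)
closes = np.array([100.0, 101.5, 100.8, 102.3, 103.0, 101.9, 104.2])

# Log returns: ln(P_t / P_{t-1})
log_returns = np.log(closes[1:] / closes[:-1])

# Sample standard deviation of the daily log returns
daily_vol = np.std(log_returns, ddof=1)

# Annualize: volatility scales with the square root of time
annualized_vol = daily_vol * np.sqrt(252)

print(f"Daily vol: {daily_vol:.4f}, Annualized vol: {annualized_vol:.4f}")
```

Seven prices yield six returns, which is why the algorithm below requests `lookback + 1` bars to compute `lookback` returns.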
Python Implementation
# region imports
from AlgorithmImports import *
import numpy as np
# endregion


class HistoricalVolatilityAlgorithm(QCAlgorithm):

    def initialize(self):
        """
        Initialize the algorithm settings, cash, and security subscriptions.
        """
        self.set_start_date(2023, 1, 1)  # Set Start Date
        self.set_end_date(2024, 1, 1)    # Set End Date
        self.set_cash(100000)            # Set Strategy Cash

        # Add Equity with Daily resolution
        self.symbol = self.add_equity("SPY", Resolution.DAILY).symbol

        # Define the lookback period for volatility calculation (e.g., 30 days)
        self.lookback = 30

    def on_data(self, data: Slice):
        """
        Event handler for new data. Calculates volatility every day.
        """
        # Ensure the symbol is in the current data slice
        if not data.contains_key(self.symbol):
            return

        # 1. Get Historical Data
        # We request lookback + 1 days to calculate 'lookback' number of returns
        history = self.history(self.symbol, self.lookback + 1, Resolution.DAILY)

        # Check that we have enough data
        if history.empty or len(history) < self.lookback + 1:
            return

        # Extract the 'close' column.
        # history.loc[self.symbol] drops the symbol index level, leaving a time series
        closes = history.loc[self.symbol]['close']

        # 2. Calculate Log Returns
        # Formula: R_t = ln(P_t / P_{t-1})
        # We use numpy log and pandas shift to vectorize the operation
        log_returns = np.log(closes / closes.shift(1)).dropna()

        # 3. Calculate Standard Deviation (Volatility)
        # ddof=1 uses the unbiased estimator (sample standard deviation)
        daily_volatility = np.std(log_returns, ddof=1)

        # 4. Annualize the Volatility
        # Multiply by sqrt(252) for daily data
        annualized_volatility = daily_volatility * np.sqrt(252)

        # Log the result
        self.log(f"{self.time} :: Annualized Volatility (30-day): {annualized_volatility:.4f}")

        # Example Trading Logic:
        # If volatility is high (> 20%), we might reduce position size (not implemented here)
Key NumPy Functions Used
- np.log(array): Calculates the natural logarithm. Used here to transform price ratios into log returns.
- np.std(array, ddof=1): Calculates the standard deviation. ddof=1 (Delta Degrees of Freedom) computes the sample standard deviation rather than the population standard deviation.
- np.sqrt(number): Calculates the square root. Used to scale the daily volatility up to an annual timeframe.
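A quick sanity check of these three functions on hand-picked values:

```python
import numpy as np

print(np.log(np.e))                      # ln(e) = 1.0
print(np.sqrt(252))                      # annualization factor, ~15.8745
print(np.std([1.0, 2.0, 3.0], ddof=1))   # sample standard deviation = 1.0
```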
Q&A: Quantitative Analysis with NumPy
Q: Why use Log Returns instead of Simple Percentage Returns for volatility?
A: Log returns are theoretically preferred because they are time-additive and follow a normal distribution more closely than simple returns, which is a key assumption in many volatility models (like Black-Scholes). For short time steps (like daily), the difference is negligible, but log returns are the mathematical standard.
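Time-additivity is easy to verify numerically: daily log returns sum to the log return over the whole period, while simple returns do not (the prices below are arbitrary):

```python
import numpy as np

closes = np.array([100.0, 102.0, 99.0, 105.0])

log_returns = np.log(closes[1:] / closes[:-1])
simple_returns = closes[1:] / closes[:-1] - 1

# Log returns add up to the total log return...
total_log = np.log(closes[-1] / closes[0])
print(np.isclose(log_returns.sum(), total_log))        # True

# ...but simple returns do not add up to the total simple return
total_simple = closes[-1] / closes[0] - 1
print(np.isclose(simple_returns.sum(), total_simple))  # False
```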
Q: Why multiply by np.sqrt(252)?
A: Volatility scales with the square root of time. Since standard deviation is a measure of dispersion, and variance scales linearly with time (assuming independent returns), the standard deviation scales with the square root of time. There are typically 252 trading days in a year.
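The square-root-of-time rule can be checked with simulated i.i.d. daily returns: the standard deviation of 252-day sums should land close to sqrt(252) times the daily standard deviation (the 1% daily volatility is an arbitrary choice for the simulation):

```python
import numpy as np

rng = np.random.default_rng(0)
daily_vol = 0.01  # assume 1% daily volatility

# Simulate many independent "years" of 252 daily returns each
daily_returns = rng.normal(0.0, daily_vol, size=(100_000, 252))
annual_returns = daily_returns.sum(axis=1)

empirical = annual_returns.std(ddof=1)
theoretical = daily_vol * np.sqrt(252)
print(f"empirical: {empirical:.4f}, theoretical: {theoretical:.4f}")
```

Note the independence assumption: if returns are autocorrelated, variance no longer scales linearly with time and this rule breaks down.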
Q: Can I use self.History with Minute resolution?
A: Yes. If you use Minute resolution, you must adjust the annualization factor. There are typically 390 minutes in a trading day (6.5 hours). The annualization factor would be np.sqrt(252 * 390).
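The adjusted factor for minute bars can be sketched as follows (assuming 390 regular-session minutes per day; extended-hours data would change this number):

```python
import numpy as np

TRADING_DAYS = 252
MINUTES_PER_DAY = 390  # 6.5-hour regular session

minute_factor = np.sqrt(TRADING_DAYS * MINUTES_PER_DAY)
print(f"Minute annualization factor: {minute_factor:.2f}")  # ~313.5

# e.g. a hypothetical 0.05% per-minute volatility annualizes to:
minute_vol = 0.0005
print(f"Annualized: {minute_vol * minute_factor:.4f}")
```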
Q: How does ddof=1 affect the calculation?
A: ddof=1 divides the sum of squared deviations by $(N-1)$ instead of $N$. This provides an unbiased estimator for the variance of a population when you are calculating it from a sample (the historical data).
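The effect of ddof is easy to see on a small sample (the return values are arbitrary):

```python
import numpy as np

returns = np.array([0.01, -0.02, 0.015, 0.005])

pop_std = np.std(returns, ddof=0)     # divides squared deviations by N = 4
sample_std = np.std(returns, ddof=1)  # divides squared deviations by N - 1 = 3

print(pop_std < sample_std)  # True: dividing by N-1 gives a larger estimate
# The two differ exactly by the factor sqrt(N / (N - 1)):
print(np.isclose(sample_std, pop_std * np.sqrt(4 / 3)))  # True
```

With a 30-day lookback the difference is small, but ddof=1 is the standard choice when estimating volatility from a sample.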