
How to Implement a Statistical Arbitrage Pairs Trading Strategy in QuantConnect

QuantConnect | Published on: 2025/12/10
The following content was generated by EasyQuant.

Problem Description

Can you help me write a pairs trading strategy for two correlated stocks?

Solution

Here is a complete Pairs Trading strategy implemented in Python for the QuantConnect LEAN engine.

Strategy Overview

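This strategy uses a Mean Reversion approach that assumes two correlated assets, Pepsi (PEP) and Coca-Cola (KO), are cointegrated: a linear combination of their prices (the spread) tends to revert to a stable mean. The algorithm relies on this assumption and does not test it.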

  1. Data Ingestion: We subscribe to PEP and KO at Daily resolution.
  2. Statistical Calculation:
    • We use a rolling window (e.g., 60 days) of historical closing prices.
    • We calculate the Hedge Ratio using Ordinary Least Squares (OLS) regression ($Price_Y = \beta \times Price_X + \alpha$), with Y = PEP and X = KO.
    • We derive the Spread as the regression residual: $Spread = Price_Y - (\beta \times Price_X + \alpha)$.
    • We calculate the Z-Score of the current spread relative to its historical mean and standard deviation.
  3. Entry Logic:
    • Short the Spread (Sell Y, Buy X) if Z-Score > 2.0 (Spread is statistically too high).
    • Long the Spread (Buy Y, Sell X) if Z-Score < -2.0 (Spread is statistically too low).
  4. Exit Logic:
    • Close all positions when the Z-Score reverts to 0 (Mean Reversion).
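The statistical steps and entry/exit rules above can be sketched outside LEAN with NumPy alone. This is a minimal sketch, not the algorithm itself: the function names `hedge_ratio_and_zscore` and `target_position` are hypothetical, the +1/-1/0 position convention is an assumption for illustration, and `np.polyfit` stands in for the `statsmodels` OLS call (both fit the same least-squares line).

```python
import numpy as np


def hedge_ratio_and_zscore(y: np.ndarray, x: np.ndarray) -> tuple:
    """Fit y = beta * x + alpha by least squares; return (beta, alpha, latest z-score)."""
    # With deg=1, np.polyfit returns coefficients highest power first: [slope, intercept]
    beta, alpha = np.polyfit(x, y, 1)
    # Spread = regression residual over the whole lookback window
    spread = y - (beta * x + alpha)
    std = spread.std()  # population std, like np.std in the algorithm
    if np.isclose(std, 0.0):
        return float(beta), float(alpha), 0.0
    z = (spread[-1] - spread.mean()) / std
    return float(beta), float(alpha), float(z)


def target_position(z: float, position: int, entry: float = 2.0, exit_level: float = 0.0) -> int:
    """Desired spread position: +1 long spread (buy Y, sell X), -1 short spread, 0 flat."""
    if position == 0:
        if z > entry:
            return -1  # spread too high: sell Y, buy X
        if z < -entry:
            return 1   # spread too low: buy Y, sell X
        return 0
    if position == -1 and z <= exit_level:
        return 0       # short spread has reverted to the mean
    if position == 1 and z >= exit_level:
        return 0       # long spread has reverted to the mean
    return position    # otherwise keep holding


# Synthetic example: Y tracks 2 * X + 1 with small alternating noise, then jumps on the last day
x = np.arange(60, dtype=float)
noise = np.where(np.arange(60) % 2 == 0, 0.5, -0.5)
y = 2.0 * x + 1.0 + noise
y[-1] += 5.0
beta, alpha, z = hedge_ratio_and_zscore(y, x)
print(f"beta={beta:.3f} z={z:.2f} target={target_position(z, 0)}")
```

Because the Z-Score subtracts the window mean of the spread, including or omitting α in the spread shifts every value by the same constant and leaves the Z-Score unchanged.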

Python Implementation

# region imports
from AlgorithmImports import *
import numpy as np
import statsmodels.api as sm
# endregion

class PairsTradingAlgorithm(QCAlgorithm):

    def initialize(self):
        # 1. Set Setup Parameters
        self.set_start_date(2018, 1, 1)
        self.set_end_date(2023, 1, 1)
        self.set_cash(100000)

        # 2. Add Assets (Pepsi and Coca-Cola)
        self.pep = self.add_equity("PEP", Resolution.DAILY).symbol
        self.ko = self.add_equity("KO", Resolution.DAILY).symbol
        
        # 3. Strategy Parameters
        self.lookback = 60          # Days for regression calculation
        self.entry_threshold = 2.0  # Z-Score to enter trade
        self.exit_threshold = 0.0   # Z-Score to exit trade (mean reversion)
        
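        # Warm-up: skip trading for the first `lookback` bars. on_data fetches its own
        # window with self.history(), so this is a safety margin rather than a requirement.
        self.set_warm_up(self.lookback)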

    def on_data(self, data: Slice):
        # Ensure we are not warming up and data exists for both symbols
        if self.is_warming_up:
            return
        
        if not (data.contains_key(self.pep) and data.contains_key(self.ko)):
            return

        # 1. Get Historical Data
        history = self.history([self.pep, self.ko], self.lookback, Resolution.DAILY)
        
        # Check if history is empty or incomplete
        if history.empty or 'close' not in history.columns:
            return
            
        # Unstack to get a DataFrame where columns are symbols and rows are time,
        # then drop dates on which either symbol has no bar (NaNs would break the OLS fit)
        df = history['close'].unstack(level=0).dropna()
        
        # Ensure we have enough data points after unstacking
        if len(df) < self.lookback:
            return

        # 2. Perform OLS Regression to find Hedge Ratio (Beta)
        # Y = PEP, X = KO
        y_vals = df[self.pep].values
        x_vals = df[self.ko].values
        
        # Add constant for OLS (Intercept)
        x_with_const = sm.add_constant(x_vals)
        
        model = sm.OLS(y_vals, x_with_const).fit()
        beta = model.params[1]
        intercept = model.params[0]

        # 3. Calculate Spread and Z-Score
        # Spread = Y - (Beta * X + Intercept)
        # We calculate the spread series for the whole lookback window to get stats
        spread_series = y_vals - (beta * x_vals + intercept)
        
        mean_spread = np.mean(spread_series)
        std_spread = np.std(spread_series)
        
        current_spread = spread_series[-1]
        
        if std_spread == 0:
            return

        z_score = (current_spread - mean_spread) / std_spread

        # 4. Execution Logic
        
        # Check if we have open positions
        invested = self.portfolio.invested
        
        # --- Entry Logic ---
        if not invested:
            # Short the Spread: Spread is too high, expect it to drop.
            # Sell PEP (Y), Buy KO (X)
            if z_score > self.entry_threshold:
                self.set_holdings(self.pep, -0.5)
                self.set_holdings(self.ko, 0.5)
                self.debug(f"Entry Short Spread | Z-Score: {z_score:.2f}")

            # Long the Spread: Spread is too low, expect it to rise.
            # Buy PEP (Y), Sell KO (X)
            elif z_score < -self.entry_threshold:
                self.set_holdings(self.pep, 0.5)
                self.set_holdings(self.ko, -0.5)
                self.debug(f"Entry Long Spread | Z-Score: {z_score:.2f}")

        # --- Exit Logic ---
        else:
            # We exit when the spread reverts to the mean (crosses 0)
            # Depending on direction, we check if it crossed the exit threshold
            
            # If we are Short the Spread (Short PEP, Long KO)
            if self.portfolio[self.pep].is_short and z_score <= self.exit_threshold:
                self.liquidate()
                self.debug(f"Exit Short Spread (Mean Reversion) | Z-Score: {z_score:.2f}")
            
            # If we are Long the Spread (Long PEP, Short KO)
            elif self.portfolio[self.pep].is_long and z_score >= self.exit_threshold:
                self.liquidate()
                self.debug(f"Exit Long Spread (Mean Reversion) | Z-Score: {z_score:.2f}")

Key Components Explained

  1. initialize:

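    • We select PEP and KO as our pair. Both are large consumer-staples beverage companies whose prices have historically tended to move together; the strategy assumes, but does not verify, that they are cointegrated.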
    • self.lookback = 60 defines the rolling window size. A shorter window makes the strategy more reactive but noisier; a longer window makes it smoother but slower.
  2. on_data:

    • Data Handling: We use self.history to fetch the last 60 daily closing prices for both symbols. The .unstack(level=0) call moves the symbol level of the returned (symbol, time) MultiIndex into the columns, so each column holds one symbol's closing prices.
    • OLS Regression: We use statsmodels (sm.OLS) to dynamically calculate the relationship between the two stocks. This provides the beta (hedge ratio).
    • Z-Score: This normalizes the spread. A Z-Score of +2.0 means the spread is 2 standard deviations above its mean.
  3. Execution:

    • set_holdings: We allocate 50% of capital to the long side and 50% to the short side. This creates a dollar-neutral portfolio (equal dollar amounts long and short), ignoring the hedge ratio β for simplicity.
    • Mean Reversion: The core philosophy is that the Z-Score will eventually return to 0. We liquidate positions when this happens to capture the profit.
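To see what `.unstack(level=0)` does, the sketch below builds a small DataFrame shaped like the (symbol, time) MultiIndex that `self.history()` returns. The prices are made up, and plain ticker strings stand in for LEAN `Symbol` objects, an assumption for illustration only. It also shows what happens when one symbol is missing a bar.

```python
import pandas as pd

# Made-up closes in the (symbol, time) MultiIndex layout; tickers are plain strings here
dates = pd.date_range("2022-01-03", periods=3, freq="D")
index = pd.MultiIndex.from_product([["PEP", "KO"], dates], names=["symbol", "time"])
history = pd.DataFrame({"close": [170.0, 171.0, 172.0, 60.0, 60.5, 61.0]}, index=index)

# unstack(level=0) pivots the symbol level into columns: rows = time, one column per symbol
df = history["close"].unstack(level=0)
print(df)

# If KO is missing a bar, unstacking leaves a NaN in that row ...
gappy = history.drop(index=[("KO", dates[1])])
df_gappy = gappy["close"].unstack(level=0)
# ... so keep only the dates on which both symbols have a close before fitting OLS
aligned = df_gappy.dropna()
print(aligned)
```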

Q&A: Pairs Trading on QuantConnect

Q: Why use Resolution.DAILY instead of MINUTE?
A: Pairs trading relies on statistical relationships that are often more stable on daily timeframes. Using minute data introduces microstructure noise and bid-ask bounce, which can generate false signals unless the strategy is highly optimized for high-frequency execution.

Q: How can I make the portfolio "Beta Neutral" instead of "Dollar Neutral"?
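A: In the code above, we use 0.5 and -0.5 (Dollar Neutral). Here "beta" means the regression hedge ratio β, not market beta. To weight the legs by β, size them in shares rather than equal dollars: for each share of Y (PEP), take the opposite position in β shares of X (KO), which matches the spread definition Y - βX. For example, if β is 1.2, a short-spread trade sells 100 shares of PEP and buys 120 shares of KO.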
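A minimal sketch of hedge-ratio weighted sizing, assuming you size from a fixed capital amount and round to whole shares. The function name `hedge_ratio_shares` and the example prices are hypothetical; in LEAN you would pass the resulting quantities to an order method such as `self.market_order` instead of using `set_holdings` percentages.

```python
import math


def hedge_ratio_shares(capital: float, price_y: float, price_x: float, beta: float) -> tuple:
    """Share counts for a long-spread trade: long Y, short beta shares of X per share of Y.

    Gross exposure is kept at or below `capital`. Negate both numbers for a short-spread trade.
    """
    # Gross dollars needed for one "unit" of the spread: 1 share of Y plus beta shares of X
    cost_per_unit = price_y + abs(beta) * price_x
    units = math.floor(capital / cost_per_unit)
    shares_y = units
    shares_x = -round(beta * units)  # opposite side, rounded to whole shares
    return shares_y, shares_x


# Illustrative prices: PEP at 170, KO at 60, beta = 1.2, 100,000 of capital
print(hedge_ratio_shares(100_000, 170.0, 60.0, 1.2))  # (413, -496)
```

Unlike the 0.5 / -0.5 weights, the two legs are no longer equal in dollars: in this example about 70,210 is long PEP versus 29,760 short KO.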

Q: What happens if the correlation breaks?
A: This is the main risk of pairs trading. If the fundamental relationship between PEP and KO changes (e.g., one company is acquired), the spread may keep diverging instead of reverting. To mitigate this, add stop-loss logic, for example liquidating if the Z-Score moves beyond 4.0 against your position.
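One way to express that stop-loss rule is a direction-aware check, sketched below as a hypothetical helper `should_stop_out`. It assumes a simple position convention (+1 = long the spread, i.e. long PEP and short KO; -1 = short the spread; 0 = flat). In the algorithm, such a check would sit at the top of the exit branch and call `self.liquidate()` when it returns True. The 4.0 level is the example value from the answer above, not a tuned parameter.

```python
def should_stop_out(z: float, position: int, stop: float = 4.0) -> bool:
    """True if an open spread position has diverged past the stop threshold.

    position: +1 = long spread (long PEP, short KO), -1 = short spread, 0 = flat.
    """
    if position == -1:
        return z > stop    # we shorted a high spread and it kept rising
    if position == 1:
        return z < -stop   # we bought a low spread and it kept falling
    return False           # nothing to stop out when flat


print(should_stop_out(4.5, -1), should_stop_out(3.0, -1))
```

After a stop-out, the original entry rule would re-enter on the next bar, because the Z-Score is still beyond 2.0. In practice you would also need a rule that pauses new entries for a while; how long is a design choice the strategy above does not cover.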