
How Do I Import and Use Custom Data (e.g., CSV) in a QuantConnect Backtest?

QuantConnect | Published on: 2025/12/10
The following content is generated by EasyQuant.

Problem Description

How do I import and use my own CSV data for backtesting a trading strategy?

Solution

To import and use your own CSV data in QuantConnect, create a custom class that inherits from PythonData. This class tells the Lean engine where to find your file (get_source) and how to parse each line (reader).

Here is a complete, step-by-step implementation.

Key Steps

  1. Host your CSV: Make the file reachable through a direct URL (e.g., GitHub Raw, Dropbox) or, when running Lean locally, store it in your local data folder.
  2. Define the Class: Create a class inheriting from PythonData.
  3. Implement get_source: Return a SubscriptionDataSource that points to the file.
  4. Implement reader: Parse a single line of the CSV into an instance of your custom class.
  5. Subscribe: Call self.add_data in your algorithm's initialize method.

Full Strategy Code

The following example assumes each CSV row has the columns Date, Open, High, Low, Close, Volume, with dates written as YYYY-MM-DD.

# region imports
from AlgorithmImports import *
from datetime import datetime, timedelta
# endregion

class CustomCSVDataAlgorithm(QCAlgorithm):

    def initialize(self):
        self.set_start_date(2020, 1, 1)
        self.set_end_date(2023, 1, 1)
        self.set_cash(100000)

        # Subscribe to the custom data.
        # Arguments: Type, Ticker, Resolution
        self.symbol = self.add_data(MyCustomData, "MY_CUSTOM_TICKER", Resolution.DAILY).symbol

    def on_data(self, data):
        # Check if our custom data exists in the current slice
        if data.contains_key(self.symbol):
            custom_bar = data[self.symbol]
            
            # Example strategy: hold the asset while its price is above 100, otherwise liquidate.
            # custom_bar.value holds the close price (set in reader below).
            if custom_bar.value > 100:
                self.set_holdings(self.symbol, 1.0)
            else:
                self.liquidate(self.symbol)

            self.plot("Custom Data", "Price", custom_bar.value)

class MyCustomData(PythonData):
    """
    Custom Data Class to parse CSV data.
    Assumed CSV Format: Date, Open, High, Low, Close, Volume
    Example Line: 2020-01-01,100,110,90,105,10000
    """

    def get_source(self, config, date, is_live_mode):
        # Define the source URL. The value below is a placeholder:
        # replace it with the raw URL of your own CSV file (e.g., a GitHub "Raw" link).
        source_url = "https://raw.githubusercontent.com/<your-user>/<your-repo>/main/my_data.csv"
        
        # If using local files, use SubscriptionTransportMedium.LOCAL_FILE
        return SubscriptionDataSource(source_url, SubscriptionTransportMedium.REMOTE_FILE)

    def reader(self, config, line, date, is_live_mode):
        # Skip blank lines before doing any parsing
        if not line.strip():
            return None

        # Create a new instance of our custom data class
        data = MyCustomData()
        data.symbol = config.symbol

        try:
            # Split the CSV line by comma
            # Note: Adjust the parsing logic based on your specific CSV structure
            parts = line.split(',')

            # 1. Parse Date (Column 0)
            # The format string must match your file exactly, e.g. "%Y-%m-%d" for
            # 2020-01-01 or "%Y%m%d" for 20200101. This example assumes YYYY-MM-DD.
            data.time = datetime.strptime(parts[0].strip(), "%Y-%m-%d")

            # 2. Parse OHLCV (Columns 1-5) into custom properties
            data["open"] = float(parts[1])
            data["high"] = float(parts[2])
            data["low"] = float(parts[3])
            data["close"] = float(parts[4])
            data["volume"] = float(parts[5])

            # 3. Set the 'value' property.
            # The engine uses it as the asset's current price.
            data.value = float(parts[4])

            # 4. Set end_time so each daily bar is emitted only after its day is over.
            # If omitted, end_time equals time, and the bar can appear at the start
            # of the day it describes (look-ahead bias).
            data.end_time = data.time + timedelta(days=1)

            return data

        except (ValueError, IndexError):
            # Return None if the line cannot be parsed (e.g., header row or
            # missing columns); Lean skips that line.
            return None

Implementation Details

  1. get_source Method:

    • This method is called by Lean to find out where the data is located.
    • SubscriptionTransportMedium.REMOTE_FILE: Used for data hosted on the web (GitHub, Dropbox, S3).
    • SubscriptionTransportMedium.LOCAL_FILE: Used if you are running Lean locally and the file is on your hard drive.
  2. reader Method:

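    • This converts a raw string line from the CSV into an instance of your PythonData subclass, or returns None to skip the line.
    • Crucial: You must set data.symbol, data.time, and data.value. If data.value is missing, the engine has no price for the asset, so trading and plotting based on it will not work. For daily rows, also set data.end_time so each bar is emitted only after its day ends.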
  3. Date Parsing:

    • datetime.strptime is strict. The format string (e.g., "%Y-%m-%d") must match your CSV exactly; otherwise it raises a ValueError and the line is skipped.
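
If your files mix date styles, or you are unsure which one a file uses, a helper that tries several format strings in order can stand in for the single strptime call in reader. This is a minimal sketch: parse_date and CANDIDATE_FORMATS are hypothetical names, and the formats listed are only examples to adapt. It raises ValueError when nothing matches, so the reader's existing except branch still skips the line.

```python
from datetime import datetime

# Hypothetical list of formats to try, in order; keep only those your data uses.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%Y%m%d", "%m/%d/%Y", "%Y-%m-%d %H:%M")


def parse_date(text):
    """Return a datetime for the first matching format, or raise ValueError."""
    text = text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"Unrecognized date: {text!r}")
```

Order matters: a string such as 01/02/2020 is valid under both %m/%d/%Y and %d/%m/%Y, and the helper returns the first match, so list only the formats your files can actually contain.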

Q&A: Custom Data in QuantConnect

Q: Can I use data with a finer resolution than Daily (e.g., Minute)?
A: Yes. In initialize, change Resolution.DAILY to Resolution.MINUTE. In your reader method, make sure the date parsing includes hours and minutes (e.g., "%Y-%m-%d %H:%M") and set data.end_time = data.time + timedelta(minutes=1) so each bar spans one minute instead of one day.
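
For minute data, the bar length changes along with the format string: end_time should be one minute after time rather than one day. The sketch below keeps that logic in plain Python so it can be tested outside Lean. parse_minute_row is a hypothetical helper that returns a dict in place of a MyCustomData instance, and it assumes timestamps formatted like 2021-03-01 09:31 in the first column.

```python
from datetime import datetime, timedelta

BAR_PERIOD = timedelta(minutes=1)  # assumption: one CSV row per minute


def parse_minute_row(line):
    """Parse 'YYYY-MM-DD HH:MM,open,high,low,close,volume' into a dict, or None."""
    parts = line.strip().split(",")
    try:
        start = datetime.strptime(parts[0], "%Y-%m-%d %H:%M")
        close = float(parts[4])
        return {
            "time": start,
            "end_time": start + BAR_PERIOD,  # bar is emitted after its minute ends
            "open": float(parts[1]),
            "high": float(parts[2]),
            "low": float(parts[3]),
            "close": close,
            "volume": float(parts[5]),
            "value": close,  # the price the engine would use
        }
    except (ValueError, IndexError):
        return None  # header, blank, or malformed line
```

Inside reader you would copy these fields onto the data object, for example data.time = row["time"] and data.end_time = row["end_time"].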

Q: My CSV has a header row. How do I handle it?
A: The reader method processes every line. If the first line contains text headers (like "Date,Open..."), the datetime.strptime() call on the first column raises a ValueError. The try-except block in the example above catches it and returns None, which tells Lean to skip that line.
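
If you prefer not to rely on the exception, an explicit check at the top of reader makes the intent clearer and also filters blank lines. A sketch, assuming every data row starts with a numeric date (true for YYYY-MM-DD and YYYYMMDD, not for dates written like Jan 2, 2020); is_data_line is a hypothetical helper name.

```python
def is_data_line(line):
    """Return True if the line looks like a data row rather than a header or blank line."""
    stripped = line.strip()
    # Headers start with letters ("Date,..."); blank lines are empty after strip()
    return bool(stripped) and stripped[0].isdigit()
```

At the start of reader you would write if not is_data_line(line): return None, and keep the try-except for rows that are numeric but malformed.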

Q: How do I access the data in on_data?
A: The data is passed via the Slice object. You can access it using the symbol object created in initialize (e.g., data[self.symbol]) or by using the string ticker (e.g., data["MY_CUSTOM_TICKER"]).

Q: Why is my custom data not appearing in the backtest?
A: Common reasons include:

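  1. Date Range: The period between set_start_date and set_end_date does not overlap the dates in your CSV.
  2. Parsing Error: The reader fails to parse the lines, and because the except block returns None, the failure is silent. Temporarily add print(line) inside the except block to see which lines fail.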
  3. Value Property: You forgot to set data.value = ... in the reader.
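
The first two causes can be checked outside Lean by running the same parsing rules over your file in plain Python. A minimal sketch, assuming the Date, Open, High, Low, Close, Volume layout and YYYY-MM-DD dates from the example; check_csv is a hypothetical helper, not part of the QuantConnect API.

```python
from datetime import datetime


def check_csv(text, date_format="%Y-%m-%d"):
    """Apply the reader's parsing rules to CSV text and summarize the result.

    Returns (first_date, last_date, bad_line_numbers); the dates are None
    if no line could be parsed.
    """
    dates, bad_lines = [], []
    for number, line in enumerate(text.splitlines(), start=1):
        parts = line.split(",")
        try:
            when = datetime.strptime(parts[0].strip(), date_format)
            ohlcv = [float(p) for p in parts[1:6]]
            if len(ohlcv) != 5:
                raise ValueError("expected Open, High, Low, Close, Volume")
        except ValueError:
            bad_lines.append(number)  # reader would return None for this line
            continue
        dates.append(when)
    if not dates:
        return None, None, bad_lines
    return min(dates), max(dates), bad_lines
```

Pass it the contents of your file (e.g., read with open("my_data.csv").read()), then compare the returned first and last dates with your set_start_date and set_end_date. If nearly every line number is reported as bad, the date format string is the likely culprit.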