Problem Description
Solution
To import and use your own CSV data in QuantConnect, you must create a custom class that inherits from PythonData. This class tells the Lean engine where to find your file (get_source) and how to parse each line (reader).
Here is a complete, step-by-step implementation.
Key Steps
- Host your CSV: Ensure your CSV is accessible via a direct URL (e.g., GitHub Raw, Dropbox) or located in the specific data folder if running locally.
- Define the class: Create a class inheriting from PythonData.
- Implement get_source: Returns the location of the file.
- Implement reader: Parses a single line of the CSV into a PythonData object.
- Subscribe: Use self.add_data in your algorithm's initialize method.
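A direct URL must return the raw CSV text, not an HTML preview page. The sketch below rewrites common share links into direct ones. It assumes GitHub's current "blob" versus raw.githubusercontent.com URL layout and Dropbox's dl query parameter. to_direct_url is a hypothetical helper, not part of QuantConnect.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def to_direct_url(share_url: str) -> str:
    """Hypothetical helper: turn a GitHub 'blob' page URL or a Dropbox share
    link into a direct-download URL. Other URLs are returned unchanged."""
    parts = urlparse(share_url)
    if parts.netloc == "github.com" and "/blob/" in parts.path:
        # github.com/<user>/<repo>/blob/<branch>/<path>
        #   -> raw.githubusercontent.com/<user>/<repo>/<branch>/<path>
        raw_path = parts.path.replace("/blob/", "/", 1)
        return urlunparse(("https", "raw.githubusercontent.com", raw_path, "", "", ""))
    if parts.netloc == "dropbox.com" or parts.netloc.endswith(".dropbox.com"):
        # dl=1 asks Dropbox for the file itself instead of a preview page
        query = parse_qs(parts.query)
        query["dl"] = ["1"]
        return urlunparse(parts._replace(query=urlencode(query, doseq=True)))
    return share_url
```

Open the resulting URL in a browser before putting it in get_source. You should see plain comma-separated lines.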
Full Strategy Code
The following example assumes a CSV format of: Date, Open, High, Low, Close, Volume.
```python
# region imports
from AlgorithmImports import *
from datetime import datetime, timedelta
# endregion


class CustomCSVDataAlgorithm(QCAlgorithm):

    def initialize(self):
        self.set_start_date(2020, 1, 1)
        self.set_end_date(2023, 1, 1)
        self.set_cash(100000)

        # Subscribe to the custom data.
        # Arguments: Type, Ticker, Resolution
        self.symbol = self.add_data(MyCustomData, "MY_CUSTOM_TICKER", Resolution.DAILY).symbol

    def on_data(self, data):
        # Check if our custom data exists in the current slice
        if data.contains_key(self.symbol):
            custom_bar = data[self.symbol]
            price = custom_bar.value  # value holds the Close set in reader()

            # Example strategy: hold the asset while price is above $100, otherwise liquidate
            if price > 100:
                self.set_holdings(self.symbol, 1.0)
            else:
                self.liquidate(self.symbol)

            self.plot("Custom Data", "Price", price)


class MyCustomData(PythonData):
    """
    Custom data class to parse CSV data.
    Assumed CSV format: Date, Open, High, Low, Close, Volume
    Example line: 2020-01-01,100,110,90,105,10000
    """

    def get_source(self, config, date, is_live_mode):
        # Placeholder URL: replace it with the direct (raw) URL of your own CSV.
        source_url = "https://raw.githubusercontent.com/<your-user>/<your-repo>/main/my_custom_data.csv"
        # For a file on disk when running Lean locally, use SubscriptionTransportMedium.LOCAL_FILE
        return SubscriptionDataSource(source_url, SubscriptionTransportMedium.REMOTE_FILE)

    def reader(self, config, line, date, is_live_mode):
        # Create a new instance of our custom data class
        data = MyCustomData()
        data.symbol = config.symbol

        try:
            # Split the CSV line by comma.
            # Note: adjust the parsing logic to your specific CSV structure.
            parts = line.split(',')

            # 1. Parse Date (column 0).
            # The format string must match your CSV exactly. This example assumes
            # YYYY-MM-DD; use e.g. "%Y%m%d" for dates written as 20200101.
            data.time = datetime.strptime(parts[0].strip(), "%Y-%m-%d")

            # 2. Parse OHLCV (columns 1-5) into custom properties
            data["open"] = float(parts[1])
            data["high"] = float(parts[2])
            data["low"] = float(parts[3])
            data["close"] = float(parts[4])
            data["volume"] = float(parts[5])

            # 3. Set the 'value' property.
            # This is required for the engine to know the "current price" of the asset.
            data.value = float(parts[4])

            # 4. Set end_time: when the bar is complete (one day after the date for daily bars)
            data.end_time = data.time + timedelta(days=1)

            return data

        except (ValueError, IndexError):
            # Return None to skip lines that cannot be parsed
            # (e.g., a header row or a line with missing columns)
            return None
```
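Your export may not match the layout the example assumes. For instance, it may have a header row or write dates as MM/DD/YYYY. One option is to rewrite the file once before uploading it. normalize_csv is a hypothetical helper, not a QuantConnect function. It assumes the columns are already in the order Date, Open, High, Low, Close, Volume and that the dates use the format passed as date_format.

```python
import csv
import io
from datetime import datetime


def normalize_csv(text: str, date_format: str = "%m/%d/%Y") -> str:
    """Hypothetical helper: rewrite CSV text into the layout the example
    reader expects (no header, Date as YYYY-MM-DD, then O, H, L, C, V)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        try:
            date = datetime.strptime(row[0].strip(), date_format)
        except ValueError:
            continue  # skip the header or rows whose date does not parse
        writer.writerow([date.strftime("%Y-%m-%d")] + [cell.strip() for cell in row[1:6]])
    return out.getvalue()
```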
Implementation Details
- get_source method:
  - Lean calls this method to find out where the data is located.
  - SubscriptionTransportMedium.REMOTE_FILE: used for data hosted on the web (GitHub, Dropbox, S3).
  - SubscriptionTransportMedium.LOCAL_FILE: used if you are running Lean locally and the file is on your hard drive.
- reader method:
  - This converts a raw string line from the CSV into a PythonData object.
  - Crucial: you must set data.symbol, data.time, and data.value. If data.value is missing, the engine will not register the price.
- Date parsing:
  - The datetime.strptime function is strict: if the format string (e.g., "%Y-%m-%d") does not match your CSV exactly, it raises ValueError and the example reader skips that line.
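The reader only runs inside Lean, so it can help to test the same parsing rules in plain Python first. parse_line is a hypothetical, Lean-free mirror of the example reader. It returns the fields the reader would set as a dict, or None for lines the reader would skip.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_line(
    line: str,
    date_format: str = "%Y-%m-%d",
    bar_length: timedelta = timedelta(days=1),
) -> Optional[dict]:
    """Hypothetical mirror of the example reader: return the fields it would
    set, or None for a line it would skip (header, blank, malformed)."""
    try:
        parts = line.split(",")
        time = datetime.strptime(parts[0].strip(), date_format)
        # Unpacking raises ValueError if fewer than five numeric columns follow.
        open_, high, low, close, volume = (float(p) for p in parts[1:6])
    except ValueError:
        return None
    return {
        "time": time,
        "end_time": time + bar_length,
        "open": open_, "high": high, "low": low, "close": close, "volume": volume,
        "value": close,  # what the reader assigns to data.value
    }
```

Running it over the first few lines of your file quickly shows whether the format string matches your CSV.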
Q&A: Custom Data in QuantConnect
Q: Can I use data with a finer resolution than Daily (e.g., Minute)?
A: Yes. In initialize, change Resolution.DAILY to Resolution.MINUTE. In your reader method, make sure your date parsing includes hours and minutes (e.g., "%Y-%m-%d %H:%M"). Also set end_time to one bar length after time (e.g., timedelta(minutes=1) instead of timedelta(days=1)).
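The sketch below shows the minute-level timestamp handling. It assumes lines of the form YYYY-MM-DD HH:MM,Open,High,Low,Close,Volume. parse_minute_times is a hypothetical helper; inside reader you would assign its two results to data.time and data.end_time.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Assumed minute-bar layout: "YYYY-MM-DD HH:MM,Open,High,Low,Close,Volume"
MINUTE_FORMAT = "%Y-%m-%d %H:%M"
BAR_LENGTH = timedelta(minutes=1)  # should match the Resolution passed to add_data


def parse_minute_times(line: str) -> Tuple[datetime, datetime]:
    """Hypothetical helper: return (time, end_time) for one minute bar."""
    start = datetime.strptime(line.split(",")[0].strip(), MINUTE_FORMAT)
    return start, start + BAR_LENGTH
```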
Q: My CSV has a header row. How do I handle it?
A: The reader method processes every line. If the first line contains text headers (like "Date,Open..."), parsing it raises a ValueError: datetime.strptime cannot read "Date" as a date, and float() cannot convert "Open". The try-except block in the example above catches this and returns None, which tells Lean to skip that line.
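Instead of relying on the exception, you can skip non-data lines explicitly at the top of reader, e.g. "if not is_data_line(line): return None". This sketch assumes every data row begins with a digit, such as a date like 2020-01-01. is_data_line is a hypothetical helper name.

```python
def is_data_line(line: str) -> bool:
    """Hypothetical check: True only for non-blank lines starting with a digit."""
    stripped = line.strip()
    return bool(stripped) and stripped[0].isdigit()
```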
Q: How do I access the data in on_data?
A: The data is passed via the Slice object. You can access it using the symbol object created in initialize (e.g., data[self.symbol]) or by using the string ticker (e.g., data["MY_CUSTOM_TICKER"]).
Q: Why is my custom data not appearing in the backtest?
A: Common reasons include:
- Date range: The set_start_date and set_end_date do not cover the dates inside your CSV.
- Parsing error: The reader method is failing silently. Add print(line) inside the reader to debug.
- Value property: You forgot to set data.value = ... in the reader.
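Before backtesting, you can check a local copy of the file for the first two causes. check_csv_lines is a hypothetical helper that assumes the six-column layout used above. It reports the first and last dates, which you can compare with set_start_date and set_end_date. It also lists the row numbers that would fail to parse.

```python
import csv
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


def check_csv_lines(
    lines: Iterable[str], date_format: str = "%Y-%m-%d"
) -> Tuple[Optional[datetime], Optional[datetime], List[int]]:
    """Hypothetical helper: return (first_date, last_date, bad_row_numbers)."""
    first: Optional[datetime] = None
    last: Optional[datetime] = None
    bad: List[int] = []
    for lineno, row in enumerate(csv.reader(lines), start=1):
        try:
            if len(row) < 6:
                raise ValueError("expected 6 columns")
            when = datetime.strptime(row[0].strip(), date_format)
            for value in row[1:6]:
                float(value)  # raises ValueError on non-numeric prices/volume
        except ValueError:
            bad.append(lineno)
            continue
        first = when if first is None else min(first, when)
        last = when if last is None else max(last, when)
    return first, last, bad
```

For a local file, pass the open file object, e.g. with open("my_custom_data.csv", newline="") as f: print(check_csv_lines(f)). The file name is a placeholder. Row 1 appearing in the bad list is expected when the file has a header.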