Importing Historical Data for volumetric analysis

I have spent over a week trying to figure this out.

I have data that I converted from Sierra Chart SCID files. The script below tries to convert the SCID tick data into NT8 import data the proper way.

My biggest issue: when I finally get bid and ask values to populate in a 5-minute chart, the volumetric data always shows total trades equal to the volume. That is not the case with the live data feed.

I’m looking for definitive guidance on the format NT8 expects for volumetric reconstruction. My conversion script is below. If anyone has direct experience with this, please chime in; I’m really tired of shooting in the dark:

"""
SCID to NinjaTrader 8 Converter (TickStory Format)

Converts Sierra Chart SCID tick data files to NinjaTrader 8 import format.
Supports single file conversion or batch conversion of multiple contracts.

Format: yyyyMMdd HHmmss fffffff;last;bid;ask;volume

  • Tick Replay format with bid/ask for volumetric analysis
  • Sub-second precision (7 decimal places)
  • Validates bid <= last <= ask
  • Uses multiprocessing for fast conversion

Import Settings in NT8:

  • Format: "NinjaTrader (start of bar timestamps)"
  • Data Type: "Last"
  • Time Zone: "UTC"

Usage:
    Single file:       python scid_to_nt8_converter.py MNQZ25-CME.scid
    Batch (2019-2025): python scid_to_nt8_converter.py --batch
    Batch (all):       python scid_to_nt8_converter.py --batch-all

Author: Created for SCID to NT8 tick data import
Date: November 2025
"""

import struct
import os
import glob
import subprocess
import sys
from datetime import date, datetime, time, timedelta
from multiprocessing import Pool, cpu_count

# SCID file format constants

SIZE_HEADER = 0x38 # 56 bytes
SIZE_RECORD = 0x28 # 40 bytes

def deserialize_datetime(excel_date_time):
    """Convert SCID datetime (Excel serial days) to a Python datetime (whole seconds)."""
    try:
        date_tok = int(excel_date_time)
        time_tok = round(86400 * (excel_date_time - date_tok))
        d = date(1899, 12, 30) + timedelta(date_tok)

        factor = time_tok / 86399.99
        hours = int((factor * 24) % 24)
        minutes = int((factor * 24 * 60) % 60)
        seconds = int((factor * 24 * 60 * 60) % 60)

        t = time(hours, minutes, seconds, 0)
        return datetime.combine(d, t)
    except (ValueError, OverflowError):
        # Out-of-range or non-finite serial values (e.g. corrupt records)
        return None

def process_chunk(args):
    """Process a chunk of SCID records in parallel."""
    input_file, start_record, num_records = args
    lines = []

    with open(input_file, 'rb') as f:
        f.seek(SIZE_HEADER + start_record * SIZE_RECORD)

        for _ in range(num_records):
            data = f.read(SIZE_RECORD)
            if len(data) < SIZE_RECORD:
                break

            # Unpack SCID record: datetime, OHLC, trades, volume, bid_vol, ask_vol.
            # Sierra Chart's record layout puts NumTrades before TotalVolume. The
            # sample output at the end of this post agrees: the 6th value is
            # always 1 and the 7th equals BidVol + AskVol.
            dt_raw, o, h, l, c, trades, vol, bid_vol, ask_vol = struct.unpack('<q4f4I', data)

            # SCID datetime is microseconds since the 1899-12-30 epoch.
            # Convert whole seconds to days for deserialize_datetime; dropping the
            # sub-second part first keeps round() from bumping the timestamp
            # into the next second.
            dt_days = (dt_raw // 1000000) / 86400.0
            dt = deserialize_datetime(dt_days)

            if dt is None or c <= 0:
                continue

            # Extract microseconds for sub-second precision
            microseconds = dt_raw % 1000000

            # Per Sierra Chart docs for tick data:
            # High = Ask price, Low = Bid price, Close = Last price
            last_price = c
            bid_price = l
            ask_price = h

            # NT8 requires: Bid <= Last <= Ask
            if bid_price > last_price:
                bid_price = last_price
            if ask_price < last_price:
                ask_price = last_price
            if bid_price > ask_price:
                bid_price = ask_price = last_price

            # Format: yyyyMMdd HHmmss fffffff;last;bid;ask;volume
            # Match NT8 Tick Replay format exactly (no spaces after semicolons)
            datetime_str = dt.strftime('%Y%m%d %H%M%S')
            # fffffff is 7 digits (100 ns units), so scale microseconds by 10
            fractional = f"{microseconds * 10:07d}"
            line = f"{datetime_str} {fractional};{last_price:.2f};{bid_price:.2f};{ask_price:.2f};{vol}\n"
            lines.append(line)

    return lines

def convert_scid_to_nt8(input_file, output_file=None, num_workers=None):
    """
    Convert a single SCID file to NT8 format.

    Args:
        input_file: Path to SCID file
        output_file: Optional output path (auto-generated if None)
        num_workers: Number of CPU cores to use (auto-detect if None)
    """
    if not os.path.exists(input_file):
        print(f"Error: {input_file} not found")
        return False

    file_size = os.path.getsize(input_file)
    total_records = (file_size - SIZE_HEADER) // SIZE_RECORD

    if num_workers is None:
        num_workers = cpu_count()

    # Determine output filename from SCID filename
    if output_file is None:
        base = os.path.splitext(os.path.basename(input_file))[0]
        # Extract contract info (e.g., MNQZ25-CME -> MNQ 12-25)
        # Assumes a 3-letter symbol followed by month code and 2-digit year
        if len(base) >= 6:
            symbol = base[:3]
            month_code = base[3]
            year = base[4:6]

            month_map = {'F':1,'G':2,'H':3,'J':4,'K':5,'M':6,'N':7,'Q':8,'U':9,'V':10,'X':11,'Z':12}
            month_num = month_map.get(month_code, 1)

            output_file = f"NT8_Imports/{symbol} {month_num:02d}-{year}.txt"
        else:
            output_file = f"NT8_Imports/{base}.txt"

    # Create output directory (skip when the path has no directory part)
    output_dir = os.path.dirname(output_file)
    if output_dir:
        os.makedirs(output_dir, exist_ok=True)

    print(f"\n{'='*80}")
    print(f"Converting: {os.path.basename(input_file)}")
    print(f"Output: {output_file}")
    print(f"Format: yyyyMMdd HHmmss fffffff;last;bid;ask;volume")
    print(f"Total records: {total_records:,}")
    print(f"Workers: {num_workers}")
    print(f"{'='*80}\n")

    # Create chunks for parallel processing
    chunk_size = max(1, total_records // num_workers)
    chunks = []
    for i in range(num_workers):
        start = i * chunk_size
        if i == num_workers - 1:
            count = total_records - start
        else:
            count = chunk_size
        if count > 0:
            chunks.append((input_file, start, count))

    print("Processing in parallel...")
    with Pool(num_workers) as pool:
        results = pool.map(process_chunk, chunks)

    # Combine and sort by timestamp
    print("Sorting and writing output...")
    all_lines = []
    for chunk_lines in results:
        all_lines.extend(chunk_lines)

    all_lines.sort(key=lambda x: x.split(';')[0])

    with open(output_file, 'w') as f:
        f.writelines(all_lines)

    print(f"✓ {output_file}: {len(all_lines):,} records\n")
    print(f"{'='*80}")
    print("COMPLETE!")
    print(f"{'='*80}\n")

    return True

def batch_convert(year_filter=None):
    """
    Batch convert multiple SCID files.

    Args:
        year_filter: List of year codes to include (e.g., ['19','20','21','22','23','24','25'])
                     If None, converts all MNQ files found
    """
    # Find all MNQ SCID files
    all_files = sorted(glob.glob("MNQ*-CME.scid"))

    if year_filter:
        # Filter for specific years
        scid_files = [f for f in all_files if any(f"MNQ{m}{y}" in f
                      for m in ['F','G','H','J','K','M','N','Q','U','V','X','Z']
                      for y in year_filter)]
        year_range = f"{year_filter[0]}-{year_filter[-1]}"
    else:
        scid_files = all_files
        year_range = "all"

    if not scid_files:
        print("No MNQ SCID files found!")
        return

    print(f"\n{'='*80}")
    print(f"BATCH CONVERSION: {len(scid_files)} MNQ contracts ({year_range})")
    print(f"{'='*80}\n")

    total_size = sum(os.path.getsize(f) for f in scid_files) / (1024**3)
    print(f"Total data size: {total_size:.2f} GB\n")

    success_count = 0
    for i, scid_file in enumerate(scid_files, 1):
        file_size_gb = os.path.getsize(scid_file) / (1024**3)
        print(f"\n[{i}/{len(scid_files)}] {scid_file} ({file_size_gb:.2f} GB)")
        print("-" * 80)

        if convert_scid_to_nt8(scid_file):
            success_count += 1
            print(f"✓ {scid_file} complete")
        else:
            print(f"❌ ERROR converting {scid_file}")

    print(f"\n{'='*80}")
    print("BATCH CONVERSION COMPLETE!")
    print(f"Successfully converted: {success_count}/{len(scid_files)} files")
    print("All files saved to: NT8_Imports\\")
    print(f"{'='*80}\n")
    print("\nNext steps:")
    print("1. Open NT8: Tools → Import → Historical Data")
    print("2. Format: 'NinjaTrader (start of bar timestamps)'")
    print("3. Data Type: 'Last'")
    print("4. Time Zone: 'UTC'")
    print("5. Import all .txt files from NT8_Imports folder")
    print("\nNote: Tick Replay format with bid/ask for accurate volumetric analysis!")
    print(f"{'='*80}\n")

def print_usage():
    """Print usage instructions."""
    print(__doc__)
    print("\nExamples:")
    print(" python scid_to_nt8_converter.py MNQZ25-CME.scid")
    print(" python scid_to_nt8_converter.py --batch")
    print(" python scid_to_nt8_converter.py --batch-all")
    print()

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print_usage()
        sys.exit(1)

    arg = sys.argv[1]

    if arg == "--batch":
        # Batch convert 2019-2025
        batch_convert(year_filter=['19','20','21','22','23','24','25'])
    elif arg == "--batch-all":
        # Batch convert all MNQ files
        batch_convert(year_filter=None)
    elif arg in ["-h", "--help"]:
        print_usage()
    else:
        # Single file conversion
        convert_scid_to_nt8(arg)

PS C:\SierraChart\Data> python read_scid_sample.py
First 5 records from MNQZ25-CME.scid:

DateTime                    Open  High      Low       Close     Volume  Trades  BidVol  AskVol

2025-09-07 22:00:00.000000  0.00  23908.50  23908.50  23908.50  1       2       2       0
2025-09-07 22:00:00.000001  0.00  23908.50  23908.50  23908.50  1       1       1       0
2025-09-07 22:00:00.088000  0.00  23929.50  23905.00  23929.50  1       4       0       4
2025-09-07 22:00:00.088001  0.00  23929.50  23905.00  23929.75  1       1       0       1
2025-09-07 22:00:00.088002  0.00  23929.50  23905.00  23930.00  1       1       0       1
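
The read_scid_sample.py script that produced this output is not shown in the post. A minimal sketch of such a reader follows, assuming the same 56-byte header and 40-byte record used by the converter, and assuming the trade count comes before the total volume in each record. That order is worth checking against your own files: in the sample above, the seventh value always equals BidVol + AskVol. The names `read_records`, `HEADER_SIZE`, `RECORD` and `SCID_EPOCH` are made up for this illustration.

```python
import struct
from datetime import datetime, timedelta

HEADER_SIZE = 56                    # assumed SCID header size, as in the converter
RECORD = struct.Struct("<q4f4I")    # int64 timestamp, 4 floats, 4 uint32 = 40 bytes
SCID_EPOCH = datetime(1899, 12, 30)


def read_records(path, limit=5):
    """Return up to `limit` decoded records from the start of an SCID file."""
    rows = []
    with open(path, "rb") as f:
        f.seek(HEADER_SIZE)
        while len(rows) < limit:
            data = f.read(RECORD.size)
            if len(data) < RECORD.size:
                break
            # Assumed field order: trade count first, then total volume.
            ts_us, o, h, l, c, num_trades, total_vol, bid_vol, ask_vol = RECORD.unpack(data)
            rows.append({
                # Integer microseconds keep the timestamp exact (no float rounding).
                "time": SCID_EPOCH + timedelta(microseconds=ts_us),
                "open": o,
                "ask": h,       # High holds the ask price in tick records
                "bid": l,       # Low holds the bid price in tick records
                "last": c,
                "num_trades": num_trades,
                "total_volume": total_vol,
                "bid_volume": bid_vol,
                "ask_volume": ask_vol,
            })
    return rows
```

Calling `read_records("MNQZ25-CME.scid")` and printing each row is enough to compare the raw fields with what NT8 shows after import.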

Sometimes I see the most complicated approaches to things and I wonder if there is even any edge in whatever the aim is. After reviewing, no, there usually isn’t any edge.

NinjaTrader already has tick data that you can download, so why are you going through the most convoluted process to do the simplest things?

To be frank, will this help you make money in the markets, or are you simply keeping yourself busy with IT work?

There are many weird pitfalls that people can fall into when going into trading, and the one that I see the most is the IT-related branch. You’re doing all these fanciful IT transformations, but as to whether you’re making money, or whether the IT transformation will help in making money... probably not.

In NinjaTrader, you can download historical tick data, but I guess you can spend your intellectual capital on IT-related data transformations.

Godspeed Sir, Godspeed.

To your question: You can probably ask Gemini or ChatGPT, and they’ll sort your IT work for you.


The answer to your question is simple, and might have been obvious if, instead of antagonizing, you had given it some quick thought.

NT8 only provides 1 year of tick data. I have bought dxfeed and they provide 2 years. I want 5 years, and Sierra Chart gives you this for free; with a little elbow grease you can make it work. I can guarantee I’m not the only one, nor the last, as the ones who seem to get it working never follow up with any information.

So thanks for your comment, but whether my process yields or not shouldn’t be your concern.

Note:

SCID files bundle trades. To get proper NT8 transformations, one must script a bundle loop whose logic finds the start and end of each bundle and aggregates it into one proper tick.

This should yield one tick per trade. One must also ignore their Volume column, as it is just a trade counter and will always be 1.

With these changes I managed to get this to work, and now have over 10 years of tick level backtest data.
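
The post does not say how the start and end of a bundle are detected, so the following is only a sketch of the bundle loop under a stated assumption: consecutive records are treated as one bundle when they share the same millisecond and the same aggressor side. That rule, and the names `Tick`, `same_millisecond_and_side`, `merge_bundle` and `aggregate_bundles`, are hypothetical. Swap in whatever boundary test matches your data. The merged tick keeps the first timestamp, the last price, and the summed bid/ask volumes instead of the per-record counter.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Tick:
    ts_us: int      # timestamp in microseconds
    last: float
    bid: float
    ask: float
    bid_vol: int    # volume traded at the bid
    ask_vol: int    # volume traded at the ask


def same_millisecond_and_side(prev: Tick, cur: Tick) -> bool:
    """Hypothetical bundle rule: same millisecond and same aggressor side."""
    same_ms = prev.ts_us // 1000 == cur.ts_us // 1000
    same_side = (prev.ask_vol > 0) == (cur.ask_vol > 0)
    return same_ms and same_side


def merge_bundle(bundle: List[Tick]) -> Tick:
    """Collapse one bundle into a single tick with summed volumes."""
    first, final = bundle[0], bundle[-1]
    return Tick(
        ts_us=first.ts_us,
        last=final.last,
        bid=final.bid,
        ask=final.ask,
        bid_vol=sum(t.bid_vol for t in bundle),
        ask_vol=sum(t.ask_vol for t in bundle),
    )


def aggregate_bundles(
    ticks: Iterable[Tick],
    same_bundle: Callable[[Tick, Tick], bool] = same_millisecond_and_side,
) -> Iterator[Tick]:
    """Yield one merged tick per run of records that belong together."""
    bundle: List[Tick] = []
    for tick in ticks:
        # A record that fails the rule closes the current bundle.
        if bundle and not same_bundle(bundle[-1], tick):
            yield merge_bundle(bundle)
            bundle = []
        bundle.append(tick)
    if bundle:
        yield merge_bundle(bundle)
```

Applied to the five sample records above, this rule would produce two ticks: one at the bid with volume 3 and one at the ask with volume 6. The volume written to the NT8 file would then be `bid_vol + ask_vol` of the merged tick.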

Cool man. Didn’t know that. Hope you get it sorted.

I never understood why people need years of data.
I use the last 21 hours (ETH) to trade and it works perfectly. I never needed older data.
Markov’s chain confirmed to me that you don’t need that much data at all.

Here you can find a lot of information:
https://www.youtube.com/results?search_query=markov+chain+monte+carlo


Great video to watch. The genesis of that idea is very interesting, and what you’ve inferred from it to achieve your objective is exactly the point. It’s incredible how these types of ideas generate alpha, from nuclear fission to helping you arrive at an improved process of reaching a solution. Good stuff.