Backtests in Strategy Analyzer give very different results on 3 separate computers, although everything is identical:
NT version: 8.1.6.3 64-bit – same build everywhere
Windows 11, same time zone
Data: db folder copied byte-for-byte + Repair Database on each PC
Scripts: exported/imported via .zip, no changes
Backtest: offline mode, no internet, identical instrument, timeframe, dates, session template, fill type, commissions
Results example: PC1 ~$100,000 net profit, PC2 ~$145,000, PC3 ~$165,000
Differences in trade count, entries/exits, equity curve – not small errors. Tried: cache clear, re-import, multiple DB repairs, file hash check (db matches).
What could be the possible causes of such big discrepancies?
Did you verify that the downloaded historical data on all three machines is exactly identical? For example, it’s easy to accidentally download minute data instead of tick data, or select a slightly different date range. Even small differences in the underlying data can lead to large changes in backtest results. Also make sure that all strategy analyzer properties are identical on each PC, including fill type, order resolution, trading hours template, commissions, and any other settings used when running the test.
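Beyond hashing the db folder, it can help to hash the exported bar data itself from each machine, since that is what the backtest actually consumes. A minimal sketch (the file names are hypothetical examples):

```python
import hashlib


def file_sha256(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


# Export the same instrument/timeframe/date range from each PC,
# then compare digests (paths are illustrative):
# for p in ["pc1_export.csv", "pc2_export.csv", "pc3_export.csv"]:
#     print(p, file_sha256(p))
```

If the digests differ, the machines are not backtesting the same bars, regardless of what the raw db files say.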
That said, I personally would not rely too heavily on NT backtesting results. One issue I've noticed is that indicator values calculated during a backtest can differ significantly from those calculated during market replay. For example, the EMA value produced by NT in Strategy Analyzer may not match the EMA value you see in Replay for the same bar, sometimes by a significant margin. This suggests that the calculation used during backtesting is not the same as the one used during market replay.
Because of this, I prefer not to rely on NT's built-in indicators for strategy logic. Instead, I calculate those values myself inside the strategy. For example, rather than using the built-in EMA, I maintain my own rolling window of n bars and compute the EMA manually so the calculation is fully deterministic. While not an exact match, this produced values much closer to market replay for me.
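As a rough illustration of what a self-contained, deterministic EMA looks like (this is a generic sketch, not the poster's actual code — the seeding choice here is an SMA of the first n closes, and different seeding conventions are exactly the kind of thing that makes one platform's EMA disagree with another's):

```python
class RollingEMA:
    """Deterministic EMA over completed bars, seeded with an SMA of the first n closes."""

    def __init__(self, period: int):
        self.period = period
        self.alpha = 2.0 / (period + 1)   # standard EMA smoothing factor
        self._seed = []                    # closes collected until we can seed
        self.value = None

    def update(self, close):
        """Feed one completed-bar close; returns the EMA or None until seeded."""
        if self.value is None:
            self._seed.append(close)
            if len(self._seed) == self.period:
                self.value = sum(self._seed) / self.period
            return self.value
        self.value += self.alpha * (close - self.value)
        return self.value
```

Because every input and every intermediate value is under your control, the same bar data always produces the same EMA, on any machine.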
My typical backtesting workflow is:
Run the strategy without relying on values derived from NT indicators.
Export the raw bar data and any required fields to a database.
Use Python to simulate bar-by-bar streaming and run the strategy logic there.
Use Market Replay with the same strategy for confirmation.
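The bar-by-bar streaming step in the workflow above can be sketched roughly like this; the CSV column names and the toy signal rule are illustrative assumptions, not the poster's actual setup:

```python
import csv
from dataclasses import dataclass


@dataclass
class Bar:
    time: str
    open: float
    high: float
    low: float
    close: float


def stream_bars(path: str):
    """Replay exported bars one at a time, as a live strategy would see them."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield Bar(row["time"], float(row["open"]), float(row["high"]),
                      float(row["low"]), float(row["close"]))


def run_strategy(bars):
    """Toy strategy loop: act only on each completed bar (placeholder logic)."""
    signals = []
    prev = None
    for bar in bars:
        if prev is not None and bar.close > prev.close:
            signals.append(("long", bar.time))
        prev = bar
    return signals
```

The point of the generator is that the strategy logic only ever sees one completed bar at a time, mimicking live streaming rather than vectorized look-ahead.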
Thank you for the insights @WaleeTheRobot. That is really helpful and explains some of the discrepancies I have seen while backtesting. I am going to try your method this weekend.
In addition, I suggest using time-based series. The reason is that exported data tends to match time-based bars better than something like a tick or range series. For example, if you export the EMA value for each bar, it matches more closely when you run market replay at a faster speed against it. On a non-time-based series like a tick chart, the values can be off a little, and those small errors can compound during replay, making the backtest results unreliable.
Another tip: use values from completed bars only. In real time, the current bar is not yet complete, so in a backtest it should only be used to check for a potential entry, stop, and target. For example, if the bar being checked triggers an entry, enter, but also check the stop and target within that same bar. You can't know which was hit first, so I prioritize the stop if both are hit in the same bar.
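That conservative same-bar rule can be expressed as a small helper; this is a generic sketch for a long position, assuming only OHLC data is available:

```python
def resolve_intrabar(stop: float, target: float,
                     bar_high: float, bar_low: float):
    """Decide the fill for a long position when both stop and target may lie
    inside one completed bar. The intrabar sequence is unknowable from OHLC
    alone, so the stop is conservatively assumed to have been hit first."""
    stop_hit = bar_low <= stop
    target_hit = bar_high >= target
    if stop_hit:           # stop takes priority when both are touched
        return "stopped"
    if target_hit:
        return "target"
    return None            # position still open after this bar
```

Biasing toward the stop makes the backtest pessimistic rather than optimistic, which is usually the safer error to make.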
Thank you for the feedback. I found much of what you wrote to be extremely useful. I would be very grateful if you could share the system architecture for simulating streaming data and executing strategy logic, so that I could build a similar system myself.
Ideally, you want a single source of truth so the features you build and export are sequentially and logically the same. In Python, have an AI recreate exactly the same strategy you are using, and tell it to build a script that reads from the database.
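Reading that single source of truth back in order might look like the following; the SQLite schema (a `bars` table with `instrument`, `ts`, and OHLC columns) is a hypothetical example, not a prescribed layout:

```python
import sqlite3


def load_bars(db_path: str, instrument: str):
    """Stream bars in timestamp order from a SQLite export (hypothetical schema)."""
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute(
            "SELECT ts, open, high, low, close FROM bars "
            "WHERE instrument = ? ORDER BY ts",
            (instrument,),
        )
        for row in cur:
            yield row
    finally:
        con.close()
```

Sorting by timestamp in the query guarantees the strategy consumes bars in the same order on every run and every machine, which is the whole point of having one source of truth.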