
AI in Finance: How to Finally Start to Believe Your Backtests (2/3)

SIMULATIONS, RISKS, AND METRICS

Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.

Let's disassemble backtests and make them great again :) In the previous part, we reviewed the main dangers of the classic backtesting routine on historical data and the standard metrics related to strategy performance. We also introduced new groups of statistics related to data, models, efficiency, and trades that can give more insight into the underlying strategy.


Source: https://harrypotter.fandom.com/wiki/Doubling_Charm (no copyright infringement is intended)

However, as was mentioned, backtesting on a single historical path generated by some extremely complex stochastic process with numerous variables doesn't seem adequate at all. It allows neither a probabilistic interpretation nor a scenario-based view of the strategy. This second part of the article is about techniques that open up a broader approach to validating our quantitative strategies:


  • Backtesting through cross-validation: first, we will start with a technique that allows sampling stochastic data without knowing an explicit data-generation model, using cross-validation;


  • Backtesting on synthetic data: then, we will show how to use stochastic modeling and generation of sample paths for backtesting;


  • Stress-scenario-based backtesting: lastly, we will check how to sample synthetic data while controlling the main factors, which allows us to model exceptional situations.


Like most of my recent articles, this one is inspired by the books of Dr. Lopez de Prado, and I recommend them if you want to dive deeper into the topic. As always, you can find the source code on my GitHub.


Backtesting through cross-validation

Long story short, we want more than one historical path to check our strategy performance. We could sample additional paths from the historical data somehow, but in what way? We could take different parts from different times of the whole dataset as training and testing sets. For generating these parts we already know the mechanism: it's called cross-validation. For our purposes, we need as rich a set of simulations as possible, i.e. all possible combinations of subsets for training and testing the algorithm, which brings us to the Combinatorial Purged Cross-Validation algorithm:


Source: https://www.amazon.com/Advances-Financial-Machine-Learning-Marcos/dp/1119482089 and http://www.quantresearch.org/Innovations.htm

For example, we split the whole dataset into N = 6 groups G1…G6, from which we take 2 at a time for testing purposes. Hence, we get 15 possible splits, shown above as columns S1…S15. In each of these splits, 2 groups are used for testing and 4 for training, with all combinations present. Now we can test our algorithm 15 times instead of running a single backtest, and obtain a distribution of the related Sharpe ratios and other risk measures (a minimal sketch of this split enumeration follows after the list below). Despite all its advantages, this method has some drawbacks:


  • it doesn’t allow historical interpretation,


  • data leakage is possible and needs to be handled separately.

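To make this concrete, here is a minimal sketch of the split enumeration (my own illustration, not the author's code): it builds all C(6, 2) = 15 train/test combinations over 6 contiguous groups of observations. The purging and embargoing steps that give the method its name are deliberately omitted for brevity.

```python
from itertools import combinations

import numpy as np


def combinatorial_splits(n_samples: int, n_groups: int = 6, n_test_groups: int = 2):
    """Yield (train_idx, test_idx) for every combination of test groups."""
    # Split the observation indices into contiguous, (almost) equally sized groups.
    groups = np.array_split(np.arange(n_samples), n_groups)
    for test_groups in combinations(range(n_groups), n_test_groups):
        test_idx = np.concatenate([groups[g] for g in test_groups])
        train_idx = np.concatenate(
            [groups[g] for g in range(n_groups) if g not in test_groups]
        )
        yield train_idx, test_idx


# With 6 groups and 2 test groups we get C(6, 2) = 15 splits,
# i.e. 15 backtest paths instead of a single walk-forward run.
splits = list(combinatorial_splits(n_samples=1500))
print(len(splits))  # 15
```

In the full Combinatorial Purged Cross-Validation procedure, training observations whose labels overlap the test groups are additionally purged (with an embargo around them) to prevent the leakage mentioned in the second drawback above.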

Backtesting on synthetic data

Combinatorial Purged Cross-Validation is a powerful tool, but it limits us to subsets of the data available. In financial mathematics, we run Monte-Carlo simulations thousands or millions of times to get accurate estimates. For example, in derivatives pricing we use stochastic differential equations to simulate underlying prices, and depending on the equation, these stochastic simulations can look very different:


Illustrations from http://www.turingfinance.com/random-walks-down-wall-street-stochastic-processes-in-python/
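As a minimal illustration of such a simulation, here is a sketch of geometric Brownian motion sample paths, one of the simplest of those stochastic processes; the drift, volatility, and starting price below are arbitrary placeholders, not values from the article.

```python
import numpy as np


def simulate_gbm(s0: float, mu: float, sigma: float, n_steps: int, n_paths: int,
                 dt: float = 1 / 252, seed: int = 42) -> np.ndarray:
    """Simulate geometric Brownian motion: dS = mu * S * dt + sigma * S * dW."""
    rng = np.random.default_rng(seed)
    # Log-increments are normal with mean (mu - sigma^2 / 2) * dt and std sigma * sqrt(dt).
    increments = (mu - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * rng.standard_normal(
        (n_paths, n_steps))
    log_paths = np.cumsum(increments, axis=1)
    return s0 * np.exp(np.hstack([np.zeros((n_paths, 1)), log_paths]))


# 1000 one-year daily paths starting at a placeholder price of 10.
paths = simulate_gbm(s0=10.0, mu=0.02, sigma=0.3, n_steps=252, n_paths=1000)
print(paths.shape)  # (1000, 253): each row is one simulated backtest path
```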

Such simulations can give us a lot of different backtest data, but we have two problems here: we don't know exactly which stochastic process the financial data is sampled from, and we don't know how to re-create exogenous variables such as fundamentals, sentiment, etc.


  • The first problem can be solved via a calibration process, i.e. finding the exact values of parameters such as drift, volatility, jump probability, mean-reversion coefficient, etc. (a minimal calibration sketch follows after this list).


  • The second one is more sophisticated. In the code I used for the experiments, I train a separate machine learning model to predict the high, low, and open prices and the volume using the close price only. It seems like overkill (and overfitting), but we don't aim to predict anything here, just to replicate the underlying dynamics, so it's more or less legit (but doubtful; if you know better approaches, please let me know).

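As a minimal sketch of the calibration step for the diffusion part, assuming we simply moment-match the drift and volatility on daily log-returns (the jump parameters still have to be picked separately, as discussed later):

```python
import numpy as np
import pandas as pd


def calibrate_drift_vol(close: pd.Series) -> tuple:
    """Estimate per-step drift and volatility from log-returns of close prices."""
    log_returns = np.log(close / close.shift(1)).dropna()
    mu = float(log_returns.mean())          # per-step drift
    sigma = float(log_returns.std(ddof=1))  # per-step volatility
    return mu, sigma


# Hypothetical usage on a price DataFrame with a "Close" column:
# mu, sigma = calibrate_drift_vol(df["Close"])
# For the DB series used below, the article quotes roughly -3.036e-05 and 0.02789.
```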

The end result is a distribution of backtest metrics over different scenarios, the same as with the combinatorial cross-validation approach.


Stress-scenario backtesting

Simulations can also be used to generate the specific regimes and scenarios we're interested in. For example, suppose we want to know how our strategy will behave in the case of a sudden market fall. How could we do that? If we take just the historical data, we can find 2–3 such crises, depending on the market. Here, Monte-Carlo simulations are useful again, but we need to pick the parameters of the particular stochastic process very carefully, to model the exact risk we are testing the strategy against. For example, for the market falls described above, we can simulate a jump-diffusion process with a negative jump size and a corresponding jump frequency.

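A minimal sketch of such a stress scenario, assuming a Merton-style jump-diffusion with hand-picked, strictly negative jump parameters (all numbers here are illustrative rather than calibrated):

```python
import numpy as np


def simulate_jump_diffusion(s0: float, mu: float, sigma: float, jump_intensity: float,
                            jump_mean: float, jump_std: float, n_steps: int,
                            n_paths: int, seed: int = 0) -> np.ndarray:
    """Simulate Merton-style jump-diffusion price paths with one step per unit of time."""
    rng = np.random.default_rng(seed)
    # Diffusion part of the log-return.
    diffusion = (mu - 0.5 * sigma ** 2) + sigma * rng.standard_normal((n_paths, n_steps))
    # Poisson number of jumps per step, each jump normally distributed around jump_mean.
    n_jumps = rng.poisson(jump_intensity, size=(n_paths, n_steps))
    jumps = n_jumps * jump_mean + np.sqrt(n_jumps) * jump_std * rng.standard_normal(
        (n_paths, n_steps))
    log_paths = np.cumsum(diffusion + jumps, axis=1)
    return s0 * np.exp(np.hstack([np.zeros((n_paths, 1)), log_paths]))


# Stress scenario: frequent, strictly negative jumps on top of a mild downward drift.
crash_paths = simulate_jump_diffusion(s0=10.0, mu=-1e-4, sigma=0.02,
                                      jump_intensity=0.2, jump_mean=-0.03,
                                      jump_std=0.005, n_steps=500, n_paths=100)
```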

There are also other data-driven ways to simulate realistic scenarios based on generative machine learning models such as GANs. There are several promising approaches for both return time series generation and correlation matrix sampling:


Check it yourself at http://www.corrgan.io/predict

However, GANs usually don't allow us controlled scenario generation, since neural representations are entangled, i.e. we can't tell where the "button" for manipulating drift, volatility, or another financial variable is. Variational autoencoders could be an interesting approach here; I wrote an article on disentangled representation learning a while ago which might be useful.


A brief illustration of beta-VAEs trying to disentangle different market properties from raw data. More details at https://towardsdatascience.com/gans-vs-odes-the-end-of-mathematical-modeling-ec158f04acb9

Numerical experiments

Let's take the Deutsche Bank stock price data, as we did in the previous article. The backtests of the ML-based strategy looked very good until we realized that the approach fails on other banks, which means we cannot consider our financial finding credible. Now I would like to show how we could have realized this without looking at similar market players, by using probabilistic interpretations of the metrics.


Combinatorial cross-validation

Let's reshuffle pieces of our price time series to generate 15 training and backtest paths (as discussed above). We can see in some of the illustrations below that our backtest data already contains different market regimes and directions, which immediately allows scenario-based validation:


Three out of fifteen combinations of training and backtest data based on the DB time series split

After we run the ML and strategy pipeline from the previous post, we should first check the performance of the ML models on these backtest data, and only once we are confident in the machine learning performance should we run the strategy backtests. From the histogram below we can see that, on average, the MCC (Matthews Correlation Coefficient) is positive; however, several data pieces give us negative performance (we can already estimate some risks from here).

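For illustration, a minimal sketch of how such an MCC distribution over the combinatorial paths could be collected; the features, labels, and classifier below are placeholders standing in for the article's actual pipeline:

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.tree import DecisionTreeClassifier

# Placeholder features and binary labels standing in for the real feature pipeline.
rng = np.random.default_rng(0)
X, y = rng.standard_normal((1500, 10)), rng.integers(0, 2, 1500)

groups = np.array_split(np.arange(len(X)), 6)
mccs = []
for test_groups in combinations(range(6), 2):  # the same 15 splits as above
    test_idx = np.concatenate([groups[g] for g in test_groups])
    train_idx = np.concatenate([groups[g] for g in range(6) if g not in test_groups])
    clf = BaggingClassifier(DecisionTreeClassifier(max_depth=3), n_estimators=50)
    clf.fit(X[train_idx], y[train_idx])
    mccs.append(matthews_corrcoef(y[test_idx], clf.predict(X[test_idx])))

print(np.mean(mccs), np.std(mccs))  # the distribution plotted in the histograms below
```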

Histograms of the mean MCC, its standard deviation, and the ratio of both represented as an "MCC Sharpe ratio" on the backtest data

Let's assume the risk is acceptable and plot histograms of the strategy Sharpe ratio, Deflated Sharpe ratio, and Probabilistic Sharpe ratio (see the previous article for more details). We can see a "fatter" left tail in the Sharpe ratios compared to the MCC histogram and, more importantly, the near-total prevalence of zero-valued Deflated Sharpe ratios on these backtests, which means that our results are prone to the issue of repeated experiments and are not reliable.

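For reference, a minimal sketch of the Probabilistic Sharpe Ratio, following Bailey and Lopez de Prado's definition as used in the previous article; the returns series is a placeholder, and the formula should be checked against the original paper before relying on it:

```python
import numpy as np
from scipy.stats import kurtosis, norm, skew


def probabilistic_sharpe_ratio(returns: np.ndarray, sr_benchmark: float = 0.0) -> float:
    """Estimate P[true Sharpe ratio > sr_benchmark], adjusting for skewness and kurtosis."""
    sr = returns.mean() / returns.std(ddof=1)
    n = len(returns)
    g3 = skew(returns)
    g4 = kurtosis(returns, fisher=False)  # non-excess kurtosis
    denom = np.sqrt(1 - g3 * sr + (g4 - 1) / 4 * sr ** 2)
    return float(norm.cdf((sr - sr_benchmark) * np.sqrt(n - 1) / denom))


# Hypothetical usage on the per-period strategy returns of one backtest path:
# psr = probabilistic_sharpe_ratio(strategy_returns, sr_benchmark=0.0)
```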

Histograms of the Sharpe, Deflated Sharpe, and Probabilistic Sharpe ratios on the backtest data

As we can see, a single walk-forward backtest could not reveal these problems to us, while simple combinatorial cross-validation already shows much more than a point estimate of the metrics.


Monte-Carlo simulations

If combinatorial cross-validation works so well, what can we do with simulations from stochastic models? Let's look at the DB price time series below: how does it behave? What stochastic model and which parameters describe it?


DB close price time series and a couple of simulations. The dynamics look similar, right?

We can clearly see some jumps (and time-varying volatility too, by the way), but for simplicity let's assume that this process follows the Merton jump-diffusion model, with the drift and volatility taken from the historical data (-3.036e-05, 0.02789), and the jump intensity, size, and its standard deviation chosen by eye (0.1, -0.01, 0.001). Let's simulate a couple of paths of such a close price time series, predict the corresponding low, high, and open prices and the volume, and backtest the strategies on these generated paths.

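With those values, generating one synthetic close price path could look roughly like this, reusing the simulate_jump_diffusion sketch from the stress-scenario section above (the starting price and number of steps are placeholders):

```python
# simulate_jump_diffusion is the sketch from the stress-scenario section above;
# the diffusion parameters are the calibrated ones, the jump parameters are hand-picked.
synthetic_close = simulate_jump_diffusion(
    s0=10.0,                        # placeholder starting price
    mu=-3.036e-05, sigma=0.02789,   # drift and volatility from the historical data
    jump_intensity=0.1, jump_mean=-0.01, jump_std=0.001,
    n_steps=1000, n_paths=1,
)[0]
# The remaining open/high/low/volume columns are then predicted from this close series
# by the auxiliary ML model described earlier, and the strategy is backtested on the result.
```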

Backtest data simulation #1: the simulated data itself, a histogram of MCCs from different bagging runs, and the corresponding strategy backtest

From the first one, we can see from the distribution of MCCs calculated over multiple runs of the bagging classifier that, on average, our model's out-of-sample accuracy is negative. That should have stopped us from running the backtest, but out of curiosity let's do it anyway, and we can see that such a backtest can lie: it outperforms the benchmark even with a poor model! A good point in favor of the previous article's emphasis on correct metrics calculation and tracking.


Backtest data simulation #2: the simulated data itself, a histogram of MCCs from different bagging runs, and the corresponding strategy backtest

Re-launching the simulation gives a similarly bearish time series, now with more jumps over time (which looks more similar to the original DB time series); the distribution of MCCs gives negative results again, and the backtest is again misleading.


Of course, if we re-sample this time series more times and build a histogram of such predictions, we will clearly see a fat left tail in the model accuracies, which signals that our initial financial hypothesis no longer holds once we sample data with very similar dynamics that are just a bit different from the historical data. Which shouldn't be the case if we did our preliminary research right :)


Conclusions

In this article, we have reviewed techniques that allow probabilistic backtesting, as compared to historical walk-forward backtests. The main problem of the latter is that the historical path is just a single realization of a complex stochastic process that could have gone many different ways, and we don't want to overfit to one sample from a distribution.


We have used two major techniques to address this problem: combinatorial cross-validation and stochastic simulations, each with its benefits and drawbacks. In the previous article, we saw that our ML-based strategy performed well on the DB stock price, but experiments on other assets from the banking universe showed that this "finding" doesn't generalize to the market. In this article, we have shown that if we use not walk-forward estimates but combinatorial CV or stochastic simulations, we can see that this strategy is not reliable even without checking other assets on the market, which could save us research time.


We have studied additional metrics that tell us more about backtest performance, and we have expanded them to probabilistic estimates. We do all this to estimate the risk of our strategy performing poorly out-of-sample, which is relevant to measuring the "overfitting" of these strategies. We also know that overfitting is about the tradeoff between in-sample and out-of-sample performance, and in these two articles we focused only on the OOS side. In the next, third article of this short series, we will focus on other measures of overfitting risk that take in-sample data into account as well, and will conclude with a general framework for testing quantitative trading strategies. Stay tuned and don't forget to check out the source code :)


P.S. You can also connect with me on my Facebook blog or LinkedIn, where I regularly post AI articles or news that are too short for Medium, and on Instagram for some more personal content :)


Original article: https://towardsdatascience.com/ai-in-finance-how-to-finally-start-to-believe-your-backtests-2-3-adfd13da20ec