Labeling Methods

Building better targets for your model.

In supervised learning, every model requires target labels. Without them it is not possible to fit a machine learning model to features, because the model does not know what the outcomes should be. In financial machine learning, target labels can be constructed in various ways.

Fixed Time Horizon Method

The most common way to construct target labels for a trading strategy is to use a fixed time horizon. We take the price $p_t$ at time $t$ and look ahead a fixed horizon $k$ to the price $p_{t+k}$. The target label is based on the difference between the two prices: if $p_{t+k} - p_t$ is positive, we label the sample 1, and otherwise -1. Note that this method does not take intermediate prices into account, i.e. the target ignores the path taken from $p_t$ to $p_{t+k}$. In practice, if you went long and the price dropped or rose significantly during this period, you would have hit your stop-loss or profit-taking level respectively.
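The fixed time horizon rule can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `fixed_horizon_labels` and the choice to return `None` for the last `k` observations (which have no look-ahead price) are assumptions made for this example, not part of the method as described.

```python
from typing import List, Optional


def fixed_horizon_labels(prices: List[float], k: int) -> List[Optional[int]]:
    """Label time t with 1 if p[t+k] - p[t] > 0, otherwise -1.

    The last k observations have no price k steps ahead, so they
    are returned as None (an assumption of this sketch).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    labels: List[Optional[int]] = []
    for t in range(len(prices)):
        if t + k >= len(prices):
            labels.append(None)  # no look-ahead price available
        else:
            diff = prices[t + k] - prices[t]
            labels.append(1 if diff > 0 else -1)
    return labels


# Hypothetical price series, for illustration only.
prices = [100.0, 101.0, 99.0, 102.0, 98.0, 103.0]
print(fixed_horizon_labels(prices, k=2))
# → [-1, 1, -1, 1, None, None]
```

Because only the two end points are compared, a series such as `[100, 80, 120, 101]` with `k=3` is labeled 1 even though the price fell 20% along the way, which is the path-dependence problem described above.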

Triple Barrier Labeling Method

The triple barrier labeling method by López de Prado (2018) takes into account the path taken after a position is opened at time $t$. First, we determine our volatility-adjusted stop-loss and profit-taking levels; these are the so-called horizontal barriers. We also define the number of periods after which the position should be closed out, as it is unrealistic to hold a position indefinitely; this is the vertical barrier.

The triple barrier labeling method determines which of the three barriers is hit first. For a long position, the labels are assigned as follows. If the profit-taking level is hit first, the label is 1. If the stop-loss level is hit first, the label is -1. If the price stays between the two horizontal barriers and hits the vertical barrier, we can either label it 0, or check whether the position shows a loss or a profit at that point in time and label it -1 or 1 respectively.
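The labeling rules for a long position can be sketched as follows. This is a simplified illustration, not the reference implementation from López de Prado (2018): the function name, its parameters, and the use of fixed return widths for the horizontal barriers are assumptions. In practice the widths `profit_take` and `stop_loss` would typically be a multiple of a volatility estimate at time `t`.

```python
from typing import List, Optional


def triple_barrier_label(
    prices: List[float],
    t: int,
    profit_take: float,
    stop_loss: float,
    max_holding: int,
    zero_on_timeout: bool = True,
) -> Optional[int]:
    """Label a long position opened at index t.

    profit_take and stop_loss are positive return widths relative to
    the entry price (e.g. 0.02 for 2%). max_holding is the number of
    periods until the vertical barrier.
    """
    entry = prices[t]
    upper = entry * (1.0 + profit_take)  # profit-taking barrier
    lower = entry * (1.0 - stop_loss)    # stop-loss barrier
    end = min(t + max_holding, len(prices) - 1)

    # Walk the price path and return as soon as a horizontal barrier is hit.
    for s in range(t + 1, end + 1):
        if prices[s] >= upper:
            return 1
        if prices[s] <= lower:
            return -1

    if end < t + max_holding:
        return None  # data ended before any barrier was reached

    # Vertical barrier reached first.
    if zero_on_timeout:
        return 0
    return 1 if prices[end] > entry else -1


# Hypothetical price paths, for illustration only.
print(triple_barrier_label([100, 101, 103, 99, 97, 100], 0, 0.025, 0.025, 4))
print(triple_barrier_label([100, 99, 97, 104], 0, 0.03, 0.02, 3))
print(triple_barrier_label([100, 100.5, 101, 100.8], 0, 0.05, 0.05, 3))
```

The three calls illustrate the three outcomes: the profit-taking barrier is hit first, the stop-loss barrier is hit first, and the vertical barrier is reached with the price still between the horizontal barriers. Setting `zero_on_timeout=False` switches the last case to the sign of the realised profit or loss.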

This method of creating labels is more in tune with what would actually happen in the real world and is therefore preferred over the fixed time horizon method.

Trend Scanning Method

The barrier levels in the triple barrier labeling method are set at the user's discretion. The trend scanning method by López de Prado (2020) determines labels without having to set such barriers. The goal of trend scanning is to determine the dominant trend for a certain period and assign that period a label $y_t \in \{-1, 0, 1\}$, corresponding to a down-trend, no-trend, or up-trend respectively. Consider a time series of prices $\{p_t\}_{t=1,\dots,T}$. To determine the label of $p_t$, we look ahead $L$ periods, fit a linear time trend, and compute the t-value of the estimated slope:

$$
p_{t+l} = \beta_{0} + \beta_{1} l + \epsilon_{t+l} \\
\hat{t}_{\beta_1} = \dfrac{\hat{\beta}_1}{\hat{\sigma}_{\beta_1}}
$$

where $l = 0, \dots, L - 1, L$. By taking different values for $L$ we get different t-values. The label assigned to $p_t$ is determined by the $L$ for which the absolute value of the t-value, $\big|\hat{t}_{\beta_1}\big|$, is maximised; the sign of that t-value gives the direction of the trend.
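A minimal sketch of trend scanning with the standard library is shown below. It fits the linear trend by ordinary least squares for each candidate look-ahead $L$, keeps the $L$ with the largest absolute t-value, and labels by the sign of that t-value. The function names, the candidate set `horizons`, and the handling of a perfect fit (zero standard error) are assumptions made for this example. Note that in this sketch the label 0 only occurs when the t-value is exactly zero.

```python
import math
from typing import Iterable, Optional, Sequence, Tuple


def slope_t_value(y: Sequence[float]) -> float:
    """t-value of the OLS slope in y_l = b0 + b1 * l + e_l, l = 0..n-1."""
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 observations")
    x_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((l - x_mean) ** 2 for l in range(n))
    sxy = sum((l - x_mean) * (v - y_mean) for l, v in enumerate(y))
    beta1 = sxy / sxx
    beta0 = y_mean - beta1 * x_mean
    sse = sum((v - (beta0 + beta1 * l)) ** 2 for l, v in enumerate(y))
    std_err = math.sqrt(sse / (n - 2) / sxx)  # standard error of beta1
    if std_err == 0.0:
        # Perfect fit: treat a non-zero slope as an infinitely strong trend.
        return 0.0 if beta1 == 0.0 else math.copysign(math.inf, beta1)
    return beta1 / std_err


def trend_scanning_label(
    prices: Sequence[float], t: int, horizons: Iterable[int]
) -> Optional[Tuple[int, float, int]]:
    """Return (label, t_value, best_L) for the observation at index t."""
    best: Optional[Tuple[int, float]] = None
    for L in horizons:
        if L < 2 or t + L >= len(prices):
            continue  # too few points for a regression, or not enough data
        t_val = slope_t_value(prices[t : t + L + 1])  # l = 0, ..., L
        if best is None or abs(t_val) > abs(best[1]):
            best = (L, t_val)
    if best is None:
        return None
    best_L, t_val = best
    label = (t_val > 0) - (t_val < 0)  # sign of the t-value
    return label, t_val, best_L


# Hypothetical price series, for illustration only.
prices = [100.0, 101.0, 103.0, 102.0, 105.0, 107.0, 106.0, 109.0]
print(trend_scanning_label(prices, t=0, horizons=range(3, 8)))
```

The returned t-value can also serve as a measure of how strong the detected trend is, in addition to the label itself.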

References

  • López de Prado, M. (2018). Advances in Financial Machine Learning. John Wiley & Sons.

  • López de Prado, M. (2020). Machine Learning for Asset Managers. Elements in Quantitative Finance. Cambridge University Press. doi:10.1017/9781108883658.
