Average Uniqueness

When labels are created without a fixed horizon (e.g. with the triple barrier labeling method), each sample spans a different period, so samples can overlap with one another to varying degrees. Samples that overlap little with other samples are more unique and are therefore more informative for the model. This matters most for machine learning models that bootstrap the training data, because a standard bootstrap draws samples from a uniform distribution. A sample that overlaps heavily with other samples is then just as likely to be drawn as a unique sample that overlaps with none. Ideally, we would instead bootstrap the samples according to their uniqueness to get a more diverse bootstrapped dataset.
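To make the contrast concrete, the sketch below compares uniform draws with draws weighted by a uniqueness score. It is only an illustration under stated assumptions: the `uniqueness` values are hypothetical, and this is plain weighted sampling with replacement via the standard library's `random.choices`, not a sequential bootstrap.

```python
import random
from collections import Counter

# Hypothetical uniqueness scores for three samples (illustrative values only).
uniqueness = [0.2, 1.0, 0.2]
sample_ids = [0, 1, 2]

rng = random.Random(0)  # fixed seed so the draws are repeatable
n_draws = 10_000

# Standard bootstrap: every sample is equally likely to be drawn.
uniform_draws = rng.choices(sample_ids, k=n_draws)

# Weighted bootstrap: draw probability is proportional to uniqueness.
weighted_draws = rng.choices(sample_ids, weights=uniqueness, k=n_draws)

uniform_counts = Counter(uniform_draws)
weighted_counts = Counter(weighted_draws)
print("uniform: ", dict(uniform_counts))
print("weighted:", dict(weighted_counts))
```

With the weighted draw, the more unique sample (index 1 here) appears far more often than under the uniform draw, which is the effect the paragraph above is after.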

Number of Concurrent Events

To calculate the average uniqueness, we first have to calculate the number of concurrent events. This can be calculated with an indicator matrix.

$$
1_{t,i} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}
$$

The rows represent time periods $t=1,...,T$ and the columns represent the samples $i=1,...,I$. An entry $1_{t,i}$ is 1 if the label of sample $i$ spans time period $t$ and 0 otherwise. In the above example, $T=4$ and $I=3$. To calculate the number of concurrent events at each time period, we sum each row over the columns, $c_t = \sum_{i=1}^{I} 1_{t,i}$. In this example, the numbers of concurrent events are $2, 2, 1, 2$.
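The counting step can be sketched in a few lines of Python. This is a minimal illustration, not Mizar's implementation: the `spans` list of inclusive (start, end) periods and the helper names `indicator_matrix` and `concurrent_events` are assumptions chosen to reproduce the example matrix above, and plain lists are used to keep it dependency-free.

```python
# Hypothetical label spans as inclusive (start, end) time periods, 1-indexed,
# chosen so that they reproduce the example indicator matrix.
spans = [(1, 2), (1, 4), (4, 4)]
T = 4  # number of time periods


def indicator_matrix(spans, T):
    """Return a T x I matrix with 1 where sample i is active at time t."""
    return [
        [1 if start <= t <= end else 0 for (start, end) in spans]
        for t in range(1, T + 1)
    ]


def concurrent_events(indicator):
    """Sum each row over the columns to get c_t for every time period."""
    return [sum(row) for row in indicator]


indicator = indicator_matrix(spans, T)
c = concurrent_events(indicator)
print(indicator)
print(c)
```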

Average Uniqueness

The uniqueness of sample $i$ at time $t$ can be calculated with $u_{t,i}=\dfrac{1_{t,i}}{c_t}$. In this example, the uniqueness matrix is

$$
\begin{bmatrix} 0.5 & 0.5 & 0 \\ 0.5 & 0.5 & 0 \\ 0 & 1 & 0 \\ 0 & 0.5 & 0.5 \end{bmatrix}
$$

To calculate the average uniqueness, we take the average of the uniqueness values over the time periods $t$ in which the sample is active:

$$
\bar{u}_i = \dfrac{\sum_{t=1}^{T}u_{t,i}}{\sum_{t=1}^{T}1_{t,i}}.
$$

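In our example, the average uniquenesses of the samples are 0.5, 0.625, and 0.5 respectively. For the second sample, which is active in all four periods, this is $(0.5 + 0.5 + 1 + 0.5)/4 = 0.625$.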
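The two formulas above can be checked with a short script. This is a sketch under the assumption that the indicator matrix is stored as a list of rows (one row per time period); the function name `average_uniqueness` is illustrative and not part of any Mizar API.

```python
# Example indicator matrix: rows are time periods, columns are samples.
indicator = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
]


def average_uniqueness(indicator):
    """Return the uniqueness matrix u[t][i] and the per-sample averages."""
    # c_t: number of concurrent events in each time period.
    c = [sum(row) for row in indicator]
    # u_{t,i} = 1_{t,i} / c_t, with 0 for periods where no sample is active.
    u = [
        [x / c_t if c_t else 0.0 for x in row]
        for row, c_t in zip(indicator, c)
    ]
    averages = []
    for i in range(len(indicator[0])):
        active_periods = sum(row[i] for row in indicator)
        total_uniqueness = sum(row[i] for row in u)
        # Average only over the periods in which sample i is active.
        averages.append(total_uniqueness / active_periods if active_periods else 0.0)
    return u, averages


u, avg_u = average_uniqueness(indicator)
for i, value in enumerate(avg_u, start=1):
    print(f"sample {i}: {value:.3f}")
```

Dividing by the number of active periods rather than by $T$ keeps short-lived samples from being penalized simply for spanning fewer periods.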


Last updated 4 years ago