Sample Weights

In financial time series, the samples in a training set do not contain equal amounts of information, so ideally the model should focus on the significant events. For example, samples in whose subsequent period a large absolute return can be realised are more interesting to the model than periods where only small returns are made. In addition, it makes intuitive sense that recent information is more valuable than dated information in financial markets, so the model should place more emphasis on recent samples.

Both ideas can be formalised by assigning each sample a weight. The model then uses these weights to place more emphasis on samples with a high weight during training.
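As a minimal sketch of how such weights reach the model: scikit-learn estimators accept per-sample weights through the `sample_weight` argument of `fit`. The features, labels, and weights below are random placeholders standing in for real data and for the weights derived in the following sections.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))          # placeholder features
y = rng.integers(0, 2, size=100)       # placeholder binary labels
w = rng.uniform(0.1, 1.0, size=100)    # placeholder per-sample weights

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y, sample_weight=w)         # high-weight samples get more emphasis
```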

Return Attribution

To calculate a sample weight based on the sample's return, we transform prices to log prices, so that the sum of the log returns over a period equals the log return over that period. The weight for sample $i$ with a lifespan $[t_{i,0}, t_{i,1}]$ can be calculated as follows

$$\widetilde{w}_i = \left|\,\sum_{t=t_{i,0}}^{t_{i,1}} \dfrac{r_{t-1,t}}{c_t}\right|, \qquad w_i = \dfrac{\widetilde{w}_i}{\sum_{j=1}^{I} \widetilde{w}_j}$$

where $c_t$ is the number of concurrent events, i.e. the number of samples whose lifespans (partially) overlap the interval $[t-1, t]$. The weights are then normalised so that they sum to one.
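The return-attribution weighting above can be sketched as follows. The `events` frame with `t0`/`t1` columns is an assumed schema for the sample lifespans; any structure carrying the same information works.

```python
import numpy as np
import pandas as pd

def return_attribution_weights(log_returns, events):
    """Weight each sample by the absolute log return over its lifespan,
    discounting each bar's return by the number of overlapping samples.

    log_returns : pd.Series of log returns r_{t-1,t}, indexed by bar time
    events      : pd.DataFrame with columns 't0' and 't1' giving each
                  sample's lifespan [t_{i,0}, t_{i,1}] (assumed schema)
    """
    # c_t: number of samples whose lifespan covers bar t
    concurrency = pd.Series(0, index=log_returns.index)
    for t0, t1 in zip(events["t0"], events["t1"]):
        concurrency.loc[t0:t1] += 1          # .loc slices are end-inclusive

    # w~_i = | sum over the lifespan of r_{t-1,t} / c_t |
    raw = np.array([
        abs((log_returns.loc[t0:t1] / concurrency.loc[t0:t1]).sum())
        for t0, t1 in zip(events["t0"], events["t1"])
    ])
    return raw / raw.sum()                   # normalise: weights sum to one
```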

Time Decay

It is possible to assign more weight to recent samples than to older ones by calculating a time-decay factor $d \ge 0$ for each sample. These factors are derived from the array of average uniqueness values $\bar{u}_i$, with the most recent sample always receiving a weight of 1. The user controls the amount of time decay with a parameter $c \in (-1, 1]$. For $c \in [0, 1]$, the weight of the oldest sample is $c$. For $c \in (-1, 0)$, the decay factor is 0 for the oldest samples, which implies the model will fully ignore them. The decay factors are computed with a piecewise linear function defined as

$$d = \max(0,\, a + b x)$$
$$a = 1 - b \sum_{i=1}^{I} \bar{u}_i$$
$$b = \dfrac{1 - c}{\sum_{i=1}^{I} \bar{u}_i}, \quad \forall\, c \in [0, 1]$$
$$b = \left[(c + 1) \sum_{i=1}^{I} \bar{u}_i\right]^{-1}, \quad \forall\, c \in (-1, 0)$$

where $x$ is the cumulative sum of the average uniqueness values in chronological order, so that the most recent sample has $x = \sum_{i=1}^{I} \bar{u}_i$ and thus $d = 1$.
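The piecewise linear decay can be sketched as follows, assuming $x$ is the cumulative sum of the average uniqueness values in chronological order (oldest first), so that the newest sample always receives a factor of 1.

```python
import numpy as np

def time_decay_weights(avg_uniqueness, c=1.0):
    """Piecewise-linear time-decay factors d = max(0, a + b*x).

    avg_uniqueness : array of average uniqueness u_i, oldest sample first
    c              : decay parameter in (-1, 1]; c=1 means no decay,
                     c in [0,1) gives the oldest sample weight c,
                     c in (-1,0) zeroes out the oldest samples entirely
    """
    x = np.cumsum(avg_uniqueness)            # cumulative uniqueness, oldest first
    if c >= 0:
        b = (1.0 - c) / x[-1]                # slope for c in [0, 1]
    else:
        b = 1.0 / ((c + 1.0) * x[-1])        # slope for c in (-1, 0)
    a = 1.0 - b * x[-1]                      # intercept: newest sample gets d = 1
    return np.maximum(0.0, a + b * x)        # clip negative factors to zero
```

At $x = \sum_i \bar{u}_i$ (the newest sample) the factor is $a + b \sum_i \bar{u}_i = 1$ by construction, matching the definition above.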
