I3 Indicators

Improperly Implemented Indicators

Before creating bta-lib, some research was done on technical analysis libraries written in Python or offering Python bindings, and some surprises showed up. The findings:

  • Some indicators are not properly implemented

  • Some indicators do not even deliver what the API contract promises (the name is the contract, and so is the documentation when available). Something else is actually delivered

  • And this sometimes creates a chain of improperly implemented indicators, because indicators re-use other indicators as the basis for the calculations

Hence the name given to them: I3 Indicators, a kind of recursive acronym which stands for "Improperly Implemented Indicators".

EMA

Even the EMA (Exponential Moving Average), something which should be easy to get right with the available tooling (libraries), is more often than not improperly implemented, for reasons such as:

  • No seed value is used

  • The ewm method in pandas.Series (or DataFrame) is blindly used without understanding what it actually does

  • Feeding an EMA into an EMA fails because the initial warm-up period is not accounted for

Considering that several indicators like the MACD, DEMA, TRIX, etc, do actually depend on a correct implementation of the basic EMA, this ends up creating a long list of improperly implemented indicators.
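The three pitfalls listed above can be made concrete with a minimal pure-Python sketch. It assumes the common convention of seeding the EMA with the simple average of the first period values and a smoothing factor of 2 / (period + 1). The names ema_seeded and ema_unseeded are illustrative and belong to no library. The unseeded variant starts the recursion at the very first value, which is the recursion pandas applies with ewm(span=period, adjust=False).

```python
def ema_seeded(values, period):
    """EMA seeded with the SMA of the first `period` valid values.

    Leading None entries (the warm-up of a previous indicator) are skipped,
    so the output of one EMA can be fed into another one. Assumes there are
    no None entries after the first valid value.
    """
    alpha = 2.0 / (period + 1)
    out = [None] * len(values)
    # First index holding an actual value
    start = next((i for i, v in enumerate(values) if v is not None), len(values))
    seed_idx = start + period - 1  # first bar with enough data for the seed
    if seed_idx >= len(values):
        return out
    out[seed_idx] = sum(values[start:seed_idx + 1]) / period
    for i in range(seed_idx + 1, len(values)):
        out[i] = out[i - 1] + alpha * (values[i] - out[i - 1])
    return out


def ema_unseeded(values, period):
    """Recursion started at the first value: no seed, no warm-up period."""
    alpha = 2.0 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(out[-1] + alpha * (v - out[-1]))
    return out


prices = [float(x) for x in range(1, 11)]
print(ema_seeded(prices, 3))    # 2 warm-up None values, then the EMA
print(ema_unseeded(prices, 3))  # delivers from the first bar, other values
print(ema_seeded(ema_seeded(prices, 3), 3))  # EMA of an EMA: 4 warm-up bars
```

The two variants deliver different values at the start and only approach each other as more bars go by, which is why an indicator built on top of either one inherits the choice.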

It would seem that some of the libraries recognized the problem internally when developing those other indicators and reinvented the wheel in each indicator with a new in-indicator custom EMA. Sometimes this still fails: even if the initial offset no longer matters, the actual implementation is still the wrong one.

RSI

The RSI is another good example. An indicator that is more than 40 years old, strictly defined in a published book and well documented even on Wikipedia, is also implemented incorrectly. In this case:

  • The SMMA (Smoothed Moving Average) is either wrongly implemented, changed to be the EMA, or even replaced by a poorly implemented EMA

  • The initial 1-day period needed to calculate the difference to the previous close for upday, downday calculations is not respected

  • A variant of the RSI is delivered, because ... for no good reason at all.
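For reference, a minimal sketch of the textbook calculation follows. It assumes the usual definition: the up and down moves are smoothed with an SMMA that is seeded with a simple average and then updated as (prev * (period - 1) + x) / period. The helper names smma and rsi are illustrative, and returning 100 when there are no losses is a convention chosen here, not something taken from any library.

```python
def smma(values, period):
    """Smoothed moving average: SMA seed, then (prev * (n - 1) + x) / n."""
    out = [None] * len(values)
    if len(values) < period:
        return out
    out[period - 1] = sum(values[:period]) / period
    for i in range(period, len(values)):
        out[i] = (out[i - 1] * (period - 1) + values[i]) / period
    return out


def rsi(closes, period=14):
    """RSI over the SMMA of up and down moves; None during the warm-up."""
    # The difference to the previous close costs one extra bar of warm-up
    changes = [b - a for a, b in zip(closes, closes[1:])]
    ups = [max(c, 0.0) for c in changes]
    downs = [max(-c, 0.0) for c in changes]
    out = [None]  # no change is available for the very first close
    for up, down in zip(smma(ups, period), smma(downs, period)):
        if up is None:
            out.append(None)
        elif down == 0.0:
            out.append(100.0)  # assumption: no losses means RSI = 100
        else:
            out.append(100.0 - 100.0 / (1.0 + up / down))
    return out


# period + 1 closes are needed before the first value is delivered
print(rsi([1.0, 2.0, 1.0, 2.0, 1.0], period=2))
```

With a period of 2, the first value appears at the third close: two changes are needed for the seed, and the first change needs two closes.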

A MACD example with ta-lib

ta-lib and the MACD are a good joint example. The MACD has well-defined formulas and is easy to implement. The formulas for the MACD outputs, with the default 12, 26 and 9 parameters, are:

  • macd = ema(data, 12) - ema(data, 26)
  • signal = ema(macd, 9)
  • histogram = macd - signal

To illustrate the example and because signal and histogram do fully depend on the values of macd, only the latter is going to be considered.

The following snippet, using the sample data from bta-lib, compares a manually implemented macd output with the output of the MACD implemented in ta-lib.

from talib import EMA, MACD
import pandas as pd

# Read a csv file into a pandas dataframe
csv = './2006-day-001.txt'
df = pd.read_csv(
    csv, parse_dates=True, index_col='date', skiprows=1,
    names=['date', 'open', 'high', 'low', 'close', 'volume', 'openinterest'],
)

my_macd = EMA(df.close, timeperiod=12) - EMA(df.close, timeperiod=26)
ta_macd, *_ = MACD(df.close, fastperiod=12, slowperiod=26, signalperiod=9)
eq = ta_macd.eq(my_macd)  # check equality
p = range(1, len(ta_macd) + 1)  # counter to see differences

dfout = pd.DataFrame(dict(p=p, ta_macd=ta_macd, my_macd=my_macd, eq=eq))
print(dfout.to_string())

Rather than simply pasting the entire output, the interesting parts are going to be examined.

First Values

              p    ta_macd    my_macd     eq
date
2006-01-02    1        NaN        NaN  False
2006-01-03    2        NaN        NaN  False
...

No surprises here. The fastest EMA has a period of 12, but the result shall be dominated by the slowest EMA with a period of 26. No result shall be expected until p reaches a count of at least 26.

First (Late) Delivery

              p    ta_macd    my_macd     eq
...
2006-02-03   25        NaN        NaN  False
2006-02-06   26        NaN  11.600864  False
2006-02-07   27        NaN  11.568218  False
2006-02-08   28        NaN  11.034284  False
2006-02-09   29        NaN  12.701495  False
2006-02-10   30        NaN  12.620119  False
2006-02-13   31        NaN  13.681718  False
2006-02-14   32        NaN  14.637538  False
2006-02-15   33        NaN  15.032924  False
2006-02-16   34  13.463325  16.235543  False
2006-02-17   35  15.094544  17.440267  False
...

Ooooops! The custom my_macd has delivered values after 26 periods, but the line from the standard talib.MACD implementation does so after 34 periods.

This is the first improper implementation:

  • Waiting for slower outputs before delivering what can already be delivered.

In this case the complete talib.MACD has the 12, 26 periods of the EMA indicators, fast and slow, and then the extra 9 periods of the signal output. And the output of the macd line is delayed until signal delivers.

Note

26 + 9 = 35 (the 12 periods of the fast EMA are already contained in the 26 of the slow one). Delivery after 34 periods is not wrong, because there is an overlap: the calculation of signal already starts at 26, and counting 9 from 26 puts the first delivery position at 34.
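The overlap can be checked with a couple of lines of arithmetic. The variable names are illustrative only.

```python
fast, slow, signal = 12, 26, 9

# The slow EMA dominates: the macd line can first be delivered at bar 26
first_macd = max(fast, slow)

# Bar 26 is already the 1st of the 9 bars the signal needs, hence the - 1
first_signal = first_macd + signal - 1

print(first_macd, first_signal)  # 26 34
```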

Granted, this is one which can be accepted. After all it may not be productive to consider anything coming out of the MACD before the minimum period of both main outputs, until macd and signal deliver meaningful values.

First (Wrong) Delivery

Upon closer inspection of what happens when the first delivery at period 34 takes place, it should be clear something is odd.

              p    ta_macd    my_macd     eq
...
2006-02-15   33        NaN  15.032924  False
2006-02-16   34  13.463325  16.235543  False
2006-02-17   35  15.094544  17.440267  False
...

The value from talib is 13.463325 and the one from the direct EMA(12) - EMA(26) is 16.235543, which obviously fails to compare as True.

Going down the output produced by the small snippet above, the first comparison flagged as True is here:

              p    ta_macd    my_macd     eq
...
2006-10-26  211  23.231338  23.231338  False
2006-10-27  212  22.489098  22.489098   True
...

The first True shows up at p == 212, that is, 212 - 34 = 178 price bars after the first delivery (the printed values already look identical one bar earlier, at p == 211). Ooops again! The convergence happens because exponentially smoothed averages tend to converge after a number of bars has gone by: the weight of the initial bars in the calculation fades until it is no longer relevant.
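Why the convergence takes so long can be estimated. Two EMAs with the same period and the same input, started from different values, differ by the initial gap multiplied by (1 - alpha) on each bar, with alpha = 2 / (period + 1). The sketch below, with the hypothetical helper bars_until_below, counts how many bars this geometric decay needs to push the gap under a tolerance. The moment at which two floats compare as exactly equal also depends on the magnitude of the prices and on rounding, so the figure is only indicative.

```python
import math


def bars_until_below(initial_diff, period, tol):
    """Bars needed for |initial_diff| * (1 - alpha)**n to reach tol or less."""
    alpha = 2.0 / (period + 1)
    ratio = tol / abs(initial_diff)
    if ratio >= 1.0:
        return 0  # the gap is already within the tolerance
    return math.ceil(math.log(ratio) / math.log(1.0 - alpha))


# Gap of 1.0 between two EMA(12) lines, tolerance of 6 decimals
n = bars_until_below(1.0, 12, 1e-6)
print(n, (11.0 / 13.0) ** n)
```

For an EMA(12) the gap only shrinks by a factor of 11/13 per bar, so many bars are needed before the two lines agree to several decimals, and many more before they agree to the last bit.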

The reason for the late delivery

In the manual macd, the values of the EMA(12) and EMA(26) are calculated from the start of the input.

In the case of ta-lib the following twisted procedure is followed:

  • The fast EMA(12) is not calculated from the start of the input

  • Instead, its calculation is shifted so that its first output value is synchronized with the first output value of the EMA(26)

  • Which in practice means that the EMA(12) calculation starts after 26 - 12 bars, i.e.: after 14 bars, effectively ignoring the first 14 inputs.
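The effect of such a delayed start can be reproduced with a small pure-Python sketch. It assumes that the delayed average is the fast EMA(12) and that its seed is moved so that its first value lands on the same bar as the first EMA(26) value. This mimics the behaviour described above and is not ta-lib's actual code. The names ema and macd_line and the sync flag are hypothetical, and the prices are synthetic.

```python
def ema(values, period, start=0):
    """SMA-seeded EMA that ignores the first `start` input values.

    Assumes len(values) >= start + period.
    """
    alpha = 2.0 / (period + 1)
    out = [None] * len(values)
    seed_idx = start + period - 1
    out[seed_idx] = sum(values[start:seed_idx + 1]) / period
    for i in range(seed_idx + 1, len(values)):
        out[i] = out[i - 1] + alpha * (values[i] - out[i - 1])
    return out


def macd_line(values, fast=12, slow=26, sync=False):
    """macd = ema(fast) - ema(slow); `sync` delays the start of the fast EMA."""
    # sync=True: the fast EMA skips slow - fast bars, so both EMAs deliver
    # their first value on the same bar (the assumed ta-lib-like mode)
    fast_ema = ema(values, fast, start=slow - fast if sync else 0)
    slow_ema = ema(values, slow)
    return [None if s is None else f - s for f, s in zip(fast_ema, slow_ema)]


prices = [100.0 + 0.05 * i * i for i in range(60)]  # synthetic, not market data
full = macd_line(prices)
synced = macd_line(prices, sync=True)
diffs = [a - b for a, b in zip(full, synced) if a is not None]
print(diffs[0], diffs[-1])  # the gap shrinks by a factor of 11/13 per bar
```

Both variants deliver their first macd value on bar 26, but with different values, and the gap between them closes only gradually.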

Even if the values converge almost 180 bars later, this is deemed really wrong.

Longer wrong if looking at the signal

It has to be recalled that the MACD outputs 3 values: macd, signal and histogram. For comparison purposes the histogram can be ignored, for it is the simple difference macd - signal. If those two values have converged, the histogram will be good too.

Here is an extended version of the snippet above which adds the signal to the mix.

from talib import EMA, MACD
import pandas as pd

# Read a csv file into a pandas dataframe
csv = './2006-day-001.txt'
df = pd.read_csv(
    csv, parse_dates=True, index_col='date', skiprows=1,
    names=['date', 'open', 'high', 'low', 'close', 'volume', 'openinterest'],
)

my_macd = EMA(df.close, timeperiod=12) - EMA(df.close, timeperiod=26)
my_sig = EMA(my_macd, 9)

ta_macd, ta_sig, *_ = MACD(df.close,
                           fastperiod=12, slowperiod=26, signalperiod=9)

eq_macd = ta_macd.eq(my_macd)  # check equality of the macd lines
eq_sig = ta_sig.eq(my_sig)  # check equality of the signal lines

p = range(1, len(ta_macd) + 1)  # counter to see differences

kwdict = dict(
    p=p,
    ta_macd=ta_macd, my_macd=my_macd, eq_macd=eq_macd,
    ta_sig=ta_sig, my_sig=my_sig, eq_sig=eq_sig,
)

dfout = pd.DataFrame(kwdict)
print(dfout.to_string())

Without further ado the end of the output can be examined.

              p    ta_macd    my_macd  eq_macd     ta_sig     my_sig  eq_sig
date
...
2006-12-20  250  12.668380  12.668380     True  12.668380  12.668380   False
2006-12-21  251  12.873413  12.873413     True  12.873413  12.873413    True
2006-12-22  252  11.347745  11.347745     True  11.347745  11.347745    True
2006-12-27  253  12.470509  12.470509     True  12.470509  12.470509    True
2006-12-28  254  13.040532  13.040532     True  13.040532  13.040532    True
2006-12-29  255  12.910943  12.910943     True  12.910943  12.910943    True

Convergence happens after 251 bars, i.e.: 251 - 34 = 217 bars too late.

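The avid reader will surely notice that the values for p == 250 on 2006-12-20 are printed as identical for ta_sig and my_sig, even though eq_sig is False. This is only because pandas rounds the displayed output to 6 decimals, and in the process the values end up looking identical. Note also that the ta_sig and my_sig columns in this listing carry the same figures as the macd columns, so the column to watch is eq_sig.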

Note

The argument is not whether 6 decimals are enough to consider equality, but whether the implementation of MACD in ta-lib is proper, and understanding the time delay implications until convergence is achieved.

Finally

bta-lib offers what seems to be the sound calculation, in which the fast moving average is calculated from the beginning of the input. It also offers the possibility to deliver the calculation as done by ta-lib, for those who may prefer convolution or are bound by previous results which used the MACD in ta-lib.