Improperly Implemented Indicators
For bta-lib, some research was done on technical analysis libraries written
in Python or with Python bindings, and some surprises showed up. The findings:
- Some indicators are not properly implemented
- Some indicators do not even deliver what the API contract promises (the name is the contract, and so is the documentation when available). Something else is actually delivered
- And this sometimes creates a chain of improperly implemented indicators, because indicators re-use other indicators as the basis for the calculations
Hence the name given to them: I3 Indicators, a kind of recursive acronym which stands for "Improperly Implemented Indicators".
Take the EMA (Exponential Moving Average): with the libraries available it
should be easy to get right, yet it is more often than not improperly
implemented, for reasons such as:
- No seed value is used
- Ready-made functionality (e.g. of a pandas DataFrame) is blindly used without understanding what it actually does
- The EMA fails because the initial warm-up period is not accounted for
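To make the points about the seed value and the warm-up period concrete, here is a minimal pure-Python sketch. It assumes the common textbook definition: the seed is the simple average of the first `period` values and the smoothing factor is `2 / (period + 1)`. The function name `ema` is chosen for illustration only and is not taken from any of the libraries discussed.

```python
def ema(values, period):
    """EMA seeded with the simple average of the first `period` values.

    The result is aligned with `values`; positions inside the warm-up
    period (before the first full window) hold None.
    """
    if period < 1:
        raise ValueError("period must be >= 1")
    out = [None] * len(values)
    if len(values) < period:
        return out  # not enough data to build the seed
    alpha = 2.0 / (period + 1)
    prev = sum(values[:period]) / period  # seed: SMA of the first window
    out[period - 1] = prev
    for i in range(period, len(values)):
        prev = prev + alpha * (values[i] - prev)
        out[i] = prev
    return out


closes = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
print(ema(closes, 3))  # [None, None, 11.0, 12.0, 13.0, 14.0]
```

Nothing is delivered during the first `period - 1` bars, and the first delivered value is the seed itself.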
Considering that several indicators, like the TRIX, etc., do actually depend
on a correct implementation of the basic EMA, this ends up creating a long
list of improperly implemented indicators.
It would seem that some of the libraries recognized the problem internally
when developing those other indicators and reinvented the wheel in each
indicator with a new in-indicator custom EMA. This sometimes still fails
because, even if the initial offset no longer matters, the actual
implementation is still the wrong one.
RSI is another good example. A 40+ year old indicator, strictly defined in a
published book and even well documented on Wikipedia, is also implemented
wrongly. In this case:
- The SMMA (Smoothed Exponential Moving Average) is either wrongly implemented, changed to be the EMA, or even replaced by a poorly implemented one
- The 1-day period needed to calculate the difference to the previous close for the downday calculations is not respected
- A variant of the RSI is delivered, because ... for no good reason at all.
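As a point of reference for the list above, the following is a minimal sketch of the RSI as it is commonly described: price changes against the previous close (which costs one extra bar of warm-up), a simple average as the seed, and then the smoothed average `(prev * (period - 1) + current) / period`. The function name and the choice of returning `100.0` when the average loss is zero are assumptions made for this illustration.

```python
def rsi(closes, period=14):
    """RSI with a smoothed moving average of gains and losses.

    The result is aligned with `closes`. The first value appears at
    index `period`, because `period` price changes need `period + 1` closes.
    """
    out = [None] * len(closes)
    if len(closes) <= period:
        return out
    # Each change needs the previous close: the extra 1-bar period
    changes = [closes[i] - closes[i - 1] for i in range(1, len(closes))]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]

    def to_rsi(avg_gain, avg_loss):
        if avg_loss == 0.0:  # assumption: no losses maps to 100
            return 100.0
        return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

    # Seed: simple average of the first `period` gains and losses
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    out[period] = to_rsi(avg_gain, avg_loss)
    for i in range(period, len(changes)):
        avg_gain = (avg_gain * (period - 1) + gains[i]) / period
        avg_loss = (avg_loss * (period - 1) + losses[i]) / period
        out[i + 1] = to_rsi(avg_gain, avg_loss)
    return out


print(rsi([1.0, 2.0, 1.0, 2.0], period=2))  # [None, None, 50.0, 75.0]
```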
A MACD example with ta-lib
ta-lib and the MACD are a good joint example. The MACD has well-defined
formulas and is easy to implement. The formulas for the MACD outputs, with
periods of 12, 26 and 9:
macd = ema(data, 12) - ema(data, 26)
signal = ema(macd, 9)
histogram = macd - signal
To illustrate the example, and because signal and histogram fully depend on
the values of macd, only the latter is going to be considered.
The following snippet, using the sample data from bta-lib, compares a manually
calculated macd output with the output of the MACD implemented in ta-lib.
    from talib import EMA, MACD
    import pandas as pd

    # Read a csv file into a pandas dataframe
    csv = './2006-day-001.txt'
    df = pd.read_csv(
        csv, parse_dates=True, index_col='date', skiprows=1,
        names=['date', 'open', 'high', 'low', 'close', 'volume', 'openinterest'],
    )

    # Manual macd line versus the macd line delivered by ta-lib
    my_macd = EMA(df.close, timeperiod=12) - EMA(df.close, timeperiod=26)
    ta_macd, *_ = MACD(df.close, fastperiod=12, slowperiod=26, signalperiod=9)

    eq = ta_macd.eq(my_macd)  # check equality
    p = range(1, len(ta_macd) + 1)  # counter to see differences
    dfout = pd.DataFrame(dict(p=p, ta_macd=ta_macd, my_macd=my_macd, eq=eq))
    print(dfout.to_string())
Rather than simply pasting the entire output, the interesting parts are going to be examined.
                  p  ta_macd  my_macd     eq
    date
    2006-01-02    1      NaN      NaN  False
    2006-01-03    2      NaN      NaN  False
    ...
No surprises here. The fastest EMA has a period of 12, but the result is
dominated by the slowest EMA, with a period of 26. No result shall be
delivered until p reaches a count of at least 26.
First (Late) Delivery
                  p    ta_macd    my_macd     eq
    ...
    2006-02-03   25        NaN        NaN  False
    2006-02-06   26        NaN  11.600864  False
    2006-02-07   27        NaN  11.568218  False
    2006-02-08   28        NaN  11.034284  False
    2006-02-09   29        NaN  12.701495  False
    2006-02-10   30        NaN  12.620119  False
    2006-02-13   31        NaN  13.681718  False
    2006-02-14   32        NaN  14.637538  False
    2006-02-15   33        NaN  15.032924  False
    2006-02-16   34  13.463325  16.235543  False
    2006-02-17   35  15.094544  17.440267  False
    ...
Ooooops! The custom my_macd has delivered values after 26 periods, but the
macd line from the standard talib.MACD implementation does so after 34.
This is the first improper implementation:
- Waiting for slower outputs before delivering what can already be delivered.
In this case the complete talib.MACD has the 26 periods of the EMA
indicators, fast and slow, and then the extra 9 periods of the signal output.
The output of the macd line is delayed until signal can be delivered, which a
naive count would place at 26 + 9 = 35. Delivery after 34 periods is not
wrong, because there is an overlap: the calculation of signal already starts
at 26, and counting 9 bars from 26 (inclusive) puts the first delivery
position at 34.
Granted, this is one which can be accepted. After all it may not be productive
to consider anything coming out of the MACD before the minimum period of both
main outputs is reached, i.e. until both macd and signal deliver meaningful
values.
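The period arithmetic in this section can be checked with a few lines of Python. This is only a sketch of the counting rule: when one windowed indicator is fed with the output of another, the bar on which the first value is produced is shared, so one bar is subtracted per link in the chain. The helper name `chained_min_period` is hypothetical.

```python
def chained_min_period(*periods):
    """First bar (1-based) at which a chain of windowed indicators delivers.

    Each indicator consumes the output of the previous one, and the first
    output bar of one stage is also the first input bar of the next.
    """
    if not periods:
        raise ValueError("at least one period is required")
    return sum(periods) - (len(periods) - 1)


slow, signal = 26, 9
print(chained_min_period(slow))          # 26: first macd value
print(chained_min_period(slow, signal))  # 34: first signal value
```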
First (Wrong) Delivery
Upon closer inspection of what happens when the first delivery at period 34
takes place, it should be clear something is odd.
                  p    ta_macd    my_macd     eq
    ...
    2006-02-15   33        NaN  15.032924  False
    2006-02-16   34  13.463325  16.235543  False
    2006-02-17   35  15.094544  17.440267  False
    ...
The value from talib.MACD is 13.463325 and the one from the direct calculation
is 16.235543, which obviously fails to compare as equal.
Going down the output produced by the small snippet above, the first
comparison yielding True is here:
                  p    ta_macd    my_macd     eq
    ...
    2006-10-26  211  23.231338  23.231338  False
    2006-10-27  212  22.489098  22.489098   True
    ...
This happens at p == 212, or 212 - 34 = 178 price bars after it is expected.
Ooops again! The convergence happens because exponentially smoothed averages
tend to converge after a number of bars has gone by: the weight of the
initial bars in the calculation is no longer relevant.
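The fading weight of the initial bars can be made visible with a small experiment. The sketch below runs the same EMA recursion twice over the same prices, changing only the seed, and records the gap between both results. With a smoothing factor `alpha = 2 / (period + 1)` the gap shrinks by a factor of `1 - alpha` on every bar. The prices and the two seeds are synthetic values chosen for illustration, not data from the sample file.

```python
def ema_step(prev, price, alpha):
    """One step of the EMA recursion."""
    return prev + alpha * (price - prev)


def seed_gap(prices, period, seed_a, seed_b):
    """Absolute gap, bar by bar, between two EMAs differing only in the seed."""
    alpha = 2.0 / (period + 1)
    a, b = seed_a, seed_b
    gaps = [abs(a - b)]
    for price in prices:
        a = ema_step(a, price, alpha)
        b = ema_step(b, price, alpha)
        gaps.append(abs(a - b))
    return gaps


prices = [100.0 + (i % 7) for i in range(200)]  # synthetic prices
gaps = seed_gap(prices, 12, 100.0, 103.0)
for k in (0, 10, 50, 100, 200):
    print(k, gaps[k])
```

The gap never jumps to zero; it decays geometrically. A strict equality check therefore only succeeds once the gap has fallen below the floating point resolution, which is why it can take many bars.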
The reason for the late delivery
In the manual macd, the values of the EMA(12) and the EMA(26) are calculated
from the start of the input.
In the case of ta-lib the following twisted procedure is followed:
- EMA(12) is not calculated from the start of the input
- EMA(12) is first calculated so that its first output value is synchronized with the first output value of the EMA(26)
Which in practice means that the EMA(12) calculation starts after 26 - 12 bars, i.e.: after 14 bars, effectively ignoring the first 14 bars.
Even if the values converge 178 bars later, this is deemed really wrong.
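The effect of starting the fast average late can be emulated without ta-lib. The sketch below is not ta-lib code; it only mimics the idea described in this section, under the assumption that a late start means seeding the fast EMA with the simple average of the last 12 of the first 26 bars. The helper `ema_from` and the step-shaped input are hypothetical and chosen so that the difference is easy to see.

```python
def ema_from(values, period, start):
    """EMA seeded with the SMA of values[start:start + period].

    Bars before `start` are ignored entirely. The result is aligned with
    `values`; positions without a result hold None.
    """
    out = [None] * len(values)
    first = start + period - 1
    if first >= len(values):
        return out
    alpha = 2.0 / (period + 1)
    prev = sum(values[start:start + period]) / period
    out[first] = prev
    for i in range(first + 1, len(values)):
        prev += alpha * (values[i] - prev)
        out[i] = prev
    return out


fast, slow = 12, 26
# Synthetic closes: 14 bars at 100, then a jump to 110
closes = [100.0] * (slow - fast) + [110.0] * 46

from_start = ema_from(closes, fast, 0)        # uses every bar
synced = ema_from(closes, fast, slow - fast)  # skips the first 14 bars

bar = slow - 1  # 0-based index of bar 26, the first common output
print(from_start[bar], synced[bar])
print(abs(from_start[-1] - synced[-1]))  # the gap shrinks, slowly
```

The late-started average never sees the first 14 bars, so at bar 26 it reports 110.0 while the average calculated from the start is still catching up with the jump.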
Longer wrong if looking at the signal
It has to be recalled that the MACD outputs 3 values: macd, signal and
histogram. For comparison purposes the histogram can be ignored, for it is
the simple difference macd - signal. If those two values have converged, the
histogram will be good too.
Here is an extended version of the snippet above which adds the signal to the
comparison.
    from talib import EMA, MACD
    import pandas as pd

    # Read a csv file into a pandas dataframe
    csv = './2006-day-001.txt'
    df = pd.read_csv(
        csv, parse_dates=True, index_col='date', skiprows=1,
        names=['date', 'open', 'high', 'low', 'close', 'volume', 'openinterest'],
    )

    # Manual macd and signal lines versus the ones delivered by ta-lib
    my_macd = EMA(df.close, timeperiod=12) - EMA(df.close, timeperiod=26)
    my_sig = EMA(my_macd, 9)
    ta_macd, ta_sig, *_ = MACD(df.close, fastperiod=12, slowperiod=26, signalperiod=9)

    eq_macd = ta_macd.eq(my_macd)  # check equality of the macd lines
    eq_sig = ta_sig.eq(my_sig)  # check equality of the signal lines
    p = range(1, len(ta_macd) + 1)  # counter to see differences
    kwdict = dict(
        p=p,
        ta_macd=ta_macd, my_macd=my_macd, eq_macd=eq_macd,
        ta_sig=ta_sig, my_sig=my_sig, eq_sig=eq_sig,
    )
    dfout = pd.DataFrame(kwdict)
    print(dfout.to_string())
Without further ado the end of the output can be examined.
                  p    ta_macd    my_macd  eq_macd     ta_sig     my_sig  eq_sig
    date
    ...
    2006-12-20  250  12.668380  12.668380     True  12.668380  12.668380   False
    2006-12-21  251  12.873413  12.873413     True  12.873413  12.873413    True
    2006-12-22  252  11.347745  11.347745     True  11.347745  11.347745    True
    2006-12-27  253  12.470509  12.470509     True  12.470509  12.470509    True
    2006-12-28  254  13.040532  13.040532     True  13.040532  13.040532    True
    2006-12-29  255  12.910943  12.910943     True  12.910943  12.910943    True
Convergence happens after
251 bars, i.e.:
251 - 34 = 217 bars too late.
The avid reader will for sure notice that the values for p == 250 on
2006-12-20 look identical for ta_sig and my_sig, even though the comparison
fails. This is only because pandas shows the output with 6 decimals, and in
the process the values end up looking identical.
The argument is not whether 6 decimals are enough to consider equality, but
whether the implementation in ta-lib is proper, and understanding the time
delay implications until convergence is achieved.
bta-lib offers what seems to be the sound calculation, in which the fast
moving average is calculated from the beginning of the input. It also offers
the possibility to deliver the calculation as done by ta-lib, for those who
may prefer convolution or are bound by previous results which used the