I3 Indicators
Improperly Implemented Indicators
Before creating btalib, some research was done on technical analysis libraries written in Python or with Python bindings, and some surprises showed up. The findings:

Some indicators are not properly implemented

Some indicators do not even deliver what the API contract promises (the name is the contract, and so is the documentation when available). Something else is actually delivered

And this sometimes creates a chain of improperly implemented indicators, because indicators reuse other indicators as the basis for the calculations

Hence the name given to them: I3 Indicators, a kind of recursive acronym which stands for "Improperly Implemented Indicators".
EMA
Even the EMA (or Exponential Moving Average), something which with the available arsenal (libraries) should be easy to get right, is more often than not improperly implemented, for reasons such as:

No seed value is used

The ewm method in pandas.Series (or DataFrame) is blindly used without understanding what it actually does

Feeding an EMA into an EMA fails because the initial warmup period is not accounted for
Considering that several indicators like the MACD, DEMA, TRIX, etc. actually depend on a correct implementation of the basic EMA, this ends up creating a long list of improperly implemented indicators.
It would seem that some of the libraries recognized the problem internally when developing those other indicators and reinvented the wheel in each indicator with a new in-indicator custom EMA, which sometimes still fails: even if the initial offset no longer matters, the actual implementation is still the wrong one.
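For reference, the three points listed above can be addressed in a few lines. The following is only a minimal sketch, not the implementation of any library: the function name ema_seeded is hypothetical, and it assumes the common convention in which the seed is the simple average of the first period inputs (the text above does not spell out which seed is meant). By contrast, pandas' ewm(span=period, adjust=False).mean() starts the recursion directly from the first input value.

```python
import math


def ema_seeded(values, period):
    """EMA seeded with the simple average of the first `period` valid inputs.

    Leading NaNs (the warmup of an upstream indicator) are skipped, so the
    output of one call can be fed into another. Assumes NaNs appear only at
    the start of `values`.
    """
    values = [float(v) for v in values]
    out = [math.nan] * len(values)

    # Position of the first usable input (len(values) if there is none)
    start = next((i for i, v in enumerate(values) if not math.isnan(v)),
                 len(values))
    first = start + period - 1  # index of the first deliverable output
    if first >= len(values):
        return out  # not enough data to build the seed

    alpha = 2.0 / (period + 1)
    out[first] = sum(values[start:first + 1]) / period  # the seed
    for i in range(first + 1, len(values)):
        out[i] = out[i - 1] + alpha * (values[i] - out[i - 1])
    return out


single = ema_seeded([1, 2, 3, 4, 5], period=3)
double = ema_seeded(single, period=3)  # EMA of an EMA
print(single)  # [nan, nan, 2.0, 3.0, 4.0]
print(double)  # [nan, nan, nan, nan, 3.0]
```

The second call only starts counting once its input delivers, which is what accounting for the warmup period means when an EMA is fed into another EMA.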
RSI
The RSI is another good example. A 40+ year old indicator, strictly defined in a published book and even well documented in Wikipedia, is also implemented wrongly. In this case:

The SMMA (Smoothed Moving Average) is either wrongly implemented, changed to be the EMA, or even replaced by a poorly implemented EMA

The initial 1-day period needed to calculate the difference to the previous close for the upday, downday calculations is not respected

A variant of the RSI is delivered, because ... for no good reason at all.
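As a point of comparison, here is a minimal sketch of an RSI that respects the points above: one bar is consumed for the difference to the previous close, and the average gains and losses are smoothed with an SMMA seeded by a simple average. The function name rsi_wilder is hypothetical and not taken from any library, and returning 100 when there are no losses in the window is an assumption made here for the edge case.

```python
import math


def rsi_wilder(closes, period=14):
    """RSI using SMMA smoothing; the first value needs period + 1 closes."""
    n = len(closes)
    out = [math.nan] * n
    if n <= period:
        return out  # `period` differences need `period + 1` closes

    # One bar is consumed to obtain the first difference to the previous close
    ups = [max(closes[i] - closes[i - 1], 0.0) for i in range(1, n)]
    downs = [max(closes[i - 1] - closes[i], 0.0) for i in range(1, n)]

    # SMMA seed: simple average of the first `period` differences
    avg_up = sum(ups[:period]) / period
    avg_down = sum(downs[:period]) / period

    for i in range(period, n):
        if i > period:
            # SMMA step: exponential smoothing with alpha = 1 / period
            avg_up = (avg_up * (period - 1) + ups[i - 1]) / period
            avg_down = (avg_down * (period - 1) + downs[i - 1]) / period
        if avg_down == 0.0:
            out[i] = 100.0  # assumption: no losses in the window maps to 100
        else:
            out[i] = 100.0 - 100.0 / (1.0 + avg_up / avg_down)
    return out


print(rsi_wilder([10, 11, 10, 11], period=2))  # [nan, nan, 50.0, 75.0]
```

Note that the smoothing factor is 1 / period, not the 2 / (period + 1) of the EMA, which is why swapping the SMMA for an EMA delivers something other than the RSI.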
A MACD example with talib
talib and the MACD are a good joint example. The MACD has well-defined formulas and is easy to implement. The formulas for the MACD outputs with the default 12, 26 and 9 parameters are:
    macd = ema(data, 12) - ema(data, 26)
    signal = ema(macd, 9)
    histogram = macd - signal
To illustrate the example, and because signal and histogram fully depend on the values of macd, only the latter is going to be considered.
The following snippet, using the sample data from btalib, compares a manually implemented macd output with the output of the MACD implemented in talib.
    from talib import EMA, MACD
    import pandas as pd

    # Read a csv file into a pandas dataframe
    csv = './2006day001.txt'
    df = pd.read_csv(
        csv, parse_dates=True, index_col='date', skiprows=1,
        names=['date', 'open', 'high', 'low', 'close', 'volume', 'openinterest'],
    )

    my_macd = EMA(df.close, timeperiod=12) - EMA(df.close, timeperiod=26)
    ta_macd, *_ = MACD(df.close, fastperiod=12, slowperiod=26, signalperiod=9)

    eq = ta_macd.eq(my_macd)  # check equality
    p = range(1, len(ta_macd) + 1)  # counter to see differences

    dfout = pd.DataFrame(dict(p=p, ta_macd=ta_macd, my_macd=my_macd, eq=eq))
    print(dfout.to_string())
Rather than simply pasting the entire output, the interesting parts are going to be examined.
First Values
    date        p    ta_macd    my_macd     eq
    20060102    1        NaN        NaN  False
    20060103    2        NaN        NaN  False
    ...
No surprises here. The fastest EMA has a period of 12, but the result is dominated by the slowest EMA, with a period of 26. No result is to be expected until p reaches a count of at least 26.
First (Late) Delivery
    date        p    ta_macd    my_macd     eq
    ...
    20060203   25        NaN        NaN  False
    20060206   26        NaN  11.600864  False
    20060207   27        NaN  11.568218  False
    20060208   28        NaN  11.034284  False
    20060209   29        NaN  12.701495  False
    20060210   30        NaN  12.620119  False
    20060213   31        NaN  13.681718  False
    20060214   32        NaN  14.637538  False
    20060215   33        NaN  15.032924  False
    20060216   34  13.463325  16.235543  False
    20060217   35  15.094544  17.440267  False
    ...
Ooooops! The custom my_macd has delivered values after 26 periods, but the line from the standard talib.MACD implementation does so after 34 periods.
This is the first improper implementation: waiting for slower outputs before delivering what can already be delivered.
In this case the complete talib.MACD has the 12 and 26 periods of the EMA indicators, fast and slow, and then the extra 9 periods of the signal output. And the output of the macd line is delayed until signal delivers.
Note
26 + 9 = 35. Delivery after 34 periods is not wrong, because there is an overlap. The calculation of signal already starts at 26, and counting 9 from 26 (inclusive) puts the first delivery position at 34.
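The overlap arithmetic can be checked with a small sketch. The helper name chained_minperiod is hypothetical and belongs to neither talib nor btalib; it only assumes that an indicator with period N needs N inputs before its first output, and that a chained indicator starts counting on the bar where its input first delivers.

```python
def chained_minperiod(period, input_minperiod=1):
    """First bar (1-based) at which an indicator of `period` can deliver,
    given that its input first delivers at bar `input_minperiod`."""
    return input_minperiod + period - 1


fast = chained_minperiod(12)
slow = chained_minperiod(26)
macd = max(fast, slow)  # the difference needs both averages
signal = chained_minperiod(9, input_minperiod=macd)

print(fast, slow, macd, signal)  # 12 26 26 34
```

The bar on which the input first delivers is counted as the first of the 9, hence 26 + 9 - 1 = 34 rather than 35.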
Granted, this is one which can be accepted. After all, it may not be productive to consider anything coming out of the MACD before the minimum period of both main outputs has been reached, i.e. until both macd and signal deliver meaningful values.
First (Wrong) Delivery
Upon closer inspection of what happens when the first delivery at period 34
takes place, it should be clear something is odd.
    date        p    ta_macd    my_macd     eq
    ...
    20060215   33        NaN  15.032924  False
    20060216   34  13.463325  16.235543  False
    20060217   35  15.094544  17.440267  False
    ...
The value from talib is 13.463325 and the one from the direct EMA(12) - EMA(26) is 16.235543, which obviously fails to compare as True.
Going down the output produced by the small snippet above, the first comparison flagged as True is here:
    date        p    ta_macd    my_macd     eq
    ...
    20061026  211  23.231338  23.231338  False
    20061027  212  22.489098  22.489098   True
    ...
That is after p == 211, or 211 - 34 = 177 price bars later than expected. Ooops again! The convergence happens because exponentially smoothed averages tend to converge after a number of bars has gone by, because the weight of the initial bars in the calculation is no longer relevant.
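The fading weight of the initial bars can be illustrated with a short sketch. It runs the same EMA recursion twice over the same input, changing only the starting value. The function name ema_from_seed, the prices and the two seeds are all made up for the illustration and are not the sample data used above.

```python
def ema_from_seed(values, period, seed):
    """Run the plain EMA recursion over `values`, starting from `seed`."""
    alpha = 2.0 / (period + 1)
    out = [seed]
    for v in values:
        out.append(out[-1] + alpha * (v - out[-1]))
    return out


period = 12
prices = [100.0 + (i % 7) for i in range(200)]  # made-up data
a = ema_from_seed(prices, period, seed=100.0)
b = ema_from_seed(prices, period, seed=103.0)

# The gap between both runs shrinks by a factor (1 - alpha) on every bar
gaps = [abs(x - y) for x, y in zip(a, b)]
decay = 1.0 - 2.0 / (period + 1)
for n in (0, 10, 50, 177):
    print(n, gaps[n], 3.0 * decay ** n)  # measured gap vs. predicted gap
```

Because the gap decays geometrically, it never becomes exactly zero in theory. In practice it eventually drops below what floating point numbers can represent, and from then on both runs compare as equal.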
The reason for the late delivery
In the manual macd, the values of the EMA(12) and EMA(26) are calculated from the start of the input.
In the case of talib the following twisted procedure is followed:

The EMA(12) is not calculated from the start of the input

The EMA(12) is first calculated so as to synchronize its first output value with the first output value of the EMA(26)

Which in practice means that the EMA(12) calculation starts after 26 - 12 bars, i.e.: after 14 bars, effectively ignoring the first 14 inputs.
Even if the values converge 177 bars later, this has to be deemed as really wrong.
Longer wrong if looking at the signal
It has to be recalled that the MACD outputs 3 values: macd, signal and histogram. For comparison purposes the histogram can be ignored, for it is the simple difference macd - signal. If those two values have converged, the histogram will be good too.
Here is an extended version of the snippet above which adds the signal to the mix.
    from talib import EMA, MACD
    import pandas as pd

    # Read a csv file into a pandas dataframe
    csv = './2006day001.txt'
    df = pd.read_csv(
        csv, parse_dates=True, index_col='date', skiprows=1,
        names=['date', 'open', 'high', 'low', 'close', 'volume', 'openinterest'],
    )

    my_macd = EMA(df.close, timeperiod=12) - EMA(df.close, timeperiod=26)
    my_sig = EMA(my_macd, 9)
    ta_macd, ta_sig, *_ = MACD(df.close, fastperiod=12, slowperiod=26, signalperiod=9)

    eq_macd = ta_macd.eq(my_macd)  # check equality of the macd lines
    eq_sig = ta_sig.eq(my_sig)  # check equality of the signal lines

    p = range(1, len(ta_macd) + 1)  # counter to see differences
    kwdict = dict(
        p=p,
        ta_macd=ta_macd, my_macd=my_macd, eq_macd=eq_macd,
        ta_sig=ta_sig, my_sig=my_sig, eq_sig=eq_sig,
    )
    dfout = pd.DataFrame(kwdict)
    print(dfout.to_string())
Without further ado the end of the output can be examined.
    date        p    ta_macd    my_macd  eq_macd     ta_sig     my_sig  eq_sig
    ...
    20061220  250  12.668380  12.668380     True  12.668380  12.668380   False
    20061221  251  12.873413  12.873413     True  12.873413  12.873413    True
    20061222  252  11.347745  11.347745     True  11.347745  11.347745    True
    20061227  253  12.470509  12.470509     True  12.470509  12.470509    True
    20061228  254  13.040532  13.040532     True  13.040532  13.040532    True
    20061229  255  12.910943  12.910943     True  12.910943  12.910943    True
Convergence happens after 251 bars, i.e.: 251 - 34 = 217 bars too late.
The avid reader will for sure notice that the values for p == 250 on 20061220 are identical for ta_sig and my_sig. This is only because pandas rounds the printed output to 6 decimals and in the process the values end up looking identical.
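The effect is easy to reproduce without pandas. In the sketch below the two numbers are made up for the illustration (they are not the actual signal values): they print identically with 6 decimals, yet an exact comparison, which is what the eq method used in the snippets performs, reports them as different.

```python
import math

# Made-up values that differ only from the 7th decimal onwards
a = 12.6683804
b = 12.6683796

print(f"{a:.6f}", f"{b:.6f}")  # 12.668380 12.668380
print(a == b)  # False: exact comparison sees the difference
print(math.isclose(a, b, abs_tol=1e-6))  # True: comparison with a tolerance
```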
Note
The argument is not whether 6 decimals are enough to consider equality, but whether the implementation of the MACD in talib is proper, and understanding the time delay implications until convergence is achieved.
Finally
btalib offers what seems to be the sound calculation, in which the fast moving average is calculated from the beginning of the input. It also offers the possibility to deliver the calculation as done by talib, for those who may prefer convolution or are bound by previous results which used the MACD in talib.