Data Input

A sample input

The sample data provided with the library has this format

Date,Open,High,Low,Close,Volume,OpenInterest
2006-01-02,1789.36,1802.98,1789.36,1802.16,0.00,0.00

It has the usual OHLC fields (Open-High-Low-Close), preceded by a timestamp and followed by the Volume and OpenInterest components. Rather standard.

And it can be easily transformed into a timeseries-based DataFrame with a one-liner (pandas is assumed to be imported as pd):

df = pd.read_csv('2006-day-001.txt', parse_dates=True, index_col='Date')

This DataFrame now contains the following columns:

['Open', 'High', 'Low', 'Close', 'Volume', 'OpenInterest']

The Date column is not missing, it has simply been transformed into the index. A usual printout of the DataFrame looks like this (skipping some lines for brevity)

               Open     High      Low    Close  Volume  OpenInterest
Date
2006-01-02  1789.36  1802.98  1789.36  1802.16     0.0           0.0
2006-01-03  1802.04  1819.21  1800.92  1807.17     0.0           0.0
...
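To try the loading step without the sample file at hand, the two rows shown above can be read from an in-memory buffer. This is only a small sketch: the SAMPLE string is built here for illustration, and the read_csv call is the same one used above.

```python
import io

import pandas as pd

# Two rows in the same layout as the sample file (values from the printout above)
SAMPLE = """Date,Open,High,Low,Close,Volume,OpenInterest
2006-01-02,1789.36,1802.98,1789.36,1802.16,0.00,0.00
2006-01-03,1802.04,1819.21,1800.92,1807.17,0.00,0.00
"""

# Same call as with the file, reading from a buffer instead
df = pd.read_csv(io.StringIO(SAMPLE), parse_dates=True, index_col='Date')

print(list(df.columns))  # the six data columns, Date is not among them
print(df.index.name)     # Date, which now lives in the index
```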

Most timeseries dataframes will have such a format or a very similar one. Remapping column names (in pandas or directly in the library) should be easy enough to get the input into the library.
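A minimal sketch of the pandas route, assuming a source whose columns carry vendor-specific names. The short names o, h, l, c, vol and oi are made up for the example.

```python
import pandas as pd

# Hypothetical source with vendor-specific column names
raw = pd.DataFrame(
    {'o': [1.0], 'h': [3.0], 'l': [0.5], 'c': [2.0], 'vol': [100.0], 'oi': [0.0]}
)

# Map the vendor names onto the names used by the sample data
NAME_MAP = {'o': 'Open', 'h': 'High', 'l': 'Low', 'c': 'Close',
            'vol': 'Volume', 'oi': 'OpenInterest'}

# rename returns a new DataFrame, raw keeps its original names
df = raw.rename(columns=NAME_MAP)
```

Not using inplace=True keeps the original dataframe around, which can be handy when the source names are still needed elsewhere.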

Default Settings

The library provides the sample to show what the default expectations are, which can be summarized as follows:

  • Column names are transformed to their lowercase form before any comparison is made, i.e.: comparisons are always case insensitive.

    As such, the name of the Close column may just as well be close or cLoSe

  • Date timestamp in the index

    This is actually a pure expectation, because the library does not touch the index and does not look into its contents for anything. The index could simply be a sequence of integers

    Note

    The name of the column for the index is irrelevant

  • OHLCVOi ordering, i.e.: Open-High-Low-Close-Volume-OpenInterest

    If the names of the columns do not match the expectation, the corresponding numeric index of each column will be used, i.e.: Open = 0, High = 1, Low = 2, Close = 3, Volume = 4, OpenInterest = 5

Single Input Indicators

Indicators of this type have close as the default input to look for, or column index 3 as explained above. Let's see the reference documentation for the archetype of such an indicator, the sma or SimpleMovingAverage.

    Non-weighted average of the last n periods

    Formula:
      - movav = Sum(data, period) / period

    See also:
      - http://en.wikipedia.org/wiki/Moving_average#Simple_moving_average

    Aliases: SMA, SimpleMovingAverage
    Inputs: close
    Outputs: sma
    Params:
      - period (default: 30)
        Period for the moving average calculation
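The formula in the reference can be reproduced with plain pandas for a quick sanity check. This is a sketch of the formula only and not the library's implementation. The function name simple_moving_average is chosen for the example.

```python
import pandas as pd


def simple_moving_average(data: pd.Series, period: int = 30) -> pd.Series:
    """movav = Sum(data, period) / period over a sliding window."""
    return data.rolling(window=period).sum() / period


prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
result = simple_moving_average(prices, period=3)

# The first period - 1 values are NaN, because the window is not yet full
print(result.tolist())
```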

Working with this indicator can be done in the following ways (the loading of the data into the dataframe df is assumed)

Default Column

# Let the magic of the library find the `Close` column
sma = btalib.sma(df)

Because the indicator sma defines its single input with the name close, the data input machinery will look into the dataframe for a column matching that name (case insensitive comparison) ... and will find it.

# Be specific about which column to use by passing the column
sma = btalib.sma(df.High)

The indicator needs just a single input. Being specific and passing df.High (which is a Series) will perform the calculation directly on that data field.

# Reuse the sma object and pass it to itself
sma = btalib.sma(df)
sma1 = btalib.sma(sma)

The sma also has a single output. It can therefore be used directly as the single input for another indicator ... like itself. No big deal (the delivery period of the first result in sma1 will obviously increase)
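The longer delivery period can be seen with plain pandas rolling means standing in for the two chained sma calls. This is an illustration of the effect, not of the library internals.

```python
import pandas as pd

period = 3
prices = pd.Series(range(10), dtype=float)

# First average: the first value shows up at position period - 1
sma = prices.rolling(window=period).mean()

# Average of the average: needs `period` valid values of sma first
sma1 = sma.rolling(window=period).mean()

print(sma.first_valid_index())   # 2
print(sma1.first_valid_index())  # 4
```

With a period of 3 the first result moves from position 2 to position 4, i.e.: each chained average adds another period - 1 bars of waiting.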

# Let the magic of the library find the column by index
df.rename(columns={'Close':'NewClose'}, inplace=True)
sma = btalib.sma(df)

Ooops! By renaming Close to NewClose the column can no longer be found by name, and no other column matches the name of the input sought by the sma. As explained above and following the standard OHLCVOi ordering, the Close has an index of 3 and the column present at that index will be taken.

It can be the case that the dataframe has only two (2) columns (plus the index) and therefore, only indices 0 and 1 are available. The machinery will then default to using the first of the columns, i.e.: column 0.

This is seen as a reasonable assumption and choice, because the indicator is expecting a single input, and a single input in the form of a dataframe is being provided. When name matching and default column index matching both fail, the first of the columns of the single-input dataframe is chosen.
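The three steps described so far (name, default index, first column) can be sketched in a few lines. The function pick_single_input and the DEFAULT_INDICES table are hypothetical names for this illustration. The actual machinery inside the library may be organized differently.

```python
# Default OHLCVOi positions, as listed in the default settings
DEFAULT_INDICES = {'open': 0, 'high': 1, 'low': 2, 'close': 3,
                   'volume': 4, 'openinterest': 5}


def pick_single_input(columns, input_name='close'):
    """Return the position of the column chosen for a single-input indicator."""
    lowered = [str(c).lower() for c in columns]
    # 1. Case-insensitive name match
    if input_name in lowered:
        return lowered.index(input_name)
    # 2. Default OHLCVOi position, if the dataframe has that many columns
    position = DEFAULT_INDICES[input_name]
    if position < len(columns):
        return position
    # 3. Last resort for a single input: the first column
    return 0


print(pick_single_input(['Open', 'High', 'Low', 'cLoSe']))     # 3, by name
print(pick_single_input(['Open', 'High', 'Low', 'NewClose']))  # 3, by index
print(pick_single_input(['Foo', 'Bar']))                       # 0, first column
```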

Multiple Input Indicators

The classic stochastic is a good choice to understand how things work. The relevant part of the documentation for it:

    ...
    Aliases: 'stoch', 'Stochastic', 'STOCHASTIC', 'STOCH'
    Inputs: high, low, close
    Outputs: k, d
    Params:
      - period (default: 14)
        Period to consider
      ...

It expects the inputs high, low and close. Remember that the default for single-input indicators is to have close and look for it. In this case, the stochastic has overridden that by still defining close as an input but putting it last. This is so for two reasons:

  • It respects the OHLC ordering

  • It is the ordering used by ta-lib, and behaving like a well-known library is seen as a good thing (that library probably followed the OHLC convention in the first place)

Giving the indicator the three (3) expected inputs can be done in two generic ways

  • Provide three (3) individual inputs which will be automatically matched to high, low and close internally

  • Provide a single input (1) DataFrame with several columns, that will be internally matched to the inputs (with column name matching, column index matching, ...)

Some examples (the sample data has already been loaded as a DataFrame and is available as df)

Multi-input examples

stochastic = btalib.stochastic(df.High, df.Low, df.Close)

Three (3) inputs are expected and three (3) are provided. In this case the right inputs are used, but nothing prevents a different user choice.

stochastic = btalib.stochastic(df.Close, df.Volume, df.Low)

The stochastic will not complain, because it will internally see the Close, Volume and Low remapped to high, low and close respectively. The calculations in this case will make no sense whatsoever, but the input requirements have nonetheless been fulfilled.
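The positional matching can be pictured as a simple pairing of arguments with input names. The helper bind_inputs below is hypothetical and works on column names rather than on real series, just to show which argument lands on which input.

```python
# Input order taken from the stochastic documentation
INPUT_NAMES = ('high', 'low', 'close')


def bind_inputs(*series_names):
    """Pair each positional argument with the input name at the same position."""
    return dict(zip(INPUT_NAMES, series_names))


print(bind_inputs('High', 'Low', 'Close'))
# {'high': 'High', 'low': 'Low', 'close': 'Close'}

print(bind_inputs('Close', 'Volume', 'Low'))
# {'high': 'Close', 'low': 'Volume', 'close': 'Low'}
```

Only the position counts. What the data actually represents is entirely up to the caller.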

A more advanced case with re-input

sma_low = btalib.sma(df.Low, period=10)
sma_high = btalib.sma(df.High, period=8)

stochastic = btalib.stochastic(sma_high, sma_low, df.Close)

Instead of passing the High and Low from the DataFrame directly into the stochastic, those fields are first smoothed with sma indicators of periods 8 and 10 respectively. Both sma results are then used, together with the standard close.

Single-Input examples

stochastic = btalib.stochastic(df)

In this case and because the DataFrame has the columns High, Low and Close available, the stochastic will use its values for the calculations (reminder: name matching is case insensitive, the column Close could also be named cLoSe)

Should the DataFrame have other naming conventions, the default column indices matching the OHLCVOi ordering will be used, i.e.: Open = 0, High = 1, Low = 2, Close = 3, Volume = 4, OpenInterest = 5

# Let the magic of the library find the column by index
df.rename(columns={'Low':'NewLow'}, inplace=True)
stochastic = btalib.stochastic(df)

The column Low has been renamed to NewLow, which means that the second input sought by the stochastic indicator cannot be found by name. When this happens, the indicator will then resort to applying the numeric index and will still use the real Low column.

It is possible for the user to really mess it up, like in this example.

# Let the magic of the library find the columns by name
df.rename(columns={'Close': 'High', 'High': 'Low', 'Low': 'Close'}, inplace=True)
stochastic = btalib.stochastic(df)

All required inputs will be found by name, but the real columns applied as inputs will most likely produce useless results.
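What the rename does to the data can be checked with pandas alone. The one-row dataframe below is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'Open': [10.0], 'High': [12.0], 'Low': [9.0], 'Close': [11.0]})

# All three renames are applied in a single pass, so the labels rotate
df.rename(columns={'Close': 'High', 'High': 'Low', 'Low': 'Close'}, inplace=True)

print(list(df.columns))    # ['Open', 'Low', 'Close', 'High']
print(df['High'].iloc[0])  # 11.0, the original Close value
print(df['Low'].iloc[0])   # 12.0, the original High value
```

The names High, Low and Close are all still there, but the column labeled Low now holds values above the column labeled High.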

Remapping Names/Indices

The remapping of names directly in the DataFrame is shown above and the default numeric indices from 0 to 5 follow the OHLCVOi convention. The library offers the possibility of remapping names and indices, without having to touch the DataFrame or reorder columns.

This is done via the function set_input_indices(**kwargs). The first example remaps the location of the stochastic inputs, which is useful if the DataFrame has columns with names that have nothing to do with the usual Open-High-Low .... For the sake of the example, the columns in the DataFrame will be renamed to alien names.

df.rename(columns={'High': 'Alf', 'Low': 'ET', 'Close': 'Alien'}, inplace=True)

# The names set above make no sense for the library
# Let's give it some indication
btalib.set_input_indices(high=3, low=0, close=1)

Now, because the names make no sense, the numeric indices will be used. Using the set_input_indices function, the library will use the indicated indices to select the columns, overriding the default OHLC ordering.

Column names can also be remapped to new names and not only to indices. Like in this example.

df.rename(columns={'High': 'Alf', 'Low': 'ET', 'Close': 'Alien'}, inplace=True)

# The names set above make no sense for the library
# Let's give it some indication
btalib.set_input_indices(high='Alf', low='ET', close='Alien')

Hallelujah! The alien names will now be used by the library to find the columns.
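One way to picture both kinds of remapping is a lookup that consults the user overrides before falling back to the defaults. The sketch below is an assumption about the behavior described above and not the library's code. The names select_column and DEFAULTS are hypothetical.

```python
DEFAULTS = {'open': 0, 'high': 1, 'low': 2, 'close': 3,
            'volume': 4, 'openinterest': 5}


def select_column(columns, input_name, overrides):
    """Return the column position for `input_name`, honoring the overrides."""
    lowered = [str(c).lower() for c in columns]
    target = overrides.get(input_name)
    # A string override replaces the name that is searched for
    wanted = target.lower() if isinstance(target, str) else input_name
    if wanted in lowered:
        return lowered.index(wanted)
    # An integer override replaces the default OHLCVOi position
    return target if isinstance(target, int) else DEFAULTS[input_name]


columns = ['Open', 'Alf', 'ET', 'Alien', 'Volume', 'OpenInterest']

by_index = {'high': 3, 'low': 0, 'close': 1}
print([select_column(columns, n, by_index) for n in ('high', 'low', 'close')])
# [3, 0, 1]

by_name = {'high': 'Alf', 'low': 'ET', 'close': 'Alien'}
print([select_column(columns, n, by_name) for n in ('high', 'low', 'close')])
# [1, 2, 3]
```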

Using set_input_indices to remap numeric indices or names can be particularly useful when coupled with set_use_ohlc_indices_first(onoff=True). This forces the library to use the configured indices rather than the standard naming Open-High-Low-Close ....

As in here.

btalib.set_use_ohlc_indices_first(True)
btalib.set_input_indices(high=3, low=0, close=1)

Regardless of the column names, the ordering high=3, low=0 and close=1 will be used. Column name matching has been effectively disabled.

As seen in this snippet, the need to remap the column names to alien names is gone. In a real scenario, the alien names would already be in the source DataFrame and the indices 3, 0 and 1 are the desired inputs to be processed.
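The difference between the two modes is only the order in which name and index are tried. A hedged sketch of that precedence follows. The function resolve_with_precedence and its indices_first flag are hypothetical and merely mirror the described behavior.

```python
def resolve_with_precedence(columns, input_name, index, indices_first=False):
    """Return the column position for `input_name` under either precedence."""
    lowered = [str(c).lower() for c in columns]
    by_name = lowered.index(input_name) if input_name in lowered else None
    by_index = index if index < len(columns) else None
    # Try one source first and fall back to the other
    first, second = (by_index, by_name) if indices_first else (by_name, by_index)
    return first if first is not None else second


columns = ['Open', 'High', 'Low', 'Close', 'Volume', 'OpenInterest']

# Names first: the High column is found by name, the index 3 is ignored
print(resolve_with_precedence(columns, 'high', 3))                      # 1

# Indices first: position 3 wins, even though a High column exists
print(resolve_with_precedence(columns, 'high', 3, indices_first=True))  # 3
```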

Other Indicators as Single-Input

The outcome of an indicator can actually be used as input for another. Putting together the stochastic and the sma

stochastic = btalib.stochastic(df)
sma = btalib.sma(stochastic)

The outputs of the stochastic can be seen in the documentation

    ...
    Aliases: 'stoch', 'Stochastic', 'STOCHASTIC', 'STOCH'
    Inputs: high, low, close
    Outputs: k, d
    Params:
      - period (default: 14)
        Period to consider
      ...

It has actually two (2) outputs, and the sma is expecting just one (1). The rule when using library indicators (as opposed to DataFrames) is to take the natural ordering.

Hence, the sma needs one input and the first one is k, which is the one that will be processed by the sma. It is as if the following had actually been done

stochastic = btalib.stochastic(df)
# stochastic.k is a shorthand for "stochastic.outputs.k" or "stochastic.o.k"
sma = btalib.sma(stochastic.k)

In order to use output d from the stochastic, it is necessary to be specific about it, such as in this case.

stochastic = btalib.stochastic(df)
sma = btalib.sma(stochastic.d)

or even a lot more specific, if one suspects collision between the output name and some instance attribute defined in the indicator itself.

stochastic = btalib.stochastic(df)
sma = btalib.sma(stochastic.outputs.d)

Note

With the o shorthand notation for the outputs

stochastic = btalib.stochastic(df)
sma = btalib.sma(stochastic.o.d)
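Why the longer notation is safer can be shown with a toy class. FakeIndicator is a hypothetical stand-in written for this example and has nothing to do with the real indicator classes. It only assumes that the shorthand falls back to the outputs when no regular attribute with that name exists.

```python
from types import SimpleNamespace


class FakeIndicator:
    """Hypothetical stand-in for an indicator with named outputs."""

    period = 14  # a regular attribute that is not an output

    def __init__(self, **outputs):
        self.outputs = SimpleNamespace(**outputs)
        self.o = self.outputs  # shorthand alias

    def __getattr__(self, name):
        # Only called when the regular attribute lookup fails
        outputs = self.__dict__.get('outputs')
        if outputs is None or not hasattr(outputs, name):
            raise AttributeError(name)
        return getattr(outputs, name)


# The third output is deliberately named like the class attribute
ind = FakeIndicator(k=[1.0], d=[2.0], period=[3.0])

print(ind.d)               # [2.0], the shorthand reaches the output
print(ind.period)          # 14, the attribute shadows the output
print(ind.outputs.period)  # [3.0], the explicit notation is unambiguous
```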

Single-Input Errors

When a single input is provided to a multi-input indicator like the stochastic, the library can complain in the following situations:

  • The DataFrame has fewer columns than the number of required inputs

  • An input cannot be found by name and the numeric mapping cannot be matched to an existing column (e.g.: 3 columns and the index is 10)

In these cases a btalib.InputsError exception will be raised.
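The two conditions can be sketched as a small validation routine. To keep the example runnable on its own, a local InputsError class stands in for btalib.InputsError. The function check_single_input is hypothetical and only mirrors the two bullet points.

```python
class InputsError(Exception):
    """Local stand-in for btalib.InputsError, so the sketch runs on its own."""


def check_single_input(columns, indices):
    """Raise InputsError when `columns` cannot satisfy the inputs in `indices`.

    `indices` maps each input name to its fallback column position.
    """
    if len(columns) < len(indices):
        raise InputsError('fewer columns than required inputs')
    lowered = [str(c).lower() for c in columns]
    for name, position in indices.items():
        if name not in lowered and position >= len(columns):
            raise InputsError(f'input {name!r}: no such name, no column {position}')


try:
    # 3 columns, but close is mapped to position 10
    check_single_input(['a', 'b', 'c'], {'high': 0, 'low': 1, 'close': 10})
except InputsError as exc:
    message = str(exc)
    print(message)
```

With the real library, the same try/except pattern would catch btalib.InputsError around the indicator call.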