Quick Start Guide

The 1st snippet

A code snippet is worth a thousand words.

import btalib
import pandas as pd

# Read a csv file into a pandas dataframe
df = pd.read_csv('2006-day-001.txt', parse_dates=True, index_col='Date')
sma = btalib.sma(df)

Note

2006-day-001.txt is a sample data file available with the sources of bta-lib. See bta-lib - GitHub

Download it if you want to follow the quickstart snippets.

Let's summarize what was just done

  • Load a csv file
  • Compute an sma (Simple Moving Average)

Even if this seems simple, there are some initial questions which can be asked. Let's go for them.

The 1st questions

1. What was the sma computed over?

For starters, here are the first two lines of the data file, which uses a format that is very common for stock market assets.

Date,Open,High,Low,Close,Volume,OpenInterest
2006-01-02,1789.36,1802.98,1789.36,1802.16,0.00,0.00

The first question can now be answered:

  • The sma was calculated using the values in the Close column.

This is so because the standard convention for technical analysis uses the close price as a default. You may of course use any of the other columns.

For example, using the High:

sma = btalib.sma(df.High)
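To make the idea of a default input column concrete, here is a minimal standard-library sketch. It is not how bta-lib selects its inputs; the helper name pick_column is hypothetical, and the data is only the header and first row quoted above.

```python
import csv
import io

# Header and first row of the sample file, as quoted above.
SAMPLE = """Date,Open,High,Low,Close,Volume,OpenInterest
2006-01-02,1789.36,1802.98,1789.36,1802.16,0.00,0.00
"""


def pick_column(rows, name="Close"):
    """Return one column as floats; Close is the default (hypothetical helper)."""
    return [float(row[name]) for row in rows]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
closes = pick_column(rows)           # default input: Close
highs = pick_column(rows, "High")    # explicit input: High
print(closes, highs)
```

Passing a column name explicitly plays the same role here as passing df.High instead of df in the snippet above.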

Hint

There are additional ways to configure/automate which fields are used for the calculations. See the section "Indicator Input"

2. How and in which format do I get the calculated values?

The default value returned from sma = btalib.sma(df) is an internal object from the library itself. The reason is that it may be used for additional calculations with other indicators. But getting something ready to use is straightforward:

sma = btalib.sma(df)  # default period is 30
print(sma.df)

Et voilà! Getting a DataFrame with the results is very easy. Even a one-liner would have sufficed, as in:

sma = btalib.sma(df).df  # default period is 30

But two-liners are on many occasions a lot more readable and convey the information a lot better. Let's show the result of print(sma.df)

                    sma
Date
2006-01-02          NaN
2006-01-03          NaN
2006-01-04          NaN
2006-01-05          NaN
2006-01-06          NaN
...                 ...
2006-12-21  2029.800333
2006-12-22  2029.961333
2006-12-27  2030.773333
2006-12-28  2031.545667
2006-12-29  2031.730667

[255 rows x 1 columns]

Digging deeper: Parameters

The code contains a comment that says: "default period is 30". And seeing the result with a bunch of initial NaN values ("Not a Number"), the remark made in the comment should be clear:

  • The sma needs 30 data points to start producing values.

Hence the initial NaN values, which indicate that no sensible value can be calculated and delivered. For the sake of completeness, here is an excerpt of print(sma.df.to_string()) showing where the sma calculation starts delivering values:

...
2006-02-08          NaN
2006-02-09          NaN
2006-02-10  1822.221333
2006-02-13  1824.273667
...
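The warm-up behaviour can be sketched in plain Python. This is not bta-lib's implementation: the function name simple_moving_average and the price values are made up for illustration, and the sketch assumes the usual definition of a simple moving average as the arithmetic mean of the last period values.

```python
import math


def simple_moving_average(values, period=30):
    """Plain-Python SMA sketch: NaN until `period` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < period:
            # Not enough data points yet, so no sensible value exists.
            out.append(math.nan)
        else:
            window = values[i + 1 - period:i + 1]
            out.append(sum(window) / period)
    return out


prices = [float(n) for n in range(1, 41)]   # made-up prices 1.0 .. 40.0
result = simple_moving_average(prices, period=30)
warmup = sum(1 for v in result if math.isnan(v))
print(warmup)       # number of leading NaN values
print(result[29])   # first computed value, the mean of 1.0 .. 30.0
```

With a period of 30 the first 29 entries are NaN and the 30th entry is the first real value, which matches the shape of the output above.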

Unless a trading calendar for the asset from 2006 is at hand, it is not easy to see whether 2006-02-10 is actually the 30th day in the data and hence the first one for which a value can be provided (Yes, it is!). To have a visual confirmation, let's advance in the usage of the library by modifying the calculation period of the sma.

sma = btalib.sma(df, period=4)  # default period is 30, changed to 4
print(sma.df)

By simply passing period=4 as a named argument to sma, we change the calculation window. And the result now is:

                  sma
Date
2006-01-02        NaN
2006-01-03        NaN
2006-01-04        NaN
2006-01-05  1815.1700
2006-01-06  1823.0050
...               ...
2006-12-21  2057.6475
...

Blistering barnacles! Values start to be produced on the 4th day ... as expected.
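The same check can be done programmatically instead of by eye. The sketch below uses only the first five rows of the period=4 output shown above; the helper name first_valid is hypothetical and not part of bta-lib.

```python
import math

# First five rows of the period=4 output shown above.
rows = [
    ("2006-01-02", math.nan),
    ("2006-01-03", math.nan),
    ("2006-01-04", math.nan),
    ("2006-01-05", 1815.1700),
    ("2006-01-06", 1823.0050),
]


def first_valid(rows):
    """Return (1-based position, date) of the first non-NaN value, or None."""
    for position, (date, value) in enumerate(rows, start=1):
        if not math.isnan(value):
            return position, date
    return None


print(first_valid(rows))
```

On the real DataFrame, pandas offers first_valid_index(), which should report the same date without a hand-written loop.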

Note

The parameter period for the sma is documented. Each indicator contains complete documentation on the supported parameters and the default values. See the "Indicator Reference" section.

Specific input selection

For the sake of it and because it was mentioned above, let's repeat the feat using the High price.

sma = btalib.sma(df.High, period=4)  # default period is 30, changed to 4
print(sma.df)

It should not be a surprise that the results change, given that a different field is used instead of the default Close (which is chosen automatically, being the industry de-facto standard):

                  sma
Date
2006-01-02        NaN
2006-01-03        NaN
2006-01-04        NaN
2006-01-05  1819.8100
2006-01-06  1827.4400
...               ...
2006-12-21  2064.8175
2006-12-22  2060.8675
2006-12-27  2062.6000
2006-12-28  2064.0075
2006-12-29  2066.0975

Conclusion

Enough quickstarting! The following has been shown:

  • Using a DataFrame as input where the Close column was automatically selected by the indicator as the input (following the de-facto standard)

  • Using a specific column of the DataFrame such as High with df.High as input, to run the calculations on it.

  • Fetching the calculation as a DataFrame by using the df attribute of the result, for example sma.df

  • Changing the default calculation parameters of the indicator by providing a named argument with period=4

The following section will cover additional topics and expand on things like how the automatic selection of the inputs is done (and when being specific about it is important), using indicators with multiple inputs and outputs, reusing results with other indicators, plotting and more.