Getting started with dstream_excel¶
Install¶
Install this package via:
pip install dstream_excel
You must also be running Windows, have Excel installed, have Eikon installed, and have the Eikon plugin for Excel installed.
There is one setting in the Excel plugin that makes or breaks this automation. Go to the Thompson Reuters Datatream tab, click Options. In Calculation Options, make sure “Suppress Auto Calculation on Open” is unchecked.
Also ensure that when you go to the Thomson Reuters and Thomson Reuters Datastream
tabs, that the buttons are not grayed out. If they are, you have to go
to the Thompson Reuters tab and click the “Offline” button. This should
launch Eikon on your system and possibly require you to log in. It may
be necessary to click the “Offline” button again after that. Once you are
logged in, all the buttons should be highlighted on both tabs. Then you can
close Excel and begin using dstream_excel
.
Overview¶
This package downloads the data in four main steps:
Opens a blank Excel workbook
Runs the Datastream Excel plugin functions for sets of companies in each call, by default 20 firms at a time.
After each function, it extracts the data and saves it into a CSV file
After all data is collected, it reads the data from all the generated CSV files and combines into a single CSV file.
Usage¶
There are three main functions in the package.
download_datastream()
is the main function, which
is just a wrapper to call the other two main functions which complete
the four steps described above: download_datastream_save_to_csvs()
and combine_all_datastream_csv()
.
This is a simple example:
from dstream_excel import download_datastream
download_datastream(
'output_folder',
['@AAPL','@MSFT'], # Datastream-style tickers
['UP', 'RI', 'NOSH', 'VO'], # Variable names from Datastream
outpath='data.csv',
freq='D',
begin='-1Y'
)
How Fast is it?¶
This is actually the second approach I have taken in this library to downloading the data. This newer approach is 5-10x as fast as the old one. It allows downloading thousands of firms’ data overnight.
It is possible to squeeze a bit more speed by increasing num_symbols_per_call, which is how many firms are being downloaded in one Excel function call. The default is 20, but you may try increasing it, especially if you are not downloading many variables. For example:
from dstream_excel import download_datastream
download_datastream(
'output_folder',
['@AAPL','@MSFT'],
['UP', 'RI', 'NOSH', 'VO'],
outpath='data.csv',
freq='D',
begin='-1Y',
num_symbols_per_call=40 # can keep going higher until it gets unstable
)
Once you increase this too much, the Excel function calls will start failing.
Troubleshooting¶
Failure during Download¶
Hopefully the download_datastream()
function works end-to-end. But
the steps 2 and 3 where the data is downloaded may fail. If this happens, get your
Excel working manually again (may
require a restart or re-enabling the Eikon plugin), then you can run
download_datastream()
with restart=False to
continue where it left off. restart=False is the default so you don’t actually
need to pass it, but make sure that if you used restart=True before that you
have removed it. All the other options should be the same as your
original call to the function.
For example:
from dstream_excel import download_datastream
download_datastream(
'output_folder',
['@AAPL','@MSFT'],
['UP', 'RI', 'NOSH', 'VO'],
outpath='data.csv',
freq='D',
begin='-1Y',
# optional, but don't set it to True if you want it to pick up where it left off!
restart=False
)
It’s Failing a Lot¶
The stability of the apporoach depends on how much data is being downloaded in each function call. This depends both on num_symbols_per_call and the number of variables being downloaded. I have set the default num_symbols_per_call to be reasonable for up to 10-20 variables. If you are downloading more, you may need to decrease num_symbols_per_call:
from dstream_excel import download_datastream
download_datastream(
'output_folder',
['@AAPL','@MSFT'],
['UP', 'RI', 'NOSH', 'VO'],
outpath='data.csv',
freq='D',
begin='-1Y',
num_symbols_per_call=10 # keep setting lower if you are running into lots of crashing
)
I have planned for a future version to automatically set a good default num_symbols_per_call based on the number of variables being passed, but for now it is 20 by default for any number of variables.
Failure during Combining¶
Sometimes it may happen that the data download finishes fine, but it fails during
the combining CSV step. So you are left with a lot of CSV files, not one single
file to work with. If this happens, you can manually run the
combine_all_datastream_csv()
function which is usually called as part of
download_datastream()
. For example:
from dstream_excel import combine_all_datastream_csv
combine_all_datastream_csv('output_folder', outpath='data.csv')
Note that calling it this way, it will attempt to resume from where it left off. If for some reason this still doesn’t work, you can also try:
combine_all_datastream_csv('output_folder', outpath='data.csv', restart=True)
The restart=True when used in combine_all_datastream_csv()
is merely
talking about the combining process, not the download process. It will not remove
or overwrite anything you have downloaded. It will just start the combining process
over from scratch.