The Python Quants

Python for Finance Course¶

Module 1 — Financial Data

The Python Quants GmbH

Requirements¶

Make sure to have installed locally a comprehensive Python installation like the Anaconda Python distribution (code Python 3.6 tested).

Alternatively, and more easily, register on the Quant Platform where you can execute this Jupyter Notebook file in the browser. After registration, you find all Jupyter Notebook files of this course in the folder pffcourse.

Necessary Imports¶

Working with Python in general requires you to import certain modules/packages from the standard library or from the so-called scientific or PyData stack (i.e. from third parties).

import numpy as np  # array operations
import pandas as pd  # time series management
from pylab import plt
plt.style.use('ggplot')
# put all plots in the notebook itself
%matplotlib inline

Retrieving Stock Price Data¶

We retrieve end-of-day (EOD) price data for certain instruments (data source: Thomson Reuters Eikon API).

data = pd.read_csv('https://hilpisch.com/tr_eikon_eod_data.csv',
                   index_col=0, parse_dates=True)

The AAPL object is of type DataFrame.

type(data)

pandas.core.frame.DataFrame

DataFrame objects provide a wealth of (convenience) methods.

data.info()  # meta information

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2216 entries, 2010-01-01 to 2018-06-29
Data columns (total 12 columns):
AAPL.O    2138 non-null float64
MSFT.O    2138 non-null float64
INTC.O    2138 non-null float64
AMZN.O    2138 non-null float64
GS.N      2138 non-null float64
SPY       2138 non-null float64
.SPX      2138 non-null float64
.VIX      2138 non-null float64
EUR=      2216 non-null float64
XAU=      2211 non-null float64
GDX       2138 non-null float64
GLD       2138 non-null float64
dtypes: float64(12)
memory usage: 225.1 KB

Working with the Data¶

Let us inspect the final five rows of the data set.

data.tail()  # final five rows

You can easily select single or multiple columns of a DataFrame object.

data['AAPL.O'].head()  # first five rows of single column

Date
2010-01-01          NaN
2010-01-04    30.572827
2010-01-05    30.625684
2010-01-06    30.138541
2010-01-07    30.082827
Name: AAPL.O, dtype: float64

data[['AAPL.O', 'MSFT.O']].tail()  # last five rows of 2 columns

Similarly, you can select single or multiple rows.

data.loc['2017-10-06']  # single row via index value

AAPL.O     155.3000
MSFT.O      76.0000
INTC.O      39.6300
AMZN.O     989.5800
GS.N       246.0200
SPY        254.3700
.SPX      2549.3300
.VIX         9.6500
EUR=         1.1733
XAU=      1275.3000
GDX         23.5900
GLD        121.0900
Name: 2017-10-06 00:00:00, dtype: float64

data.iloc[:2]  # two rows via index numbers

Data Visualization¶

There is also a powerful method for plotting.

data['AAPL.O'].plot(figsize=(10, 6));

Let us calculate the Apple stock price log returns.

# fully vectorized operation for log return calculation
rets = np.log(data['AAPL.O'] / data['AAPL.O'].shift(1))

The log returns can then be visualized via a histogram.

rets.hist(figsize=(10, 6), bins=35);

Calculating a Moving Average¶

pandas provides, among others, convenience functions for the calculation of moving averages.

# fully vectorized calculation of 50 days moving average/trend
data['MA50'] = data['AAPL.O'].rolling(50).mean()

data[['AAPL.O', 'MA50']].plot(figsize=(10, 6));

Closing Remarks¶

This closes the first module of the Python for Finance email course. In this module, you have learned:

how to retrieve stock price data with pandas (datareader)
how to look up information about and data stored in a DataFrame object
how to select columns and rows from a DataFrame object
how to visulize data stored in a DataFrame object
how to do vectorized calculations with DataFrame objects (log returns, moving average)

What you have learned today builds the basis for many important use cases for Python in Finance.

Exercises¶

In order to master the material of this module, do the following:

replace Apple by another symbol, eg. MSFT.O or .SPX
read the 10 Minutes to pandas tutorial under http://pandas.pydata.org/pandas-docs/stable/10min.html
discover plotting options under http://pandas.pydata.org/pandas-docs/stable/visualization.html
calculate other statics with e.g. .rolling(window=X).std() or .rolling(window=X).max()
discover more computational tools of pandas under http://pandas.pydata.org/pandas-docs/stable/computation.html

References¶

You find background information for the topics covered in this module in the following books:

Hilpisch, Yves (2018): Python for Finance. 2nd ed, O'Reilly, ch. 8.
McKinney, Wes (2012): Python for Data Analysis. O'Reilly.
VanderPlas, Jake (2016): Python Data Science Handbook. O'Reilly.

The Python Quants

http://tpq.io | @dyjh | team@tpq.io

Quant Platform | http://quant-platform.com

Python for Finance | Python for Finance @ O'Reilly

Derivatives Analytics with Python | Derivatives Analytics @ Wiley Finance

Listed Volatility and Variance Derivatives | Listed VV Derivatives @ Wiley Finance

	AAPL.O	MSFT.O	INTC.O	AMZN.O	GS.N	SPY	.SPX	.VIX	EUR=	XAU=	GDX	GLD
Date
2018-06-25	182.17	98.39	50.71	1663.15	221.54	271.00	2717.07	17.33	1.1702	1265.00	22.01	119.89
2018-06-26	184.43	99.08	49.67	1691.09	221.58	271.60	2723.06	15.92	1.1645	1258.64	21.95	119.26
2018-06-27	184.16	97.54	48.76	1660.51	220.18	269.35	2699.63	17.91	1.1552	1251.62	21.81	118.58
2018-06-28	185.50	98.63	49.25	1701.45	223.42	270.89	2716.31	16.85	1.1567	1247.88	21.93	118.22
2018-06-29	185.11	98.61	49.71	1699.80	220.57	271.28	2718.37	16.09	1.1683	1252.25	22.31	118.65

	AAPL.O	MSFT.O
Date
2018-06-25	182.17	98.39
2018-06-26	184.43	99.08
2018-06-27	184.16	97.54
2018-06-28	185.50	98.63
2018-06-29	185.11	98.61

	AAPL.O	MSFT.O	INTC.O	AMZN.O	GS.N	SPY	.SPX	.VIX	EUR=	XAU=	GDX	GLD
Date
2010-01-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1.4323	1096.35	NaN	NaN
2010-01-04	30.572827	30.95	20.88	133.9	173.08	113.33	1132.99	20.04	1.4411	1120.00	47.71	109.8