Team name: “The diggers”
Team motto: “In God we trust, all others must bring data”
from IPython.display import Image
Image(filename='logo.png')
Team mentors:
Kostadin Bashev (@bashev), Adelin Lalev (@lalev), Atanaska Reshetkova
The diggers team:
Ivan Dragomirov (@idragomirov), Ivalina Foteva (ifoteva), Beatris Lyubenova (beatrislyubenova)
Team Toolset:
IBM SPSS Modeler
IBM SPSS
Keras
TensorFlow
R/R Studio
1. Business Understanding
Cryptocurrency trading is a novel and dynamic industry with its own peculiarities. A single news article can nearly double or halve a coin's value. The market is not well regulated, which means there is a high chance of system abuse. And the high volatility is related to the fact that there is no intrinsic value behind these assets.
2. Data Understanding
The problems we face when trying to predict Bitcoin prices stem from the fact that Bitcoin prices are influenced by factors that are difficult to express in numeric form, such as the content of the day's news articles. The difficulty of obtaining accurate numeric predictors for the values of different cryptocurrencies is mirrored in the dataset presented for the challenge. Our preliminary analysis indicates that there is little to no correlation between daily volumes, total amounts in circulation, or the market capitalization of each cryptocurrency and its value. In addition, we found that the values of all major cryptocurrencies are strongly correlated. Given this, we are willing to make an educated guess that the Bitcoin price actually influences all the other values.
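This correlation check can be reproduced in a few lines of pandas. The following is a minimal sketch on synthetic data (the series and their generating process are invented for illustration, not taken from the challenge dataset):

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the challenge series: a random-walk "BTC" price,
# an "ETH" price strongly tied to it, and an unrelated trading volume.
rng = np.random.RandomState(0)
btc = 100 + np.cumsum(rng.normal(0, 1, 500))
eth = 0.5 * btc + rng.normal(0, 0.5, 500)
volume = rng.uniform(1e6, 2e6, 500)

prices = pd.DataFrame({'btc': btc, 'eth': eth, 'volume': volume})

# Pairwise Pearson correlations. In our data the price/price correlations
# were strong, while the price/volume correlations were close to zero.
corr = prices.corr()
print(corr.loc['btc', 'eth'])     # close to 1
print(corr.loc['btc', 'volume'])  # close to 0
```

The same `DataFrame.corr()` call on the real price columns produces the full correlation matrix behind the graphic below.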
from IPython.display import Image
Image(filename='graphic1.png')
These conclusions determine our approach towards the current task. We approached the problem from two perspectives – trying to predict volatility, which tends to repeat itself at relatively regular intervals, and trying to predict the trend. Our first approach was to try various ARIMA models. In the beginning, just as a warm-up, we used the TBATS model from the R package forecast (version 8.1) with only one cryptocurrency (BTC) time series to predict the future price in USD. We tried TBATS only to check whether the series contains some hidden “multiseasonal” effects. Our second approach to the problem consisted of training an LSTM neural network.
3. Data Preparation
The preparation of the data reflects the facts and assumptions we described in the previous section.
3.1. Preparation for the ARIMA modeling
We reduce our input data to the columns that describe the prices of the 20 major cryptocurrencies we intend to trade. This data is tested for periodic structure over the course of 24 hours.
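One simple way to test for a daily cycle, assuming hourly samples, is to look at the autocorrelation at a lag of 24. The helper below is an illustrative sketch on synthetic data, not our actual SPSS/R workflow:

```python
import numpy as np

def autocorr_at_lag(series, lag):
    """Pearson autocorrelation of a 1-D series against itself shifted by `lag`."""
    series = np.asarray(series)
    return np.corrcoef(series[:-lag], series[lag:])[0, 1]

# Toy hourly series with a clear 24-hour cycle plus noise.
rng = np.random.RandomState(1)
t = np.arange(24 * 60)  # 60 days of hourly samples
series = np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.3, t.size)

print(autocorr_at_lag(series, 24))  # high: the series repeats daily
print(autocorr_at_lag(series, 6))   # near zero: no structure at a 6-hour lag
```

A series with genuine 24-hour structure shows a pronounced peak at lag 24 (and its multiples), while lags off the cycle stay near zero.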
3.2. Preparation for the NN modeling
When preparing the NN models, we also limit our input data to the 20 major currencies we try to predict. Moreover, since all of the mentioned cryptocurrency prices seem to depend on the Bitcoin price, we use only this price as a predictor for each of the other 19 currencies. This leaves us with little information to predict the Bitcoin price itself. We can only exploit the fact that the current Bitcoin price depends to some extent on its previous values.
We train one LSTM neural network for each of the 20 major cryptocurrencies. As input to the LSTM neural network we feed the previous 20 time periods, since we have determined experimentally that the further the “window” goes back in time, the less useful information we can extract for the current price. This is in line with observations made when instruments like Bollinger bands were first conceived.
As an experiment to validate this number further, we tried lookback periods as short as 3 and as long as 40. The end results seem to indicate that a shorter lookback improves the directional symmetry of our predictions, but leads to much worse predictions for the scale of the movements. A longer lookback (20-30) improves the differences between predicted and real prices, but worsens the directional symmetry. Going further back does not seem to improve the predicted scale of the movements and worsens the directional symmetry even more.
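Building the fixed-size lookback windows can be sketched as follows. This toy helper is equivalent to the np.vstack loop in the scripts below, just written as a standalone function for clarity:

```python
import numpy as np

def make_windows(series, lookback):
    """Return (X, y) where X[i] holds the `lookback` values preceding y[i]."""
    series = np.asarray(series)
    n = series.size - lookback
    # Stack every contiguous window of length `lookback`.
    X = np.stack([series[i:i + lookback] for i in range(n)])
    # The target is the value immediately following each window.
    y = series[lookback:]
    return X, y

X, y = make_windows(np.arange(10, dtype=float), lookback=3)
print(X.shape)     # (7, 3)
print(X[0], y[0])  # [0. 1. 2.] 3.0
```

Changing `lookback` here is exactly the experiment described above: each value produces a differently shaped training set for the same underlying series.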
Once the data is shaped into the appropriate form, it is fed to an LSTM network with the following architecture:

LSTM hidden layer (32 neurons)

OUTPUT (1 neuron)
4. Modeling
4.1. LSTM Network
The actual implementation of the neural network is done in Python by using Keras and TensorFlow. We have separated our actual code into two fundamental parts. We use separate source files to train 20 LSTM networks – one for each individual cryptocurrency.
The network is trained on the first 13251 data points (one data point consists of the prices for the previous 20 points in time) and is tested on the remaining ones. Then the predictions of the network are compared with the real values using the Mean Absolute Percentage Error (MAPE) and Directional Symmetry (DS) criteria.
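The two criteria can be checked by hand on a toy example. The helpers below re-implement compute_mape and an equivalent (np.diff-based) version of compute_ds from the scripts that follow:

```python
import numpy as np

def compute_mape(test, prediction):
    # Mean Absolute Percentage Error.
    return np.mean(np.abs((test - prediction) / test)) * 100

def compute_ds(test, prediction):
    # Directional Symmetry: percentage of steps where the predicted and the
    # real price move in the same direction.
    dt = np.diff(test)
    dp = np.diff(prediction)
    return np.count_nonzero(dt * dp > 0) * 100 / dt.size

test = np.array([100.0, 110.0, 105.0, 120.0])
pred = np.array([102.0, 108.0, 109.0, 118.0])
print(compute_mape(test, pred))  # about 2.32 percent
print(compute_ds(test, pred))    # 2 of 3 moves match: about 66.7 percent
```

MAPE measures how far off the predicted levels are; DS ignores the levels entirely and only scores whether each up/down move was called correctly, which is why the two criteria can disagree for different lookback settings.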
Once trained properly, our model is saved in an HDF5 file named ‘xxxxxxx.h5’, where xxxxxxx is the name of the cryptocurrency in question.
Since we use an autoregressive approach for predicting Bitcoin, but use the 20 last Bitcoin prices together with the 20 previous prices of each other cryptocurrency we are trying to predict, we post only 2 relevant source files (the others are available as hyperlinks). The first deals with Bitcoin and the second with Ethereum. The second is representative of the other 18 files, which look the same with only the currency id and name replaced.
bitcoin.py – training our neural network for predicting the prices of Bitcoin
import numpy as np
import pandas as pd
import tensorflow as tf
from keras.models import Sequential
from keras.models import load_model
from keras.layers import LSTM
from keras.layers import Dense
import matplotlib.pyplot as plt
# Subroutines
def compute_mape(test, prediction):
    return np.mean(np.abs((test - prediction) / test)) * 100

def compute_ds(test, prediction):
    oldtest = test[:-1]
    newtest = test[1:]
    oldprediction = prediction[:-1]
    newprediction = prediction[1:]
    tmp1 = newtest - oldtest
    tmp2 = newprediction - oldprediction
    tmp = np.multiply(tmp1, tmp2.T)
    percent = ((np.where(tmp > 0)[0].shape[0]) * 100) / (oldtest.shape[0])
    return percent
# Now this subroutine deserves an explanation: it shifts the predictions
# by the mean difference between the two prediction arrays (a simple bias
# correction).
def compute_adjustment(testX, predictionX, predictionY):
    adj = (np.sum(predictionY) - np.sum(predictionX)) / predictionX.shape[0]
    return predictionY + adj
# Before we start, a note on cheating.
# The models won't be tested on live data, so if we could get our hands
# on some data that is more recent than the last point in our dataset,
# we could cheat by including it in our training set covertly.
# This would result in overfitting our model to extremes, but since we
# test with the same data we train on, the testing process will miss it
# and we will appear to have achieved excellent results in
# predicting the "future".
# Then, to cover it all up, we could blame the unreproducibility of our
# training results on the random seed generator. We may seed it
# (in the source of the faked solution we would present instead of the
# real, covert one) with the current system time, for example.
# We refuse to do this and seed our random generators with predetermined
# numbers, which should guarantee total reproducibility of our results,
# including training the model.
# Note: We train on CPU and we are not sure if the rounding mechanics
# work the same way on the CPU as on the GPU, so we don't know if training
# our net on a GPU would change the end results.
# Keras is supposed to use the numpy pseudorandom generator
# and tensorflow has its own.
# We seed both.
np.random.seed(1)
tf.set_random_seed(2)
data = pd.read_csv('/home/datacrunch/Downloads/matrix_one_file/price_data.csv')
# We start with the BitCoin model.
# The preliminary analysis indicates that:
# 1. The price of Bitcoin is heavily correlated with the prices of all major
# cryptocurrencies. We are willing to bet that the bitcoin price influences the
# prices of the other cryptocurrencies instead of the opposite way around.
# This means that the lesser known cryptocurrencies would just add noise to
# our model.
# 2. The price of Bitcoin is not influenced by the volume of the trade, the
# total number of emitted bitcoins and so on.
# Based on these observations, we conclude that the only semi-reliable
# predictor for the current price comes from the prices of the previous periods.
# We will approach the problem the "Bollinger way", choosing to look 20 or so
# steps back and feeding them into an LSTM neural network.
# We split the data at row 13251 to get train and test set
splitpoint = 13251
# Bitcoin only.
bitcoin = data['1442']
# We look lookback periods back in time
lookback = 20
# First we fill the missing data
bitcoin = bitcoin.fillna(method='pad')
# Then we scale and center the data
scalefactor = bitcoin.max()
bitcoin = bitcoin / scalefactor
bitcoin = bitcoin - 0.5
# Make the timeframes
timeframes = np.array(bitcoin[0:lookback])
for c in range(1, bitcoin.count() - lookback):
    timeframes = np.vstack((timeframes, np.array(bitcoin[c:c+lookback])))
# Then split the dataset into traindata and testdata
(trainX, testX) = np.split(timeframes, [splitpoint])
(trainY, testY) = np.split(np.array(bitcoin[lookback:]), [splitpoint])
# And shape as input to LSTM layer of Keras
trainX = np.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))
testX = np.reshape(testX, (testX.shape[0], testX.shape[1], 1))
# Init the model
model = Sequential()
# LSTM layer
model.add(LSTM(32, input_shape=(lookback,1), return_sequences = False))
#model.add(LSTM(32, return_sequences = False))
# Output layer
model.add(Dense(1))
# Compile the model
model.compile(loss='mean_squared_error', optimizer='adam')
# If we need (during the development) to load and train our model
# a little bit more to determine the number of epochs needed.
#model = load_model('bitcoin.h5')
# This is the time-consuming step - train the model
model.fit(trainX, trainY, epochs=60, verbose=2)
# We save the model at development time to avoid having to train it
# repeatedly while hammering the code that follows.
model.save('bitcoin.h5')
# We comment out the previous lines and load the trained model
#model = load_model('bitcoin.h5')
predictY = model.predict(testX, verbose = 1)
# "De"scale, "Decenter"
predictY = (predictY + 0.5) * scalefactor
testY = (testY + 0.5) * scalefactor
# Some data visualisation
# PredictY and TestY on all validation points
# The scale will hide the differences.
plt.plot(testY)
plt.plot(predictY)
plt.show()
# Last 100 datapoints of TestY and PredictY
plt.plot(testY[-100:])
plt.plot(predictY[-100:])
plt.show()
# Last 10 datapoints of TestY and PredictY
plt.plot(testY[-10:])
plt.plot(predictY[-10:])
plt.show()
# We export testY and predictY as CSV so we can share them with
# teammates who are trying ARIMA models, so we can measure
# each other's MAPE and DS
p = pd.DataFrame(data=np.vstack((testY, predictY[:,0])).T)
p.columns = ['testY', 'predictY']
p.to_csv('mape_ds_bitcoin.csv')
# Nothing left but to compute MAPE / DS ?
mape = compute_mape(testY, predictY)
print("MAPE is %f percent"%mape)
ds = compute_ds(testY, predictY)
print("DS is %s percent"%ds)
etherium.py – training our network for predicting Ethereum prices
import numpy as np
import pandas as pd
import tensorflow as tf
from keras.models import Sequential
from keras.models import load_model
from keras.layers import LSTM
from keras.layers import Dense
import matplotlib.pyplot as plt
# Subroutines
def compute_mape(test, prediction):
    return np.mean(np.abs((test - prediction) / test)) * 100

def compute_ds(test, prediction):
    oldtest = test[:-1]
    newtest = test[1:]
    oldprediction = prediction[:-1]
    newprediction = prediction[1:]
    tmp1 = newtest - oldtest
    tmp2 = newprediction - oldprediction
    tmp = np.multiply(tmp1, tmp2.T)
    percent = ((np.where(tmp > 0)[0].shape[0]) * 100) / (oldtest.shape[0])
    return percent
# Now this subroutine deserves an explanation: it shifts the predictions
# by the mean difference between the two prediction arrays (a simple bias
# correction).
def compute_adjustment(testX, predictionX, predictionY):
    adj = (np.sum(predictionY) - np.sum(predictionX)) / predictionX.shape[0]
    return predictionY + adj
# Before we start, a note on cheating.
# The models won't be tested on live data, so if we could get our hands
# on some data that is more recent than the last point in our dataset,
# we could cheat by including it in our training set covertly.
# This would result in overfitting our model to extremes, but since we
# test with the same data we train on, the testing process will miss it
# and we will appear to have achieved excellent results in
# predicting the "future".
# Then, to cover it all up, we could blame the unreproducibility of our
# training results on the random seed generator. We may seed it
# (in the source of the faked solution we would present instead of the
# real, covert one) with the current system time, for example.
# We refuse to do this and seed our random generators with predetermined
# numbers, which should guarantee total reproducibility of our results,
# including training the model.
# Note: We train on CPU and we are not sure if the rounding mechanics
# work the same way on the CPU as on the GPU, so we don't know if training
# our net on a GPU would change the end results.
# Keras is supposed to use the numpy pseudorandom generator
# and tensorflow has its own.
# We seed both.
np.random.seed(1)
tf.set_random_seed(2)
data = pd.read_csv('/home/datacrunch/Downloads/matrix_one_file/price_data.csv')
# This file builds the Ethereum model, reusing the Bitcoin price as a predictor.
# The preliminary analysis indicates that:
# 1. The price of Bitcoin is heavily correlated with the prices of all major
# cryptocurrencies. We are willing to bet that the bitcoin price influences the
# prices of the other cryptocurrencies instead of the opposite way around.
# This means that the lesser known cryptocurrencies would just add noise to
# our model.
# 2. The price of Bitcoin is not influenced by the volume of the trade, the
# total number of emitted bitcoins and so on.
# Based on these observations, we conclude that the only semi-reliable
# predictor for the current price comes from the prices of the previous periods.
# We will approach the problem the "Bollinger way", choosing to look 20 or so
# steps back and feeding them into an LSTM neural network.
# We split the data at row 13251 to get train and test set
splitpoint = 13251
# Bitcoin id
# Other id
bitcoinid = '1442'
otherid = '1443'
bitcoin = data[bitcoinid]
other = data[otherid] # Etherium
# We look lookback periods back in time
lookback = 20
# First we fill the missing data
bitcoin = bitcoin.fillna(method='pad')
other = other.fillna(method='pad')
# Then we scale and center the data
scalefactor_bitcoin = bitcoin.max()
bitcoin = bitcoin / scalefactor_bitcoin
bitcoin = bitcoin - 0.5
scalefactor_other = other.max()
other = other / scalefactor_other
other = other - 0.5
combined = np.array([bitcoin, other])
combined = combined.T
# Make the timeframes
#
timeframes, drop = np.split((combined), [lookback], axis=0)
#print(timeframes)
#print(timeframes.shape)
timeframes = timeframes.reshape(1, lookback, 2)
#print(timeframes.shape)
for c in range(1, bitcoin.count() - lookback):
    drop1, newframe, drop2 = np.split((combined), [c, c+lookback], axis=0)
    newframe = newframe.reshape(1, lookback, 2)
    timeframes = np.concatenate((timeframes, newframe), axis=0)
# Then split the dataset into traindata and testdata
(trainX, testX) = np.split(timeframes, [splitpoint])
(trainY, testY) = np.split(np.array(other[lookback:]), [splitpoint])
# And shape as input to LSTM layer of Keras
#trainX = np.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))
#testX = np.reshape(testX, (testX.shape[0], testX.shape[1], 1))
# Init the model
model = Sequential()
# LSTM layer
model.add(LSTM(32, input_shape=(lookback,2), return_sequences = False))
#model.add(LSTM(32, return_sequences = False))
# Output layer
model.add(Dense(1))
# Compile the model
model.compile(loss='mean_squared_error', optimizer='adam')
# If we need (during the development) to load and train our model
# a little bit more to determine the number of epochs needed.
#model = load_model('etherium.h5')
# This is the time-consuming step - train the model
#model.fit(trainX, trainY, epochs=60, verbose=2)
# We save the model at development time to avoid having to train it
# repeatedly while hammering the code that follows.
#model.save('etherium.h5')
# We comment out the previous lines and load the trained model
model = load_model('etherium.h5')
predictY = model.predict(testX, verbose = 1)
# "De"scale, "Decenter"
predictY = (predictY + 0.5) * scalefactor_other
testY = (testY + 0.5) * scalefactor_other
# Some data visualisation
# PredictY and TestY on all validation points
# The scale will hide the differences.
plt.plot(testY)
plt.plot(predictY)
plt.show()
# Last 100 datapoints of TestY and PredictY
plt.plot(testY[-100:])
plt.plot(predictY[-100:])
plt.show()
# Last 10 datapoints of TestY and PredictY
plt.plot(testY[-10:])
plt.plot(predictY[-10:])
plt.show()
# We export testY and predictY as CSV so we can share them with
# teammates who are trying ARIMA models, so we can measure
# each other's MAPE and DS
p = pd.DataFrame(data=np.vstack((testY, predictY[:,0])).T)
p.columns = ['testY', 'predictY']
p.to_csv('mape_ds_etherium.csv')
# Nothing left but to compute MAPE / DS ?
mape = compute_mape(testY, predictY)
print("MAPE is %f percent"%mape)
ds = compute_ds(testY, predictY)
print("DS is %s percent"%ds)
Link to all of our trained models and source files
from IPython.display import Image
Image(filename='graphic2.png')
Blue denotes the real prices of Ethereum, while orange denotes the predicted prices
from IPython.display import Image
Image(filename='graphic3.png')
Blue denotes the real prices of Ethereum, while orange denotes the predicted prices
from IPython.display import Image
Image(filename='graphic4.png')
Blue denotes the real prices of Ethereum, while orange denotes the predicted prices
4.2. ARIMA modeling
5 thoughts on “The diggers – “In God we trust, all others must bring data””
Hi team :), good work!!!
I like your idea to use a NN. How did you select the NN structure? If you have described this step – sorry, I have missed it.
Maybe there are better ways to handle the missing values, especially when the missing period is long…
We have to admit that we, in fact, guessed. We tried different depths as well as dropout layers.
Moreover, the guessing process was actually prepared in advance, because we had previously experimented with other financial data, especially data on Bitcoin movements from BitcoinDesk.
We tried one, two, and three layers of LSTM, intermixed with dropout layers. We also tried different neuron counts in each layer. We are pretty convinced that all this is overkill, given the way financial markets move. There is not that much useful information about the current price of any financial asset that many periods back, EXCEPT if you try to predict volatility.
This contest was not about volatility but about predicting the direction of the market, which is, in our opinion, not very wise. On the other hand, if we try to predict volatility (the scale of change, not the direction), we cannot use that information for any real trades because, as far as we know, nobody sells futures on Bitcoin. Not “legitimate” ones, that is.
🙂 Yes, OK, thanks for the long answer. It is always helpful to have domain expertise.
Good job! You probably needed to explain a bit more about the solution, but this is good.
I think the exposition can be improved. The team could elaborate on some points and do a better job of putting everything together, including the data and source files. I can help edit this….