We resampled the data set from 5-minute intervals to hourly intervals.
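A 5-minute-to-hourly conversion like this can be sketched with pandas. This is a minimal illustration, not the exact preprocessing used here: the data is invented, and keeping the last price of each hour is an assumption, since the aggregation rule is not stated.

```python
import numpy as np
import pandas as pd

# Invented 5-minute prices covering three hours (00:00 to 02:55).
index = pd.date_range("2018-01-01 00:00", periods=36, freq="5min")
prices = pd.Series(np.arange(36.0), index=index)

# Simulate a gap: drop every observation in the 01:00 hour.
prices = prices[prices.index.hour != 1]

# Assumption: keep the last observed price in each 60-minute bin.
hourly = prices.resample("60min").last()
print(hourly)
```

Hours with no observations come out as NaN, so gaps in the timestamp index stay visible after resampling instead of being silently skipped.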

We built the forecast model in Python; the code is below.

We use an ARIMA(3, 1, 1) model, which gave the lowest forecast error.

The forecast for the next period is 8532.203317, which is close to the values shown on the crypto market charts.
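The script obtains this figure by forecasting a differenced series and then adding back an earlier observation. The sketch below mirrors the script's `difference` and `inverse_difference` helpers on toy numbers; the prices and the predicted change of 1.5 are hypothetical stand-ins, not model output.

```python
import numpy as np


def difference(values, interval=1):
    """Subtract from each observation the one `interval` steps earlier."""
    values = np.asarray(values, dtype=float)
    return values[interval:] - values[:-interval]


def inverse_difference(history, predicted_change, interval=1):
    """Turn a forecast of the differenced series back into a price level."""
    return predicted_change + history[-interval]


prices = [10.0, 12.0, 15.0, 14.0, 18.0]   # toy data
diffs = difference(prices, interval=2)    # the series the model would be fitted on
predicted_change = 1.5                    # hypothetical one-step forecast of the difference
forecast = inverse_difference(prices, predicted_change, interval=2)
print(diffs, forecast)
```

The forecast is only as good as the lag choice: the value added back is the observation `interval` steps before the forecast point.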


Durbin-Watson Test= [ 1.99735941]

Forecast: 8532.203317

MFE = -6.14636345999

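MAE = 0.000720205767632 (computed as the absolute mean error divided by the predictions, so it is a relative figure rather than an error in price units)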

Test RMSE: 114.202
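For readers unsure how to interpret these figures, here is a small sketch of the standard definitions on invented numbers. The sign convention (prediction minus actual) follows the script; the function names are ours. A Durbin-Watson statistic near 2 suggests no first-order autocorrelation in the residuals, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
import numpy as np


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)


def forecast_errors(predictions, actual):
    """Return (MFE, MAE, RMSE) for paired forecasts and observations."""
    err = np.asarray(predictions, dtype=float) - np.asarray(actual, dtype=float)
    mfe = err.mean()                   # signed bias: negative means forecasts run low
    mae = np.abs(err).mean()           # average size of the error, in price units
    rmse = np.sqrt((err ** 2).mean())  # penalizes large errors more heavily
    return mfe, mae, rmse


# Toy numbers, invented for illustration only.
actual = [100.0, 102.0, 101.0, 105.0]
predictions = [101.0, 101.0, 103.0, 104.0]
mfe, mae, rmse = forecast_errors(predictions, actual)
print(mfe, mae, rmse)

# Residuals that flip sign at every step are strongly negatively autocorrelated.
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
print(dw_alternating)
```

By these definitions MAE can never be smaller than the absolute value of MFE, and RMSE can never be smaller than MAE, which is a quick sanity check on any reported set of error figures.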

import numpy
from pandas import DataFrame, read_csv
from pandas.plotting import autocorrelation_plot
from matplotlib import pyplot
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
import statsmodels.api as sm
from sklearn.metrics import mean_squared_error


def load_series(path, header):
    # First CSV column is the timestamp index; the next column holds the prices.
    frame = read_csv(path, header=header, index_col=0, parse_dates=True)
    return frame.iloc[:, 0]


series = load_series('price_data_3.csv', header=0)

# split_point equals the series length, so dataset holds every observation
# and validation is empty.
split_point = len(series)
dataset, validation = series[0:split_point], series[split_point:]
print('Dataset %d, Validation %d' % (len(dataset), len(validation)))
dataset.to_csv('dataset.csv', header=False)
validation.to_csv('validation.csv', header=False)
print(series)
X = series.values.astype('float')

# Plot the original data
pyplot.plot(X)
pyplot.show()

# ACF of the original data
print('ACF Original Data')
autocorrelation_plot(series)
pyplot.show()

# PACF (partial autocorrelation) of the original data
print('PACF ORIGINAL Data')
plot_pacf(series, lags=60)
pyplot.show()

# First-differenced data
diff = numpy.diff(X)

# Plot the differenced data
pyplot.plot(diff)
pyplot.show()

print('ACF Differenced Data')
plot_acf(diff)
pyplot.show()

print('PACF Differenced Data')
plot_pacf(diff, lags=60)
pyplot.show()

order = (3, 1, 1)  # or (1, 1, 1)

print('START ARIMA', order, 'for price_data_3 Example')
model = ARIMA(X, order=order)
model_fit = model.fit()
print(model_fit.summary())

# Residual diagnostics. With d=1 the first residual comes from the model
# initialization rather than from a one-step forecast, so it is skipped.
resid = numpy.asarray(model_fit.resid)[order[1]:]
residuals = DataFrame(resid)
print('Plot Residuals')
residuals.plot()
pyplot.show()
residuals.plot(kind='kde')
pyplot.show()
print(residuals.describe())

print('Autocorrelation plot for Residuals and ARIMA', order, 'for price_data_3 Example')
plot_acf(resid)
pyplot.show()

print('Partial Autocorrelation plot for Residuals and ARIMA', order, 'for price_data_3 Example')
plot_pacf(resid)
pyplot.show()

# Durbin-Watson statistic: values near 2 indicate no first-order
# autocorrelation left in the residuals.
DWT = sm.stats.durbin_watson(resid)
print('Durbin-Watson Test=', DWT)

# Ljung-Box Q statistics and p-values for the first 40 residual lags
nlags = 40
r, q, p = sm.tsa.acf(resid, nlags=nlags, qstat=True)
data = numpy.c_[numpy.arange(1, nlags + 1), r[1:], q, p]
table = DataFrame(data, columns=['lag', 'AC', 'Q', 'Prob(>Q)'])
print(table.set_index('lag'))
print()

# Walk-forward validation: refit on all data seen so far, forecast one step,
# then add the real observation to the history.
size = 1350
train = X[0:size]  # set for ….. (the first `size` observations)
test = X[size:len(X)]  # set for testing
history = list(train)
predictions = list()
for t in range(len(test)):
    model = ARIMA(history, order=order)
    model_fit = model.fit()
    output = model_fit.forecast()
    yhat = float(output[0])
    predictions.append(yhat)
    obs = test[t]
    history.append(obs)
    print('t=%i, predicted=%f, expected(real)=%f' % (size + t, yhat, obs))
predictions = numpy.array(predictions)


def difference(dataset, interval=1):
    # Difference between each observation and the one `interval` steps earlier.
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return numpy.array(diff)


def inverse_difference(history, yhat, interval=1):
    # Add back the observation `interval` steps before the forecast point.
    return yhat + history[-interval]


# load dataset
series = load_series('dataset.csv', header=None)

# seasonal difference; on hourly data a lag of 365 observations spans
# 365 hours, not a year
X = series.values.astype('float')
days_in_year = 365
differenced = difference(X, days_in_year)

# fit model
model = ARIMA(differenced, order=(3, 1, 1))
model_fit = model.fit()

# one-step out-of-sample forecast
forecast = model_fit.forecast()[0]

# invert the differenced forecast to something usable
forecast = inverse_difference(X, forecast, days_in_year)
print('Forecast: %f' % forecast)
print()

print('Durbin-Watson Test=', DWT)
print('ARIMA ', order)

errors = predictions - test

# Mean forecast error (bias): negative values mean the forecasts run low.
MFE = errors.mean()
print('MFE = ', MFE)

# Mean absolute error in price units.
MAE = numpy.abs(errors).mean()
print('MAE = ', MAE)

# Absolute mean error divided by the predictions, averaged. The MAE figure
# listed in the results is this ratio.
relative_abs_mfe = (numpy.abs(MFE) / predictions).mean()
print('|MFE| / prediction = ', relative_abs_mfe)

rmse = numpy.sqrt(mean_squared_error(test, predictions))
print('Test RMSE: %.3f' % rmse)

# plot actual test values against the walk-forward predictions (red)
pyplot.plot(test)
pyplot.plot(predictions, color='red')
pyplot.show()


For reference, see the attached Word files with the graphs.

## 4 thoughts on “data.nerds: THE A.I. CRYPTO TRADER”

Amazing work, guys! WOW!

So after the DW test, how am I supposed to evaluate the rest: execute it on the fly in my brain, or what?

I would like to say that following the Box-Jenkins approach is always a good starting point for predictive modeling. Hence, you’ve made a good choice. Also, even though the summary of findings is brief, I appreciate that you have presented the major research results.

Yet, as far as I can see, the issue of missing data is not addressed; it is only partly tackled by the conversion to an hourly basis. Some dates and times are absent from the sample set, that is, the date vector is incomplete, and changing the sampling rate does not resolve that. Consequently, I would advise the following (mainly with your future work as analysts in mind): every sound analysis starts with investing significant time and effort in data preparation.

I don’t see any data preparation or cleaning. How did you deal with the missing values?

Also, your code is unreadable, you don’t have any plots, and this does not really seem like an article. You could update it with an .ipynb file or put the HTML directly on the site.