We resampled the data set from 5-minute intervals to hourly intervals.
For the forecast model we used Python; the code is below.
We chose ARIMA(3, 1, 1), which was the most accurate model, with the minimum forecast error.
The forecast for the next period is 8532.203317, which is close to the values shown on the crypto market charts.
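The conversion from 5-minute to hourly data is not part of the listing, so here is a minimal sketch of how it could be done with pandas. The series below is synthetic, and keeping the last price of each hour is an assumption; the post does not say whether the last value, the mean, or something else was used.

```python
import numpy as np
import pandas as pd

# Hypothetical 5-minute price series (synthetic values, not the post's data):
# 36 observations cover exactly three hours.
index = pd.date_range("2018-01-01 00:00", periods=36, freq="5min")
prices = pd.Series(np.linspace(8500.0, 8535.0, num=36), index=index)

# Assumption: keep the last observed price in each hour (an hourly "close").
hourly = prices.resample("60min").last()

print(hourly)
```

Swapping `.last()` for `.mean()` would give hourly averages instead; the choice changes the series the model is fitted on, so it is worth stating explicitly.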

Durbin-Watson Test= [ 1.99735941]
Forecast: 8532.203317
MFE = -6.14636345999
MAE = 0.000720205767632
Test RMSE: 114.202
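For readers comparing the figures above, here is a small sketch of the three error measures in their usual textbook form. The helper name `forecast_errors` and the sample numbers are illustrative only. Note that the MAE line in the listing divides the absolute mean error by the predictions, so the reported MAE appears to be a relative quantity and not an error in price units.

```python
import numpy as np

def forecast_errors(actual, predicted):
    """Return MFE, MAE and RMSE for two equally long sequences."""
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    # Same sign convention as the listing: predictions minus observed values.
    errors = predicted - actual
    return {
        "MFE": errors.mean(),                   # mean forecast error (bias)
        "MAE": np.abs(errors).mean(),           # mean absolute error
        "RMSE": np.sqrt((errors ** 2).mean()),  # root mean squared error
    }

# Illustrative values only, not taken from the post's data.
metrics = forecast_errors([100.0, 102.0, 101.0, 105.0],
                          [101.0, 100.0, 103.0, 104.0])
print(metrics)
```

An MFE near zero with a much larger MAE or RMSE means the errors cancel out on average but are individually sizeable.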
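# Written for Python 2 with older pandas and statsmodels releases:
# pandas.tools.plotting, Series.from_csv and statsmodels.tsa.arima_model
# have since been removed from those libraries.
import numpy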
from pandas import Series
from matplotlib import pyplot
import matplotlib.pyplot as plt
from pandas.tools.plotting import autocorrelation_plot
from statsmodels.graphics.tsaplots import plot_pacf
from statsmodels.graphics.tsaplots import plot_acf
from pandas import DataFrame
from statsmodels.tsa.arima_model import ARIMA
import statsmodels.api as sm
series = Series.from_csv('price_data_3.csv', header=0)
# Note: split_point equals the full length, so the validation set is empty.
split_point = len(series)
dataset, validation = series[0:split_point], series[split_point:]
print('Dataset %d, Validation %d' % (len(dataset), len(validation)))
dataset.to_csv('dataset.csv')
validation.to_csv('validation.csv')
print(series)
X = series.values
# Plot Data
pyplot.plot(X)
pyplot.show()
#
#print(X)
print('ACF Original Data')
autocorrelation_plot(series)
pyplot.show()
#
# Plot PACF (partial autocorrelation) for the original data
print('PACF ORIGINAL Data')
pyplot.figure()
plot_pacf(series, lags=60)
pyplot.show()
#
#
# Differenced data (first difference)
diff = list()
for i in range(1, len(X)):
    value = X[i] - X[i - 1]
    diff.append(value)
# Plot differenced data
pyplot.plot(diff)
pyplot.show()
print('ACF Differenced Data')
plot_acf(diff)
pyplot.show()
print('PACF Differenced Data')
pyplot.figure()
plot_pacf(diff, lags=60)
pyplot.show()
#
series = Series.from_csv('price_data_3.csv', header=0)
print(series)
X = series.values
X = numpy.array(X).astype('float')
order = (3, 1, 1)  # (1,1,1)
#
print('START ARIMA %s for price_data_3 Example' % (order,))
model = ARIMA(X, order)
model_fit = model.fit(disp=0)
print(model_fit.summary())
# Plot residual errors
residuals = DataFrame(model_fit.resid)
print('Plot Residuals')
residuals.plot()
pyplot.show()
residuals.plot(kind='kde')
pyplot.show()
print(residuals.describe())
#
print('Autocorrelation plot for Residuals and ARIMA %s for price_data_3 Example' % (order,))
plot_acf(residuals)
pyplot.show()
#
print('Partial Autocorrelation plot for Residuals and ARIMA %s for price_data_3 Example' % (order,))
plot_pacf(residuals)
pyplot.show()
#
#
# Durbin-Watson statistic of the residuals (values near 2 suggest no first-order autocorrelation)
DWT = sm.stats.durbin_watson(residuals)
print('Durbin-Watson Test= %s' % (DWT,))
#
# Autocorrelations with Ljung-Box Q statistics and p-values for lags 1..40
r, q, p = sm.tsa.acf(residuals, qstat=True)
data = numpy.c_[range(1, 41), r[1:], q, p]
table = DataFrame(data, columns=['lag', 'AC', 'Q', 'Prob(>Q)'])
print(table.set_index('lag'))
print('')
size = int(1350)
train = X[0:size]  # set for …..
test = X[size:len(X)]  # test set
history = [x for x in train]
predictions = list()
# Walk-forward validation: refit on all data seen so far, then forecast one step
for t in range(len(test)):
    model = ARIMA(history, order)
    model_fit = model.fit(disp=0)
    output = model_fit.forecast()
    yhat = output[0]
    predictions.append(yhat)
    obs = test[t]
    history.append(obs)
    print('=%i, predicted=%f, expected(real)=%f' % (size + t, yhat, obs))
def difference(dataset, interval=1):
    # Difference the series at the given lag
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return numpy.array(diff)
def inverse_difference(history, yhat, interval=1):
    # Add back the observation from `interval` steps earlier
    return yhat + history[-interval]
# load dataset
series = Series.from_csv('dataset.csv', header=None)
# seasonal difference
X = series.values
days_in_year = 365
differenced = difference(X, days_in_year)
# fit model
model = ARIMA(differenced, order=(3, 1, 1))
model_fit = model.fit(disp=0)
# one-step out-of-sample forecast
forecast = model_fit.forecast()[0]
# invert the differenced forecast to something usable
forecast = inverse_difference(X, forecast, days_in_year)
print('Forecast: %f' % forecast)
print('')
print('Durbin-Watson Test= %s' % (DWT,))
print('ARIMA %s' % (order,))
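# Flatten the list of one-step forecasts so it lines up with test
predictions = numpy.array(predictions).ravel()
MFE = (predictions - test).mean()
print('MFE = %s' % MFE)
# Note: this divides |mean error| by the predictions, so it is a relative
# measure, not the usual MAE, which would be numpy.abs(predictions - test).mean()
MAE = (numpy.abs((predictions - test).mean()) / predictions).mean()
print('MAE = %s' % MAE)
#
from sklearn.metrics import mean_squared_error
rmse = numpy.sqrt(mean_squared_error(test, predictions))
print('Test RMSE: %.3f' % rmse)
# plot observed values against the walk-forward predictions
pyplot.plot(test)
pyplot.plot(predictions, color='red')
pyplot.show()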
##########
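To make the Durbin-Watson figure in the output easier to interpret, the statistic can be computed by hand: it is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The sketch below uses synthetic residuals, not the model's; the function mirrors the standard formula and is not the statsmodels implementation itself.

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    e = np.asarray(residuals, dtype=float).ravel()
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Synthetic examples (assumptions for illustration only).
rng = np.random.default_rng(0)
white_noise = rng.normal(size=5000)  # uncorrelated residuals
random_walk = np.cumsum(white_noise)  # strongly positively autocorrelated

print(durbin_watson(white_noise))  # close to 2
print(durbin_watson(random_walk))  # close to 0
```

Values near 2 indicate no first-order autocorrelation, values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation. The statistic only covers lag 1, which is why the listing also prints Q statistics for further lags.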
For reference, please see the attached Word files with graphs.
4 thoughts on “data.nerds: THE A.I. CRYPTO TRADER”
Amazing work, guys! WOW!
SO after DW test how am I supposed to evaluate the rest – execute it on the fly in my brain or what?
I would like to say that following the Box-Jenkins approach is always a good starting point for predictive modeling. Hence, you’ve made a good choice. Also, even though the summary of findings is brief, I appreciate that you have presented the major research results.
Yet, as far as I can see, the issue of missing data is not addressed; it is only tackled indirectly by converting to an hourly basis. However, some dates and times are absent from the sample set (the date vector is incomplete), and changing the sampling rate does not resolve that. Consequently, I would advise the following, mainly with your future work as analysts in mind: every sound analysis starts with investing significant time and effort in data preparation.
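The incomplete date vector described above can be checked with a few lines of pandas. This is only a sketch on made-up timestamps: the series, the hourly frequency, and the choice of time-based linear interpolation are all assumptions, not something the post's authors did.

```python
import pandas as pd

# Hypothetical hourly series with two timestamps missing (03:00 and 04:00).
stamps = pd.to_datetime([
    "2018-01-01 00:00", "2018-01-01 01:00", "2018-01-01 02:00",
    "2018-01-01 05:00", "2018-01-01 06:00",
])
prices = pd.Series([8500.0, 8510.0, 8505.0, 8520.0, 8530.0], index=stamps)

# Build the complete hourly index and reindex so that gaps show up as NaN.
full_index = pd.date_range(prices.index.min(), prices.index.max(), freq="60min")
complete = prices.reindex(full_index)
missing = complete[complete.isna()].index
print("Missing timestamps:", list(missing))

# One possible treatment (an assumption): linear interpolation in time.
filled = complete.interpolate(method="time")
print(filled)
```

Whether to interpolate, forward-fill, or drop the affected periods depends on how long the gaps are; reporting the number of missing timestamps is the first step either way.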
I don’t see any data preparation or cleaning. How did you deal with the missing values?
Also, your code is unreadable, you don’t have any plots, and this does not really seem like an article. You could update it with an .ipynb file or post the HTML directly on the site.