Team members:
- Ana Popova, @anie
- Izabella Taskova, @izabellataskova
- Kamelia Kosekova, @kameliak
- Kameliya Lokmadzhieva, @kameliyalokmadzhieva
- Nikolay Bojurin, @nikolay
Mentors: @boryana @alex-efremov @pepe
Team name: DAB PANDA
Team logo:
NB!!!! OUR NOTEBOOKS ARE AVAILABLE HERE: DAB PANDA Rmds
Data Understanding and Preparation
You may see our code with results and brief comments if you dab here
Cryptocurrencies…. Are they as cryptic as the name suggests? Perhaps we’ll know at the end of this journey. Let’s start dabbing!
As a start we need to take a look at what we have. And we have a loooot of files.
For level 1 we need to predict the prices of 20 cryptocurrencies, so we focus on the price series data. That information is available either in the separate files for the individual currencies or in price_data.csv. We opted for the latter. What we discovered was, to say the least, interesting…
In the data preparation stage we discovered a discrepancy. Originally, we have 15 267 observations. However, with one observation every 5 minutes, each full day should have 288 observations. The period under consideration covers ‘2018-01-17 11:25:00’ – ‘2018-03-23 14:00:00’, which is 66 calendar days: 64 full days and 2 incomplete ones.
Let’s figure out how many observations in total we should have by breaking down that period:
– for day 1 (2018-01-17): 151 observations
– for the 64 full days: 18 432 observations
– for day 66 (2018-03-23): 169 observations (00:00 through 14:00 inclusive)
Woooow! There is a big difference between the 15 267 observations we have and the 18 752 we expect. To find out what is missing, we create a sequence of all date-times within the period with a step of 5 minutes (you can see this step in our code: dab).
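A minimal sketch of how the full 5-minute grid and the expected count can be checked in base R. The time zone (UTC) and the variable names are assumptions made for illustration and are not taken from our notebooks.

```r
# Full 5-minute grid for the period covered by the data (time zone assumed to be UTC)
start <- as.POSIXct("2018-01-17 11:25:00", tz = "UTC")
end   <- as.POSIXct("2018-03-23 14:00:00", tz = "UTC")
full_grid <- seq(from = start, to = end, by = "5 min")

# Number of observations we should have in total
n_expected <- length(full_grid)

# Breakdown by calendar day: first day, full days in between, last day
per_day <- table(as.Date(full_grid, tz = "UTC"))
```

Under these assumptions n_expected comes out at 18 752, and per_day shows which days are incomplete.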
Next, we merge the data on coins with the full list of dates. We find that we get 1 extra observation, which is weird. So, we check for duplicates and discover 1! Then we get rid of the imposter row!
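The merge and duplicate check can be sketched on toy data as follows. The column names (time, coin_a) and the values are hypothetical stand-ins for price_data.csv, not the real data.

```r
# Toy 5-minute grid with six time stamps
grid <- data.frame(time = seq(as.POSIXct("2018-01-17 11:25:00", tz = "UTC"),
                              by = "5 min", length.out = 6))

# Toy price data: time stamp 2 appears twice, time stamps 3 and 5 are missing
prices <- data.frame(
  time   = grid$time[c(1, 2, 2, 4, 6)],
  coin_a = c(10.0, 10.1, 10.1, 10.3, 10.2)
)

# Left join on the full grid: missing time stamps become rows with NA prices
merged <- merge(grid, prices, by = "time", all.x = TRUE)

# One row more than the grid signals a duplicated time stamp
n_extra  <- nrow(merged) - nrow(grid)
dup_rows <- duplicated(merged$time)

# Drop the duplicate and count the gaps that remain
clean     <- merged[!dup_rows, ]
n_missing <- sum(is.na(clean$coin_a))
```

After the duplicate is removed, the number of rows equals the length of the grid, and the NA count per coin column gives the number of missing values.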
We learn that for each coin we have 3 578 missing values.
To tackle the missing values, we decide to look at the log differenced prices. On that basis we interpolate the missing values by simulating 20 rows of white noise. You may see our pretty plots before and after the interpolation in the link we have provided for this stage (dab).
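One possible way to fill gaps on the log-differenced scale is sketched below. The assumption here, which may differ from what our notebooks do, is that each missing log difference is replaced by Gaussian white noise with the mean and standard deviation of the observed log differences. The prices are made up.

```r
set.seed(1)  # reproducible draws

# Toy price series with a two-step gap (values are made up)
price <- c(100, 101, 100.5, NA, NA, 102, 101.5, 102.2)

# Log differences; any difference that touches an NA is itself NA
dlog <- diff(log(price))

# Assumption: fill missing log differences with white noise that matches
# the mean and standard deviation of the observed log differences
gaps <- is.na(dlog)
dlog_filled <- dlog
dlog_filled[gaps] <- rnorm(sum(gaps),
                           mean = mean(dlog, na.rm = TRUE),
                           sd   = sd(dlog, na.rm = TRUE))
```

Note that a gap of two prices produces three missing log differences, because the difference on each side of the gap is also undefined.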
After that we need to retrieve the original data so we do the reverse of log and diff with a lovely loop that performs some reverse engineering feats!
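The inverse of the log-and-diff transformation is a cumulative sum followed by an exponential. A minimal sketch with made-up prices, showing both a loop and a vectorised equivalent (variable names are illustrative):

```r
# Toy prices and their log differences
price <- c(100, 101, 100.5, 102, 101.5)
dlog  <- diff(log(price))

# Loop version: start from the first log price and add one difference at a time
log_rebuilt <- numeric(length(dlog) + 1)
log_rebuilt[1] <- log(price[1])
for (i in seq_along(dlog)) {
  log_rebuilt[i + 1] <- log_rebuilt[i] + dlog[i]
}
rebuilt_loop <- exp(log_rebuilt)

# Vectorised equivalent using a cumulative sum
rebuilt_vec <- exp(cumsum(c(log(price[1]), dlog)))
```

Both versions need the first price as a starting value, since differencing discards the level of the series.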
We then make an empty dataframe which we feed all of the data to – this is our orig set!
We plot the price series – before and after the interpolation! (see our Rpubs to see our pretty graphs – by dabbing here)
We look at the autocorrelation to see how the different coins relate to one another.
We look at ACF and PACF curves for the complete log differenced data – for all 20 coins.
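For readers who want to reproduce this kind of check, here is a small sketch using base R. The series is simulated (an MA(1) process) and only stands in for one coin's log-differenced prices.

```r
set.seed(42)

# Simulated stand-in for one coin's log-differenced prices (illustrative only)
dlog <- as.numeric(arima.sim(model = list(ma = 0.5), n = 500))

# Autocorrelation and partial autocorrelation up to lag 20, without plotting
acf_out  <- acf(dlog, lag.max = 20, plot = FALSE)
pacf_out <- pacf(dlog, lag.max = 20, plot = FALSE)

# Approximate 95% band used to judge which lags stand out
band <- qnorm(0.975) / sqrt(length(dlog))
```

Setting plot = TRUE (the default) draws the familiar ACF and PACF curves with this band marked.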
Finally we look at the histograms for the 20 coins!
Modeling
Prelude – dab here
The orig dataframe is the one we use for modelling. It contains the initial dataset together with the missing observations we imputed. We transform it from a dataframe to a time series object.
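A minimal sketch of the conversion, using a toy data frame. The column names and the choice of frequency = 288 (one day of 5-minute observations per period) are assumptions, not necessarily what our notebooks use.

```r
# Toy data frame standing in for orig (column names are hypothetical)
orig_toy <- data.frame(coin_a = c(10, 10.1, 10.2, 10.1),
                       coin_b = c(5, 5.05, 4.98, 5.02))

# Multivariate time series object; 288 five-minute observations make one day
orig_ts <- ts(orig_toy, frequency = 288)
```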
Next, we look for models that would be appropriate for the different coin prices. We look at combinations of (p, d, q), with p and q between 0 and 7. We do this for all 20 coins and evaluate the models by considering the Ljung-Box p-value, the sum of squared residuals and the Akaike information criterion (AIC).
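A search of this kind can be sketched with base R's arima() and Box.test(). The data below are simulated, d is fixed at 1, the grid is cut down to p, q in 0..2 for speed, and the Ljung-Box lag of 10 is an arbitrary choice. All of these are assumptions for illustration.

```r
set.seed(7)

# Simulated log price (a random walk) standing in for one coin; not real data
log_price <- cumsum(rnorm(400, sd = 0.01))

# Small grid for speed; the text scans p and q from 0 to 7
grid <- expand.grid(p = 0:2, d = 1, q = 0:2)
grid$aic  <- NA_real_
grid$ssr  <- NA_real_
grid$lb_p <- NA_real_

for (i in seq_len(nrow(grid))) {
  ord <- c(grid$p[i], grid$d[i], grid$q[i])
  fit <- tryCatch(arima(log_price, order = ord, method = "ML"),
                  error = function(e) NULL)
  if (is.null(fit)) next                 # skip orders that fail to fit
  res <- residuals(fit)
  grid$aic[i] <- AIC(fit)
  grid$ssr[i] <- sum(res^2)
  # Ljung-Box test on the residuals; fitdf accounts for the estimated ARMA terms
  grid$lb_p[i] <- Box.test(res, lag = 10, type = "Ljung-Box",
                           fitdf = ord[1] + ord[3])$p.value
}

# Rank candidate orders by AIC (lower is better)
ranked <- grid[order(grid$aic), ]
```

A high Ljung-Box p-value means no evidence of leftover autocorrelation in the residuals, so it is read together with AIC and the sum of squared residuals rather than on its own.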
We discover that some models tend to perform well across multiple coins, such as ARIMA(0,1,6), ARIMA(6,1,0).
We also look at the residuals for all 20 coins for ARIMA(0,1,6) on log data.
Next, we provide a list of models with the highest p-values in our second Rpubs link – to see it dab here!
We attempted to apply ARIMA with a rolling window by using a loop. We begin with a historical subset covering the first 7 days, i.e. 2 016 observations (7 × 288).
We managed to obtain results for several coins, including Dash, Bitcoin Gold, Dogecoin, Ripple and Litecoin.
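A rolling-window loop of this kind might look like the sketch below. The series is simulated, the window is shortened to 200 observations so the example runs quickly, and the ARIMA order (0, 1, 1) is only a placeholder. None of these values are taken from our notebooks.

```r
set.seed(123)

# Simulated log price standing in for one coin (not real data)
x <- cumsum(rnorm(260, sd = 0.01)) + 5

window <- 200          # the text uses 7 days = 7 * 288 = 2016 observations
ord    <- c(0, 1, 1)   # placeholder order; swap in the order chosen per coin
ff     <- rep(NA_real_, length(x))  # one-step-ahead forecasts, aligned with x

for (t in (window + 1):length(x)) {
  train <- x[(t - window):(t - 1)]  # the most recent `window` observations
  fit <- tryCatch(arima(train, order = ord, method = "ML"),
                  error = function(e) NULL)
  if (!is.null(fit)) {
    # Forecast the next value and store it at the position it predicts
    ff[t] <- as.numeric(predict(fit, n.ahead = 1)$pred)
  }
}
```

Refitting at every step is what makes this approach slow on 5-minute data; refitting less often, for example once per hour, is one way to trade accuracy for speed.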
For Dash: see here
For Litecoin:
For Dogecoin:
Results for Dogecoin:
> sqrt(mean((x[2017:length(x)]-ff[2017:length(x)])^2))
[1] 0.006681334
> mean(ff[2017:length(ff)])
[1] -5.270246
> mean(x[2017:length(x)])
[1] -5.270284
> sd(ff[2017:length(ff)])
[1] 0.2668967
> sd(x[2017:length(x)])
[1] 0.2669037
For Ripple:
For Ethereum:
> sqrt(mean((y[2017:length(y)]-gg[2017:length(y)])^2))
[1] 0.003165124
> mean(gg[2017:length(gg)])
[1] 6.960584
> mean(y[2017:length(y)])
[1] 6.96065
> sd(gg[2017:length(gg)])
[1] 0.0230234
> sd(y[2017:length(y)])
[1] 0.02289996
For Bitcoin:
> sqrt(mean((y[2017:length(y)]-gg[2017:length(y)])^2))
[1] 0.002051064
> mean(gg[2017:length(gg)])
[1] 6.748966
> mean(y[2017:length(y)])
[1] 6.748907
> sd(gg[2017:length(gg)])
[1] 0.03380054
> sd(y[2017:length(y)])
[1] 0.03372321
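The console output above reports the root mean squared error plus the mean and standard deviation of the forecasts and of the actual values from observation 2017 onwards. The same comparison can be wrapped in a small helper. The function name forecast_summary and the toy numbers are hypothetical.

```r
# Hypothetical helper: compare forecasts with actual values from index `from` onwards
forecast_summary <- function(actual, forecast, from) {
  idx <- from:length(actual)
  c(rmse          = sqrt(mean((actual[idx] - forecast[idx])^2)),
    mean_forecast = mean(forecast[idx]),
    mean_actual   = mean(actual[idx]),
    sd_forecast   = sd(forecast[idx]),
    sd_actual     = sd(actual[idx]))
}

# Toy example with made-up numbers; the first two forecasts are unavailable
actual   <- c(1.00, 1.10, 1.20, 1.30, 1.40)
forecast <- c(NA,   NA,   1.25, 1.25, 1.45)
out <- forecast_summary(actual, forecast, from = 3)
```

With a 2 016-observation training window, the call would use from = 2017, matching the indices in the console output.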
Evaluation
7 thoughts on “DAB PANDA: The A.I. Crypto Trader”
I really like the way you’ve presented results. With RPubs everything is clearly outlined and the reader might follow easily the exhibition of major research steps backed up by the relevant code.
The data prep is conducted correctly. The applied methodology is appealing from a theoretical point of view. The sliding window approach is correctly implemented. Considering the issue with computational efficiency, I might say that applying the classical Box-Jenkins approach is a good choice.
Obviously, if you had two more hours, you would have accomplished in the same brilliant way (just as all previous sections) the last portion of your research including more comments on the accuracy and robustness of delivered forecasts.
Last, but not least, I would like to emphasize that the text of the article is written in a really nice manner, approaching the reader and capturing their attention from the very first paragraph.
In conclusion I might say that it is a great job, guys!
We specifically require that you upload everything on this website. Failing to do so is going to be fatal for you, no matter that you have done a decent job.
We had issues with Jupyter Notebook and decided to use RPubs as an alternative. We have now included a zip file with our R Notebooks – at the beginning of the paper. 🙂
The plots are not really readable. Good job with the missing data, but upload the code as a notebook file, or html, so it can be read here, otherwise it is not possible for anyone to give you feedback and recommendations.
@pepe You can find the zipped R notebooks right below the Panda logo, the link reads DAB Panda Rmds
@pepe: As far as I can see, and please correct me if I am wrong, participants working in R have some problem uploading their work to this website in a nice format. If this is the case, there should be an opportunity to incorporate links in the main text that allow participants to present their results in the best way.