Kryptonians Case Team Mentors:
Dr Subhabaha Pal : @drsubhabahapal
Kryptonians Team:
Ankit Prakash: [email protected], Anshul Jain: [email protected], Shivani Sharma: [email protected], Suhail AK: [email protected]
Team Toolset:
R and Excel
Business Understanding
We were given data containing cryptocurrency price information from 17th January 2018 to 23rd March 2018. We were asked to build a prediction model for the major cryptocurrencies’ prices and to automate an A.I. decision-maker for trading/investing: based on the history of the cryptocurrencies, we analyse them to produce forecasts that should be as accurate as possible.
Data Understanding
Some of the price time series do not have enough data points to conduct a reasonable analysis or to apply prediction algorithms. Before making any predictions, we therefore have to make a well-grounded selection of the assets in the research database.
The initial trading sessions of some assets contain relatively long periods of trading inactivity, so we eliminated these boundary effects by removing a certain number of initial observations.
The price time series also contain missing data, so data imputation is performed on the time series. There are many approaches to imputing missing values in financial time series.
Data Preparation
As there were many files, we chose data for 20 currencies, as advised by the Academia Datathon on their website.
For those currencies we used R to read the price_data.csv file, which holds all the currencies’ prices over time, running from 17th Jan 2018 11:25 to 23rd Mar 2018.
During the data cleaning process we found many discrepancies in the data: many values were missing, and some date-time stamps were missing as well.
To manage the missing date-time stamps, we created a new time sequence from 18th Jan 2018 00:00 till 24th Jan 2018 11:25 and imported the values for the respective dates of each currency from price_data.csv.
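A minimal sketch of this re-indexing step, assuming the file has Time, Currency and Price columns and using BTC as an illustrative currency (the actual column names and symbols in the Datathon data may differ):

# Read the raw price file; column names are assumed here for illustration
price_data <- read.csv("price_data.csv", stringsAsFactors = FALSE)
price_data$Time <- as.POSIXct(price_data$Time, tz = "UTC")

# Complete 5-minute grid covering the training window
full_time <- seq(from = as.POSIXct("2018-01-18 00:00", tz = "UTC"),
                 to   = as.POSIXct("2018-01-24 11:25", tz = "UTC"),
                 by   = "5 min")

# Re-index one currency onto the full grid; time stamps with no match become NA
crypto_temp <- merge(data.frame(Time = full_time),
                     subset(price_data, Currency == "BTC", select = c(Time, Price)),
                     by = "Time", all.x = TRUE)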
Since many values were missing for many date-time stamps, we used the “zoo” package from CRAN to impute the missing data.
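The imputation itself can then be done with zoo’s interpolation and carry-forward helpers; a sketch under the same assumptions (the write-up does not say which zoo function was actually used):

library(zoo)

# Linear interpolation for interior gaps, then carry the nearest observed value
# forwards/backwards to fill any NAs left at the ends of the series
crypto_temp$Price <- na.approx(crypto_temp$Price, na.rm = FALSE)
crypto_temp$Price <- na.locf(na.locf(crypto_temp$Price, na.rm = FALSE), fromLast = TRUE)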
Modelling:
We did the modelling with exponential smoothing and a neural network. We took the frequency as 288 for the daily cycle, since the data are sampled every 5 minutes (24 × 60 / 5 = 288 observations per day).
crypto_price <- ts(crypto_temp$Price, start= c(1,1), frequency = 288)
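A minimal sketch of how these two models can be fitted and forecast with the forecast package; the write-up does not state the exact parameters passed to ets() and nnetar(), so the defaults below are an assumption:

library(forecast)

h <- 5 * 288   # 5-day horizon at 5-minute resolution

# Exponential smoothing (ets() ignores seasonality when the frequency exceeds 24)
fit_ets <- ets(crypto_price)
fc_ets  <- forecast(fit_ets, h = h)

# Feed-forward neural network autoregression
fit_nn <- nnetar(crypto_price)
fc_nn  <- forecast(fit_nn, h = h)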
Evaluation:
Before deploying the results, we evaluated our work by predicting prices for the cryptocurrencies from 25th Jan 2018 00:00 till 29th Jan 2018 11:25 and checking the accuracy against the prices provided for those currencies in the data.
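One way to quantify such a check is forecast::accuracy(), which compares the forecasts with the held-out observations; actual_prices below is a placeholder for the observed 5-minute prices over the evaluation window:

# Compare forecasts against the held-out prices (reports ME, RMSE, MAE, MAPE, ...)
accuracy(fc_ets, actual_prices)
accuracy(fc_nn, actual_prices)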
Deployment:
We have produced a 5-day forecast for 18 cryptocurrencies and deployed the output in the zip file below.
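A hypothetical export step for such a deliverable, writing one forecast CSV per currency and archiving them (file names are illustrative, not the team’s actual ones):

# Write one currency's forecast to CSV, then archive all forecast files
write.csv(data.frame(Step = seq_along(fc_nn$mean), Forecast = as.numeric(fc_nn$mean)),
          file = "BTC_forecast.csv", row.names = FALSE)
zip(zipfile = "kryptonians_forecasts.zip", files = list.files(pattern = "_forecast\\.csv$"))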
Output zipfile below
3 thoughts on “By KrYpToNiAnS”
Some comments and feedback from me. In general, the provided code would not give predictions for the required dates:
30.01.2018
06.02.2018
20.02.2018
09.03.2018
18.03.2018
It will also not check how good the prediction was.
Some other comments follow:
1. I cannot run the code as the input data that you have used is missing.
2. It would be beneficial to all readers if you better explained what neural network you are using.
3. Specifying why the given parameters to the nnetar function were chosen would have been nice as well. For example, by default it assumes seasonality in the data – how does this seasonality relate to the 5-min steps?
4. In the same vein, explaining what algorithm was used for inferring missing data points is necessary.
5. When you make each subsequent prediction after the first (say for t2), you seem to be using your prediction for t1 rather than the true data point from t1. Thus your prediction for t2 is worse than it needs to be.
Why do you predict 18 and not 20 currencies, as given by the case?
1. You may want to edit your paper because it does not look good.
2. You may want to give more information on the data science approaches you have leveraged to build your model.
3. It would be good if you had some visualizations in the article.
4. You say that you have evaluated the results of your model but I do not see such data.
5. Please include at least one metric for model evaluation, like Rsq.
6. Would you bet your own money on your predictions? If so, how much?
In summary, the team has made it through the first stage of the use case.