Kryptonians Case Team Mentors:
Dr Subhabaha Pal : @drsubhabahapal
Ankit Prakash : email@example.com , Anshul Jain: firstname.lastname@example.org , Shivani Sharma : email@example.com, Suhail AK: firstname.lastname@example.org
R and Excel
We were given cryptocurrency data covering 17th January 2018 to 23rd March 2018. We were asked to build a prediction model for the prices of the major cryptocurrencies and to automate an A.I. decision-maker for trading/investing. The approach is to analyze the price history of each cryptocurrency and forecast its future prices as accurately as possible.
Some price time series do not have enough data points to support a reasonable analysis or a prediction algorithm. Before making any prediction, we therefore have to make a well-grounded selection of the assets in the research database.
The initial trading sessions of some assets contain relatively long periods of trading inactivity, so we eliminated these boundary effects by removing a certain number of initial observations.
The price time series also contain missing data, so data imputation is performed on them. There are many approaches to imputing financial time series.
As there were many files, we chose the data for 20 currencies, as advised by the Academia Datathon on their website.
For those currencies, we had R read the price_data.csv file, which contains the prices of all the currencies by timestamp, starting from 17th Jan 2018 11:25 and running to 23rd Mar 2018.
During the data cleaning process we found many discrepancies in the data: many values were missing, along with some dates.
To handle the missing date-time stamps, we created a new time sequence from 18th Jan 2018 00:00 to 24th Jan 2018 11:25 and imported the values for each currency at the respective dates from the file (price_data.csv).
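The step above can be sketched in R as follows. This is a minimal illustration, not the team's actual script: the column name `Time`, the 5-minute spacing (inferred from the frequency of 288 per day) and the toy values are assumptions.

```r
# Hypothetical raw data for one currency: timestamps with gaps.
raw <- data.frame(
  Time  = as.POSIXct(c("2018-01-18 00:00:00",
                       "2018-01-18 00:05:00",
                       "2018-01-18 00:20:00"), tz = "UTC"),
  Price = c(100, 101, 104)
)

# Complete, regularly spaced grid over the window of interest
# (shortened here; the write-up uses 18th Jan 00:00 to 24th Jan 11:25).
grid <- data.frame(
  Time = seq(from = as.POSIXct("2018-01-18 00:00:00", tz = "UTC"),
             to   = as.POSIXct("2018-01-18 00:25:00", tz = "UTC"),
             by   = "5 min")
)

# Left join: every grid timestamp is kept, and timestamps that are
# absent from the raw data get NA in the Price column.
crypto_temp <- merge(grid, raw, by = "Time", all.x = TRUE)
print(crypto_temp)
```

After the join, the missing date-time stamps are explicit `NA` rows, which is the form an imputation routine expects.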
Since data were missing for many date-time stamps, we used the “zoo” package from CRAN to impute the missing values.
We did the modelling with exponential smoothing and a neural network. We set the frequency to 288 observations per day, as per the distribution of the data; this corresponds to one observation every 5 minutes (24 × 12 = 288).
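The write-up does not say which zoo function was used, so the following is only one plausible sketch: linear interpolation with `na.approx` for interior gaps, then `na.locf` for gaps at the edges. The toy prices and timestamps are assumptions.

```r
library(zoo)

# Hypothetical price series with leading, interior and trailing gaps.
price <- c(NA, 100, NA, NA, 106, NA)
times <- seq(as.POSIXct("2018-01-18 00:00:00", tz = "UTC"),
             by = "5 min", length.out = length(price))
z <- zoo(price, order.by = times)

# Interior gaps: linear interpolation between the neighbouring values.
# na.rm = FALSE keeps the leading and trailing NAs in place for now.
z_filled <- na.approx(z, na.rm = FALSE)

# Trailing gap: carry the last observed value forward.
z_filled <- na.locf(z_filled, na.rm = FALSE)

# Leading gap: carry the first observed value backward.
z_filled <- na.locf(z_filled, fromLast = TRUE)

print(coredata(z_filled))
```

Interpolation suits short gaps in a continuously traded asset; for long gaps, carrying the last value forward may be the more conservative choice.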
crypto_price <- ts(crypto_temp$Price, start= c(1,1), frequency = 288)
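To illustrate the exponential smoothing half of the modelling, here is a sketch using base R's `HoltWinters` on a synthetic series. The exact smoothing function, its settings and the data below are assumptions rather than the team's actual configuration, and the neural network model is not shown.

```r
set.seed(1)

freq   <- 288   # observations per day, as in the write-up
n_days <- 7     # hypothetical length of the training window
t      <- seq_len(freq * n_days)

# Synthetic stand-in for crypto_temp$Price: level + daily cycle + noise.
price <- 100 + 5 * sin(2 * pi * t / freq) + rnorm(length(t), sd = 0.5)

crypto_price <- ts(price, start = c(1, 1), frequency = freq)

# Seasonal exponential smoothing without a trend term (beta = FALSE);
# alpha and gamma are estimated from the data.
fit <- HoltWinters(crypto_price, beta = FALSE, seasonal = "additive")

# Forecast 5 days ahead: 5 * 288 steps.
fc <- predict(fit, n.ahead = 5 * freq)
print(head(fc))
```

Setting `frequency = 288` is what lets the model treat each day as one seasonal cycle.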
Before deploying the result, we evaluated our work by predicting the cryptocurrency prices from 25th Jan 2018 00:00 to 29th Jan 2018 11:25 and checking the accuracy against the prices provided for those currencies in the data.
We produced 5-day forecasts for 18 cryptocurrencies and deployed the output in the zip file below.
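A hold-out check of this kind can be sketched as below. The write-up does not name the accuracy measure, so the metrics (RMSE, MAE, MAPE), the helper name `accuracy_metrics` and the toy numbers are all assumptions.

```r
# Hypothetical helper: compare forecasts against the held-out prices.
accuracy_metrics <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  err <- actual - predicted
  c(RMSE = sqrt(mean(err^2)),               # root mean squared error
    MAE  = mean(abs(err)),                  # mean absolute error
    MAPE = mean(abs(err / actual)) * 100)   # mean absolute percentage error
}

# Toy example: observed prices in the hold-out window vs. the forecasts.
actual    <- c(100, 102, 104, 106)
predicted <- c(101, 101, 105, 104)

metrics <- accuracy_metrics(actual, predicted)
print(metrics)
```

MAPE is convenient for comparing currencies with very different price levels, while RMSE penalises large misses more heavily.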
Output zipfile below