
By KrYpToNiAnS

List of required packages

library(data.table) # for fread()
library(plyr) # for join(); loaded before dplyr so that dplyr's verbs are not masked
library(dplyr) # for the %>% pipe, select() and filter()
library(tseries) # time series utilities
library(forecast) # for nnetar() and forecast()
library(caret) # machine learning utilities
library(ggplot2) # for plots
library(mice) # for imputing NA/missing values
library(zoo) # for na.approx() interpolation


###Working with Dataset[price_data.csv]


url <- "matrix_one_file/price_data.csv"

crypto <- fread(url, header = TRUE)

crypto_main <- crypto[,c(1:17,20,25,34,37)]
crypto_loop <- crypto_main[,2:21]
name <- names(crypto_loop)

#Automation for Prediction
for( i in name){

# keep the time stamp and the column of the current currency
crypto_work <- crypto_main %>% select(time, all_of(i))
names(crypto_work) <- c("Time", "Price")
crypto_work$Time <- as.POSIXct(crypto_work$Time, format = "%Y-%m-%d %H:%M:%S")
d<- colnames(crypto_work)[2]

# to get the data for the time series (comparison read as >=; the join below restricts rows to the grid)
crypto_work1 <- crypto_work %>% filter(Time >= "2018-01-18 00:00:00")
Time <- seq(ISOdatetime(2018,1,18,00,0,0), ISOdatetime(2018, 1, 24,11,55,0), by= (60*5))
df <- data.frame(Time)
crypto_temp <- join(df, crypto_work1, by = "Time")
crypto_temp$Price <- na.approx(crypto_temp$Price)

# to get the original values from 25th Jan to 29th Jan (comparison read as >=; the join below restricts rows to the grid)
crypto_orignal_value <- crypto_work %>% filter(Time >= "2018-01-25 00:00:00")
Time <- seq(ISOdatetime(2018,1,25,00,0,0), ISOdatetime(2018, 1, 29,11,55,0), by= (60*5))
df1 <- data.frame(Time)
crypto_temp1 <- join(df1, crypto_orignal_value ,by = "Time")
crypto_temp1$Price <- na.approx(crypto_temp1$Price)

#initializing variables
df_new <- data.frame()
new_df <- data.frame()
value <- c()
start <- 1

for(j in 1:5){

for(k in 1:288){

crypto_price <- ts(crypto_temp$Price, start = c(1,1), frequency = 288)
fit1 <- nnetar(crypto_price)
a <- forecast(fit1, h=1)
value <- append(value,a$mean)
df_new<- data.frame(crypto_temp1$Time[start], a$mean)
names(df_new) <- c("Time","Price")
crypto_temp <- rbind(crypto_temp, df_new)
start <- start+1

output_file <- crypto_temp[(1873+(start-k)):nrow(crypto_temp),]
rownames(output_file) <- c()
name <- paste(i,d,"(",j,")",".csv",sep = "")





Kryptonians Case Team Mentors:

Dr Subhabaha Pal : @drsubhabahapal


Kryptonians Team:

Ankit Prakash, Anshul Jain, Shivani Sharma, Suhail AK


Team Toolset:

R and Excel

Business Understanding

We were given cryptocurrency data covering 17th January 2018 to 23rd March 2018. We were asked to create a prediction model for the prices of the major cryptocurrencies and to automate an A.I. decision-maker for trading/investing. Based on the price history of each cryptocurrency, we forecast its future prices, and those forecasts should be as accurate as possible.

Data Understanding

Some of the price time series do not have enough data points to support a reasonable analysis or a prediction algorithm. To make any prediction, we therefore have to make a well-grounded selection of assets from the research database.

The initial trading sessions of some assets contain relatively long periods of trading inactivity, so we eliminated these boundary effects by removing a certain number of initial observations.
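
The selection and trimming steps described above could be sketched roughly as follows. This is only an illustration on made-up prices, not necessarily the procedure used for this submission: the column names, the 0.7 coverage cut-off and the helper `trim_leading_inactive` are all assumptions introduced for the example.

```r
# Toy price table: one row per 5-minute step, one column per asset (made-up values).
prices <- data.frame(
  coin_a = c(NA, NA, 1.0, 1.1, 1.2, 1.1, 1.3, 1.2),
  coin_b = c(5.0, 5.0, 5.0, 5.2, 5.1, 5.3, 5.2, 5.4),
  coin_c = c(NA, NA, NA, NA, NA, 2.0, NA, 2.1)
)

# Share of non-missing observations per asset.
coverage <- colMeans(!is.na(prices))

# Keep only assets with enough data points (0.7 is an arbitrary, assumed cut-off).
selected <- names(coverage)[coverage >= 0.7]

# Hypothetical helper: drop the initial stretch in which the price is
# missing or has not yet moved away from its first quoted value.
trim_leading_inactive <- function(x) {
  first_quote <- x[which(!is.na(x))[1]]
  moved <- which(!is.na(x) & x != first_quote)
  if (length(moved) == 0) return(x[0])  # the asset never traded actively
  x[moved[1]:length(x)]
}

trimmed <- lapply(prices[selected], trim_leading_inactive)
```

Here `coin_c` is dropped for lack of data, and the other two series start at their first price move.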

The price time series also contain missing data, so we imputed the missing values. There are many approaches to data imputation for financial time series.

Data Preparation

As there were many files, we chose the data for 20 currencies, as advised by the Academia Datathon on its website.

For those currencies we read the price_data.csv file in R. It holds the prices of all the currencies by time stamp, from 17th Jan 2018 11:25 to 23rd Mar 2018.

During data cleaning we found many discrepancies in the data: many price values were missing, and so were some dates.

To handle the missing date-time stamps, we created a new time sequence from 18th Jan 2018 00:00 to 24th Jan 2018 11:55 in 5-minute steps and, for each currency, joined in the values for the matching date-time stamps from the file (price_data.csv).

Because values were missing for many date-time stamps, we used the “zoo” package from CRAN to impute them by linear interpolation (na.approx).
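
The grid-and-interpolate step can be illustrated on a tiny made-up series. This is a sketch under assumptions: it uses base `merge()` in place of `plyr::join()`, fixes the time zone to UTC, and the data frame names `observed`, `grid` and `filled` are invented for the example.

```r
library(zoo)  # na.approx() for linear interpolation

# Observations with a missing 00:10 stamp and one missing price (made-up values).
observed <- data.frame(
  Time  = as.POSIXct(c("2018-01-18 00:00:00", "2018-01-18 00:05:00",
                       "2018-01-18 00:15:00", "2018-01-18 00:20:00"), tz = "UTC"),
  Price = c(100, NA, 106, 108)
)

# Complete 5-minute grid covering the same window.
grid <- data.frame(
  Time = seq(as.POSIXct("2018-01-18 00:00:00", tz = "UTC"),
             as.POSIXct("2018-01-18 00:20:00", tz = "UTC"),
             by = 60 * 5)
)

# Left join onto the grid: missing stamps become rows with Price = NA.
filled <- merge(grid, observed, by = "Time", all.x = TRUE)

# Linear interpolation between the nearest known neighbours.
# na.rm = FALSE keeps leading/trailing NAs, so the vector length never changes.
filled$Price <- na.approx(filled$Price, na.rm = FALSE)
```

With the default `na.rm = TRUE`, `na.approx()` drops leading and trailing NAs it cannot interpolate, so assigning the result back to a data frame column can fail when the series starts or ends with a gap.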


We modelled the prices with exponential smoothing and a neural network. We set the frequency of the series to 288, the number of 5-minute observations in one day (24 × 12), so that one seasonal period corresponds to one day.

crypto_price <- ts(crypto_temp$Price, start= c(1,1), frequency = 288)
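
To show how this frequency setting interacts with `nnetar()`, here is a small sketch on synthetic data. The series, the seed and the settings `p = 3`, `size = 2` and `repeats = 5` are assumptions chosen only to keep the example fast; they are not the settings used for the submitted forecasts.

```r
library(forecast)  # nnetar() and forecast()

set.seed(1)  # nnetar() starts from random weights

steps_per_day <- 24 * 60 / 5  # 288 five-minute observations per day

# Three days of synthetic prices with a daily cycle plus noise (not market data).
n <- 3 * steps_per_day
synthetic_price <- 100 + 5 * sin(2 * pi * seq_len(n) / steps_per_day) +
  rnorm(n, sd = 0.5)

price_ts <- ts(synthetic_price, start = c(1, 1), frequency = steps_per_day)

# Neural network autoregression: 3 recent lags plus 1 seasonal lag (288 steps back).
fit <- nnetar(price_ts, p = 3, P = 1, size = 2, repeats = 5)

# One call forecasts the whole next day, feeding each forecast back in as an input.
next_day <- forecast(fit, h = steps_per_day)
```

A single multi-step call like this is a different and much cheaper procedure than refitting the network before every one-step forecast; in both cases later steps are built on earlier forecasts rather than on observed prices.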


Before deploying the result, we evaluated our work by predicting prices for the cryptocurrencies from 25th Jan 2018 00:00 to 29th Jan 2018 11:55 and checking their accuracy against the prices provided for those currencies in the data.
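
An accuracy check of this kind could be computed as in the sketch below. The numbers are made up for illustration, and RMSE, MAE and MAPE are common choices assumed here rather than metrics reported by the team.

```r
# Hold-out prices and the matching forecasts (made-up numbers).
actual          <- c(100, 102, 101, 105)
forecast_values <- c(101, 101, 103, 104)

errors <- actual - forecast_values

rmse <- sqrt(mean(errors^2))              # root mean squared error
mae  <- mean(abs(errors))                 # mean absolute error
mape <- mean(abs(errors / actual)) * 100  # mean absolute percentage error

metrics <- c(RMSE = rmse, MAE = mae, MAPE = mape)
```

MAPE is scale-free, which makes it easier to compare currencies whose prices differ by orders of magnitude.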


Deployment :

We produced 5-day forecasts for 18 cryptocurrencies and packaged the output in the zip file below.

Output zipfile below




3 thoughts on “By KrYpToNiAnS”

  1.

    Some comments and feedback from me. In general, the provided code would not give predictions for the required dates:

    It will also not check how good the prediction was.

    Some other comments follow:

    1. I cannot run the code as the input data that you have used is missing.
    2. It would benefit all readers if you explained which neural network you are using.
    3. It would also have been nice to explain why the given parameters to the nnetar function were chosen. For example, by default it assumes seasonality in the data; how does this seasonality relate to the 5-min steps?
    4. In the same vein, explaining what algorithm was used for inferring missing data points is necessary.
    5. When you make the next prediction after the first (say, for t2), you seem to be using your prediction for t1 rather than the true data point from t1. Thus your prediction for t2 would be worse than it needs to be.

  2.

    Why do you predict 18 but not 20 currencies as given by the case?
    1. You may want to edit your paper because it does not look good.
    2. You may want to give more information on the data science approaches you have leveraged to do your model.
    3. It would be good if you have some visualizations in the article
    4. You say that you have evaluated the results of your model but I do not see such data.
    5. Please include at least 1 metric for model evaluation like Rsq
    In summary the team have made it through the first stage of the use case.
