
Prediction Model for Crypto Currency in R

Bitcoin is a cryptocurrency system that enables its users to exchange payments without passing through a central authority (e.g. the Reserve Bank of India or the Federal Reserve). It was introduced in 2008 and is built on blockchain technology. This article implements methods for building price prediction models. The models use a sample of 3 months of data recorded at 5-minute intervals. The training and testing data are built from that dataset for twenty cryptocurrencies, viz. Bitcoin, Bitcoin Cash, Bitcoin Gold, Cardano, Dash, Dogecoin, Eos, Ethereum, Ethereum Classic, Iota, Lisk, Litecoin, Monero, NEMcoin, Neo, Ripple, Stellar, Tether, Tron, Zcash.
The prediction models used are ARIMA, exponential smoothing and neural networks, all in R. Each model forecasts the value for the next time instant, i.e. the next five minutes, and the code repeats this one-step forecast for all 288 time points in a day.



Team Name: VPSS.csv

Code Link:


Case Team Mentor:

Case Team:

Team Toolset:

  • R: dplyr, ggplot2, forecast, lubridate, caret
  • Excel

Business Understanding

  • Given a dataset describing the cryptocurrency market, the objective of this project is to design a model that predicts cryptocurrency prices.
  • The model will help cryptocurrency investors and investment firms make informed decisions about their investments.

Data Understanding

  • The data set consists of the following variables:
    • time – the date and time at which the price was recorded.
    • refID_coin – the reference ID of each coin.
    • price – the market value of the coin at a given time.
    • marketCap – the total value of a particular coin in the cryptocurrency market.
    • Circulatingsupply – the number of coins existing and being traded at a given moment.
    • Volume24h – how much of the coin has been traded in a day.
    • Movement1h – the variation of the price over one hour.
    • Movement24h – the variation of the price over 24 hours.

Data Preparation

  • The following cryptocurrencies were subsetted from the data: Bitcoin, Bitcoin Cash, Bitcoin Gold, Cardano, Dash, Dogecoin, Eos, Ethereum, Ethereum Classic, Iota, Lisk, Litecoin, Monero, NEMcoin, Neo, Ripple, Stellar, Tether, Tron, Zcash.

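library(dplyr)

# Reference IDs of the twenty selected coins.
# Note: the last ID is written as 146 here; a reviewer points out that it is most probably a typo for 1465.
coin_ids <- c(1442,1445,1456,1446,1453,1477,1452,1443,1457,1451,1460,1448,1454,1447,1449,1444,1450,1474,1455,146)

datathon_bit <- datathon$refID_coin[datathon$refID_coin %in% coin_ids]
# Keep only the selected coins and the columns needed for forecasting.
price_coin <- datathon %>% filter(refID_coin %in% coin_ids) %>% select(time,refID_coin,price)
# One data frame per coin, keyed by refID_coin.
crypt_currency <- split(price_coin,price_coin$refID_coin)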
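To make the result of the subsetting and splitting step concrete, here is a minimal sketch on made-up data. It uses base R `subset()` in place of the dplyr pipeline, and the data frame `toy`, the ID vector `wanted` and all values are hypothetical.

```r
# Toy data standing in for the datathon table (all values are made up).
toy <- data.frame(
  time       = rep(c("17/1/2018 11:25:00", "17/1/2018 11:30:00"), times = 3),
  refID_coin = rep(c(1442, 1443, 9999), each = 2),
  price      = c(11000, 11010, 1000, 1002, 5, 6)
)

wanted <- c(1442, 1443)   # hypothetical subset of coin IDs

# Keep only the wanted coins and the three columns used later.
toy_sel <- subset(toy, refID_coin %in% wanted, select = c(time, refID_coin, price))

# One data frame per coin, named by its reference ID.
toy_split <- split(toy_sel, toy_sel$refID_coin)

names(toy_split)            # "1442" "1443"
nrow(toy_split[["1442"]])   # 2
```

The list returned by `split()` is what later steps iterate over with `lapply`, one coin at a time.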

  • The data has missing timestamps. To treat that, we build a reference data frame containing every 5-minute timestamp in a given interval.

a <- "17/1/2018 11:25:00"
b <- "21/1/2018 02:50:00"
a <- as.POSIXct(a, format = "%d/%m/%Y %H:%M:%S")
b <- as.POSIXct(b, format = "%d/%m/%Y %H:%M:%S")
z <- seq.POSIXt(a, b, by = "5 min")
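As an illustration of how a complete grid exposes gaps, the sketch below builds a one-hour grid and lists the grid points that have no observation. The interval, the `observed` vector and the use of UTC are assumptions made for the example, not part of the original data.

```r
# Complete 5-minute grid for one hour (UTC avoids daylight-saving surprises).
start_t <- as.POSIXct("17/1/2018 11:00:00", format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
end_t   <- as.POSIXct("17/1/2018 12:00:00", format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
grid    <- seq(start_t, end_t, by = "5 min")

# Hypothetical observed timestamps: two of the grid points are absent.
observed <- grid[-c(4, 9)]

# Grid points with no matching observation.
missing_times <- grid[!(grid %in% observed)]

length(grid)                     # 13: twelve intervals plus the starting point
format(missing_times, "%H:%M")   # "11:15" "11:40"
```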

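  • The complete sequence of timestamps was then merged with the parent data frame. Missing values in the refID_coin column were filled using na.locf (last observation carried forward), and missing values in the price column were imputed using na.approx (linear interpolation).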

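library(zoo)  # provides na.locf() and na.approx()

missing <- function(coin){
  # date_framed: data frame holding the complete timestamp sequence (its construction is not shown here)
  df <- merge(date_framed, coin, all.x = TRUE)
  df$refID_coin <- na.locf(df$refID_coin)   # carry the coin ID forward
  df$price <- na.approx(df$price)           # interpolate the price linearly
  df
}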
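The effect of the two imputation functions can be seen on a small made-up frame, where `NA` marks rows introduced by merging onto the complete timestamp grid. The values are hypothetical; only `na.locf` and `na.approx` from zoo are taken from the text.

```r
library(zoo)  # provides na.locf() and na.approx()

# Toy frame after merging onto a complete grid: NA marks a missing timestamp.
df <- data.frame(
  refID_coin = c(1442, NA, NA, 1442, 1442),
  price      = c(100, NA, NA, 130, 128)
)

# Identifier: carry the last observed value forward.
df$refID_coin <- na.locf(df$refID_coin)

# Price: linear interpolation between the neighbouring observed prices.
df$price <- na.approx(df$price)

df$price   # 100 110 120 130 128
```

Neither function can fill a gap at the very start of a series, and with their default `na.rm = TRUE` such leading `NA`s are dropped, which shortens the vector. Passing `na.rm = FALSE` keeps the length unchanged if that matters for the assignment back into the data frame.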


  • The data was converted to a time series using ts, which takes start and frequency arguments. The frequency is 288, calculated as follows:

An hour contains (60/5) = 12 five-minute intervals, and a day has 24 hours. Hence (12*24) = 288 observations per day, which is the frequency.

crypt_currency_ts <- lapply(crypt_full,function(df){ts(df$price,start = c(1,1),frequency=288)})
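A small sketch of what `frequency = 288` means for a `ts` object: with `start = c(1, 1)`, observation 289 is the first 5-minute slot of day 2. The price values below are made up for the illustration.

```r
# Number of 5-minute observations in a day.
per_day <- (60 / 5) * 24

# Two days of made-up prices: 2 * 288 observations.
prices <- seq(100, by = 0.1, length.out = 2 * per_day)

x <- ts(prices, start = c(1, 1), frequency = per_day)

per_day         # 288
frequency(x)    # 288
cycle(x)[289]   # 1: the first slot of the second day
time(x)[289]    # 2: the day index at the start of the second day
```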

  • The data was passed into the functions for ARIMA, Neural Networks and Exponential Smoothing

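library(forecast)  # auto.arima(), stlf(), nnetar(), forecast()

analysis <- function(price){
  price1 <- price   # working copy; each ARIMA forecast is appended to it
  pred_ar <- c()
  pred_et <- c()
  pred_nn <- c()
  for (i in 1:288) {
    # ARIMA: fit, forecast one step ahead, feed the forecast back into the series
    arim_mod <- auto.arima(price1)
    arima_forecast <- forecast(arim_mod, h = 1)
    pred_ar <- c(pred_ar, arima_forecast$mean)
    price1 <- c(price1, arima_forecast$mean)
    # Exponential smoothing via stlf, which returns the forecast directly
    ets_forecast <- stlf(price, h = 1)
    pred_et <- c(pred_et, ets_forecast$mean)
    # Neural network autoregression
    neural_mod <- nnetar(price)
    neural_forecast <- forecast(neural_mod, h = 1)
    pred_nn <- c(pred_nn, neural_forecast$mean)
  }
  # Point forecasts from the three models
  list(arima = pred_ar, ets = pred_et, nnet = pred_nn)
}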

forecasted <- lapply(crypt_currency_ts, analysis)
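The recursive one-step-ahead scheme can be seen in isolation in the following sketch. It is a simplification under stated assumptions: it uses base R `arima()` with a fixed ARIMA(0,1,1) order rather than `auto.arima`, a simulated random walk rather than the datathon prices, and a horizon of 5 rather than 288.

```r
set.seed(1)
# Made-up random-walk "prices"; not the datathon data.
series <- 100 + cumsum(rnorm(200))

horizon <- 5          # the post uses 288; kept small here
history <- series
preds   <- numeric(0)

for (i in seq_len(horizon)) {
  # Fixed order is an assumption; the post selects it with auto.arima().
  fit  <- arima(history, order = c(0, 1, 1))
  step <- as.numeric(predict(fit, n.ahead = 1)$pred)
  preds   <- c(preds, step)
  history <- c(history, step)   # the forecast is treated as if it were observed
}

preds
```

Because each forecast is fed back in as if it were data, forecast errors can compound as the horizon grows.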

  • The loop above runs 288 times because the model is designed to forecast one full day at 5-minute intervals, and 288 is the number of such intervals in a day (the frequency).
  • The auto.arima function selects the ARIMA order automatically, including the degree of differencing needed to make the series stationary, i.e. to make its mean and variance constant over time.
  • We forecast from the auto.arima model for the next 5 minutes and repeat this 288 times to obtain values for the whole day.
  • The estimated value is taken from the mean element of the forecast output, which holds the point forecast.
  • Exponential smoothing is done with the stlf function and the neural network with nnetar; in both cases the estimate is again the mean element of the forecast output.
  • The code produces point forecasts for each of the three models. There is scope for further enhancement to predict the prices more accurately.
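One simple way to compare the three sets of point forecasts, once the actual prices for the forecast day are known, is to compute error measures such as RMSE and MAE. The sketch below is illustrative only: the helper functions `rmse` and `mae` and all numbers are made up, and the post itself reports no such scores.

```r
# Accuracy measures for point forecasts.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mae  <- function(actual, predicted) mean(abs(actual - predicted))

# Made-up numbers for illustration only.
actual <- c(100, 102, 101, 103)
forecasts <- list(
  arima = c(101, 101, 102, 102),
  ets   = c(100, 103, 101, 105),
  nnet  = c(98, 102, 104, 103)
)

# One column per model, one row per measure.
scores <- sapply(forecasts, function(p) c(rmse = rmse(actual, p), mae = mae(actual, p)))
scores
```

Lower values indicate forecasts closer to the observed prices, so the columns of `scores` can be compared directly across models.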


2 thoughts on “Prediction Model for Crypto Currency in R”

  1. 0

    1. You say there are missing timestamps but are there missing values?
    2. How did you impute the missing values?
    3. Did you find a trend/seasonality in the data provided?
    4. How good is your predictive model? What are the metrics you have used to evaluate it?
    5. Would you bet your own money on your predictions? If so how much?

  2. 0

    Your code is self-sufficient (i.e. all inputs are present or part of the task supplied), which is a big plus. That said, there seem to be issues with the code that make it non-executable (for example, the variable pred_et is never defined).

    In general, the provided code would not give predictions for the required dates:
    30.01.2018, 06.02.2018, 20.02.2018, 09.03.2018, 18.03.2018

    However, it seems that it can be changed to do that.

    You are missing one of the currencies because of what is most probably a typo: the last selected currency is “146” rather than the required “1465”.

    I like that you have formed three different predictions, but you do not try to explain which one is better. Even simple comparisons between the three would increase the usefulness of the output.
    Linked with this, statistical tests that check the goodness of the predictions are missing.
