
Cryptocurrency Prediction by Kautilya

Given data on cryptocurrencies, we aim to forecast their future prices so that profitable trades can be executed. We show that cryptocurrency prices exhibit desirable properties such as stationarity and mixing. Classical time series prediction models that exploit this behavior, such as ARIMA models, produce poor predictions and lack good probabilistic interpretations. We first introduce a theoretical framework for predicting and trading cryptocurrency prices and, based on it, design a neural network model that gives better predictions than the other models.


Cryptocurrency Case

KAUTILYA Team Mentors:

  • Subhabaha Pal (@drsubhabahapal)


  • Prakash Kumar (@prakash), MAHE, India
  • Sujit Mohapatra (@sujit), MAHE, India
  • Vivek Arya (@svarya), MAHE, India
  • Amit Kumar Mohapatra (@amit98), MAHE, India
  • Vijay (@vijays), MAHE, India

Team Toolset:

  • RStudio: neural network time series forecasting
  • MS Excel

Business Understanding

  • Who: Data Science Society (Academic Datathon) Organizers
  • What: Build a prediction model for the prices of the major cryptocurrencies
  • How: Intra-group brainstorming based on various concepts of time series analysis and machine learning.

Data Understanding

  • Explore dataset’s structure
  • Discover potential variable dependencies across datasets
  • Identify the missing data points across the dataset
  • Identify the subsets of data that modeling would be based on

Data Preparation

The cryptocurrency data had many NA values and absent (deleted) rows for particular time points, which is common in financial time series. We therefore added the missing time points as rows filled with NAs using Microsoft Excel. The NAs were then imputed using the na.kalman function from the imputeTS library in R. The full dataset was then subset to build the desired model.
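The preparation step described above can be sketched in R as follows. This is a minimal illustration, not the team's code: the gap-filling is done in R rather than Excel, the prices are simulated, and the names `grid`, `raw`, and `full` are hypothetical. It assumes a current imputeTS release, where the function is called `na_kalman` (older releases named it `na.kalman`).

```r
# Sketch only: `raw`, its columns, and the simulated prices are
# illustrative assumptions, not the team's data.
library(imputeTS)  # na_kalman(); model = "auto.arima" also needs the forecast package

set.seed(1)

# Complete 5-minute grid for two days: 2 * 288 = 576 time points.
grid <- data.frame(
  time = seq(as.POSIXct("2018-01-18 00:00", tz = "UTC"),
             as.POSIXct("2018-01-19 23:55", tz = "UTC"),
             by = "5 min")
)

# Simulated price series with 40 rows deleted, mimicking a gappy extract.
raw <- data.frame(time = grid$time,
                  price = 100 + cumsum(rnorm(nrow(grid))))
raw <- raw[-sample(2:(nrow(raw) - 1), 40), ]

# Left-join onto the grid so that every absent time point becomes an NA row.
full <- merge(grid, raw, by = "time", all.x = TRUE)
full <- full[order(full$time), ]
n_missing <- sum(is.na(full$price))

# Kalman smoothing on the state-space form of an auto.arima fit.
full$price_imputed <- na_kalman(full$price, model = "auto.arima")
```

Building the grid first makes the gaps explicit, so the imputation sees a regularly spaced series instead of one that silently skips time points.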

  • For the target forecast, we tried two approaches, both on 5-minute data for each day, since the cryptocurrency market trades 24/7: (i) “auto.arima” and (ii) “nnetar”. We ensured that no variable had a large percentage of missing or outlier values.
  • The prices of the cryptocurrencies were predicted for 25th Jan 2018 at 288 time points (one every 5 minutes).
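A check for the share of missing values per variable, as mentioned in the list above, might look like the sketch below. The table `prices`, its values, and the 20% cut-off are all hypothetical and chosen only for illustration.

```r
# Hypothetical wide table: one column per currency, one row per 5-minute point.
prices <- data.frame(
  btc = c(11000, NA, 11020, 11015, NA, 11030, 11040, 11035),
  eth = c(1000, 1002, NA, 1005, 1004, 1006, 1003, 1007),
  xrp = c(1.30, 1.31, 1.29, 1.30, 1.32, 1.31, 1.30, 1.33)
)

# Percentage of missing values in each column.
pct_missing <- 100 * colMeans(is.na(prices))

# Flag columns above an illustrative threshold (20% is an assumption).
too_sparse <- names(pct_missing)[pct_missing > 20]
```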


The subset used for modeling contained the currencies’ prices for January 2018. The training window started at 18th Jan 2018 00:00 and ended at 24th Jan 2018 23:55, and the trained model forecasts from 25th Jan 2018 00:00 onward. The model predicts the currencies’ prices one currency at a time.
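For one currency, the training and forecasting setup described above could be sketched as follows. The series is simulated, and the settings (`repeats = 5`, `seasonal = FALSE`, a daily frequency of 288) are assumptions made to keep the example short and fast, not the team's configuration. `nnetar` from the forecast package fits a feed-forward neural network with a single hidden layer to lagged values of the series and averages several networks started from random weights.

```r
library(forecast)  # nnetar(), auto.arima(), forecast()

set.seed(42)

# Simulated stand-in for one currency: 7 days of 5-minute prices
# (18th to 24th Jan 2018), i.e. 7 * 288 = 2016 points.
n_train <- 7 * 288
price <- 100 + cumsum(rnorm(n_train, sd = 0.2))

# frequency = 288 declares a daily cycle of 5-minute observations.
y <- ts(price, frequency = 288)

# Neural network autoregression; lag order and hidden-layer size are left
# at the package defaults, and repeats = 5 only shortens the run time.
fit_nn <- nnetar(y, repeats = 5)
fc_nn <- forecast(fit_nn, h = 288)   # 288 steps = all of 25th Jan

# ARIMA baseline; seasonal = FALSE is an assumption that keeps the search fast.
fit_arima <- auto.arima(y, seasonal = FALSE)
fc_arima <- forecast(fit_arima, h = 288)
```

Repeating the same steps in a loop over the price columns gives one forecast per currency. Point forecasts are in `fc_nn$mean` and `fc_arima$mean`.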

Problems Faced

  • The main problem was that the dataset is not continuous: some time points are absent, so we had to try several methods of missing-data treatment and imputation by trial and error.
  • Because the missing values were imputed, the imputed values may differ from the actual prices, especially where two or more time points are missing.

Use the link below to download the code



6 thoughts on “Cryptocurrency Prediction by Kautilya”

  1. 2

    In general, the provided code would not give predictions for the required dates:
    30.01.2018, 06.02.2018, 20.02.2018, 09.03.2018, 18.03.2018

    That said, the code is well documented and seems fairly easy to extend to include the required time period.

    It will test for the goodness of the predictions.

    A bit more explanatory text on what the nnetar function does and how it was used would be beneficial.

    The input data used has not been provided, so I am unable to run the code as is.

    1. 1

      Yes, it won’t give predictions for those dates, because it gives 5-day predictions for all cryptocurrencies starting from 25th Jan 2018. But we can predict for the given dates by varying the value of k. That’s how we predicted 30th Jan.

  2. 1

    1. You may want to include some evaluation metrics for your models both on train & test sets.
    2. On the data prep part – it is not the best solution to just remove rows where you see missing values because it is time-series data and could seriously bias your next steps.
    3. The assumption you have made about the “large number of missing values” is probably poor. Do you have any data or metric you used to prove it?
    4. You may want to include a more detailed explanation of why the data is not continuous (here is a link on discrete and continuous data)
    5. How would you rank your model? What metrics did you use?

    1. 1

      Sorry, but I feel there is some misinterpretation. We have not deleted any rows; rather, we imputed the missing values using “na.kalman” from the “imputeTS” library with the “auto.arima” model. There were a lot of missing rows, and many time intervals were missing.
