**“IMAGINE, INNOVATE and INSPIRE”**

**Cryptocurrency Case**

**KAUTILYA Team Mentors:**

- Subhabaha Pal (@drsubhabahapal)

**KAUTILYA Team:**

- Prakash Kumar (@prakash), MAHE, India
- Sujit Mohapatra (@sujit), MAHE, India
- Vivek Arya (@svarya), MAHE, India
- Amit Kumar Mohapatra (@amit98), MAHE, India
- Vijay (@vijays), MAHE, India

**Team Toolset:**

- RStudio: Neural Network Time Series Forecasting
- MS Excel

**Business Understanding**

- Who: Data Science Society (Academic Datathon) Organizers
- What: Build a model that predicts the prices of the major cryptocurrencies
- How: Intra-group brainstorming based on various concepts of time series analysis and machine learning.

**Data Understanding**

- Explore dataset’s structure
- Discover potential variable dependencies across datasets
- Identify the missing data points across the dataset
- Identify the subsets of data that modeling would be based on

**Data Preparation**

The cryptocurrency data had many NA values and absent (deleted) rows for particular time points, which is common in financial time series. We therefore used Microsoft Excel to create rows containing NAs for the missing time points. The NAs were then imputed using a function from the “imputeTS” library in R. Finally, the full dataset was subset to build the desired model.
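The gap-filling step can be illustrated with a minimal Python sketch. It is not the team's workflow, which used Excel and an “imputeTS” function in R. The function names, the made-up prices, and the use of plain linear interpolation as a stand-in for the actual imputation method are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=5)  # the data is sampled every 5 minutes


def regular_grid(observed, start, end):
    """Lay the observations on a regular 5-minute grid.

    `observed` maps timestamps to prices (hypothetical input format).
    Time points with no row become None, mirroring the NA rows.
    """
    timestamps, values = [], []
    t = start
    while t <= end:
        timestamps.append(t)
        values.append(observed.get(t))  # None where the row is absent
        t += STEP
    return timestamps, values


def interpolate_linear(values):
    """Fill None gaps linearly between the nearest known neighbours.

    Stand-in for the real imputation; gaps at the edges copy the
    nearest observed value.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    for i in range(known[0]):
        filled[i] = filled[known[0]]
    for i in range(known[-1] + 1, len(filled)):
        filled[i] = filled[known[-1]]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            w = (i - left) / gap
            filled[i] = filled[left] * (1 - w) + filled[right] * w
    return filled


# Made-up prices with two missing 5-minute rows (00:10 and 00:15).
observed = {
    datetime(2018, 1, 18, 0, 0): 100.0,
    datetime(2018, 1, 18, 0, 5): 101.0,
    datetime(2018, 1, 18, 0, 20): 104.0,
}
timestamps, values = regular_grid(
    observed, datetime(2018, 1, 18, 0, 0), datetime(2018, 1, 18, 0, 20)
)
filled = interpolate_linear(values)
print(values)
print(filled)
```

Linear interpolation is only the simplest choice here; a model-based imputation can behave quite differently over long gaps.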

- For the target forecast, we tried two approaches, “auto.arima” and “nnetar”, forecasting on the 5-minute data of each day, since the cryptocurrency market operates 24/7. We ensured that no variable had a large percentage of missing or outlier values.
- The prices of the cryptocurrencies were predicted for 25^{th} Jan 2018, at 288 time points.
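The figure of 288 time points follows from the 5-minute sampling: 24 hours × 60 minutes / 5 minutes = 288 points per day. A short Python sketch, with hypothetical names, that builds the forecast timestamps for 25 Jan 2018:

```python
from datetime import datetime, timedelta

STEP_MINUTES = 5
POINTS_PER_DAY = 24 * 60 // STEP_MINUTES  # 5-minute slots in one day


def forecast_timestamps(day_start, n_points=POINTS_PER_DAY):
    """Return the 5-minute timestamps covering one forecast day."""
    return [day_start + timedelta(minutes=STEP_MINUTES * i) for i in range(n_points)]


targets = forecast_timestamps(datetime(2018, 1, 25))
print(POINTS_PER_DAY, targets[0], targets[-1])
```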

**Modeling**

The subset of the dataset contained currency prices for January. The model was trained on the window from 18^{th} Jan 2018 00:00 to 24^{th} Jan 2018 23:55 and forecasts from 25^{th} Jan 00:00 onward. With this setup, the model predicts the currencies’ prices one currency at a time, following the same procedure for each.
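The shape of this setup, a 7-day training window of 5-minute prices followed by a 288-step forecast, can be sketched in Python. This is not the team's model: a linear AR(1) fit stands in for the neural network, and the series is made up. What it does share with a lag-based forecaster is the recursive scheme, in which each predicted value is fed back in as the input for the next step.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = a + b * y[t-1] (stand-in model)."""
    x = series[:-1]
    y = series[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        return mean_y, 0.0  # constant series: predict the mean
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = cov_xy / var_x
    a = mean_y - b * mean_x
    return a, b


def forecast_recursive(series, horizon):
    """Forecast `horizon` steps, feeding each prediction back as input."""
    a, b = fit_ar1(series)
    last = series[-1]
    predictions = []
    for _ in range(horizon):
        last = a + b * last
        predictions.append(last)
    return predictions


POINTS_PER_DAY = 288  # 5-minute slots per day
TRAIN_DAYS = 7        # 18 Jan 00:00 through 24 Jan 23:55

# Made-up, steadily rising series standing in for one currency's prices.
train = [100.0 + 0.01 * i for i in range(TRAIN_DAYS * POINTS_PER_DAY)]
prediction = forecast_recursive(train, POINTS_PER_DAY)
print(len(train), len(prediction))
```

Repeating the same call for each currency's series mirrors the one-currency-at-a-time procedure described above.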

**Problems Faced**

- The main problem was that the dataset is not continuous, so we had to find suitable methods for missing-data treatment and imputation by trial and error.

- Because missing data points were imputed, the imputed values may differ from the actual values, particularly where data is missing for two or more time points.

Select the link below to download the code.

## 6 thoughts on “Cryptocurrency Prediction by Kautilya”

In general, the provided code would not give predictions for the required dates:

30.01.2018, 06.02.2018, 20.02.2018, 09.03.2018, 18.03.2018

That said, the code is well documented and seems fairly easy to extend to cover the required time period.

That would make it possible to test the goodness of the predictions.

A bit more explanatory text on what the nnetar function does and how it was used would be beneficial.

The input data used has not been provided, so I am unable to run the code as is.

Yep, it won’t give predictions for those dates, because it gives 5-day predictions for all cryptocurrencies starting from 25th Jan 2018. But we can predict for the given dates by varying the value of k. That’s how we predicted 30th Jan.

1. You may want to include some evaluation metrics for your models both on train & test sets.

2. On the data prep part – it is not the best solution to just remove rows where you see missing values because it is time-series data and could seriously bias your next steps.

3. The assumption you have made about the “large number of missing values” is probably poor. Do you have any data/metric you used to prove it?

4. You may want to include a more detailed explanation of why the data is not continuous (here is a link on discrete and continuous data https://www.mathsisfun.com/data/data-discrete-continuous.html)

5. How would you rank your model? What are the metrics you used?

Sorry, but I feel there is some misinterpretation. We have not deleted any rows; rather, we imputed them using “na.kalman” from the “imputeTS” library with the “auto.arima” model. Many rows were missing, and so were many time intervals.

6. Would you bet your own money on your predictions? If so how much?

This is actually a sensible approach, but what troubles me is overfitting.