
Prediction Cryptocurrencies using Hybridization Machine Learning

UMalaya Team: Mohamad Nazrin Bin Napiah, Nur Baiti Afini Normadi, Sabrina Binti Kamal, Nur Hidayah Binti Mohd Rosli, Prasanta A/L Sathasivam, Yee Xun Wei

This study attempts to predict the prices of twenty cryptocurrencies by taking into consideration various parameters that affect the trading or investment market. The first level of the study aims to construct a forecasting model that predicts the future values of the cryptocurrencies. The project follows the CRISP data mining methodology. The data source is the Academia Datathon 2018; the data set consists of attributes related to coin, price, and time, recorded every five minutes in 2018. In the second level, the best model from level one, selected by comparing accuracy and performance, is used to construct an Artificial Intelligence bot that deploys a live model for trading or investment decisions. Focusing on twenty major cryptocurrencies, each with a large market size and price, the study forecasts prices from time-series features, namely the minimum, maximum, and mean price values.


 

  1. Business Understanding

Bitcoin, the first decentralized cryptocurrency, was originally conceived as a global currency and payment system that works reliably without requiring a trusted third party. The main underlying technology that powers Bitcoin, and almost all other cryptocurrencies today, is known as blockchain (Lewis, 2015). A blockchain requires a peer-to-peer network. ADEPT is one such peer-to-peer network, developed by IBM in partnership with Samsung (Panikkar, Nair, Brody, & Pureswaran, 2015).

Payment with cryptocurrencies is emerging, and the accelerating technology has a positive impact on human life (Bakar & Rosbi, 2017). Individuals and businesses have adopted this new system to transfer money quickly and efficiently over the internet without supplying credit card or banking information and without using a traditional payment system (Ahamad, Nair, & Varghese, 2013). In January 2018, cryptocurrencies had a total market capitalization of around $800 billion USD, as reported in the Global Charts of total market capitalization.

Cryptocurrencies are now seen as an investment tool, attracting interest from investors in general and from private investors and brokers in particular. In this regard, it is necessary to predict (i.e., forecast or estimate) future cryptocurrency prices in order to make trading or investment decisions. The objective of Level-I was to construct a forecasting model that predicts the future values of cryptocurrencies. The case is a financial time-series prediction problem that integrates knowledge from several fields: cryptocurrencies, quantitative finance, and machine learning. The objective of Level-II was to deploy the model "live" to support trading or investment decisions. To forecast the values of 20 major cryptocurrencies, the methodology shown in Figure 1 was employed.

  2. Data Understanding

A dataset was provided for constructing the prediction model. It contains historical data for 1500 different cryptocurrencies, with the price of each cryptocurrency recorded every five minutes. In Level-I, the task was to predict the future price of 20 major cryptocurrencies. Appendix I shows the code snippet for checking data quality problems.
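
The kind of data quality check described above can be sketched in a few lines. This is a minimal Python illustration, not the team's Matlab code: the record layout (a list of `(timestamp, price)` pairs), the function name `find_quality_problems`, and the sample values are all assumptions made for the example.

```python
import math
from datetime import datetime, timedelta

def find_quality_problems(records, step=timedelta(minutes=5)):
    """Return (indices of missing prices, list of timestamp gaps larger than `step`)."""
    # A price counts as missing when it is None or NaN.
    missing = [
        i for i, (_, price) in enumerate(records)
        if price is None or math.isnan(price)
    ]
    # A gap is any pair of consecutive rows more than one step apart.
    gaps = [
        (prev_ts, next_ts)
        for (prev_ts, _), (next_ts, _) in zip(records, records[1:])
        if next_ts - prev_ts > step
    ]
    return missing, gaps

# Hypothetical sample: one null price and one 15-minute hole in the series.
t0 = datetime(2018, 1, 17, 11, 25)
records = [
    (t0, 10598.4),
    (t0 + timedelta(minutes=5), None),
    (t0 + timedelta(minutes=10), 10601.0),
    (t0 + timedelta(minutes=25), 10590.2),
]
missing, gaps = find_quality_problems(records)
print("rows with missing price:", missing)
print("timestamp gaps:", gaps)
```

Reporting gaps separately from null prices is a design choice: a missing row and a present row with an empty price usually need different handling before windowing.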

  3. Data Preparation and Transformation

Here, the prices of the cryptocurrencies were transformed into three descriptive statistics: mean, maximum, and minimum. The sliding window concept was used for the transformation. A window size of 12 was used, which converts the 5-minute timestamps into 1-hour timestamps (12 × 5 minutes = 60 minutes). The resulting transformed file is shown in Table 1. Appendix II shows the code snippet that transforms the cryptocurrency dataset into sliding windows and generates the mean, maximum, and minimum values for each window.

 

Table 1: Sample of the resulting transformed file for cryptocurrency '1442'

| Max | Min | Mean | Timestamp |
|---|---|---|---|
| 10807.5 | 10333.8 | 10598.46667 | 17/1/2018 11:25:00 AM - 17/1/2018 12:25:00 PM |
| 10807.5 | 10065.5 | 10456.425 | 17/1/2018 1:00:00 PM - 17/1/2018 1:55:00 PM |
| 10807.5 | 10065.5 | 10416.80556 | 17/1/2018 2:00:00 PM - 17/1/2018 3:00:00 PM |
| 10807.5 | 10065.5 | 10394.3 | 17/1/2018 3:05:00 PM - 17/1/2018 4:00:00 PM |
| 10807.5 | 9616.41 | 10288.50617 | 17/1/2018 4:05:00 PM - 17/1/2018 5:00:00 PM |
| 10807.5 | 9402.29 | 10170.20097 | 17/1/2018 5:05:00 PM - 17/1/2018 6:00:00 PM |
| 10807.5 | 9402.29 | 10138.65214 | 17/1/2018 6:05:00 PM - 17/1/2018 7:00:00 PM |
| 10807.5 | 9402.29 | 10148.93521 | 17/1/2018 7:05:00 PM - 17/1/2018 8:00:00 PM |
| 10807.5 | 9402.29 | 10133.47481 | 17/1/2018 8:10:00 PM - 17/1/2018 9:05:00 PM |
| 10807.5 | 9402.29 | 10141.17758 | 17/1/2018 9:10:00 PM - 17/1/2018 10:05:00 PM |
| 11062.4 | 9402.29 | 10205.73644 | 17/1/2018 10:10:00 PM - 17/1/2018 11:05:00 PM |
| 11451.8 | 9402.29 | 10281.2341 | 17/1/2018 11:10:00 PM - 18/1/2018 12:05:00 AM |
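
The window transformation described in this section can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the team's Matlab code from Appendix II: it assumes consecutive, non-overlapping windows of 12 five-minute prices, and the function name `window_features` and the sample prices are hypothetical. The source does not state whether the original windows overlap or accumulate, so the numbers in Table 1 may follow a different scheme.

```python
from statistics import mean

def window_features(prices, window=12):
    """Summarise consecutive windows of `window` prices as max, min and mean."""
    rows = []
    # Step by the window size so the windows do not overlap (an assumption).
    for start in range(0, len(prices) - window + 1, window):
        chunk = prices[start:start + window]
        rows.append({"max": max(chunk), "min": min(chunk), "mean": mean(chunk)})
    return rows

# Hypothetical input: 24 five-minute prices, i.e. two hours of data.
prices = [float(p) for p in range(1, 25)]
features = window_features(prices)
for row in features:
    print(row)
```

With a window of 12, each output row stands for one hour of 5-minute observations; a trailing partial window is dropped rather than summarised.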

 

  4. Model Construction and Evaluation

Once the cryptocurrency files had been transformed into useful and informative features, the transformed feature files were fed as input to prediction algorithms, which learn patterns from the historic time-series data and construct forecasting models for the future values of the 20 major cryptocurrencies. Three prediction algorithms were trained to learn forecasting rules and compared to determine which is most suitable for our corpus: Support Vector Machine for Regression (SVMR), Linear Regression (LR), and Random Forest (RF).

  5. Experimental Setup and Implementation Tools

To test the outcome of the Level-I prediction model, 60 analyses (20 cryptocurrency files × 3 prediction algorithms) were run. To assess the performance of the constructed models, five performance metrics were used: Mean Absolute Percentage Error (MAPE), Directional Symmetry (DS), Coefficient of Variation Mean Absolute Percentage Error (CVMAPE), Computational Efficiency (CE), and Combined Model Score (CS). Matlab was used for the data pre-processing and data transformation steps. The Java Weka API was used for constructing and evaluating the prediction models.
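
For readers unfamiliar with the first two metrics, here is a minimal Python sketch using their commonly used definitions. The datathon's exact formulas are not given in this write-up, so the scaling (percent versus fraction) and the tie handling in DS are assumptions, and the sample series are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)

def directional_symmetry(actual, predicted):
    """Percentage of steps where the predicted and actual changes share a sign."""
    hits = 0
    for i in range(1, len(actual)):
        actual_move = actual[i] - actual[i - 1]
        predicted_move = predicted[i] - predicted[i - 1]
        # A non-negative product means both series moved the same way (ties count as hits).
        if actual_move * predicted_move >= 0:
            hits += 1
    return 100.0 * hits / (len(actual) - 1)

# Hypothetical series for illustration only.
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [100.0, 101.0, 102.0, 104.0]
print(f"MAPE = {mape(actual, predicted):.4f}")
print(f"DS   = {directional_symmetry(actual, predicted):.4f}")
```

Lower MAPE is better, while higher DS is better, which matters when the two are later folded into a single combined score.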

  6. Result

This section presents the results of all 60 analyses (20 cryptocurrency files × 3 prediction algorithms). For each analysis, five performance metrics (MAPE, DS, CVMAPE, CS, and CE) were computed. The results, including the combined model score over all 20 predicted models, are shown in Table 2 and Table 3.

Table 2: Experimental results (MAPE, DS, and CE) for the 20 major cryptocurrencies

LR = Linear Regression, SVMR = SVM Regression, RF = Random Forest.

| CID | LR MAPE | LR DS | LR CE | SVMR MAPE | SVMR DS | SVMR CE | RF MAPE | RF DS | RF CE |
|---|---|---|---|---|---|---|---|---|---|
| 1442 | 0.0204 | 93.0645 | 10822.83 | 0.0151 | 97.6613 | 99.83333 | 0.0401 | 87.5806 | 10.16667 |
| 1443 | 0.1637 | 87.1774 | 3447.333 | 0.0214 | 97.5806 | 104.2222 | 0.0261 | 95.4839 | 12 |
| 1443 | 0.0167 | 94.0323 | 21903.5 | 0.0361 | 96.9355 | 186.8333 | 0.0792 | 74.4355 | 11.66667 |
| 1444 | 0.2287 | 88.871 | 20230.33 | 0.0261 | 98.7903 | 149 | 0.0652 | 94.7581 | 10 |
| 1445 | 0.0302 | 95.3226 | 9960.5 | 0.0532 | 97.9032 | 136.8333 | 0.0717 | 95.4032 | 10.66667 |
| 1446 | 0.036 | 97.0968 | 3047 | 0.0541 | 98.7903 | 131.6111 | 0.1039 | 96.9355 | 10.66667 |
| 1447 | 0.5237 | 82.6613 | 5311.127 | 0.0164 | 98.0645 | 188.6667 | 0.0248 | 96.4516 | 11 |
| 1448 | 0.5155 | 80.5645 | 20240.33 | 0.0204 | 97.0968 | 169.6667 | 0.0385 | 94.7581 | 11 |
| 1449 | 0.4043 | 89.7581 | 20519.17 | 0.0334 | 97.6613 | 187.1667 | 0.0378 | 91.6129 | 11.66667 |
| 1450 | 0.0828 | 94.1935 | 18514 | 0.0365 | 98.7903 | 144.1667 | 0.062 | 95.7258 | 10.66667 |
| 1451 | 0.3644 | 92.3387 | 9103 | 0.0348 | 99.3548 | 213.1667 | 0.0353 | 97.7419 | 11.66667 |
| 1452 | 0.0446 | 92.4194 | 18152.33 | 0.0184 | 97.2581 | 141.8333 | 0.0483 | 91.2903 | 5.5 |
| 1453 | 0.0575 | 89.6774 | 18282.33 | 0.0205 | 96.5323 | 124.8333 | 0.0489 | 92.8226 | 7 |
| 1454 | 0.4911 | 94.5161 | 5918.667 | 0.0459 | 99.1129 | 168.6667 | 0.1302 | 73.3065 | 7.333333 |
| 1455 | 0.2037 | 94.9681 | 3514.333 | 0.0577 | 99.1214 | 189.1667 | 0.0792 | 99.1129 | 9.833333 |
| 1456 | 0.8155 | 87.6613 | 22892.22 | 0.0197 | 98.7097 | 130.3333 | 0.027 | 95.9677 | 6.833333 |
| 1457 | 0.1853 | 93.2258 | 6648 | 0.0272 | 96.7742 | 176.1667 | 0.029 | 92.7419 | 9.333333 |
| 1458 | 0.0677 | 92.8226 | 6724.667 | 0.0226 | 97.9032 | 110.6667 | 0.0373 | 94.5968 | 9.666667 |
| 1459 | 0.0386 | 78.5484 | 34454 | 0.0019 | 85.8871 | 54.66667 | 0.005 | 46.129 | 6.833333 |
| 1460 | 0.0193 | 92.0968 | 5519.083 | 0.0311 | 96.8548 | 53.89815 | 0.4796 | 28.5484 | 8.5 |

 

Table 3: Average prediction results and combined score over the 20 major cryptocurrencies

| Algorithm | R | M | D | U | Z |
|---|---|---|---|---|---|
| Linear Regression | 1.0594308449457146 | 0.215485 | 90.55083 | 13260.24 | -13171 |
| SVM Regression | 0.49118014318682995 | 0.029625 | 97.364621 | 143.07 | -46.2262 |
| Random Forest | 1.3628232655093842 | 0.073455 | 86.77016 | 9.60 | 75.73388 |
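
The column letters in Table 3 are not defined in the text. The Conclusion treats Z as the combined score and U as the run-time measure, and M and D appear to be the averages of the MAPE and DS columns of Table 2; R is presumably the CVMAPE average. Under that reading, the reported Z values are reproduced by Z = D - U - R - M. This relation is inferred from the numbers in Table 3 and is not stated in the source, so the Python check below is a sketch under that assumption.

```python
# Averages copied from Table 3: (R, M, D, U, reported Z).
table3 = {
    "Linear Regression": (1.0594308449457146, 0.215485, 90.55083, 13260.24, -13171.0),
    "SVM Regression": (0.49118014318682995, 0.029625, 97.364621, 143.07, -46.2262),
    "Random Forest": (1.3628232655093842, 0.073455, 86.77016, 9.60, 75.73388),
}

def combined_score(r, m, d, u):
    """Assumed relation: reward directional symmetry, penalise errors and run time."""
    return d - u - r - m

for name, (r, m, d, u, reported) in table3.items():
    computed = combined_score(r, m, d, u)
    print(f"{name}: computed Z = {computed:.4f}, reported Z = {reported}")

# Under this relation a higher Z is better.
best = max(table3, key=lambda name: combined_score(*table3[name][:4]))
print("highest combined score:", best)
```

Because U is subtracted at its raw magnitude, the score is dominated by run time: Linear Regression's large U accounts for almost all of its negative Z.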

  7. Deployment

 

This section describes how the forecasting model is deployed for trading. The model was illustrated in the previous sections; after training and testing as shown in Figure 1 (see Section 1), the proposed model is ready to use. The proposed approach applies the best model from the evaluation in the previous section, i.e., the one with the highest performance and accuracy. This phase contains four main layers:

 

Layer 1 (Crawling Agent, CA): The CA is responsible for crawling cryptocurrency market data from the Internet. A suitable language for building the crawling agent is Python, using the Scrapy package, an open-source and collaborative framework for extracting data from websites. The advantage of Scrapy is that it does the job in a fast, simple, and extensible way.

Layer 2 (Transform Agent, TA): The TA runs the data transformation and cleaning process using the sliding window technique. At the same time, the agent checks for missing values in the data extracted by the CA.

 

Layer 3 (Analyzer Agent): This layer analyzes the data provided by the TA. The agent uses Weka's forecasting feature, which can be integrated with other systems through the Weka Python plugin, to forecast the time series. Based on the previous experiment, Random Forest performed best on the dataset.

Layer 4 (Decision Maker): This layer makes decisions and alerts the investor to buy or sell according to the forecast price series. The first step is to derive the input features with the statistical naive method, which shifts the previous price forward as the predicted price so that a trade can be prepared in advance; the prediction itself is then made with Random Forest, as shown in the forecast model pseudocode (see Figure 3). Next, a similarity measure is employed to relate the rules in the fuzzy case base to one another. The fuzzy rules (see Figure 4) indicate the highest and lowest price status for trading, which lets the investor decide whether to trade the current and future currencies. These rules define trading strategies at various levels, producing decisions, suggestions, and recommendations that guard against failure of the predicted currencies.

Figure 2- Proposed Approach

 

Pseudocode: Forecast model

1. Start with the open price at T1.
2. Use the naive forecasting method to predict T2 by shifting T1 forward as the predicted value.
3. Loop: repeat the naive forecasting method for T1 -> T24 and log the data.
4. Transform the logged data from T1 to T24 (2 hours of collected data) into mean, min, and max.
5. Create sliding windows with a window size of 12 (12 five-minute timestamps make one 1-hour timestamp).
6. Store the transformed data as TransData.
7. Train and test Random Forest on TransData to forecast the mean, min, and max values.

Figure 3- Forecast Model Pseudocode
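
Steps 1 to 6 of the pseudocode can be sketched in Python as below. This is an illustration under stated assumptions, not the deployed code: the function names `naive_next` and `to_trans_data` and the sample prices are hypothetical, the windows are assumed to be non-overlapping, and step 7 (training Random Forest on TransData) is left out because it depends on the Weka setup described earlier.

```python
from statistics import mean

def naive_next(log):
    """Naive forecast: the next value is predicted to equal the last logged value."""
    return log[-1]

def to_trans_data(log, window=12):
    """Summarise the log as (mean, min, max) tuples, one per consecutive window."""
    return [
        (mean(log[i:i + window]), min(log[i:i + window]), max(log[i:i + window]))
        for i in range(0, len(log) - window + 1, window)
    ]

# Hypothetical 5-minute prices T1..T24, i.e. two hours of logged data.
log = [100.0 + i for i in range(24)]
trans_data = to_trans_data(log)

print("naive forecast for T25:", naive_next(log))
print("TransData:", trans_data)
```

The naive forecast also works as a baseline: a trained model is only useful if it beats simply repeating the last observed price.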

 

Fuzzy Rules: Trading Strategies

Rule 1: Start trading with small amounts (1% of total assets).

Compare (predict_min, predict_max) with the previous actual value.

If (predict_min < previous actual value) {
    Skip and wait for the next price
}

If (predict_max > previous actual value) {
    Sell more than 1%
}

Rule 2: Buy low, sell high, referring to Figure 1.

If (predict_min < previous actual value) {
    Invest
}

If (predict_max > previous actual value) {
    Trade
}

 

Figure 4- Fuzzy rules for trading strategies
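
Rule 1 from Figure 4 can be written as a small Python function. This is a literal sketch of the two checks as they appear in the figure; the function name `rule_1`, the action strings, and the sample numbers are assumptions for illustration. The source does not say how to resolve the case where both conditions hold, so the sketch simply returns every action that fires.

```python
def rule_1(predict_min, predict_max, previous_actual):
    """Apply the two checks of Rule 1 in order and collect the resulting actions."""
    actions = []
    if predict_min < previous_actual:
        actions.append("skip and wait for the next price")
    if predict_max > previous_actual:
        actions.append("sell more than 1%")
    return actions

# Hypothetical forecasts compared with a previous actual price of 100.0.
print(rule_1(predict_min=101.0, predict_max=104.0, previous_actual=100.0))
print(rule_1(predict_min=97.0, predict_max=99.0, previous_actual=100.0))
print(rule_1(predict_min=98.0, predict_max=103.0, previous_actual=100.0))
```

The third call shows that both checks can fire for the same forecast, so a deployed version would need an explicit priority between the two actions.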

  8. Discussion

 

Table 2 presents the forecasting accuracy measured by mean absolute percentage error (MAPE) and by directional symmetry (DS), which measures how well a model predicts the direction of value changes. Three models were tested: Linear Regression, SVM Regression, and Random Forest. As expected, Random Forest had an edge over the other two machine learning algorithms overall. Among the three algorithms, Random Forest was the fastest to execute, while Linear Regression struggled badly to predict the mean value of the Bitcoin price. SVM Regression produced the most accurate predictions, with Random Forest second on that criterion. However, Random Forest was the most stable algorithm when handling different types of datasets. SVM Regression also predicted the direction of the Bitcoin price most accurately. Nevertheless, Random Forest emerged as the most suitable algorithm for predicting Bitcoin prices, with the best combined score (Z = 75.73388).

(For the results of this experiment, see Section 6, Result.)

 

  9. Conclusion

Based on the prediction scoring formulas, we calculated the Combined Model Score (Z) for all three machine learning algorithms (SVM Regression, Random Forest, and Linear Regression). Random Forest has the highest Z value, 75.73388 (compared with -46.2262 for SVM Regression and -13171 for Linear Regression), and was also the fastest, as it has the lowest U value. We therefore conclude that Random Forest is the best machine learning algorithm for predicting Bitcoin price values at both 5-minute and hourly intervals.

 

References

 

  1. Ahamad, S., Nair, M., & Varghese, B. (2013). A survey on crypto currencies. Proceedings of the Fourth International Conference on Advances in Computer Science, 42–48. https://doi.org/02.AETACS.2013.4.131.
  2. Bakar, N. A., & Rosbi, S. (2017). Autoregressive Integrated Moving Average (ARIMA) Model for Forecasting Cryptocurrency Exchange Rate in High Volatility Environment: A New Insight of Bitcoin Transaction. International Journal of Advanced Engineering Research and Science, 4(11), 130–137. https://doi.org/10.22161/ijaers.4.11.20.
  3. Lewis, A. (2015). Blockchain Technology Explained. Blockchain Technologies, 1–27. https://doi.org/10.15358/0935-0381-2015-4-5-222.
  4. Panikkar, S., Nair, S., Brody, P., & Pureswaran, V. (2015). ADEPT: An IoT Practitioner Perspective, 1–20. Retrieved from http://ibm.biz/devicedemocracy.
  5. Global Charts: Total Market Capitalization. Retrieved 27 April 2018, from https://coinmarketcap.com/charts/.

 

Appendix

 

Appendix I: Code Snippet: Identify the data quality problems

Appendix II: Code Snippet: Transform the cryptocurrencies dataset into sliding windows and generate mean, maximum, and minimum values.

Appendix III: Answers to questions

 

Answer to Level-I Questions:

1-      What necessary data preparation did you need?

Answer: Please see Section 2.

2-      Which is the most suitable method for forecasting cryptocurrencies?

Answer: Please see Section 2.

3-      Which cryptocurrencies were hardest to predict?

Answer: Please see Section 2.

4-      What anomalies in behavior of the cryptocurrencies did you detect?

Answer: Please see Section 2.

SourceCode

check_missing&null_data UMalayasourcecode

 


5 thoughts on “Prediction Cryptocurrencies using Hybridization Machine Learning”

  1. I can judge only from crypto related side so in general I like how the solution is structured and every step is explained. Even I am not data scientist I managed to understand a bit because of the good explanation. The test will show if you have managed to do the job but besides that great work. Keep the good work and good luck!

  2. Very nice structure of the paper, with all the results explained in details and all the required test supplied. Well Done. I can’t find the source code for prediction. I found only the matlab code for data manipulation

  3. Very good article! Very well-written, good job 🙂

    It would be nice to compare your results with a baseline (predicting the previous point, or average of the previous points or something similar). Also, your idea to use accumulated measures for the past 1 hour is interesting. Did you compare it with just giving the previous points as features to the regression?

    It would be nice to share your code as well.
