- Business Understanding
Bitcoin, the first decentralized cryptocurrency, was originally conceived as a global currency and payment system that works reliably without requiring a trusted third party. The main technology underlying Bitcoin, and almost all other cryptocurrencies today, is the blockchain (Lewis, 2015). A blockchain requires a peer-to-peer network; ADEPT is one such network, developed by IBM in partnership with Samsung (Panikkar, Nair, Brody, & Pureswaran, 2015).
The emergence of cryptocurrency payments has accelerated the spread of this technology and its positive impact on human life (Bakar & Rosbi, 2017). Individuals and businesses have adopted the new system because it lets them transfer money quickly and efficiently over the Internet without supplying credit card or banking details and without using a traditional payment system (Ahamad, Nair, & Varghese, 2013). In January 2018, the total cryptocurrency market capitalization was around USD 800 billion, as reported in the Global Charts of total market capitalization.
Cryptocurrencies are increasingly seen as an investment tool, and private investors and brokers are also interested in these digital currencies. It is therefore necessary to predict (i.e., forecast or estimate) future cryptocurrency prices to support trading and investment decisions. The objective of Level-I was to construct a forecasting model that predicts the future values of cryptocurrencies. The task is a financial time-series prediction problem that integrates knowledge from several fields: cryptocurrencies, quantitative finance and machine learning. The objective of Level-II was to deploy the model “live” to support trading and investment decisions. The methodology shown in Figure 1 was employed to forecast the prices of 20 major cryptocurrencies.
- Data Understanding
A dataset was provided for constructing the prediction model. It contains the historical data of 1,500 different cryptocurrencies, with the price of each cryptocurrency recorded every five minutes. In Level-I, the task was to predict the future prices of 20 major cryptocurrencies. Appendix I shows the code snippet for checking data quality problems.
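The MATLAB snippet of Appendix I is not reproduced here. As a rough illustration of the kind of checks involved, the Python sketch below scans one coin's 5-minute price records for missing or non-numeric prices and for gaps in the 5-minute timestamp grid. The record layout (a list of `(timestamp, price)` pairs) and the names `find_quality_problems` and `_is_missing` are assumptions for illustration, not the dataset's actual format or the original code.

```python
import math
from datetime import datetime, timedelta

# Sampling interval of the provided dataset (one price every five minutes).
STEP = timedelta(minutes=5)


def _is_missing(price):
    """Treat None, non-numeric values and NaN as missing prices."""
    if price is None or not isinstance(price, (int, float)):
        return True
    return math.isnan(price)


def find_quality_problems(records):
    """Check (timestamp, price) records of one coin for common data quality problems.

    Returns (missing, gaps):
      missing -- timestamps whose price is missing or unusable
      gaps    -- (earlier, later) timestamp pairs more than one interval apart
    """
    records = sorted(records, key=lambda r: r[0])
    missing = [ts for ts, price in records if _is_missing(price)]
    gaps = [
        (prev_ts, ts)
        for (prev_ts, _), (ts, _) in zip(records, records[1:])
        if ts - prev_ts > STEP
    ]
    return missing, gaps


if __name__ == "__main__":
    t0 = datetime(2018, 1, 17, 11, 25)
    rows = [(t0, 10598.5), (t0 + STEP, None), (t0 + 3 * STEP, 10456.4)]
    print(find_quality_problems(rows))
```

In the example, the second record has no price and the third arrives ten minutes after the second, so both a missing value and a timestamp gap are reported.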
- Data Preparation and Transformation
The price series of each cryptocurrency was transformed into three descriptive statistics: mean, maximum and minimum. The transformation uses a sliding window with a window size of 12, so each window of twelve 5-minute prices is summarised by its mean, minimum and maximum; the same window size of 12 converts the 5-minute timestamps into 1-hour timestamps (12 × 5 minutes = 60 minutes). A sample of the resulting transformed file is shown in Table 1. Appendix II shows the code snippet for transforming the cryptocurrency dataset with sliding windows and generating the mean, maximum and minimum values for each window.
Table 1: Sample of the transformed file for cryptocurrency ‘1442’
Max | Min | Mean | Time window |
10807.5 | 10333.8 | 10598.46667 | 17/1/2018 11:25:00 AM - 17/1/2018 12:25:00 PM |
10807.5 | 10065.5 | 10456.425 | 17/1/2018 1:00:00 PM - 17/1/2018 1:55:00 PM |
10807.5 | 10065.5 | 10416.80556 | 17/1/2018 2:00:00 PM - 17/1/2018 3:00:00 PM |
10807.5 | 10065.5 | 10394.3 | 17/1/2018 3:05:00 PM - 17/1/2018 4:00:00 PM |
10807.5 | 9616.41 | 10288.50617 | 17/1/2018 4:05:00 PM - 17/1/2018 5:00:00 PM |
10807.5 | 9402.29 | 10170.20097 | 17/1/2018 5:05:00 PM - 17/1/2018 6:00:00 PM |
10807.5 | 9402.29 | 10138.65214 | 17/1/2018 6:05:00 PM - 17/1/2018 7:00:00 PM |
10807.5 | 9402.29 | 10148.93521 | 17/1/2018 7:05:00 PM - 17/1/2018 8:00:00 PM |
10807.5 | 9402.29 | 10133.47481 | 17/1/2018 8:10:00 PM - 17/1/2018 9:05:00 PM |
10807.5 | 9402.29 | 10141.17758 | 17/1/2018 9:10:00 PM - 17/1/2018 10:05:00 PM |
11062.4 | 9402.29 | 10205.73644 | 17/1/2018 10:10:00 PM - 17/1/2018 11:05:00 PM |
11451.8 | 9402.29 | 10281.2341 | 17/1/2018 11:10:00 PM - 18/1/2018 12:05:00 AM |
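The window transformation described above can be sketched in Python as follows. This is a minimal sketch, assuming the prices of one coin arrive as a chronologically ordered list sampled every five minutes; `window_stats` and its `step` parameter are illustrative names, not part of the MATLAB code in Appendix II. With `size=12` and `step=12`, each window covers 12 × 5 minutes = 1 hour, matching the 5-minute-to-1-hour conversion; `step=1` gives a fully overlapping sliding window instead.

```python
from statistics import mean


def window_stats(prices, size=12, step=12):
    """Summarise a price series into (max, min, mean) per window.

    size: number of 5-minute samples per window (12 samples = 1 hour).
    step: how far the window moves each time; step == size gives
          non-overlapping hourly windows, step == 1 a fully sliding window.
    Incomplete windows at the end of the series are dropped.
    """
    rows = []
    for start in range(0, len(prices) - size + 1, step):
        window = prices[start:start + size]
        rows.append((max(window), min(window), mean(window)))
    return rows
```

For example, 24 consecutive 5-minute prices produce two hourly (max, min, mean) rows, in the same column order as Table 1.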
- Model Construction and Evaluation
Once the cryptocurrency files had been transformed into informative features, the transformed feature files were fed as input to the prediction algorithms, which learn the price patterns from the historical time-series data and build forecasting models for the future values of the 20 major cryptocurrencies. Three prediction algorithms were trained to learn forecasting rules, in order to compare which one is most suitable for our corpus: Support Vector Machine for Regression (SVMR), Linear Regression (LR) and Random Forest (RF).
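The models themselves were built with Weka, which is not reproduced here. The Python sketch below shows one common way to turn the hourly rows into a supervised regression problem that any of the three algorithms could consume. It assumes, purely for illustration, that each model predicts the next hour's mean price from the previous `n_lags` hours of (max, min, mean) values, and that data are split 80/20 in time order; `make_supervised`, `chronological_split`, the lag count and the split ratio are all hypothetical choices, not the report's documented setup.

```python
def make_supervised(rows, n_lags=3):
    """Turn hourly (max, min, mean) rows into (features, target) pairs.

    Features: the flattened (max, min, mean) values of the previous n_lags hours.
    Target: the mean price of the following hour.
    """
    X, y = [], []
    for t in range(n_lags, len(rows)):
        features = [value for row in rows[t - n_lags:t] for value in row]
        X.append(features)
        y.append(rows[t][2])  # index 2 holds the hourly mean
    return X, y


def chronological_split(X, y, train_fraction=0.8):
    """Split without shuffling so every test row lies after the training rows in time."""
    cut = int(len(X) * train_fraction)
    return X[:cut], y[:cut], X[cut:], y[cut:]
```

Splitting in time order rather than at random keeps future prices out of the training data, which matters for forecasting.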
- Experimental Setup and Implementation Tools
To test the Level-I prediction model, 60 analyses (20 cryptocurrency files × 3 prediction algorithms) were run. The performance of the constructed models was measured with five metrics: Mean Absolute Percentage Error (MAPE), Directional Symmetry (DS), Coefficient of Variation of the Mean Absolute Percentage Error (CVMAPE), Computational Efficiency (CE) and Combined Model Score (CS). MATLAB was used for the data pre-processing and transformation steps, and the Weka Java API was used to construct and evaluate the prediction models.
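The report names the metrics but not their formulas. The sketch below uses common textbook definitions, which are assumptions here: MAPE as the mean of |(actual − forecast) / actual| in percent, DS as the percentage of consecutive steps where the forecast moves in the same direction as the actual series (a zero change counts as agreement), and CVMAPE as the coefficient of variation (sample standard deviation divided by mean) of a set of MAPE values, for example the per-coin MAPEs. The report does not state whether its MAPE figures are fractions or percentages, so the scale may differ from Table 2.

```python
from statistics import mean, stdev


def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    return 100.0 * mean(abs((a - f) / a) for a, f in zip(actual, forecast))


def directional_symmetry(actual, forecast):
    """Percentage of steps where forecast and actual change in the same direction."""
    hits = [
        1 if (actual[i] - actual[i - 1]) * (forecast[i] - forecast[i - 1]) >= 0 else 0
        for i in range(1, len(actual))
    ]
    return 100.0 * sum(hits) / len(hits)


def cv_mape(mape_values):
    """Coefficient of variation of several MAPE values (needs at least two values)."""
    return stdev(mape_values) / mean(mape_values)
```

A higher DS is better, while lower MAPE and CVMAPE indicate more accurate and more consistent forecasts.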
- Result
This section presents the results of all 60 analyses (20 cryptocurrency files × 3 prediction algorithms). For each analysis, Table 2 shows three performance metrics (MAPE, DS and CE). Table 3 shows, for each algorithm, the averages over the 20 cryptocurrencies together with the combined model score.
Table 2: Experimental results (MAPE, DS and CE) for the 20 major cryptocurrencies
CID | LR MAPE | LR DS | LR CE | SVMR MAPE | SVMR DS | SVMR CE | RF MAPE | RF DS | RF CE |
1442 | 0.0204 | 93.0645 | 10822.83 | 0.0151 | 97.6613 | 99.83333 | 0.0401 | 87.5806 | 10.16667 |
1443 | 0.1637 | 87.1774 | 3447.333 | 0.0214 | 97.5806 | 104.2222 | 0.0261 | 95.4839 | 12 |
1443 | 0.0167 | 94.0323 | 21903.5 | 0.0361 | 96.9355 | 186.8333 | 0.0792 | 74.4355 | 11.66667 |
1444 | 0.2287 | 88.871 | 20230.33 | 0.0261 | 98.7903 | 149 | 0.0652 | 94.7581 | 10 |
1445 | 0.0302 | 95.3226 | 9960.5 | 0.0532 | 97.9032 | 136.8333 | 0.0717 | 95.4032 | 10.66667 |
1446 | 0.036 | 97.0968 | 3047 | 0.0541 | 98.7903 | 131.6111 | 0.1039 | 96.9355 | 10.66667 |
1447 | 0.5237 | 82.6613 | 5311.127 | 0.0164 | 98.0645 | 188.6667 | 0.0248 | 96.4516 | 11 |
1448 | 0.5155 | 80.5645 | 20240.33 | 0.0204 | 97.0968 | 169.6667 | 0.0385 | 94.7581 | 11 |
1449 | 0.4043 | 89.7581 | 20519.17 | 0.0334 | 97.6613 | 187.1667 | 0.0378 | 91.6129 | 11.66667 |
1450 | 0.0828 | 94.1935 | 18514 | 0.0365 | 98.7903 | 144.1667 | 0.062 | 95.7258 | 10.66667 |
1451 | 0.3644 | 92.3387 | 9103 | 0.0348 | 99.3548 | 213.1667 | 0.0353 | 97.7419 | 11.66667 |
1452 | 0.0446 | 92.4194 | 18152.33 | 0.0184 | 97.2581 | 141.8333 | 0.0483 | 91.2903 | 5.5 |
1453 | 0.0575 | 89.6774 | 18282.33 | 0.0205 | 96.5323 | 124.8333 | 0.0489 | 92.8226 | 7 |
1454 | 0.4911 | 94.5161 | 5918.667 | 0.0459 | 99.1129 | 168.6667 | 0.1302 | 73.3065 | 7.333333 |
1455 | 0.2037 | 94.9681 | 3514.333 | 0.0577 | 99.1214 | 189.1667 | 0.0792 | 99.1129 | 9.833333 |
1456 | 0.8155 | 87.6613 | 22892.22 | 0.0197 | 98.7097 | 130.3333 | 0.027 | 95.9677 | 6.833333 |
1457 | 0.1853 | 93.2258 | 6648 | 0.0272 | 96.7742 | 176.1667 | 0.029 | 92.7419 | 9.333333 |
1458 | 0.0677 | 92.8226 | 6724.667 | 0.0226 | 97.9032 | 110.6667 | 0.0373 | 94.5968 | 9.666667 |
1459 | 0.0386 | 78.5484 | 34454 | 0.0019 | 85.8871 | 54.66667 | 0.005 | 46.129 | 6.833333 |
1460 | 0.0193 | 92.0968 | 5519.083 | 0.0311 | 96.8548 | 53.89815 | 0.4796 | 28.5484 | 8.5 |
Table 3: Average results and combined model score over the 20 major cryptocurrencies
Algorithm | R | M | D | U | Z |
Linear Regression | 1.0594308449457146 | 0.215485 | 90.55083 | 13260.24 | -13171 |
SVM Regression | 0.49118014318682995 | 0.029625 | 97.364621 | 143.07 | -46.2262 |
Random Forest | 1.3628232655093842 | 0.073455 | 86.77016 | 9.60 | 75.73388 |
Key: R = CVMAPE, M = MAPE, D = DS, U = CE, Z = combined model score (CS).
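The report does not spell out the combined score formula. The Z values in Table 3 are, however, numerically consistent with Z = D − U − R − M (D minus U, R and M), so a higher Z is better. The snippet below recomputes Z from the Table 3 values to show this; `combined_score` is a reconstruction from the reported numbers, not a definition taken from the challenge rules.

```python
# Column values copied from Table 3 (R, M, D, U) together with the reported Z.
table3 = {
    "Linear Regression": {"R": 1.0594308449457146, "M": 0.215485, "D": 90.55083, "U": 13260.24, "Z": -13171},
    "SVM Regression": {"R": 0.49118014318682995, "M": 0.029625, "D": 97.364621, "U": 143.07, "Z": -46.2262},
    "Random Forest": {"R": 1.3628232655093842, "M": 0.073455, "D": 86.77016, "U": 9.60, "Z": 75.73388},
}


def combined_score(r, m, d, u):
    """Reconstructed score: reward directional accuracy (D), penalise U, R and M."""
    return d - u - r - m


for name, row in table3.items():
    z = combined_score(row["R"], row["M"], row["D"], row["U"])
    print(f"{name}: recomputed Z = {z:.4f}, reported Z = {row['Z']}")
```

Because CE enters the score unscaled, the large computation cost of Linear Regression dominates its Z, which is why its score is strongly negative.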
- Deployment
This section describes how the forecasting model is used to predict prices in the trading market. The model itself was described in the previous sections; once it has been trained and tested following the methodology in Figure 1 (see Section 1), it is ready for use. The deployment applies the model that achieved the best performance and accuracy in the evaluation. This phase contains four main layers:
Layer 1 Crawling Agent (CA): the CA is responsible for crawling cryptocurrency market data from the Internet. A suitable choice for implementing the crawling agent is Python with the Scrapy package, an open-source and collaborative framework for extracting data from websites. Scrapy's advantage is that it does this job in a fast, simple and extensible way.
Layer 2 Transform Agent (TA): the TA runs the data transformation and cleaning process using the sliding window technique. It also checks for missing values in the data extracted by the CA.
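The report does not say how the TA handles the missing values it finds. One simple option, shown here purely as an assumption, is to carry the last observed price forward before the sliding-window step so that every window still holds 12 samples; `forward_fill` is a hypothetical helper name. Leading gaps with no earlier price stay `None` for the caller to drop.

```python
import math


def forward_fill(prices):
    """Replace None/NaN prices with the most recent valid price."""
    filled = []
    last = None
    for p in prices:
        if p is None or (isinstance(p, float) and math.isnan(p)):
            filled.append(last)  # stays None until the first valid price is seen
        else:
            last = p
            filled.append(p)
    return filled
```

Forward filling only uses past values, so unlike interpolation it does not need the next price, which is not yet available when the agent runs live.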
Layer 3 (Analyzer Agent): this layer analyzes the data provided by the TA. The agent uses Weka's time series forecasting feature, which can be integrated with other systems through a Weka Python plugin, to produce the time-series predictions. Based on the previous experiments, Random Forest is the machine learning algorithm that performed best on this dataset.
Layer 4 (Decision Maker): this layer makes decisions and alerts the investor to buy or sell according to the forecast price series. The first step is to derive the input features with a statistical naïve method, which shifts the previous price forward as the predicted price so that trading can be prepared in advance; the prediction is then made with Random Forest, as shown in the forecast model pseudocode (see Figure 3). Next, a similarity measure is used to relate the rules in the fuzzy case base to one another. The fuzzy rules (see Figure 4) indicate the highest and lowest price status for trading, which helps the investor decide whether to trade current and future currencies. The rules are applied to build trading strategies at various levels, providing decisions, suggestions and recommendations that guard against failures of the predicted currencies.
Figure 2- Proposed Approach
Pseudocode Forecast model |
1. Start with the open price T1.
2. Use the naïve forecasting method to predict T2 by shifting T1 forward as the predicted value.
3. Loop: apply the naïve forecasting method from T1 to T24 and log the data.
4. Transform the logged data from T1 to T24 (2 hours of data) into mean, min and max.
5. Create sliding windows based on the T12 threshold value (a window size of 12 converts 5-minute timestamps into 1-hour timestamps).
6. Store the transformed data as TransData.
7. Train and test a Random Forest on TransData to forecast the Mean, Min and Max values. |
Figure 3- Forecast Model Pseudocode
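Steps 2 to 4 of the pseudocode rely on the naïve forecast, which uses the last observed price as the prediction for the next one. A minimal Python sketch of those steps follows; `naive_forecast` and `naive_log` are hypothetical names, and the Random Forest training of step 7 is left out because it was done in Weka. The same naïve predictions can also serve as a baseline for judging whether the Random Forest's MAPE and DS are a real improvement.

```python
def naive_forecast(prices):
    """One-step-ahead naive forecast: the prediction for step k+1 is the price at step k.

    The result is aligned with `prices`; its first entry is None because no
    earlier price exists to shift forward.
    """
    if not prices:
        return []
    return [None] + list(prices[:-1])


def naive_log(prices, horizon=24):
    """Pair each actual price with its naive prediction for T1..T{horizon}.

    With 5-minute prices, horizon=24 covers two hours of data (steps 3 and 4).
    """
    window = list(prices[:horizon])
    return list(zip(window, naive_forecast(window)))
```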
Fuzzy Rules: Trading Strategies |
Rule 1: Start trading with small amounts (1% of total assets).
Compare (predict_min, predict_max) with the previous actual value:
If (predict_min < previous actual value) { Skip and wait for the next price }
If (predict_max > previous actual value) { Sell more than 1% }
Rule 2: Buy low, sell high (see Figure 1).
If (predict_min < previous actual value) { Invest }
If (predict_max > previous actual value) { Trade } |
Figure 4- Fuzzy rules for trading strategies
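The rules in Figure 4 use crisp comparisons, so they can be transcribed directly. The Python sketch below is one literal reading with hypothetical function and action names; it is not a fuzzy inference engine, since the figure defines no membership functions. Both conditions in a rule can hold at once when the predicted range straddles the previous actual value, and the figure does not state which takes precedence, so each function returns every action that fires.

```python
def initial_stake(total_assets, fraction=0.01):
    """Rule 1 starts with a small position: 1% of total assets by default."""
    return total_assets * fraction


def rule_1(predict_min, predict_max, previous_actual):
    """Rule 1: skip if a lower price is predicted, sell more if a higher one is."""
    actions = []
    if predict_min < previous_actual:
        actions.append("skip_and_wait")
    if predict_max > previous_actual:
        actions.append("sell_more_than_1_percent")
    return actions


def rule_2(predict_min, predict_max, previous_actual):
    """Rule 2 (buy low, sell high): invest on a predicted dip, trade on a predicted rise."""
    actions = []
    if predict_min < previous_actual:
        actions.append("invest")
    if predict_max > previous_actual:
        actions.append("trade")
    return actions
```

For instance, a predicted range of 9 to 12 around a previous price of 10 triggers both actions of Rule 1, which shows why an explicit precedence between the conditions would be useful.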
- Discussion
Table 2 presents the forecasting accuracy measured by the mean absolute percentage error (MAPE) and the directional symmetry (DS), which measures how often, in percent, a model correctly predicts the direction of value changes. Three models were tested: Linear Regression, SVM Regression and Random Forest. As expected, Random Forest had an edge over the other two machine learning algorithms. Among the three algorithms, Random Forest was the fastest to execute, whereas Linear Regression performed badly at predicting the mean value of the Bitcoin price. SVM Regression, on the other hand, produced the most accurate predictions, with Random Forest second on that criterion. However, Random Forest was the most stable algorithm across the different datasets. Finally, SVM Regression again proved able to predict the direction of the Bitcoin price accurately. Nevertheless, Random Forest emerged as the most suitable algorithm for predicting prices of the Bitcoin currency, with the highest combined score (Z = 75.73388).
(Please refer to Section 6, Result, for the full results of this experiment.)
- Conclusion
Based on the prediction scoring formulas, we calculated the combined model score (Z) for all three machine learning algorithms (SVM Regression, Random Forest and Linear Regression). Random Forest has the highest Z value, 75.73388, and was also the fastest, as it has the lowest U value. We therefore conclude that Random Forest is the best machine learning algorithm for predicting the Bitcoin price at 5-minute and hourly intervals.
References
- Ahamad, S., Nair, M., & Varghese, B. (2013). A survey on crypto currencies. Proceedings of the Fourth International Conference on Advances in Computer Science, 42–48. https://doi.org/02.AETACS.2013.4.131.
- Bakar, N. A., & Rosbi, S. (2017). Autoregressive Integrated Moving Average (ARIMA) Model for Forecasting Cryptocurrency Exchange Rate in High Volatility Environment: A New Insight of Bitcoin Transaction. International Journal of Advanced Engineering Research and Science, 4(11), 130–137. https://doi.org/10.22161/ijaers.4.11.20.
- Lewis, A. (2015). Blockchain Technology Explained. Blockchain Technologies, 1–27. https://doi.org/10.15358/0935-0381-2015-4-5-222.
- Panikkar, S., Nair, S., Brody, P., & Pureswaran, V. (2015). ADEPT: An IoT Practitioner Perspective, 1–20. Retrieved from http://ibm.biz/devicedemocracy.
- CoinMarketCap. Global Charts: Total Market Capitalization. Retrieved 27 April 2018, from https://coinmarketcap.com/charts/.
Appendix
Appendix I: Code Snippet: Identify the data quality problems
Appendix II: Code Snippet: Transforming the cryptocurrency dataset with sliding windows and generating mean, maximum and minimum values.
Appendix III: Answers to Questions
Answer to Level-I Questions:
1- What necessary data preparation did you need?
Answer: Please see Section 2.
2- Which is the most suitable method for forecasting cryptocurrencies?
Answer: Please see Section 2.
3- Which cryptocurrencies were hardest to predict?
Answer: Please see Section 2.
4- What anomalies in behavior of the cryptocurrencies did you detect?
Answer: Please see Section 2.
SourceCode
check_missing&null_data UMalayasourcecode
7 thoughts on “Prediction Cryptocurrencies using Hybridization Machine Learning”
I can judge only from crypto related side so in general I like how the solution is structured and every step is explained. Even I am not data scientist I managed to understand a bit because of the good explanation. The test will show if you have managed to do the job but besides that great work. Keep the good work and good luck!
Very nice structure of the paper, with all the results explained in details and all the required test supplied. Well Done. I can’t find the source code for prediction. I found only the matlab code for data manipulation
Very good article! Very well-written, good job 🙂
It would be nice to compare your results with a baseline (predicting the previous point, or average of the previous points or something similar). Also, your idea to use accumulated measures for the past 1 hour is interesting. Did you compare it with just giving the previous points as features to the regression?
It would be nice to share your code as well.
Very well written article. Good job.
You are one of the small number of teams trying to solve level 2, that is positive.
Please, take my advice: http://jupyter.org/