
Popular articles by alex-efremov
DAB PANDA: The A.I. Crypto Trader
ACADEMIA DATATHON CASE: THE A.I. CRYPTO TRADER
Datathon Ontotext Mentors’ Guidelines – Text Mining Classification
Tiny smart data modelled with a not-so-tiny smart model – the Case of SAP
CASE Kaufland, TEAM “Data Abusement Squad”
Critical Outliers – VMware Case
Datathon Kaufland Mentors’ Guidelines – On Predictive Maintenance
Datathon Sofia Air Mentors’ Guidelines – On IOT Prediction
Datathon Telenor Mentors’ Guidelines – On TelCo predictions
Datathon NSI Mentors’ Guidelines – Economic Time Series Prediction
Popular comments by alex-efremov
Weather Disruption of Public Transport Analysis Using Python
Hi, taha-junaid3000 🙂
tomislavk is right… Splitting the work with someone would help you achieve better results and learn much more by collaborating with others 🙂
Given the work you have done, I would focus more on the analysis and the conclusions about data quality, the variables for modelling, etc. This would help you plan the next steps.
A venture in crypto-currency trading
Hi team :), Good work!!!
You are right that the issue with missing values should be solved in a better way (not by replacing them with the last known value). If there are only one or a few adjacent missing values, we may replace them without distorting the data, but in the case of a long missing interval, instead of replacing the values, we may use the data sets separately… There are ways to concatenate data from different data sets even when building dynamic models…
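A minimal sketch of this idea in plain Python, assuming missing values are stored as `None` in a list. Short interior runs of missing values (at most `max_gap`, a hypothetical threshold) are filled by linear interpolation between their neighbours, which is one possible choice of replacement, not necessarily the team's. Long runs are left unfilled, and the series is split at them so each segment can be used as a separate data set. The function names are illustrative.

```python
from typing import List, Optional


def fill_short_gaps(series: List[Optional[float]], max_gap: int = 2) -> List[Optional[float]]:
    """Linearly interpolate runs of at most `max_gap` consecutive None values
    that have known neighbours on both sides; longer runs stay None."""
    out = list(series)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        end = i  # first index after the gap
        gap_len = end - start
        # Only interior gaps that are short enough are filled.
        if gap_len <= max_gap and start > 0 and end < n:
            left, right = out[start - 1], out[end]
            step = (right - left) / (gap_len + 1)
            for k in range(gap_len):
                out[start + k] = left + step * (k + 1)
    return out


def split_at_gaps(series: List[Optional[float]]) -> List[List[float]]:
    """Split into contiguous segments of known values; every remaining None
    (i.e. a long gap) acts as a separator between segments."""
    segments, current = [], []
    for v in series:
        if v is None:
            if current:
                segments.append(current)
                current = []
        else:
            current.append(v)
    if current:
        segments.append(current)
    return segments


prices = [10.0, None, 12.0, 13.0, None, None, None, None, 20.0, 21.0]
filled = fill_short_gaps(prices, max_gap=2)
print(filled)
print(split_at_gaps(filled))
```

Here the single missing value is interpolated, while the four-value gap is kept as a break between two separate segments.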
Price and promotion optimization for FMCG
Nothing more to add after Agamemnon 🙂
I also like your validation approach, given the small amount of data, and the way you introduced different scenarios for the forecasts.
Weather-proof Mobility
Hi again 🙂
I really like your work: the considerations about the data, the interpretation of the outliers, the conclusions, and also the good business understanding. 🙂
I have some comments and questions about the final model. Judging by the p-values, you have included many non-significant factors in the model, unless I misunderstood something. To reduce the risk of overfitting, I would remove some of them so that only the significant factors remain. To check the model for overfitting, we should compare R2, adj. R2, RMSE, etc. on both the train and the test samples. For the cross-validation you did, we should also do this for the averaged measures of model quality. Also, by using linear regression we impose a particular hypothesis about the type of relationship between the factors and the dependent variable, so it would be good to try other models as well, especially non-parametric ones. Nevertheless, I really like what you have done.
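To make the suggested train/test comparison concrete, here is a minimal sketch in plain Python. It assumes a single factor so that ordinary least squares has a closed form (with several factors you would use a statistics library instead), and it uses contiguous k-fold splits. The data are made up for illustration, and the helper names are hypothetical. It shows computing R2, adjusted R2 and RMSE on both samples and averaging them over the folds. Significance-based factor selection is not shown.

```python
import math
from statistics import mean


def fit_simple_ols(x, y):
    """Ordinary least squares for y = a + b * x (a single factor)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def quality(y_true, y_pred, n_factors):
    """R^2, adjusted R^2 and RMSE of predictions from a model with `n_factors` predictors."""
    n = len(y_true)
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_factors - 1)
    return {"r2": r2, "adj_r2": adj_r2, "rmse": math.sqrt(ss_res / n)}


def average(scores):
    """Average each quality measure over the folds."""
    return {key: mean(s[key] for s in scores) for key in scores[0]}


def cross_validate(x, y, k=3):
    """k-fold cross-validation with contiguous folds; returns the averaged
    train and test quality measures so they can be compared side by side."""
    n = len(x)
    size = n // k
    train_scores, test_scores = [], []
    for f in range(k):
        lo, hi = f * size, (n if f == k - 1 else (f + 1) * size)
        train_idx = [i for i in range(n) if not lo <= i < hi]
        test_idx = list(range(lo, hi))
        a, b = fit_simple_ols([x[i] for i in train_idx], [y[i] for i in train_idx])
        for idx, scores in ((train_idx, train_scores), (test_idx, test_scores)):
            y_pred = [a + b * x[i] for i in idx]
            scores.append(quality([y[i] for i in idx], y_pred, n_factors=1))
    return average(train_scores), average(test_scores)


# Toy data (made up for illustration): a linear trend plus fixed noise.
x = list(range(12))
noise = [0.5, -0.3, 0.8, -0.6, 0.1, 0.4, -0.9, 0.2, 0.7, -0.4, 0.3, -0.5]
y = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]
train_avg, test_avg = cross_validate(x, y, k=3)
for name in ("r2", "adj_r2", "rmse"):
    print(f"{name}: train={train_avg[name]:.3f} test={test_avg[name]:.3f}")
```

A large drop in R2 or adjusted R2, or a large rise in RMSE, from the train to the test averages would be the overfitting signal described above.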
Predicting weather disruption of public transport
Hi, Svilen 🙂
Once again, you did a very good job. Predicting the outliers is not easy. In cases like this one, appropriate data analysis, data preparation and data enrichment are usually critical for the final solution. You did a lot here, and at the same time there is more to do (e.g. improving the balance of the data with respect to the dependent variable, or reformulating the dependent variable, as you mentioned…). By the way, about the modelling: I noticed that the mean absolute error is much higher on the test sample than on the training data. This is an indicator of overfitting (if I understand correctly what you presented). So I recommend that you watch out for this in the future when building models. In such cases, adding more factors and optimizing the model on the training sample usually reduces its predictive power on new data.
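A small illustration of the train/test MAE comparison, in plain Python with made-up toy numbers (not the team's data). A deliberately flexible 1-nearest-neighbour model memorizes the training sample, so its train MAE is 0, but it does worse on new points than a trivial model that always predicts the training mean. The `looks_overfit` helper and its 1.5 tolerance are hypothetical, just one way to turn the comparison into a simple check.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def nearest_neighbour_model(x_train, y_train):
    """Very flexible model: predicts the target of the closest training point
    (ties go to the earlier point), so it reproduces the training sample exactly."""
    def predict(xs):
        return [y_train[min(range(len(x_train)), key=lambda j: abs(x_train[j] - v))]
                for v in xs]
    return predict


def mean_model(y_train):
    """Very simple model: always predicts the training mean."""
    m = mean(y_train)
    return lambda xs: [m] * len(xs)


def looks_overfit(train_mae, test_mae, tolerance=1.5):
    """Flag a model whose test MAE is more than `tolerance` times its train MAE
    (hypothetical threshold, to be tuned for the problem at hand)."""
    if train_mae == 0:
        return test_mae > 0
    return test_mae / train_mae > tolerance


# Toy data (made up): noise around a constant level, so nothing beyond the mean is learnable.
x_train = [0, 1, 2, 3, 4, 5, 6, 7]
y_train = [5.2, 4.6, 5.9, 4.3, 5.5, 4.8, 5.7, 4.1]
x_test = [0.5, 2.5, 4.5, 6.5]
y_test = [5.0, 5.1, 4.9, 5.2]

results = {}
for name, model in (("1-NN", nearest_neighbour_model(x_train, y_train)),
                    ("mean", mean_model(y_train))):
    train_mae = mae(y_train, model(x_train))
    test_mae = mae(y_test, model(x_test))
    results[name] = (train_mae, test_mae)
    print(f"{name}: train MAE={train_mae:.3f}, test MAE={test_mae:.3f}, "
          f"overfit? {looks_overfit(train_mae, test_mae)}")
```

The flexible model has the better training error but the worse test error, which is the pattern described in the comment.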