Popular comments by boryana
I really like that the whole modelling process is backed by sound business logic. Your approach of incorporating exogenous features, such as data on relevant stocks and Google Trends, adds considerable originality to the research methodology.
Data prep is conducted in compliance with the core theoretical requirements.
The implementation of the rolling sample is correct.
My advice on the model is to consider statistical significance, especially for the estimates associated with the exogenous explanatory variables. Also, judging by the plot of actual vs. predicted 1-step-ahead data, the model tracks the volatility of the series very closely for the first 12,000 observations. To tune the model further, you might investigate why performance deteriorates after that point. Once again, my suggestion is to inspect how the statistical significance of the estimates changes over time.
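To make the last suggestion concrete, here is a minimal sketch of tracking a coefficient's t-statistic over a rolling window. It assumes a simple one-regressor OLS fit as a stand-in for your actual model, which has more terms; the function names `slope_t_stat` and `rolling_t_stats` and the window length are hypothetical and chosen only for illustration.

```python
import math


def slope_t_stat(x, y):
    """t-statistic of the slope in y = a + b*x, fitted by ordinary least squares."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("regressor is constant within the window")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, then the standard error of the slope
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        # Perfect fit: the t-statistic is unbounded (or 0 if the slope is 0)
        return math.copysign(math.inf, slope) if slope != 0 else 0.0
    return slope / se


def rolling_t_stats(x, y, window):
    """Slope t-statistic for every consecutive window of the given length."""
    return [
        slope_t_stat(x[start:start + window], y[start:start + window])
        for start in range(len(x) - window + 1)
    ]
```

Plotting the output of `rolling_t_stats` against the window start index would show whether an exogenous variable stops being significant around the point where the forecasts degrade. As a rough rule of thumb, an absolute t-statistic below about 2 suggests the estimate is not significant at the 5% level.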
Congrats on reporting the directional symmetry figure!
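For readers unfamiliar with the metric, directional symmetry is commonly defined as the percentage of time steps at which the predicted change has the same sign as the actual change. The sketch below follows that common definition and is not necessarily the exact formula the team used. It assumes a strict same-sign rule, so a zero change on either side counts as a miss, and the function name is hypothetical.

```python
def directional_symmetry(actual, predicted):
    """Percentage of steps where actual and predicted changes share a sign."""
    if len(actual) != len(predicted):
        raise ValueError("series must have equal length")
    if len(actual) < 2:
        raise ValueError("need at least two observations")
    hits = 0
    for t in range(1, len(actual)):
        actual_change = actual[t] - actual[t - 1]
        predicted_change = predicted[t] - predicted[t - 1]
        # Strictly positive product: both moved up or both moved down
        if actual_change * predicted_change > 0:
            hits += 1
    return 100.0 * hits / (len(actual) - 1)


# Three of the four changes agree in direction
print(directional_symmetry([1, 2, 3, 2, 4], [1, 1.5, 2.5, 3, 3.5]))  # 75.0
```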
Also, I really like the way your workflow is organized: it takes advantage of both R and Python, using whichever is best suited to the research task at hand.
Great job, guys!
Working with data for 2018 only is a good way to deliver a consistent representation more quickly. Focusing on one main station is also a good choice given the time available for the task. I like the maps you presented. I would like to see at least part of your code.
Hi, guys! Also looking forward to reading the progress of your paper 🙂
Hi, guys, good job!
Business Understanding: the text is relevant and the research objectives are stated clearly.
Data Understanding: I very much like the use of heatmaps to visualize the air pollution information contained in the citizen data set. You've done a good analysis of the data issues in the set with official measurements. Note, however, that the citizen data set spans from 2017 to 2018. Therefore, taking 2013-2016 as the training set and 2017-2018 as the test set would work only if you were predicting air pollution at the main stations. However, the objective is to deliver forecasts for the citizen stations.
Including a section on future improvements and a list of references is an advantage.
I like that you’ve included a section on the Technology and Methods used 🙂
The aim in the Business Understanding section is stated clearly.
The Data Understanding section outlines the key characteristics of the available datasets well. Merging the official and the citizen datasets at the very beginning of the research probably consumed too much of your data prep time. Applying several filtering rules before merging the datasets might have helped.
Modeling: the presented graph and the bullets of findings below it are a promising beginning. Looking forward to reading your full paper!