Prediction systems

Datathon NSI Solution – The curious case of ‘Household Budget Survey (HBS)’

The National Statistical Institute of Bulgaria (NSI) conducts an annual Household Budget Survey (HBS) to obtain reliable, scientifically founded data on the income, expenditure, consumption and other elements of the population's living standard, as well as the changes that have occurred over the years. To reduce the cost of carrying out the survey, NSI is considering changing its periodicity from yearly to once every five years. We are therefore building a model that predicts household expenditure for the next four years, using two algorithms: a linear regression model and a time-series model, the autoregressive integrated moving average (ARIMA). So let's not waste any time and get on with it!
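As a rough illustration of the linear-regression half of this approach, the sketch below fits a straight-line trend to yearly expenditure by ordinary least squares and extrapolates it four years ahead. The figures and the helper names (`fit_linear_trend`, `forecast_years`) are hypothetical placeholders; they are not NSI data and not the model built for the datathon.

```python
def fit_linear_trend(years, values):
    """Ordinary least squares fit of values = intercept + slope * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast_years(years, values, horizon=4):
    """Extrapolate the fitted trend for `horizon` years past the last one."""
    slope, intercept = fit_linear_trend(years, values)
    last_year = years[-1]
    return [
        (last_year + h, intercept + slope * (last_year + h))
        for h in range(1, horizon + 1)
    ]


# Hypothetical yearly expenditure per household (illustrative numbers only).
years = [2010, 2011, 2012, 2013, 2014]
expenditure = [100.0, 104.0, 108.0, 112.0, 116.0]

predictions = forecast_years(years, expenditure, horizon=4)
for year, value in predictions:
    print(year, round(value, 2))
```

A single trend line is the simplest possible case; a model with several explanatory variables would follow the same fit-then-extrapolate pattern, but would also need future values of those variables.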


6 thoughts on “Datathon NSI Solution – The curious case of ‘Household Budget Survey (HBS)’”

  1.

    For the linear regression model that you described: can you clarify which observations were in the training set and which were in the test set? Which were the features and which were the predicted variables?
    Can you also comment on the main difference between using a classical linear regression and the ARIMA model? Which is more appropriate?

  2.

    Thank you for working on the NSI case!
    You say that the time-series linear model is better, but the “food and non-alcoholic expenditures” graph, for example, shows a loss of seasonality, which can be crucial when calculating consumer price indices. It would be better if you provided predicted versus expected values so that the error could be estimated. Otherwise the article is readable and friendly, and shows dedication and a certain level of understanding of the subject.

    1.

      Thank you, Sir, for your kind query. We are pleased to receive it.
      We did consider seasonality in the data, and we found it as well.
      However, we tried to incorporate as many variables as possible into the model to get the best fit we could. As time was short, we could not deal with the seasonality effectively. We will work on this further to get a better model.
      We looked at the R-squared value, which seemed good when comparing the original and predicted values. Because of the time constraints we did not present it, although the original and predicted values were plotted in the graph. We will update our article further.

  3.

    EDA: We used a linear regression model to identify the most significant factors on which the change in expenditure depends.
    Model: We used a time-series linear regression model to predict household expenditure from the significant factors identified above. We built the model on both a yearly and a quarterly basis and made predictions with each.
    We used Auto ARIMA as another approach to making predictions.
    Details of the training data, test data, predicted variables and features have now been added to the article.
    In the linear model we identify the linear relationship between the variables, whereas in the ARIMA model we use the historical data of the series itself to predict its future values.
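
To make the contrast concrete, here is a minimal sketch of the ARIMA idea, in which the forecast depends only on the series' own history. It hand-rolls an ARIMA(1,1,0)-style model with a constant: difference the series once, regress each change on the previous change by least squares, then add the forecast changes back to the last observed level. This is a simplification and not the Auto ARIMA procedure mentioned above; the quarterly figures and the function name `arima_110_forecast` are hypothetical.

```python
def arima_110_forecast(series, horizon=4):
    """Forecast with an ARIMA(1,1,0)-style model fit by least squares."""
    if len(series) < 4:
        raise ValueError("need at least 4 observations")

    # d = 1: model the first differences instead of the levels.
    diffs = [b - a for a, b in zip(series, series[1:])]

    # p = 1: regress each change on the previous change.
    prev, cur = diffs[:-1], diffs[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_cur = sum(cur) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    if sxx == 0:
        raise ValueError("changes are constant; AR coefficient not identified")
    phi = sum((p - mean_prev) * (c - mean_cur) for p, c in zip(prev, cur)) / sxx
    const = mean_cur - phi * mean_prev

    # Forecast the changes recursively and integrate them back to levels.
    level, change = series[-1], diffs[-1]
    forecasts = []
    for _ in range(horizon):
        change = const + phi * change
        level += change
        forecasts.append(level)
    return forecasts


# Hypothetical quarterly expenditure (illustrative numbers only).
quarterly = [100.0, 102.0, 105.0, 107.0, 110.0, 112.0, 115.0]
next_four = arima_110_forecast(quarterly, horizon=4)
print([round(v, 2) for v in next_four])
```

No explanatory variables appear anywhere in this function, which is the key difference from the regression approach: the only input is the past of the series being forecast.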
