Introduction
Data provided consists of 3 years of weekly volume of sales, price of product in question, prices of main competitors and promotion calendar for a FCMG product. Data is provided by SAP.
The task is to identify the volume uplift drivers, measure the promotional effectiveness and measure the cannibalization effect from main competitors. The goal is to analyze the impact of price reduction and promotions on volume of sales by also taking into account competitor’s prices.
In order to be captured most of the patterns in the behavior of the Volumes of Sales during promotions periods, were derived about 150 new variables. A set of various approaches were applied in the modelling stage. Derivation of the Baseline price is main step in the model, since it has been used in the discount effect capturing. Various Time series analysis were applied to the volume of sales and the price as ARIMA and Spectral Analysis (Fourier) in order to be investigated any seasonal or cyclical (frequency) pattern in the customer behavior. As a featuring selection tools the Lasso and Extreme gradient boosting approaches were used. The final model is based on the OLS method where the significance of all observed patterns from the previous steps is tested.
Data understanding
Since a small number of features is provided, the first task was to review plots of the features provided.
Main observations:
- All but one of the competitors to our product are seasonal competitors, so their prices should have limited impact on our sales volumes.
- It seems from the data that the Competitor 2 and Competitor 7 are Companies that would ensure not only seasonal supply of their product since the information on their prices is provided for the period which follows the seasonal promotion period. But not hard assumption can be made since this behavior is observed at the end of the supplied time series with prices of this competitors.
- Competitors’ prices are very low in the end of their cycle. Most probably the product has short period of expiration, so the competitors are forced to sell the remaining quantities in the end of the season.
- All promotions lead to boost in the sales. For most promotions the price is raising in the end of the promotion. A continuous effect of the promotions can be observed after the promotion has ended.
- The mean price of our product is higher that the mean price of all of the competitors. Most probably this means that our producer is the leader on the market, and the rest of the competitors have follower’s behaviour.
- A couple of cases are observed where the price is obviously discounted, but this week is not market as a promotion in the promotions calendar. The reason for this may be that the retailers are also organizing promotions, but these cases are not included in the promotions calendar.
- The frequency of the promotions is between 1 and 7 weeks.
- There are some cases where in the last week of promotion the price is higher than the price in the first week of promotion (discounted price). It may be due to the way data is aggregated – on weekly level, so if the promotion has started on a Friday the data for the sales during this promotion period might not be absolutely correct.
- The strongest effect is observed for observations A, B, D
- The effect of the promotions of the competitors on the sales is estimated as proxy of the cannibalization effect. It is measured as the impact of the price discount of the competitor on the sales.
- For the most promotional cases the effect of the promotions has highest impact in the first week of the campaign and has decreasing in the following weeks .
Feature engineering
Various data transformations approaches were applied in the derivation of new features. Several groups of variables were created. The main characteristics were connected to the discount formed for the main product during promotion periods. Additional Dummy variable approach was used for creation of separate Promotion type variables and variables indicating the weak of the promotion.
Baseline Price Derivation
As a most important step, baseline price has been evaluated for our product and the products of the main competitors. The baseline price is defined from the periods with no promotion for every next period with promotion. Two different approaches have been used for defining the baseline. The first one was based on the mean price calculated for the period before the promotion. This value was used for the period in promotion.
The second more sophisticated and finally used approach included the following steps:
- For the period of promotions, the price before the promotion and the price after the promotion were used for linear interpolation of the base price during promotion periods
- For the periods with no promotions registered in the promotions calendar the real price was used
- In order to smooth the final base price Hodrick Prescott filtering was applied with lambda of 10
Promotion Discount Derivation
The above baseline price is used in combination with the observed weekly Actual Prices. The Discount is estimated as : (Baseline Price – Actual Price) / Baseline Price. The variable is derived for our company price and the competitor prices.
Additional Variables Derivation
As a next step, additional features have been derived that account for price changes, price elasticity, baseline prices, price discount, competitors’ base price, impact from competitors’ price discount, price log difference between own and competitors’ price, number of competitors, dummies for promotion type and week from start of promotion, time since last promotion started, competitor is in promotion or not etc. Variables based on the maximum realized volume of sales in the last periods before the promotion were derived in order to be captured the relative effect of the current promotion.
For the purposes of the investigation of the cannibalization effect on the market a base level of the Volume of Sales characteristic is created. The Sales are preliminary cleaned form the effect of promotions of the main product in order to be isolated the influence of the discount in the competitors price on the Volume of Sales. For this purpose a model with the dummies for the type of promotion and for the week of the promotions is used. The residuals of this models are smoother with HP filtering (Lambda = 10) and are used as a proxy for the volume of the volume of sales if there were no promotion campaigns.
Time Series Effects
Auto-correlation function have been applied to the price. As a result of the ARIMA model additional features have been derived for the lagged values for the most significant lags. It is well visible from the applied graph that the first and second lag of the equation are more explainable for the volume of sales.
In addition Spectral analysis were applied to the volume of sales, previously cleaned from the effect of the prices in order to be captured pattern in the frequency of the sales which is independent from the price effect. For this purpose the Fourier frequency analysis was used.
Additional variable to represent two spikes within an year (52 weeks period) was derived to represent the short term increase in sales due to national holidays. Since there was non-national holiday calendar available the weeks for each year were determined through analysis of data. The approach was to divide the data into 52 week segments (incomplete for the last year). A sum for each week of the year for the three years was calculated and dived by the number of years for each data was available for the corresponding week (e.g. 2 or 3). Two spikes in volumes are noticeable through visual analysis of the average sales for each week of the year, and were further confirmed by having similar spikes for the corresponding period for the volumes within each separate year.
A full list of the created variables can be found as an attachment.
Modeling
OLS Model
A preferred method for statistical analysis in Marketing is OLS. This is the case, because OLS allows a straightforward interpretation of the sales volume uplift factors. All variables described in section “Feature Engineering” are tested in the model. As the above phase results in around 150 possible explanatory variables a model selection process is required.
The explanatory power of the features and their importance are evaluated by applying a Gradient Boosting Tree algorithm – XGBoost and by building regression models through LASSO analysis. Both analysis rely on initial calibration of the input parameters. This was done by an exhaustive iterative approach, which tries all combinations and selects as optimal the one, which results in highest accuracy. To account for over-fitting, when calibrating the parameters a KFold validation approach was applied, based on 5 sub-samples. The 2 Model Selection approaches described above resulted in a short list of variables with highest importance, which were used as starting point for the OLS model building phase. In the case of LASSO, all variables with non-zero coefficient were added to the short-list, while in the case of XGBoost, the first 20 variables with highest weight in the final prediction were included.
Both forward and backward stepwise selection processes were used when building the final model. The forward selection process was guided by the short-list obtained from the initial phase and relied on expert domain knowledge, statistical evaluation of the significance of the parameters, marginal improvement of the model and analysis on the correlation matrix. The Bakcward selection process started by forcing all variables in the model and evaluating their significance. The second step in the backward-selection process was to remove all insignificant variables one by one, while applying higher thresholds for the variables in the short-list and considering the marginal effect on the accuarcy.
A 5% significance was generally used for evaluating the statistical significance of the coefficients. Only one parameter estimates enters the model with p-value higher then 5% (equal to 7.8%), but as the p-value is still below 10% and the parameter has entered the model with logical sign and level value it was decided to keep the parameter in the model.
Best and worst case scenario assessment
A function has been applied to the model predictions in order to assess not only the predicted sales, but also the best and worst case scenarios.
Residuals have been tested using Shapiro-Wilk test and they are assessed as normally distributed.
From business point of view it makes sense to have non-static best and worst case scenario intervals. So, for example when the volumes are on a peak then higher variation can be expected. On the other hand, when the sales volumes are low, the intervals between the best and worst case are shrinked, because the product is a market leader and has a stable customer base. In order to incorporate these to cases into the prediction we multiplied the intervals by a penalizing parameter having values between 0 and 1.
The function has been derived in the following steps:
Best case scenario
- to our predicted volume of sales a correction term is added
- the correction term is a sigmoid function applied to the percentage change in the sales volume since the previous week multiplied by 3 times the standard deviation of the residuals
Worst case scenario
- from our predicted volume of sales a correction term is substracted
- the correction term is a sigmoid function applied to the percentage change in the sales volume since the previous week multiplied by 3 times the standard deviation of the residuals
From the graph below it can be seen that the best and worst case scenario are affected by sales volumes being in the higher or lower bands.
Evaluation
- The following variables entered the model:
Baseline Price of Actual Price – This is an estimate of the company product price in situation of no promotion in he long-term. As seen from the model it is expected that and increase in the long-term value of the price results in decrease of volume of sales. - Discount of company promotions – This estimate shows the percentage discount of the baseline price when in promotion in the current week. It is expected that higher discount value is related to higher sales in the current week.
- Discount from last promotion – This estimate shows the highest discount in the previous promotion. This value is populated with the mean discount value through the development sample for the first few week of the data set. It is expected that a high discount value during the last promotion will result in lower sales in the current as customers often overbuy the product in case of high discounts. As seen in the final model, the coefficient in-front of the current month discount is higher than the one in-front of the last promotion discount. This is indicative that we can control for the previous promotion discount in case of high expectation from the current promotion session.
- Volume of sales from previous week – This variable enters with negative sign in front of its parameter, which shows us that high sales in the previous month are followed by lower sales in the current
- Week of promotion – It is expected that usuually highest sales are realised in the first week of the promotion.
- Type A promotion – The model shows that the non-pricing factors of the 5 types of promotion are significant only for promotion A. The current expetion is that they result in higher sales.
- Length of last Promotion – This parameter also enters the model with logical coefficient. It is expected that longer previous promotions will result in lower sales in the current period.
- The only competitior, which significantly influence the company sales volumes with their pricing policy is Competitor 3. It is expected that higher percentage difference between the company price and the competitor price results in lower sales for the company.
- Only 1 competitor in the market – The analysis shows that having only 1 competitor in the market influence positvely our company, while the situation of having no competitors is not investigated as there is no such data in the sample
- Holiday/Systematic High Sales – There are 2 periods within the year, which are related with significantly higher volumes of sales. The Statistical analysis shows that this is behavior, which is not explained by any other variables in the model.
The model predictions look quite good. In the graph below you can find the observed vs predicted values:
Conclusion
Generally elasticity of demand is defined as the percentage change in quantity demanded divided by the percentage change in price. Elasticity for the continiuos variables is calculated based on the regression coefficients. It can be used to calculate for eaxh 1% change in the predictors what would be the corresponding change in the volume of sales. The Elasticity is calculated by dividing the mean value for each relevant predictor, multiplied by the regression estimated coefficient and dividing the product the mean value of the dependent variable (Volume of Sales). Or avg(Predictor)*Coefficinet)/avg(Volume of Sales). Additionally the regression formula can be used to simulate different results by inputting different values for the dummy variables and analyzing the changes.
The elasticity for the applicable factors contributing to the volume of sales can be seen in the table below:
Variable Elasticity
datafile[‘BASE2_ACTUAL_PRICE_y’] -1.6640498505
datafile[‘DISCOUNT2_ACTUAL_PRICE_y’] 0.2273465563
datafile[‘LAST_DISCOUNT’] -0.3009602194
datafile[‘VOLUME_LAG1’] -0.2775736955
datafile[‘Price_Relative_Diff__3’] 0.1079935245
In other words for each 1% change in the base price we expect the volume of sales to decrease with 1.7%. If there is 1% increase in the promotional discount we expect the volumes of sales to increase with 0.22%. Similarly the effect can be analysed for the other variables.
The elasticity for the applicable factors contributing to the volume of sales can be seen in the table below:
In other words for each 1% change in the base price we expect the volume of sales to decrease with 1.7%. If there is 1% increase in the promotional discount we expect the volumes of sales to increase with 0.22%. Similarly the effect can be analysed for the other variables.
The analysis showed that the promotional effectiveness is highly price dependent, while the non-pricing factors are significant only in the case of promotion A. The model also shows that the promotion calendar and planning significantly influence the sale volumes as seen by the inclusion of variables like length of last promotion and discount during last promotion
Attachments
References
- MARKETING MIX POLICIES IN FMCG CASE-STUDY: THE ADVERTISING STRATEGY- ANA MARIA BOBEICA
- PREDICTIVE MARKETING MIX MODELLING IN FMCG PACKAGED FOOD CATEGORY INCLUDING PRICE AND ADVERTISING IMPACT – Tomasz Kolanowski
Presentation:
6 thoughts on “Price and promotion optimization for FCMG”
Excellent work team!
The most difficult part of this challenge is to understand the data, create new features and rerun the predictive models till you achieve a good accuracy.
As you may mentioned if you run a predictive model with the initial dataset you will get an extremely low modelling accuracy.
I will vote based on the below criteria:
1. business understanding
2. feature engineering
3. modelling accuracy
4. insights & final results
You achieved a good modeling accuracy and you created a number of new features based on your excellent understanding of the data and the business case.
Moreover you visualized all the variable in a meaningful way and you took the right decisions on creating new features.
You could further increase the accuracy of the model by implementing a better base price algorithm and by calculating the baseline volume in a better way.
Not more to add after Agamemnon 🙂
I also like your validation approach, keeping in mind the small number of data, also introducing different scenarios related to the forecasts.
And just to add: keeping in mind the limited time, I would be very happy to see continuation of your work here.
Questions during your presentation:
1. @agamemnon
What was the most challenging part of this analysis and how you came up with the elasticities at the end ?
2. @mladensavov
What is the explanatory power of your model? Could you quote the R value? How many explanatory variables did you eliminate to build the model?
3. @tonypetrov
Having too many variables in your model may impact performance. Did you take any measures to combat the potential slow down?
Hey all, now Team BOTS present their solution and we are LIVE at https://www.youtube.com/watch?v=INUbkBHqYiE
Don`t forget to ask your questions here or on YouTube and after their presentation we will share the resources with you, of course! 🙂
Take the chance and leave your comments now!
Lot of work went in feature engineering which was too detailed. Final model looks incomplete from client perspective as I worked on price promo models for several retailers.
Please keep include following variables in model
primary – base price,promo price,all promo dummies,competitor price (for some) ,holidays,trend
secondary – other variables which can improve model