SAP business case, TEAM: SAP-iens
In the proposed SAP case we are challenged with maximizing the profit obtained from the sales of a particular product, by setting its price over time and choosing whether or not to enable one of several types of offers. The data that has been collected includes past sales volume, our price, the prices charged by competitors who sell the same product, and the kind of offer that was being applied at the time.
The goals of this project require us to model the intricate interdependencies between the price of our product, its sales, and the price of the same product at our competitors. Additionally, the reaction of our competitors to our own price changes can be gauged, and a strategy developed to stay one step ahead of them. Seasonal changes and changes in the local and global economic mood affect retail businesses everywhere and should also be taken into account.
The available data contains the weekly sales of our product for 146 weeks, slightly short of 3 years. We also have the price that was set for each whole week, and the price that 7 of our competitors charged during that same period.
The data of our competitors is lacking many values, due to the product not being offered by the competitors during the whole period. For some periods none of the competitors offered the product at all.
We also have information about the types of promotional campaigns that were performed each week, or whether no campaign was performed at all.
First, we tried some basic data visualizations to get used to the data.
At first sight, the data clearly shows that sales spike up right when prices are reduced:
And the total income per week also goes up coinciding with price discounts:
So, at first sight, we can already see that discounts are a good idea. However, there are many other factors that can influence our income. Discounts may be effective precisely because they are not available all of the time. There may be some seasonal factors at play, and the price of our competitors may also be an important thing to consider. We are hoping that with this analysis we can answer the following questions: How often should we offer discounts? For how long should they last? How should the baseline price change over time? What should the discounted price be?
If we take a look at the data of our competitors, we see that it is not very exhaustive:
As we can see, only the blue line (which corresponds to our own price) and the yellow line are complete for the whole period. That means that only one of our competitors (the yellow line) sold the item for the whole period.
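The coverage check described above can be sketched in a few lines. This is only an illustration on made-up numbers: the column names (own_price, comp_1, comp_2) and the values are assumptions, not the real data set.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: our price plus two competitors.
# NaN marks a week in which the seller did not offer the product.
prices = pd.DataFrame({
    "own_price": [10.0, 10.0, 8.0, 10.0],
    "comp_1": [9.5, 9.5, 9.0, 9.5],
    "comp_2": [np.nan, 11.0, np.nan, np.nan],
})

# Fraction of weeks in which each seller has a recorded price.
coverage = prices.notna().mean()

# Sellers whose price is known for every week of the period.
complete = coverage[coverage == 1.0].index.tolist()
```

On the real data, the same two lines would list which of the 7 competitors can be used directly as a feature and which need special handling of missing weeks.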
We have also looked at a smoothed version of the volume of sales and of the price, to see whether seasonality was a big factor affecting the sales. As the following graphs show, while there are big long-term differences in the volume of sales and in the price, there is no apparent yearly pattern that follows the different seasons, so we have decided not to include a feature that accounts for seasonality.
Smoothed volume of sales:
Smoothed selling price:
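The smoothing method is not specified in this report. As a minimal sketch, assuming a centered rolling mean over roughly one quarter (13 weeks) and a synthetic series in place of the real sales data, it could look like this. The lag-52 autocorrelation is an extra, rough check for a yearly pattern that we add here for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic weekly volumes standing in for the real 146-week series.
rng = np.random.default_rng(0)
weeks = pd.RangeIndex(146, name="week")
volume = pd.Series(100 + rng.normal(0, 10, size=146), index=weeks)

# Centered 13-week rolling mean; the window length is an assumption.
# min_periods=1 keeps values at both ends of the series.
smoothed = volume.rolling(window=13, center=True, min_periods=1).mean()

# Rough seasonality check: correlation of the series with itself one year earlier.
# A value close to 0 suggests no strong yearly pattern.
yearly_autocorr = volume.autocorr(lag=52)
```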
The data preparation step is one of the most sensitive and open to interpretation, and requires an understanding of the possible dynamics that may be at play with the data, in order to generate features that contain meaningful information for the system to predict. We also need to transform some of the features to a representation that the model will understand, and choose which features need to be removed.
One feature stands out as needing to be reformatted before our model can use it: “type of promotion”. This information is categorical, since the type of promotion can be any of five types: “A”, “B”, “C”, “D” or “E”. We convert it to a format that our model can understand by creating 5 binary features, one for each possible type of promotion, each set to 1 when that particular promotion is active and to 0 otherwise. A week without any campaign has all five features set to 0.
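The conversion to binary features can be sketched with pandas. The column name promotion and the sample values are hypothetical. Fixing the list of categories up front is our own choice, so that all five columns exist even if a promotion type is missing from a given sample.

```python
import pandas as pd

# Hypothetical weekly records; None marks a week without a campaign.
df = pd.DataFrame({"promotion": ["A", None, "C", "E", None, "B", "D"]})

# Declare the five known promotion types explicitly.
promo_dtype = pd.CategoricalDtype(categories=["A", "B", "C", "D", "E"])
promo = df["promotion"].astype(promo_dtype)

# One 0/1 column per promotion type; weeks without a campaign get all zeros.
dummies = pd.get_dummies(promo, prefix="promo", dtype=int)
df = df.join(dummies)
```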
We have built additional features by reasoning about how different pieces of data may be related to each other, and testing if those relationships turn out to be useful.
– Price variation
This is a simple difference between the price in the previous week, and the price in the current week. It makes sense that the volume of sales will be heavily related to whether the product has been discounted, independent of the absolute value of the price of the item, so this value should account for that possibility.
– Minimum, maximum and mean price for the week.
We have calculated the minimum, maximum and mean price of the same product for the last week, across our own price and those of our competitors. That way the model can evaluate how close our price is to being the cheapest or the most expensive, and see if that affects the volume of sales.
– Minimum, maximum and mean price for the month.
By calculating the same values for the whole previous month, we can see whether our customers show some kind of memory effect, so that our price relative to the market during the last few weeks has an impact on the current volume of sales.
– Time since last discount started, time since last discount ended.
These two values allow us to factor in the fatigue effect that seems to set in after a product has been discounted for several weeks. Sales seem to spike right after a big discount, but they do not seem to remain nearly as high if the price stays low during the following weeks.
– Campaign lasting effect
This value may seem similar to the previous one. However, keeping the effect of a campaign separate from the effect of the price itself allows us to distinguish whether differences in sales are due to the price change or to a campaign actually being in progress. We have created one value for each type of campaign, and a cumulative value that combines all the campaigns.
Campaign lasting effect:
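The engineered features listed above can be sketched as follows. Everything here is illustrative: the toy data, the column names, the 4-week approximation of a month, the fixed baseline price used to detect a discount, and the exponential decay used for the campaign lasting effect are all assumptions, not the exact definitions used in our pipeline.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 8 weeks, our price and two competitors (NaN = not offered).
df = pd.DataFrame({
    "own_price": [10.0, 10.0, 8.0, 8.0, 10.0, 10.0, 7.0, 10.0],
    "comp_1": [9.5, 9.5, 9.5, 9.0, 9.0, 9.5, 9.5, 9.5],
    "comp_2": [np.nan, 11.0, 11.0, np.nan, np.nan, 10.5, 10.5, 10.5],
})
price_cols = ["own_price", "comp_1", "comp_2"]

# Price variation: current week minus previous week (negative = price cut).
df["price_variation"] = df["own_price"].diff().fillna(0.0)

# Market minimum, maximum and mean per week; NaN prices are skipped.
week_min = df[price_cols].min(axis=1)
week_max = df[price_cols].max(axis=1)
week_mean = df[price_cols].mean(axis=1)

# shift(1) makes each row see only the previous week's market.
df["mkt_min_week"] = week_min.shift(1)
df["mkt_max_week"] = week_max.shift(1)
df["mkt_mean_week"] = week_mean.shift(1)

# Same statistics over the previous 4 weeks (assumed length of a month).
# The monthly mean is a mean of weekly means, which is an approximation.
df["mkt_min_month"] = week_min.shift(1).rolling(4, min_periods=1).min()
df["mkt_max_month"] = week_max.shift(1).rolling(4, min_periods=1).max()
df["mkt_mean_month"] = week_mean.shift(1).rolling(4, min_periods=1).mean()

# A week counts as discounted when our price is below an assumed baseline.
baseline = 10.0
on_discount = df["own_price"] < baseline
previous = on_discount.shift(1, fill_value=False)
started = on_discount & ~previous
ended = ~on_discount & previous


def weeks_since(event: pd.Series) -> pd.Series:
    """Weeks elapsed since the most recent True in `event` (NaN before the first)."""
    position = pd.Series(np.arange(len(event)), index=event.index)
    last_event = position.where(event).ffill()
    return position - last_event


df["weeks_since_discount_start"] = weeks_since(started)
df["weeks_since_discount_end"] = weeks_since(ended)

# Campaign lasting effect (assumed form): 1 while a campaign runs,
# then halved every week after it stops.
campaign_active = [0, 0, 1, 0, 0, 0, 1, 0]  # hypothetical campaign flag
decay = 0.5
level = 0.0
effect = []
for active in campaign_active:
    level = 1.0 if active else level * decay
    effect.append(level)
df["campaign_effect"] = effect
```

One such campaign_effect column per campaign type, plus their sum, would give the per-campaign and cumulative values described above.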
We have decided to fit the data with a linear regression model. We took that decision because of the limited amount of data available: with only 146 weekly observations, more powerful models are very likely to overfit.
As the fitting method, we have tried Lasso, Ridge and Elastic Net. We found the best results with the Lasso method. Elastic Net combines the Lasso and Ridge penalties, so that it can overcome the limitations of applying each approach separately. However, given the high number of features we are dealing with, we found the aggressive way in which Lasso discards unnecessary features to be useful.
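The difference in how the three methods treat unnecessary features can be illustrated on synthetic data. This is a sketch with scikit-learn, not our actual experiment: the data set, the regularization strength of 5.0 and the standardization step are assumptions chosen only to make the contrast visible.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 146 samples, 30 features, only 5 of them informative.
X, y = make_regression(
    n_samples=146, n_features=30, n_informative=5, noise=10.0, random_state=0
)
X_scaled = StandardScaler().fit_transform(X)

models = {
    "lasso": Lasso(alpha=5.0, max_iter=10_000),
    "ridge": Ridge(alpha=5.0),
    "elastic_net": ElasticNet(alpha=5.0, l1_ratio=0.5, max_iter=10_000),
}

# Count how many coefficients each method sets exactly to zero.
n_zero = {
    name: int(np.sum(model.fit(X_scaled, y).coef_ == 0))
    for name, model in models.items()
}
```

Ridge shrinks coefficients but rarely makes them exactly zero, while Lasso removes features outright, which is the behaviour we relied on.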
The Lasso method requires a regularization parameter α (alpha) that needs to be tuned properly. To achieve that we performed a grid search, which found the optimum value of α to be 80.81.
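A grid search over α can be sketched with scikit-learn as follows. The synthetic data, the grid bounds, the 5-fold split and the standardization step are assumptions for illustration. The best α found here has no relation to the 80.81 reported above, since α depends on the scale of the real features and target.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real feature matrix and weekly sales volume.
X, y = make_regression(
    n_samples=146, n_features=30, n_informative=5, noise=10.0, random_state=0
)

# Scaling inside the pipeline keeps each validation fold out of the scaler fit.
pipeline = make_pipeline(StandardScaler(), Lasso(max_iter=10_000))

# Candidate alphas on a logarithmic grid (bounds are an assumption).
param_grid = {"lasso__alpha": np.logspace(-2, 2, 25)}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="r2")
search.fit(X, y)

best_alpha = search.best_params_["lasso__alpha"]

# Number of features that survive with a non-zero coefficient.
lasso = search.best_estimator_.named_steps["lasso"]
n_kept = int(np.sum(lasso.coef_ != 0))
```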
To help the algorithm and detect which features ma
Inner loop test score R^2 is 0.805.
Outer loop test score R^2 is 0.787.
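If the inner and outer loops are read as nested cross-validation, where the inner loop tunes α and the outer loop estimates how well the tuned model generalizes, the procedure can be sketched like this. The synthetic data, the fold counts and the shuffled K-fold splits are assumptions. For weekly data ordered in time, a time-aware splitter such as scikit-learn's TimeSeriesSplit may be the safer choice.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real feature matrix and weekly sales volume.
X, y = make_regression(
    n_samples=146, n_features=30, n_informative=5, noise=10.0, random_state=0
)

pipeline = make_pipeline(StandardScaler(), Lasso(max_iter=10_000))
param_grid = {"lasso__alpha": np.logspace(-2, 2, 15)}

inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

# Inner loop: choose alpha by cross-validated R^2.
search = GridSearchCV(pipeline, param_grid, cv=inner_cv, scoring="r2")

# Outer loop: the whole search is refit on each outer training fold
# and scored on data that the tuning step never saw.
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="r2")
outer_r2 = float(outer_scores.mean())

# Inner-loop score of the best alpha when tuning on all the data.
inner_r2 = float(search.fit(X, y).best_score_)
```

An outer score slightly below the inner score, as in the figures reported above, is the expected pattern, because the inner score is measured on the same folds that were used to pick α.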
(more thoroughly evaluate the model, and review the steps executed to construct the model, to be certain it properly achieves the business objectives)