Datathon 2020 Solutions, Finance

Minimizing logistical transportation expenses of Retail Supply Chain

Business Understanding

As you know, lots of companies in the logistics sector have trouble managing their finances when it comes to transporting and ordering goods. There are many problems and unnecessary costs that frustrate executives, but the one that causes these companies the most pain is transportation expenses. Just imagine how much money, billions if not trillions of dollars, would be saved if all transportation expenses were minimized. Lower expenses are key to preventing corruption among workers and increasing a company’s profits.

Kaufland has challenged us to reduce unnecessary expenses on product transportation. Reinforcement learning algorithms are great at adapting to dynamically changing environments. They can also find solutions to problems that a human would not consider. Our environment is customer demand, and the question is ‘How can we satisfy customer demand at the lowest possible cost?’. The costs come from:

* Transportation
* Storage

The solution to this problem will benefit not only the business but also customers, since products can be shipped according to the priority of each item at a particular hypermarket.
For this task we’ll use Python, and our model is RandomForest.

Data Understanding

So, how do we handle this? How do we minimize transportation expenses? What should we do? Okay, before rushing in, let us show some interesting data visualizations and information that can help solve this “nightmare” of logistics CEOs.

Data Preparation

Firstly, we have to replace the commas in the “item_prio” and “storage_cost” columns with dots. This lets us work with these numbers as floats rather than strings.

[Image: code for the comma-to-dot replacement]
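
Since the original code for this step is only available as an image, here is a minimal sketch of one way to do the conversion. The sample values and the helper name `commas_to_floats` are assumptions made for illustration; only the column names come from the text.

```python
import pandas as pd

# Hypothetical sample mimicking a raw export where decimals use commas.
raw = pd.DataFrame({
    "item_prio": ["1,5", "2,0", "0,75"],
    "storage_cost": ["0,12", "3,4", "10,05"],
})


def commas_to_floats(frame, columns):
    """Return a copy with decimal commas replaced by dots and cast to float."""
    out = frame.copy()
    for col in columns:
        # regex=False treats "," as a literal character, not a pattern
        out[col] = out[col].str.replace(",", ".", regex=False).astype(float)
    return out


clean = commas_to_floats(raw, ["item_prio", "storage_cost"])
print(clean.dtypes)
```

If the data is loaded from a CSV file, passing `decimal=","` to `pd.read_csv` may make this step unnecessary.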

Secondly, we remove “item_id” and “item_name” from the data because they are not useful here. (We could gauge the popularity of each item by visualizing item_id against order_qty, but the priority of each product is already given.)

 

df.drop(['item_id', 'item_name'], axis=1, inplace=True)

 

Thirdly, we create dummy variables for the “unit” column, because storage_cost, min_stock, max_stock, transport_qty and order_qty depend on whether the unit is ST or KG. We take 1 ST = 6.35 KG, so instead of multiplying, it is better to create a separate binary column “ST”. After this, the “unit” column is dropped because we no longer need it.

 

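# One-hot encode the unit column; drop_first=True keeps only the "ST" indicator
dummy_units = pd.get_dummies(df['unit'], drop_first=True)
# Append the indicator and drop the original unit column
df = pd.concat([df, dummy_units], axis=1).drop(columns=['unit'])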
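
To make the effect of `drop_first=True` concrete, here is a small sketch on a made-up frame. The rows are hypothetical, and the sketch assumes the unit column contains only the values KG and ST, as the text suggests.

```python
import pandas as pd

# Hypothetical rows; only the column names mirror the article.
toy = pd.DataFrame({
    "unit": ["KG", "ST", "ST", "KG"],
    "order_qty": [10, 3, 5, 8],
})

# With two categories, drop_first=True drops "KG" (first alphabetically)
# and leaves a single indicator column named "ST".
dummies = pd.get_dummies(toy["unit"], drop_first=True, dtype=int)

# Replace the text column with the numeric indicator.
encoded = pd.concat([toy.drop(columns=["unit"]), dummies], axis=1)
print(encoded)
```

A row with ST = 1 is measured in ST, and a row with ST = 0 is measured in KG, so no information is lost by dropping the KG column.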

Now the data is ready to work with.

df.to_csv("MasterData.csv")

Modeling

We decided to use RandomForest, one of the best methods for this type of prediction task.

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Features; storage_cost is the target, so it is left out of X to avoid leakage
X = df[['order_qty', 'transport_qty', 'min_stock', 'max_stock', 'item_prio',
        'mhd', 'ST']]
Y = df['storage_cost']

# Hold out 20% of the rows for testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

# Fit a forest of 20 trees and predict storage cost on the held-out rows
regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, Y_train)
y_pred = regressor.predict(X_test)
y_pred
RandomForest is perfect for predicting storage cost, not because a grid search chose it, but because it has a unique structure that predicts storage cost step by step, from root to leaf, without using a direct relationship between ALL variables.
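
The root-to-leaf idea can be illustrated with a single shallow decision tree, the building block of a random forest. This is only a sketch: the data is synthetic, the two feature names are borrowed from the article for readability, and one tree is not the forest trained above.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic data: the target jumps when the first feature exceeds 50
# and grows slightly with the second feature.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 100, size=(200, 2))
y_demo = np.where(X_demo[:, 0] > 50, 5.0, 1.0) + 0.01 * X_demo[:, 1]

# A depth-2 tree: each prediction follows at most two threshold tests.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_demo, y_demo)

# Print the learned rules; each indented branch is one step from root to leaf.
rules = export_text(tree, feature_names=["order_qty", "max_stock"])
print(rules)
```

Each printed branch is a threshold test on one feature, and the value at the end of a branch is the prediction for rows that reach that leaf. A forest averages the predictions of many such trees.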

Evaluation

The difference between actual and predicted values is really small, so we consider the model ready for production.

As the results show, we can predict storage cost without help from an accountant and without going into details.
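
A statement like "the difference is really small" is easier to verify with explicit metrics than by eye. The sketch below shows how that could look with scikit-learn. It assumes synthetic data in place of the Kaufland dataset (the features, coefficients and noise level are made up), so the printed numbers say nothing about the real model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared data (assumption, not the real dataset).
rng = np.random.default_rng(0)
X_syn = rng.uniform(0, 100, size=(500, 3))
y_syn = 0.05 * X_syn[:, 0] + 0.02 * X_syn[:, 1] + rng.normal(0, 0.1, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=0
)
model = RandomForestRegressor(n_estimators=20, random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)

# Standard regression metrics on the held-out rows.
mse = mean_squared_error(y_te, pred)
mae = mean_absolute_error(y_te, pred)
r2 = r2_score(y_te, pred)

# A mean residual far from zero would suggest a consistent bias.
mean_residual = float(np.mean(y_te - pred))

print(f"MSE={mse:.4f}  MAE={mae:.4f}  R2={r2:.4f}  mean residual={mean_residual:.4f}")
```

Comparing MAE against the typical magnitude of the target helps judge whether a "small" error is actually small relative to the values being predicted.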

2 thoughts on “Minimizing logistical transportation expenses of Retail Supply Chain”

  1. The article begins with “As you know, lots of companies…”. No, I (we), the reader(s), do not necessarily know much about the supply chain industry and its problems. Cite your sources.
    “Just imagine how much”: no need to imagine; cite where you got these numbers.
    “The reinforcement learning algorithms are great at adapting in dynamically changing environment”: this is not true in general. While there are algorithms that can deal with non-homogeneous environments, the statement as written does not hold. (1), (2)

    The article mentions “reinforcement learning” and assumes that the reader is familiar with the general framework. The exposition would be helped by more explanation of what is meant by “environment”. While the term environment is present, there is no mention of agent, actions, or rewards in the text. The exposition would benefit greatly from either an explanation of the general RL framework or a citation of relevant work like this one (3).

    The solution continues with a nice “Data understanding” section. The team starts with the proper mindset and shows how the variable to be predicted is distributed given the item. Unfortunately, no other graphs of related variables are shown.
    The proposed solution uses Random Forest, which is not an RL algorithm but is suitable for the task at hand. While RF is a robust algorithm overall, there is no support for the claim that it is the best for this task. A comparison with other algorithms should have been done or at least mentioned.

    The modeling section is clean and easy to follow. Still, “RandomForest is perfect for predicting storage cost, not because a grid search chose it, but because it has a unique structure that predicts storage cost step by step, from root to leaf, without using a direct relationship between ALL variables.”: this sentence works in a presentation but has no value in this text, as other (black box) algorithms are quite good at finding complex relationships between variables (emphasis on ALL).

    The evaluation section is nonexistent. Evaluating the errors (residuals) of the algorithm by eye might be possible in this case, but in general there should be a metric that the algorithm is evaluated against, e.g. MSE or R-squared. (4) The errors seem to be in the same direction (sign), so either they are absolute values or the algorithm is consistently biased.
    Small errors in absolute value could be an artifact of the task if the variable being predicted is “small” in nature.

    1 – https://ieeexplore.ieee.org/document/7849368
    2 – https://www.sciencedirect.com/science/article/pii/S1877050917311134
    3 – https://arxiv.org/pdf/2005.01643.pdf
    4 – https://scikit-learn.org/stable/modules/model_evaluation.html
