Great approach to data understanding, with a good explanation of the variables and derived variables, and a sound understanding of the business process and its caveats. This paper would benefit from a general explanation of the reinforcement learning framework, such as (1).
Although the team explains how they expect the environment and the agent to act, I would like to see the complete solution and how it actually performs on this task.
Congratulations on the hard work!

Minimizing logistical transportation expenses of Retail Supply Chain

The article begins with – “As you know,lots of companies…”. No, I (we), the reader(s), do not necessarily know much about the supply chain industry and its problems. Cite your sources.
“Just imagine how much” – no need to imagine; cite where these numbers come from.
“The reinforcement learning algorithms are great at adapting in dynamically changing environment” – this is not true in general. While there are algorithms that can deal with non-homogeneous environments, the statement does not hold for RL algorithms as a whole. (1), (2)

The article mentions “Reinforcement learning” and assumes that the reader is familiar with the general framework. The exposition would be clearer with more explanation of what is meant by “environment”. While the term environment appears, there is no mention of agent, actions, or rewards in the text. The exposition would benefit greatly from either an explanation of the general RL framework or a citation of relevant work such as (3).
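For readers unfamiliar with the framework, the agent-environment loop could be sketched roughly as follows. This is a toy illustration only: the `ToyWarehouseEnv` class, its stock-level state, the order quantities used as actions, and the cost figures in the reward are all hypothetical and are not taken from the team's article.

```python
import random


class ToyWarehouseEnv:
    """Hypothetical environment: the state is the current stock level."""

    def __init__(self, capacity=10, seed=0):
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.stock = 0

    def reset(self):
        """Start an episode with a half-full warehouse and return the state."""
        self.stock = self.capacity // 2
        return self.stock

    def step(self, action):
        """Apply an action (units ordered) and return (next_state, reward)."""
        demand = self.rng.randint(0, 3)
        self.stock = min(self.capacity, self.stock + action)
        sold = min(self.stock, demand)
        self.stock -= sold
        # Reward: sales revenue minus ordering and storage costs (made-up numbers).
        reward = 2.0 * sold - 1.0 * action - 0.1 * self.stock
        return self.stock, reward


def random_agent(state, rng):
    """Placeholder policy: ignores the state and orders 0 to 3 units at random."""
    return rng.choice([0, 1, 2, 3])


def run_episode(env, policy, steps=20, seed=1):
    """Run the loop: observe state, choose action, receive reward, repeat."""
    rng = random.Random(seed)
    state = env.reset()
    total_reward = 0.0
    for _ in range(steps):
        action = policy(state, rng)
        state, reward = env.step(action)
        total_reward += reward
    return total_reward


total = run_episode(ToyWarehouseEnv(), random_agent)
print(total)
```

An actual RL algorithm would replace `random_agent` with a policy that is updated from the observed rewards; the sketch only names the four pieces (environment, agent, actions, rewards) that the article leaves implicit.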

The solution continues with a nice “Data understanding” section. The team starts with the proper mindset and shows how the variable to be predicted is distributed per item. Unfortunately, no graphs of other related variables are provided.
The proposed solution uses Random Forest, which is not an RL algorithm but is suitable for the task at hand. While RF is a robust algorithm overall, there is no support for the claim that it is the best one for this task. A comparison with other algorithms should have been done, or at least mentioned.
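As an illustration of the kind of comparison meant here, a minimal sketch using scikit-learn cross-validation is given below. It assumes the team worked in scikit-learn (the article mentions GridSearch but does not say so explicitly). The data is synthetic stand-in data, and the choice of competing models is an assumption of this review, not something taken from the article.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; the team's real features and target are not used here.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = 3.0 * X[:, 0] + np.sin(4.0 * X[:, 1]) + rng.normal(0, 0.1, size=200)

# Candidate models (an arbitrary selection, for illustration only).
models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Mean 5-fold cross-validated MSE per model (scikit-learn reports it negated).
cv_mse = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    cv_mse[name] = float(-scores.mean())

# Print the models from lowest to highest cross-validated error.
for name, mse in sorted(cv_mse.items(), key=lambda kv: kv[1]):
    print(f"{name}: {mse:.4f}")
```

A table of this kind, computed on the real data, would be enough to support or refute the claim that Random Forest is the best choice.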

The Modeling section is clean and easy to follow. Still, “RandomForest is perfect for predicting StorageCost,not because of GridSearch’s decision,but because,it has unique structure for predicting storage cost step-by-step,from-root-to-leap,not using direct relationship between ALL variables.” – this sentence works in a presentation but has no value in this text, as other (black-box) algorithms are also quite good at finding complex relationships between variables (emphasis on ALL).

The Evaluation section is non-existent. Evaluating the errors (residuals) of the algorithm by eye might be possible in this case, but in general there should be a metric against which the algorithm is evaluated, e.g. MSE or R-squared. (4) The errors all seem to have the same sign, so either they are absolute values or the algorithm is consistently biased.
Small errors in absolute value could be an artifact of the task, since the variable being predicted is itself “small” in nature.
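The checks suggested above can be sketched in a few lines of plain Python. The function name `regression_report` and the four example values are hypothetical; the numbers are made up to show a consistently biased predictor and are not the team's results.

```python
def regression_report(y_true, y_pred):
    """Return MSE, R-squared, mean signed error and share of positive residuals."""
    n = len(y_true)
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "mse": ss_res / n,
        # R-squared compares the errors with the spread of the target itself.
        "r2": 1.0 - ss_res / ss_tot,
        # Signed mean: far from zero when the model is consistently biased.
        "mean_error": sum(residuals) / n,
        # Close to 0.0 or 1.0 when (almost) all errors point the same way.
        "share_positive": sum(r > 0 for r in residuals) / n,
    }


# Made-up example: every prediction overshoots by 0.5.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]
report = regression_report(y_true, y_pred)
print(report)
```

Because R-squared is scaled by the variance of the target, it does not look good merely because the predicted variable is small, which addresses the concern about small absolute errors.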

