Datathon 2020 Solutions

Reinforce Learning in Optimizing Supply Chain For Kaufland

0
votes

This article is a work in progress. We are a team of two and we just started exploring the dataset.

Below is a link to Github –

https://github.com/shamafarabi/Datathon

 

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
r=pd.read_csv('sales.csv')
masterdata=pd.read_csv('item_lookup.csv')
In [3]:
r.head()
Out[3]:
item_id the_date sold_qty
0 40001260 1/2/2019 7
1 40001260 1/3/2019 7
2 40001260 1/4/2019 16
3 40001260 1/5/2019 16
4 40001260 1/6/2019 8
In [4]:
r.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37026 entries, 0 to 37025
Data columns (total 3 columns):
item_id     37026 non-null int64
the_date    37026 non-null object
sold_qty    37026 non-null int64
dtypes: int64(2), object(1)
memory usage: 867.9+ KB
In [13]:
masterdata.head(50)
Out[13]:
item_id item_name unit order_qty transport_qty min_stock max_stock item_prio storage_cost mhd
0 101900 Dried Basil ST 20 2880 10 40 0 563 548
1 181130 Softcookies ST 28 1680 30 150 2 568 98
2 188640 Carrots juice ST 12 1296 6 60 2 473 243
3 192100 Ice Cream stick ST 24 5304 40 100 2 358 480
4 211110 Cake Cream ST 48 5760 12 48 2 557 243
5 236490 Chiken spice ST 30 6300 10 40 2 584 405
6 250520 Corn flour ST 12 1296 12 60 3 552 547
7 263350 Deo ST 12 3072 18 120 4 944 821
8 280560 Soda ST 6 264 6 60 5 561 135
9 293220 Bio Bar ST 360 19440 10 50 5 758 248
10 302920 Mayo ST 8 1600 16 90 5 883 135
11 334060 Bio gum ST 500 24500 100 600 5 644 821
12 346200 Snacks ST 24 1680 10 100 6 926 203
13 373470 Ayran ST 12 1140 12 84 6 412 15
14 381840 Jelly Candy ST 18 1944 18 72 8 741 135
15 387440 Milk ST 20 960 20 120 9 554 10
16 400830 Tea ST 12 2400 6 30 9 12 821
17 410980 Baby shampoo ST 21 1260 10 40 9 1219 821
18 436620 Handkerchief ST 20 2160 20 140 9 1221 0
19 470570 Pickled Mushrooms Whole ST 12 1728 6 30 1 1141 435
20 475500 Body cream ST 8 1440 6 48 1 1427 0
21 478170 Humus ST 8 1280 6 24 1 106 29
22 489130 Dried Herbs ST 6 180 6 24 11 1282 0
23 500970 Cold tea ST 12 1512 6 48 11 606 338
24 514610 Rice ST 12 720 6 48 11 1374 405
25 520170 Pickled Peppers ST 12 768 6 30 12 1414 289
26 529330 Ice Cream ST 40 5760 20 100 12 458 405
27 559090 Pickles ST 6 864 6 60 12 1051 360
28 559450 Candy ST 64 5120 32 160 12 929 141
29 560010 Muesli ST 20 1200 10 60 12 1566 135
30 567280 Creme sensitive ST 12 2016 6 24 12 1477 540
31 576340 Tooth brush ST 12 3360 6 42 12 1781 0
32 598220 Freezed veggetables ST 25 1575 10 40 13 1141 274
33 609130 Protein Bar ST 12 3600 6 48 14 1361 225
34 611060 Frozen Baggette ST 7 1176 14 56 14 995 141
35 619320 Shampoo ST 12 1152 6 48 15 1279 0
36 624160 Salami ST 30 2520 6 30 15 1267 68
37 636440 Matches ST 288 8640 360 3600 15 2091 999
38 649970 Mashed potatoes ST 14 504 6 42 16 1306 274
39 653750 Black Tea ST 6 1800 12 60 16 1905 822
40 657690 Cleaning liquid ST 12 480 6 36 17 406 675
41 658850 Pickled Mushrooms ST 12 1440 6 30 17 173 540
42 662410 Sweetener ST 80 62400 80 160 17 91 798
43 666050 Washing liquid ST 20 800 5 50 17 166 1096
44 680220 Washing liquid 750ml ST 10 480 10 60 17 137 1095
45 684750 Black Tea w ST 6 2184 12 72 17 1905 821
46 688650 Razor blade ST 40 2560 10 50 17 1236 0
47 690940 Deospray ST 6 2304 6 36 17 1886 0
48 723160 Dog Food ST 6 924 6 24 18 1332 487
49 735390 Cooking Knife ST 96 3072 12 48 19 2562 0
In [6]:
r[r.duplicated()]
Out[6]:
item_id the_date sold_qty
In [7]:
masterdata[masterdata.duplicated()]
Out[7]:
item_id item_name unit order_qty transport_qty min_stock max_stock item_prio storage_cost mhd
In [8]:
#Removing the space char
masterdata.columns = ['item_id', 'item_name', 'unit', 'order_qty', 'transport_qty',
       'min_stock', 'max_stock', 'item_prio', 'storage_cost', 'mhd']
print(masterdata.columns)
Index(['item_id', 'item_name', 'unit', 'order_qty', 'transport_qty',
       'min_stock', 'max_stock', 'item_prio', 'storage_cost', 'mhd'],
      dtype='object')
In [14]:
"""
Now lets turn all the int64 into float so that all is good for later plotings
"""
masterdata.item_id = masterdata.item_id.astype(float)
masterdata.order_qty = masterdata.order_qty.astype(float)
masterdata.transport_qty = masterdata.transport_qty.astype(float)
masterdata.min_stock = masterdata.min_stock.astype(float)
masterdata.max_stock = masterdata.max_stock.astype(float)
masterdata.mhd = masterdata.mhd.astype(float)

masterdata.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 102 entries, 0 to 101
Data columns (total 10 columns):
item_id          102 non-null float64
item_name        102 non-null object
unit             102 non-null object
order_qty        102 non-null float64
transport_qty    102 non-null float64
min_stock        102 non-null float64
max_stock        102 non-null float64
item_prio        102 non-null int64
storage_cost     102 non-null int64
mhd              102 non-null float64
dtypes: float64(6), int64(2), object(2)
memory usage: 8.0+ KB
In [15]:
"""
Obviously it is not very informative to plot all the features in one scatterplot,
but let's just see how it look like and then we will make separate plots
"""
sns.scatterplot(x = 'item_id', y = 'order_qty', data=masterdata)
sns.scatterplot(x = 'item_id', y = 'transport_qty', data=masterdata)
sns.scatterplot(x = 'item_id', y = 'min_stock', data=masterdata)
sns.scatterplot(x = 'item_id', y = 'max_stock', data=masterdata)
sns.scatterplot(x = 'item_id', y = 'item_prio', data=masterdata)
sns.scatterplot(x = 'item_id', y = 'storage_cost', data=masterdata)
sns.scatterplot(x = 'item_id', y = 'mhd', data=masterdata)
plt.show()
In [ ]:

Share this

Leave a Reply