| For how many years have you been experimenting with data? | 5 |
|---|---|

## Popular articles by zenpanik

### ACADEMIA DATATHON CASE: THE A.I. CRYPTO TRADER

### Datathon Ontotext Mentors’ Guidelines – Text Mining Classification

### The SAP Case using KNIME and Multiple Linear Regression Method

### THE A.I. CRYPTO TRADER: cryptomonkeys

### Tiny smart data modelled with a not-so-tiny smart model – the Case of SAP

### Antelope SAP

### Case NetInfo/Vesti.bg article recommendation — Team Army of Ones and Zeroes — Datathon 2020

### Critical Outliers – VMware Case

### Datathon Kaufland Mentors’ Guidelines – On Predictive Maintenance

### Datathon Sofia Air Mentors’ Guidelines – On IOT Prediction

### Datathon Telenor Mentors’ Guidelines – On TelCo predictions

## Popular comments by zenpanik

### Datathon-HackNews-Solutions-Data Titans

Great – lots of content and interesting charts and numbers.

### ShopUp Datathon2020 – Article recommender case

Nice idea to use BERT. I would be happy to see the results of the Russian model on Bulgarian text.

Please share the results from the RNN model.

### Datathon 2020 – NetInfo Article Recommender – Newbies

1. Data Prep

1.1. What is the result of the describe method in Pandas?

1.2. Pandas is based on dictionaries and NumPy arrays (really fast and efficient), so I am not sure why the team did not use the industry-standard Pandas methods and instead spent time writing code to work with dictionaries.

Would you mind elaborating on this, please?

For example:

```
a = train['pagePath']
a = a.to_list()
a = set(a)
a = list(a)
```

This is just list(train['pagePath'].unique())

1.2. The dictionary could be formed by applying a function to the URL. You used the column “PagePath”, but said that the columns are:

[“User ID”, “Time Stamp”, “URL”, “Page Title” and “Page Views”]

I do not see how you derived this column list.

1.3. I do not see sorting by timestamp in your code.

1.4. I do not see sorting by visitor in the code, so all the loops in the MODEL step are incorrect, since:

while(visitor[j] == visitor[j+1])

depends on visitor (probably User ID)

1.5. KNN usually stands for K-Nearest Neighbours … What is it in your case?

1.6. Please show a snippet of your data … this code is puzzling:

What are knn_data[1] and knn_data[0]?

2. MODEL

What exactly are you predicting? Why do you use a regression model?

3. Please include charts and samples from your data.
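
To make the Pandas points from section 1 concrete, here is a minimal sketch. It assumes a small made-up table: the column names "visitor" and "timestamp" and all values are hypothetical, only "pagePath" comes from the quoted code. It shows the one-call unique list, sorting by visitor and time, and a per-visitor "next page" column that could stand in for a manual loop comparing visitor[j] with visitor[j+1]. This is not the team's code, just one possible way to do it.

```python
import pandas as pd

# Hypothetical sample data; only the "pagePath" column name is taken
# from the quoted code, everything else is an assumption.
train = pd.DataFrame({
    "visitor": ["u2", "u1", "u1", "u2", "u1"],
    "timestamp": pd.to_datetime([
        "2020-05-01 10:05", "2020-05-01 09:00", "2020-05-01 09:10",
        "2020-05-01 10:00", "2020-05-01 09:05",
    ]),
    "pagePath": ["/a", "/b", "/c", "/b", "/a"],
})

# Unique page paths in one call (order of first appearance).
unique_paths = list(train["pagePath"].unique())

# Sort by visitor, then by time, so that consecutive rows belong to the
# same visitor in chronological order.
ordered = train.sort_values(["visitor", "timestamp"]).reset_index(drop=True)

# Next page viewed by the same visitor; the last row of each visitor
# gets NaN because there is no following page.
ordered["nextPage"] = ordered.groupby("visitor")["pagePath"].shift(-1)

# Keep only rows that have a following page for the same visitor.
pairs = ordered.dropna(subset=["nextPage"])
print(unique_paths)
print(pairs[["visitor", "pagePath", "nextPage"]])
```

Grouping before shifting means a pair can never cross from one visitor to the next, so the result does not depend on how the raw file happened to be ordered.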

### Cryptocurrency Prediction by Kautilya

6. Would you bet your own money on your predictions? If so, how much?

### Cryptocurrency Prediction by Kautilya

1. You may want to include some evaluation metrics for your models both on train & test sets.

2. On the data prep part: simply removing rows with missing values is not the best solution, because this is time-series data and doing so could seriously bias your next steps.

3. The assumption you made about the “large number of missing values” is probably poor. Do you have any data or metric to support it?

4. You may want to include a more detailed explanation of why the data is not continuous (here is a link on discrete and continuous data: https://www.mathsisfun.com/data/data-discrete-continuous.html).

5. How would you rank your model? What metrics did you use?
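
As an illustration of points 1 to 3, here is a minimal sketch on a made-up price series (all values, the 10-minute frequency and the split point are assumptions, not the team's data). It measures the share of missing values, fills the gaps by time-based interpolation instead of dropping rows, and reports mean absolute error on a chronological train and test split. The "previous value" forecast is only a naive baseline standing in for a real model.

```python
import numpy as np
import pandas as pd

# Hypothetical price series on a regular 10-minute grid with two gaps.
idx = pd.date_range("2018-01-01", periods=8, freq="10min")
price = pd.Series([10.0, 10.5, np.nan, 11.5, 12.0, np.nan, 13.0, 13.5],
                  index=idx)

# Quantify the missing data before deciding how to handle it.
missing_share = price.isna().mean()

# Dropping rows leaves an irregularly spaced series.
dropped = price.dropna()

# Time-based interpolation keeps every timestamp in place.
filled = price.interpolate(method="time")


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(predicted))))


# Chronological split: earlier points for training, later ones for testing.
split = 6
train_part, test_part = filled.iloc[:split], filled.iloc[split:]

# Naive baseline (an assumption, not the team's model): predict the
# previous observed value.
pred = filled.shift(1)

# The first point has no previous value, so it is skipped on the train side.
train_mae = mae(train_part.iloc[1:], pred.iloc[1:split])
test_mae = mae(test_part, pred.iloc[split:])

print(f"missing share: {missing_share:.2%}")
print(f"rows after dropna: {len(dropped)}, after interpolation: {len(filled)}")
print(f"train MAE: {train_mae:.3f}, test MAE: {test_mae:.3f}")
```

Reporting the same metric on both parts makes it easier to see whether a model overfits, and a naive baseline gives a reference point for ranking it.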