
For how many years have you been experimenting with data? 5
Popular articles by zenpanik
ACADEMIA DATATHON CASE: THE A.I. CRYPTO TRADER
Datathon Ontotext Mentors’ Guidelines – Text Mining Classification
The SAP Case using KNIME and Multiple Linear Regression Method
THE A.I. CRYPTO TRADER: cryptomonkeys
Case NetInfo/Vesti.bg article recommendation — Team Army of Ones and Zeroes — Datathon 2020
Tiny smart data modelled with a not-so-tiny smart model – the Case of SAP
Antelope SAP
Datathon Kaufland Mentors’ Guidelines – On Predictive Maintenance
Datathon Sofia Air Mentors’ Guidelines – On IOT Prediction
Datathon Telenor Mentors’ Guidelines – On TelCo predictions
Datathon NSI Mentors’ Guidelines – Economic Time Series Prediction
Popular comments by zenpanik
Datathon-HackNews-Solutions-Data Titans
Great – lots of content and interesting charts and numbers.
ShopUp Datathon2020 – Article recommender case
Nice idea to use BERT – I would be happy to see the results of the Russian model on Bulgarian text.
Please share the results from the RNN model.
Datathon 2020 – NetInfo Article Recommender – Newbies
1. Data Prep
1.1. What is the result of the describe method in Pandas?
1.2. Pandas is built on NumPy arrays and dictionary-like structures, which are really fast and efficient. I am not sure why the team did not use the industry-standard Pandas methods
and instead spent time writing code to work with dictionaries.
Would you mind elaborating on this, please?
For example:
```
a = train['pagePath']
a = a.to_list()
a = set(a)
a = list(a)
```
This is just list(train['pagePath'].unique())
1.2. The dictionary could be formed by applying a function to the URL – you used the column “PagePath” but said that the columns are:
[“User ID”, “Time Stamp”, “URL”, “Page Title” and “Page Views”]
I do not see how you derived this column list.
1.3. I do not see sorting by timestamp in your code.
1.4. I do not see sorting by visitor in the code – so all the loops in the MODEL step are not correct, since:
while(visitor[j] == visitor[j+1])
depends on visitor (probably User ID)
1.5. KNN usually stands for K-Nearest Neighbours … What is it in your case?
1.6. Please show a snippet of your data … this code is puzzling:
What are knn_data[1] and knn_data[0]?
2. MODEL
What are you predicting exactly? Why do you use a regression model?
3. Please include charts and samples from your data.
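The data prep points above (describe, unique values, sorting by visitor and timestamp) could be sketched roughly as follows. This is only an illustration: the sample rows are made up, and apart from pagePath, which appears in the reviewed code, the column names visitor and timestamp are assumptions about the team's data.

```python
import pandas as pd

# Hypothetical sample; only 'pagePath' appears in the reviewed code,
# 'visitor' and 'timestamp' are assumed column names.
train = pd.DataFrame({
    "visitor": ["u2", "u1", "u1", "u2", "u1"],
    "timestamp": pd.to_datetime([
        "2020-05-01 10:05", "2020-05-01 09:00", "2020-05-01 09:30",
        "2020-05-01 10:00", "2020-05-01 09:15",
    ]),
    "pagePath": ["/a", "/b", "/a", "/c", "/c"],
})

# Quick summary of a column (count, number of unique values, most frequent value).
summary = train["pagePath"].describe()

# Distinct pages in order of first appearance; replaces the to_list/set/list chain.
unique_pages = list(train["pagePath"].unique())

# Sort by visitor, then by time, so each visitor's rows are adjacent and ordered.
train_sorted = train.sort_values(["visitor", "timestamp"]).reset_index(drop=True)

# Next page seen by the same visitor; missing on each visitor's last row.
# This avoids a manual loop that compares visitor[j] with visitor[j+1].
train_sorted["nextPage"] = train_sorted.groupby("visitor")["pagePath"].shift(-1)

print(summary)
print(unique_pages)
print(train_sorted)
```

With the rows sorted this way, any loop that compares a visitor with the next row's visitor sees each visitor's visits together and in time order.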
Cryptocurrency Prediction by Kautilya
6. Would you bet your own money on your predictions? If so, how much?
Cryptocurrency Prediction by Kautilya
1. You may want to include some evaluation metrics for your models on both the train and test sets.
2. On the data prep part – simply removing rows with missing values is not the best solution, because this is time-series data and dropping rows could seriously bias your next steps.
3. The assumption you made about the “large number of missing values” is probably poor. Do you have any data or metric that supports it?
4. You may want to include a more detailed explanation of why the data is not continuous (here is a link on discrete and continuous data: https://www.mathsisfun.com/data/data-discrete-continuous.html).
5. How would you rank your model? What metrics did you use?
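As a rough illustration of points 1 to 3 and 5, the sketch below measures the share of missing values, fills gaps by time interpolation instead of dropping rows, and reports MAE and RMSE on a chronological train/test split. The price series is invented for the example, and the "previous value" forecast is only a stand-in baseline, not the team's model.

```python
import numpy as np
import pandas as pd

# Hypothetical daily price series with gaps (values are made up).
idx = pd.date_range("2018-01-01", periods=10, freq="D")
price = pd.Series(
    [10.0, 11.0, np.nan, 13.0, 14.0, np.nan, np.nan, 17.0, 18.0, 19.0],
    index=idx,
    name="price",
)

# Share of missing values: a number to back up (or refute) "large number of missing values".
missing_share = price.isna().mean()

# Fill gaps along the time index rather than deleting rows from the series.
filled = price.interpolate(method="time")


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Stand-in baseline: predict each value with the previous one.
frame = pd.DataFrame({"y": filled, "pred": filled.shift(1)}).dropna()

# Chronological split: earlier rows for train, the last three for test.
train_part, test_part = frame.iloc[:-3], frame.iloc[-3:]

for name, part in [("train", train_part), ("test", test_part)]:
    print(name, "MAE:", mae(part["y"], part["pred"]), "RMSE:", rmse(part["y"], part["pred"]))
print("missing share:", missing_share)
```

Reporting the same metrics on both splits makes it possible to rank models against each other and against a simple baseline like this one.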