NFT Datathon 2022
Mentor: Alexandar Efremov
Team: Daniel Pavlov, Martin Nenov, Aleksandar Svinarov
Technologies we used:
- PyCharm
- Google Colab
What was our approach:
We worked with two datasets: the NFT sales dataset and the NFT trait types dataset.
NFT traits dataset:
We made a function that generates a new traits dataset containing only the rarity score of each trait for each NFT. We later use this data to train our model.
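A minimal sketch of how such a rarity score could be computed (the exact formula is not spelled out above, so this assumes the common inverse-frequency definition, where a trait value shared by few NFTs gets a high score; the function and column names are placeholders):

```python
import pandas as pd

def build_rarity_dataset(traits: pd.DataFrame) -> pd.DataFrame:
    """Replace each trait value with a rarity score.

    Assumes `traits` has one row per NFT, a 'token_id' column, and one
    column per trait type (e.g. 'Ear', 'Hair').
    """
    rarity = traits[["token_id"]].copy()
    n = len(traits)
    for col in traits.columns.drop("token_id"):
        counts = traits[col].value_counts()        # occurrences of each trait value
        rarity[col] = n / traits[col].map(counts)  # rarer value -> higher score
    return rarity
```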
NFT sales dataset:
We wanted to add more features to our sales dataset, so we added a ‘time_diff’ feature that tells us how much time elapsed before the next transaction took place.
We also added a ‘price_diff’ feature that gives us the price difference between consecutive transactions.
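A minimal sketch of how these two features could be derived with pandas (assuming ‘token_id’, ‘timestamp’ and ‘amountUSD’ columns, names taken from the feature list below, and that consecutive sales of the same NFT are compared):

```python
import pandas as pd

def add_diff_features(sales: pd.DataFrame) -> pd.DataFrame:
    """Add 'time_diff' and 'price_diff' to the sales dataset."""
    sales = sales.sort_values(["token_id", "timestamp"]).copy()
    per_token = sales.groupby("token_id")
    # time elapsed since the previous transaction of the same token
    sales["time_diff"] = per_token["timestamp"].diff()
    # price change relative to the previous transaction (profit/loss)
    sales["price_diff"] = per_token["amountUSD"].diff()
    return sales
```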
Filtering of sales dataset:
We made a function that filters out runs of 3 consecutive transactions whose total ‘time_diff’ is greater than or equal to 24 hours, which we think would hurt the training of the model.
We also removed transactions with zero loss/profit.
The Python file that includes our functions for filtering and generating the new datasets:
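A minimal sketch of what these filtering functions could look like (assuming the columns added above, ‘time_diff’ measured in seconds, and simplifying “remove 3 consecutive transactions” to dropping the row that closes an offending window):

```python
import pandas as pd

SECONDS_24H = 24 * 3600  # assumes 'time_diff' is in seconds

def filter_sales(sales: pd.DataFrame) -> pd.DataFrame:
    """Apply the two filters described above (simplified sketch)."""
    sales = sales.sort_values(["token_id", "timestamp"]).copy()
    # rolling sum of 'time_diff' over 3 consecutive sales of the same token
    window_sum = (
        sales.groupby("token_id")["time_diff"]
        .rolling(window=3, min_periods=3)
        .sum()
        .reset_index(level=0, drop=True)  # re-align with the original index
    )
    sales = sales[window_sum.isna() | (window_sum < SECONDS_24H)]
    # drop transactions with zero profit/loss
    return sales[sales["price_diff"] != 0]
```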
Data Prep
In our data preparation we consider data from the filtered and improved NFT sales dataset and the newly generated NFT rarity score dataset.
What we decided to exclude from the NFT sales dataset: the transaction hash, the from and to addresses, the currency (ETH), and the amount, because we will use amountUSD instead.
We still make use of the from and to addresses indirectly, through the newly added ‘time_diff’ feature.
Features we will use: timestamp, token_id, gas_price, block_number, amountUSD, time_diff, price_diff, and the rarity score for each trait.
We merge the data from the rarity score dataset and the new sales dataset into a new dataset that will feed into our models.
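A sketch of this merge step (assuming both tables share the ‘token_id’ key):

```python
import pandas as pd

FEATURES = ["timestamp", "token_id", "gas_price", "block_number",
            "amountUSD", "time_diff", "price_diff"]

def build_training_dataset(sales: pd.DataFrame, rarity: pd.DataFrame) -> pd.DataFrame:
    # every sale of a token gets that token's per-trait rarity scores attached
    return sales[FEATURES].merge(rarity, on="token_id", how="inner")
```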
What was our approach with training and testing models:
The dataset we will be using:
First we use a MinMaxScaler to scale our data; it maps each feature into the [0, 1] range via x' = (x - min) / (max - min).
We checked the correlation between our features, and it does not look good.
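One quick way to inspect this (a sketch; `dataset` is the merged table from the previous step, and amountUSD is assumed to be the prediction target):

```python
# pairwise correlations of the numeric features
corr = dataset.corr(numeric_only=True)
# how strongly each feature correlates with the (assumed) target
print(corr["amountUSD"].sort_values(ascending=False))
```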
We wrote a function to test and evaluate the performance of each model, so we can decide which approach is best:
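A minimal sketch of what such an evaluation helper could look like with scikit-learn (assuming a plain random train/test split, which is an assumption here; see the comments below about possible leakage):

```python
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error

def evaluate_model(model, X, y, test_size=0.2, seed=42):
    """Fit one candidate model and report held-out metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    scaler = MinMaxScaler()
    # fit the scaler on the training split only, so no test-set
    # statistics leak into training
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    return {"r2": r2_score(y_test, preds),
            "mae": mean_absolute_error(y_test, preds)}

# Example usage with a few candidate regressors:
# from sklearn.linear_model import LinearRegression
# from sklearn.ensemble import RandomForestRegressor
# for m in (LinearRegression(), RandomForestRegressor()):
#     print(type(m).__name__, evaluate_model(m, X, y))
```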
What we can improve:
4 thoughts on “PMI”
A few questions that I think would be good to address in the article and presentation:
Rarity: Can you include your approach to the rarity score?
Features: Where do the values for ‘Ear’, ‘Hair’ etc. come from?
Results: Can you do some sort of visual comparison between the test predictions and the ground truth? Also, how would you approach the confidence range of your predictions? How does the confidence of your predictions correlate with how good the predictions are?
Huge thumbs up! I really liked the systematic approach in choosing the “right” model.
Further suggestions: you might want to explore how different test/train splits affect your model (a very similar idea to CV, but in the sense of testing the actual test set, which ideally should be curated).
It is good that you tried different algorithms; however, it is not clear how the data was split into train/test. Based on the results, it looks like there is data leakage between the train and test sets.
– business case and data understanding – 5
– data exploration – 5
– methods – 4
– rarity score – 4.5
– image handling – 0
The data cleaning is important, and it is good that you focused on that, as well as on the generation of the additional derived variables based on the timestamp and the price.
I also like that you trained different models.
Finally, the very high R2 shows that there is something wrong. 🙂
Also, as I understood it, you predict the prices of all NFTs. What about a price forecast for a single NFT?