NFT Datathon 2022 Goal Diggers Team


I. Business Understanding

Non-fungible tokens, better known as NFTs, are the latest phenomenon in the world of cryptocurrencies and are rapidly gaining popularity. Simply put, NFTs turn digital works of art and other collectibles into one-of-a-kind, verifiable assets that can be traded on a blockchain. Each NFT is a piece of digital content linked to the blockchain, the digital database behind cryptocurrencies such as Bitcoin and Ethereum. Cryptocurrencies are interchangeable, which means that they can be traded or exchanged for similar assets of the same dollar value. NFTs, on the other hand, are unique and non-interchangeable, which means that no two NFTs are the same. Anyone can create an NFT. All you need is a digital wallet, some Ether (ETH) and a connection to an NFT marketplace, where you can upload your content and turn it into an NFT or crypto art.

The market is defined by uniqueness: unlike familiar currencies such as USD, BGN, EUR or Bitcoin, each NFT is one of a kind and has its own characteristics, which makes its price harder to determine. These same circumstances lead to high price volatility. For example, a trend on social networks can raise demand dramatically, which can lead to speculation on the value of the asset.

II. Data Understanding and Data Exploration

We are provided with an extremely exciting and complex selection of data for the case. We have 3 data sets available:

  • Sales – containing information about NFT transactions for the period April 2021 – March 2022

Get-to-know the data: It contains 257,434 unique transactions recorded by block number and timestamp, divided into 8 collections. Interestingly, there are 73,795 unique senders (variable “from”) and 94,964 unique recipients (variable “to”), which may indicate an increase in the number of sellers and the formation of larger merchants. Two currencies are available, ETH and USD; in the next stage we keep only the observations in ETH so that prices are comparable. In terms of prices, there are 25,495 unique values in the amount column and 201,565 in gasPrice, from which we can conclude that some transactions take place at identical or similar amounts in ETH. Each contract address corresponds to one collection, for a total of 8 unique contract addresses.

Collections: the collection names are spelled differently across the data sets and should be standardized. The date format in the timestamp column also needs to be adjusted.
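The name standardization mentioned above could look roughly like the sketch below. The spelling variants are hypothetical, since the post does not list the actual ones, and the rule (lowercase, keep only letters and digits) is one possible choice rather than the team's confirmed method.

```python
import re


def normalize_collection(name: str) -> str:
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


# Hypothetical spellings of two collections as they might appear
# in the Sales, Traits and Tweets data sets.
variants = ["Cool Cats", "cool-cats", "CoolCats", "CloneX", "clone_x"]

# After normalization the five spellings collapse to two join keys.
canonical = {normalize_collection(v) for v in variants}
```

A normalized key like this can serve as the join column when the three data sets are merged, while the original spelling is kept for display.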

  • Traits – descriptive characteristics of the NFTs by collection

Get-to-know the data: There are 202,265 unique records distributed across 8 collections. Interestingly, there are 24,212 unique tokens, described by 55 different trait types that vary by collection.

Looking at the trait names (trait_type), we observe similar spellings that need to be grouped under common categories to reduce their number and simplify the analysis. The maximum token number (tokenID) in Traits is lower than in the Sales data set for the CloneX (18,895 versus 19,127) and Cool Cats (9,932 versus 9,940) collections. This may be due to differences in sampling periods.
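Grouping similar trait spellings can be sketched with a simple lookup table. The trait names and categories below are hypothetical placeholders, because the post does not list the real ones; only the idea of mapping many spellings to one category comes from the text.

```python
# Hypothetical mapping from raw trait_type spellings to a common category.
TRAIT_GROUPS = {
    "hat": "headwear",
    "hats": "headwear",
    "headwear": "headwear",
    "eye": "eyes",
    "eyes": "eyes",
    "eye color": "eyes",
}


def group_trait(trait_type: str) -> str:
    """Return the common category, or the cleaned name if no rule applies."""
    key = trait_type.strip().lower()
    return TRAIT_GROUPS.get(key, key)


raw_traits = ["Hat", " hats ", "Eye Color", "Background"]
grouped = sorted({group_trait(t) for t in raw_traits})
```

Unmapped names fall through unchanged (apart from trimming and lowercasing), so the table can be extended step by step as new spellings are found.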

  • Tweets – comments from the social network with reactions

Get-to-know the data: The last data set contains 3,081 tweets from the creators of the 8 collections.

The columns userscreenname and username contain the names of the collections (8 unique values), but the column user has only 7 unique records. This suggests inconsistencies in the data, so that column is not used in the rest of the analysis. Here again, the collection names are spelled differently and will need to be standardized before the data sets can be merged.

III. Feature Engineering

    1. Unique transactions, used as parent transactions. We create a key that connects the seller (from), the buyer (to) and the collection (tokenName) to a specific transaction time. Based on that key, we derive additional variables that are lead functions of a second key targeting recipient and sender: from-to and to-from. This lets us identify related transactions and self-transactions that create false volume, even when the price (amount) is greater than 0. In this way, 3 types of transactions are distinguished: self-transactions, related transactions and actual sales. Only actual sales are included in the rest of the analysis and used to define the price. Here is an example of an artificially created price increase with self-transactions:
    2. Rarity index: we create 2 types of weights to assess the rarity of an NFT. We take the serial number of the NFT in the collection (tokenID) and relate it to the total number of NFTs in the collection known at the moment. The first weight is 1 divided by the total number of NFTs in the collection. The second weight is the serial number of the NFT within its collection divided by the total number of NFTs in that collection. Our goal is to check whether the serial number and weight of the NFT affect the price and whether the size of the collection affects price formation. Here is an example of an exceptional outlier:
    3. Determining the tradable volume:
        -Total sales volume: shows the traded volume regardless of the type of transaction and of whether the traded price is 0.
        -Real sales volume: zero-price transactions, self-transactions and related transactions are not counted. Only real sale transactions are used here.
    4. Additional analysis of the traits data set to combine similar characteristics. We noticed similarities among different trait types and decided to combine them into one category. Here are the initial features:

After reallocating them to new categories:

  1. Next steps: evaluating crypto market volatility using external data sources.
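The transaction classification, the two rarity weights and the two volume variables described in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not the team's code: the sample rows are invented, a transaction counts as "related" whenever the same two wallets also traded the collection in the opposite direction (the post uses lead functions on from-to and to-from keys instead), volume is counted as a number of transactions, and the collection size of 20,000 is an assumed value.

```python
def classify_transactions(rows):
    """Label each row as 'self', 'related' or 'actual'."""
    # Every (collection, sender, recipient) triple seen in the data.
    seen = {(r["tokenName"], r["from"], r["to"]) for r in rows}
    labels = []
    for r in rows:
        if r["from"] == r["to"]:
            labels.append("self")
        elif (r["tokenName"], r["to"], r["from"]) in seen:
            # The same two wallets also traded in the opposite direction.
            labels.append("related")
        else:
            labels.append("actual")
    return labels


def rarity_weights(token_id, collection_size):
    """Return (1 / collection size, tokenID / collection size)."""
    return 1 / collection_size, token_id / collection_size


def volumes(rows, labels):
    """Return (total volume, real volume) as transaction counts."""
    total = len(rows)
    real = sum(
        1 for r, label in zip(rows, labels)
        if label == "actual" and r["amount"] > 0
    )
    return total, real


# Invented sample rows; the field names follow the Sales columns in the text.
sales = [
    {"from": "A", "to": "A", "tokenName": "Meebits", "amount": 50.0},
    {"from": "B", "to": "C", "tokenName": "Meebits", "amount": 60.0},
    {"from": "C", "to": "B", "tokenName": "Meebits", "amount": 65.0},
    {"from": "D", "to": "E", "tokenName": "Meebits", "amount": 3.0},
    {"from": "E", "to": "F", "tokenName": "Meebits", "amount": 0.0},
]

labels = classify_transactions(sales)
total_volume, real_volume = volumes(sales, labels)

# Assumed collection size of 20,000 tokens for the example.
collection_weight, unit_weight = rarity_weights(8319, 20_000)
```

In this sample only the D to E sale counts toward real volume: the first row is a self-transaction, the B and C rows mirror each other, and the last row has a price of 0.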

IV. Modelling and Evaluation

  1. Sentiment analysis of tweet data

We use several dictionaries to gain more knowledge of our tweet data:

-AFINN – assigns a numeric weight to each word in the posts. The sum over all words is -1,460, which suggests that negative words (including obscene ones) predominate in the posts. The most positive words include: breathtaking, hurrah, superb, thrilled, outstanding.

-Bing – divides the words in the posts into positive and negative. There are 4,781 negative words in the posts and only 2,005 positive ones.

-NRC – extracts different emotions from the posts. Here again, negative words predominate, but specific emotions such as anger, disgust and sadness can now be distinguished.
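To make the first two lexicon approaches concrete, here is a minimal sketch of how a summed word score and a positive/negative word count are computed. The lexicons and tweets are toy examples invented for illustration: the scores are placeholders and are not copied from the real AFINN or Bing lexicons, which are far larger.

```python
import re
from collections import Counter

# Toy AFINN-style lexicon: word -> numeric weight (placeholder values).
AFINN_LIKE = {"superb": 5, "thrilled": 5, "bad": -3, "scam": -4}

# Toy Bing-style lexicon: word -> polarity label.
BING_LIKE = {
    "superb": "positive",
    "thrilled": "positive",
    "bad": "negative",
    "scam": "negative",
}


def tokenize(text):
    """Split a tweet into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def afinn_total(tweets):
    """Sum the weights of all lexicon words across all tweets."""
    return sum(AFINN_LIKE.get(w, 0) for t in tweets for w in tokenize(t))


def bing_counts(tweets):
    """Count positive and negative lexicon words across all tweets."""
    return Counter(
        BING_LIKE[w] for t in tweets for w in tokenize(t) if w in BING_LIKE
    )


# Invented tweets for the example.
tweets = ["Superb drop, thrilled!", "Bad mint, total scam", "bad bad luck"]

total_score = afinn_total(tweets)   # 10 - 7 - 6 = -3
polarity = bing_counts(tweets)      # 2 positive words, 4 negative words
```

Words missing from a lexicon contribute nothing, so the totals depend heavily on lexicon coverage of crypto slang.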


  2. Linear regression and ARIMA models

Our attention is drawn to two main models: linear regression and ARIMA. Linear regression helps us understand the basic concepts and dependencies of NFT prices. Because of limited computational power, the regression is built for a single token, token_id 8319 of the Meebits collection. This NFT was chosen for its number of transactions (more than 700, of which 278 are real). The explanatory variables are the weight of the whole collection, the weight of the single unit, total volume and real volume; price is the response variable. We propose adding trait information per NFT to the model, because we expect the NFT's characteristics to have a strong influence. A score per influencer, obtained from the tweet data set, could then be added as well. The results are: Multiple R-squared 0.6976, Adjusted R-squared 0.692, Residual standard error 27.43 on 272 degrees of freedom.
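The reported fit statistics can be cross-checked with the standard adjusted R-squared formula. The sketch below assumes that all 278 real transactions entered the fit; together with the 272 residual degrees of freedom, that implies five estimated slope terms plus an intercept. The count of five is inferred from the arithmetic and is not stated in the text.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


n_obs = 278        # assumption: every real transaction was used in the fit
residual_df = 272  # reported with the residual standard error

# Residual df = n - p - 1, so p follows from the two numbers above.
n_predictors = n_obs - residual_df - 1

adj_r2 = adjusted_r_squared(0.6976, n_obs, n_predictors)
print(round(adj_r2, 3))  # 0.692, matching the reported value
```

Under that assumption the reported Multiple R-squared, Adjusted R-squared and degrees of freedom are mutually consistent.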

V. Further investigation

Our suggestions for further investigations are:

  1. Using tweet information: extract a score per user that could be used in the main model.
  2. Based on the traits information, we could analyze the given characteristics. Feature engineering is a must because NFT descriptions differ across collections. New variables derived from the tweet data should improve the model's results.
  3. Development of the main model

We recommend finalizing the feature engineering from the previous steps, then building a linear regression model with the presented variables to better understand the data. After that, an ARIMA model could predict future prices while taking trends and seasonality into account, and could stabilize the variance of the data.


9 thoughts on “NFT Datathon 2022 Goal Diggers Team”

  1. 0

    In many cases data prep is a key step, and in this case the data engineering, the data cleaning and the related analysis that you made are very important steps.
    In addition to that I appreciate your attempt to use sentiment analysis.
    And finally I like your well structured work :).
