Intro
NFT is a term that has become popular over the last couple of years, so we wrote this article to help you dive deeper into the topic; there is also a video on the same subject, in Bulgarian.
Data Use Case Introduction
Nexo, a leading crypto financing institution, was the first to offer NFT-backed lending services. This market segment differs from Nexo’s standard service, in which the Company accepts highly liquid collateral (cryptocurrencies) that trades around the clock in high volumes across multiple venues. Consequently, the Company must be much more careful when assessing the value of each NFT asset and understanding its liquidity, in order to evaluate the maximum downside risk it is willing to accept and to remain capable of making liquidations if needed.
How to assess the value of an NFT? A case considering NFT-specific characteristics such as rarity and illiquidity
The market price of a unique NFT is not easily observable. Once the public sale of the project is over, the new owner of the NFT decides whether, and at what price, the asset should be traded. Factors affecting the price include:
- Rarity;
- Social media exposure of the collection or of the particular asset, driven by 1) the team behind the collection; 2) large collectors; 3) social media influencers;
- General strength of the crypto/capital markets.
a) Digging into rarity
Each NFT within a collection is unique, and as we explained above, rarity can play an important role in assessing the value of each NFT piece. The first step is to understand the rarity of a specific set of traits and compare it to the other items within the collection.
Expected Outcome:
The expected output of this first subtask is a rarity index (or percentage) that assesses the uniqueness of a specific asset relative to the whole collection, and, on top of that, an evaluation of how rarity affects the liquidity of the group of assets within a given rarity rank.
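As a starting point, the rarity index can be sketched as the sum of inverse trait frequencies, ranked as a percentile across the collection. This is a minimal sketch assuming the `token_id` / `trait_type` / `value` columns described in the dataset section; the scoring scheme itself is one common convention, not the one prescribed by the case.

```python
import pandas as pd

def rarity_scores(traits: pd.DataFrame) -> pd.Series:
    """Percentile rarity rank per token: rarer trait values add more weight.

    Expects one row per (token, trait) with columns: token_id, trait_type, value.
    """
    n_tokens = traits["token_id"].nunique()
    # Fraction of tokens carrying each (trait_type, value) pair.
    freq = traits.groupby(["trait_type", "value"])["token_id"].transform("count") / n_tokens
    scored = traits.assign(trait_rarity=1.0 / freq)
    # Sum per token, then express as a percentile within the collection.
    return scored.groupby("token_id")["trait_rarity"].sum().rank(pct=True)
```

A token carrying a trait value shared by few others (like the earring in the example above) ends up near the top of the percentile ranking.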
b) Analyzing liquidity
The second component the Company should assess is the liquidity conditions of a particular NFT and the collection it belongs to. We mentioned earlier that in this new collectibles market, matching buyers with sellers may take time. In assessing the liquidity of an NFT or a particular subsection of the collection we may consider the following:
- Recent transactional volume within the collection. Assess the general liquidity on NFT marketplaces such as OpenSea by understanding how easily and quickly a specific collection trades. For the purpose of this Datathon, we are providing a full data set of 4 collections.
- Correlating the rarity index (calculated in the previous section) with the transactional data.
- Asset concentration & ownership – How many people / addresses are holding the NFTs within the collection? Are too many NFTs from the collection held by a limited number of economic actors that may have a strong influence on the price? (Each address can be considered an individual identity.)
- Historical volatility of the collection and assets within it.
- Status & strength of the crypto market: whether the general crypto market (mostly Bitcoin and Ethereum movements) is experiencing positive returns and gaining value or not.
- Wash Trading – is the volume artificially being created by some collectors buying and selling to themselves using different addresses to create fake volume and hype?
Expected Outcome:
The expected output in this section is an analysis of the transactional history of the asset and the collection it belongs to, considering also general market dynamics, returning as output a liquidity percentage / liquidity rank for the particular asset and collection. A nice-to-have would be a confidence interval for the asset’s price over the next 30 days.
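One simple way to turn transactional history into a liquidity rank is to combine trade counts with the average time between sales. This is a sketch under the assumption that sales rows carry the `tokenID` and `timeStamp` (UNIX seconds) fields described later; the exact weighting is an illustrative choice, not the required method.

```python
import pandas as pd

def liquidity_rank(sales: pd.DataFrame) -> pd.Series:
    """Percentile liquidity rank per token: more trades and shorter
    gaps between consecutive sales imply higher liquidity.

    Expects one row per sale with columns: tokenID, timeStamp (UNIX seconds).
    """
    g = sales.sort_values("timeStamp").groupby("tokenID")["timeStamp"]
    trades = g.count()
    # Mean seconds between consecutive sales; tokens with a single sale
    # get an infinite gap (least liquid by this measure).
    mean_gap = g.apply(lambda t: t.diff().mean()).fillna(float("inf"))
    # Higher trade count and shorter gap both push the score up.
    score = trades.rank(pct=True) + (-mean_gap).rank(pct=True)
    return score.rank(pct=True)
```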
c) How to value a NFT
The information and results from sections a) and b) provide a better understanding of some of the most important value drivers for NFT collections. In this section we combine this information with general crypto market conditions and sentiment, as well as other variables, to develop a pricing algorithm.
Expected Outcome:
The expected output in this section is a relative valuation model that considers different inputs, such as i) NFT ID & collection, ii) rarity (from section a), iii) liquidity characteristics (from section b), iv) general crypto market conditions, and other variables.
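A relative valuation model can be as simple as a regression of log prices on these features. The sketch below uses ordinary least squares over a feature matrix of rarity, liquidity, and market variables; the feature set and functional form are assumptions for illustration, not the required design.

```python
import numpy as np

def fit_relative_valuation(X: np.ndarray, prices: np.ndarray) -> np.ndarray:
    """Fit log-price on features (e.g. rarity, liquidity, ETH return)
    by ordinary least squares; returns coefficients with the intercept first."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, np.log(prices), rcond=None)
    return coef

def predict_price(coef: np.ndarray, features) -> float:
    """Map fitted coefficients and a feature vector back to a price estimate."""
    return float(np.exp(coef[0] + np.asarray(features) @ coef[1:]))
```

Working in log prices keeps the model multiplicative, which suits the premium/multiplier framing used later in the case.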
d) Modeling the given transactional set with the Twitter data
In the past months, the NFT sector has also caught the attention of famous people (singers, actors, soccer players, etc.) who find it cool to collect NFTs and show them as profile pictures on so-called “crypto Twitter”. To name a few, Gary Vee (VeeFriends), Eminem (BAYC), Gwyneth Paltrow (BAYC & FlowerGirls), Neymar (BAYC), Thalia (Robotos) and Jordan Belfort (CryptoPunks) have joined the space, buying into different collections.
The price of an NFT can be impacted by internal factors (the team behind it executing its roadmap and building its community) and external factors (marketing / Twitter raids). We want to address the possibility of price variation caused by these influencers / recognized people in the crypto space across platforms.
Using Twitter data to evaluate sentiment towards a project.
Expected Outcome:
The expected output in this section is an analysis of the relationship between the influencers, the tweets they write, and the performance of the specific NFT collections mentioned. It will help quantify the impact each of these tweets has. If some relationship exists, we expect a premium / multiplier directly related to a certain influencer (say, a “Gary Vee Multiplier”).
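A first cut at such a multiplier is an event-study ratio: the average next-day return after an influencer’s tweets, divided by the overall average daily return. This is a simplistic sketch (no controls for market moves or confounders) assuming a daily price series per collection; the one-day window is an arbitrary illustrative choice.

```python
import pandas as pd

def tweet_multiplier(prices: pd.Series, tweet_dates: list) -> float:
    """Ratio of mean next-day return after tweets to the overall mean
    daily return. Values above 1 suggest tweets coincide with
    above-average moves (correlation, not causation).

    prices: daily collection price indexed by a DatetimeIndex.
    """
    daily_ret = prices.pct_change().dropna()
    post = [daily_ret.get(pd.Timestamp(d) + pd.Timedelta(days=1)) for d in tweet_dates]
    post = [r for r in post if r is not None]  # drop dates outside the series
    baseline = daily_ret.mean()
    return (sum(post) / len(post)) / baseline if post and baseline else float("nan")
```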
e) How can we identify and assess the owner and provenance of a specific NFT based on the profile picture of a Twitter Account?
Similarly to traditional art pieces, the price of an NFT could be largely determined by an asset’s provenance (the history of ownership of a specific asset) and its visibility (for social recognition purposes).
Twitter has already established a feature, within its Twitter Blue paid subscription, that allows NFT owners to authenticate the NFTs displayed in their profile photos. However, to simplify the discovery of an NFT’s heritage and make it more accessible, we are tasked with building a search engine for NFTs (think of it as an NFT-specific version of Google’s image search).
With this engine, we input a Twitter account handle (say @ish), compare that account’s profile picture with an existing NFT database, and find the specific collection and token ID.
Next, it should map the token ID to the owner’s Ethereum address and the time elapsed since the token was transferred into that address in an arm’s-length transaction (e.g. one involving token transfers in return for the profile picture). Finally, it should look up the marketplace databases for ongoing auctions of similar assets (based on rarity / common traits) and return purchasing options.
Expected Outcome:
The expected outcome here is a search algorithm that recognizes a specific profile picture (of a Twitter account), maps it to an NFT database, and returns the token holder’s address as output. (A predefined set of image URLs will be provided.)
On top of that, you may want to develop analytics tracking ownership and trading history, and tools that could help identify actionable investment ideas within the dataset.
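The image-matching core can be prototyped with a tiny average hash: downsample each image to an 8×8 grid, threshold at the mean, and match by Hamming distance. This is a toy sketch over grayscale NumPy arrays; a real pipeline would use a dedicated perceptual-hashing library and fetch images from the provided URLs.

```python
import numpy as np

def ahash(img: np.ndarray, size: int = 8) -> np.ndarray:
    """Average hash of a grayscale image: block-average down to
    size x size, then threshold at the mean -> boolean bit grid."""
    h, w = img.shape
    small = img[: h - h % size, : w - w % size]
    small = small.reshape(size, h // size, size, w // size).mean(axis=(1, 3))
    return small > small.mean()

def find_nft(query: np.ndarray, db: dict) -> str:
    """Return the key in `db` whose image hash is closest to the query
    by Hamming distance (XOR of the bit grids)."""
    qh = ahash(query)
    return min(db, key=lambda k: np.count_nonzero(ahash(db[k]) ^ qh))
```

Average hashing is robust to mild compression and resizing, which is exactly the kind of distortion a profile picture undergoes between the marketplace and Twitter.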
Data Set and Descriptions
Collection Traits (Asset Characteristics)
Each collection consists of a fixed number of NFTs with unique identifiers, found in the token_id column. In addition, a collection can have numerous trait_type values, possibly 10–20. These trait types are not standard across collections, so we will do our best to standardize them. Not all NFTs in a collection have all of the collection’s traits, and this contributes to the RARITY of the NFT. For example, TokenID 1 has attributes: eyes, mouth, fur, hat, background; in addition to those traits, TokenID 7 has an earring trait.
Collection transfers
- Each NFT transfer is associated with an Ethereum transaction. However, not all of them are sales of the NFT: a record may be a simple transfer, a mint transaction or a sale.
- We have a blockNumber, and a hash, both of which could be easily used to query transactions on the blockchain explorer Etherscan.io.
- from and to represent the sender and receiver addresses, respectively, or the seller and buyer when the transaction is a sale.
- tokenID is the unique identifier of the NFT sold within the collection; it is also the key for joining the transfer and traits data.
- gasUsed and cumulativeGasUsed are two transaction specifications which represent the cost of the transaction. In ethereum the cost of the transaction is not static but rather is determined by the busyness of the network through an auction. A more congested network means more expensive transactions and vice versa.
- value – a field which determines whether a transaction is a sale.
- Marketplace – if the transaction is a sale, the platform on which the trade occurred
- currency – the currency in which the user paid for the NFT
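The three record types above can be told apart from the fields themselves. The sketch below assumes the conventions described here (mints originate from the zero address; a non-zero `value` marks a sale); verify these against the actual files before relying on them.

```python
def classify_transfer(row: dict) -> str:
    """Label a transfer record as 'mint', 'sale', or plain 'transfer'.

    Sketch assumptions: mints come from the zero address, and a
    non-zero `value` field marks a sale (per the field description above).
    """
    ZERO = "0x0000000000000000000000000000000000000000"
    if row["from"] == ZERO:
        return "mint"
    if float(row.get("value", 0) or 0) > 0:
        return "sale"
    return "transfer"
```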
BTC/ETH Price data
- Data for the asset prices and volume. It can be used to model general crypto markets volatility and conditions.
- The extract is from daily candle data, but may be switched to hourly candles in the future.
Twitter data
- In the NFT industry, the success of a given project is highly influenced by the sentiment of the community. This along with the hype surrounding the project are among the biggest drivers for the price increase of the collections. This makes being able to analyze the sentiment one of the most important tools when trying to predict the future moves in the market.
- A dataset containing all original tweets from the Twitter profiles of the collections themselves for the past year (NOTE: BAYC and MAYC have one profile as they are created by the same company). The dataset contains different engagement metrics such as likes, comments and retweets, which can be used to analyze the sentiment over time.
- The data will be used for sentiment analysis and general interest of the NFT community towards the projects
- username refers to the Twitter handle of the account
- comments, likes and retweets are the main engagement statistics for a tweet and each of them refers to the corresponding metric
- embedded_text is the main body of the tweet
Variables:
open – refers to the first price at which the asset trades that particular day
high – refers to the highest price at which the asset trades that day
low – refers to the lowest price at which the asset trades that day
close – refers to the last price at which the asset trades that day
volume– refers to total amount of money traded that day
num_trades– refers to the total number of settlements that occur that day
exchange – refers to the platform / marketplace on which the trade occurs, in this case Binance
symbol – refers to the specific cryptocurrency or asset to which the information belongs
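These candle fields are what you need to model general market volatility. A minimal sketch, assuming daily close prices: compute log returns and an annualized rolling standard deviation (the 365-day annualization factor is a convention for crypto’s round-the-clock trading).

```python
import math

def rolling_volatility(closes: list, window: int = 7) -> list:
    """Annualized rolling std-dev of daily log returns from close prices.

    Returns one volatility value per full window of returns
    (sketch; assumes daily candles and 365 trading days per year).
    """
    rets = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    out = []
    for i in range(len(rets) - window + 1):
        w = rets[i : i + window]
        mean = sum(w) / window
        var = sum((r - mean) ** 2 for r in w) / (window - 1)  # sample variance
        out.append(math.sqrt(var) * math.sqrt(365))
    return out
```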
Data description
“…_complete_dataset”
contract – contract for returned NFT
token_id – The number of the Token in the collection, unique to each Token
nft-url – URI representing the location of the NFT’s original metadata blob. This is a backup for you to parse when the metadata field is not automatically populated.
Hint: to download the image you need to append a path to the URL, e.g. if you are working with the azuki collection, add “/0.png”.
trait_type – traits/attributes/characteristics for each NFT asset.
value – the value of the corresponding trait_type for the NFT asset.
“…_sales”
block_number – number of the block in which the transaction is recorded
timeStamp – the date and time at which the transaction was mined, as a UNIX timestamp
hash – the string representing the transaction hash to check the execution status
nonce – The nonce is the number of transactions sent from a given address.
blockHash – the string representing the hash of the block containing the transaction
from – the ETH address of the sender of the Token
contractAddress – the contract address of the token (the NFT collection)
to – the ETH address that received the token
tokenID – The number of the Token in the collection, unique to each Token
tokenName – the long name by which the token contract should be known
tokenSymbol – the symbol by which the token contract should be known
tokenDecimal – refers to how divisible a token can be, from 0 (not at all divisible) to 18 (pretty much continuous) and even higher if required. Technically speaking, the decimals value is the number of digits that come after the decimal place when displaying token values on-screen.
gas – Maximum gas allocated for the transaction/amount of gas supplied for this transaction
gasPrice – cost per unit of gas specified for this transaction in ether or gwei, depending on the decimal value
gasUsed – the amount of gas used for this specific transaction
currency – The cryptocurrency used to pay for the sale, either ETH or one of the stablecoins
value – The scalar value equal to the number of units (in Ethereum Wei) to be transferred to the message call’s recipient or, in the case of contract creation, as an endowment to the newly created contract.
valueUSD – value in USD
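Since `value` is denominated in Wei (10^18 Wei = 1 ETH), a typical first step is converting it to ETH and joining sales to the traits table on the token identifier. A sketch assuming the column names described above (note the traits file also has a `value` column, renamed here to avoid colliding with the sale value):

```python
import pandas as pd

WEI_PER_ETH = 10**18

def sales_with_traits(sales: pd.DataFrame, traits: pd.DataFrame) -> pd.DataFrame:
    """Join sale rows to trait rows on the token identifier and add
    the sale value expressed in ETH. Column names follow the field
    descriptions above; adjust if the actual files differ."""
    out = sales.merge(
        traits.rename(columns={"value": "trait_value"}),
        left_on="tokenID", right_on="token_id", how="left",
    )
    # `value` may arrive as a string of Wei; float avoids int64 overflow.
    out["value_eth"] = out["value"].astype(float) / WEI_PER_ETH
    return out
```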
“hackathon_tweets”
userscreenname – the Twitter name of the profile
username – the Twitter handle of the profile
timestamp – date and time of the tweet
text – contains information about the Twitter name, handle and date of the tweet
embedded_text – the full text body of the tweet
emojis – the emojis used in the text body of the tweet
comments – the amount of comments on the tweet (if NaN, there were no comments)
likes – the amount of likes on the tweet
retweets – the amount of retweets on the tweet (similar to shares on Facebook)
tweet_url – the link to the original tweet
user – the name of the profile of the collection or the creator of the collection
THE DATASETS
MENTORS’ RECOMMENDATIONS
When looking for a solution I would answer the questions:
What is the business goal and what is the data science task?
What is the meaning of the data?
What model is appropriate?
What data preparation helps me?
How to assess the model accuracy and how to avoid overfitting?
Which model is the best given the time frame?
Ok, I found a solution, what’s next?…
And I have to keep in mind to document my steps – this is what the jury will see in the end
1: First I’d come up with a way to relate NFT market valuation to the ETH/BTC pair.
2: I’d think about how daily ETH/BTC volume relates to price action of NFT collections
3: From there I’d come up with a valuation for individual collections in the space
4: I’d think of the way rarity in a collection contributes to the price of each individual NFT inside the collection; I’d want to know whether individual traits or groups of traits contribute to the valuation of a single NFT
Consider rarity as a categorical split within the collection (e.g. the most rare quarter, mid tiers, the most common quarter, or other categorical splits), and how this compares to the liquidity of the individual asset.
5: Come up with a risk-free rate against which to compare asset prices. I’d think about inflation and the cost of borrowing capital, i.e. treasury notes.
6: At this point we should have a volatility variable and a rarity variable; we can now see which single NFTs or collections, if any, outperform the market, and come up with a single custom metric.
7: Find the confidence interval of the metric, and evaluate performance including slippage due to market conditions.
- Create an easily queryable dataset;
- unify trait_types and values (‘clone_x’ might be difficult because of all the abbreviations)
- start simple: NFT with collection, dictionary with trait_type/value, some basic trading info (initial price, final price, a simple volatility metric, and mean delta time between transactions)
- Run some basic algorithms (like tf-idf on the traits), or just simple correlation between traits and price, if there is similarity between trait values, maybe substitute the trait with an embedding
- Look for ‘unique’ traits and see their relation to price and trade amount
- Try to decouple the collection from the analysis: run the same algorithm/statistic/visualization on each collection