404 Telenor Social Media Not Found

Social networks are characterized by the links between their nodes. We employed six different link analysis algorithms to rank the nodes in the network by importance. For the task of leader detection, the best link analysis algorithm proved to be vanilla PageRank: out of the top 2000 nodes, 1725 are leaders, giving a precision of 0.8625 and a recall of 0.8792.
We also explored an alternative solution based on embeddings. We trained a SkipGram model in which the context of a given node is its neighbourhood. To offset the word-order assumption made by the model, we repeated and reshuffled longer neighbourhoods. These embeddings were then used to train a softmax classifier, achieving a macro-F1 score of 0.6667.
We further provide a visualization tool which can be used to explore the graph manually.

TELENOR CASE

Business understanding

Linked data holds great potential for unlocking hidden insights through the investigation of social structures, as businesses are interested in understanding the relationships their customers have with other people in their network. Identifying communities and influencers within these networks complements and enhances existing BI solutions in risk management, fraud investigation, churn mitigation, new customer acquisition, and viral marketing. This article focuses on two main problems for the Telenor case:

  1. Identifying leaders in the community
  2. Calculating degree of association with a given community

Data understanding

The original dataset contains 1,452,747 observations with the following variables:

  • Subscriber A (caller party ID)
  • Subscriber B (called party ID)
  • Label (Strength of connection – Low, Medium, or High)
  • Real Event Flag (0 and 1).

Data preparation

Exclusions

We excluded 176,337 observations from the original dataset: those for which the Real Event Flag is 0, as they are considered “fake” calls.

Derivations

For modelling purposes the Label variable was assigned numerical values as follows:

  • High = 1.0
  • Medium = 0.5
  • Low = 0.25
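The exclusion and derivation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (`subscriber_a`, `subscriber_b`, `label`, `real_event_flag`) and the sample rows are hypothetical, since the original file layout is not shown here. Only the label-to-weight mapping comes from the text.

```python
import csv
import io

# Label-to-weight mapping taken from the list above.
LABEL_WEIGHTS = {"High": 1.0, "Medium": 0.5, "Low": 0.25}


def prepare_edges(rows):
    """Drop rows whose Real Event Flag is 0 and attach a numeric weight.

    `rows` is an iterable of dicts; the key names are assumptions.
    Returns a list of (caller, called, weight) tuples.
    """
    edges = []
    for row in rows:
        if row["real_event_flag"] != "1":
            continue  # excluded: not a real event
        weight = LABEL_WEIGHTS[row["label"]]
        edges.append((row["subscriber_a"], row["subscriber_b"], weight))
    return edges


# Tiny made-up sample standing in for the real call records.
sample = io.StringIO(
    "subscriber_a,subscriber_b,label,real_event_flag\n"
    "A,B,High,1\n"
    "A,C,Low,0\n"
    "B,C,Medium,1\n"
)
edges = prepare_edges(csv.DictReader(sample))
print(edges)
```

The resulting weighted edge list is the kind of input a weighted ranking algorithm can consume directly.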

Modelling

To solve this task we employed the following methods:

  1. We calculated two graph ranking algorithms:
    1. PageRank
    2. Hubs & Authorities
  2. We calculated node embeddings for the graph by training a SkipGram model
  3. We trained a neural network using the embeddings from 2. to predict association of a given node with a given group.

1. Link Analysis/Graph Ranking

We employed the PageRank and Hubs & Authorities algorithms to find the important nodes in the graph. Both are iterative algorithms which start by assigning equal scores to all nodes in the graph. These scores are then iteratively redistributed based on the link structure of the graph. While PageRank provides a single metric for every node, the Hubs & Authorities algorithm provides two scores: the hub score and the authority score. The hub score measures the quality of a node's outgoing links (do they point at good authorities?), while the authority score measures the quality of its incoming links. We developed both weighted and unweighted variants of the algorithms.
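To make the iterative redistribution concrete, here is a minimal sketch of both algorithms in their textbook form, using edge weights. It is not the team's implementation: the damping factor of 0.85, the fixed iteration count, the uniform handling of nodes without outgoing links, and the toy edge list are all assumptions made for illustration.

```python
import math


def pagerank(edges, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration.

    edges: list of (source, target, weight) with positive weights.
    """
    nodes = sorted({n for s, t, _ in edges for n in (s, t)})
    out_weight = {n: 0.0 for n in nodes}
    for s, _, w in edges:
        out_weight[s] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}  # equal starting scores
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for s, t, w in edges:
            # Each node passes its rank along its links, split by weight.
            new_rank[t] += damping * rank[s] * w / out_weight[s]
        rank = new_rank
    return rank


def hits(edges, iterations=50):
    """Weighted Hubs & Authorities; returns (hub, authority) dicts."""
    nodes = sorted({n for s, t, _ in edges for n in (s, t)})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: weighted sum of the hub scores of incoming links.
        new_auth = {n: 0.0 for n in nodes}
        for s, t, w in edges:
            new_auth[t] += w * hub[s]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {n: v / norm for n, v in new_auth.items()}
        # Hub: weighted sum of the authority scores of outgoing links.
        new_hub = {n: 0.0 for n in nodes}
        for s, t, w in edges:
            new_hub[s] += w * auth[t]
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return hub, auth


# Toy graph: three nodes call C, and C calls A.
toy_edges = [("A", "C", 1.0), ("B", "C", 0.5), ("C", "A", 0.25), ("D", "C", 1.0)]
ranks = pagerank(toy_edges)
hub_scores, auth_scores = hits(toy_edges)
print(max(ranks, key=ranks.get), max(auth_scores, key=auth_scores.get))
```

Setting every weight to 1.0 gives the unweighted variants.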

 

Among the top 2000 users returned by the PageRank algorithm, 1725 are leaders according to the golden dataset, which corresponds to a precision of 1725 / 2000 = 0.8625. For more information, please refer to the Implementation section below.
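The evaluation behind these figures is precision and recall over the top-k ranked nodes. A minimal sketch follows; the function name, the node identifiers and the toy leader set are hypothetical, and only the 1725 / 2000 arithmetic comes from the text.

```python
def precision_recall_at_k(ranked_nodes, leaders, k):
    """Precision and recall of the top-k ranked nodes against a leader set."""
    top_k = ranked_nodes[:k]
    found = sum(1 for node in top_k if node in leaders)
    precision = found / len(top_k)
    recall = found / len(leaders)
    return precision, recall


# Toy example: nodes already sorted by descending score.
ranked = ["n3", "n1", "n7", "n2", "n5"]
gold_leaders = {"n1", "n3", "n9"}
toy_precision, toy_recall = precision_recall_at_k(ranked, gold_leaders, k=3)

# The precision reported in the text: 1725 leaders among the top 2000 nodes.
reported_precision = 1725 / 2000
print(toy_precision, toy_recall, reported_precision)
```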

2. Graph Node Embeddings

We investigated an alternative approach inspired by the recent advances in NLP related to word embeddings. We use the SkipGram model to train node embeddings. We define a neighbourhood as the set consisting of a given node and all of its neighbours. We supply these neighbourhoods as context to the SkipGram model. To make up for the word-order assumption in the SkipGram model, we repeat and reshuffle longer neighbourhoods.
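The neighbourhood construction can be sketched as follows. Several details are assumptions, because the text does not specify them: neighbours are taken in both link directions, a neighbourhood counts as "longer" when it has more than `long_threshold` members, and longer neighbourhoods are emitted `repeats` times. All of these names and values are hypothetical.

```python
import random


def neighbourhood_sentences(edges, long_threshold=5, repeats=3, seed=0):
    """Turn each node's neighbourhood into one or more shuffled "sentences".

    edges: list of (source, target, weight); weights are ignored here.
    """
    rng = random.Random(seed)
    neighbours = {}
    for s, t, _ in edges:
        # Assumption: a link makes both endpoints neighbours of each other.
        neighbours.setdefault(s, set()).add(t)
        neighbours.setdefault(t, set()).add(s)
    sentences = []
    for node in sorted(neighbours):
        sentence = [node] + sorted(neighbours[node])
        # Longer neighbourhoods are repeated, each copy in a fresh order,
        # so that no single ordering dominates the context windows.
        copies = repeats if len(sentence) > long_threshold else 1
        for _ in range(copies):
            shuffled = sentence[:]
            rng.shuffle(shuffled)
            sentences.append(shuffled)
    return sentences


# Toy star graph: H is linked to six other nodes.
star_edges = [("H", leaf, 1.0) for leaf in ["a", "b", "c", "d", "e", "f"]]
sentences = neighbourhood_sentences(star_edges)
print(len(sentences))
```

Lists like these could then be passed to any skip-gram implementation, for example gensim's `Word2Vec` with `sg=1`; the text does not say which library was used.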

The intuition behind the idea is as follows: When applied to words, the skipgram model tries to predict co-occurring words. It is only natural that when applied to a graph it should try to predict nodes which co-occur in a lot of neighbourhoods. The following are the projections of the embeddings in 2-d space for each group:

These embeddings were then fed into a simple softmax classifier. We argue that they have some predictive power as they achieve a macro-f1 score of 0.6667 and the following confusion matrix:

[[111902      2      7      7     56      3     36]
 [     7     12      8      0      0      0      1]
 [    33      2     69      5      6      3     72]
 [    17      0      3     77      2      0     11]
 [   326      0      0      0    461      1     14]
 [    36      0      0      0      2     85     17]
 [   250      1     35     19     26     10    378]]
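The macro-F1 score can be recomputed directly from this confusion matrix. The sketch below uses the per-class identity F1 = 2·TP / (row total + column total), which gives the same result whether rows hold the true or the predicted classes, so no assumption about the matrix orientation is needed. The function name is hypothetical.

```python
# Confusion matrix copied from the text above.
confusion = [
    [111902, 2, 7, 7, 56, 3, 36],
    [7, 12, 8, 0, 0, 0, 1],
    [33, 2, 69, 5, 6, 3, 72],
    [17, 0, 3, 77, 2, 0, 11],
    [326, 0, 0, 0, 461, 1, 14],
    [36, 0, 0, 0, 2, 85, 17],
    [250, 1, 35, 19, 26, 10, 378],
]


def macro_f1(matrix):
    """Unweighted mean of the per-class F1 scores."""
    n = len(matrix)
    scores = []
    for i in range(n):
        true_positives = matrix[i][i]
        row_total = sum(matrix[i])
        col_total = sum(matrix[r][i] for r in range(n))
        denominator = row_total + col_total
        scores.append(2 * true_positives / denominator if denominator else 0.0)
    return sum(scores) / n


print(round(macro_f1(confusion), 3))  # prints 0.667
```

The result is in line with the reported score of 0.6667. Because the macro average weights every class equally, the small groups affect the score as much as the very large first class.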

The scores output from this model can be interpreted as *association* to the given group and used for various research tasks.
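As an illustration of how such association scores arise, a softmax classifier applies a linear layer to the node embedding and normalizes the result into a probability per group. The sketch below is generic: the weights, biases and embedding are made-up numbers, not the trained model.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def association_scores(embedding, weights, biases):
    """One probability per group for a single node embedding.

    weights holds one row of coefficients per group (hypothetical values).
    """
    logits = [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Made-up 2-dimensional embedding and a 3-group classifier.
toy_embedding = [0.5, -1.0]
toy_weights = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]
toy_biases = [0.0, 0.0, 0.0]
scores = association_scores(toy_embedding, toy_weights, toy_biases)
print(scores)
```

Each entry can be read as the degree of association between the node and the corresponding group, and the entries sum to 1.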

Interactive Visualization

We developed a simple UI to explore the graph manually. The UI consists of a single input field where the user supplies the START_NODE of the graph. The system then displays the START_NODE and all of its outgoing links. The user can then click on any neighbouring node to expand the graph in that direction. Nodes from G5 are colored red. Check out the video below:
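The click-to-expand behaviour described above amounts to keeping a set of visible nodes and adding a node's outgoing links whenever it is expanded. The following sketch shows only that state logic, without any rendering. The class and attribute names are hypothetical and this is not the code behind the actual tool.

```python
class GraphExplorer:
    """Tracks which part of the graph is currently displayed."""

    def __init__(self, edges, start_node):
        # Outgoing adjacency list built from (source, target, weight) tuples.
        self.outgoing = {}
        for source, target, _ in edges:
            self.outgoing.setdefault(source, []).append(target)
        self.visible_nodes = {start_node}
        self.visible_edges = set()
        self.expand(start_node)  # show the start node's outgoing links

    def expand(self, node):
        """Simulates a click: reveal all outgoing links of a visible node."""
        if node not in self.visible_nodes:
            raise ValueError(f"{node!r} is not displayed yet")
        for target in self.outgoing.get(node, []):
            self.visible_nodes.add(target)
            self.visible_edges.add((node, target))


demo_edges = [("A", "B", 1.0), ("A", "C", 0.5), ("B", "D", 1.0)]
explorer = GraphExplorer(demo_edges, "A")
nodes_before_click = set(explorer.visible_nodes)
explorer.expand("B")  # the user clicks on B
print(sorted(nodes_before_click), sorted(explorer.visible_nodes))
```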

Implementation

Hubs and Authorities

Hubs+and+Authorities

Page Rank

PageRank
