# TELENOR CASE

# Business understanding

There is great potential in unlocking hidden insights in linked data through the investigation of social structures, as businesses are interested in gaining insight into the relationships customers have with other people in their network. Identifying communities and influencers within these networks complements and enhances existing BI solutions in risk management, fraud investigation, churn mitigation, new customer acquisition, and viral marketing. This article focuses on solving two main problems for the Telenor case:

- Identifying leaders in the community
- Calculating degree of association with a given community

# Data understanding

The original dataset contains 1,452,747 observations with the following variables:

- Subscriber A (caller party ID)
- Subscriber B (called party ID)
- Label (Strength of connection – Low, Medium, or High)
- Real Event Flag (0 or 1)
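
For illustration, here are a few hypothetical rows in the semicolon-separated layout the loading code below expects (the IDs and values are invented, not taken from the dataset):

```
Subscriber_A;Subsciber_B;Label;Real_Event_Flag
8f3a1c;b72e9d;High;1
8f3a1c;04cd77;Low;0
b72e9d;04cd77;Medium;1
```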

# Data preparation

**Exclusions**

A total of 176,337 records were excluded from the original dataset, namely those for which the Real Event Flag is 0, as they are considered “fake” calls.

**Derivations**

For modelling purposes, the Label variable was assigned numerical values as follows:

- High = 1.0
- Medium = 0.5
- Low = 0.25

```
import pandas as pd

def read_data():
    """
    Reads the data, drops the rows with Real_Event_Flag = 0 and adds a new
    numerical column for the strength, where High = 1, Medium = 0.5, Low = 0.25.
    Then it groups by (from, to) and sums, collapsing repeated calls between
    the same pair into a single weighted edge.
    """
    df = pd.read_csv('/mnt/8C24EDC524EDB1FE/data/sna/Datathon_2018_Dataset_Hashbyte_New.csv', sep=';')
    df = df[df['Real_Event_Flag'] > 0]
    del df['Real_Event_Flag']
    strength_remap = {'High': 1, 'Medium': 0.5, 'Low': 0.25}
    df['strength'] = df['Label'].map(lambda x: strength_remap[x])
    combined = df.groupby(by=['Subscriber_A', "Subsciber_B"]).sum()
    return combined
```

# Modelling

To solve this task we employed the following methods:

1. We calculated two graph ranking algorithms:
   - PageRank
   - Hubs & Authorities
2. We calculated node embeddings for the graph by training a SkipGram model.
3. We trained a neural network using the embeddings from step 2 to predict the association of a given node with a given group.

## 1. Link Analysis/Graph Ranking

We employed the *PageRank* and *Hubs & Authorities* algorithms to find the important nodes in the graph. Both are iterative algorithms which start by assigning equal scores to all nodes in the graph. These scores are then iteratively redistributed based on the link structure of the graph. While *PageRank* provides a single metric for every node, the *Hubs & Authorities* algorithm provides two scores: the hub score and the authority score. The hub score measures the quality of a node's outgoing links (do they point at good authorities?), while the authority score measures the quality of its incoming links. We developed both weighted and unweighted variants of the algorithms.
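
As a quick sanity check on our own implementations (given in full in the Implementation section below), the same two rankings can also be computed with networkx. This is a minimal sketch; the toy edge list and weights are invented for illustration, and note that the treatment of edge weights in `hits()` has varied across networkx versions:

```
import networkx as nx

# Toy weighted call graph; in practice the edges come from read_data().
edges = [('a', 'b', 1.0), ('b', 'c', 0.5), ('c', 'a', 0.25), ('a', 'c', 1.0)]
G = nx.DiGraph()
for u, v, w in edges:
    G.add_edge(u, v, weight=w)

# Weighted PageRank with the same damping factor as our implementation.
pagerank = nx.pagerank(G, alpha=0.8, weight='weight')

# HITS returns a (hub score, authority score) pair of dicts.
hubs, authorities = nx.hits(G, max_iter=100)

print(sorted(pagerank.items(), key=lambda kv: kv[1], reverse=True))
```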

Among the top 2000 users ranked by the PageRank algorithm, 1725 are leaders according to the golden dataset, i.e. a precision of roughly 86%. For more information, please refer to the Implementation section below.

## 2. Graph Node Embeddings

We investigated an alternative approach inspired by recent advances in NLP related to word embeddings. We use the SkipGram model to train node embeddings. We define a neighbourhood as the set consisting of a given node and all of its neighbours, and we supply these neighbourhoods as contexts to the SkipGram model. To compensate for the word-order assumption baked into the SkipGram model, we repeat and reshuffle longer neighbourhoods.
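
The following is a minimal sketch of this neighbourhood-as-sentence idea using gensim's Word2Vec in SkipGram mode (`sg=1`); the adjacency lists are illustrative, and parameter values such as `vector_size` (called `size` in gensim 3.x) are assumptions rather than the settings we used:

```
import random
from gensim.models import Word2Vec

# Illustrative adjacency lists; in practice these come from the call graph.
neighbours = {'a': ['b', 'c'], 'b': ['a'], 'c': ['a', 'b', 'd'], 'd': ['c']}

# One "sentence" per node: the node plus its neighbours. Longer
# neighbourhoods are repeated and reshuffled to soften the word-order
# assumption of the SkipGram model.
sentences = []
for node, nbrs in neighbours.items():
    sentence = [node] + nbrs
    repeats = max(1, len(sentence) // 2)
    for _ in range(repeats):
        shuffled = sentence[:]
        random.shuffle(shuffled)
        sentences.append(shuffled)

# sg=1 selects the SkipGram architecture (as opposed to CBOW).
model = Word2Vec(sentences, vector_size=64, window=5, sg=1, min_count=1)
embedding = model.wv['a']  # 64-dimensional node embedding
```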

The intuition behind the idea is as follows: when applied to words, the SkipGram model tries to predict co-occurring words. It is only natural that, when applied to a graph, it should try to predict nodes which co-occur in many neighbourhoods. The following are the 2-D projections of the embeddings for each group:

These embeddings were then fed into a simple softmax classifier. We argue that they have some predictive power, as they achieve a macro-F1 score of 0.6667 and the following confusion matrix:

    [[111902      2      7      7     56      3     36]
     [     7     12      8      0      0      0      1]
     [    33      2     69      5      6      3     72]
     [    17      0      3     77      2      0     11]
     [   326      0      0      0    461      1     14]
     [    36      0      0      0      2     85     17]
     [   250      1     35     19     26     10    378]]

The scores output from this model can be interpreted as *association* to the given group and used for various research tasks.
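
For reference, here is a minimal sketch of such a softmax classifier over the embeddings, using scikit-learn's multinomial logistic regression (a single-layer softmax model) as a stand-in; the arrays `X` and `y` are placeholders for the node embeddings and group labels, not our actual data:

```
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# X: one embedding row per node, y: the node's group label.
X = np.random.rand(1000, 64)             # placeholder embeddings
y = np.random.randint(0, 7, size=1000)   # placeholder labels (7 groups)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Multinomial logistic regression is a single-layer softmax classifier.
clf = LogisticRegression(multi_class='multinomial', max_iter=1000)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
print('macro-F1:', f1_score(y_test, pred, average='macro'))
print(confusion_matrix(y_test, pred))

# predict_proba yields per-group scores, interpretable as the degree of
# association of a node with each group.
association = clf.predict_proba(X_test)
```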

## Interactive Visualization

We developed a simple yet cool UI to explore the graph manually. The UI consists of a single input field where the user supplies the START_NODE of the graph. The system then displays the START_NODE and all of its outgoing links. The user can then click on the neighbouring nodes to expand the graph in that direction. Nodes from G5 are colored red. Check out the video below:
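
Under the hood, such an expand-on-click UI only needs a backend endpoint that returns a node's outgoing links. Here is a minimal hypothetical sketch with Flask; the route name and the in-memory lookup are assumptions for illustration, not our actual implementation:

```
from flask import Flask, jsonify

from data import read_data

app = Flask(__name__)
graph = read_data()  # MultiIndex (Subscriber_A, Subsciber_B) -> summed strength

@app.route('/expand/<node>')
def expand(node):
    # Note: URL parameters arrive as strings; real subscriber IDs may need casting.
    try:
        outgoing = graph.loc[node]  # all rows whose first index level == node
    except KeyError:
        return jsonify({'node': node, 'links': []})
    links = [{'target': target, 'strength': float(strength)}
             for target, strength in outgoing['strength'].items()]
    return jsonify({'node': node, 'links': links})

if __name__ == '__main__':
    app.run()
```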

## Implementation

**Hubs and Authorities**

```
import collections
import operator
from time import time

from link import Link
from data import read_data

def normalize(values: dict):
    # L2-normalize the scores, as in the standard HITS algorithm.
    normalization_value = sum(x ** 2 for x in values.values()) ** 0.5
    return {key: value / normalization_value for (key, value) in values.items()}

def main(data, fp, use_weights=False):
    start = time()
    seen = set()
    outnodes = collections.defaultdict(lambda: set())
    # innodes is only used for the stats at the end; it is not used in the algorithm.
    innodes = collections.defaultdict(lambda: set())
    for i, r in data.iterrows():
        user_from = i[0]
        user_to = i[1]
        l = Link(user_from, user_to, r['strength'])
        outnodes[user_from].add(l)
        innodes[user_to].add(user_from)
        seen.add(user_from)
        seen.add(user_to)
    print("Graph created")
    # Start with equal scores for all nodes (unit L2 norm).
    oldhubs = collections.defaultdict(lambda: (len(seen) ** -0.5))
    oldauthorities = collections.defaultdict(lambda: (len(seen) ** -0.5))
    for i in range(100):
        newhubs = collections.defaultdict(lambda: 0)
        newauthorities = collections.defaultdict(lambda: 0)
        for vertex in seen:
            for link in outnodes[vertex]:
                if use_weights:
                    newhubs[vertex] += oldauthorities[link.target] * min(link.strength, 1)
                    newauthorities[link.target] += oldhubs[vertex] * min(link.strength, 1)
                else:
                    newhubs[vertex] += oldauthorities[link.target]
                    newauthorities[link.target] += oldhubs[vertex]
        newhubs = normalize(newhubs)
        newauthorities = normalize(newauthorities)
        delta = sum([abs(newhubs[t] - oldhubs[t]) for t in newhubs])
        print("Delta", delta)
        oldhubs = newhubs
        oldauthorities = newauthorities
    auth = sorted(oldauthorities.items(), key=operator.itemgetter(1), reverse=True)
    hubs = sorted(oldhubs.items(), key=operator.itemgetter(1), reverse=True)
    with open('%s.hubs.txt' % fp, 'w') as out:
        for u_id, score in hubs:
            out.write("\t".join([str(round(score, 8)),
                                 str(oldauthorities.get(u_id, 0)),
                                 str(u_id),
                                 'Out degree %i ' % len(outnodes[u_id]),
                                 'In degree %i users' % len(innodes[u_id])]) + "\n")
    with open('%s.auth.txt' % fp, 'w') as out:
        for u_id, score in auth:
            out.write("\t".join([str(round(score, 8)),
                                 str(oldhubs.get(u_id, 0)),
                                 str(u_id),
                                 'Responded to %i users' % len(outnodes[u_id]),
                                 'Got responses from %i users' % len(innodes[u_id])]) + "\n")
    print(fp + " Complete")
    return time() - start

graphData = read_data()
main(graphData, '404-weighted', True)
```

**PageRank**

```
import collections
import operator

from pandas import DataFrame

from data import read_data
from link import Link

def getWeights(outnodes):
    # Normalize each node's outgoing link strengths so they sum to 1.
    weights = {}
    for k, links in outnodes.items():
        normalization = sum(l.strength for l in links)
        weights[k] = [l.strength / normalization for l in links]
    return weights

def main(graphData: DataFrame, alpha=0.8, use_weights=False):
    seen = set()
    outnodes = collections.defaultdict(lambda: [])
    for i, r in graphData.iterrows():
        user_from = i[0]
        user_to = i[1]
        link = Link(user_from, user_to, r['strength'])
        outnodes[user_from].append(link)
        seen.add(user_from)
        seen.add(user_to)
    print("Graph created")
    currentpageranks = collections.defaultdict(lambda: 1.0 / len(seen))
    newpageranks = collections.defaultdict()
    weights = getWeights(outnodes)
    for iteration in range(100):
        # C accumulates the rank mass of the sinks (nodes without outgoing
        # links), which is redistributed evenly over the whole graph.
        C = 0
        print('seen', len(seen))
        print('senders', len(outnodes.keys()))
        print('sinks', len(seen - set(outnodes.keys())))
        for a in (seen - set(outnodes.keys())):
            C += currentpageranks[a]
        print(C)
        for r in seen:
            newpageranks[r] = (1 - alpha + alpha * C) / len(seen)
        for r, outgoing in outnodes.items():
            current_weights = weights[r]
            for i, link in enumerate(outgoing):
                weight = 1 / len(outgoing)
                if use_weights:
                    weight = current_weights[i]
                newpageranks[link.target] += alpha * currentpageranks[r] * weight
        delta = sum(abs(currentpageranks[x] - newpageranks[x]) for x in newpageranks)
        print("Delta", delta)
        currentpageranks = newpageranks
        newpageranks = collections.defaultdict()
    return currentpageranks

graphData = read_data()
results = main(graphData, use_weights=True)
pageranks = sorted(results.items(), key=operator.itemgetter(1), reverse=True)
with open('pr-weighted.txt', 'w') as out:
    for pagerank in pageranks:
        out.write(str(round(pagerank[1], 8)) + " " + str(pagerank[0]))
        out.write('\n')
```
