Some very basic initial code in R – understanding and exploring the data
# Import data ----
setwd("C:\\Users\\Veni\\Documents\\FEBA\\Monthly Challenge May 2019 Ontotext\\dt18-ontotext-simple\\")
rm(list=ls()) # clear the workspace
library(sentimentr)
dd=read.csv("dt18-ontotext-simple.csv", stringsAsFactors = FALSE) # keep text columns as character
# Looking at the data
names(dd) # "org" "names" "types" "descriptions" "locations" "categories" "industries"
rapply(dd, class) # all columns are "character"
# Checking for NAs
ddclass=data.frame(varnames=names(dd),
                   varclass=sapply(dd, class))
ddclass$nas=colSums(is.na(dd)) # no NAs
unique(ddclass$varclass)
# Checking for uniqueness
unique(dd$industries) # 1311 entries
levels(factor(dd$industries)) # sorted distinct values; levels() on the raw character column returns NULL
mm = grep("http://dbpedia.org/resource/", dd$org) # All items come from DBpedia and column can be ignored
unique(dd$org) # 277419 entries
unique(dd$names) # 276545 entries
unique(dd$types) # 68 entries
unique(dd$descriptions) # 264869 entries
unique(dd$locations) # 32953 entries
unique(dd$categories) # 242936 entries
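The per-column checks above can be gathered into a single table. This is a minimal sketch, assuming a hypothetical helper `profile_columns()` that is not part of the original script. It also counts empty strings, since `read.csv` leaves blank character fields as `""` rather than `NA`, so a zero NA count does not by itself rule out missing values. The toy data frame only stands in for `dd`.

```r
# Hypothetical helper (not in the original script): one row per column with
# its class, NA count, empty-string count and number of distinct values.
profile_columns <- function(df) {
  data.frame(
    varname  = names(df),
    varclass = vapply(df, function(x) class(x)[1], character(1)),
    n_na     = vapply(df, function(x) sum(is.na(x)), integer(1)),
    n_empty  = vapply(df, function(x) sum(!is.na(x) & as.character(x) == ""),
                      integer(1)),
    n_unique = vapply(df, function(x) length(unique(x)), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy data standing in for dd (values are made up for illustration)
toy <- data.frame(
  org        = c("a", "b", "c"),
  industries = c("Software", "", "Software"),
  stringsAsFactors = FALSE
)
profile_columns(toy)
```

On the real data the call would be `profile_columns(dd)`, which reports the distinct counts without printing every value the way a bare `unique()` does.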
# Correct for punctuation, capital letters, etc. ----
tx=dd$industries
tx=tolower(tx) # convert all upper cases to lower cases
tx=gsub("_", " ", tx) # replace underscores with spaces
dd$tx=tx
#
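The cleaning step so far only lower-cases the text and replaces underscores. A possible extension, sketched here under assumptions, wraps those two steps in a hypothetical helper `clean_labels()` and additionally turns the remaining punctuation into spaces and collapses repeated whitespace. Whether punctuation should be dropped for this dataset is an assumption, not something the original script decides.

```r
# Hypothetical helper (not in the original script): normalise industry labels.
clean_labels <- function(x) {
  x <- tolower(x)                        # lower-case, as in the script
  x <- gsub("_", " ", x, fixed = TRUE)   # underscores to spaces, as in the script
  x <- gsub("[[:punct:]]+", " ", x)      # assumption: other punctuation to spaces
  x <- gsub("[[:space:]]+", " ", x)      # collapse runs of whitespace
  trimws(x)                              # drop leading and trailing spaces
}

# Made-up example labels
clean_labels(c("Financial_Services", "  Oil & Gas ", "E-commerce"))
```

Applied to the data, this would be `dd$tx <- clean_labels(dd$industries)`.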
5 thoughts on “Monthly Challenge – Ontotext case – Solution – ABCs”
Hi Venpi, great you started working on the dataset! 🙂
Did you know that you can import your code as a Jupyter notebook? It will be much more readable.
P.S.
Here is how you can do this: https://www.youtube.com/watch?v=jd4M04O-uL8
The first week of the challenge is exploratory. However, we encourage you to give feedback to those who published their first findings.
________
Your assignments to peer review (and give feedback below the corresponding articles) for week 1 of the Monthly challenge are the following teams:
https://www.datasciencesociety.net/monthly-challenge-ontotext-case-solution-door/
https://www.datasciencesociety.net/monthly-challenge-ontotext-case-solution-your-unique-team-name/
Hi Venpi, please update your article
Your assignments to peer review (and give feedback below the corresponding articles) for week 2 of the Monthly challenge are the following teams:
https://www.datasciencesociety.net/monthly-challenge-ontotext-case-solution-outliers/
https://www.datasciencesociety.net/monthly-challenge-ontotext-case-solution-team-epistemi/
Hi Venpi, please update your article
Your assignments to peer review (and give feedback below the corresponding articles) for week 3 of the Monthly challenge are the following teams:
https://www.datasciencesociety.net/monthly-challenge-ontotext-case-solution-kung-fu-panda/
https://www.datasciencesociety.net/monthly-challenge-ontotext-case-solution-door/
Hi Venpi,
It’s the final week of the challenge. You can keep investigating the case after it ends, but it would be great if you updated your article and reviewed the other teams’ articles.
If you are stuck in the challenge, you can check the instructions:
https://www.datasciencesociety.net/text-mining-data-science-monthly-challenge/