
Monthly Challenge – Ontotext case – Solution – ABCs

Understanding and Exploring the Data


Some very basic initial code in R – understanding and exploring the data

# Import data ----

setwd("C:\\Users\\Veni\\Documents\\FEBA\\Monthly Challenge May 2019 Ontotext\\dt18-ontotext-simple\\")
rm(list=ls())

library(sentimentr)

dd=read.csv("dt18-ontotext-simple.csv", stringsAsFactors = FALSE)

# Looking at the data
names(dd) # "org" "names" "types" "descriptions" "locations" "categories" "industries"
rapply(dd,class) # "character"

# Checking for NAs
ddclass=data.frame(varnames= names(dd),
varclass=sapply(dd, class))

ddclass$nas=colSums(is.na(dd))  # no NAs
unique(ddclass$varclass)
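
Because every column is read as character, a missing value can also arrive as an empty or whitespace-only string, which `is.na()` does not flag. A minimal sketch of a stricter check follows; the data frame `toy` and the helper `count_blank` are hypothetical names used for illustration and are not part of the challenge data.

```r
# Toy data frame standing in for dd (values are illustrative only)
toy <- data.frame(
  names      = c("Acme", "", "Globex"),
  industries = c("Retail", "  ", NA),
  stringsAsFactors = FALSE
)

# Count entries per column that are NA, empty, or whitespace-only
count_blank <- function(df) {
  sapply(df, function(col) sum(is.na(col) | trimws(col) == ""))
}

blank_counts <- count_blank(toy)
print(blank_counts)
```

Running `count_blank(dd)` on the real data would show whether "no NAs" also means "no blank fields".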

# Checking for uniqueness
length(unique(dd$industries)) # 1311 entries
levels.default(dd$industries) # NULL, because the column is character rather than factor

mm = grep("http://dbpedia.org/resource/", dd$org, fixed = TRUE) # All items come from DBpedia, so the column can be ignored

length(unique(dd$org)) # 277419 entries
length(unique(dd$names)) # 276545 entries
length(unique(dd$types)) # 68 entries
length(unique(dd$descriptions)) # 264869 entries
length(unique(dd$locations)) # 32953 entries
length(unique(dd$categories)) # 242936 entries
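
The per-column counts and the DBpedia prefix check above can also be obtained in a single pass. The sketch below uses a small made-up data frame (`toy` is a hypothetical stand-in for `dd`), so the numbers it prints say nothing about the real data set.

```r
# Toy data frame standing in for dd (values are illustrative only)
toy <- data.frame(
  org   = c("http://dbpedia.org/resource/Acme",
            "http://dbpedia.org/resource/Globex",
            "http://dbpedia.org/resource/Initech"),
  types = c("Company", "Company", "Bank"),
  stringsAsFactors = FALSE
)

# Number of distinct values in every column at once
n_unique <- sapply(toy, function(col) length(unique(col)))
print(n_unique)

# TRUE only if every org value starts with the DBpedia resource prefix
all_dbpedia <- all(startsWith(toy$org, "http://dbpedia.org/resource/"))
print(all_dbpedia)
```

Applied to `dd`, a `TRUE` from the last check would confirm that the `org` column carries no information beyond the resource name.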

# Correct for punctuation, capital letters, etc. ----
tx=dd$industries
tx=tolower(tx) # convert all upper-case letters to lower case
tx=gsub("_"," ",tx) # replace underscores with spaces
dd$tx=tx
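
The step above handles case and underscores only. One possible way to extend it to other punctuation and stray whitespace is sketched below. The helper `clean_label` and the sample strings are hypothetical, and replacing every punctuation mark (including hyphens) with a space is an assumption that may not suit all industry labels.

```r
# Illustrative raw labels (not taken from the challenge data)
raw <- c("Financial_Services", "  Oil & Gas ", "E-Commerce!!")

clean_label <- function(x) {
  x <- tolower(x)                         # lower-case everything
  x <- gsub("_", " ", x, fixed = TRUE)    # underscores to spaces
  x <- gsub("[[:punct:]]+", " ", x)       # remaining punctuation to spaces
  x <- gsub("[[:space:]]+", " ", x)       # collapse runs of whitespace
  trimws(x)                               # drop leading/trailing spaces
}

cleaned <- clean_label(raw)
print(cleaned)
```

With the real data this would be called as `dd$tx <- clean_label(dd$industries)`.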

#


5 thoughts on “Monthly Challenge – Ontotext case – Solution – ABCs”

  1. The first week of the challenge is exploratory. However, we encourage you to give feedback to those who published their first findings.
    ________
    For week 1 of the Monthly Challenge, you are assigned to peer review the following teams (please leave your feedback below the corresponding articles):
    https://www.datasciencesociety.net/monthly-challenge-ontotext-case-solution-door/
    https://www.datasciencesociety.net/monthly-challenge-ontotext-case-solution-your-unique-team-name/
