Datathon Sofia Air Solution – Air station measurement bias correction using Pearson correlation coefficient

5 Comments | Posted in Datathons Solutions

This article aims to improve the estimation of measured PM10 pollutant levels. In Sofia, there are several air pollution measurement stations. They measure PM10, that is, airborne particles with a diameter of 10 micrometers or less.

The measurement stations fall into two categories: official stations and citizen stations. The official stations provide reliable measurements because they are better monitored and documented. The downside is that there are only 5 of them, all concentrated in a single region. The citizen stations are devices mounted on people's homes or properties that measure PM10 particles. There is a whole network of such devices, numerous enough to provide good coverage of the city. The problem is that their measurements are biased by many local factors. The measurements from the citizen stations are therefore less reliable than those from the official stations, but they cover far more of the city.

In this article we define a method to reduce the bias of the measurements from the citizen stations.
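The excerpt does not spell out the method, so the following is only a minimal sketch of one way a Pearson-based correction could look, not the authors' implementation. It assumes a citizen station and a nearby official station have readings at the same timestamps, computes the Pearson correlation coefficient between the two series, and uses it to fit a straight line that maps citizen readings onto the official scale. The function names and the PM10 values are made up for illustration.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def fit_linear_correction(citizen, official):
    """Fit official ~ slope * citizen + intercept by least squares.

    The least-squares slope equals r * (spread of official / spread of citizen),
    which is where the Pearson coefficient enters the correction.
    """
    r = pearson_r(citizen, official)
    mx, my = mean(citizen), mean(official)
    spread_x = sqrt(sum((a - mx) ** 2 for a in citizen))
    spread_y = sqrt(sum((b - my) ** 2 for b in official))
    slope = r * spread_y / spread_x
    intercept = my - slope * mx
    return slope, intercept, r


# Hypothetical paired PM10 readings in micrograms per cubic meter.
citizen = [30.0, 50.0, 70.0, 90.0]
official = [20.0, 30.0, 40.0, 50.0]

slope, intercept, r = fit_linear_correction(citizen, official)
corrected = [slope * c + intercept for c in citizen]
```

In such a scheme, a low value of `r` would suggest that the citizen station does not track the official one closely enough for a linear correction to be trusted.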

Courses for introduction to data science

Posted in Learn

Data Science Tools. Note: This is an updated version of the old course. If you were enrolled in the older version (before September 18 […]

SQL and Relational Databases 101. About This Course: Data is one of the most critical assets of any business. Data needs a database to store and process data quickly. […]

Hack News Datathon | Contribute to the Cause against News Propaganda

Posted in News, Uncategorised

The Community Behind the Cause – Data Science Society. Data Science Society, a data-driven global community, has a 4-year track record of putting data science to good use. Apart from data hackathons with real cases tackling business problems, we are solving social issues with data such as air pollution prediction, identifying manipulation in the news and […]

What is Propaganda?

Posted in News

The Institute for Propaganda Analysis in 1938 defined propaganda as: “The expression of an opinion or an action by individuals or groups deliberately designed to influence the opinions or the actions of other individuals or groups with reference to predetermined ends.” The point of view, highlights, and storytelling expressed in […]

Air Sofia Pollution Case

7 Comments | Posted in Datathons Solutions

Script in R below:

library(stringr)
#Step 1 ----------------------
rm(list=ls())
dd <- read.csv("C:\\Users\\estoyanova\\OneDrive - VMware, Inc\\ES\\UNI\\master BA\\Boriana-Monthly challenge\\Air Tube\\data_bg_2017.csv", header = TRUE, sep = ",", na.strings = c(""," ", "NA", "#NA"), stringsAsFactors = FALSE)
topo <- read.csv("C:\\Users\\estoyanova\\OneDrive - VMware, Inc\\ES\\UNI\\master BA\\Boriana-Monthly challenge\\TOPO-DATA\\sofia_topo.csv", header = TRUE, sep = ",", na.strings = c(""," ", "NA", "#NA"), stringsAsFactors […]

Monthly Challenge – Sofia Air – Solution – [iseveryonehigh]

8 Comments | Posted in Prediction systems

I have just begun Andrew Ng's machine learning course on Coursera, so I thought this challenge would be a good test of what I have learned. I apologise for the delay in writing this article; I was not sure whether I should take on this challenge, since the dataset seemed difficult to […]

Monthly Challenge – Sofia Air – Solution – Kung Fu Panda

12 Comments | Posted in Prediction systems

1. Business Understanding

The air quality in Sofia, Bulgaria, has been a problem for some time. The population of the city is constantly increasing, and this brings more traffic to the streets. Car ownership in Sofia is among the highest in Europe, with around 600 cars per 1000 citizens. Another huge issue in […]