
Monthly Challenge – Sofia Air – Solution – [iseveryonehigh]


I have just started Andrew Ng's machine learning course on Coursera, so I thought this challenge would be a good test of what I have learned. I apologise for the delay in writing this article: I was not sure whether I should take on the challenge at all, since the dataset seemed difficult to understand. After reading a few of the other articles, I think I have an idea of what to do for my first step.

Week 1:

Here are the things I did in week 1:

I imported the dataset into Jupyter Notebook, and the first thing I did was join the two datasets.

Then I grouped the rows by geohash and split them into a dictionary, with each geohash as a key and a dataframe holding that geohash's columns as the value.

Then I removed the geohash column from the dataframes stored as the dictionary's values.
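A minimal sketch of these first steps in pandas might look like the following. It is not the exact notebook code: the two small frames stand in for the challenge files, the values and geohash strings are made up, the `time` column is an assumption, and "joining" is assumed here to mean stacking rows from two files that share the same columns.

```python
import pandas as pd

# Two tiny stand-ins for the challenge files (all values are made up).
first = pd.DataFrame({
    "time": ["2017-09-06 20:00", "2017-09-06 21:00"],
    "geohash": ["sx8d5r7wmxr", "sx8d5r7wmxr"],
    "P1": [38.0, 41.0],
    "P2": [21.0, 23.0],
})
second = pd.DataFrame({
    "time": ["2018-01-01 00:00", "2018-01-01 00:00"],
    "geohash": ["sx8d5r7wmxr", "sx8dfkbscc5"],
    "P1": [120.0, 95.0],
    "P2": [64.0, 50.0],
})

# Assumption: the join is a row-wise stack of two files with equal columns.
combined = pd.concat([first, second], ignore_index=True)

# One dataframe per geohash; the geohash column is dropped from each value
# because the dictionary key already carries that information.
by_geohash = {
    geohash: group.drop(columns="geohash").reset_index(drop=True)
    for geohash, group in combined.groupby("geohash")
}

print(by_geohash["sx8d5r7wmxr"])
```

Keeping one dataframe per geohash means any later cleaning can be applied station by station without touching the others.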

After that, I used ‘ffill’ (forward fill) to replace the 0 values in temperature, pressure and humidity. Forward fill copies the previous value in the column down into the gap. I don’t think I should have replaced the 0 values in the temperature column, but at a glance most temperature values seemed to be above 0 degrees. I will change that in the future (I’m traveling now, so I don’t have access to Jupyter Notebook).

I did this after grouping by geohash because I didn’t want values from different geohashes to get mixed up.
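As a rough illustration of that cleaning step, here is one way it could be written. The helper name `fill_zero_readings` and the sample values are hypothetical; the sketch assumes a 0 is first turned into a missing value and then forward filled, separately for each geohash.

```python
import numpy as np
import pandas as pd

SENSOR_COLUMNS = ["temperature", "pressure", "humidity"]

def fill_zero_readings(frame, columns=SENSOR_COLUMNS):
    """Treat 0 as missing in the given columns, then forward fill the gaps."""
    filled = frame.copy()  # leave the caller's dataframe untouched
    filled[columns] = filled[columns].replace(0, np.nan).ffill()
    return filled

# Hypothetical per-geohash dictionary with made-up readings.
by_geohash = {
    "sx8d5r7wmxr": pd.DataFrame({
        "temperature": [12.0, 0.0, 0.0, 14.0],
        "pressure": [95000.0, 95010.0, 0.0, 95030.0],
        "humidity": [0.0, 60.0, 61.0, 0.0],
    }),
}

# Filling inside each dataframe keeps one station's readings
# from leaking into another station's gaps.
cleaned = {
    geohash: fill_zero_readings(frame)
    for geohash, frame in by_geohash.items()
}

print(cleaned["sx8d5r7wmxr"])
```

Note that a 0 in the very first row has no earlier value to copy, so it stays missing (NaN) after the fill. To keep genuine 0 degree temperatures, `"temperature"` could simply be left out of the `columns` argument.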

Now comes the hardest part, which took me a lot of time and hair pulling: the visualisation.

I was using Spyder all this time because it has a fantastic feature called ‘variable explorer’, which I’m a huge fan of. I wanted to visualise the data as a heat map, so I started by generating a KML file. It failed miserably. Then I moved on to generating GeoJSON. That failed horribly too. Last night, while randomly searching, I stumbled upon a library called ‘Folium’. Lo and behold, all my problems were solved!

But it required Jupyter Notebook, so I needed some time to make the switch :/

So then I took the average of all the values in P1, P2, etc. and mapped them to each geohash.
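The averaging and mapping step can be sketched roughly as below. The `decode_geohash` helper is a hypothetical plain-Python implementation of the standard geohash decoding (not a library call), the readings are made up, and using the P1 average as the heat map weight is an assumption.

```python
import pandas as pd

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet

def decode_geohash(geohash):
    """Return the (lat, lon) centre of the cell a geohash describes."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for char in geohash:
        value = BASE32.index(char)
        for shift in range(4, -1, -1):  # 5 bits per character, high bit first
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2

# Hypothetical per-geohash dictionary with made-up readings.
by_geohash = {
    "sx8d5r7wmxr": pd.DataFrame({"P1": [38.0, 42.0], "P2": [20.0, 24.0]}),
    "sx8dfkbscc5": pd.DataFrame({"P1": [90.0, 110.0], "P2": [50.0, 60.0]}),
}

# One row per geohash, one column per averaged measurement.
averages = pd.DataFrame(
    {geohash: frame[["P1", "P2"]].mean() for geohash, frame in by_geohash.items()}
).T

# [lat, lon, weight] rows, here weighted by the P1 average (an assumption).
heat_rows = [
    [*decode_geohash(geohash), row["P1"]]
    for geohash, row in averages.iterrows()
]
print(heat_rows)
```

Rows of `[lat, lon, weight]` are, as far as I know, the shape that Folium's `HeatMap` plugin accepts, so a list like `heat_rows` could be handed to it to draw the map.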

That’s all for week 1. Next I’ll look into linear regression!



8 thoughts on “Monthly Challenge – Sofia Air – Solution – [iseveryonehigh]”

  1. 0

    I think this looks like a good start!

    I’m not sure (and admittedly my article doesn’t have much of this either), but I think that you may want to include more background about the project goals. The instructions also had some more suggestions about data cleaning that you may want to implement – things like checking for missing values and removing stations which weren’t measured in 2018.

    Also, while taking care of the 0 values seems like a good idea, I’m curious what you replaced them with? I’m not familiar with ffill. Did you assign mean values for each of the “missing” data points?

    I really like your heatmaps, and think that they are useful for visualizing the data before you really dive into using it.

    Overall a good start!
