Data Visualization with Python

It is said that a picture is worth a thousand words. This article focuses on data visualization with Python and introduces the most popular data visualization libraries, textbooks, and courses available.


Data visualization is a very important and often overlooked part of the data analysis process: asking the right question, getting the required data, exploring it, modeling it, and finally communicating the answer, either by putting it into production or by showing the insights to other people. It is widely used in Exploratory Data Analysis to get to know the data, its distribution, and its main descriptive statistics.

Black Hole

Recently, a black hole was imaged for the first time in history by the Event Horizon Telescope, a network of radio telescopes located all over the world. Half a ton of hard drives was used to store the data. The data set was so big that, even in our modern age of the internet and fast computer networks, the drives had to be physically flown to one place. Python and several of its libraries were used to make this incredible feat possible. For more technical details, see the academic paper.

A better understanding of the data, no matter its source, leads to more accurate models. Visualization also makes the results much easier to present, especially to people who aren't familiar with the work. Here is a list of some libraries you can start with.

1. Matplotlib (https://matplotlib.org/)

Matplotlib is a low-level library for creating two-dimensional diagrams and graphs. It is the oldest Python visualization library and, as of 2018, the most developed one, with the most commits and contributors.

With its help, you can build diverse charts, from histograms and scatterplots to graphs in non-Cartesian (for example, polar) coordinates. Moreover, many popular plotting libraries are designed to work in conjunction with matplotlib.

Recent releases have brought style changes to colors, sizes, fonts, legends, and more. Examples of appearance improvements include automatic alignment of axis legends, and among the significant color improvements is a new colorblind-friendly color cycle.

Source: ActiveWizards

It also supports simple animations, 3D plotting, customizing axes, plotting geographical data, saving figures in different file formats, and more.
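As a minimal sketch of some of these features (the data is made up, and the helper name make_figure is hypothetical), the code below builds one figure containing a histogram, a scatter plot and a polar plot, then saves the same figure as PNG and SVG. It uses the non-interactive Agg backend so it runs without a display, and writes into in-memory buffers instead of files.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend: no window is opened
import matplotlib.pyplot as plt
import numpy as np


def make_figure(seed=0):
    """Build one figure with a histogram, a scatter plot and a polar plot."""
    rng = np.random.default_rng(seed)
    data = rng.normal(loc=0.0, scale=1.0, size=500)  # made-up sample data

    fig = plt.figure(figsize=(12, 4))
    ax_hist = fig.add_subplot(1, 3, 1)
    ax_scatter = fig.add_subplot(1, 3, 2)
    ax_polar = fig.add_subplot(1, 3, 3, projection="polar")  # non-Cartesian axes

    ax_hist.hist(data, bins=30)
    ax_hist.set_title("Histogram")

    # Plot each value against the next one.
    ax_scatter.scatter(data[:-1], data[1:], s=5)
    ax_scatter.set_title("Scatter")

    theta = np.linspace(0, 2 * np.pi, 200)
    ax_polar.plot(theta, 1 + 0.5 * np.sin(4 * theta))
    ax_polar.set_title("Polar")

    fig.tight_layout()
    return fig


fig = make_figure()

# The same figure saved to two formats; pass a filename instead of a buffer to write to disk.
png_buf, svg_buf = io.BytesIO(), io.BytesIO()
fig.savefig(png_buf, format="png")
fig.savefig(svg_buf, format="svg")
plt.close(fig)
```

In a notebook you would normally drop the `matplotlib.use("Agg")` line and let the figure display inline.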

Most visualization courses teach it, for example:

2. Seaborn (https://seaborn.pydata.org/)

Seaborn is a library for making statistical graphics in Python. It is built on top of matplotlib and closely integrated with pandas data structures. Its default settings produce better-looking charts out of the box. There is also a rich gallery of visualizations, including some complex types like time series, joint plots, and violin plots.

Recent seaborn updates mostly cover bug fixes. However, there were also improvements in compatibility between FacetGrid or PairGrid and the enhanced interactive matplotlib backends, as well as new parameters and options for visualizations.

Its charts typically look better than default matplotlib output and are easier to build without much customization.

It is covered in some courses, though there are fewer of them than for matplotlib. For example:

3. Bokeh (https://bokeh.pydata.org/en/latest/)

While Matplotlib and Seaborn mostly produce static charts, the Bokeh library creates interactive, scalable visualizations that run in a browser using JavaScript widgets. There is no need to learn JavaScript; you can read more here.

The library provides a versatile collection of graphs, styling options, interaction features such as linked plots, widgets, and callbacks, and many more useful features.

Recent releases improved its interactive abilities, for example rotation of categorical tick labels, as well as enhancements to the zoom tool and customizable tooltip fields.
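The sketch below illustrates two of these interaction features, a hover tooltip and linked panning between plots, using made-up data. The two plots share one x_range object, so panning or zooming one moves the other. Instead of opening a browser, it renders the layout to a standalone HTML string with Bokeh's file_html helper; the page title is arbitrary.

```python
from bokeh.embed import file_html
from bokeh.layouts import row
from bokeh.models import HoverTool
from bokeh.plotting import figure
from bokeh.resources import CDN

# Made-up data.
x = list(range(10))
y = [v ** 2 for v in x]

left = figure(title="y = x^2", width=350, height=300,
              tools="pan,wheel_zoom,box_zoom,reset")
left.line(x, y, line_width=2)
left.scatter(x, y, size=8)
# Tooltip fields refer to the "x" and "y" columns Bokeh creates from the lists above.
left.add_tools(HoverTool(tooltips=[("x", "@x"), ("y", "@y")]))

# Passing the same x_range object links the plots: panning one pans the other.
right = figure(title="y = x", width=350, height=300,
               x_range=left.x_range, tools="pan,wheel_zoom,reset")
right.scatter(x, x, size=8, color="firebrick")

layout = row(left, right)
# Standalone HTML page that loads BokehJS from the CDN.
html = file_html(layout, CDN, "Linked Bokeh plots")
```

To view the result interactively, you could instead call bokeh.plotting.show(layout), which opens it in a browser.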

Source: bokeh.pydata.org

There are fewer courses on the topic, but let’s list some:

4. Plotly (https://plot.ly/)

Plotly is a newer interactive library that lets you build sophisticated graphics easily. The package is designed to work in interactive web applications. Among its remarkable visualizations are contour plots, ternary plots, and 3D charts. For examples, see https://towardsdatascience.com/the-next-level-of-data-visualization-in-python-dd6e99039d5e

Continuous enhancements have added new chart types and features to the library, including support for "multiple linked views", animation, and crosstalk integration.

Some courses on the topic are:

5. Dash (https://plot.ly/products/dash/)

Dash is a productive Python framework for building web applications. The core framework is open source, while some offerings around it are paid.

Written on top of Flask, Plotly.js, and React.js, Dash is ideal for building data visualization apps with highly customized user interfaces in pure Python. It is particularly suited for anyone who works with data in Python. It is growing fast at the moment (see https://blog.sicara.com/bokeh-dash-best-dashboard-framework-python-shiny-alternative-c5b576375f7f?gi=4adaec21bf8e).

A course about this library:

6. HoloViews (http://holoviews.org/)

HoloViews is an open-source Python library designed to make data analysis and visualization seamless and simple. With HoloViews, you can usually express what you want to do in very few lines of code, letting you focus on what you are trying to explore and convey rather than on the process of plotting. It typically renders through Bokeh (Matplotlib and Plotly backends are also available) and is more statistically inclined. It is still under development, and some bugs remain.

Source: Holoviews

Some materials on HoloViews:

7. GeoViews (http://geo.holoviews.org/)

GeoViews is a Python library that makes it easy to explore and visualize geographical, meteorological, and oceanographic datasets, such as those used in weather, climate, and remote sensing research. It is still under development.

GeoViews is built on the HoloViews library for building flexible visualizations of multidimensional data. GeoViews adds a family of geographic plot types based on the Cartopy library, plotted using either the Matplotlib or Bokeh packages. Each of the new GeoElement plot types is a new HoloViews Element that has an associated geographic projection based on cartopy.crs.

The GeoElements currently include Feature, WMTS, Tiles, Points, Contours, Image, QuadMesh, TriMesh, RGB, HSV, Labels, Graph, HexTiles, VectorField and Text objects, each of which can easily be overlaid in the same plots. For example, an object with temperature data can be overlaid with coastline data using an expression like gv.Image(temperature) * gv.Feature(cartopy.feature.COASTLINE). Each GeoElement can also be freely combined in layouts with any other HoloViews Element, making it simple to build even complex multi-figure layouts of overlaid objects.

Some videos on the library:

8. Others

There are other projects that could be called libraries but are not yet well developed. The list can be viewed here.

Lastly, here is some literature on the topic:

  • Matplotlib 3.0 Cookbook: Over 150 recipes to create highly detailed interactive visualizations using Python. This book covers Matplotlib and Seaborn and has a lot of examples. Unfortunately, some of them are very crowded and could have been split into separate examples.

  • Matplotlib for Python Developers: Effective techniques for data visualization with Python 

  • Hands-On Data Visualization with Bokeh: Interactive web plotting for Python using Bokeh. This book covers only Bokeh. It is short but dense, with many examples.

  • Data Visualization with Python: Create an impact with meaningful data insights using interactive and engaging visuals.  A new book that covers Matplotlib, Seaborn, and Bokeh.

  • Others. Many books have chapters dedicated to data visualization, although their main focus is different.

Most of these books can be downloaded for free from some of the links listed in this article:

35 open access websites that provide useful resources for everybody!
