Scroll through the Python Package Index and you'll find libraries for practically every data visualization need—from GazeParser for eye movement research to pastalog for realtime visualizations of neural network training. And while many of these libraries are intensely focused on accomplishing a specific task, some can be used no matter what your field.
Today, we're giving an overview of 10 interdisciplinary Python data visualization libraries, from the well-known to the obscure. We've noted the ones you can take for a spin without the hassle of running Python locally, using Mode Python Notebooks.
Two histograms (matplotlib)
matplotlib is the O.G. of Python data visualization libraries. Despite being over a decade old, it's still the most widely used library for plotting in the Python community. It was designed to closely resemble MATLAB, a proprietary programming language developed in the 1980s.
Because matplotlib was the first Python data visualization library, many other libraries are built on top of it or designed to work in tandem with it during analysis. Some libraries like pandas and Seaborn are “wrappers” over matplotlib. They allow you to access a number of matplotlib’s methods with less code.
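To make the wrapper idea concrete, here is a minimal sketch that draws two overlapping histograms twice: once with matplotlib directly and once through pandas. The random data, the column names, and the `Agg` backend setting are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless rendering; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: two normally distributed samples.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "first": rng.normal(0.0, 1.0, 1000),
    "second": rng.normal(2.0, 1.5, 1000),
})

# Plain matplotlib: one hist() call per series, plus manual labels and legend.
fig, mpl_ax = plt.subplots()
mpl_ax.hist(df["first"], bins=30, alpha=0.5, label="first")
mpl_ax.hist(df["second"], bins=30, alpha=0.5, label="second")
mpl_ax.set_xlabel("value")
mpl_ax.legend()

# pandas wrapper: a single call draws both columns and adds the legend.
pandas_ax = df.plot.hist(bins=30, alpha=0.5)
```

Both `mpl_ax` and `pandas_ax` are ordinary matplotlib Axes objects, so any matplotlib customization works on either one. In a notebook the figures render inline; elsewhere you could call `fig.savefig(...)` to write them to disk.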
While matplotlib is good for getting a sense of the data, it's not very useful for creating publication-quality charts quickly and easily. As Chris Moffitt points out in his overview of Python visualization tools, matplotlib “is extremely powerful but with that power comes complexity.”
matplotlib has long been criticized for its default styles, which have a distinct 1990s feel. The upcoming release of matplotlib 2.0 promises many new style changes to address this problem.
Violinplot (Michael Waskom)
Seaborn harnesses the power of matplotlib to create beautiful charts in a few lines of code. The key difference is Seaborn's default styles and color palettes, which are designed to be more aesthetically pleasing and modern. Since Seaborn is built on top of matplotlib, you'll need to know matplotlib to tweak Seaborn's defaults.
Created by: Michael Waskom, available in Mode
Where to learn more: http://web.stanford.edu/~mwaskom/software/seaborn/index.html
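As a rough sketch of how the two libraries fit together, the snippet below draws a violin plot with Seaborn and then adjusts it with plain matplotlib calls. The data, column names, and title are made up for illustration, and the `Agg` backend line is only an assumption for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless rendering; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: 100 scores for each of three groups.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 100),
    "score": rng.normal(size=300),
})

# Seaborn supplies the styling and the statistical plot in two lines.
sns.set_style("whitegrid")
ax = sns.violinplot(data=df, x="group", y="score")

# The returned object is a matplotlib Axes, so tweaks use matplotlib methods.
ax.set_title("Score distribution by group")
ax.set_ylim(-4, 4)
```

The last two lines are the part that requires matplotlib knowledge: Seaborn hands back the Axes and leaves fine-tuning to you.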
Small multiples (ŷhat)
ggplot is based on ggplot2, an R plotting system, and concepts from The Grammar of Graphics. ggplot operates differently than matplotlib: it lets you layer components to create a complete plot. For instance, you can start with axes, then add points, then a line, a trendline, etc. Although The Grammar of Graphics has been praised as an “intuitive” method for plotting, seasoned matplotlib users might need time to adjust to this new mindset.
According to the creator, ggplot isn't designed for creating highly customized graphics. It gives up fine-grained control in exchange for a simpler method of plotting.
ggplot is tightly integrated with pandas, so it's best to store your data in a DataFrame when using ggplot.
Interactive weather statistics for three cities (Continuum Analytics)
Like ggplot, Bokeh is based on The Grammar of Graphics, but unlike ggplot, it's native to Python, not ported over from R. Its strength lies in the ability to create interactive, web-ready plots, which can be easily output as JSON objects, HTML documents, or interactive web applications. Bokeh also supports streaming and real-time data.
Bokeh provides three interfaces with varying levels of control to accommodate different user types. The highest level is for creating charts quickly. It includes methods for creating common charts such as bar plots, box plots, and histograms. The middle level has the same specificity as matplotlib and allows you to control the basic building blocks of each chart (the dots in a scatter plot, for example). The lowest level is geared toward developers and software engineers. It has no pre-set defaults and requires you to define every element of the chart.
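For a feel of the middle level, here is a minimal sketch using `bokeh.plotting`, where you add glyphs (markers, lines) to a figure yourself. The data and labels are hypothetical, and exporting with `json_item` is just one of the output options, chosen here because it needs no browser.

```python
from bokeh.embed import json_item
from bokeh.plotting import figure

# Hypothetical data points.
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# Create an empty figure, then add the building blocks one glyph at a time.
p = figure(title="Hypothetical scatter", x_axis_label="x", y_axis_label="y")
p.scatter(x, y, size=10, color="navy", alpha=0.6)  # the dots
p.line(x, y, line_width=2)                         # a connecting line

# Serialize the plot to a JSON-ready dict that a web page can embed.
item = json_item(p)
```

To view the plot directly instead, `bokeh.plotting.show(p)` opens it in a browser or renders it inline in a notebook.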
Box plot (Florian Mounier)
Like Bokeh and Plotly, pygal offers interactive plots that can be embedded in the web browser. Its prime differentiator is the ability to output charts as SVGs. As long as you're working with smaller datasets, SVGs will do you just fine. But if you're making charts with hundreds of thousands of data points, they'll have trouble rendering and become sluggish.
Since each chart type is packaged into a method and the built-in styles are pretty, it's easy to create a nice-looking chart in a few lines of code.
Line plot (Plotly)
You might know Plotly as an online platform for data visualization, but did you also know you can access its capabilities from a Python notebook? Like Bokeh, Plotly's forte is making interactive plots, but it offers some charts you won't find in most libraries, like contour plots, dendrograms, and 3D charts.
Choropleth (Andrea Cuttone)
geoplotlib is a toolbox for creating maps and plotting geographical data. You can use it to create a variety of map types, like choropleths, heatmaps, and dot density maps. You must have Pyglet (an object-oriented programming interface) installed to use geoplotlib. Nonetheless, since most Python data visualization libraries don't offer maps, it's nice to have a library dedicated solely to them.
Scatter plot with trend line (David Robinson)
Nullity matrix (Aleksey Bilogur)
Dealing with missing data is a pain. missingno allows you to quickly gauge the completeness of a dataset with a visual summary, instead of trudging through a table. You can filter and sort data based on completion or spot correlations with a heatmap or a dendrogram.
Chart grid with consistent scales (Christopher Groskopf)
Leather's creator, Christopher Groskopf, puts it best: “Leather is the Python charting library for those who need charts now and don’t care if they’re perfect.” It's designed to work with all data types and produces charts as SVGs, so you can scale them without losing image quality. Since this library is relatively new, some of the documentation is still in progress. The charts you can make are pretty basic—but that's the intention.
Created by: Christopher Groskopf
Where to learn more: http://leather.readthedocs.io/en/latest/index.html
Other great reads on Python data visualization
There are a ton of great evaluations and overviews of Python data visualization libraries out there. Check out some of our favorites:
- One Chart, Twelve Charting Libraries (Lisa Charlotte Rost)
- Overview of Python Visualization Tools (Practical Business Python)
- Python data visualization: Comparing 7 tools (Dataquest.io)
Did we miss your favorite data viz library? Let us know in the comments below.