Faceted plotting

Pandas has some nice tools to make faceted plots – small multiples of plots where data is grouped by category (http://pandas.pydata.org/pandas-docs/stable/rplot.html). However, I think there would be value in having this functionality built into matplotlib. Mainly:

  1. Not every dataset lives in a dataframe

  2. The pandas library mimics the ggplot interface, and some people would prefer an interface closer to matplotlib

  3. Properly implemented, I think a matplotlib facet system would enable a wider variety of faceted plots than the pandas tools.

I’ve taken a stab at this and come up with an interface that I think has potential. It currently lives in a separate repository at https://github.com/ChrisBeaumont/mplfacet, with an example notebook at http://bit.ly/17u1JzP

There are two basic ways to use a facet object:

Facet(key, data).method()

will group one or more data arrays by key, and build a subplot for each group by calling method (which is any axes plot method). Alternatively,

for item in Facet(key, data):
    x, y = item.data
    item.axes.scatter(x, y)

sets up the subplots and groups the data for you, but gives you more freedom to populate each subplot however you like.
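
To make the second form concrete, here is a minimal sketch of the grouping step. It is not the actual mplfacet implementation: the names group_by_key and scatter_facets are hypothetical, and it assumes one subplot per distinct key value, laid out in a single row.

```python
from collections import defaultdict


def group_by_key(key, *data):
    """Split each data sequence into per-category subsets.

    Hypothetical helper: returns {category: (subset0, subset1, ...)},
    with categories in order of first appearance in `key`.
    """
    groups = defaultdict(lambda: tuple([] for _ in data))
    for i, category in enumerate(key):
        for subset, column in zip(groups[category], data):
            subset.append(column[i])
    return dict(groups)


def scatter_facets(key, x, y):
    """Draw one scatter subplot per category (assumes at least one category)."""
    # Imported lazily so the grouping helper works without matplotlib.
    import matplotlib.pyplot as plt

    groups = group_by_key(key, x, y)
    # squeeze=False keeps `axes` two-dimensional even for a single subplot.
    fig, axes = plt.subplots(1, len(groups), squeeze=False,
                             sharex=True, sharey=True)
    for ax, (category, (gx, gy)) in zip(axes[0], groups.items()):
        ax.scatter(gx, gy)
        ax.set_title(str(category))
    return fig
```

Calling scatter_facets(["a", "b", "a"], [1, 2, 3], [10, 20, 30]) would then give two panels, one per category; the iterator form above simply hands you each (axes, subset) pair instead of calling scatter for you.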

Is there interest in building this into matplotlib? If so, I would like to polish it up and submit a PR.

Cheers,
Chris

Chris,

This is lovely work. Thanks for taking the time to put this together, I
think it has a lot of potential.

I'd like to get a discussion going regarding the current implementation of
the API you've rolled together for faceted plots. Overall, I like the
flexibility you have provided. However, I have some reservations and I'd
like to outline those now.

The current workflow is: Organise your data and create a Facet object.
Then call one of the Facet's plotting methods.

You have designed the Facet object to respond to calls to matplotlib's
plotting methods. That's pretty cool. My reservation here is that I'm not
sure it aligns with the design of matplotlib. At present, the Axes object
implements the plotting methods, and each method will have its own way of
plotting the various types of matplotlib objects. These are Collections,
PolyCollections, LineCollections, Line2D, etc... What would feel more
natural is if I could do the following:

f = Facet(...)
ax.facet(f, 'scatter')

Granted, this isn't as flexible, but it aligns with the current design
philosophy. That is, the user plots objects to the axes, not the other way
around.

In short, this is the matplotlib workflow: Organise your data, and pass it
to one of the axes object's plotting methods.

The way you have implemented Facet reminds me of a discussion Mike, Phil, Ben
and I were having over beers at the SciPy conference in Austin. We
were talking about how matplotlib's plotting API should move forward.
Admittedly, functions like plot() are a total disaster: they take a
plethora of different argument orders and types and try to conform to many
calling signatures at once. Specifically, the way the data is passed to
the plotting method varies wildly.

ax.plot(x1, y1, x2, y2, ...)
ax.plot((x1, x2, x3), (y1, y2, y3), ...)

This goes for the ax.tri* methods too. In addition to this, I tried to
extend this in a pull request
<https://github.com/matplotlib/matplotlib/pull/1143> by allowing the user to
pass in a callable object to ax.fplot() and have matplotlib decide how to
plot it. Fernando then asked the killer question, "So are you going to
write an fcontour, fcontourf, ftriplot, ftricontour, etc?" Obviously, no.
You have to draw the line somewhere. This led to the following line of
thinking: What if matplotlib's plotting methods just acted on an object of
type Plottable? That is, it doesn't matter whether your data is an array,
a function, or in your case, a Facet object. The Plottable class will
carve out an interface that each of matplotlib's plotting methods can
use to do the drawing.

This is the new workflow: The user organises their data into a Plottable
object. Pass that Plottable object to any one of matplotlib's plotting
methods.

I think your faceted plotting API supports exactly what I'm hoping
matplotlib will move towards:

class Facet(matplotlib.Plottable):
    def __init__(self, ...):
        ...

f = Facet(...)
ax.scatter(f)
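
As a rough illustration of what such an interface might look like, here is a sketch. It is purely hypothetical: matplotlib has no Plottable class, and the names plot_args, ArrayData, FunctionData and draw are invented for this example. The helper draw() stands in for a plotting method that accepts a Plottable directly.

```python
from abc import ABC, abstractmethod


class Plottable(ABC):
    """Hypothetical interface: anything that can hand a plot method its data."""

    @abstractmethod
    def plot_args(self):
        """Return the positional data arguments, e.g. (x, y)."""


class ArrayData(Plottable):
    """Plain sequences of x and y values."""

    def __init__(self, x, y):
        self.x, self.y = list(x), list(y)

    def plot_args(self):
        return self.x, self.y


class FunctionData(Plottable):
    """Samples a callable on an evenly spaced grid (the fplot case)."""

    def __init__(self, func, start, stop, num=50):
        if num < 2:
            raise ValueError("num must be at least 2")
        self.func, self.start, self.stop, self.num = func, start, stop, num

    def plot_args(self):
        step = (self.stop - self.start) / (self.num - 1)
        x = [self.start + i * step for i in range(self.num)]
        return x, [self.func(v) for v in x]


def draw(plot_method, plottable, **kwargs):
    """Call a bound plot method (e.g. ax.scatter) with the Plottable's data."""
    return plot_method(*plottable.plot_args(), **kwargs)
```

With this in place, draw(ax.scatter, ArrayData(x, y)) and draw(ax.plot, FunctionData(math.sin, 0, 6)) go through the same code path, so no separate fplot, fcontour, etc. would be needed.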

Thoughts?

Thanks for the hard work.
Best wishes,
Damon

On Sat, Aug 31, 2013 at 10:21 AM, Chris Beaumont <beaumont@...229...> wrote:

Pandas has some nice tools to make faceted plots -- small multiples of
plots where data is grouped by category (
http://pandas.pydata.org/pandas-docs/stable/rplot.html). However, I think
there would be value in having this functionality built into matplotlib.
Mainly:

1. Not every dataset lives in a dataframe
2. The pandas library mimics the ggplot interface, and some people would
prefer an interface closer to matplotlib
3. Properly implemented, I think a matplotlib facet system would enable a
wider variety of faceted plots than the pandas tools.

I've taken a stab at this, and came up with an interface that I think has
potential. This currently exists as a separate repository at
https://github.com/ChrisBeaumont/mplfacet, and an example notebook at
http://bit.ly/17u1JzP

There are two basic ways to use a facet object:

Facet(key, data).method()

will group one or more data arrays by key, and build a subplot for each
group by calling method (which is any axes plot method). Alternatively,

for item in Facet(key, data):
    x, y = item.data
    item.axes.scatter(x, y)

sets up the subplots and groups the data for you, but gives you more
freedom to populate each subplot however you like.

Is there interest in building this into matplotlib? If so, I would like to
polish it up and submit a PR.

Cheers,
Chris

--
Damon McDougall
http://www.damon-is-a-geek.com
Institute for Computational Engineering Sciences
201 E. 24th St.
Stop C0200
The University of Texas at Austin
Austin, TX 78712-1229

Hi Damon,

Thanks for your thoughts on how this should fit in with MPL's API. My $0.02:

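> What would feel more natural is if I could do the following:
>
> f = Facet(...)
> ax.facet(f, 'scatter')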

Three things about this style bother me:

  1. It seems too verbose (“facet” gets typed a lot – 4 times if you call the variable facet instead of f).

  2. I don’t love having to invoke methods like ‘scatter’ by naming them as a string. It feels kludgy for some reason.

  3. I think the axes plotting methods belong to a different category than a facet. The former are “artist factories” that add artists like lines/patches/etc to axes. A facet, on the other hand, is a higher-level “axes factory” that creates multiple subplot axes objects. Making facet an axes method seems out of place, since I think it’s more natural to have a separate axes for each subplot. What do you think?

> Admittedly, functions like plot() are a total disaster: they take a plethora of different argument orders and types and try to conform to many calling signatures at once. Specifically, the way the data is passed to the plotting method varies wildly.

Good point. My implementation relies on a pretty general (but not universal or formally documented) property of most plot functions: the first arguments for each method are usually data arrays. This means that, in most situations, Facet can extract the appropriate subset of the original data, pass it as the first arguments to an axes method, and this will “do the right thing”. This works most of the time, but might be considered a hack. The iterator interface is meant to address the cases where this doesn’t work (for example, calling Facet.imshow or Facet.streamplot doesn’t work).
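
Roughly, the dispatch trick looks like the sketch below. This is a simplified illustration rather than the code in the repository; FacetDispatch and axes_for are hypothetical names, and the key assumption (marked in the comments) is that the data arrays are the leading positional arguments of the plot method.

```python
class FacetDispatch:
    """Hypothetical sketch: forward any plot-method name to each group's axes."""

    def __init__(self, groups, axes_for):
        self.groups = groups      # {category: (array0, array1, ...)}
        self.axes_for = axes_for  # callable: category -> axes-like object

    def __getattr__(self, name):
        # Only reached for attributes not found normally, e.g. "scatter".
        if name.startswith("_"):
            raise AttributeError(name)

        def call_on_each_group(*args, **kwargs):
            results = {}
            for category, subset in self.groups.items():
                method = getattr(self.axes_for(category), name)
                # Assumption: data arrays are the leading positional arguments.
                results[category] = method(*subset, *args, **kwargs)
            return results

        return call_on_each_group
```

Something like FacetDispatch(groups, axes_for).scatter(s=5) then calls scatter once per subplot with that group's x and y. Any method whose leading arguments are not simple per-row data arrays breaks the assumption, which is where the iterator interface comes in.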

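> I think your faceted plotting API supports exactly what I'm hoping matplotlib will move towards:
>
> class Facet(matplotlib.Plottable):
>     def __init__(self, ...):
>         ...
>
> f = Facet(...)
> ax.scatter(f)

This interface addresses my first two concerns above, but not the third – I don’t think that all facets should live in a single axes. I’m not sure what you envision the Plottable interface looks like, but I imagine it provides methods to extract data, so that you can plot things besides arrays. In this case, I think a facet could use Plottables when building subplots, but I’m not sure a facet is a plottable.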

Tangential to the notion of Plottable objects: if there were a standard protocol for passing data and style arguments to all plotting methods, it would be easier to build robust, higher-level axes factories. Facets are one such factory, but there are others. For example (and not the prettiest, I admit), see the map at http://www.tableausoftware.com/public/gallery/new-jersey-test-score-analysis-visualization. It’s basically a faceted group of pie charts that are positioned and sized according to more data. The generalized description is something like:

atomic_plot + faceted_by(variable) + positioned_by(x, y) + sized_by(z)

Where atomic_plot is an axes plot method (e.g., ax.pie, but why not ax.bar or any other single-variable plot?). You could imagine building a layered API like this, and it would be easier if the interface for all atomic_plot objects were compatible. Matplotlib was first built to win converts over from MATLAB – with a layered API, you can start converting the ggplot/d3/bokeh/vega community :)

Cheers,

Chris
