handling labeled data

Hey all,

Everyone should be aware of https://github.com/matplotlib/matplotlib/pull/4787, which is a very simple but very important change to the mpl API: it provides a minimal way to pass labeled data (that is, anything for which foo[key] returns an array-like object) into mpl plotting functions.

This is due to Fernando and Brian’s persuasive case for the importance of starting to address labeled data in mpl; it is either now or in 6-9 months.

The general approach follows R / seaborn / pandas: users can pass in a data kwarg, and if it is present, any data arguments that are strings are replaced by data[key]. In code,

ax.plot(labeled_data['a'], labeled_data['b'])

and

ax.plot('a', 'b', data=labeled_data)

are equivalent.
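
To make the substitution rule concrete, here is a minimal sketch in plain Python. It is not the code from the PR: the decorator name accepts_labeled_data and the stand-in plot function are hypothetical, and the sketch only handles positional arguments. It simply shows how string arguments could be swapped for data[key] when a data kwarg is supplied.

```python
import functools


def accepts_labeled_data(func):
    """Hypothetical decorator (not matplotlib's implementation).

    When `data` is passed, every positional argument that is a string
    is replaced by data[argument] before calling the wrapped function.
    """
    @functools.wraps(func)
    def wrapper(*args, data=None, **kwargs):
        if data is not None:
            # Strings are treated as keys into the labeled container;
            # everything else is passed through untouched.
            args = tuple(data[a] if isinstance(a, str) else a for a in args)
        return func(*args, **kwargs)
    return wrapper


@accepts_labeled_data
def plot(x, y):
    # Stand-in for a plotting function: returns the points it would draw.
    return list(zip(x, y))


labeled_data = {"a": [1, 2, 3], "b": [4, 5, 6]}

explicit = plot(labeled_data["a"], labeled_data["b"])
by_name = plot("a", "b", data=labeled_data)
assert explicit == by_name
```

In this sketch a string that is not a key of data raises KeyError; how to treat such strings (for example, format strings) is a design choice the sketch does not try to settle.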

This is the minimal change that improves quality of life for users who work with labeled data at the REPL, and it plants a flag for the API that downstream projects should be targeting.

Major changes to what the plotting functions do (inferring labels, inferring what computation to do, etc.) are out of scope for this PR, which I want to see included in 1.5. What a higher-level API that makes use of the additional metadata would look like is a much larger discussion, which must have input from all of the stakeholders (e.g. IPython, pandas, bokeh, seaborn, xray).

Tom

A couple of immediate thoughts: what if the data is spread across a mix of objects? Also, I think “labeled” might be a better kwarg name, since it is less likely to conflict with existing APIs. I’ll give this a careful look-see tomorrow.

Ben Root

On Jul 25, 2015 7:03 PM, “Thomas Caswell” <tcaswell@…149…> wrote:


Matplotlib-devel mailing list

Matplotlib-devel@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/matplotlib-devel