Units discussion...

nathan12343 · February 8, 2018, 8:13pm

Does numpy subclassing really matter? If the docs say the unit converter
must convert from the external type to the internal type, then as long as
the converter does that, it doesn't matter what the external type is or
what it inherits from, right? The point is that the converter class is the
only class manipulating the external data objects - MPL shouldn't care what
they are or what they inherit from.

To make my statement more concrete, here's a matplotlib pull request that
fixed a bug that only triggered for astropy and yt but not for pint:

Matplotlib pull request "colors: ensure masked array data is an ndarray" (matplotlib:master ← data-exp-lab:strip-units-imshow, opened by ngoldbaum on 21 Jun 2016, +3 -1):

This fixes compatibility for imshow plots with array data that is a unit-aware ndarray subclass, for example data from yt.units or astropy.units. Take for example the following script:

```
import numpy as np
import matplotlib.pyplot as plt
from yt.units import km

arr = np.random.random((400, 400))*km
plt.imshow(arr)
plt.show()
```

This produces the following traceback right now: https://gist.github.com/ngoldbaum/27effb41173859132fed08a0ad5485ee

This is not isolated to yt's units - if one tries the same thing with astropy's units the same issue arises. Pint does not have this issue because Pint's `Quantity` class is not an ndarray subclass, so when it is converted to a masked array in `matplotlib.image`, the units are stripped. This function seems to expect `result.data` to be an instance of the base ndarray class. This patch just enforces that expectation explicitly.

In this case it was an issue because of a difference in how NumPy's masked
array deals with ndarray subclasses versus array wrapper classes.
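A minimal sketch of that difference, assuming only NumPy. `Quantity` and `Wrapper` are hypothetical stand-ins (not the actual yt, astropy, or pint classes) for an ndarray subclass and a pint-style wrapper. Wrapping each in a masked array shows that `.data` keeps the subclass but not the wrapper, and that `np.asarray` is one way to strip the subclass explicitly.

```python
import numpy as np


class Quantity(np.ndarray):
    """Hypothetical unit-aware ndarray subclass (yt/astropy style)."""

    def __new__(cls, values, unit):
        obj = np.asarray(values, dtype=float).view(cls)
        obj.unit = unit
        return obj

    def __array_finalize__(self, obj):
        # Propagate the unit to views and slices.
        self.unit = getattr(obj, "unit", None)


class Wrapper:
    """Hypothetical pint-style facade: holds an array but is not an ndarray."""

    def __init__(self, values, unit):
        self.magnitude = np.asarray(values, dtype=float)
        self.unit = unit

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this when coercing; the unit is dropped here.
        return np.asarray(self.magnitude, dtype=dtype)


sub = np.ma.masked_array(Quantity([1.0, 2.0, 3.0], "km"))
wrap = np.ma.masked_array(Wrapper([1.0, 2.0, 3.0], "km"))

print(type(sub.data).__name__)   # Quantity: the subclass survives in .data
print(type(wrap.data).__name__)  # ndarray: the wrapper was unwrapped
print(type(np.asarray(sub.data)).__name__)  # ndarray: stripped explicitly
```

Code that, like the function patched in the PR, assumes `.data` is a base `ndarray` only holds for the wrapper case unless it strips the subclass itself.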

I think one issue is that data types are malleable in the API right now.
Lists, tuples, NumPy arrays, ints, floats, etc. are all possible inputs in
many, if not most, cases. IMO, the unit API should not be malleable at all. The
unit converter API should say that the return type of external->internal
conversion is always a specific value type (e.g. list of float, NumPy float64
array).
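One way to make such a contract checkable, sketched under the assumption that the fixed internal type is a float64 `ndarray`. `checked_convert` and both converters are hypothetical helpers for illustration, not existing Matplotlib functions.

```python
import numpy as np


def checked_convert(convert, value):
    """Hypothetical guard: run a converter and insist on a float64 ndarray.

    Plotting code downstream of this call could then assume exactly one
    value type, whatever the external (unit-aware) input type was.
    """
    out = convert(value)
    if type(out) is not np.ndarray or out.dtype != np.float64:
        raise TypeError(
            f"converter returned {type(out).__name__} "
            f"(dtype={getattr(out, 'dtype', None)}), expected float64 ndarray"
        )
    return out


def good_converter(value):
    # np.asarray also strips ndarray subclasses down to a base ndarray.
    return np.asarray(value, dtype=np.float64)


def sloppy_converter(value):
    # Returns a plain list: "malleable" output that the guard rejects.
    return list(value)


print(checked_convert(good_converter, [1, 2, 3]))  # [1. 2. 3.]
try:
    checked_convert(sloppy_converter, [1, 2, 3])
except TypeError as err:
    print(err)
```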

Jody: IMO, your example should plot the data in inches in the first plot
call, then convert the second input to inches and plot that. The plot
calls support the xunits keyword argument, which tells the converter what
floating point unit conversion to apply. If that keyword is not specified,
then it defaults to the units of the input. The example that needs to be
clearer is if I do this:

ax.plot(x1, y1, xunits="km")
ax.plot(x2, y2, xunits="miles")

IMO, either the floats are km or miles, not both. So either the first
call locks the converter to km and the second xunits is ignored, or
the second input overrides the first and requires that the first artists go
back through a conversion to miles. Either is a reasonable choice of
behavior (but the first is much easier to implement).
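Both behaviors can be sketched in a few lines. This is a toy illustration, assuming data arrives in meters as a stand-in for unit-aware input; `ToyAxis`, its `policy` argument, and the conversion table are hypothetical and not Matplotlib API. Keeping the original data around is what makes the second policy possible, and also why it is more work.

```python
METERS_PER_UNIT = {"km": 1000.0, "miles": 1609.344}


class ToyAxis:
    """Hypothetical axis showing two ways to resolve conflicting xunits."""

    def __init__(self, policy="first_wins"):
        if policy not in ("first_wins", "last_wins"):
            raise ValueError(policy)
        self.policy = policy
        self.units = None
        self._raw = []  # original data (meters), kept for re-conversion

    def plot(self, x_meters, xunits):
        # first_wins: the first xunits sticks and later ones are ignored.
        # last_wins: each call may change the units of every artist.
        if self.units is None or self.policy == "last_wins":
            self.units = xunits
        self._raw.append(list(x_meters))

    def plotted_floats(self):
        # Every artist is expressed in the axis' current float units.
        scale = METERS_PER_UNIT[self.units]
        return [[v / scale for v in raw] for raw in self._raw]


for policy in ("first_wins", "last_wins"):
    ax = ToyAxis(policy)
    ax.plot([1000.0, 2000.0], xunits="km")
    ax.plot([1609.344], xunits="miles")
    print(policy, ax.units, ax.plotted_floats())
```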



On Thu, Feb 8, 2018 at 1:08 PM, Drain, Theodore R (392P) <theodore.r.drain at jpl.nasa.gov> wrote:

rmay31 · February 8, 2018, 8:15pm

Hi,

Let me start by saying that this will probably come across as crabby, and I
don't really mean for it to do so. I'm happy people are looking at
improving unit support. HOWEVER, I'm concerned that those trying to push
right now are completely ignorant of what actually exists in matplotlib and
how the rest of the ecosystem of unit packages works, don't have personal
use cases and are completely unclear on what others' use cases are, and seem
to be throwing things at the wall as rapidly as possible. For instance,
Antony:

> One major point (already mentioned by others) that led, I think, to some
> devs (including myself) being relatively dismissive about unit support is
> the lack of well-defined use case, other than "it'd be nice if we supported
> units" (i.e., especially from the point of view of devs who *don't* use
> units themselves, it ends up being an ever moving target). In particular,
> tests on unit support ("unit unit tests"? :-)) currently only rely on the
> old JPL unit code that ended up integrated into Matplotlib's test suite,
> but does not test integration with the two major unit packages I am aware
> of (pint and astropy.units).

False. Until David Stansby's contribution, I wrote every line of code in:
https://github.com/matplotlib/matplotlib/commits/master/lib/matplotlib/tests/test_units.py.
Either way, that test has literally *nothing* to do with JPL's
implementation. (And 30s of github could have revealed this.) I added that
code *literally* to check whether we're properly interfacing with a library
just like pint.

> Is there a smaller library that subclasses ndarray for units support? I
> imagine we could vendorize a subset of whatever astropy or yt do. Or maybe
> they aren't so huge that they would be unreasonable to make as test
> dependencies. yt is only 68 Mb.

No. Just no. Again, I have stubbed out just fine the functionality within
test_units.py to function just like pint--in about 25 lines. I'm happy to
do so for an ndarray subclass-based one as well.

Now, about the functionality:

> What we need an example of is how the following should work.
>
>     x = np.arange(10)
>     y = x*2 * myunitclass.inch
>     ax.plot(x, y)
>     z = x*2 * myunitclass.cm
>     ax.plot(x, z)

That currently works today, and works just fine. Same test file:
https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/tests/test_units.py#L72-L81
Since you're plotting things with the same dimensionality, the converter
interface can convert to the units that already exist on the axes. Done.
I'm quite happy with it.
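To make that flow concrete without depending on pint, here is a toy sketch. `Quantity`, `QuantityConverter`, and `ToyAxis` are hypothetical stand-ins; the converter's `default_units(x, axis)` and `convert(value, unit, axis)` method names mirror `matplotlib.units.ConversionInterface`, and `ToyAxis.update_units`/`convert_units` loosely mirror the `Axis` methods of the same name, but nothing here imports or reproduces Matplotlib itself.

```python
CM_PER_UNIT = {"inch": 2.54, "cm": 1.0}


class Quantity:
    """Toy pint-style facade: plain magnitudes plus a unit name."""

    def __init__(self, magnitude, unit):
        self.magnitude = [float(m) for m in magnitude]
        self.unit = unit

    def to(self, unit):
        scale = CM_PER_UNIT[self.unit] / CM_PER_UNIT[unit]
        return Quantity([m * scale for m in self.magnitude], unit)


class QuantityConverter:
    @staticmethod
    def default_units(x, axis):
        # The first data plotted decides the axis units.
        return x.unit

    @staticmethod
    def convert(value, unit, axis):
        # Later data of the same dimensionality is converted to those units.
        return value.to(unit).magnitude


class ToyAxis:
    def __init__(self, converter):
        self.converter = converter
        self.units = None

    def update_units(self, data):
        if self.units is None:
            self.units = self.converter.default_units(data, self)

    def convert_units(self, data):
        return self.converter.convert(data, self.units, self)


axis = ToyAxis(QuantityConverter())
x = range(5)
y = Quantity([2 * v for v in x], "inch")
z = Quantity([2 * v for v in x], "cm")
for data in (y, z):
    axis.update_units(data)
    print(axis.units, axis.convert_units(data))
```

The second dataset arrives in centimeters but is plotted as inch floats, because the axis units were fixed by the first call.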

Honestly, I'm not trying to be mean about this. But I come into an email
thread where things are moving so fast, with factually incorrect
information flying around, that I'm simply overwhelmed. (14 messages in 3
hours???) I don't think email is a good place to discuss this.

To be clear, I am *ecstatic* that people are looking at unit challenges,
and I agree that the way we're implementing it in matplotlib is
hacky--handling it uniquely for each plotting method rather than
systematically. And I'm happy to have new voices come in and try to improve
the situation with new ideas. But I see people railing against the current
converter interface as if it's unused, crusty, or otherwise completely
inadequate. The converter WORKS fine. The problem is that we have to
hook up unit machinery individually to each plotting method, because each
plotting method is its own special snowflake--unique and unlike any other.
What we need is to rationalize the implementation of plots, specifically
the data handling (missing data, units, shape, etc.), and then implementing
units will be a sane task.

Or maybe I'm wrong, and there is some structural deficiency in the current
converter--but I'd at least like to see those arguments coming from a place of
knowledge, not conjecture about how this thing may or may not be working
currently, and wild speculation about how it's supposed to work. Contrary
to the "lack of well-defined use case" idea, there are plenty--they might
not be written down, but that doesn't mean they haven't been discussed
before.

Let's find a better venue for this discussion, one that lets everyone join
in *together*, synchronously, and in a form where we're not guessing at
tone.

Ryan


On Thu, Feb 8, 2018 at 11:48 AM, Jody Klymak <jklymak at uvic.ca> wrote:

On 8 Feb 2018, at 09:54, Drain, Theodore R (392P) <theodore.r.drain at jpl.nasa.gov> wrote:

I think we can help with building a better toy unit system. Or we can
standardize on datetime and some existing unit package. Whatever makes it
easier for people to write test cases.

For me, the problem w/ datetime is that it is not fully featured units
handling in that it doesn't support multiple units. It's really just a
class of data that we have a known conversion to float for.

What we need an example of is how the following should work.
x = np.arange(10)
y = x*2 * myunitclass.inch
ax.plot(x, y)
z = x*2 * myunitclass.cm
ax.plot(x, z)
So when a new feature is added, we can ask that its units support is made
clear. I guess I don't mind if those are astropy units or yt units, or
pint, or ..., though there will be some pushback about including another test
dependency.
Would pint units work? It's a very small dependency, but maybe it is not as
full-featured, or is structured wildly differently from the others?

A test suite to my mind would
- test basic functionality
- test mixing allowed dimensions (e.g. inches and centimeters)
- test changing the axis units (so all the plotted data changes its
values, *or* the tick locators/formatters change their values).
- test that disallowed mixed dimensions fail.
- ??

Cheers, Jody
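Jody's checklist could be sketched as pytest-style tests against a self-contained toy unit system. Everything here (`UNITS`, `Quantity`, `ToyAxis`, the test names) is hypothetical and only illustrates the shape of such tests; a real suite would exercise Matplotlib's own Axes with an actual unit package or mock.

```python
import math

# Hypothetical toy unit table: name -> (dimension, factor to the SI base unit).
UNITS = {"inch": ("length", 0.0254), "cm": ("length", 0.01), "s": ("time", 1.0)}


class Quantity:
    """Toy unitized array: plain floats plus a unit name."""

    def __init__(self, values, unit):
        self.values = [float(v) for v in values]
        self.unit = unit


def convert(q, target):
    """Convert a Quantity to floats in ``target``; refuse mismatched dimensions."""
    src_dim, src_factor = UNITS[q.unit]
    dst_dim, dst_factor = UNITS[target]
    if src_dim != dst_dim:
        raise ValueError(f"cannot convert {q.unit} to {target}")
    return [v * src_factor / dst_factor for v in q.values]


class ToyAxis:
    """Hypothetical axis: the first unit plotted becomes the axis unit."""

    def __init__(self):
        self.units = None
        self.data = []  # unitized inputs, kept so a unit change can re-convert

    def plot(self, q):
        if self.units is None:
            self.units = q.unit
        floats = convert(q, self.units)  # raises before anything is stored
        self.data.append(q)
        return floats

    def set_units(self, unit):
        for q in self.data:
            convert(q, unit)  # validate every stored input first
        self.units = unit

    def plotted(self):
        return [convert(q, self.units) for q in self.data]


def close(a, b):
    return len(a) == len(b) and all(math.isclose(x, y) for x, y in zip(a, b))


def test_basic_functionality():
    ax = ToyAxis()
    assert close(ax.plot(Quantity([0, 1, 2], "inch")), [0, 1, 2])


def test_mixing_allowed_dimensions():
    ax = ToyAxis()
    ax.plot(Quantity([1], "inch"))
    assert close(ax.plot(Quantity([2.54], "cm")), [1.0])


def test_changing_axis_units():
    ax = ToyAxis()
    ax.plot(Quantity([1, 2], "inch"))
    ax.set_units("cm")
    assert close(ax.plotted()[0], [2.54, 5.08])


def test_disallowed_mixed_dimensions_fail():
    ax = ToyAxis()
    ax.plot(Quantity([1], "inch"))
    try:
        ax.plot(Quantity([1], "s"))
    except ValueError:
        pass
    else:
        raise AssertionError("mixing length and time should fail")
    assert len(ax.data) == 1  # the rejected data was not stored
```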

________________________________________
From: Jody Klymak <jklymak at uvic.ca>
Sent: Thursday, February 8, 2018 9:39 AM
To: Drain, Theodore R (392P)
Cc: matplotlib development list
Subject: Re: [Matplotlib-devel] Units discussion...

I realize that units are "a pain", but they're hugely useful. Just
plotting datetimes is going to be a pain without units (and was a huge pain
before the unit system). The proposal that only Axes supports units is
going to cause us a massive problem as that's rarely everything that we do
with a plot. I could do a survey to find all the interactions we use (and
that doesn't even touch the 1000's of lines of code our users have written)
if that would help but anything that's part of the public api (axes,
artists, patches, etc) is probably being used - i.e. pretty much anything
that's in the current user's guide is something that we use/want/need to
work with unitized data.

OK, *for discussion*: A scope of work for JPL and Matplotlib might be:

1) develop a better toy unit module that has most of the desired features
(maybe the existing one is fine, but please see matplotlib issue #9713,
"Units handling different with plot than other functions...", for why I'm
a little dismayed with the state of things).

2) write a developer's guide explaining how units should be/are implemented
a) in matplotlib modules
b) by downstream developers (this is probably adequate already).

It sounds like what you are saying is that units should be carried to the
draw stage (or cache stage) for all artists. That's maybe fine, but as a
new developer, I found the units support woefully under-documented. The
fact that others have hacked in units support in various inconsistent ways
means that we need to police all this better.

OTOH, maybe Antony and I are poor people to lead this charge, given that
we don't need unit support. But I don't think we are being hypercritical
in pointing out it needs work.

Thanks a lot, Jody

This is kind of what I meant in my previous email about use cases. Saying
"just Axes has units" is basically saying the only valid unit use case is
create a plot one time and look at it. You can't manipulate it, edit it,
or build any kind of plotting GUI application (which we have many of) once
the plot has been created. The Artist classes are one of the primary API's
for applications. Artists are created, edited, and manipulated if you want
to allow the user to modify things in a plot after it's created. Even
the most basic cases like calling Line2D.set_data() wouldn't be allowed
with units if only Axes has unit support.

I'm not sure I understand the statement that units are a moving target.
The reason it keeps popping up is that code gets added without someone
considering units, which then triggers bug reports that require fixing.
If there was a clearer policy and new code was required to have test cases
that cover non-unit and unit inputs, I think things would go much
smoother. We'd be happy to help with submitting new test cases to cover
unit cases in existing code once a policy is decided on. Maybe what's
needed is better documentation for developers who don't use units so they
can easily write a test case with units when adding/modifying functionality.

Ted

________________________________________
From: anntzer.lee at gmail.com on behalf of Antony Lee <antony.lee at berkeley.edu>
Sent: Thursday, February 8, 2018 8:09 AM
To: Drain, Theodore R (392P)
Cc: matplotlib development list
Subject: Re: [Matplotlib-devel] Units discussion...

I'm momentarily a bit away from Matplotlib development due to real life
piling up, so I'll just keep this short.

One major point (already mentioned by others) that led, I think, to some
devs (including myself) being relatively dismissive about unit support is
the lack of well-defined use case, other than "it'd be nice if we supported
units" (i.e., especially from the point of view of devs who *don't* use
units themselves, it ends up being an ever moving target). In particular,
tests on unit support ("unit unit tests"? :-)) currently only rely on the
old JPL unit code that ended up integrated into Matplotlib's test suite,
but does not test integration with the two major unit packages I am aware
of (pint and astropy.units).

From the email of Ted it appears that these are not sufficient to
represent all kinds of relevant units. In particular, I was at some point
hoping to completely work in deunitized data internally, *including the
plotting*, and rely on the fact that the deunitized and the unitized
data are usually linked by an affine transform, so the plotting part
doesn't need to convert back to unitized data and we only need to place and
label the ticks accordingly; however Ted mentioned relativistic units,
which imply the use of a non-affine transform. So I think it would also be
really helpful if JPL could release some reasonably documented unit library
with their actual use cases (and how it differs from pint & astropy.units),
so that we know better what is actually needed (I believe carrying the JPL
unit code in our own code base is a mistake).

As for the public vs private, or rather unitized vs deunitized API
discussion, I believe a relatively simple and consistent line would be to
make Axes methods unitized and everything else deunitized (but with clear
ways to convert to and from unitized data when not using Axes methods).

Antony

2018-02-07 16:33 GMT+01:00 Drain, Theodore R (392P) <theodore.r.drain at jpl.nasa.gov>:
That sounds fine to me. Our original unit prototype API actually had
conversions for both directions but I think the float->unit version was
removed (or really moved) when the ticker/formatter portion of the unit API
was settled on.

Using floats/NumPy arrays internally is going to be easier and faster, so I
think that's a plus. The biggest issue we're going to run into is what's
defined as "internal" vs part of the unit API. Some things are easy like
the Axes/Axis API. But we also use low level API's like the patches. Are
those unitized? This is the pro and con of using something like Python
where basically everything is public. It makes it possible to do lots of
things, but it's much harder to define a clear library with a specific
public API.

Somewhere in the process we should write a proposal that outlines which
classes/methods are part of the unit api and which are going to be
considered internal. I'm sure we can help with that effort.

That also might help clarify/influence code structure - if internal
implementation classes are placed in a sub-package inside MPL 3.0, it
becomes clearer to people later on what the "official" public API is versus
what can be optimized to just use floats. Obviously the devs would need to
decide if that kind of restructuring is worth it or not.

Ted

________________________________________
From: David Stansby <dstansby at gmail.com>
Sent: Wednesday, February 7, 2018 3:42 AM
To: Jody Klymak
Cc: Drain, Theodore R (392P); matplotlib development list
Subject: Re: [Matplotlib-devel] Units discussion...

Practically, I think what we are proposing is that for unit support the
user must supply two functions for each axis:

* A mapping from your unit objects to floating point numbers
* A mapping from those floats back to your unit objects

As far as I know function 2 is new, and doesn't need to be supplied at the
moment. Doing this would mean we can convert units as soon as they enter
Matplotlib, only ever have to deal with floating point numbers internally,
and then use the second function as late as possible when the user requests
stuff like e.g. the axis limits.
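A minimal sketch of the two functions, assuming a toy length type and meters as the internal float unit. `Length`, `to_float`, and `from_float` are hypothetical names; the inverse in particular is the proposed addition, not something the current converter interface provides. With these definitions, 8 inches enters as the float 0.2032 and comes back as a `Length` in meters.

```python
from dataclasses import dataclass

METERS_PER_UNIT = {"m": 1.0, "cm": 0.01, "inch": 0.0254}


@dataclass(frozen=True)
class Length:
    """Toy unitized scalar."""
    value: float
    unit: str


def to_float(q: Length) -> float:
    """Function 1: unit object -> internal float (meters), applied on entry."""
    return q.value * METERS_PER_UNIT[q.unit]


def from_float(x: float) -> Length:
    """Function 2 (the proposed inverse): internal float -> unit object.

    It returns the converter's canonical unit, not necessarily the unit the
    user originally supplied.
    """
    return Length(x, "m")


# Data is converted as soon as it enters...
xlim_internal = (to_float(Length(0, "inch")), to_float(Length(8, "inch")))
# ...and only converted back when the user asks, e.g. for the axis limits.
print(tuple(from_float(x) for x in xlim_internal))
```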

Also worth noting that any major change like this will go in to Matplotlib
3.0 at the earliest, so will be python 3 only.

David

On 7 February 2018 at 06:06, Jody Klymak <jklymak at uvic.ca> wrote:
Dear Ted,

Thanks so much for engaging on this.

Don't worry, nothing at all is changing w/o substantial back and forth,
and OK from downstream users. I actually don't think it'll be a huge
change, probably just some clean up and better documentation.

FWIW, I've not personally done much programming w/ units, just been a bit
perplexed by their inconsistent and (to my simple mind) convoluted
application in the codebase. Having experience from people who try to use
them everyday will be absolutely key.

Cheers, Jody

On Feb 6, 2018, at 14:17 PM, Drain, Theodore R (392P) <theodore.r.drain at jpl.nasa.gov> wrote:

We use units for everything in our system (in fact, we funded John Hunter
originally to add in a unit system so we could use MPL) so it's a crucial
system for us. In our system, we have our own time classes (which handle
relativistic time frames as well as much higher precision representations)
and a custom unit system for floating point values.

I think it's important to talk about these changes in concrete terms. I
understand the words you're using, but I'm not really clear on what the
real proposed changes are. For example, the current unit API returns a
units.AxisInfo object so the converter can set the formatter and locators
to use. Is that what you mean in the 2nd paragraph about ticks and
labels? Or is that changing?

The current unit API is pretty simple and lives in units.ConversionInterface.
Are any of these changes going to change the conversion API? (note - I'm
not against changing it - I'm just not sure if there are any changes or
not).

Another thing to consider: many of the examples people use are scripts
which make a plot and stop. But there are other use cases which are more
complicated and stress the system in different ways. We write several GUI
applications (in PyQt) that use MPL for plotting. In these cases, the user
is interacting with the plot to add and remove artists, change styles,
modify data, etc etc. So having a good object oriented API for modifying
things after construction is important for this to work. So when units are
involved, it can't be a "convert once at construction" and never touch
units again. We are constantly adjusting limits, moving artists, etc in
unitized space after the plot is created.

So in addition to the ConversionInterface API, I think there are other
items that would be useful to spell out explicitly. Things like which
API's in MPL should accept units and which won't and which methods return
unitized data and which don't. It would be nice if there was a clear
policy on this. Maybe one exists and I'm not aware of it - it would be
helpful to repeat it in a discussion on changing the unit system.
Obviously I would love to have every method accept and return unitized data
:-).

I bring this up because I was just working on a hover/annotation class
that needed to move a single annotation artist with the mouse. To move the
annotation box the way I needed to, I had to set one private member
variable, call two set methods, use attribute assignment for one value, and
set one semi-public member variable - some of which work with units and
some of which don't. I think having a clear "this kind of method
accepts/returns units" policy would help when people are adding new
accessors/methods/variables to make it more clear what kind of data is
acceptable in each.

Ted
ps: I may be able to help with some resources to work on any unit
upgrades, but to make that happen I need to get a clear statement of what
problem is being solved and the scope of the work so I can explain to our
management why it's important.

________________________________________
From: Matplotlib-devel <matplotlib-devel-bounces+ted.drain=jpl.nasa.gov at python.org> on behalf of Jody Klymak <jklymak at uvic.ca>
Sent: Saturday, February 3, 2018 9:25 PM
To: matplotlib development list
Subject: [Matplotlib-devel] Units discussion...

Hi all,

To carry on the gitter discussion about unit handling, hopefully to lead
to more stringent documentation and implementation...

In response to @anntzer I thought about the units support a bit - it seems
that rather than a transform, a more straightforward approach is to have
the converter map to float arrays in a unique way. This float mapping
would be completely analogous to `date2num` in `dates`, in that it doesn't
change and is perfectly invertible without matplotlib ever knowing about
the unit information, though the axis could store it for the tick
locators and formatters. It would also have an inverse that would supply
data back to the user in unit-aware data (though not necessarily in the
unit that the user supplied; e.g. if they supply 8*in and the
converter converts everything to meter floats, then the returned unitized
inverse would be 0.203*m, or whatever convention the converter wants to
supply).

User "unit" control, i.e. making the plot in inches instead of m, would be
accomplished with tick locators and formatters. Matplotlib would never
directly convert between cm and inches (any more than it converts from days
to hours for dates); the downstream-supplied tick formatter and labeller
would do it.
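A sketch of that division of labor, assuming the converter's internal floats are meters and the user wants the axis shown in inches. `inch_ticks` and `inch_label` are hypothetical helpers; in Matplotlib they would sit behind a tick locator and formatter (for instance, `matplotlib.ticker.FuncFormatter` wraps a `func(x, pos)` callable), but the code below is plain Python and leaves that wiring out. The plotted floats never change; only tick placement and labels do.

```python
import math

METERS_PER_INCH = 0.0254


def inch_ticks(vmin_m, vmax_m, step_in=1.0):
    """Locator sketch: tick positions, in internal meters, at whole
    multiples of ``step_in`` inches within [vmin_m, vmax_m]."""
    first = math.ceil(vmin_m / METERS_PER_INCH / step_in)
    last = math.floor(vmax_m / METERS_PER_INCH / step_in)
    return [k * step_in * METERS_PER_INCH for k in range(first, last + 1)]


def inch_label(x_m, pos=None):
    """Formatter sketch: label an internal meter value in inches.

    The (x, pos) signature matches what FuncFormatter passes its callable.
    """
    return f"{x_m / METERS_PER_INCH:g} in"


ticks = inch_ticks(0.0, 0.1)  # data limits of 0 m to 0.1 m
print([inch_label(t) for t in ticks])
```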

Each axis would only get one converter, set by the first call to the axis.
Subsequent calls to the axis would pass all data (including bare floats) to
the converter. If the converter wants to pass bare floats then it can do
so. If it wants to accept other data types then it can do so. It should
be possible for the user to clear or set the converter, but then they
should know what they are doing and why.

What's missing? I don't think this is wildly different than what we have,
but maybe a bit more clear.

Cheers, Jody

_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel at python.org
Matplotlib-devel Info Page

--
Jody Klymak
Jody M. Klymak - UVic Ocean Physics

--
Ryan May

antony.lee · February 8, 2018, 8:49pm

I apologize for the erroneous statements I have made regarding tests. I
should, in fact, be well aware of test_units, having had to fight with it
when fixing PR#9774 (see the part modifying axes/_base.py). However, my
intent (and again, I readily admit I wrote something else and that was
incorrect) was that the test is against our own mocking of a minimal unit
system, rather than an external, actually used one. In other words, I
would much prefer actually bringing in pint as a test dependency (at least
for CI -- we can always locally skipif it), and whatever else we need to
cover all cases. Why? Because, for someone who is not a unit specialist,
how do I know whether your mock unit class is actually relevant and has
anything to do with "real-life" units?

There may not be a fundamental structural deficiency in the current
converter setup in itself, but I maintain that the need to add an ad hoc
implementation to "each uniquely special snowflake" (plotting method),
rather than at well defined entry points, is less than ideal. As you
mentioned, this may not actually be due to units, but just to Matplotlib's
general architecture, but unit support makes this more visible (... IMO).

Finally, please understand the "lack of well defined use case" from the
point of view of a developer who does not use unitized data. He sees a
bunch of rather complex code to convert units around (e.g. Line2D.recache
and everything that calls into it), and meanwhile, what is the *only*
documentation he sees on the unit system? It is the docstring of the units
module, which is frankly less than optimal. At that point, he just sees
the unit support code as a burden that has to be carried around.
Obviously, I totally understand that people use Matplotlib with different
use cases, and there may be things I use in Matplotlib that you couldn't
care less about. However, as Jody mentioned some time ago, the unit system
is literally supposed to touch *any* data that comes into Matplotlib, and
can therefore hardly be ignored by any dev. I believe this is consistent
with the call for a MEP clarifying the use cases of units.

Antony

2018-02-08 21:15 GMT+01:00 Ryan May <rmay31 at gmail.com>:

Hi,

Let me start by saying that this will probably come across as crabby, and
I don't really mean for it to do so. I'm happy people are looking at
improving unit support. HOWEVER, I'm concerned that those trying to push
right now are completely ignorant of what actually exists in matplotlib and
how the rest of the ecosystem of unit packages works, don't have personal
use cases and are completely unclear of what others use cases are, and seem
to be throwing things at the wall as rapidly as possible. For instance,
Anthony:

> One major point (already mentioned by others) that led, I think, to
some devs (including myself) being relatively dismissive about unit support
is the lack of well-defined use case, other than "it'd be nice if we
supported units"
> (i.e., especially from the point of view of devs who *don't* use units
themselves, it ends up being an ever moving target). In particular, tests
on unit support ("unit unit tests"? :-)) currently only rely on the old JPL
unit code
> that ended up integrated into Matplotlib's test suite, but does not test
integration with the two major unit packages I am aware of (pint and
astropy.units).

False. Until David Stansby's contribution, I wrote every line of code in:
Commits · matplotlib/matplotlib · GitHub
master/lib/matplotlib/tests/test_units.py. Either way, that test has
literally *nothing* to do with JPL's implementation. (And 30s of github
could have revealed this.) I added that code *literally* to check whether
we're properly interfacing with a library just like pint.

> Is there a smaller library that subclasses ndarray for units support?
I imagine we could vendorize a subset of whatever astropy or yt do. Or
maybe they aren?t so huge that they would be unreasonable to make as test
dependencies. yt is only 68 Mb.

No. Just no. Again, I have stubbed out just fine the functionality within
test_units.py to function just like pint--in about 25 lines. I'm happy to
do so for an ndarray subclass-based one as well.

Now, about the functionality:

> What we need an example of is how the following should work.
> ```python
> x = np.arange(10)
> y = x*2 * myunitclass.in
> ax.plot(x, y)
> z = x*2 * myunitclass.cm
> ax.plot(x, z)
> ```

That currently works today, and works just fine. Same test file:
https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/tests/
test_units.py#L72-L81 You're plotting things with the same
dimensionality, the converter interface can convert to the units that exist
already on the axes. Done. I'm quite happy with it.

Honestly, I'm not trying to be mean about this. But I come into an email
thread where things are moving so fast, with factually incorrect
information flying around, that I'm simply overwhelmed. (14 messages in 3
hours???) I don't think email is a good place to discuss this.

To be clear, I am *ecstatic* that people are looking at unit challenges,
and I agree that the way we're implementing it in matplotlib is
hacky--handling it uniquely for each plotting method rather than
systematically. And I'm happy to have new voices come in and try to improve
the situation with new ideas. But I see people railing against the current
converter interface as if it's unused, crusty, or otherwise completely
inadequate. The converter WORKS fine. The problem is that we have to
hook up unit machinery individually to each plotting method, because each
plotting method is its own special snowflake--unique and unlike any other.
What we need is to rationalize the implementation of plots, specifically
the data handling (missing data, units, shape, etc.), and then implementing
units will be a sane task.
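As a rough illustration of "rationalizing the data handling", here is a hypothetical helper that every plotting method could call first: it resolves units through an optional converter callable, turns masked or missing entries into NaN, and enforces a 1-D float64 result. The name `normalize_1d` and its signature are assumptions, not existing Matplotlib API.

```python
import numpy as np


def normalize_1d(data, convert=None):
    """Hypothetical single entry point for 1-D numeric plot data.

    convert: optional callable that strips units and returns plain numbers.
    Returns a 1-D, base-class float64 ndarray with missing values as NaN.
    """
    if convert is not None:
        data = convert(data)
    if np.ma.isMaskedArray(data):
        # Masked entries become NaN so later code sees one "missing" marker.
        data = data.astype(np.float64).filled(np.nan)
    # np.array (subok=False by default) copies into a plain ndarray;
    # None entries become NaN under dtype=float64.
    arr = np.array(data, dtype=np.float64)
    if arr.ndim == 0:
        arr = arr.reshape(1)
    if arr.ndim != 1:
        raise ValueError(f"expected 1-D data, got shape {arr.shape}")
    return arr
```

With one such function at the top of each method, missing data, units and shape are settled in a single place instead of per plotting method.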

Or maybe I'm wrong, and there is some structural deficiency in the current
converter--but I'd at least like to see those arguments coming from a place
knowledge, not conjecture about how this thing may or may not be working
currently, and wild speculation about how it's supposed to work. Contrary
to the "lack of well-defined use case" idea, there are plenty--they might
not be written down, but that doesn't mean they haven't been discussed
before.

Let's find a better venue for this discussion that lends itself for
everyone to join in *together*, synchronously, and in a form where we're
not guessing at tone.

Ryan

I think we can help with building a better toy unit system. Or we can
standardize on datetime and some existing unit package. Whatever makes it
easier for people to write test cases.

For me, the problem w/ datetime is that it is not a fully featured unit
type, in that it doesn't support multiple units. It's really just a
class of data that we have a known conversion to float for.
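To make the datetime point concrete, a sketch of a datetime "converter" in the spirit of `matplotlib.dates.date2num`: one fixed, invertible mapping to float days and no notion of alternative units. The epoch (1970-01-01 UTC) and the function names are assumptions for illustration; Matplotlib's own default epoch has differed between releases.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # assumed epoch
_SECONDS_PER_DAY = 86400.0


def to_days(dt):
    """datetime -> float days since EPOCH (naive datetimes treated as UTC)."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return (dt - EPOCH).total_seconds() / _SECONDS_PER_DAY


def from_days(days):
    """Inverse of to_days; always returns an aware UTC datetime."""
    return EPOCH + timedelta(days=days)
```

Unlike a length, there is no second unit to switch to here: the only "unit" choice left is how ticks are located and labelled.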

What we need an example of is how the following should work.
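```python
import numpy as np
import matplotlib.pyplot as plt
import myunitclass  # hypothetical unit package
fig, ax = plt.subplots()
x = np.arange(10)
y = x*2 * myunitclass.inch  # `in` is a reserved word in Python
ax.plot(x, y)
z = x*2 * myunitclass.cm
ax.plot(x, z)
```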
So when a new feature is added, we can ask that its unit support be made
clear. I guess I don't mind if those are astropy units or yt units, or
pint, or ..., though there will be some pushback about including another
test dependency.

Would pint units work? It's a very small dependency, but maybe it's not as
full featured, or it's structured wildly differently from the others?

A test suite to my mind would
- test basic functionality
- test mixing allowed dimensions (e.g. inches and centimeters)
- test changing the axis units (so all the plotted data changes its
  values, *or* the tick locators/formatters change their values).
- test that disallowed mixed dimensions fail.
- ...
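A plain-Python sketch of that checklist, using a hypothetical toy unit system (`Q`, `ToyAxis`) rather than pint, astropy or yt. Real tests would drive `Axes` methods through a registered converter; these stand-ins only show the shape of each check. Plain `assert`/`try` is used so the snippet runs without pytest (`pytest.raises` would be the idiomatic choice there), and the unit-change test takes the "plotted data changes its values" branch.

```python
import math

# Hypothetical toy units: name -> (dimension, factor to the SI base unit).
UNITS = {"m": ("length", 1.0), "cm": ("length", 0.01),
         "inch": ("length", 0.0254), "s": ("time", 1.0)}


class Q:
    """Toy scalar quantity."""

    def __init__(self, value, unit):
        self.value, self.unit = value, unit

    def to(self, unit):
        dim_from, f_from = UNITS[self.unit]
        dim_to, f_to = UNITS[unit]
        if dim_from != dim_to:
            raise TypeError(f"cannot convert {self.unit} to {unit}")
        return self.value * f_from / f_to


class ToyAxis:
    """Stand-in axis: the first data fixes its unit, later data is converted."""

    def __init__(self):
        self.unit, self.data = None, []

    def plot(self, values):
        if self.unit is None:
            self.unit = values[0].unit
        self.data.append([v.to(self.unit) for v in values])

    def set_units(self, unit):
        # Re-express every stored series in the new unit.
        old, self.unit = self.unit, unit
        self.data = [[Q(v, old).to(unit) for v in s] for s in self.data]


def test_basic():
    ax = ToyAxis()
    ax.plot([Q(1, "inch"), Q(2, "inch")])
    assert ax.unit == "inch" and ax.data == [[1.0, 2.0]]


def test_mixing_allowed_dimensions():
    ax = ToyAxis()
    ax.plot([Q(1, "inch")])
    ax.plot([Q(2.54, "cm")])
    assert math.isclose(ax.data[1][0], 1.0)


def test_changing_axis_units():
    ax = ToyAxis()
    ax.plot([Q(1, "inch")])
    ax.set_units("cm")
    assert math.isclose(ax.data[0][0], 2.54)


def test_disallowed_dimensions_fail():
    ax = ToyAxis()
    ax.plot([Q(1, "inch")])
    try:
        ax.plot([Q(1, "s")])
    except TypeError:
        return
    raise AssertionError("mixing length and time should fail")
```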

Cheers, Jody

________________________________________
From: Jody Klymak <jklymak at uvic.ca>
Sent: Thursday, February 8, 2018 9:39 AM
To: Drain, Theodore R (392P)
Cc: matplotlib development list
Subject: Re: [Matplotlib-devel] Units discussion...

I realize that units are "a pain", but they're hugely useful. Just
plotting datetimes is going to be a pain without units (and was a huge pain
before the unit system). The proposal that only Axes supports units is
going to cause us a massive problem as that's rarely everything that we do
with a plot. I could do a survey to find all the interactions we use (and
that doesn't even touch the 1000's of lines of code our users have written)
if that would help but anything that's part of the public api (axes,
artists, patches, etc) is probably being used - i.e. pretty much anything
that's in the current user's guide is something that we use/want/need to
work with unitized data.

OK, *for discussion*: A scope of work for JPL and Matplotlib might be:

1) develop a better toy unit module that has most of the desired features
(maybe the existing one is fine, but please see matplotlib issue #9713,
"Units handling different with plot than other functions...", for why I'm
a little dismayed with the state of things).

2) write a developer's guide explaining how units should be/are
implemented
a) in matplotlib modules
b) by downstream developers (this is probably adequate already).
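For the downstream half of such a guide, the core idea is a registry that maps data types to converters, with lookup along the type's MRO so that subclasses inherit their parent's converter. Matplotlib's real registry is `matplotlib.units.registry`; the standalone class below only illustrates the type-to-converter lookup idea, and `Length`, `Altitude` and `LengthConverter` are hypothetical.

```python
class ConverterRegistry(dict):
    """Hypothetical type -> converter mapping with MRO-based lookup."""

    def get_converter(self, x):
        for cls in type(x).__mro__:
            if cls in self:
                return self[cls]
        # Containers: fall back to the first element's type.
        if isinstance(x, (list, tuple)) and x:
            return self.get_converter(x[0])
        return None


class Length:  # hypothetical downstream unit type
    def __init__(self, metres):
        self.metres = metres


class Altitude(Length):  # subclass reuses Length's converter
    pass


class LengthConverter:
    @staticmethod
    def convert(obj, unit, axis):
        items = obj if isinstance(obj, (list, tuple)) else [obj]
        return [float(item.metres) for item in items]


registry = ConverterRegistry()
registry[Length] = LengthConverter()  # what a downstream package would do
```

Registering on the base class lets subclasses such as `Altitude` work without extra registration. With the real API the equivalent line is `matplotlib.units.registry[Length] = LengthConverter()`, where the converter would subclass `matplotlib.units.ConversionInterface`.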

It sounds like what you are saying is that units should be carried to the
draw stage (or cache stage) for all artists. That's maybe fine, but as a
new developer, I found the units support woefully under-documented. The
fact that others have hacked in units support in various inconsistent ways
means that we need to police all this better.

OTOH, maybe Antony and I are poor people to lead this charge, given that
we don't need unit support. But I don't think we are being hypercritical
in pointing out it needs work.

Thanks a lot, Jody

This is kind of what I meant in my previous email about use cases. Saying
"just Axes has units" is basically saying the only valid unit use case is to
create a plot one time and look at it. You can't manipulate it, edit it,
or build any kind of plotting GUI application (which we have many of) once
the plot has been created. The Artist classes are one of the primary APIs
for applications. Artists are created, edited, and manipulated if you want
to allow the user to modify things in a plot after it's created. Even
the most basic cases like calling Line2D.set_data() wouldn't be allowed
with units if only Axes has unit support.
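A minimal sketch of why artist-level unit support matters for GUI code: if a line-like object keeps its unitized input and a reference to its axis, `set_data` can accept the same kind of values the plotting call did, and the line can be re-converted when the axis unit changes. `ToyAxis`, `ToyLine`, the `(magnitude, unit)` pairs and the length table are hypothetical simplifications, not `Line2D`'s actual code.

```python
_TO_METRES = {"m": 1.0, "km": 1000.0, "mile": 1609.344}  # assumed table


class ToyAxis:
    def __init__(self, unit):
        self.unit = unit

    def convert(self, values):
        """(magnitude, unit) pairs -> floats in the axis unit."""
        return [m * _TO_METRES[u] / _TO_METRES[self.unit] for m, u in values]


class ToyLine:
    def __init__(self, axis):
        self.axis = axis
        self._orig = []    # unitized data, kept for later re-conversion
        self._floats = []  # what drawing code would consume

    def set_data(self, values):
        self._orig = list(values)
        self._floats = self.axis.convert(self._orig)

    def recache(self):
        # Called after the axis unit changes, so the line follows the axis.
        self._floats = self.axis.convert(self._orig)
```

Keeping the unitized originals is what makes later edits (new data, new axis units) possible after the plot is built.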

I'm not sure I understand the statement that units are a moving target.
The reason it keeps popping up is that code gets added without someone
considering units, which then triggers bug reports that require fixing.
If there was a clearer policy and new code was required to have test cases
that cover non-unit and unit inputs, I think things would go much
smoother. We'd be happy to help with submitting new test cases to cover
unit cases in existing code once a policy is decided on. Maybe what's
needed is better documentation for developers who don't use units so they
can easily write a test case with units when adding/modifying functionality.

Ted

________________________________________
From: anntzer.lee at gmail.com on behalf of Antony Lee <antony.lee at berkeley.edu>
Sent: Thursday, February 8, 2018 8:09 AM
To: Drain, Theodore R (392P)
Cc: matplotlib development list
Subject: Re: [Matplotlib-devel] Units discussion...

I'm momentarily a bit away from Matplotlib development due to real life
piling up, so I'll just keep this short.

One major point (already mentioned by others) that led, I think, to some
devs (including myself) being relatively dismissive about unit support is
the lack of well-defined use case, other than "it'd be nice if we supported
units" (i.e., especially from the point of view of devs who *don't* use
units themselves, it ends up being an ever moving target). In particular,
tests on unit support ("unit unit tests"? :-)) currently only rely on the
old JPL unit code that ended up integrated into Matplotlib's test suite,
but do not test integration with the two major unit packages I am aware
of (pint and astropy.units).

From Ted's email it appears that these are not sufficient to
represent all kinds of relevant units. In particular, I was at some point
hoping to completely work in deunitized data internally, *including the
plotting*, and rely on the fact that the deunitized and the unitized
data are usually linked by an affine transform, so the plotting part
doesn't need to convert back to unitized data and we only need to place and
label the ticks accordingly; however Ted mentioned relativistic units,
which imply the use of a non-affine transform. So I think it would also be
really helpful if JPL could release some reasonably documented unit library
with their actual use cases (and how it differs from pint & astropy.units),
so that we know better what is actually needed (I believe carrying the JPL
unit code in our own code base is a mistake).
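The affine argument can be made concrete: if display values relate to the stored floats by display = a * internal + b, ticks at round display values are placed by inverting that map, and the stored data never needs re-conversion. The Celsius/Fahrenheit pair below is only an example of an affine unit pair with an offset, and the function names are hypothetical; a non-affine relation would need a general invertible function in place of `(a, b)`.

```python
def tick_positions(display_ticks, a, b):
    """Map round tick values in the display unit back to internal floats.

    Assumes display = a * internal + b with a != 0 (an affine relation).
    """
    if a == 0:
        raise ValueError("affine map must be invertible (a != 0)")
    return [(d - b) / a for d in display_ticks]


def tick_labels(display_ticks):
    return [f"{d:g}" for d in display_ticks]


# Internal floats are degrees Celsius; the user wants Fahrenheit ticks.
fahrenheit_ticks = [32, 50, 68, 86, 104]
positions = tick_positions(fahrenheit_ticks, a=1.8, b=32.0)
labels = tick_labels(fahrenheit_ticks)
```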

As for the public vs private, or rather unitized vs deunitized API
discussion, I believe a relatively simple and consistent line would be to
make Axes methods unitized and everything else deunitized (but with clear
ways to convert to and from unitized data when not using Axes methods).

Antony

2018-02-07 16:33 GMT+01:00 Drain, Theodore R (392P) <theodore.r.drain at jpl.nasa.gov>:
That sounds fine to me. Our original unit prototype API actually had
conversions for both directions but I think the float->unit version was
removed (or really moved) when the ticker/formatter portion of the unit API
was settled on.

Using floats/numpy arrays internally is going to be easier and faster so I
think that's a plus. The biggest issue we're going to run in to is what's
defined as "internal" vs part of the unit API. Some things are easy like
the Axes/Axis API. But we also use low level API's like the patches. Are
those unitized? This is the pro and con of using something like Python
where basically everything is public. It makes it possible to do lots of
things, but it's much harder to define a clear library with a specific
public API.

Somewhere in the process we should write a proposal that outlines which
classes/methods are part of the unit api and which are going to be
considered internal. I'm sure we can help with that effort.

That also might help clarify/influence code structure - if internal
implementation classes are placed in a sub-package inside MPL 3.0, it
becomes clearer to people later on what the "official" public API is versus what
can be optimized to just use floats. Obviously the devs would need to
decide if that kind of restructuring is worth it or not.

Ted

________________________________________
From: David Stansby <dstansby at gmail.com>
Sent: Wednesday, February 7, 2018 3:42 AM
To: Jody Klymak
Cc: Drain, Theodore R (392P); matplotlib development list
Subject: Re: [Matplotlib-devel] Units discussion...

Practically, I think what we are proposing is that for unit support the
user must supply two functions for each axis:

* A mapping from your unit objects to floating point numbers
* A mapping from those floats back to your unit objects

As far as I know function 2 is new, and doesn't need to be supplied at
the moment. Doing this would mean we can convert units as soon as they
enter Matplotlib, only ever have to deal with floating point numbers
internally, and then use the second function as late as possible when the
user requests stuff like e.g. the axis limits.
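David's two-function proposal, sketched for a length axis: `to_float` runs once as data enters, everything internal is plain floats, and `from_float` runs only when the user asks for something back, such as the limits. The function names, the metre base unit and the `(magnitude, unit)` tuple representation are assumptions for illustration.

```python
# Hypothetical quantity representation: (magnitude, unit) tuples.
_TO_METRES = {"m": 1.0, "cm": 0.01, "inch": 0.0254}


def to_float(q):
    """Function 1: unit object -> float in the axis' internal base unit."""
    magnitude, unit = q
    return magnitude * _TO_METRES[unit]


def from_float(x, unit="m"):
    """Function 2: internal float -> unit object, applied as late as possible."""
    return (x / _TO_METRES[unit], unit)


class ToyAxis:
    def __init__(self):
        self._lo = self._hi = None  # internal limits, plain floats

    def add_data(self, quantities):
        values = [to_float(q) for q in quantities]  # convert on entry
        lo, hi = min(values), max(values)
        self._lo = lo if self._lo is None else min(self._lo, lo)
        self._hi = hi if self._hi is None else max(self._hi, hi)

    def get_lim(self, unit="m"):
        # Only here do floats turn back into unitized values.
        return from_float(self._lo, unit), from_float(self._hi, unit)
```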

Also worth noting that any major change like this will go in to
Matplotlib 3.0 at the earliest, so will be Python 3 only.

David

Dear Ted,

Thanks so much for engaging on this.

Don't worry, nothing at all is changing w/o substantial back and forth,
and OK from downstream users. I actually don't think it'll be a huge
change, probably just some clean up and better documentation.

FWIW, I've not personally done much programming w/ units, just been a bit
perplexed by their inconsistent and (to my simple mind) convoluted
application in the codebase. Having experience from people who try to use
them everyday will be absolutely key.

Cheers, Jody

On Feb 6, 2018, at 14:17 PM, Drain, Theodore R (392P) <theodore.r.drain at jpl.nasa.gov> wrote:

We use units for everything in our system (in fact, we funded John Hunter
originally to add in a unit system so we could use MPL) so it's a crucial
system for us. In our system, we have our own time classes (which handle
relativistic time frames as well as much higher precision representations)
and a custom unit system for floating point values.

I think it's important to talk about these changes in concrete terms. I
understand the words you're using, but I'm not really clear on what the
real proposed changes are. For example, the current unit API returns a
units.AxisInfo object so the converter can set the formatter and locators
to use. Is that what you mean in the 2nd paragraph about ticks and
labels? Or is that changing?

The current unit API is pretty simple and lives in units.ConversionInterface.
Are any of these changes going to change the conversion API? (note - I'm
not against changing it - I'm just not sure if there are any changes or
not).

Another thing to consider: many of the examples people use are scripts
which make a plot and stop. But there are other use cases which are more
complicated and stress the system in different ways. We write several GUI
applications (in PyQt) that use MPL for plotting. In these cases, the user
is interacting with the plot to add and remove artists, change styles,
modify data, etc etc. So having a good object oriented API for modifying
things after construction is important for this to work. So when units are
involved, it can't be a "convert once at construction" and never touch
units again. We are constantly adjusting limits, moving artists, etc in
unitized space after the plot is created.

So in addition to the ConversionInterface API, I think there are other
items that would be useful to have explicitly spelled out. Things like which
API's in MPL should accept units and which won't and which methods return
unitized data and which don't. It would be nice if there was a clear
policy on this. Maybe one exists and I'm not aware of it - it would be
helpful to repeat it in a discussion on changing the unit system.
Obviously I would love to have every method accept and return unitized data
:-).

I bring this up because I was just working on a hover/annotation class
that needed to move a single annotation artist with the mouse. To move the
annotation box the way I needed to, I had to set one private member
variable, call two set methods, use attribute assignment for one value, and
set one semi-public member variable - some of which work with units and
some of which didn't. I think having a clear "this kind of method
accepts/returns units" policy would help when people are adding new
accessors/methods/variables to make it more clear what kind of data is
acceptable in each.

Ted
ps: I may be able to help with some resources to work on any unit
upgrades, but to make that happen I need to get a clear statement of what
problem is being solved and the scope of the work so I can explain to our
management why it's important.

________________________________________
From: Matplotlib-devel <matplotlib-devel-bounces+ted.drain=jpl.nasa.gov at python.org> on behalf of Jody Klymak <jklymak at uvic.ca>
Sent: Saturday, February 3, 2018 9:25 PM
To: matplotlib development list
Subject: [Matplotlib-devel] Units discussion...

Hi all,

To carry on the gitter discussion about unit handling, hopefully to lead
to a more stringent documentation and implementation.

In response to @anntzer I thought about the units support a bit - it
seems that rather than a transform, a more straightforward approach is to
have the converter map to float arrays in a unique way. This float mapping
would be completely analogous to `date2num` in `dates`, in that it doesn't
change and is perfectly invertible without matplotlib ever knowing about
the unit information, though the axis could store it for the tick
locators and formatters. It would also have an inverse that would supply
data back to the user in unit-aware data (though not necessarily in the
unit that the user supplied, e.g. if they supply 8*in and the
converter converts everything to meter floats, then the returned unitized
inverse would be 0.203*m, or whatever convention the converter wants to
supply.).

User "unit" control, i.e. making the plot in inches instead of m, would
be accomplished with tick locators and formatters. Matplotlib would never
directly convert between cm and inches (any more than it converts from days
to hours for dates), the downstream-supplied tick formatter and labeller
would do it.
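Under this scheme a unit change is purely a labelling concern. Below is a sketch of a tick-label function for an axis whose internal floats are metres but whose user wants another length unit; its `(value, pos)` signature is the one `matplotlib.ticker.FuncFormatter` expects, while the factory name and the unit table are assumptions.

```python
_METRES_PER = {"m": 1.0, "cm": 0.01, "inch": 0.0254}  # assumed table


def make_length_formatter(display_unit, fmt="{:.3g}"):
    """Return f(value, pos) labelling internal metre floats in display_unit."""
    scale = 1.0 / _METRES_PER[display_unit]

    def label(value, pos=None):
        # Only the label changes; the plotted float (metres) is untouched.
        return fmt.format(value * scale) + " " + display_unit

    return label
```

With Matplotlib installed, this could be attached via `ax.xaxis.set_major_formatter(FuncFormatter(make_length_formatter("inch")))`; switching to centimetres means swapping the formatter, not re-converting data.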

Each axis would only get one converter, set by the first call to the
axis. Subsequent calls to the axis would pass all data (including bare
floats) to the converter. If the converter wants to pass bare floats then
it can do so. If it wants to accept other data types then it can do so.
It should be possible for the user to clear or set the converter, but then
they should know what they are doing and why.

What's missing? I don't think this is wildly different than what we have,
but maybe a bit more clear.

Cheers, Jody

_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel at python.org
Matplotlib-devel Info Page
_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel at python.org<mailto:Matplotlib-devel at python.org
<Matplotlib-devel at python.org>><mailto:Matplotlib-devel at python.org
<Matplotlib-devel at python.org>><mailto:Matplotlib-devel at python.org
<Matplotlib-devel at python.org><mailto:Matplotlib-devel at python.org
<Matplotlib-devel at python.org>

...
[Message tronqu?]
_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel at python.org
Matplotlib-devel Info Page

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/matplotlib-devel/attachments/20180208/57631f66/attachment-0001.html>

···

On Thu, Feb 8, 2018 at 11:48 AM, Jody Klymak <jklymak at uvic.ca> wrote:

On 8 Feb 2018, at 09:54, Drain, Theodore R (392P) <theodore.r.drain at jpl.nasa.gov> wrote:
On 7 February 2018 at 06:06, Jody Klymak <jklymak at uvic.ca> wrote:

_Drain_Theodore_R_39 · February 8, 2018, 9:47pm

FYI for anyone interested - we already submitted (around the time the first unit code was submitted, in 2009) a mock-up of our unit and time classes, with converters and tickers, which is located in matplotlib/testing/jpl_units/. It doesn't appear to be used in any tests anymore, but it's there if anyone wants to look at it, and it was used in the original unit API testing.

I don't think we're in as much disagreement as it appears. The way MPL works right now, it's easy for devs who aren't familiar with units to write code that works and appears correct, but fails in some cases, like units. And they won't know that until a user runs into that case. So we should work to improve this situation. The solution will most likely be some combination of code changes, clearer dev docs, and more and better test cases.

I think a big problem is that the plots have no defined internal data representation. Since Python is dynamically typed, it's easy to write code that works for one test case but fails others you might not think of. It also means that inside a plot method, a developer really doesn't know what functions they're allowed to use. Is the data variable a list? Is it unitized? Is it integers? floats? a numpy array?

That's why I'd propose that for any numeric data type, the unit converter must return a numpy array of floats. Then the plot code (and dev docs) can be very explicit about what functionality can be used and you can be sure that after the external->internal converter is run, you know what the data type is. If done properly, I think this actually makes the existing code simpler. We can have a sequence of converters that try to run on the input which would include "standard" types like lists and integers. So if a user puts in a Python list of integers, floats, numpy, or their own type, etc, the developer knows that once the converter at the top of the method runs, they have a numpy array of floats to work with and there is no guess as to what functions will work or not work.

If this works, then it can be "the one way" to write a plot function for numeric data and every method can have the conversion as the first step.
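A sketch of the "sequence of converters" idea: each converter says whether it can handle the input, the first match wins, and whatever it returns is coerced to a base-class float64 ndarray, so code after the first line of a plot method can rely on that type. `LengthConverter`, `NumberConverter` and `to_internal` are hypothetical names, not Matplotlib API, and quantities are modelled as `(magnitude, unit)` pairs only to keep the example short.

```python
import numpy as np


class LengthConverter:
    """Hypothetical: handles sequences of (magnitude, unit) pairs."""
    _TO_METRES = {"m": 1.0, "km": 1000.0}

    def can_convert(self, data):
        return len(data) > 0 and isinstance(data[0], tuple)

    def convert(self, data):
        return [m * self._TO_METRES[u] for m, u in data]


class NumberConverter:
    """Fallback for lists, tuples, ints, floats and plain/subclassed arrays."""

    def can_convert(self, data):
        return True

    def convert(self, data):
        return data


CONVERTERS = [LengthConverter(), NumberConverter()]


def to_internal(data, converters=CONVERTERS):
    if np.ndim(data) == 0:
        data = [data]  # scalars become length-1 sequences
    for conv in converters:
        if conv.can_convert(data):
            # np.array(..., dtype=float64) copies and drops ndarray subclasses.
            return np.array(conv.convert(data), dtype=np.float64)
    raise TypeError(f"no converter for {type(data).__name__}")
```

Ordering matters: specific converters go first and the permissive numeric fallback last, so user types are never swallowed by the generic path.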

Ted
ps: I think this dev list is the best forum for this discussion unless you can arrange a conference where we can all meet up. I find gitter is too hard to follow unless you're watching it in real time. A forum thread would be better IMO, but we don't have that.

···

________________________________________
From: Nathan Goldbaum <nathan12343@gmail.com>
Sent: Thursday, February 8, 2018 12:13 PM
To: Drain, Theodore R (392P)
Cc: matplotlib development list
Subject: Re: [Matplotlib-devel] Units discussion...

On Thu, Feb 8, 2018 at 1:08 PM, Drain, Theodore R (392P) <theodore.r.drain at jpl.nasa.gov> wrote:
Does numpy subclassing really matter? If the docs say the unit converter must convert from the external type to the internal type, then as long as the converter does that, it doesn't matter what the external type is or what it inherits from right? The point is that the converter class is the only class manipulating the external data objects - MPL shouldn't care what they are or what they inherit from.

To make my statement more concrete, here's a matplotlib pull request that fixed a bug that only triggered for astropy and yt but not for pint:

https://github.com/matplotlib/matplotlib/pull/6622

In this case it was an issue because of a difference in how NumPy's masked array deals with ndarray subclasses versus array wrapper classes.
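The mechanism can be reproduced with NumPy alone: a masked array built from an ndarray subclass remembers that subclass, so `.data` hands the subclass (and, for yt or astropy, its units) back, whereas a wrapper that merely implements `__array__` is coerced to a plain ndarray. `Tagged` and `Wrapper` are stand-ins for the real unit classes, and the final `np.asarray` call mirrors the spirit of the fix in the pull request, not its exact diff.

```python
import numpy as np


class Tagged(np.ndarray):
    """Stand-in for a unit-carrying ndarray subclass (yt/astropy style)."""


class Wrapper:
    """Stand-in for a wrapper that is NOT an ndarray (pint style)."""

    def __init__(self, magnitude):
        self.magnitude = np.asarray(magnitude, dtype=float)

    def __array__(self, dtype=None, copy=None):
        return np.asarray(self.magnitude, dtype=dtype)


sub = np.array([1.0, 2.0, 3.0]).view(Tagged)
masked_sub = np.ma.array(sub, mask=[False, True, False])
masked_wrap = np.ma.array(Wrapper([1.0, 2.0, 3.0]), mask=[False, True, False])

# The subclass survives inside the masked array; the wrapper does not.
print(type(masked_sub.data).__name__)   # Tagged
print(type(masked_wrap.data).__name__)  # ndarray

# Defensive fix: insist on a base-class ndarray before going further.
plain = np.asarray(masked_sub.data)
print(type(plain).__name__)             # ndarray
```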

I think one issue is that data types are malleable in the API right now. Lists, tuples, numpy, ints, floats, etc are all possible inputs in many/most cases. IMO, the unit API should not be malleable at all. The unit converter API should say that the return type of external->internal conversion is always a specific value type (e.g. list of float, NumPy float64 array).

Jody: IMO, your example should plot the data in inches in the first plot call, then convert the second input to inches and plot that. The plot call supports the xunits keyword argument, which tells the converter what floating point unit conversion to apply. If that keyword is not specified, then it defaults to the units of the input. The example that needs to be clearer is if I do this:

```python
ax.plot(x1, y1, xunits="km")
ax.plot(x2, y2, xunits="miles")
```

IMO, either the floats are km or miles, not both. So either the first call sticks the converter to using km and the second xunits is ignored. Or the second input overrides the first and requires that the first artists go back through a conversion to miles. Either is a reasonable choice for behavior (but the first is much easier to implement).
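The two behaviours differ only in what happens when a later call names a different unit. A toy sketch with both policies: "first" keeps the original unit and converts new data into it (the new xunits request is ignored), while "last" switches the axis unit and re-converts every series already plotted, which requires keeping the original values around. `ToyAxis`, the `policy` argument and the conversion table are hypothetical, and data are passed as plain metres to keep it short.

```python
_TO_METRES = {"km": 1000.0, "miles": 1609.344}  # assumed table


class ToyAxis:
    def __init__(self, policy="first"):
        if policy not in ("first", "last"):
            raise ValueError("policy must be 'first' or 'last'")
        self.policy = policy
        self.unit = None
        self.series = []  # (metres, floats_in_axis_unit) per plot call

    def _in_unit(self, metres):
        return [m / _TO_METRES[self.unit] for m in metres]

    def plot(self, metres, xunits):
        if self.unit is None or self.policy == "last":
            self.unit = xunits
            # "last": every existing series is re-converted to the new unit.
            self.series = [(m, self._in_unit(m)) for m, _ in self.series]
        # "first": later xunits are ignored; new data goes into the first unit.
        self.series.append((metres, self._in_unit(metres)))
```

The "first" branch never touches existing series, which is why it is the simpler of the two to implement.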

dstansby · February 8, 2018, 10:11pm

I agree with everything you've said there.

I propose to have a go at implementing what I proposed in the next few
weeks - on the surface it seems to me like it will simplify things a lot,
but I guess I'll see as I go how hard it actually is! If it works it will
be a bit of an upheaval for 3rd parties who use units at the moment, but
should be worth it in the long run.

David


_Drain_Theodore_R_39 · February 8, 2018, 11:18pm

David,
What exactly is the proposal? I'm not sure I fully understand that. The existing unit system basically already does conversion to float, it's just not applied very evenly or in quite the same way in all the methods.

But - I wonder if this is the wrong first step to take. At this point, do we know which methods are working fine as is and which ones are not? Maybe it would be better to start by writing a comprehensive set of unit test cases that pass different unitized data to each Axes method. That could serve to define what we expect to happen and would help identify methods that don't currently work. Failing unit tests are a nice way to identify code that needs to change and would help others know exactly what the "right" behavior is supposed to be. Then code changes could be made to start correcting those issues.

Then a similar set of unit tests with unitized data could be written for artists and the same process could be repeated. Once artists handle unitized data, it may also simplify the plot methods as well - at least for the ones whose primary role is to just build an artist.

Ted

···

________________________________________
From: David Stansby <dstansby@gmail.com>
Sent: Thursday, February 8, 2018 2:11 PM
To: Drain, Theodore R (392P)
Cc: matplotlib development list
Subject: Re: [Matplotlib-devel] Units discussion...

I agree with everything you've said there.

I propose to have a go at implementing what I proposed in the next few weeks - on the surface it seems to me like it will simplify things a lot, but I guess I'll see as I go how hard it actually is! If it works it will be a bit of an upheaval for 3rd parties who use units at the moment, but should be worth it in the long run.

David

On 8 February 2018 at 21:47, Drain, Theodore R (392P) <theodore.r.drain at jpl.nasa.gov<mailto:theodore.r.drain at jpl.nasa.gov>> wrote:
FYI for anyone interested - we already submitted (around the time of the first unit submit code in 2009) a mock of up our unit and time classes, with converters and tickers which is located in matplotlib/testing/jpl_units/. It doesn't appear to be used in any tests anymore but it's there if anyone wants to look at it and was used in the original unit API testing.

I think everyone isn't in as much disagreement as it appears. The way MPL works right now, it's easy for dev's who aren't familiar with units to write code that works and appears correct, but fails for some cases like units. And they won't know that until a user runs into that case. So we should work to improve this situation. The solution will most likely be some combination of code changes, clearer dev docs, and more and better test cases.

I think a big problem is that the plots have no defined internal data representation. Since Python is untyped, it's easy to write code that works for one test case but fails others you might not think of. It also means that inside a plot method, a developer really doesn't know what functions they're allowed to use. Is the data variable a list? Is it unitized? Is it integers? floats? a numpy array?

That's why I'd propose that for any numeric data type, the unit converter must return a numpy array of floats. Then the plot code (and dev docs) can be very explicit about what functionality can be used, and you can be sure that after the external->internal converter is run, you know what the data type is. If done properly, I think this actually makes the existing code simpler. We can have a sequence of converters that try to run on the input, which would include "standard" types like lists and integers. So if a user passes in a Python list of integers, floats, numpy arrays, their own type, etc., the developer knows that once the converter at the top of the method runs, they have a numpy array of floats to work with, and there is no guessing as to what functions will or won't work.
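A minimal sketch of that idea, assuming a hypothetical `Distance` user type and a hypothetical `to_internal` entry point (neither is matplotlib API): converters are tried in order, a catch-all handles the "standard" types, and whatever matched, the code after the first line of a plot method receives a float64 ndarray.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Distance:
    """Hypothetical user-side unit type: a distance stored in kilometres."""
    km: float


def is_distance_seq(values):
    # Matches a non-empty list/tuple whose first element is a Distance.
    return (isinstance(values, (list, tuple)) and len(values) > 0
            and isinstance(values[0], Distance))


def convert_distances(values):
    # External -> internal: drop the unit, keep the magnitude in km.
    return np.array([v.km for v in values], dtype=np.float64)


def convert_builtin(values):
    # "Standard" inputs: ints, floats, lists, tuples, numpy arrays.
    # np.asarray also turns ndarray subclasses into plain ndarrays.
    return np.asarray(values, dtype=np.float64)


# Ordered (predicate, converter) pairs; the last entry is a catch-all.
CONVERTERS = [
    (is_distance_seq, convert_distances),
    (lambda values: True, convert_builtin),
]


def to_internal(values):
    """Run the first matching converter and enforce the internal contract."""
    for matches, convert in CONVERTERS:
        if matches(values):
            out = np.atleast_1d(convert(values))
            break
    # Plot code after this point can rely on a plain float64 ndarray.
    if type(out) is not np.ndarray or out.dtype != np.float64:
        raise TypeError("converter broke the float64 ndarray contract")
    return out
```

For example, `to_internal(5)`, `to_internal([1, 2, 3])` and `to_internal([Distance(1.5), Distance(2.0)])` all come back as float64 arrays, so the plotting code never has to ask which of those it was given.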

If this works, then it can be "the one way" to write a plot function for numeric data and every method can have the conversion as the first step.

Ted
ps: I think this dev list is the best forum for this discussion unless you can arrange a conference where we can all meet up. I find gitter is too hard to follow unless you're watching it in real time. A forum thread would be better IMO, but we don't have that.

________________________________________
From: Nathan Goldbaum <nathan12343 at gmail.com>
Sent: Thursday, February 8, 2018 12:13 PM
To: Drain, Theodore R (392P)
Cc: matplotlib development list
Subject: Re: [Matplotlib-devel] Units discussion...

On Thu, Feb 8, 2018 at 1:08 PM, Drain, Theodore R (392P) <theodore.r.drain at jpl.nasa.gov> wrote:
Does numpy subclassing really matter? If the docs say the unit converter must convert from the external type to the internal type, then as long as the converter does that, it doesn't matter what the external type is or what it inherits from right? The point is that the converter class is the only class manipulating the external data objects - MPL shouldn't care what they are or what they inherit from.

To make my statement more concrete, here's a matplotlib pull request that fixed a bug that only triggered for astropy and yt but not for pint:

https://github.com/matplotlib/matplotlib/pull/6622

In this case it was an issue because of a difference in how NumPy's masked arrays deal with ndarray subclasses versus array wrapper classes.
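The difference can be reproduced with two toy classes (hypothetical stand-ins, not the real yt, astropy or pint types): an ndarray subclass survives `np.ma.masked_array` as the type of `.data`, while a wrapper that only implements `__array__` is turned into a plain ndarray, which strips its unit. Calling `np.asarray` on `.data` is one way to enforce the plain-ndarray expectation, in the spirit of the PR above but not necessarily its exact code.

```python
import numpy as np


class UnitArray(np.ndarray):
    """Toy ndarray subclass carrying a unit (stand-in for yt/astropy arrays)."""

    def __new__(cls, values, unit):
        obj = np.asarray(values, dtype=float).view(cls)
        obj.unit = unit
        return obj

    def __array_finalize__(self, obj):
        # Called for views and copies; propagate the unit if the source has one.
        self.unit = getattr(obj, "unit", None)


class UnitWrapper:
    """Toy array wrapper, not an ndarray subclass (stand-in for pint)."""

    def __init__(self, values, unit):
        self.magnitude = np.asarray(values, dtype=float)
        self.unit = unit

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this when it needs a plain ndarray; the unit is dropped.
        return np.asarray(self.magnitude, dtype=dtype)


sub = np.ma.masked_array(UnitArray([1.0, 2.0], "km"))
wrapped = np.ma.masked_array(UnitWrapper([1.0, 2.0], "km"))

print(type(sub.data).__name__)      # UnitArray: the subclass survives
print(type(wrapped.data).__name__)  # ndarray: the wrapper was unwrapped

# Forcing a plain ndarray explicitly removes the difference.
print(type(np.asarray(sub.data)).__name__)  # ndarray
```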

I think one issue is that data types are malleable in the API right now. Lists, tuples, numpy arrays, ints, floats, etc. are all possible inputs in many/most cases. IMO, the unit API should not be malleable at all. The unit converter API should say that the return type of external->internal conversion is always a specific value type (e.g. a list of floats or a numpy float64 array).

Jody: IMO, your example should plot the data in inches in the first plot call, then convert the second input to inches and plot that. The plot call supports the xunits keyword argument, which tells the converter what floating point unit conversion to apply. If that keyword is not specified, then it defaults to the type of the input. The example that needs to be clearer is if I do this:

ax.plot( x1, y1, xunits="km" )
ax.plot( x2, y2, xunits="miles" )

IMO, the floats are either km or miles, not both. So either the first call sticks the converter to using km and the second xunits is ignored, or the second call overrides the first and requires the first artists to go back through a conversion to miles. Either is a reasonable choice of behavior (but the first is much easier to implement).
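To make the two options concrete, here is a toy model in plain Python. The names `KM_PER_UNIT` and `axis_floats` are hypothetical, not matplotlib API, and it assumes every plot call passes distances already expressed in km plus the requested `xunits`; it returns the unit the axis settles on and the floats each artist would draw.

```python
# Kilometres per unit; a hypothetical two-entry unit table.
KM_PER_UNIT = {"km": 1.0, "miles": 1.609344}


def axis_floats(calls, policy="first"):
    """Resolve one unit for the axis and convert every artist to it.

    calls:  list of (values_in_km, xunits) tuples, one per plot() call.
    policy: "first" -> the first explicit xunits sticks, later ones are ignored;
            "last"  -> the latest xunits wins and earlier artists are reconverted.
    """
    requested = [units for _, units in calls if units is not None]
    if not requested:
        axis_units = "km"  # assumed default when no call passes xunits
    elif policy == "first":
        axis_units = requested[0]
    elif policy == "last":
        axis_units = requested[-1]
    else:
        raise ValueError(f"unknown policy: {policy!r}")

    factor = KM_PER_UNIT[axis_units]
    # Every artist is expressed in the single axis unit: never a km/miles mix.
    return axis_units, [[v / factor for v in values] for values, _ in calls]


calls = [([1.609344, 3.218688], "km"), ([8.04672], "miles")]
print(axis_floats(calls, "first"))  # ('km', [[1.609344, 3.218688], [8.04672]])
print(axis_floats(calls, "last"))   # units 'miles'; floats are roughly 1, 2 and 5
```

Under "last", the earlier artist can only be reconverted because its original km values are still available; that extra bookkeeping is what makes the second option harder to implement.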
_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel at python.org
https://mail.python.org/mailman/listinfo/matplotlib-devel

jklymak1 · February 9, 2018, 12:33am

Aha! `jpl_units` *is* used for a few tests in `test_axes.py`, `test_dates.py`, and `test_patches.py`.

And those tests behave in what I'd say is a reasonable manner. So maybe step 1 of coming up with a decent toy units system is accomplished.

Steps 2 and 3 might be to add some more tests and some documentation/tutorial using the toy unit system. I'm happy to give that a shot in the next few weeks. Then we can move on to fixing the methods that don't play well w/ units.

Cheers, Jody

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/matplotlib-devel/attachments/20180208/7c812aed/attachment.html>


dstansby · February 9, 2018, 11:55am

After having thought about this a bit more, apologies for jumping the gun.

I'm trying to learn more about how the units system works at the moment and
why it works that way. I'll probably try and improve the docs a bit as I go.

One question I have at the moment is why some plotting methods pass
units through quite a long way, instead of just doing the conversion
right at the beginning of the method. The obvious thing to do seems to
me to be the conversion immediately, but I'm sure there must be a good
reason to pass units through in some places.

David


jklymak1 · February 9, 2018, 5:22pm

After having thought about this a bit more, apologies for jumping the gun.

I'm trying to learn more about how the units system works at the moment and why it works that way. I'll probably try and improve the docs a bit as I go.

One question I have at the moment is why some plotting methods pass units through quite a long way, instead of just doing the conversion right at the beginning of the method. The obvious thing to do seems to me to be the conversion immediately, but I'm sure there must be a good reason to pass units through in some places.

So that something like

ax.plot(a, b, units="inches")
ax.plot(a, c, units="centimeters")

will convert to centimeters at draw time. This currently works with `plot` because `Line2D` objects carry the units through until the path gets cached. It doesn't work for `scatter`; it seems possible that it could, but maybe only by doing violence to how `scatter` is architected.
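The `plot` versus `scatter` difference is roughly eager versus deferred conversion. A toy sketch with hypothetical classes (not matplotlib's actual `Line2D` or collection internals): the eager artist converts once when it is created, so a later change of axis units leaves it stale, while the deferred artist keeps its unitized values and converts when it is drawn.

```python
# Axis units per kilometre for a hypothetical two-unit system.
PER_KM = {"km": 1.0, "m": 1000.0}


class ToyAxis:
    def __init__(self, units):
        self.units = units

    def convert(self, values_km):
        return [v * PER_KM[self.units] for v in values_km]


class EagerArtist:
    """Converts to floats at creation time and keeps only the floats."""

    def __init__(self, values_km, axis):
        self._floats = axis.convert(values_km)

    def draw_data(self, axis):
        return self._floats  # ignores any later change of axis units


class DeferredArtist:
    """Keeps the unitized values and converts at draw time."""

    def __init__(self, values_km, axis):
        self._values_km = list(values_km)  # axis unused until draw time

    def draw_data(self, axis):
        return axis.convert(self._values_km)


axis = ToyAxis("km")
eager = EagerArtist([1.0, 2.0], axis)
deferred = DeferredArtist([1.0, 2.0], axis)
axis.units = "m"  # e.g. a later plot call requesting different units
print(eager.draw_data(axis))     # [1.0, 2.0]  (stale km values)
print(deferred.draw_data(axis))  # [1000.0, 2000.0]
```

The deferred version is what lets a second call change the units of an earlier artist; the cost is that the artist must hold on to the original unitized objects instead of plain floats.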

I was thinking of building a units tutorial on this, documenting what works and what doesn't. But if you are eager to do it, then that'd be great.

I found this to be a reasonable set of examples:

```python
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.testing.jpl_units as units

# Register the jpl_units converters with matplotlib's unit registry.
units.register()

# x values in seconds; an equivalent list form is [x*units.sec for x in range(10)].
xdata = np.arange(10) * units.sec
ydata1 = (1.5*np.arange(10) - 0.5) * units.km
ydata2 = (1.75*np.arange(10) - 1.0) * units.km

# 1) x axis in seconds.
fig, ax = plt.subplots()
ax.plot(xdata, ydata1, color='blue', xunits="sec")
ax.set_xlabel("x-label 001")

# 2) Same data, x axis converted to hours.
fig, ax = plt.subplots()
ax.plot(xdata, ydata1, color='blue', xunits="hour")
ax.set_xlabel("hours")

# 3) Two calls on one axes requesting different x units; the second
#    call's xunits ("hour") is what the axis ends up using.
fig, ax = plt.subplots()
ax.plot(xdata, ydata1, color='blue', xunits="sec", yunits='m')
ax.plot(xdata, ydata2, color='green', xunits="hour")
ax.set_xlabel("hours")
plt.show()
```
