bugfix for dataLim problem (Chris Barker)

"Eric" == Eric Firing <efiring@...229...> writes:

    > If this strategy sounds reasonable to you, I can go ahead
    > and implement it.

This looks fine. FYI, I'll include a post I started in response to your
earlier email but failed to send; it provides a little context:

To: Eric Firing <efiring@...229...>
Cc: Christopher Barker <Chris.Barker@...236...>,
    matplotlib-users@lists.sourceforge.net
Subject: Re: [Matplotlib-users] Apparent bug in Data limits with LineCollections
From: John Hunter <jdhunter@...5...>

    > I would like to make a genuine bugfix, but I do not yet
    > understand all this well enough to do so right now. Maybe
    > John will chime in with a good solution.

Just a comment for now. If you look at ax.add_collection, it does not
update the datalim. This is by design, but it should be documented.
The reason I didn't add it was that collections were meant to be fast
(they've failed a little bit on that front, but they aren't
mind-numbingly slow), so I left it to the user to set the datalim
manually, since this is potentially expensive and the user often knows
the lim for one reason or another. See the finance.py module for
several instances of how to set the data lim with collections. E.g.,

    minx, maxx = (0, len(rangeSegments))
    miny = min([low for low in lows if low != -1])
    maxy = max([high for high in highs if high != -1])

    corners = (minx, miny), (maxx, maxy)
    ax.update_datalim(corners)
    ax.autoscale_view()
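
For readers without finance.py at hand, the corner computation can be sketched on its own. This is only an illustration: the helper name corners_from_segments and its missing parameter are hypothetical, not part of matplotlib, and unlike the snippet above it takes the x range from the points themselves rather than from the number of segments.

```python
def corners_from_segments(segments, missing=-1):
    """Return ((minx, miny), (maxx, maxy)) for a list of line segments.

    Each segment is a sequence of (x, y) points. Points whose y value
    equals the `missing` sentinel are skipped, mirroring the `!= -1`
    filter used in the snippet above.
    """
    xs, ys = [], []
    for seg in segments:
        for x, y in seg:
            if y == missing:
                continue
            xs.append(x)
            ys.append(y)
    if not xs:
        raise ValueError("no valid points to compute limits from")
    return (min(xs), min(ys)), (max(xs), max(ys))


segments = [
    [(0, 10.0), (0, 12.5)],
    [(1, -1), (1, -1)],      # missing data, skipped
    [(2, 9.0), (2, 11.0)],
]
corners = corners_from_segments(segments)
print(corners)
```

The resulting corners tuple has the same shape as the one passed to ax.update_datalim(corners) above, so it could be followed by ax.autoscale_view() in the same way.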

As for how the datalim handling works, the syntax is

  self.dataLim.update(xys, ignore)

Note this is different than the ax.update_datalim method, which calls
it. datalim is a bbox which has an ignore state variable (boolean).

The ignore argument to dataLim.update can take on three values:

   0: do not ignore the current limits and update them with the xys
   1: ignore the current datalim limits and override with xys
  -1: use the datalim ignore state to determine the ignore settings

This seems a bit complex but arose from experience. Basically a lot
of different objects want to add their data to the datalim. In most
use cases, you want the first object to add data to ignore the current
limits (which are just default values) and subsequent objects to add
to the datalim taking into account the previous limits. The default
behavior of datalim is to set ignore to 1, and after the first call
with -1 set ignore to 0. Thus everyone can call with -1 and have the
desired default behavior. I hope you are all confused now.

One can manually set the ignore state var with

  datalim.ignore(1)
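
To make the three-valued flag concrete, here is a toy model of the behavior described above. It is not matplotlib's implementation: the class name ToyDataLim, the default limits, and the choice to clear the state only on a call with -1 are assumptions made for illustration.

```python
class ToyDataLim:
    """Toy stand-in for the dataLim bbox (hypothetical, for illustration)."""

    def __init__(self):
        # Placeholder default limits; the ignore state starts at 1.
        self.xmin, self.ymin, self.xmax, self.ymax = 0.0, 0.0, 1.0, 1.0
        self._ignore = 1

    def ignore(self, value):
        """Manually set the ignore state (0 or 1)."""
        self._ignore = int(bool(value))

    def update(self, xys, ignore=-1):
        """Update the limits with xys; ignore is 0, 1, or -1."""
        if ignore == -1:
            # Defer to the stored state, then clear it so that later
            # calls with -1 extend the limits instead of overriding them.
            ignore = self._ignore
            self._ignore = 0
        xs = [x for x, _ in xys]
        ys = [y for _, y in xys]
        if not ignore:
            # Take the current limits into account.
            xs += [self.xmin, self.xmax]
            ys += [self.ymin, self.ymax]
        self.xmin, self.xmax = min(xs), max(xs)
        self.ymin, self.ymax = min(ys), max(ys)

    def bounds(self):
        return (self.xmin, self.ymin, self.xmax, self.ymax)


lim = ToyDataLim()
lim.update([(5, 5), (6, 7)])        # first call with -1: defaults discarded
first = lim.bounds()
lim.update([(2, 8), (3, 9)])        # second call with -1: limits extended
second = lim.bounds()
lim.ignore(1)                       # manual reset of the state
lim.update([(10, 10), (11, 11)])    # starts from scratch again
third = lim.bounds()
print(first, second, third)
```

In this model every caller passes -1, the first one replaces the default limits, and the rest extend them; setting the state back to 1 makes the next update start over.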

Cheers,
JDH

John,

Thanks very much. I had missed the fact that the ignore argument can take three values, not two, so I will take that into account. As usual, I might not finish the changes until the weekend.

Eric



John and Chris,

In reference to my suggestion:

    > 1) Use a flag instead of the have_data() method to keep track of
    > whether data limit updating needs to start from scratch. Then
    > axes.cla() can set the flag, and the update_datalim* functions can
    > clear it.
    >
    > 2) Add an optional flag to add_collection, telling it to call the
    > collection's get_verts method and use the result to update the data
    > limits. This would make it easier to use collections in user-level
    > code, without imposing any performance penalty for functions like
    > contour that handle the data limit updating in a more efficient way.

I fixed the bug in which ax.cla() was not actually causing the existing dataLim to be ignored. It did not require an extra flag; it only required taking advantage of the ignore=-1 option of dataLim.update. The fixed version is in CVS.

I was not able to implement the second part of the strategy, however, which was to put an optional flag into axes.add_collection telling it to update the dataLim. The reason is quite fundamental, and I think it reveals some additional bugs.

Because collections have verts and offsets, which may have separate transformations, it is not possible, in general, to convert the plotted points to data units (which is what we would need) until the viewLim is set; but that defeats the purpose, which is to update the dataLim so that it can be used to calculate viewLim.

The related bug is that the collection get_verts() methods all simply add the offset to the verts and return the sums, which are quite meaningless unless transform and transOffset are identical; and even if they are identical, the units of get_verts() will depend on the transform.
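
A one-dimensional toy example may make both points concrete. It assumes a marker-like case in which the offset is in data units and the vertex is given in pixels relative to that offset; the function names and the simple linear data-to-pixel mapping are hypothetical and are not matplotlib's transform machinery.

```python
def data_to_pixels(x, view_min, view_max, width_px):
    """Map a data coordinate to pixels for a given view interval."""
    return (x - view_min) / (view_max - view_min) * width_px


def pixels_to_data(p, view_min, view_max, width_px):
    """Inverse of data_to_pixels for the same view interval."""
    return view_min + p / width_px * (view_max - view_min)


def plotted_x_in_data(offset, vert_px, view_min, view_max, width_px):
    """Data-space x of a vertex given in pixels relative to a data offset."""
    p = data_to_pixels(offset, view_min, view_max, width_px) + vert_px
    return pixels_to_data(p, view_min, view_max, width_px)


offset = 2.0     # data units
vert_px = 10.0   # pixels to the right of the offset

# Adding the two directly mixes data units with pixels.
naive = offset + vert_px

# The true data-space position depends on the view interval in use.
narrow = plotted_x_in_data(offset, vert_px, 0.0, 4.0, 400.0)
wide = plotted_x_in_data(offset, vert_px, 0.0, 40.0, 400.0)
print(naive, narrow, wide)
```

The naive sum is 12.0 regardless of the view, while the actual position is about 2.1 for the view interval 0 to 4 and about 3.0 for 0 to 40. The data extent of the plotted points therefore cannot be known before the view limits are chosen, which is the circularity described above.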

Options for adding some automatic dataLim updating option to add_collection include:

1) Don't even try. Simply require it to be done manually. Make notes in docstrings and/or elsewhere.

2) Do it only if the transforms and offsets are such that it does not depend on the viewLim; otherwise raise an exception.

3) Do a partial job of it also in the case where transData is the transform for either verts or offsets, and simply ignore the effect of whichever of these is not using transData.

I am leaning towards (1); it is not clear to me that the effort involved in the other options would be justified by the convenience gained.

Regarding the bug in get_verts(), the options include:

1) Add bug warnings to the docstrings.

2) Raise an exception if it is not possible to unambiguously determine the plotted points in data coordinates.

3) Remove the functions entirely on the grounds that they are dangerously misleading.

Most likely I have missed something important in all this.

Eric

Eric Firing wrote:

    > I fixed the bug in which ax.cla() was not actually causing the existing dataLim to be ignored. It did not require an extra flag; it only required taking advantage of the ignore=-1 option of dataLim.update. The fixed version is in CVS.

Wonderful. Thanks.

    > Because collections have verts and offsets, which may have separate transformations, it is not possible, in general, to convert the plotted points to data units (which is what we would need) until the viewLim is set; but that defeats the purpose, which is to update the dataLim so that it can be used to calculate viewLim.

    > 1) Don't even try. Simply require it to be done manually. Make notes in docstrings and/or elsewhere.

Which is what we have now, yes?

    > 2) Do it only if the transforms and offsets are such that it does not depend on the viewLim; otherwise raise an exception.

    > 3) Do a partial job of it also in the case where transData is the transform for either verts or offsets, and simply ignore the effect of whichever of these is not using transData.

I'd be happy if dataLim was calculated from just the offsets. Is that option 3? Adding option 2 would be fine too, but wouldn't help me any.

That's because in my case, I'm using a LineCollection to create something kind of like a marker, so the offsets really do define the dataLim. Perhaps that's not universal enough to include, however. It would be easy.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
                                         
NOAA/OR&R/HAZMAT (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@...236...

Chris,

Christopher Barker wrote:

    > Eric Firing wrote:

    >> I fixed the bug in which ax.cla() was not actually causing the existing dataLim to be ignored. It did not require an extra flag; it only required taking advantage of the ignore=-1 option of dataLim.update. The fixed version is in CVS.

    > Wonderful. Thanks.

    >> Because collections have verts and offsets, which may have separate transformations, it is not possible, in general, to convert the plotted points to data units (which is what we would need) until the viewLim is set; but that defeats the purpose, which is to update the dataLim so that it can be used to calculate viewLim.

    >> 1) Don't even try. Simply require it to be done manually. Make notes in docstrings and/or elsewhere.

    > Which is what we have now, yes?

Yes, except for the "notes in docstrings and/or elsewhere".

    >> 2) Do it only if the transforms and offsets are such that it does not depend on the viewLim; otherwise raise an exception.

    >> 3) Do a partial job of it also in the case where transData is the transform for either verts or offsets, and simply ignore the effect of whichever of these is not using transData.

    > I'd be happy if dataLim was calculated from just the offsets. Is that option 3? Adding option 2 would be fine too, but wouldn't help me any.

Option 3 includes and extends option 2, and does cover your use case. What it does *not* do is ensure that the result of autoscaling is that you can see everything you are trying to plot. To some extent this is a problem even now; a line marker that happens to land in a corner (such as the origin) doesn't show up very well, because only 1/4 of it is in the plotted domain.

To get around this problem, one could put in yet another rc param that would ensure a margin is added to the viewLim as calculated from the dataLim. So, for example, plot([0,1], [0,1]) would have axis limits from -0.2 to 1.2 (or something like that) instead of 0 to 1, and a big red marker at each point would still be fully within the plotted domain.
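
As a rough sketch of the margin idea, the helper below pads a data interval by a fixed fraction of its span on each side. The function name padded_view_limits and the default fraction of 0.2 are hypothetical, chosen only to reproduce the numbers in the example above; no such rc param is implied to exist.

```python
def padded_view_limits(data_min, data_max, margin=0.2):
    """Expand (data_min, data_max) by `margin` times the span on each side."""
    span = data_max - data_min
    if span == 0:
        # Arbitrary fallback so a single-valued data set still gets a margin.
        span = 1.0
    pad = margin * span
    return data_min - pad, data_max + pad


lo, hi = padded_view_limits(0.0, 1.0)
print(lo, hi)
```

For the data interval 0 to 1 this gives limits of about -0.2 and 1.2. A fixed fraction like this does not know how large a marker is, so it cannot guarantee that every marker fits entirely inside the axes.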

A strategy question, of course, is how much of this sort of sophistication should be built in to the automatic plotting, versus simply available, as is the case now, with manual commands. This concern with excessive complexity is the reason I am not enthusiastic about option 3 (or 2); getting the right result manually is pretty easy once one knows how to do it. And, chances are, for a final product one would end up using the manual methods anyway so as to have full control.

    > That's because in my case, I'm using a LineCollection to create something kind of like a marker, so the offsets really do define the dataLim. Perhaps that's not universal enough to include, however. It would be easy.

    > -Chris

Eric

Eric Firing wrote:

    >> Which is what we have now, yes?

    > Yes, except for the "notes in docstrings and/or elsewhere".

Yes. Notes are always good.

    >>> 2) Do it only if the transforms and offsets are such that it does not depend on the viewLim; otherwise raise an exception.

    >>> 3) Do a partial job of it also in the case where transData is the transform for either verts or offsets, and simply ignore the effect of whichever of these is not using transData.

    >> I'd be happy if dataLim was calculated from just the offsets. Is that option 3? Adding option 2 would be fine too, but wouldn't help me any.

    > Option 3 includes and extends option 2, and does cover your use case. What it does *not* do is ensure that the result of autoscaling is that you can see everything you are trying to plot. To some extent this is a problem even now; a line marker that happens to land in a corner (such as the origin) doesn't show up very well, because only 1/4 of it is in the plotted domain.

Exactly, and I think that's fine.

    > To get around this problem, one could put in yet another rc param that would ensure a margin is added to the viewLim as calculated from the dataLim. So, for example, plot([0,1], [0,1]) would have axis limits from -0.2 to 1.2 (or something like that) instead of 0 to 1, and a big red marker at each point would still be fully within the plotted domain.

I think an rc param is a bad idea. There really is no way to do this in the general case, as you have NO idea how big people's markers, etc., are going to be in data units. I also think it's no big deal to have part of a marker chopped off.

    > A strategy question, of course, is how much of this sort of sophistication should be built in to the automatic plotting, versus simply available, as is the case now, with manual commands. This concern with excessive complexity is the reason I am not enthusiastic about option 3 (or 2);

I agree. I still vote for my option: just calculate the dataLim from the offsets. That's it. It's easy, and it covers the basics.
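
A minimal sketch of what that could look like, assuming the offsets are already (x, y) pairs in data coordinates. The helper name datalim_corners_from_offsets is hypothetical; the extent of the verts around each offset is deliberately ignored.

```python
def datalim_corners_from_offsets(offsets):
    """Return ((minx, miny), (maxx, maxy)) spanning the given (x, y) offsets."""
    offsets = list(offsets)
    if not offsets:
        raise ValueError("at least one offset is required")
    xs = [x for x, _ in offsets]
    ys = [y for _, y in offsets]
    return (min(xs), min(ys)), (max(xs), max(ys))


offsets = [(0.0, 1.0), (3.0, -2.0), (1.5, 4.0)]
offset_corners = datalim_corners_from_offsets(offsets)
print(offset_corners)
```

The result could be handed to ax.update_datalim in the same way as manually computed corners. Markers drawn at the extreme offsets may still be partly clipped.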

    > getting the right result manually is pretty easy once one knows how to do it. And, chances are, for a final product one would end up using the manual methods anyway so as to have full control.

True, but it would be nice if the defaults gave you dataLims that were at least in the ballpark for your data, rather than arbitrary. It'd be easier for people to see the plot and think: "I need to tweak the axis limits."

Thanks for your work on this.

-Chris
