> I have come across a little inconsistency that
was unexpected
in the
> matplotlib API. The following is perfectly valid:
>
> import matplotlib.pyplot as plt
> plt.plot(, )
> plt.show()
>
> However, this is not valid:
>
> import matplotlib.pyplot as plt
> plt.scatter(, )
> plt.show()
>
> The immediate issue that comes up in scatter is
that it
attempts to find
> min/max of the input data for the purpose of
autoscaling
(this can
> probably be done better by just using
set_xmargin(0.05) and
> set_ymargin(0.05)). This can easily be bypassed
with an if
statement.
> However, then we discover that polygon collection
do not like
having
> empty offsets, which leads to a failure in the affine
transformation.
>
> So, the question is, is this a bug or a feature? I
personally believe
> that empty data is a perfectly valid scenario and
given that
other
> matplotlib functions handle it gracefully, we
should make the
> collections object more friendly to empty data.
I agree; a call with empty data should
simply not plot anything.
Eric
Digging further, it appears that the problem is
in _path.cpp for
_path_module::affine_transform() which
explicitly checks for an
empty vertices array and throws an exception if
it is empty.
So, do we want to make _path.cpp empty-friendly
or should we just
make empty collections objects just avoid doing
anything that
requires doing an affine transform?
Ben Root
Ok, some more digging deepens the mystery. While an
empty-friendly
_path.cpp would be nice, it appears that the
collections and axes
objects are already doing all it can to avoid doing
transforms for empty
collections.
However, it appears that the supposedly empty
collection object from
scatter(, ) is not really empty. Its
self._paths member contains a
list of unit_circle() from Path. This is also the
case for
EllipseCollection. Meanwhile, LineCollection and
PatchCollection
initializes their self._paths in accordance to their
given data.
One way to solve the problem would be to start each
draw() method with a short-circuit return in case there
is nothing to draw. It would be needed only for classes
for which empty self._paths is not a valid test. So for
CircleCollection it would be:
@allow_rasterization
def draw(self, renderer):
# sizes is the area of the circle circumscribing
the polygon
# in points^2
if len(self._sizes) == 0:
return
self._transforms = [
transforms.Affine2D().scale(
(np.sqrt(x) * self.figure.dpi / 72.0) /
np.sqrt(np.pi))
for x in self._sizes]
return Collection.draw(self, renderer)
(Collection.draw returns nothing, so there is no
inconsistency in the return value.)
Alternatively, it looks like maybe an empty
self._transforms could be used in a short-circuit test
at the start of Collection.draw.
Eric
Ben Root
That wouldn't completely solve the problem. Affine
transforms are being done for get_datalim() as well. The
other issue (and I see that I mixed this up myself) is the
assumption elsewhere that a non-empty self._path attribute
means that there is something to plot. This is an
assumption that is made in axes.py on line 1413 and it is an
invalid assumption.
As for your proposed solution in draw(), I prefer
short-circuiting in Collections.draw(). This makes for less
work for new Collection subclasses.
Ben Root
Working from an assumption that we can't possibly find all
bad/broken assumptions in mpl, I decided to go the route of Eric
and make the Collections object as robust as possible. I have
attached a patch that should enable empty scatter plots in
matplotlib and enable empty Collections as well.
This has not been fully tested yet. Please be brutal!
Ben Root
Heh, looks like that patch breaks the plotting of errorbars (the
line connecting the center point to the error caps is missing).
I will see about fixing that.
Ben Root
Nailed it!
After much examination and analysis, I realized that the transforms and
the backend renderers are actually quite robust to empty arrays of
points, so long as the arrays are shaped Nx2. It appears that in
collections.py, an attempt is made to force that shape, but this is done
inconsistently and wasn't technically correct.
Consider the following snippet:
offsets = np.asarray(offsets)
if len(offsets.shape) == 1:
offsets = offsets[np.newaxis,:] # Make it Nx2.
If offsets was [[1, 4], [2, 5], [3, 6]], everything is fine. If offsets
was [1, 4], everything works fine as well. However, if offsets was [1,
2, 3], this would go through, producing a 1x3 array (which isn't what we
want and can cause errors later). If offsets was , then a (1, 0)
array is produced, and this causes problems later in the code when it
checks for empty arrays incorrectly.
We can make this code better and clearer by shaping the array
appropriately like so:
offsets = np.asarray(offsets)
offsets.shape = (-1, 2)
There are multiple benefits to this. First, it explicitly makes it
clear our intent and requirement (that offsets needs to be Nx2).
Second, it guarantees no copying of the array. Third, any array that
can't fit this shape will throw an error. Fourth, we eliminate an
if-statement!
So, with this cleaner enforcement, I applied it to other places in
collections.py where it wasn't being done at all. In addition, checks
for empty arrays were being done with a check for len(offsets) > 0,
which isn't quite right (a 1x0 array would pass this check, which is
incorrect), so I changed it to the clearer and more correct offsets.size
> 0.
The attached patch will not only fix the issues in collections.py, but
it also slightly changes the implementation of scatter() in axes.py to
avoid doing min/max checks (which would fail for empty arrays) and
instead will set data margins if they haven't been set already (and if
there is data).
I have tested the patch with the test suite and it passes. I also
double-checked that all the mplot3d examples (which are not in the test
suite) and they all rendered correctly.
My only question is: Is it too late to commit to svn?
I don't think so. Try it. It can't hurt.