Empty scatter plot

I have come across a little inconsistency that was unexpected in the matplotlib API. The following is perfectly valid:

import matplotlib.pyplot as plt
plt.plot([], [])
plt.show()

However, this is not valid:

import matplotlib.pyplot as plt
plt.scatter([], [])
plt.show()

The immediate issue that comes up in scatter is that it attempts to find min/max of the input data for the purpose of autoscaling (this can probably be done better by just using set_xmargin(0.05) and set_ymargin(0.05)). This can easily be bypassed with an if statement. However, then we discover that polygon collection do not like having empty offsets, which leads to a failure in the affine transformation.

So, the question is, is this a bug or a feature? I personally believe that empty data is a perfectly valid scenario and given that other matplotlib functions handle it gracefully, we should make the collections object more friendly to empty data.

Ben Root

I agree that this is certainly a bug. For instance, I have a short
script which reads a stream of data from stdin and plots in
realtime. Unfortunately this bug made it prohibitively difficult to add
support for scatter plots. I'd definitely appreciate it if this were
fixed.

Cheers,

- Ben

···

On Tue, 15 Feb 2011 11:40:01 -0600, Benjamin Root <ben.root@...553...> wrote:

So, the question is, is this a bug or a feature? I personally believe that
empty data is a perfectly valid scenario and given that other matplotlib
functions handle it gracefully, we should make the collections object more
friendly to empty data.

I have come across a little inconsistency that was unexpected in the
matplotlib API. The following is perfectly valid:

import matplotlib.pyplot as plt
plt.plot([], [])
plt.show()

However, this is not valid:

import matplotlib.pyplot as plt
plt.scatter([], [])
plt.show()

The immediate issue that comes up in scatter is that it attempts to find
min/max of the input data for the purpose of autoscaling (this can
probably be done better by just using set_xmargin(0.05) and
set_ymargin(0.05)). This can easily be bypassed with an if statement.
However, then we discover that polygon collection do not like having
empty offsets, which leads to a failure in the affine transformation.

So, the question is, is this a bug or a feature? I personally believe
that empty data is a perfectly valid scenario and given that other
matplotlib functions handle it gracefully, we should make the
collections object more friendly to empty data.

I agree; a call with empty data should simply not plot anything.

Eric

···

On 02/15/2011 07:40 AM, Benjamin Root wrote:

Ben Root

Digging further, it appears that the problem is in _path.cpp for _path_module::affine_transform() which explicitly checks for an empty vertices array and throws an exception if it is empty.

So, do we want to make _path.cpp empty-friendly or should we just make empty collections objects just avoid doing anything that requires doing an affine transform?

Ben Root

···

On Tue, Feb 15, 2011 at 11:54 AM, Eric Firing <efiring@…229…> wrote:

On 02/15/2011 07:40 AM, Benjamin Root wrote:

I have come across a little inconsistency that was unexpected in the

matplotlib API. The following is perfectly valid:

import matplotlib.pyplot as plt

plt.plot([], [])

plt.show()

However, this is not valid:

import matplotlib.pyplot as plt

plt.scatter([], [])

plt.show()

The immediate issue that comes up in scatter is that it attempts to find

min/max of the input data for the purpose of autoscaling (this can

probably be done better by just using set_xmargin(0.05) and

set_ymargin(0.05)). This can easily be bypassed with an if statement.

However, then we discover that polygon collection do not like having

empty offsets, which leads to a failure in the affine transformation.

So, the question is, is this a bug or a feature? I personally believe

that empty data is a perfectly valid scenario and given that other

matplotlib functions handle it gracefully, we should make the

collections object more friendly to empty data.

I agree; a call with empty data should simply not plot anything.

Eric

Ok, some more digging deepens the mystery. While an empty-friendly _path.cpp would be nice, it appears that the collections and axes objects are already doing all it can to avoid doing transforms for empty collections.

However, it appears that the supposedly empty collection object from scatter([], []) is not really empty. Its self._paths member contains a list of unit_circle() from Path. This is also the case for EllipseCollection. Meanwhile, LineCollection and PatchCollection initializes their self._paths in accordance to their given data.

Ben Root

···

On Tue, Feb 15, 2011 at 12:19 PM, Benjamin Root <ben.root@…867…> wrote:

On Tue, Feb 15, 2011 at 11:54 AM, Eric Firing <efiring@…229…> wrote:

On 02/15/2011 07:40 AM, Benjamin Root wrote:

I have come across a little inconsistency that was unexpected in the

matplotlib API. The following is perfectly valid:

import matplotlib.pyplot as plt

plt.plot([], [])

plt.show()

However, this is not valid:

import matplotlib.pyplot as plt

plt.scatter([], [])

plt.show()

The immediate issue that comes up in scatter is that it attempts to find

min/max of the input data for the purpose of autoscaling (this can

probably be done better by just using set_xmargin(0.05) and

set_ymargin(0.05)). This can easily be bypassed with an if statement.

However, then we discover that polygon collection do not like having

empty offsets, which leads to a failure in the affine transformation.

So, the question is, is this a bug or a feature? I personally believe

that empty data is a perfectly valid scenario and given that other

matplotlib functions handle it gracefully, we should make the

collections object more friendly to empty data.

I agree; a call with empty data should simply not plot anything.

Eric

Digging further, it appears that the problem is in _path.cpp for _path_module::affine_transform() which explicitly checks for an empty vertices array and throws an exception if it is empty.

So, do we want to make _path.cpp empty-friendly or should we just make empty collections objects just avoid doing anything that requires doing an affine transform?

Ben Root

         > I have come across a little inconsistency that was unexpected
        in the
         > matplotlib API. The following is perfectly valid:
         >
         > import matplotlib.pyplot as plt
         > plt.plot([], [])
         > plt.show()
         >
         > However, this is not valid:
         >
         > import matplotlib.pyplot as plt
         > plt.scatter([], [])
         > plt.show()
         >
         > The immediate issue that comes up in scatter is that it
        attempts to find
         > min/max of the input data for the purpose of autoscaling
        (this can
         > probably be done better by just using set_xmargin(0.05) and
         > set_ymargin(0.05)). This can easily be bypassed with an if
        statement.
         > However, then we discover that polygon collection do not like
        having
         > empty offsets, which leads to a failure in the affine
        transformation.
         >
         > So, the question is, is this a bug or a feature? I
        personally believe
         > that empty data is a perfectly valid scenario and given that
        other
         > matplotlib functions handle it gracefully, we should make the
         > collections object more friendly to empty data.

        I agree; a call with empty data should simply not plot anything.

        Eric

    Digging further, it appears that the problem is in _path.cpp for
    _path_module::affine_transform() which explicitly checks for an
    empty vertices array and throws an exception if it is empty.

    So, do we want to make _path.cpp empty-friendly or should we just
    make empty collections objects just avoid doing anything that
    requires doing an affine transform?

    Ben Root

Ok, some more digging deepens the mystery. While an empty-friendly
_path.cpp would be nice, it appears that the collections and axes
objects are already doing all it can to avoid doing transforms for empty
collections.

However, it appears that the supposedly empty collection object from
scatter([], []) is not really empty. Its self._paths member contains a
list of unit_circle() from Path. This is also the case for
EllipseCollection. Meanwhile, LineCollection and PatchCollection
initializes their self._paths in accordance to their given data.

One way to solve the problem would be to start each draw() method with a short-circuit return in case there is nothing to draw. It would be needed only for classes for which empty self._paths is not a valid test. So for CircleCollection it would be:

     @allow_rasterization
     def draw(self, renderer):
         # sizes is the area of the circle circumscribing the polygon
         # in points^2
         if len(self._sizes) == 0:
             return
         self._transforms = [
             transforms.Affine2D().scale(
                 (np.sqrt(x) * self.figure.dpi / 72.0) / np.sqrt(np.pi))
             for x in self._sizes]
         return Collection.draw(self, renderer)

(Collection.draw returns nothing, so there is no inconsistency in the return value.)

Alternatively, it looks like maybe an empty self._transforms could be used in a short-circuit test at the start of Collection.draw.

Eric

···

On 02/15/2011 08:50 AM, Benjamin Root wrote:

On Tue, Feb 15, 2011 at 12:19 PM, Benjamin Root <ben.root@…553… > <mailto:ben.root@…553…>> wrote:
    On Tue, Feb 15, 2011 at 11:54 AM, Eric Firing <efiring@…229… > <mailto:efiring@…229…>> wrote:
        On 02/15/2011 07:40 AM, Benjamin Root wrote:

Ben Root

That wouldn’t completely solve the problem. Affine transforms are being done for get_datalim() as well. The other issue (and I see that I mixed this up myself) is the assumption elsewhere that a non-empty self._path attribute means that there is something to plot. This is an assumption that is made in axes.py on line 1413 and it is an invalid assumption.

As for your proposed solution in draw(), I prefer short-circuiting in Collections.draw(). This makes for less work for new Collection subclasses.

Ben Root

···

On Tue, Feb 15, 2011 at 1:17 PM, Eric Firing <efiring@…552…229…> wrote:

On 02/15/2011 08:50 AM, Benjamin Root wrote:

On Tue, Feb 15, 2011 at 12:19 PM, Benjamin Root <ben.root@…553… > > mailto:ben.root@...553...> wrote:

On Tue, Feb 15, 2011 at 11:54 AM, Eric Firing <efiring@...229... > > <mailto:efiring@...229...>> wrote:



    On 02/15/2011 07:40 AM, Benjamin Root wrote:

     > I have come across a little inconsistency that was unexpected

    in the

     > matplotlib API.  The following is perfectly valid:

     >

     > import matplotlib.pyplot as plt

     > plt.plot([], [])

     > plt.show()

     >

     >

     > However, this is not valid:

     >

     > import matplotlib.pyplot as plt

     > plt.scatter([], [])

     > plt.show()

     >

     >

     > The immediate issue that comes up in scatter is that it

    attempts to find

     > min/max of the input data for the purpose of autoscaling

    (this can

     > probably be done better by just using set_xmargin(0.05) and

     > set_ymargin(0.05)).  This can easily be bypassed with an if

    statement.

     > However, then we discover that polygon collection do not like

    having

     > empty offsets, which leads to a failure in the affine

    transformation.

     >

     > So, the question is, is this a bug or a feature?  I

    personally believe

     > that empty data is a perfectly valid scenario and given that

    other

     > matplotlib functions handle it gracefully, we should make the

     > collections object more friendly to empty data.



    I agree; a call with empty data should simply not plot anything.



    Eric





Digging further, it appears that the problem is in _path.cpp for

_path_module::affine_transform() which explicitly checks for an

empty vertices array and throws an exception if it is empty.



So, do we want to make _path.cpp empty-friendly or should we just

make empty collections objects just avoid doing anything that

requires doing an affine transform?



Ben Root

Ok, some more digging deepens the mystery. While an empty-friendly

_path.cpp would be nice, it appears that the collections and axes

objects are already doing all it can to avoid doing transforms for empty

collections.

However, it appears that the supposedly empty collection object from

scatter([], []) is not really empty. Its self._paths member contains a

list of unit_circle() from Path. This is also the case for

EllipseCollection. Meanwhile, LineCollection and PatchCollection

initializes their self._paths in accordance to their given data.

One way to solve the problem would be to start each draw() method with a short-circuit return in case there is nothing to draw. It would be needed only for classes for which empty self._paths is not a valid test. So for CircleCollection it would be:

@...725...

def draw(self, renderer):

    # sizes is the area of the circle circumscribing the polygon

    # in points^2

    if len(self._sizes) == 0:

        return

    self._transforms = [

        transforms.Affine2D().scale(

            (np.sqrt(x) * self.figure.dpi / 72.0) / np.sqrt(np.pi))

        for x in self._sizes]

    return Collection.draw(self, renderer)

(Collection.draw returns nothing, so there is no inconsistency in the return value.)

Alternatively, it looks like maybe an empty self._transforms could be used in a short-circuit test at the start of Collection.draw.

Eric

Ben Root

Working from an assumption that we can’t possibly find all bad/broken assumptions in mpl, I decided to go the route of Eric and make the Collections object as robust as possible. I have attached a patch that should enable empty scatter plots in matplotlib and enable empty Collections as well.

This has not been fully tested yet. Please be brutal!

Ben Root

enable_emptyscatter.patch (1.82 KB)

···

On Tue, Feb 15, 2011 at 1:41 PM, Benjamin Root <ben.root@…854…> wrote:

On Tue, Feb 15, 2011 at 1:17 PM, Eric Firing <efiring@…229…> wrote:

On 02/15/2011 08:50 AM, Benjamin Root wrote:

On Tue, Feb 15, 2011 at 12:19 PM, Benjamin Root <ben.root@…553… > > > mailto:ben.root@...553...> wrote:

On Tue, Feb 15, 2011 at 11:54 AM, Eric Firing <efiring@...229... > > > <mailto:efiring@...229...>> wrote:



    On 02/15/2011 07:40 AM, Benjamin Root wrote:

     > I have come across a little inconsistency that was unexpected

    in the

     > matplotlib API.  The following is perfectly valid:

     >

     > import matplotlib.pyplot as plt

     > plt.plot([], [])

     > plt.show()

     >

     >

     > However, this is not valid:

     >

     > import matplotlib.pyplot as plt

     > plt.scatter([], [])

     > plt.show()

     >

     >

     > The immediate issue that comes up in scatter is that it

    attempts to find

     > min/max of the input data for the purpose of autoscaling

    (this can

     > probably be done better by just using set_xmargin(0.05) and

     > set_ymargin(0.05)).  This can easily be bypassed with an if

    statement.

     > However, then we discover that polygon collection do not like

    having

     > empty offsets, which leads to a failure in the affine

    transformation.

     >

     > So, the question is, is this a bug or a feature?  I

    personally believe

     > that empty data is a perfectly valid scenario and given that

    other

     > matplotlib functions handle it gracefully, we should make the

     > collections object more friendly to empty data.



    I agree; a call with empty data should simply not plot anything.



    Eric





Digging further, it appears that the problem is in _path.cpp for

_path_module::affine_transform() which explicitly checks for an

empty vertices array and throws an exception if it is empty.



So, do we want to make _path.cpp empty-friendly or should we just

make empty collections objects just avoid doing anything that

requires doing an affine transform?



Ben Root

Ok, some more digging deepens the mystery. While an empty-friendly

_path.cpp would be nice, it appears that the collections and axes

objects are already doing all it can to avoid doing transforms for empty

collections.

However, it appears that the supposedly empty collection object from

scatter([], []) is not really empty. Its self._paths member contains a

list of unit_circle() from Path. This is also the case for

EllipseCollection. Meanwhile, LineCollection and PatchCollection

initializes their self._paths in accordance to their given data.

One way to solve the problem would be to start each draw() method with a short-circuit return in case there is nothing to draw. It would be needed only for classes for which empty self._paths is not a valid test. So for CircleCollection it would be:

@...725...

def draw(self, renderer):

    # sizes is the area of the circle circumscribing the polygon

    # in points^2

    if len(self._sizes) == 0:

        return

    self._transforms = [

        transforms.Affine2D().scale(

            (np.sqrt(x) * self.figure.dpi / 72.0) / np.sqrt(np.pi))

        for x in self._sizes]

    return Collection.draw(self, renderer)

(Collection.draw returns nothing, so there is no inconsistency in the return value.)

Alternatively, it looks like maybe an empty self._transforms could be used in a short-circuit test at the start of Collection.draw.

Eric

Ben Root

That wouldn’t completely solve the problem. Affine transforms are being done for get_datalim() as well. The other issue (and I see that I mixed this up myself) is the assumption elsewhere that a non-empty self._path attribute means that there is something to plot. This is an assumption that is made in axes.py on line 1413 and it is an invalid assumption.

As for your proposed solution in draw(), I prefer short-circuiting in Collections.draw(). This makes for less work for new Collection subclasses.

Ben Root

Heh, looks like that patch breaks the plotting of errorbars (the line connecting the center point to the error caps is missing).

I will see about fixing that.

Ben Root

···

On Tue, Feb 15, 2011 at 2:05 PM, Benjamin Root <ben.root@…854…> wrote:

On Tue, Feb 15, 2011 at 1:41 PM, Benjamin Root <ben.root@…553…> wrote:

On Tue, Feb 15, 2011 at 1:17 PM, Eric Firing <efiring@…229…> wrote:

On 02/15/2011 08:50 AM, Benjamin Root wrote:

On Tue, Feb 15, 2011 at 12:19 PM, Benjamin Root <ben.root@…553… > > > > mailto:ben.root@...553...> wrote:

On Tue, Feb 15, 2011 at 11:54 AM, Eric Firing <efiring@...229... > > > > <mailto:efiring@...229...>> wrote:



    On 02/15/2011 07:40 AM, Benjamin Root wrote:

     > I have come across a little inconsistency that was unexpected

    in the

     > matplotlib API.  The following is perfectly valid:

     >

     > import matplotlib.pyplot as plt

     > plt.plot([], [])

     > plt.show()

     >

     >

     > However, this is not valid:

     >

     > import matplotlib.pyplot as plt

     > plt.scatter([], [])

     > plt.show()

     >

     >

     > The immediate issue that comes up in scatter is that it

    attempts to find

     > min/max of the input data for the purpose of autoscaling

    (this can

     > probably be done better by just using set_xmargin(0.05) and

     > set_ymargin(0.05)).  This can easily be bypassed with an if

    statement.

     > However, then we discover that polygon collection do not like

    having

     > empty offsets, which leads to a failure in the affine

    transformation.

     >

     > So, the question is, is this a bug or a feature?  I

    personally believe

     > that empty data is a perfectly valid scenario and given that

    other

     > matplotlib functions handle it gracefully, we should make the

     > collections object more friendly to empty data.



    I agree; a call with empty data should simply not plot anything.



    Eric





Digging further, it appears that the problem is in _path.cpp for

_path_module::affine_transform() which explicitly checks for an

empty vertices array and throws an exception if it is empty.



So, do we want to make _path.cpp empty-friendly or should we just

make empty collections objects just avoid doing anything that

requires doing an affine transform?



Ben Root

Ok, some more digging deepens the mystery. While an empty-friendly

_path.cpp would be nice, it appears that the collections and axes

objects are already doing all it can to avoid doing transforms for empty

collections.

However, it appears that the supposedly empty collection object from

scatter([], []) is not really empty. Its self._paths member contains a

list of unit_circle() from Path. This is also the case for

EllipseCollection. Meanwhile, LineCollection and PatchCollection

initializes their self._paths in accordance to their given data.

One way to solve the problem would be to start each draw() method with a short-circuit return in case there is nothing to draw. It would be needed only for classes for which empty self._paths is not a valid test. So for CircleCollection it would be:

@...725...

def draw(self, renderer):

    # sizes is the area of the circle circumscribing the polygon

    # in points^2

    if len(self._sizes) == 0:

        return

    self._transforms = [

        transforms.Affine2D().scale(

            (np.sqrt(x) * self.figure.dpi / 72.0) / np.sqrt(np.pi))

        for x in self._sizes]

    return Collection.draw(self, renderer)

(Collection.draw returns nothing, so there is no inconsistency in the return value.)

Alternatively, it looks like maybe an empty self._transforms could be used in a short-circuit test at the start of Collection.draw.

Eric

Ben Root

That wouldn’t completely solve the problem. Affine transforms are being done for get_datalim() as well. The other issue (and I see that I mixed this up myself) is the assumption elsewhere that a non-empty self._path attribute means that there is something to plot. This is an assumption that is made in axes.py on line 1413 and it is an invalid assumption.

As for your proposed solution in draw(), I prefer short-circuiting in Collections.draw(). This makes for less work for new Collection subclasses.

Ben Root

Working from an assumption that we can’t possibly find all bad/broken assumptions in mpl, I decided to go the route of Eric and make the Collections object as robust as possible. I have attached a patch that should enable empty scatter plots in matplotlib and enable empty Collections as well.

This has not been fully tested yet. Please be brutal!

Ben Root

Nailed it!

After much examination and analysis, I realized that the transforms and the backend renderers are actually quite robust to empty arrays of points, so long as the arrays are shaped Nx2. It appears that in collections.py, an attempt is made to force that shape, but this is done inconsistently and wasn’t technically correct.

Consider the following snippet:

        offsets = np.asarray(offsets)
        if len(offsets.shape) == 1:
            offsets = offsets[np.newaxis,:]  # Make it Nx2.

If offsets was [[1, 4], [2, 5], [3, 6]], everything is fine. If offsets was [1, 4], everything works fine as well. However, if offsets was [1, 2, 3], this would go through, producing a 1x3 array (which isn’t what we want and can cause errors later). If offsets was [], then a (1, 0) array is produced, and this causes problems later in the code when it checks for empty arrays incorrectly.

We can make this code better and clearer by shaping the array appropriately like so:

         offsets = np.asarray(offsets)
         offsets.shape = (-1, 2)

There are multiple benefits to this. First, it explicitly makes it clear our intent and requirement (that offsets needs to be Nx2). Second, it guarantees no copying of the array. Third, any array that can’t fit this shape will throw an error. Fourth, we eliminate an if-statement!

So, with this cleaner enforcement, I applied it to other places in collections.py where it wasn’t being done at all. In addition, checks for empty arrays were being done with a check for len(offsets) > 0, which isn’t quite right (a 1x0 array would pass this check, which is incorrect), so I changed it to the clearer and more correct offsets.size > 0.

The attached patch will not only fix the issues in collections.py, but it also slightly changes the implementation of scatter() in axes.py to avoid doing min/max checks (which would fail for empty arrays) and instead will set data margins if they haven’t been set already (and if there is data).

I have tested the patch with the test suite and it passes. I also double-checked that all the mplot3d examples (which are not in the test suite) and they all rendered correctly.

My only question is: Is it too late to commit to svn?

I hope this enlightens others.
Ben Root

emptycollections.patch (4.69 KB)

···

On Thu, Feb 17, 2011 at 1:36 PM, Benjamin Root <ben.root@…553…> wrote:

On Tue, Feb 15, 2011 at 2:05 PM, Benjamin Root <ben.root@…553…> wrote:

On Tue, Feb 15, 2011 at 1:41 PM, Benjamin Root <ben.root@…553…> wrote:

On Tue, Feb 15, 2011 at 1:17 PM, Eric Firing <efiring@…229…> wrote:

On 02/15/2011 08:50 AM, Benjamin Root wrote:

On Tue, Feb 15, 2011 at 12:19 PM, Benjamin Root <ben.root@…553… > > > > > mailto:ben.root@...553...> wrote:

On Tue, Feb 15, 2011 at 11:54 AM, Eric Firing <efiring@...229... > > > > > <mailto:efiring@...229...>> wrote:



    On 02/15/2011 07:40 AM, Benjamin Root wrote:

     > I have come across a little inconsistency that was unexpected

    in the

     > matplotlib API.  The following is perfectly valid:

     >

     > import matplotlib.pyplot as plt

     > plt.plot([], [])

     > plt.show()

     >

     >

     > However, this is not valid:

     >

     > import matplotlib.pyplot as plt

     > plt.scatter([], [])

     > plt.show()

     >

     >

     > The immediate issue that comes up in scatter is that it

    attempts to find

     > min/max of the input data for the purpose of autoscaling

    (this can

     > probably be done better by just using set_xmargin(0.05) and

     > set_ymargin(0.05)).  This can easily be bypassed with an if

    statement.

     > However, then we discover that polygon collection do not like

    having

     > empty offsets, which leads to a failure in the affine

    transformation.

     >

     > So, the question is, is this a bug or a feature?  I

    personally believe

     > that empty data is a perfectly valid scenario and given that

    other

     > matplotlib functions handle it gracefully, we should make the

     > collections object more friendly to empty data.



    I agree; a call with empty data should simply not plot anything.



    Eric





Digging further, it appears that the problem is in _path.cpp for

_path_module::affine_transform() which explicitly checks for an

empty vertices array and throws an exception if it is empty.



So, do we want to make _path.cpp empty-friendly or should we just

make empty collections objects just avoid doing anything that

requires doing an affine transform?



Ben Root

Ok, some more digging deepens the mystery. While an empty-friendly

_path.cpp would be nice, it appears that the collections and axes

objects are already doing all it can to avoid doing transforms for empty

collections.

However, it appears that the supposedly empty collection object from

scatter([], []) is not really empty. Its self._paths member contains a

list of unit_circle() from Path. This is also the case for

EllipseCollection. Meanwhile, LineCollection and PatchCollection

initializes their self._paths in accordance to their given data.

One way to solve the problem would be to start each draw() method with a short-circuit return in case there is nothing to draw. It would be needed only for classes for which empty self._paths is not a valid test. So for CircleCollection it would be:

@...725...

def draw(self, renderer):

    # sizes is the area of the circle circumscribing the polygon

    # in points^2

    if len(self._sizes) == 0:

        return

    self._transforms = [

        transforms.Affine2D().scale(

            (np.sqrt(x) * self.figure.dpi / 72.0) / np.sqrt(np.pi))

        for x in self._sizes]

    return Collection.draw(self, renderer)

(Collection.draw returns nothing, so there is no inconsistency in the return value.)

Alternatively, it looks like maybe an empty self._transforms could be used in a short-circuit test at the start of Collection.draw.

Eric

Ben Root

That wouldn’t completely solve the problem. Affine transforms are being done for get_datalim() as well. The other issue (and I see that I mixed this up myself) is the assumption elsewhere that a non-empty self._path attribute means that there is something to plot. This is an assumption that is made in axes.py on line 1413 and it is an invalid assumption.

As for your proposed solution in draw(), I prefer short-circuiting in Collections.draw(). This makes for less work for new Collection subclasses.

Ben Root

Working from an assumption that we can’t possibly find all bad/broken assumptions in mpl, I decided to go the route of Eric and make the Collections object as robust as possible. I have attached a patch that should enable empty scatter plots in matplotlib and enable empty Collections as well.

This has not been fully tested yet. Please be brutal!

Ben Root

Heh, looks like that patch breaks the plotting of errorbars (the line connecting the center point to the error caps is missing).

I will see about fixing that.

Ben Root

                     > I have come across a little inconsistency that
                    was unexpected
                            in the
                     > matplotlib API. The following is perfectly valid:
                     >
                     > import matplotlib.pyplot as plt
                     > plt.plot([], [])
                     > plt.show()
                     >
                     > However, this is not valid:
                     >
                     > import matplotlib.pyplot as plt
                     > plt.scatter([], [])
                     > plt.show()
                     >
                     > The immediate issue that comes up in scatter is
                    that it
                            attempts to find
                     > min/max of the input data for the purpose of
                    autoscaling
                            (this can
                     > probably be done better by just using
                    set_xmargin(0.05) and
                     > set_ymargin(0.05)). This can easily be bypassed
                    with an if
                            statement.
                     > However, then we discover that polygon collection
                    do not like
                            having
                     > empty offsets, which leads to a failure in the affine
                            transformation.
                     >
                     > So, the question is, is this a bug or a feature? I
                            personally believe
                     > that empty data is a perfectly valid scenario and
                    given that
                            other
                     > matplotlib functions handle it gracefully, we
                    should make the
                     > collections object more friendly to empty data.

                            I agree; a call with empty data should
                    simply not plot anything.

                            Eric

                        Digging further, it appears that the problem is
                    in _path.cpp for
                        _path_module::affine_transform() which
                    explicitly checks for an
                        empty vertices array and throws an exception if
                    it is empty.

                        So, do we want to make _path.cpp empty-friendly
                    or should we just
                        make empty collections objects just avoid doing
                    anything that
                        requires doing an affine transform?

                        Ben Root

                    Ok, some more digging deepens the mystery. While an
                    empty-friendly
                    _path.cpp would be nice, it appears that the
                    collections and axes
                    objects are already doing all it can to avoid doing
                    transforms for empty
                    collections.

                    However, it appears that the supposedly empty
                    collection object from
                    scatter([], []) is not really empty. Its
                    self._paths member contains a
                    list of unit_circle() from Path. This is also the
                    case for
                    EllipseCollection. Meanwhile, LineCollection and
                    PatchCollection
                    initializes their self._paths in accordance to their
                    given data.

                One way to solve the problem would be to start each
                draw() method with a short-circuit return in case there
                is nothing to draw. It would be needed only for classes
                for which empty self._paths is not a valid test. So for
                CircleCollection it would be:

                    @allow_rasterization
                    def draw(self, renderer):
                        # sizes is the area of the circle circumscribing
                the polygon
                        # in points^2
                        if len(self._sizes) == 0:
                            return
                        self._transforms = [
                            transforms.Affine2D().scale(
                                (np.sqrt(x) * self.figure.dpi / 72.0) /
                np.sqrt(np.pi))
                            for x in self._sizes]
                        return Collection.draw(self, renderer)

                (Collection.draw returns nothing, so there is no
                inconsistency in the return value.)

                Alternatively, it looks like maybe an empty
                self._transforms could be used in a short-circuit test
                at the start of Collection.draw.

                Eric

                    Ben Root

            That wouldn't completely solve the problem. Affine
            transforms are being done for get_datalim() as well. The
            other issue (and I see that I mixed this up myself) is the
            assumption elsewhere that a non-empty self._path attribute
            means that there is something to plot. This is an
            assumption that is made in axes.py on line 1413 and it is an
            invalid assumption.

            As for your proposed solution in draw(), I prefer
            short-circuiting in Collections.draw(). This makes for less
            work for new Collection subclasses.

            Ben Root

        Working from an assumption that we can't possibly find all
        bad/broken assumptions in mpl, I decided to go the route of Eric
        and make the Collections object as robust as possible. I have
        attached a patch that should enable empty scatter plots in
        matplotlib and enable empty Collections as well.

        This has not been fully tested yet. Please be brutal!

        Ben Root

    Heh, looks like that patch breaks the plotting of errorbars (the
    line connecting the center point to the error caps is missing).

    I will see about fixing that.

    Ben Root

Nailed it!

After much examination and analysis, I realized that the transforms and
the backend renderers are actually quite robust to empty arrays of
points, so long as the arrays are shaped Nx2. It appears that in
collections.py, an attempt is made to force that shape, but this is done
inconsistently and wasn't technically correct.

Consider the following snippet:

             offsets = np.asarray(offsets)
             if len(offsets.shape) == 1:
                 offsets = offsets[np.newaxis,:] # Make it Nx2.

If offsets was [[1, 4], [2, 5], [3, 6]], everything is fine. If offsets
was [1, 4], everything works fine as well. However, if offsets was [1,
2, 3], this would go through, producing a 1x3 array (which isn't what we
want and can cause errors later). If offsets was [], then a (1, 0)
array is produced, and this causes problems later in the code when it
checks for empty arrays incorrectly.

We can make this code better and clearer by shaping the array
appropriately like so:

              offsets = np.asarray(offsets)
              offsets.shape = (-1, 2)

There are multiple benefits to this. First, it explicitly makes it
clear our intent and requirement (that offsets needs to be Nx2).
Second, it guarantees no copying of the array. Third, any array that
can't fit this shape will throw an error. Fourth, we eliminate an
if-statement!

So, with this cleaner enforcement, I applied it to other places in
collections.py where it wasn't being done at all. In addition, checks
for empty arrays were being done with a check for len(offsets) > 0,
which isn't quite right (a 1x0 array would pass this check, which is
incorrect), so I changed it to the clearer and more correct offsets.size
> 0.

The attached patch will not only fix the issues in collections.py, but
it also slightly changes the implementation of scatter() in axes.py to
avoid doing min/max checks (which would fail for empty arrays) and
instead will set data margins if they haven't been set already (and if
there is data).

I have tested the patch with the test suite and it passes. I also
double-checked that all the mplot3d examples (which are not in the test
suite) and they all rendered correctly.

My only question is: Is it too late to commit to svn?

I don't think so. Try it. It can't hurt.

I hope this enlightens others.

Yes, thank you!

Eric

···

On 02/18/2011 08:26 AM, Benjamin Root wrote:

On Thu, Feb 17, 2011 at 1:36 PM, Benjamin Root <ben.root@…553… > <mailto:ben.root@…553…>> wrote:
    On Tue, Feb 15, 2011 at 2:05 PM, Benjamin Root <ben.root@…553… > <mailto:ben.root@…553…>> wrote:
        On Tue, Feb 15, 2011 at 1:41 PM, Benjamin Root <ben.root@…553… > <mailto:ben.root@…553…>> wrote:
            On Tue, Feb 15, 2011 at 1:17 PM, Eric Firing > <efiring@…229… <mailto:efiring@…229…>> wrote:
                On 02/15/2011 08:50 AM, Benjamin Root wrote:
                    On Tue, Feb 15, 2011 at 12:19 PM, Benjamin Root > <ben.root@…553… <mailto:ben.root@…553…> > <mailto:ben.root@…553…>> > wrote:
                        On Tue, Feb 15, 2011 at 11:54 AM, Eric Firing > <efiring@…229… <mailto:efiring@…229…> > <mailto:efiring@…229… > <mailto:efiring@…229…>>> wrote:
                            On 02/15/2011 07:40 AM, Benjamin Root wrote:

Ben Root

Ah, and so it is still open. Committed in r8988.

Ben Root

···

On Fri, Feb 18, 2011 at 1:17 PM, Eric Firing <efiring@…552…229…> wrote:

On 02/18/2011 08:26 AM, Benjamin Root wrote:

On Thu, Feb 17, 2011 at 1:36 PM, Benjamin Root <ben.root@…553… > > mailto:ben.root@...553...> wrote:

On Tue, Feb 15, 2011 at 2:05 PM, Benjamin Root <ben.root@…553… > > mailto:ben.root@...553...> wrote:

On Tue, Feb 15, 2011 at 1:41 PM, Benjamin Root <ben.root@…553… > > mailto:ben.root@...553...> wrote:

On Tue, Feb 15, 2011 at 1:17 PM, Eric Firing > > <efiring@…229… mailto:efiring@...229...> wrote:

            On 02/15/2011 08:50 AM, Benjamin Root wrote:



                On Tue, Feb 15, 2011 at 12:19 PM, Benjamin Root

                <ben.root@...553... <mailto:ben.root@...553...>

<mailto:ben.root@…553… mailto:ben.root@...553...>>

                wrote:



                    On Tue, Feb 15, 2011 at 11:54 AM, Eric Firing

                <efiring@...229... <mailto:efiring@...229...>

<mailto:efiring@…229…

                <mailto:efiring@...229...>>> wrote:



                        On 02/15/2011 07:40 AM, Benjamin Root wrote:

                 > I have come across a little inconsistency that

                was unexpected

                        in the

                 > matplotlib API.  The following is perfectly valid:

                 >

                 > import matplotlib.pyplot as plt

                 > plt.plot([], [])

                 > plt.show()

                 >

                 >

                 > However, this is not valid:

                 >

                 > import matplotlib.pyplot as plt

                 > plt.scatter([], [])

                 > plt.show()

                 >

                 >

                 > The immediate issue that comes up in scatter is

                that it

                        attempts to find

                 > min/max of the input data for the purpose of

                autoscaling

                        (this can

                 > probably be done better by just using

                set_xmargin(0.05) and

                 > set_ymargin(0.05)).  This can easily be bypassed

                with an if

                        statement.

                 > However, then we discover that polygon collection

                do not like

                        having

                 > empty offsets, which leads to a failure in the affine

                        transformation.

                 >

                 > So, the question is, is this a bug or a feature?  I

                        personally believe

                 > that empty data is a perfectly valid scenario and

                given that

                        other

                 > matplotlib functions handle it gracefully, we

                should make the

                 > collections object more friendly to empty data.



                        I agree; a call with empty data should

                simply not plot anything.



                        Eric





                    Digging further, it appears that the problem is

                in _path.cpp for

                    _path_module::affine_transform() which

                explicitly checks for an

                    empty vertices array and throws an exception if

                it is empty.



                    So, do we want to make _path.cpp empty-friendly

                or should we just

                    make empty collections objects just avoid doing

                anything that

                    requires doing an affine transform?



                    Ben Root







                Ok, some more digging deepens the mystery.  While an

                empty-friendly

                _path.cpp would be nice, it appears that the

                collections and axes

                objects are already doing all it can to avoid doing

                transforms for empty

                collections.



                However, it appears that the supposedly empty

                collection object from

                scatter([], []) is not really empty.  Its

                self._paths member contains a

                list of unit_circle() from Path.  This is also the

                case for

                EllipseCollection.  Meanwhile, LineCollection and

                PatchCollection

                initializes their self._paths in accordance to their

                given data.





            One way to solve the problem would be to start each

            draw() method with a short-circuit return in case there

            is nothing to draw.  It would be needed only for classes

            for which empty self._paths is not a valid test.  So for

            CircleCollection it would be:



                @...725...

                def draw(self, renderer):

                    # sizes is the area of the circle circumscribing

            the polygon

                    # in points^2

                    if len(self._sizes) == 0:

                        return

                    self._transforms = [

                        transforms.Affine2D().scale(

                            (np.sqrt(x) * self.figure.dpi / 72.0) /

            np.sqrt(np.pi))

                        for x in self._sizes]

                    return Collection.draw(self, renderer)



            (Collection.draw returns nothing, so there is no

            inconsistency in the return value.)



            Alternatively, it looks like maybe an empty

            self._transforms could be used in a short-circuit test

            at the start of Collection.draw.



            Eric







                Ben Root







        That wouldn't completely solve the problem.  Affine

        transforms are being done for get_datalim() as well.  The

        other issue (and I see that I mixed this up myself) is the

        assumption elsewhere that a non-empty self._path attribute

        means that there is something to plot.  This is an

        assumption that is made in axes.py on line 1413 and it is an

        invalid assumption.



        As for your proposed solution in draw(), I prefer

        short-circuiting in Collections.draw().  This makes for less

        work for new Collection subclasses.



        Ben Root







    Working from an assumption that we can't possibly find all

    bad/broken assumptions in mpl, I decided to go the route of Eric

    and make the Collections object as robust as possible.  I have

    attached a patch that should enable empty scatter plots in

    matplotlib and enable empty Collections as well.



    This has not been fully tested yet.  Please be brutal!



    Ben Root







Heh, looks like that patch breaks the plotting of errorbars (the

line connecting the center point to the error caps is missing).



I will see about fixing that.



Ben Root

Nailed it!

After much examination and analysis, I realized that the transforms and

the backend renderers are actually quite robust to empty arrays of

points, so long as the arrays are shaped Nx2. It appears that in

collections.py, an attempt is made to force that shape, but this is done

inconsistently and wasn’t technically correct.

Consider the following snippet:

         offsets = np.asarray(offsets)

         if len(offsets.shape) == 1:

             offsets = offsets[np.newaxis,:]  # Make it Nx2.

If offsets was [[1, 4], [2, 5], [3, 6]], everything is fine. If offsets

was [1, 4], everything works fine as well. However, if offsets was [1,

2, 3], this would go through, producing a 1x3 array (which isn’t what we

want and can cause errors later). If offsets was [], then a (1, 0)

array is produced, and this causes problems later in the code when it

checks for empty arrays incorrectly.

We can make this code better and clearer by shaping the array

appropriately like so:

          offsets = np.asarray(offsets)

          offsets.shape = (-1, 2)

There are multiple benefits to this. First, it explicitly makes it

clear our intent and requirement (that offsets needs to be Nx2).

Second, it guarantees no copying of the array. Third, any array that

can’t fit this shape will throw an error. Fourth, we eliminate an

if-statement!

So, with this cleaner enforcement, I applied it to other places in

collections.py where it wasn’t being done at all. In addition, checks

for empty arrays were being done with a check for len(offsets) > 0,

which isn’t quite right (a 1x0 array would pass this check, which is

incorrect), so I changed it to the clearer and more correct offsets.size

The attached patch will not only fix the issues in collections.py, but

it also slightly changes the implementation of scatter() in axes.py to

avoid doing min/max checks (which would fail for empty arrays) and

instead will set data margins if they haven’t been set already (and if

there is data).

I have tested the patch with the test suite and it passes. I also

double-checked that all the mplot3d examples (which are not in the test

suite) and they all rendered correctly.

My only question is: Is it too late to commit to svn?

I don’t think so. Try it. It can’t hurt.

I hope this enlightens others.

Yes, thank you!

Eric

Ben Root