[Matplotlib-users] Consist Symbols/Sizes between Plot/Scatter

Manuel_Metz1 · June 13, 2008, 12:40pm

John Hunter wrote:

I am making a scatter plot and want the legend to display the symbols.
This functionality doesn't seem to exist, so I have followed the
workaround outlined here:

http://www.nabble.com/Legend-for-a-scatter-plot-based-on-symbols-td17554839.html#a17554839

Are there any plans to make the symbols which are available in plot()
the same as those available for scatter()? If not, can we at least
get the diamond symbol the same? I want to pass the same symbol to
plot() and scatter() and get the same symbol---as it is, I must use
'd' in scatter and 'D' in plot.

I think this is worth fixing -- could you file a bug on the
sourceforge site and note any symbols you are aware of that don't
agree between plot and scatter.

Hi John,

this is something that came up some time ago already.
http://www.nabble.com/forum/ViewPost.jtp?post=16054211&framed=y

Paul Novak had planned to work on a scatter legend, and I am also interested in this, so we came up with a code fragment, but it doesn't do the job well. I think a legend for scatter is something that matplotlib really needs. The main problem is that I got lost in all those transform things; in the end I felt like I was going crazy.

Also, how are the markersizes scaled? For example, in scatter(), I am
using s=30...but if I do plot(...,markersize=30), then the markers are
not the same size as the markers from the scatter plot. I can go back
and forth until the scale is right, but is there a better way?

Not sure if this is a bug or a feature, but it is a consequence of
emulating the matlab approach. In scatter, the size is often used to
convey information, and the size parameter gives the area of the
circle that circumscribes the marker. In plot, the marker size is not
generally intended to convey information, only that something happened
here, and the size choice is usually one of aesthetics. In any case,
in plot the size is the radius of the circle, or the side of the square or
diamond. For most markers, you should be able to square the
markersize to get approximate correspondence between plot and scatter.
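A minimal sketch of the rule of thumb above, assuming (as current matplotlib documentation states) that scatter()'s `s` is an area in points**2 and plot()'s `markersize` is a length in points. The helper names are hypothetical, and the correspondence is only approximate for most markers.

```python
import math


def markersize_to_scatter_s(markersize: float) -> float:
    """Approximate scatter() `s` (area, points**2) for a plot() markersize (points)."""
    return markersize ** 2


def scatter_s_to_markersize(s: float) -> float:
    """Approximate plot() markersize (points) for a scatter() `s` (points**2)."""
    if s < 0:
        raise ValueError("scatter sizes are areas and cannot be negative")
    return math.sqrt(s)


print(markersize_to_scatter_s(30))            # 900
print(round(scatter_s_to_markersize(30), 2))  # 5.48
```

With the numbers from the question, s=30 in scatter corresponds to a markersize of about 5.5 in plot, while markersize=30 would need s=900.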

I think that's something that is also a bit "unfortunate" in matplotlib. Basically, there are two different routines to draw markers: one is plot(), and one is scatter(). Both do nearly the same thing, but use different code bases.
It might be worth thinking about reorganizing this. Having one function that creates markers, and using it from both scatter() and plot(), sounds like a good idea to me!
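A minimal sketch of the "one function that creates markers" idea, assuming a single module-level table of canonical unit polygons that both a plot()-style and a scatter()-style code path consult. `marker_vertices`, `line_marker`, and `collection_marker` are hypothetical names, not matplotlib API.

```python
import math
from typing import Dict, List, Tuple

Vertices = List[Tuple[float, float]]


def _regular_polygon(n: int, rotation: float = 0.0) -> Vertices:
    """Vertices of a regular n-gon inscribed in the unit circle."""
    return [
        (math.cos(rotation + 2 * math.pi * k / n),
         math.sin(rotation + 2 * math.pi * k / n))
        for k in range(n)
    ]


# Hypothetical single source of truth: symbol -> canonical unit polygon.
_MARKERS: Dict[str, Vertices] = {
    "s": [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)],  # square
    "D": [(0.0, -1.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)],    # diamond
    "^": _regular_polygon(3, rotation=math.pi / 2),              # triangle, apex up
}


def marker_vertices(symbol: str) -> Vertices:
    """Look up the canonical polygon for a marker symbol."""
    try:
        return _MARKERS[symbol]
    except KeyError:
        raise ValueError(f"unknown marker symbol {symbol!r}") from None


def line_marker(symbol: str) -> Vertices:
    """What a plot()-style code path would call."""
    return marker_vertices(symbol)


def collection_marker(symbol: str) -> Vertices:
    """What a scatter()-style code path would call."""
    return marker_vertices(symbol)
```

Because neither caller keeps its own table, a symbol such as 'D' cannot end up meaning one shape in plot and another in scatter.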

Manuel

On Thu, Jun 12, 2008 at 8:56 PM, T J <tjhnson@...149...> wrote:

_John_Hunter · June 13, 2008, 1:57pm

This is how I see it -- plot draws homogeneous markers. Knowing they
are homogeneous allows us to do optimizations in the backend, like
caching the rasterized marker and blitting it in many places. This can
radically speed up plots where there are many markers. scatter, on
the other hand, exists to draw heterogeneous markers, which vary either in
size or in color or both. This is why I have always been lukewarm about
adding auto-legend support -- which of the heterogeneous scatter
symbols gets the legend entry?

In cases where you only vary the color or only vary the size, one could
use the first polygon as a proxy for auto-legending. This wouldn't
work all the time, but it might be good enough for folks who want
auto-legends and others would still have the option of doing the
legend polygon proxy trick. One way to make this easier would be for
the polygon collections to provide a polygon_proxy method which
returns a unit area patches.Polygon instance with properties set to
correspond to the first polygon in the collection -- then the legend
implementer could use it w/o worrying about all the different types of
polygon collections and the odd sizing that may occur by simply taking
the first element.
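A rough sketch of the polygon_proxy idea, assuming a hypothetical `PolygonCollection` that stores raw vertex lists and per-polygon colors; this is not matplotlib's actual collection class, and `ProxyPolygon` only stands in for a patches.Polygon. The proxy rescales the first polygon to unit area (shoelace formula) and copies its colors, so a legend never sees the odd per-point sizing.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vertices = List[Tuple[float, float]]


def polygon_area(verts: Vertices) -> float:
    """Absolute area of a simple polygon via the shoelace formula."""
    acc = 0.0
    for (x0, y0), (x1, y1) in zip(verts, verts[1:] + verts[:1]):
        acc += x0 * y1 - x1 * y0
    return abs(acc) / 2.0


@dataclass
class ProxyPolygon:
    """Stand-in for a legend patch (hypothetical)."""
    verts: Vertices
    facecolor: str
    edgecolor: str


@dataclass
class PolygonCollection:
    """Hypothetical collection: one vertex list and one color pair per marker."""
    polygons: List[Vertices]
    facecolors: List[str]
    edgecolors: List[str]

    def polygon_proxy(self) -> ProxyPolygon:
        """Unit-area copy of the first polygon, carrying its colors."""
        if not self.polygons:
            raise ValueError("empty collection has no proxy")
        verts = self.polygons[0]
        area = polygon_area(verts)
        if area == 0:
            raise ValueError("first polygon is degenerate")
        # Centre on the vertex mean (a simplification, not the area centroid).
        cx = sum(x for x, _ in verts) / len(verts)
        cy = sum(y for _, y in verts) / len(verts)
        # Area scales with the square of a linear factor, so divide by sqrt(area).
        k = 1.0 / math.sqrt(area)
        unit = [((x - cx) * k, (y - cy) * k) for x, y in verts]
        return ProxyPolygon(unit, self.facecolors[0], self.edgecolors[0])
```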

But yes, to the extent that we can centralize the marker symbol ->
path creation and reuse that between the line2d and poly collection
code, that would be a good thing. One could then scale and translate
the canonical polygons as needed for their respective uses.
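A minimal sketch of scaling and translating one canonical polygon, assuming markers are plain vertex lists centred on the origin with unit half-width. In the plot()-style path every marker shares one size, so the shape is scaled once and only translated per point (the property that makes backend reuse possible); the scatter()-style path applies a size per point. All names here are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Vertices = List[Point]

# Canonical unit diamond centred on the origin (half-width 1).
UNIT_DIAMOND: Vertices = [(0.0, -1.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]


def place_marker(unit: Vertices, scale: float, center: Point) -> Vertices:
    """Scale a canonical polygon, then translate it to `center`."""
    cx, cy = center
    return [(cx + scale * x, cy + scale * y) for x, y in unit]


def homogeneous_markers(unit: Vertices, size: float,
                        centers: Sequence[Point]) -> List[Vertices]:
    """plot()-style: scale once, then only translate for each point."""
    scaled = place_marker(unit, size, (0.0, 0.0))
    return [place_marker(scaled, 1.0, c) for c in centers]


def heterogeneous_markers(unit: Vertices, sizes: Sequence[float],
                          centers: Sequence[Point]) -> List[Vertices]:
    """scatter()-style: each point carries its own scale."""
    if len(sizes) != len(centers):
        raise ValueError("need one size per center")
    return [place_marker(unit, s, c) for s, c in zip(sizes, centers)]
```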

On the size as area usage in scatter -- I have found this
counter-intuitive from day 1, and I wrote the damned thing, but as I
noted we inherited this from matlab and a lot of code is built around
it, so we are probably stuck with it. If you think it is sufficiently
irksome, you can add a kwarg to scatter along the lines of

def scatter(..., sizearea=True)

where if sizearea is False we treat the size as a linear dimension.
This could also be an rc param so people could set the defaults to
behave in a more intuitive way w/o breaking the old code.
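A sketch of the proposed switch, assuming a hypothetical `sizearea` keyword and a hypothetical rc-style default named `scatter.sizearea` (neither exists in matplotlib). When `sizearea` is False, each size is read as a linear dimension and squared, following the earlier rule of thumb, so code relying on area semantics keeps working unless the default is changed.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical rc-style defaults; "scatter.sizearea" is not a real rcParam.
RC_DEFAULTS: Dict[str, bool] = {"scatter.sizearea": True}


def scatter_areas(sizes: Sequence[float],
                  sizearea: Optional[bool] = None) -> List[float]:
    """Marker areas (points**2) for a hypothetical scatter(..., sizearea=...).

    sizearea=True keeps the matlab-style meaning (s is already an area);
    sizearea=False reads s as a linear size and squares it; None falls
    back to the rc-style default.
    """
    if sizearea is None:
        sizearea = RC_DEFAULTS["scatter.sizearea"]
    if sizearea:
        return [float(s) for s in sizes]
    return [float(s) ** 2 for s in sizes]
```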

JDH

On Fri, Jun 13, 2008 at 7:40 AM, Manuel Metz <mmetz@...459...> wrote:

Manuel_Metz1 · June 13, 2008, 7:56pm

John Hunter wrote:

This is how I see it -- plot draws homogeneous markers. Knowing they
are homogeneous allows us to do optimizations in the backend, like
caching the rasterized marker and blitting it in many places. This can
radically speed up plots where there are many markers.

!!!

scatter, on the other hand, exists to draw heterogeneous markers, which vary either in
size or in color or both.

Well, I personally (sometimes) prefer scatter over plot, because scatter
has much greater flexibility to produce different markers (star
symbols, custom symbols). That could be overcome if both plot and
scatter used the same code base to create the markers.
That would also make clear that plot should in general be used for
homogeneous data and scatter for heterogeneous data.

This is why I have always been lukewarm about
adding auto-legend support -- which of the heterogeneous scatter
symbols gets the legend entry?

In cases where you only vary the color or only vary the size, one could
use the first polygon as a proxy for auto-legending. This wouldn't
work all the time, but it might be good enough for folks who want
auto-legends and others would still have the option of doing the
legend polygon proxy trick. One way to make this easier would be for
the polygon collections to provide a polygon_proxy method which
returns a unit area patches.Polygon instance with properties set to
correspond to the first polygon in the collection -- then the legend
implementer could use it w/o worrying about all the different types of
polygon collections and the odd sizing that may occur by simply taking
the first element.

Something like this might be _very_ useful. I think that would be the
most common case (vary only colours / only sizes), especially if one
produces figures for print media. As soon as you start to vary
colors/sizes/markers over-the-top, you probably don't want to have a
legend any more - it starts to get too crowded.

If you have, however, say a data-set with x ~ 3 kinds of different
data and only vary a few properties (like the sizes only) -- and there is
this legend option as you outlined above -- then there is a pretty easy
way to produce a meaningful legend: split the data into x arrays, and
call scatter x times.
I personally wouldn't worry too much about the size of a marker in
the legend - it should be approximately the size of the text. If I have
marked one data-set with squares and one with circles, most likely
everyone will understand the legend.
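A minimal sketch of the split-then-scatter approach in plain Python; `split_by_category` is a hypothetical helper that groups points by a category label, keeping categories in first-seen order, so each group can go to its own scatter call.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def split_by_category(
    xs: Sequence[float],
    ys: Sequence[float],
    categories: Sequence[Hashable],
) -> Dict[Hashable, Tuple[List[float], List[float]]]:
    """Group points by category so each group can get its own scatter call.

    Groups appear in the order their category is first seen.
    """
    if not (len(xs) == len(ys) == len(categories)):
        raise ValueError("xs, ys and categories must have the same length")
    groups: Dict[Hashable, Tuple[List[float], List[float]]] = {}
    for x, y, cat in zip(xs, ys, categories):
        gx, gy = groups.setdefault(cat, ([], []))
        gx.append(x)
        gy.append(y)
    return groups
```

Each group would then be drawn with one call, e.g. `ax.scatter(gx, gy, label=cat)` per category followed by `ax.legend()`; current matplotlib releases pick up labelled scatter collections in legend() directly.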

But yes, to the extent that we can centralize the marker symbol ->
path creation and reuse that between the line2d and poly collection
code, that would be a good thing. One could then scale and translate
the canonical polygons as needed for their respective uses.

On the size as area usage in scatter -- I have found this
counter-intuitive from day 1, and I wrote the damned thing, but as I
noted we inherited this from matlab and a lot of code is built around
it, so we are probably stuck with it. If you think it is sufficiently
irksome, you can add a kwarg to scatter along the lines of

def scatter(..., sizearea=True)

where if sizearea is False we treat the size as a linear dimension.
This could also be an rc param so people could set the defaults to
behave in a more intuitive way w/o breaking the old code.

JDH

Manuel

On Fri, Jun 13, 2008 at 7:40 AM, Manuel Metz <mmetz@...459...> wrote: