Better defaults all around?

Hi all,

Since we're considering the possibility of making a matplotlib 2.0
release with a better default colormap, it occurred to me that it
might make sense to take this opportunity to improve other visual
defaults.

Defaults are important. Obviously for publication graphs you'll want
to end up tweaking every detail, but (a) not everyone does, and we
still have to read their graphs, and (b) probably only 1% of the plots
I make are for publication; the rest are quick one-offs that I make
on-the-fly to help me understand my own data. For such plots it's
usually not worth spending much/any time tweaking layout details; I
just want something usable, quickly. And I think there's a fair number
of low-hanging improvements possible.

Batching multiple visual changes like this together seems much better
than spreading them out over multiple releases. It keeps the messaging
super easy to understand: "matplotlib 2.0 is just like 1.x, your code
will still work, the only difference is that your plots will look
better by default". And grouping these changes together makes it
easier to provide for users who need to revert to the old
defaults -- it's easy to provide a simple binary choice between "before
2.0" versus "after 2.0", harder to keep track of a bunch of different
changes spread over multiple releases.

Some particular annoyances I often run into and that might be
candidates for changing:

- The default method of choosing axis limits is IME really, really
annoying, because of the way it tries to find "round number"
boundaries. It's a clever idea, but in practice I've almost never seen
this pick axis limits that are particularly meaningful for my data,
and frequently it picks particularly bad ones. For example, suppose
you want to plot the spectrum of a signal; because of FFT's preference
for power-of-two sizes, it's natural to end up with samples
ranging from 0 to 255. If you plot this, matplotlib will give you an
xlim of (0, 300), which looks pretty ridiculous. But even worse is the
way this method of choosing xlims can actually obscure data -- if the
extreme values in your data set happen to fall exactly on a "round
number", then this will be used as the axis limits, and you'll end up
with data plotted directly underneath the axis spine. I frequently
encounter this when making scatter plots of data in the 0-1 range --
the points located at exactly 0 and 1 are very important to see, but
are nearly invisible by default. A similar case I ran into recently
was when plotting autocorrelation functions for different signals. For
reference I wanted to include the theoretically ideal ACF for white
noise, which looks like this:
    plt.plot(np.arange(1000), [1] + [0] * 999)
Good luck reading that plot!

R's default rule for deciding axis limits is very simple: extend the
data range by 4% on each side; those are your limits. IME this rule --
while obviously not perfect -- always produces something readable and
unobjectionable.
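
For concreteness, the R-style rule described above can be sketched in a few lines. This is only an illustration, not matplotlib's or R's actual code: the function name `padded_limits` and the fallback for constant data are assumptions made for this example.

```python
def padded_limits(values, frac=0.04):
    """Return (low, high) limits that extend the data range by `frac` on each side.

    Hypothetical helper for illustration; not part of matplotlib.
    """
    lo, hi = min(values), max(values)
    pad = (hi - lo) * frac
    if pad == 0:
        # Assumption: for constant data, fall back to a fixed pad so the
        # limits do not collapse to a single point.
        pad = frac
    return lo - pad, hi + pad


# Samples 0..255 from the FFT example: the limits hug the data
# instead of jumping to a "round" upper bound.
fft_limits = padded_limits(range(256))

# Data spanning exactly 0..1: the extreme points no longer sit on the spine.
unit_limits = padded_limits([0.0, 0.25, 1.0])
```

The resulting pair could be passed to something like `plt.xlim(*fft_limits)`. matplotlib's existing data margins feature (`Axes.margins`) may give a similar effect without a helper.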

- Axis tickmarks should point outwards rather than inwards: There's
really no advantage to making them point inwards, and pointing inwards
means they can obscure data. My favorite example of this is plotting a
histogram with 100 bins -- that's an obvious thing to do, right? Check
it out:
  plt.hist(np.random.RandomState(0).uniform(size=100000), bins=100)
This makes me do a double-take every few months until I remember
what's going on: "WTF why is the bar on the left showing a *stacked*
barplot...ohhhhh right those are just the ticks, which happen to be
exactly the same width as the bar." Very confusing.

Seaborn's built-in themes give you the options of (1) no axis ticks at
all, just a background grid (by default the white-on-light-grey grid
as popularized by ggplot2), or (2) outward-pointing tickmarks. Either
option seems like a better default to me!
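
If outward ticks were adopted, the change would presumably come down to a couple of rc settings. A minimal sketch, assuming the `xtick.direction` and `ytick.direction` rc parameters; the dict name `OUTWARD_TICKS` is made up for this example:

```python
# Hypothetical bundle of rc overrides; only the two keys are matplotlib names.
OUTWARD_TICKS = {
    "xtick.direction": "out",
    "ytick.direction": "out",
}

# To try it in a session with matplotlib installed:
#     import matplotlib
#     matplotlib.rcParams.update(OUTWARD_TICKS)
```

Keeping the overrides in a plain dict makes it easy to compare them against the current values before applying them.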

- Default line colors: The rgbcmyk color cycle for line plots doesn't
appear to be based on any real theory about visualization -- it's just
the corners of the RGB color cube, which is a highly perceptually
non-uniform space. The resulting lines aren't terribly high contrast
against the default white background, and the different colors have
varying luminance that makes some lines "pop out" more than others.

Seaborn's default is to use a nice isoluminant variant on matplotlib's default:
   http://web.stanford.edu/~mwaskom/software/seaborn/tutorial/aesthetics.html
ggplot2 uses isoluminant colors with maximally-separated hues, which
also works well. E.g.:
   http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/ggplot2_scale_hue_colors_l45.png
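
As a rough illustration of the luminance complaint above, the relative luminance of each cube corner can be computed with the Rec. 709 weights. The sketch takes the "corners of the RGB cube" description literally; the exact RGB values matplotlib assigns to some of these letters may differ from the pure corners, and the names `CORNERS` and `relative_luminance` are made up for the example.

```python
# Idealized RGB cube corners for the single-letter colors (an assumption;
# matplotlib's actual values for some letters may not be pure corners).
CORNERS = {
    "r": (1, 0, 0),
    "g": (0, 1, 0),
    "b": (0, 0, 1),
    "c": (0, 1, 1),
    "m": (1, 0, 1),
    "y": (1, 1, 0),
    "k": (0, 0, 0),
}


def relative_luminance(rgb):
    """Rec. 709 relative luminance of a linear RGB triple in [0, 1].

    For pure corners (components 0 or 1) the sRGB gamma curve leaves the
    values unchanged, so no gamma correction is needed here.
    """
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


luminance = {name: relative_luminance(rgb) for name, rgb in CORNERS.items()}

# Sort from darkest to lightest to see how uneven the cycle is.
by_luminance = sorted(luminance, key=luminance.get)
```

Under these assumptions the values span almost the whole range from 0 to 1, which is consistent with some lines standing out much more than others.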

- Line thickness: basically every time I make a line plot I wish the
lines were thicker. This is another thing that seaborn simply changes
unconditionally.

In general I guess we could do a lot worse than to simply adopt
seaborn's defaults as the matplotlib defaults :-) Their full list of
overrides can be seen here:
   https://github.com/mwaskom/seaborn/blob/master/seaborn/rcmod.py#L135
   https://github.com/mwaskom/seaborn/blob/master/seaborn/rcmod.py#L301

- Dash styles: a common recommendation for line plots is to
simultaneously vary both the color and the dash style of your lines,
because redundant cues are good and dash styles are more robust than
color in the face of greyscale printing etc. But every time I try to
follow this advice I find myself having to define new dashes from
scratch, because matplotlib's default dash styles ("-", "--", "-.",
":") have wildly varying weights; in particular I often find it hard
to even see the dots in the ":" and "-." styles. Here's someone with a
similar complaint:
     http://philbull.wordpress.com/2012/03/14/custom-dashdot-line-styles-in-matplotlib/

Just as very rough numbers, something along the lines of "--" = [7,
4], "-." = [7, 4, 3, 4], ":" = [2, 1.5] looks much better to me.

It might also make sense to consider baking the advice I mentioned
above into matplotlib directly, and having a non-trivial dash cycle
enabled by default. (So the first line plotted uses "-", second uses
"--" or similar, etc.) This would also have the advantage that if we
make the length of the color cycle and the dash cycle relatively
prime, then we'll dramatically increase the number of lines that can
be plotted on the same graph with distinct appearances. (I often run
into the annoying situation where I throw up a quick-and-dirty plot,
maybe with something like pandas's dataframe.plot(), and then discover
that I have multiple indistinguishable lines.)
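
The "relatively prime" point can be checked with a small sketch. It assumes the seven-color rgbcmyk cycle and the four dash styles mentioned above, with the rough dash patterns from this message; the names `COLORS`, `DASHES` and `distinct_styles` are invented for the example and are not matplotlib API.

```python
from itertools import cycle, islice
from math import gcd

COLORS = list("rgbcmyk")  # 7 colors
DASHES = {                # 4 styles; on/off lengths are the rough numbers above
    "-": None,            # solid line, no dash pattern
    "--": [7, 4],
    "-.": [7, 4, 3, 4],
    ":": [2, 1.5],
}


def distinct_styles(n_colors, n_dashes):
    """Number of lines before a (color, dash) pair repeats when both cycles
    advance together: the least common multiple of the two lengths."""
    return n_colors * n_dashes // gcd(n_colors, n_dashes)


n = distinct_styles(len(COLORS), len(DASHES))

# Advance both cycles in lockstep, as a per-line default would.
styles = list(islice(zip(cycle(COLORS), cycle(DASHES)), n))
```

With lengths 7 and 4 every one of the 28 pairs is distinct, whereas a 6-color cycle paired with 4 dashes would start repeating after 12 lines. Each pair could then be passed along as `plt.plot(x, y, color=c, linestyle=ls)`, with custom patterns supplied through the `dashes` line property.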

Obviously one could quibble with my specific proposals here, but does
this in general seem like a useful thing to do?

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

Some of your wishes are in progress already: https://github.com/matplotlib/matplotlib/pull/3818
There is also an issue open about scaling the dashes with the line width, and you are right, the spacing for the dashes is terrible.

I can definitely see the argument for making a bunch of these visual changes together. Preferably, I would like to do these changes via style sheets so that we can provide a “classic” stylesheet for backwards compatibility.

I do actually like the autoscaling system as it exists now. The problem is that the data margins feature is applied haphazardly. The power spectra example is a good example of where we could “smarten” the system. As for the ticks… I think that is a very obscure edge-case. I personally prefer inward.

It is good to get these grievances enumerated. I am interested in seeing where this discussion goes.

Cheers!
Ben Root

On Fri, Nov 21, 2014 at 6:22 PM, Nathaniel Smith <njs@…503…> wrote:

Hi all,

Since we’re considering the possibility of making a matplotlib 2.0
release with a better default colormap, it occurred to me that it
might make sense to take this opportunity to improve other visual
defaults.


Matplotlib-devel mailing list

Matplotlib-devel@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/matplotlib-devel

I like the idea of aligning a set of changes for 2.0 even if still far away.

Regarding backwards compatibility, I think it is indeed important, but when changing major version (1.x to 2.0) it becomes less important, and we must take care to prioritize evolution.
Take for example the OO interface (not defined yet): it is very likely to break the current pyplot interface, but it is still a change that needs to be made.

In terms of defaults, I would like to see the new Navigation as default (if it gets merged) and tabbed figures (to come after navigation); having separate figures feels kind of … “old”.


With regards to defaults for 2.0, I am actually all for breaking them for the better. What I find important is giving users an easy mechanism to use an older style, if it is important to them. The current behavior isn’t “buggy” (for the most part), and failing to give users a way to get behavior that they found desirable would be alienating. I think this is why projects like prettyplotlib and seaborn have been so important to matplotlib. They enable those who are in the right position to judge styles to explore the possibilities easily without committing matplotlib to any early decision, which allows it to keep a level of stability that many users find attractive.

The current plans for the OO interface changes should not result in any (major) API breaks, so I am not concerned about that. Let’s keep focused on style-related issues in this thread.

Tabbed figures? Intriguing… And I really do need to review that MEP of yours…

Cheers!
Ben Root

···

On Fri, Nov 21, 2014 at 9:36 PM, Federico Ariza <ariza.federico@…149…> wrote:

I like the idea of aligning a set of changes for 2.0 even if still far away.

Regarding to backwards compatibility I think that indeed it is important but when changing mayor version (1.x to 2.0) becomes less important and we must take care of prioritizing evolution.
Take for example the OO interface (not defined yet) this is very probable to break the current pyplot interface but still this is a change that needs to be done.

In terms of defaults. I would like to see the new Navigation as default (if it gets merged) and tabbed figures (to come after navigation), having separate figures feel kind of …“old”

On 21 Nov 2014 21:23, “Benjamin Root” <ben.root@…867…> wrote:

Some of your wishes are in progress already: https://github.com/matplotlib/matplotlib/pull/3818
There is also an issue open about scaling the dashes with the line width, and you are right, the spacing for the dashes are terrible.

I can definitely see the argument to making a bunch of these visual changes together. Preferably, I would like to do these changes via style sheets so that we can provide a “classic” stylesheet for backwards compatibility.

I do actually like the autoscaling system as it exists now. The problem is that the data margins feature is applied haphazardly. The power spectra example is a good example of where we could “smarten” the system. As for the ticks… I think that is a very obscure edge-case. I personally prefer inward.

It is good to get these grievances enumerated. I am interested in seeing where this discussion goes.

Cheers!
Ben Root


Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server

from Actuate! Instantly Supercharge Your Business Reports and Dashboards

with Interactivity, Sharing, Native Excel Exports, App Integration & more

Get technology previously reserved for billion-dollar corporations, FREE

http://pubads.g.doubleclick.net/gampad/clk?id=157005751&iu=/4140/ostg.clktrk


Matplotlib-devel mailing list

Matplotlib-devel@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/matplotlib-devel

On Fri, Nov 21, 2014 at 6:22 PM, Nathaniel Smith <njs@…503…> wrote:

Hi all,

Since we’re considering the possibility of making a matplotlib 2.0

release with a better default colormap, it occurred to me that it

might make sense to take this opportunity to improve other visual

defaults.

Defaults are important. Obviously for publication graphs you’ll want

to end up tweaking every detail, but (a) not everyone does but we

still have to read their graphs, and (b) probably only 1% of the plots

I make are for publication; the rest are quick one-offs that I make

on-the-fly to help me understand my own data. For such plots it’s

usually not worth spending much/any time tweaking layout details, I

just want something usable, quickly. And I think there’s a fair amount

of low-hanging improvements possible.

Batching multiple visual changes like this together seems much better

than spreading them out over multiple releases. It keeps the messaging

super easy to understand: "matplotlib 2.0 is just like 1.x, your code

will still work, the only difference is that your plots will look

better by default". And grouping these changes together makes it

easier to provide for users who need to revert back to the old

defaults – it’s easy to provide simple binary choice between "before

2.0" versus “after 2.0”, harder to keep track of a bunch of different

changes spread over multiple releases.

Some particular annoyances I often run into and that might be

candidates for changing:

  • The default method of choosing axis limits is IME really, really

annoying, because of the way it tries to find “round number”

boundaries. It’s a clever idea, but in practice I’ve almost never seen

this pick axis limits that are particularly meaningful for my data,

and frequently it picks particularly bad ones. For example, suppose

you want to plot the spectrum of a signal; because of FFT’s preference

for power-of-two sizes works it’s natural to end up with samples

ranging from 0 to 255. If you plot this, matplotlib will give you an

xlim of (0, 300), which looks pretty ridiculous. But even worse is the

way this method of choosing xlims can actually obscure data – if the

extreme values in your data set happen to fall exactly on a "round

number", then this will be used as the axis limits, and you’ll end up

with data plotted directly underneath the axis spine. I frequently

encounter this when making scatter plots of data in the 0-1 range –

the points located at exactly 0 and 1 are very important to see, but

are nearly invisible by default. A similar case I ran into recently

was when plotting autocorrelation functions for different signals. For

reference I wanted to include the theoretically ideal ACF for white

noise, which looks like this:

plt.plot(np.arange(1000), [1] + [0] * 999)

Good luck reading that plot!

R’s default rule for deciding axis limits is very simple: extend the

data range by 4% on each side; those are your limits. IME this rule –

while obviously not perfect – always produces something readable and

unobjectionable.

  • Axis tickmarks should point outwards rather than inwards: There’s

really no advantage to making them point inwards, and pointing inwards

means they can obscure data. My favorite example of this is plotting a

histogram with 100 bins – that’s an obvious thing to do, right? Check

it out:

plt.hist(np.random.RandomState(0).uniform(size=100000), bins=100)

This makes me do a double-take every few months until I remember

what’s going on: "WTF why is the bar on the left showing a stacked

barplot…ohhhhh right those are just the ticks, which happen to be

exactly the same width as the bar." Very confusing.

Seaborn’s built-in themes give you the options of (1) no axis ticks at

all, just a background grid (by default the white-on-light-grey grid

as popularized by ggplot2), (2) outwards pointing tickmarks. Either

option seems like a better default to me!

  • Default line colors: The rgbcmyk color cycle for line plots doesn’t

appear to be based on any real theory about visualization – it’s just

the corners of the RGB color cube, which is a highly perceptually

non-uniform space. The resulting lines aren’t terribly high contrast

against the default white background, and the different colors have

varying luminance that makes some lines “pop out” more than others.

Seaborn’s default is to use a nice isoluminant variant on matplotlib’s default:

http://web.stanford.edu/~mwaskom/software/seaborn/tutorial/aesthetics.html

ggplot2 uses isoluminant colors with maximally-separated hues, which

also works well. E.g.:

http://www.cookbook-r.com/Graphs/Colors_%28ggplot2%29/ggplot2_scale_hue_colors_l45.png

  • Line thickness: basically every time I make a line plot I wish the

lines were thicker. This is another thing that seaborn simply changes

unconditionally.

In general I guess we could do a lot worse than to simply adopt

seaborn’s defaults as the matplotlib defaults :slight_smile: Their full list of

overrides can be seen here:

https://github.com/mwaskom/seaborn/blob/master/seaborn/rcmod.py#L135

https://github.com/mwaskom/seaborn/blob/master/seaborn/rcmod.py#L301

  • Dash styles: a common recommendation for line plots is to

simultaneously vary both the color and the dash style of your lines,

because redundant cues are good and dash styles are more robust than

color in the face of greyscale printing etc. But every time I try to

follow this advice I find myself having to define new dashes from

scratch, because matplotlib’s default dash styles (“-”, “–”, “-.”,

“:”) have wildly varying weights; in particular I often find it hard

to even see the dots in the “:” and “-.” styles. Here’s someone with a

similar complaint:

http://philbull.wordpress.com/2012/03/14/custom-dashdot-line-styles-in-matplotlib/

Just as very rough numbers, something along the lines of "--" = [7,

4], "-." = [7, 4, 3, 4], ":" = [2, 1.5] looks much better to me.
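
A minimal sketch for trying out those rough numbers, assuming the per-line `dashes` keyword of `plot()` is used to override the built-in patterns. The output file name, the sine curves and the black line color are only illustrative choices, and the exact rendered lengths may also depend on line width in some matplotlib versions.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# On/off lengths taken from the rough numbers in the message.
patterns = {"--": [7, 4], "-.": [7, 4, 3, 4], ":": [2, 1.5]}

x = np.linspace(0, 10, 200)
fig, ax = plt.subplots()
for offset, (name, seq) in enumerate(patterns.items()):
    # `dashes` replaces the built-in pattern for this one line only.
    ax.plot(x, np.sin(x) + offset, color="k", dashes=seq, label=name)
ax.legend()
fig.savefig("dash_patterns.png")  # illustrative file name
```

Plotting everything in one color makes it easier to judge whether the three patterns now carry similar visual weight.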

It might also make sense to consider baking the advice I mentioned

above into matplotlib directly, and having a non-trivial dash cycle

enabled by default. (So the first line plotted uses "-", second uses

"--" or similar, etc.) This would also have the advantage that if we

make the length of the color cycle and the dash cycle relatively

prime, then we’ll dramatically increase the number of lines that can

be plotted on the same graph with distinct appearances. (I often run

into the annoying situation where I throw up a quick-and-dirty plot,

maybe with something like pandas’s dataframe.plot(), and then discover

that I have multiple indistinguishable lines.)
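
The relatively-prime argument can be checked with a few lines of plain Python. This is only a sketch: `distinct_styles` is a hypothetical helper, and it assumes both cycles advance in lockstep, one step per plotted line, with no repeated entries inside either cycle.

```python
from itertools import cycle, islice
from math import gcd


def distinct_styles(colors, dashes):
    """Count distinct (color, dash) pairs when both cycles advance together."""
    # The combined sequence repeats after lcm(len(colors), len(dashes)) lines.
    period = len(colors) * len(dashes) // gcd(len(colors), len(dashes))
    pairs = islice(zip(cycle(colors), cycle(dashes)), period)
    return len(set(pairs))


colors = list("rgbcmyk")         # the seven-color cycle discussed above
dashes = ["-", "--", "-.", ":"]  # the four built-in dash styles

print(distinct_styles(colors, dashes))      # 28: 7 and 4 are relatively prime
print(distinct_styles(colors[:6], dashes))  # 12: 6 and 4 share a factor of 2
```

With lengths 7 and 4 every one of the 7 * 4 combinations gets used before anything repeats, whereas lengths 6 and 4 reach only half of the 24 possible combinations.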

Obviously one could quibble with my specific proposals here, but does

this in general seem like a useful thing to do?

-n

Nathaniel J. Smith

Postdoctoral researcher - Informatics - University of Edinburgh

http://vorpus.org



Matplotlib-devel mailing list

Matplotlib-devel@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/matplotlib-devel

I would also be quite interested in having better defaults. My list of "complaints" is:

* Easy way to get only two lines for axis (left and down, instead of four)
* Better default font (Source Sans Pro / Source Code Pro for example (open source))
* Better default colormap
* Better axis limit (when you draw with thick lines, they get cut)
* Better icons for the toolbar (there are a lot of free icons around)
* Better colors (more pastel)
* Less "cluttered" figures
* Lighter grids

+ All Nathaniel's suggestions
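
For reference, the first item on the list (only the left and bottom axis lines) currently takes several calls. A minimal sketch, assuming the standard `spines` mapping on an Axes; the plotted data is just a placeholder:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(0, 1, 50)
ax.plot(x, x ** 2)  # placeholder data

# Hide the top and right spines, leaving only the left and bottom axis lines.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

# Keep tick marks only on the two remaining sides.
ax.xaxis.set_ticks_position("bottom")
ax.yaxis.set_ticks_position("left")
fig.canvas.draw()
```

That is four calls per axes, which is the kind of boilerplate an "easy way" would replace.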

Ideally, we could have a set of standard figures for each main type (plot, scatter, quiver) and tweak parameters to search for the best output.

Nicolas

On 22 Nov 2014, at 04:18, Benjamin Root <ben.root@...553...> wrote:

With regard to defaults for 2.0, I am actually all for breaking them for the better. What I find important is giving users an easy mechanism to use an older style, if it is important to them. The current behavior isn't "buggy" (for the most part), and failing to give users a way to get behavior that they found desirable would be alienating. I think this is why projects like prettyplotlib and seaborn have been so important to matplotlib. They enable those who are in the right position to judge styles to explore the possibilities easily without committing matplotlib to any early decision, which allows it to keep a level of stability that many users find attractive.

At the moment, the plans for the OO interface changes should not result in any (major) API breaks, so I am not concerned about that at the moment. Let's keep focused on style related issues in this thread.

Tabbed figures? Intriguing... And I really do need to review that MEP of yours...

Cheers!
Ben Root

On Fri, Nov 21, 2014 at 9:36 PM, Federico Ariza <ariza.federico@...149...> wrote:
I like the idea of aligning a set of changes for 2.0 even if still far away.

Regarding backwards compatibility, I think it is indeed important, but when changing major version (1.x to 2.0) it becomes less important and we must take care to prioritize evolution.
Take for example the OO interface (not defined yet): it is very likely to break the current pyplot interface, but it is still a change that needs to be made.

In terms of defaults, I would like to see the new Navigation as the default (if it gets merged), and tabbed figures (to come after navigation); having separate figures feels kind of ... "old".

On 21 Nov 2014 21:23, "Benjamin Root" <ben.root@...553...> wrote:
Some of your wishes are in progress already: https://github.com/matplotlib/matplotlib/pull/3818 ([ENH] Initial support for linestyle cycling on plot())
There is also an issue open about scaling the dashes with the line width, and you are right, the spacing for the dashes is terrible.

I can definitely see the argument to making a bunch of these visual changes together. Preferably, I would like to do these changes via style sheets so that we can provide a "classic" stylesheet for backwards compatibility.

I do actually like the autoscaling system as it exists now. The problem is that the data margins feature is applied haphazardly. The power spectra example is a good example of where we could "smarten" the system. As for the ticks... I think that is a very obscure edge-case. I personally prefer inward.

It is good to get these grievances enumerated. I am interested in seeing where this discussion goes.

Cheers!
Ben Root

I think using native icons would be the best scenario, at least when the backend and platform support it.

A few thoughts to add to the excellent ones to date, to do with colorbar behaviour.
My general comment would be that if the axis tick formatter defaults are changed, we should not forget about the colorbar, as I typically find it needs more tweaking than the main axes.

I’ll make a couple of suggestions, but these are low on the list compared to the suggestions that others have made.

  1. consider rasterizing colorbar contents by default

  2. make colorbar axis sizing for matshow behave like imshow

1. Consider rasterizing colorbar contents by default.

Eric describes this here http://matplotlib.1069221.n5.nabble.com/rasterized-colorbar-td39582.html and suggests that rasterizing the colorbar may not be desirable, although I’m not totally sure why. Perhaps it is because I have noticed that mixing rasterized content with vector lines/axes in matplotlib is generally imperfect. If saving the figure as a pdf or svg with dpi left at default, you can usually see offsets and scaling problems. For example after rasterizing a colorbar I usually see white pixels along the top and side within the vector colorbar frame. This also shows up when using imshow or matshow to show images. I don’t know if this is an agg limitation, a backend limitation or a bug. If it’s a known limitation, maybe avoid this suggestion, but if it’s a bug, maybe it can be fixed and then rasterizing the colorbar might become a better default option.

For colorbars I usually do lots of tweaking along the lines of:

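    import matplotlib.pyplot as plt
    from matplotlib.ticker import ScalarFormatter

    # assumes an image (e.g. from imshow) has already been drawn
    cb = plt.colorbar(format=ScalarFormatter(useMathText=True))
    cb.formatter.set_useOffset(False)
    cb.formatter.set_scientific(True)
    cb.formatter.set_powerlimits((0, 2))
    cb.update_ticks()
    cb.solids.set_rasterized(True)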

although I’m not sure about advocating useMathText and set_scientific for defaults. I wonder what others think about this?

Things like default powerlimits for the colorbar might be rethought. I think colorbars typically have too many ticks and associated labels and they should perhaps favour integer labels over floating point representation if possible.
In the extreme case, for continuous colormaps, often a tick at just the top and bottom of the range would be adequate.
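
The two-tick extreme can be sketched as follows, assuming the color limits are pinned explicitly with `vmin`/`vmax` so the end ticks land exactly on the ends of the bar. The random data and the 0 to 1 range are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

data = np.random.RandomState(0).uniform(size=(20, 20))  # placeholder data

fig, ax = plt.subplots()
im = ax.imshow(data, interpolation="nearest", vmin=0.0, vmax=1.0)
cb = fig.colorbar(im, ax=ax)

# One tick at each end of the color range and nothing in between.
cb.set_ticks([0.0, 1.0])
fig.canvas.draw()
```

Whether two ticks are enough clearly depends on the colormap; for a diverging map a third tick at the midpoint is probably worth keeping.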

2. I’m not sure how much pyplot.matshow is generally used, but I still use it.

Could the colorbar height for matshow pick up the axis height of the main figure, or maybe imshow could default to interpolation='nearest' so I wouldn’t be tempted to use matshow any more?

For example,

    import matplotlib.pyplot as plt
    from numpy.random import rand

    plt.matshow(rand(20, 20))
    plt.colorbar()

doesn’t behave nicely like

    plt.imshow(rand(20, 20), interpolation='nearest')
    plt.colorbar()

Gary

···

On 22 November 2014 at 19:06, Nicolas P. Rougier <Nicolas.Rougier@…922…> wrote:

I would be also quite interested in having better defaults. My list of “complains” are:

  • Easy way to get only two lines for axis (left and down, instead of four)

  • Better default font (Source Sans Pro / Source Code Pro for example (open source))

  • Better default colormap

  • Better axis limit (when you draw with thick lines, they get cut)

  • Better icons for the toolbar (there are a lot of free icons around)

  • Better colors (more pastel)

  • Less “cluttered” figures

  • Lighter grids

  • All Nathaniel’s suggestions

Ideally, we could have a set of standard figures for each main type (plot, scatter, quiver) and tweak parameters to search for the best output.

Nicolas

On 22 Nov 2014, at 04:18, Benjamin Root <ben.root@…553…> wrote:

With regards to defaults for 2.0, I am actually all for breaking them for the better. What I find important is giving users an easy mechanism to use an older style, if it is important to them. The current behavior isn’t “buggy” (for the most part) and failing to give users a way to get behavior that they found desirable would be alienating. I think this is why projects like prettyplotlib and seaborn have been so important to matplotlib. It enables those who are in the right position to judge styles to explore the possibilities easily without commiting matplotlib to any early decision and allowing it to have a level of stability that many users find attractive.

At the moment, the plans for the OO interface changes should not result in any (major) API breaks, so I am not concerned about that at the moment. Let’s keep focused on style related issues in this thread.

Tabbed figures? Intriguing… And I really do need to review that MEP of yours…

Cheers!

Ben Root

On Fri, Nov 21, 2014 at 9:36 PM, Federico Ariza <ariza.federico@…149…> wrote:

I like the idea of aligning a set of changes for 2.0 even if still far away.

Regarding to backwards compatibility I think that indeed it is important but when changing mayor version (1.x to 2.0) becomes less important and we must take care of prioritizing evolution.

Take for example the OO interface (not defined yet) this is very probable to break the current pyplot interface but still this is a change that needs to be done.

In terms of defaults. I would like to see the new Navigation as default (if it gets merged) and tabbed figures (to come after navigation), having separate figures feel kind of …“old”

On 21 Nov 2014 21:23, “Benjamin Root” <ben.root@…553…> wrote:

Some of your wishes are in progress already: https://github.com/matplotlib/matplotlib/pull/3818

There is also an issue open about scaling the dashes with the line width, and you are right, the spacing for the dashes are terrible.

I can definitely see the argument to making a bunch of these visual changes together. Preferably, I would like to do these changes via style sheets so that we can provide a “classic” stylesheet for backwards compatibility.

I do actually like the autoscaling system as it exists now. The problem is that the data margins feature is applied haphazardly. The power spectra example is a good example of where we could “smarten” the system. As for the ticks… I think that is a very obscure edge-case. I personally prefer inward.

It is good to get these grievances enumerated. I am interested in seeing where this discussion goes.

Cheers!

Ben Root

On Fri, Nov 21, 2014 at 6:22 PM, Nathaniel Smith <njs@…503…> wrote:

Hi all,

Since we're considering the possibility of making a matplotlib 2.0
release with a better default colormap, it occurred to me that it
might make sense to take this opportunity to improve other visual
defaults.

Defaults are important. Obviously for publication graphs you'll want
to end up tweaking every detail, but (a) not everyone does but we
still have to read their graphs, and (b) probably only 1% of the plots
I make are for publication; the rest are quick one-offs that I make
on-the-fly to help me understand my own data. For such plots it's
usually not worth spending much/any time tweaking layout details, I
just want something usable, quickly. And I think there's a fair amount
of low-hanging improvements possible.

Batching multiple visual changes like this together seems much better
than spreading them out over multiple releases. It keeps the messaging
super easy to understand: "matplotlib 2.0 is just like 1.x, your code
will still work, the only difference is that your plots will look
better by default". And grouping these changes together makes it
easier to provide for users who need to revert back to the old
defaults -- it's easy to provide simple binary choice between "before
2.0" versus "after 2.0", harder to keep track of a bunch of different
changes spread over multiple releases.

Some particular annoyances I often run into and that might be
candidates for changing:

- The default method of choosing axis limits is IME really, really
annoying, because of the way it tries to find "round number"
boundaries. It's a clever idea, but in practice I've almost never seen
this pick axis limits that are particularly meaningful for my data,
and frequently it picks particularly bad ones. For example, suppose
you want to plot the spectrum of a signal; because of FFT's preference
for power-of-two sizes it's natural to end up with samples
ranging from 0 to 255. If you plot this, matplotlib will give you an
xlim of (0, 300), which looks pretty ridiculous. But even worse is the
way this method of choosing xlims can actually obscure data -- if the
extreme values in your data set happen to fall exactly on a "round
number", then this will be used as the axis limits, and you'll end up
with data plotted directly underneath the axis spine. I frequently
encounter this when making scatter plots of data in the 0-1 range --
the points located at exactly 0 and 1 are very important to see, but
are nearly invisible by default. A similar case I ran into recently
was when plotting autocorrelation functions for different signals. For
reference I wanted to include the theoretically ideal ACF for white
noise, which looks like this:

plt.plot(np.arange(1000), [1] + [0] * 999)

Good luck reading that plot!

R's default rule for deciding axis limits is very simple: extend the
data range by 4% on each side; those are your limits. IME this rule --
while obviously not perfect -- always produces something readable and
unobjectionable.
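The 4% rule described above can be sketched in a few lines of plain Python. This is only an illustration of the rule as stated in the message, not R's or matplotlib's code; `padded_limits` is a hypothetical helper, and the handling of a zero-width data range is an assumption added so the function always returns a usable window.

```python
def padded_limits(values, frac=0.04):
    """Return (lo, hi) axis limits extending the data range by `frac` per side."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Degenerate case (all values equal): pad as if the span were 1.
        # This fallback is an assumption, not part of the rule as described.
        span = 1.0
    pad = frac * span
    return lo - pad, hi + pad


# Sample indices 0..255, as in the FFT example.
print(padded_limits(range(256)))
# Data confined to the 0-1 range: the extreme points stay clear of the spines.
print(padded_limits([0.0, 0.25, 1.0]))
```

For the 0 to 255 case this gives limits of roughly (-10.2, 265.2), compared with the (0, 300) mentioned in the message.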

- Axis tickmarks should point outwards rather than inwards: There's
really no advantage to making them point inwards, and pointing inwards
means they can obscure data. My favorite example of this is plotting a
histogram with 100 bins -- that's an obvious thing to do, right? Check
it out:

plt.hist(np.random.RandomState(0).uniform(size=100000), bins=100)

This makes me do a double-take every few months until I remember
what's going on: "WTF why is the bar on the left showing a stacked
barplot…ohhhhh right those are just the ticks, which happen to be
exactly the same width as the bar." Very confusing.

Seaborn's built-in themes give you the options of (1) no axis ticks at
all, just a background grid (by default the white-on-light-grey grid
as popularized by ggplot2), (2) outwards pointing tickmarks. Either
option seems like a better default to me!
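Until any default changes, the tick direction can be flipped per plot. The sketch below reuses the histogram from the message and calls `Axes.tick_params` with `direction="out"`; the Agg backend is chosen only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
counts, edges, bars = ax.hist(
    np.random.RandomState(0).uniform(size=100000), bins=100)

# Point the ticks away from the data on both axes, for this Axes only.
ax.tick_params(axis="both", direction="out")
```

This changes one Axes at a time; changing it for every plot in a session would go through the `xtick.direction` and `ytick.direction` rcParams instead.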

- Default line colors: The rgbcmyk color cycle for line plots doesn't
appear to be based on any real theory about visualization -- it's just
the corners of the RGB color cube, which is a highly perceptually
non-uniform space. The resulting lines aren't terribly high contrast
against the default white background, and the different colors have
varying luminance that makes some lines "pop out" more than others.

Seaborn's default is to use a nice isoluminant variant on matplotlib's
default:

http://web.stanford.edu/~mwaskom/software/seaborn/tutorial/aesthetics.html

ggplot2 uses isoluminant colors with maximally-separated hues, which
also works well. E.g.:

http://www.cookbook-r.com/Graphs/Colors_%28ggplot2%29/ggplot2_scale_hue_colors_l45.png

- Line thickness: basically every time I make a line plot I wish the
lines were thicker. This is another thing that seaborn simply changes
unconditionally.

In general I guess we could do a lot worse than to simply adopt
seaborn's defaults as the matplotlib defaults :-) Their full list of
overrides can be seen here:

https://github.com/mwaskom/seaborn/blob/master/seaborn/rcmod.py#L135
https://github.com/mwaskom/seaborn/blob/master/seaborn/rcmod.py#L301

- Dash styles: a common recommendation for line plots is to
simultaneously vary both the color and the dash style of your lines,
because redundant cues are good and dash styles are more robust than
color in the face of greyscale printing etc. But every time I try to
follow this advice I find myself having to define new dashes from
scratch, because matplotlib's default dash styles ("-", "--", "-.",
":") have wildly varying weights; in particular I often find it hard
to even see the dots in the ":" and "-." styles. Here's someone with a
similar complaint:

http://philbull.wordpress.com/2012/03/14/custom-dashdot-line-styles-in-matplotlib/

Just as very rough numbers, something along the lines of "--" = [7,
4], "-." = [7, 4, 3, 4], ":" = [2, 1.5] looks much better to me.

It might also make sense to consider baking the advice I mentioned
above into matplotlib directly, and having a non-trivial dash cycle
enabled by default. (So the first line plotted uses "-", second uses
"--" or similar, etc.) This would also have the advantage that if we
make the length of the color cycle and the dash cycle relatively
prime, then we'll dramatically increase the number of lines that can
be plotted on the same graph with distinct appearances. (I often run
into the annoying situation where I throw up a quick-and-dirty plot,
maybe with something like pandas's dataframe.plot(), and then discover
that I have multiple indistinguishable lines.)
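The relatively-prime argument can be checked with a short sketch. It assumes both cycles advance in lockstep, one step per plotted line, which is how the proposal reads but is not something matplotlib is confirmed to do; `distinct_styles` is a hypothetical helper and the eighth color is added only for contrast.

```python
from itertools import cycle, islice
from math import gcd


def distinct_styles(colors, dashes):
    """Count distinct (color, dash) pairs when both cycles advance in lockstep."""
    # The combined sequence repeats after lcm(len(colors), len(dashes)) lines.
    period = len(colors) * len(dashes) // gcd(len(colors), len(dashes))
    pairs = islice(zip(cycle(colors), cycle(dashes)), period)
    return len(set(pairs))


colors7 = list("rgbcmyk")          # the 7-entry color cycle named in the post
dashes4 = ["-", "--", "-.", ":"]   # 4 dash styles; gcd(7, 4) == 1
colors8 = colors7 + ["orange"]     # hypothetical 8-entry cycle; gcd(8, 4) == 4

print(distinct_styles(colors7, dashes4))  # 28
print(distinct_styles(colors8, dashes4))  # 8
```

With coprime lengths every color meets every dash style before the sequence repeats; with lengths 8 and 4 each color is always paired with the same dash, so only 8 appearances are available.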

Obviously one could quibble with my specific proposals here, but does
this in general seem like a useful thing to do?

-n

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Matplotlib-devel mailing list

Matplotlib-devel@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/matplotlib-devel



A few thoughts to add to the excellent ones to date, to do with colorbar
behaviour.
My general comment would be that if the axis tick formatter defaults are
changed, the colorbar should not be forgotten, as I typically find it needs
more tweaking than the main axes.
I'll make a couple of suggestions, but these are low on the list
compared to the suggestions that others have made.

1. consider rasterizing colorbar contents by default
2. make colorbar axis sizing for matshow behave like imshow

1. consider rasterizing colorbar contents by default
Eric describes this here
http://matplotlib.1069221.n5.nabble.com/rasterized-colorbar-td39582.html
and suggests that rasterizing the colorbar may not be desirable,
although I'm not totally sure why. Perhaps it is because I have noticed
that mixing rasterized content with vector lines/axes in matplotlib is
generally imperfect. If saving the figure as a pdf or svg with dpi left
at default, you can usually see offsets and scaling problems. For
example after rasterizing a colorbar I usually see white pixels along
the top and side within the vector colorbar frame. This also shows up
when using imshow or matshow to show images. I don't know if this is an
agg limitation, a backend limitation or a bug. If it's a known
limitation, maybe avoid this suggestion, but if it's a bug, maybe it can
be fixed and then rasterizing the colorbar might become a better default
option.

I think the problem is that the outlines are snapped to pixel boundaries, but the color blocks are not. Something like that. I think a similar problem is manifest in the small offsets often seen between colorbar ticks and colorbar boundaries.

For colorbars I usually do lots of tweaking along the lines of:

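import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter

# assumes an image or other mappable has already been drawn
cb = plt.colorbar(format=ScalarFormatter(useMathText=True))
cb.formatter.set_useOffset(False)
cb.formatter.set_scientific(True)
cb.formatter.set_powerlimits((0,2))
cb.update_ticks()
cb.solids.set_rasterized(True)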

although I'm not sure about advocating useMathText and set_scientific
for defaults. I wonder what others think about this?

I don't see why you would want the *default* to be to override the rcParams setting for use_mathtext. This just makes it harder to document, and harder for people to keep track of what determines what.

To some extent this applies to the rest of your customizations as well. Deviations from the rcParams defaults via special cases, hardwired into mpl, should be avoided as much as possible. A richer configuration system, building on rcParams or some modification of it, will probably be the goal instead. The evolving style module is a step in this direction.

Things like default powerlimits for the colorbar might be rethought. I
think colorbars typically have too many ticks and associated labels and
they should perhaps favour integer labels over floating point
representation if possible.
In the extreme case, for continuous colormaps, often a tick at just the
top and bottom of the range would be adequate.

I agree, but the question is how to make it as easy as possible for each user to get their desired result. I don't think this is the time to do much in the way of tweaking hard-wired defaults.
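For the "tick at just the top and bottom" case, a per-plot version is already possible with the `ticks` argument of `colorbar`. The sketch below only shows that user-side tweak; the random data and variable names are placeholders, and it says nothing about what the default should be.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

data = np.random.RandomState(0).rand(20, 20)  # placeholder data

fig, ax = plt.subplots()
im = ax.imshow(data, interpolation="nearest")

# Label only the two ends of the colour range.
lo, hi = float(data.min()), float(data.max())
cbar = fig.colorbar(im, ticks=[lo, hi])
```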

2. I'm not sure how much pyplot.matshow is generally used but I still
use it.
Could the colorbar height for matshow pick up the axis height of the
main figure, or maybe imshow could default to interpolation='nearest' so
I wouldn't be tempted to use matshow any more?

For example,

import matplotlib.pyplot as plt
from numpy.random import rand

plt.matshow(rand(20,20))
plt.colorbar()

doesn't behave nicely like

plt.imshow(rand(20,20), interpolation='nearest')
plt.colorbar()

The difference is that matshow is adjusting the figure size based on the array dimensions without taking into account the later addition of a colorbar. The only way to fix this in our present framework would be to use a kwarg to tell matshow to include a colorbar from the start, so it would be able to calculate the figure size appropriately.

With imshow plus a colorbar, the "nice" behavior occurs only for a particular small range of array dimension ratios, such as the unity ratio in your example. For example, try using rand(5, 10).
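One user-side workaround for the mismatched heights, which is a different technique from the kwarg idea above, is to carve the colorbar axes out of the image axes with `make_axes_locatable` from `mpl_toolkits.axes_grid1`. The sketch uses the rand(5, 10) shape suggested above; the `size` and `pad` values are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.axes_grid1 import make_axes_locatable

fig, ax = plt.subplots()
im = ax.imshow(np.random.rand(5, 10), interpolation="nearest")

# Split a strip off the right of the image axes; the strip follows the
# image axes when it is resized, so the colorbar keeps the same height.
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="5%", pad=0.1)
cbar = fig.colorbar(im, cax=cax)
```

This does not change the figure size the way matshow does; it only keeps the colorbar aligned with whatever size the image axes end up with.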

Eric

···

On 2014/11/22, 9:06 AM, gary ruben wrote:

Gary

I’d like to propose an update to the default boxplot symbology: all black

Q: How much more black could the boxplots be?

(sorry, ben)

···

On Fri, Nov 21, 2014 at 7:18 PM, Benjamin Root <ben.root@…553…> wrote:

With regards to defaults for 2.0, I am actually all for breaking them for the better. What I find important is giving users an easy mechanism to use an older style, if it is important to them. The current behavior isn’t “buggy” (for the most part), and failing to give users a way to get behavior that they found desirable would be alienating. I think this is why projects like prettyplotlib and seaborn have been so important to matplotlib. They enable those who are in the right position to judge styles to explore the possibilities easily without committing matplotlib to any early decision, which allows matplotlib to keep a level of stability that many users find attractive.

At the moment, the plans for the OO interface changes should not result in any (major) API breaks, so I am not concerned about that. Let’s keep focused on style-related issues in this thread.

Tabbed figures? Intriguing… And I really do need to review that MEP of yours…

Cheers!
Ben Root


Some of your wishes are in progress already: https://github.com/matplotlib/matplotlib/pull/3818
There is also an issue open about scaling the dashes with the line width, and you are right, the spacing for the dashes is terrible.

Nice!

I can definitely see the argument to making a bunch of these visual changes together. Preferably, I would like to do these changes via style sheets so that we can provide a “classic” stylesheet for backwards compatibility.

Yeah, I didn’t want to get into the details of mechanism here because that’s a simple technical question compared to the questions about whether we should make changes and which changes we should make. But I’m definitely assuming we’ll provide a simple supported/documented way to request the old defaults, and I agree that the obvious way is by swapping out stylesheets. This might require adding a few more parameters to rcParams, but I’m guessing that won’t be a big deal.
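To make the "swap a bundle of defaults" idea concrete, here is a minimal sketch using `matplotlib.rc_context`, which temporarily applies a dict of rcParams. The contents of `OLD_DEFAULTS` are hypothetical values picked for illustration, not an agreed list of pre-2.0 settings; a style sheet would be the same kind of key/value bundle stored under a name.

```python
import matplotlib

# Hypothetical "pre-2.0" values, chosen only for illustration.
OLD_DEFAULTS = {
    "lines.linewidth": 1.0,
    "xtick.direction": "in",
    "ytick.direction": "in",
}

before = matplotlib.rcParams["lines.linewidth"]
with matplotlib.rc_context(rc=OLD_DEFAULTS):
    # Anything plotted inside this block would see the old values.
    inside = matplotlib.rcParams["lines.linewidth"]
after = matplotlib.rcParams["lines.linewidth"]

print(before, inside, after)
```

The point of the sketch is that the old look becomes a single switch, provided every changed default is reachable through rcParams.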

I do actually like the autoscaling system as it exists now. The problem is that the data margins feature is applied haphazardly. The power spectra example is a good example of where we could “smarten” the system.

Can you elaborate on what you like about it? Like I said, when I first heard about it, it sounded like a neat idea. But in practice, over my years of using matplotlib… sometimes it’s been fine, and sometimes it’s made me roll my eyes/swear, but I don’t think there’s been a single instance where I looked at a graph and thought “oo, nice one matplotlib - your insistence on shrinking my data to use fewer pixels in order to get a major tick lined up exactly with the spines has really improved this graph. Neat tick/spine alignment really is the highest priority in data visualization”.

Even in the rare cases where my measurement scale actually does have a neat 0-1 or 0-100 range, I usually find that matplotlib has chosen something like 0-90, or, if we fix the issue with cramming data right up into the axes, then I guess I’ll end up with -10 to 110. That looks worse than something like -4 to 104, because with -4 to 104 my outermost axis labels are 0 and 100. With -10 to 110, the outermost labels are -10 and 110, and it’s weird and confusing to have axis labels naming impossible values.

So can you share your examples of where this behavior has given you substantively better results?

As for the ticks… I think that is a very obscure edge-case. I personally prefer inward.

Yeah, that one is a pet peeve - I was gratified to see that the seaborn folks also took the trouble to fix it (I’m not alone!). To be fair, though, the reason I noticed isn’t that I care a lot about ticks per se, it’s because the default was screwing up my figures so I had to go track it down :-/. Here’s another example – the final versions of the autocorrelation graphs I mentioned above.

In both of these graphs, having the ticks point inwards created weird, confusing intersections with the lines, so I had to flip them to point outwards. It’s just an objective thing: if you use the same pixels for data and metadata, that creates room for ugly stuff to happen. And when it comes to defaults, if you have two choices that are basically equivalent, except that one is always fine and one is usually fine but sometimes screws things up, then the former seems like the obvious choice…

-n

···

On 22 Nov 2014 02:22, “Benjamin Root” <ben.root@…55…553…> wrote:

About this, I am not an expert, so forgive me if this is nonsensical. However,
it would seem to me that these requirements are basically the same as the
requirements for the new default colormap that prompted this whole
discussion. So, rather than create two inconsistent sets of colors that
accomplish similar goals, might it be better to instead use the default
colormap for the line colors? You could pick "N" equally-spaced colors
from the colormap and use those as the line colors.

You could even take this a step further, and instead of hard-coding the
line colors, you could make it possible to assign a named colormap to the
line colors parameter. Then there could be a second integer parameter that
determines how many colors to pick from that colormap (it would only do
anything if the line colors are a colormap, otherwise it would be
ignored).
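A minimal sketch of the "pick N equally-spaced colors" idea follows. `sample_colors` is a hypothetical helper, "coolwarm" is just an example colormap name, and the colors are passed to `plot` explicitly rather than through any new configuration parameter, since no such parameter is defined in this thread.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np


def sample_colors(cmap_name, n):
    """Return an (n, 4) RGBA array of n equally spaced colormap samples."""
    cmap = plt.get_cmap(cmap_name)
    # Floats in [0, 1] are interpreted as positions along the colormap.
    return cmap(np.linspace(0.0, 1.0, n))


colors = sample_colors("coolwarm", 5)

fig, ax = plt.subplots()
x = np.linspace(0, 1, 50)
for i, color in enumerate(colors):
    ax.plot(x, x + i, color=color)
```

Sampling this way yields colors that read as an ordered sequence, which suits ranked series but is a different goal from a cycle of maximally distinct colors.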

···

On Sat, Nov 22, 2014 at 12:22 AM, Nathaniel Smith <njs@...503...> wrote:

- Default line colors: The rgbcmyk color cycle for line plots doesn't
appear to be based on any real theory about visualization -- it's just
the corners of the RGB color cube, which is a highly perceptually
non-uniform space. The resulting lines aren't terribly high contrast
against the default white background, and the different colors have
varying luminance that makes some lines "pop out" more than others.

Seaborn's default is to use a nice isoluminant variant on matplotlib's
default:

http://web.stanford.edu/~mwaskom/software/seaborn/tutorial/aesthetics.html

ggplot2 uses isoluminant colors with maximally-separated hues, which
also works well. E.g.:

http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/ggplot2_scale_hue_colors_l45.png

I'm no expert either, but while similar principles about colorblind
compatibility, etc. apply, you want to use a different scheme to represent a
continuous range of colors than for a set of distinct colors that aren't
intended to be ranked.

-Chris

···

On Wed, Nov 26, 2014 at 1:30 AM, Todd <toddrjen@...149...> wrote:

About this, I am not an expert, so forgive me if this is nonsensical.
However, it would seem to me that these requirements are basically the same
as the requirements for the new default colormap that prompted this whole
discussion. So, rather than create two inconsistent sets of colors that
accomplish similar goals, might it be better to instead use the default
colormap for the line colors? You could pick "N" equally-spaced colors
from the colormap and use those as the line colors.

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@...236...

I’ve also become thoroughly annoyed with the default colour cycle, especially with its
glaring cyan-magenta contrast, and found it desirable to have an easier way to
customise this, either explicitly or by changing color_cycle.
As there are already a couple of sequences in the available colourmaps that
could be useful for different purposes or tastes, what’s lacking in particular, in my view,
is an easier-to-use interface to draw colours from those maps; I think that’s along the
lines of what Todd also suggested further down in his mail.
I’ve written a little utility, which I’m simply appending because it’s so short, that returns an
array of colours of specified length that could be passed to axes.color_cycle or just
used explicitly as crange[i]. It is also useful for colouring scatter plot markers according to a
certain quantity (pass this quantity as “values” to crange).

Regarding to the above, I think sometimes the line colour requirements are similar to
those for a general colourmap, e.g. I often want to plot a series of lines like different
spectra, which are easily enough distinguishable, but should IMO reflect a certain
continuous trend like different temperatures - are ranked, IOW - and thus would be well
represented by a sequence of values from “heat" or “coolwarm". However there are still
some additional requirements, as you’d generally want every colour to have enough
contrast on a white or bright background canvas. In the example below I’ve added a
“max_lum” keyword to darken whitish or yellow colours appropriately.
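
One way to picture the darkening step, as a rough sketch only: the version here swaps in a Rec. 709 luma estimate (weights applied directly to the sRGB values) instead of the summed-RGB scaling used in the utility appended to this mail, and the names `cap_luma` and `max_luma` are hypothetical. Colours brighter than the cap are scaled down, darker ones are left alone.

```python
import numpy as np
import matplotlib.pyplot as plt

# Rec. 709 weights; applying them to gamma-encoded sRGB values gives
# an approximate brightness ("luma"), not a colorimetric luminance.
LUMA_WEIGHTS = np.array([0.2126, 0.7152, 0.0722])


def cap_luma(rgba, max_luma=0.7):
    """Scale the RGB part of each colour so its luma does not exceed max_luma.

    Hypothetical helper for illustration; alpha is left untouched.
    """
    rgba = np.array(rgba, dtype=float)  # work on a copy
    luma = rgba[:, :3] @ LUMA_WEIGHTS
    # Scale factor is 1 for colours already dark enough.
    scale = np.minimum(1.0, max_luma / np.maximum(luma, 1e-12))
    rgba[:, :3] *= scale[:, None]
    return rgba


colours = plt.get_cmap("hot")(np.linspace(0, 1, 6))
darkened = cap_luma(colours, max_luma=0.7)
```

Because luma is linear in the RGB values, a colour that was too bright ends up exactly at the cap, which keeps whitish and yellow entries visible on a white canvas.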

This is probably not extremely sophisticated in terms of colour physiology, but if you
have a suggestion if and where it could be added to matplotlib, I could go ahead and
make a pull request (and try to find the time to add some tests and examples).

Cheers,
            Derek

import numpy as np
import matplotlib.pyplot as plt


def crange(cmap, values, max_lum=1, start=0, stop=255, vmin=None, vmax=None):
    """
    Returns RGBA colour array drawn from colormap cmap

    cmap: valid matplotlib colormap name or instance
    values: either int - number of colour values to return or
            array of values to be mapped on colormap range
    max_lum: restrict colours to maximum brightness (1=white)
    start,stop: range of colormap to use (full range 0-255)
    vmin,vmax: input values mapped to start/stop (default actual data limits)
    """

    try:
        if np.isscalar(values):
            # np.int has been removed from NumPy; the builtin int is equivalent
            vrange = np.linspace(start, stop, int(values))
        else:
            v = np.asarray(values, dtype=float)
            # compare against None so that an explicit limit of 0 is honoured
            vmin = v.min() if vmin is None else vmin
            vmax = v.max() if vmax is None else vmax
            vrange = start + (v - vmin) * (stop - start) / (vmax - vmin)
    except (ValueError, TypeError) as err:
        print("invalid input values: must be no. of colours or array: %s" %
              err)
        return None
    # integer input is used as an index into the colormap's lookup table
    vrange = np.round(vrange).astype(int)
    # pyplot.get_cmap accepts a name or an instance (matplotlib.cm.get_cmap
    # is no longer available in newer matplotlib releases)
    cmap = plt.get_cmap(cmap)
    # scale RGB so that pure white (R+G+B = 3) is dimmed to max_lum
    lcor = (1.0 - max_lum) / 9
    colours = cmap(vrange)
    colours[:, :3] *= (1 - colours[:, :3].sum(axis=1)**2 * lcor).reshape(-1, 1)
    return colours

···

On 26 Nov 2014, at 07:53 pm, Chris Barker <Chris.Barker@...236...> wrote:

On Wed, Nov 26, 2014 at 1:30 AM, Todd <toddrjen@...149...> wrote:

About this, I am not expert so forgive me if this is nonsensical. However, it would seem to me that these requirements are basically the same as the requirements for the new default colormap that prompted this whole discussion. So, rather than create two inconsistent sets of colors that accomplish similar goals, might it be better to instead use the default colormap for the line colors? You could pick "N" equally-spaced colors from the colormap and use those as the line colors.

I'm no expert either, but while similar principles about colorblind compatibility, etc. apply, you want to use a different scheme to represent a continuous range of colors than for a set of distinct colors that aren't intended to be ranked.

The main differences in requirements are:
- for the color cycle, you want isoluminant colors, to avoid the issue
where one line is glaring bright red and one is barely-visible-grey.
For general-purpose 2d colormaps, though, you almost always want the
luminance to vary to help distinguish colors from each other.
- for the color cycle, there's no problem with using widely separated
hues -- in fact it's usually better b/c it increases contrast between
the different items, and there's no need to communicate an ordering
between them. But if you try to use the whole hue space in a colormap
then you end up with the much-loathed jet.
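
The luminance point can be made concrete with a small sketch. It assumes the rgbcmyk cycle is taken literally as corners of the RGB cube (matplotlib's actual single-letter colours may differ slightly) and uses the standard sRGB/Rec. 709 luminance weights; the names `CYCLE` and `relative_luminance` are hypothetical.

```python
import numpy as np

# Assumption: each cycle colour is a pure corner of the RGB cube.
CYCLE = {
    "b": (0, 0, 1),
    "g": (0, 1, 0),
    "r": (1, 0, 0),
    "c": (0, 1, 1),
    "m": (1, 0, 1),
    "y": (1, 1, 0),
    "k": (0, 0, 0),
}

# Luminance weights for linear sRGB / Rec. 709 primaries.
WEIGHTS = np.array([0.2126, 0.7152, 0.0722])


def relative_luminance(rgb):
    """Relative luminance of a colour whose channels are exactly 0 or 1.

    For 0 and 1 the sRGB transfer curve is the identity, so no
    linearisation step is needed in this special case.
    """
    return float(np.dot(rgb, WEIGHTS))


luminance = {name: relative_luminance(rgb) for name, rgb in CYCLE.items()}

# Print from darkest to brightest.
for name, y in sorted(luminance.items(), key=lambda kv: kv[1]):
    print(f"{name}: {y:.4f}")
```

Under these assumptions the values run from 0.0000 for black and 0.0722 for blue up to 0.7874 for cyan and 0.9278 for yellow, against 1.0 for a white background. That spread is the "some lines pop out more than others" effect, and an isoluminant cycle would by construction collapse it to a single grey level.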

-n

···

On Wed, Nov 26, 2014 at 9:30 AM, Todd <toddrjen@...149...> wrote:

On Sat, Nov 22, 2014 at 12:22 AM, Nathaniel Smith <njs@...503...> wrote:

- Default line colors: The rgbcmyk color cycle for line plots doesn't
appear to be based on any real theory about visualization -- it's just
the corners of the RGB color cube, which is a highly perceptually
non-uniform space. The resulting lines aren't terribly high contrast
against the default white background, and the different colors have
varying luminance that makes some lines "pop out" more than others.

Seaborn's default is to use a nice isoluminant variant on matplotlib's
default:

http://web.stanford.edu/~mwaskom/software/seaborn/tutorial/aesthetics.html
ggplot2 uses isoluminant colors with maximally-separated hues, which
also works well. E.g.:

http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/ggplot2_scale_hue_colors_l45.png

About this, I am not expert so forgive me if this is nonsensical. However,
it would seem to me that these requirements are basically the same as the
requirements for the new default colormap that prompted this whole
discussion. So, rather than create two inconsistent sets of colors that
accomplish similar goals, might it be better to instead use the default
colormap for the line colors? You could pick "N" equally-spaced colors from
the colormap and use those as the line colors.

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh

If you used isoluminant colors for the lines, wouldn’t that mean a plot printed in grayscale would have all lines be the same shade of gray?

···

On Nov 26, 2014 10:04 PM, “Nathaniel Smith” <njs@…503…> wrote:

On Wed, Nov 26, 2014 at 9:30 AM, Todd <toddrjen@…149…> wrote:

On Sat, Nov 22, 2014 at 12:22 AM, Nathaniel Smith <njs@…503…> wrote:

  • Default line colors: The rgbcmyk color cycle for line plots doesn’t
    appear to be based on any real theory about visualization – it’s just
    the corners of the RGB color cube, which is a highly perceptually
    non-uniform space. The resulting lines aren’t terribly high contrast
    against the default white background, and the different colors have
    varying luminance that makes some lines “pop out” more than others.

Seaborn’s default is to use a nice isoluminant variant on matplotlib’s
default:

http://web.stanford.edu/~mwaskom/software/seaborn/tutorial/aesthetics.html
ggplot2 uses isoluminant colors with maximally-separated hues, which
also works well. E.g.:

http://www.cookbook-r.com/Graphs/Colors_%28ggplot2%29/ggplot2_scale_hue_colors_l45.png

About this, I am not expert so forgive me if this is nonsensical. However,
it would seem to me that these requirements are basically the same as the
requirements for the new default colormap that prompted this whole
discussion. So, rather than create two inconsistent sets of colors that
accomplish similar goals, might it be better to instead use the default
colormap for the line colors? You could pick “N” equally-spaced colors from
the colormap and use those as the line colors.

The main differences in requirements are:

  • for the color cycle, you want isoluminant colors, to avoid the issue
    where one line is glaring bright red and one is barely-visible-grey.
    For general-purpose 2d colormaps, though, you almost always want the
    luminance to vary to help distinguish colors from each other.

Yes. But IME it's very difficult to use greyscale alone to distinguish
between multiple plot lines no matter what: you can't go much beyond 2
lines before you either end up with hard-to-see lines (b/c they don't
have enough contrast with the white background) or the lines become
nigh-indistinguishable ("which one is the slightly-darker grey?"). And
if you have substantial luminance variation to make the greyscale
work, then the color images end up looking really weird (the scarlet
versus faint-yellow problem, where you end up emphasizing one set of
data over another -- emphasis should be done on purpose! In
matplotlib's current color cycle the yellow and cyan tend to
disappear).

If you're worried about greyscale then IMHO you should use different
line styles (solid/dashed/dotted/...) and/or use solid black for
everything and label the lines directly.
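
A short sketch of the line-style suggestion, assuming the `cycler` package that ships as a matplotlib dependency: every line is drawn in solid black and only the dash pattern changes, so the plot survives greyscale printing. The variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from cycler import cycler

# One colour (black) combined with four dash patterns.
style_cycle = cycler(color=["k"]) * cycler(linestyle=["-", "--", ":", "-."])

fig, ax = plt.subplots()
ax.set_prop_cycle(style_cycle)

x = np.linspace(0, 2 * np.pi, 200)
lines = [ax.plot(x, np.sin(x + k), label="phase %d" % k)[0] for k in range(4)]
ax.legend()
```

Four styles is about the practical limit here; beyond that, labelling the lines directly is likely to read better than adding more dash patterns.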

Which isn't to say that there's never any value in picking line colors
from a colormap, it's just more complicated than it seems :-).

-n

···

On Thu, Nov 27, 2014 at 9:54 AM, Todd <toddrjen@...149...> wrote:

On Nov 26, 2014 10:04 PM, "Nathaniel Smith" <njs@...503...> wrote:

The main differences in requirements are:
- for the color cycle, you want isoluminant colors, to avoid the issue
where one line is glaring bright red and one is barely-visible-grey.
For general-purpose 2d colormaps, though, you almost always want the
luminance to vary to help distinguish colors from each other.

If you used isoluminant colors for the lines, wouldn't that mean a plot
printed in grayscale would have all lines be the same shade of gray?

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh