mpl.math namespace [was: Polygon examples broken]

>> so I think it does make sense to bring the common names that show up in
>> math expressions into the main namespace.

>> This is probably best just done by each individual according to his/her
>> taste.
>
> That's what I'm trying to get away from. I want to be able to write
> the contains() function in patch.py and just use the normal math where
> it makes sense to use normal math.

Ahh -- we're back on an mpl-devel topic now.

I was thinking that you were proposing a "math" namespace for pylab
users -- but it sounds like you're proposing a standard set of math
names that will be brought in to modules for the matplotlib project
itself. Different issue.

Through the use of "from mpl.math import *" yes.

I don't write enough MPL internal code to have any opinion on that.

I'll let the code speak for itself:

~/src/matplotlib/lib/matplotlib pkienzle$ for sym in $symlist; do
    echo `grep "[^A-Za-z0-9_]$sym[^A-Za-z0-9_]" *.py | wc -l` $sym;
done | sort -n -r | column -c 75

163 max       7 remainder   1 cosh         0 isnormal
136 arg       7 pow         1 arctanh      0 isinf
109 min       7 inf         1 arcsinh      0 isfinite
102 log       6 arctan2     1 arccosh      0 frexp
 64 pi        5 fabs        0 trunc        0 fmin
 56 sqrt      4 imag        0 tgamma       0 fmax
 44 abs       3 tan         0 signbit      0 fdim
 38 sin       3 nan         0 scalbn       0 expm1
 28 cos       3 log2        0 rint         0 exp2
 23 minimum   3 hypot       0 remquo       0 erfc
 22 round     2 isnan       0 nexttoward   0 erf
 19 maximum   2 arctan      0 nearbyingt   0 cproj
 19 floor     2 arcsin      0 modf         0 copysign
 18 log10     2 arccos      0 logb         0 conj
 18 ceil      1 tanh        0 log1p        0 cbrt
 13 real      1 sinh        0 lgamma       0 NaN
 12 exp       1 fmod        0 ldexp        0 Inf

I used the following list:

symlist=`cat <<EOF
  pi inf Inf nan NaN
  isfinite isnormal isnan isinf
  arccos arcsin arctan arctan2 cos sin tan
  arccosh arcsinh arctanh cosh sinh tanh
  exp log log10 expm1 log1p exp2 log2
  pow sqrt cbrt erf erfc lgamma tgamma hypot
  fmod remainder remquo
  fabs fdim fmax fmin
  copysign signbit frexp ldexp logb modf scalbn
  ceil floor rint nexttoward nearbyingt round trunc
  conj cproj abs arg imag real
  min max minimum maximum
EOF`

This measure doesn't distinguish between comments and
code, but it should still be good enough for the purposes
of this discussion. Tuning the list to the set of functions
available in numpy rather than C99 would help (I did rename
the C99 ayyy trig functions to numpy's arcyyy, e.g. acos to
arccos).
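
For anyone who wants a count that skips comments and string literals, here is a minimal sketch using the standard library tokenizer instead of grep. It is not what produced the numbers above. The function name count_symbol and the sample text are made up for illustration, and it assumes the source being scanned tokenizes as Python 3.

```python
import io
import tokenize

def count_symbol(source, symbol):
    """Count NAME tokens equal to `symbol`, ignoring comments and strings."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sum(1 for tok in tokens
               if tok.type == tokenize.NAME and tok.string == symbol)

# Illustrative input only; a real run would read each *.py file instead.
sample = (
    "from math import sqrt\n"
    "# sqrt appears in this comment too\n"
    "y = sqrt(2.0) + max(1, 2)\n"
    "label = 'sqrt of x'\n"
)

print(count_symbol(sample, "sqrt"))
print(count_symbol(sample, "max"))
```

Note that a qualified use such as npy.sqrt still counts as an occurrence of sqrt, since the attribute name is a NAME token as well.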

- Paul

···

On Fri, Jul 20, 2007 at 05:05:40PM -0700, Christopher Barker wrote:

As far as namespaces are concerned, I agree they are a good idea and
should be used in almost all places. I also don't want the perfect to
be the enemy of the good, or succumb to a foolish consistency, so I
think it is OK to have some very common names that have a clear
meaning to be used w/o a namespace. I have been following your
discussion at a bit of a distance: are you talking about using scalar
functions or array functions here, e.g. math.sqrt vs numpy.sqrt? Also,
a few of your symbols clash with python builtins (min, max, abs) which
is best avoided. Finally, how would you feel about allowing these
symbols in the module namespace, but w/o the import * semantics, e.g.,
for these symbols we do

from mpl.math import exp, sin, pi, cos, ...

It does defeat the purpose of your idea a bit, which is to have a set
of commonly agreed-on math symbols that we can always rely on with
easy convenience. On the other hand, I am more comfortable being
explicit here.

If I am missing some fundamental ideas, please forgive me and
enlighten me, because as I say I've been following this (previous)
thread only loosely.

JDH

···

On 7/21/07, Paul Kienzle <pkienzle@...537...> wrote:

[...]

John Hunter wrote:
[...]

functions or array functions here, eg math.sqrt vs numpy.sqrt? Also,
a few of your symbols clash with python builtins (min, max, abs) which
is best avoided. Finally, how would you feel about allowing these
symbols in the module namespace, but w/o the import * semantics, eg,
for these symbols we do

from mpl.math import exp, sin, pi, cos, ...

There is no point in this; better to import these directly from numpy, if that is what is wanted. But sometimes we actually want a masked array version.

For many of these things there are up to 6 different possible sources:

(builtin, if not math or cmath)
math
cmath
numpy
numpy.ma
maskedarray

Sometimes functions from the different sources do the same thing, but usually at different speeds, and sometimes they don't do the same thing at all. In most cases we want, or at least can manage with, either the numpy version or one of the masked versions, presently accessed via matplotlib.numerix.npyma, which is imported via

import matplotlib.numerix.npyma as ma

The recently introduced policy of simply being very explicit *does* work; when looking at an expression one always knows which functions are being invoked. Like Paul, I recoil a bit at the clunky appearance, but apparently unlike Paul, I find the explicitness helpful--especially since I am very conscious of the need to use masked versions in some places.

There is nothing inherently wrong with being explicit by importing some symbols into the module namespace with lines at the top, but this works best if there are not too many of those symbols, if they don't clash with symbols from other modules one is using, and if the module is not too long. A prime example of a case where these conditions are violated is axes.py.

Consider two possible policies:

Common to both:
c1) Never mask a builtin.
c2) Use nonconflicting names, specifically, always use amin and amax from numpy or ma instead of min or max.
c3) Use methods in preference to functions where possible; this has the advantage of taking care of masked or ordinary cases automatically.

1) Present: always be explicit: npy.sin or ma.sin or math.sin or cmath.sin. (For scalars, the math module functions are much faster than the numpy versions, but on the other hand they should be called seldom enough that this would never matter.)

2) Pick a set of math symbols that may be imported directly from numpy at the top, and either import all routinely, or import as needed. Use explicit "ma." when masked versions are needed. (Depending on design decisions, this could end up being much of the time.)
Suboptions:
2a) Include other very common symbols such as array, asarray, newaxis, ones, zeros.
2b) Use something like "from matplotlib.numpyapi import *" to accomplish all of this. This has the advantage of consolidating the names in one place, so one can easily see what the standard names are, and one doesn't have to keep checking the top of the file to see whether an additional name needs to be added.
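
To make c1 and 2b concrete, here is a rough sketch of how a consolidated name list could be checked against the builtins before anything is exported. Everything in it is hypothetical: STANDARD_NAMES, builtin_clashes and build_namespace are illustrative names, the list itself is not a proposal, and the stdlib math module stands in for numpy only so that the sketch runs without third-party packages.

```python
import builtins
import math

# Hypothetical curated list; the thread does not settle on an actual set.
STANDARD_NAMES = ["pi", "sqrt", "sin", "cos", "exp", "log",
                  "floor", "ceil", "hypot"]

def builtin_clashes(names):
    """Return the names that would mask a Python builtin (policy c1)."""
    return sorted(name for name in names if hasattr(builtins, name))

def build_namespace(source, names):
    """Collect `names` from `source`, refusing any that mask a builtin."""
    clashes = builtin_clashes(names)
    if clashes:
        raise ValueError("would mask builtins: " + ", ".join(clashes))
    return {name: getattr(source, name) for name in names}

# math is a stand-in for numpy here (assumption made for this sketch).
namespace = build_namespace(math, STANDARD_NAMES)
print(sorted(namespace))

# Candidates from the symbol list earlier in the thread that are builtins.
print(builtin_clashes(["min", "max", "abs", "round", "pow", "sin"]))
```

Besides min, max and abs, the check also flags round and pow, which are builtins too. Keeping the list in one place is also one way to watch the slippery slope: every new name has to be added to a single, visible list.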

I can accept either of these, so long as we can decide and then get on with life. John, Norbert, and I have already spent time converting some modules to option 1. There was some discussion of this a couple months ago when John first proposed it. Now we have some experience. Another conversion is OK, but let's get it straight and make it the last one. Or leave it.

My impression is that other projects typically use something closer to option 2. Prior to our partial conversion to option 1, the tops of our modules were an ugly mess. Option 1 represented a substantial cleanup and clarification--but with the penalty of uglier math expressions.

I'm sorry I don't have a strong recommendation yet; I hope the above overly-long text helps the decision process nevertheless.

Looking at the options without consideration of prior decisions and work done, my present preference is *mildly* towards option 2 with both suboptions. The biggest problem is the slippery slope--this can easily end up with more and more symbols being added until one is effectively doing "from numpy import *", and I don't want to do that.

Eric

And the difference may be *very* significant:

In [1]: import numpy as N

In [2]: import math as M

In [3]: nsqrt = N.sqrt

In [4]: msqrt = M.sqrt

In [5]: def sqtest(sqrt,reps):
   ...:     x = 99.9
   ...:     for i in xrange(reps):
   ...:         a = sqrt(x)
   ...:

In [13]: reps = int(1e6)

In [14]: time sqtest(msqrt,reps)
CPU times: user 0.90 s, sys: 0.00 s, total: 0.90 s
Wall time: 0.90

In [15]: time sqtest(nsqrt,reps)
CPU times: user 7.62 s, sys: 0.39 s, total: 8.02 s
Wall time: 8.08
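
The session above uses xrange and IPython's time magic. A rough equivalent for current Python, using the stdlib timeit module, might look like the sketch below. The helper name time_scalar_calls is made up, numpy is treated as optional, and no timings are claimed here since they depend on the machine and the library versions.

```python
import math
import timeit

def time_scalar_calls(func, reps=100_000, x=99.9):
    """Return the seconds taken to call func(x) `reps` times."""
    # The lambda adds the same call overhead to every candidate.
    return timeit.timeit(lambda: func(x), number=reps)

candidates = {"math.sqrt": math.sqrt}
try:
    import numpy  # optional; skipped if numpy is not installed
    candidates["numpy.sqrt"] = numpy.sqrt
except ImportError:
    pass

for name, func in candidates.items():
    print(f"{name}: {time_scalar_calls(func):.3f} s")
```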

The overhead from numpy for scalars is not trivial at all. And as you
pointed out, the semantic differences between math, cmath and numpy
(ignoring ma for now) are important enough that people should know
exactly what they are getting.
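
As a small illustration of those semantic differences, restricted to the two stdlib modules: the same name, sqrt, either raises or returns a complex number for a negative input. The helper try_sqrt is hypothetical and exists only to show both behaviours side by side.

```python
import cmath
import math

def try_sqrt(sqrt, x):
    """Call sqrt(x); return None if the function rejects the input."""
    try:
        return sqrt(x)
    except ValueError:
        return None

print(try_sqrt(math.sqrt, -1.0))   # math.sqrt raises ValueError for x < 0
print(try_sqrt(cmath.sqrt, -1.0))  # cmath.sqrt returns a complex result
# numpy.sqrt(-1.0) is a third behaviour (nan plus a warning); not run here.
```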

I guess if you want to provide a 'common math functions' module with
clearly defined conventions for everyday usage, you could do something
like (using cos as an example, apply to all names that are common to
all such modules):

numpy.cos -> cos # unmodified, numpy is the 'default' a la matlab
math.cos -> scos # names from math are s-prefixed for 'scalar' (could be 'm')
cmath.cos -> ccos # complex names
numpy.ma.cos -> macos # masked array names
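
A quick sketch of how such prefixed names could be generated rather than typed by hand, covering only the math and cmath rows of the mapping so that it runs with the standard library alone. The function prefixed_namespace is hypothetical, and the numpy and numpy.ma rows are left out on purpose.

```python
import cmath
import math

def prefixed_namespace():
    """Map 's' + name to math's function and 'c' + name to cmath's."""
    shared = sorted(set(dir(math)) & set(dir(cmath)))
    ns = {}
    for name in shared:
        if name.startswith("_"):
            continue  # skip module metadata such as __doc__
        m_obj, c_obj = getattr(math, name), getattr(cmath, name)
        if callable(m_obj) and callable(c_obj):
            ns["s" + name] = m_obj  # scalar version, e.g. scos
            ns["c" + name] = c_obj  # complex version, e.g. ccos
    return ns

ns = prefixed_namespace()
print(ns["scos"](0.0))
print(ns["csqrt"](-1))
```

Constants such as pi are skipped because they are not callable. Names like ssin and ssqrt show one cost of the scheme: the prefix reads awkwardly for functions that already start with s.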

Just an idea...

Cheers,

f

···

On 7/21/07, Eric Firing <efiring@...229...> wrote:

Sometimes functions from the different sources do the same thing, but
usually at different speeds.