scalarFormatter

"Darren" == Darren Dale <dd55@...163...> writes:

    > With MPL 0.62.4, the following script is yielding a few
    > badly formatted y-axis labels:

    >     import matplotlib
    >     from matplotlib.matlab import *
    >     a=array([2,4,6,8,10,12,14,16,18,20])*1e5
    >     plot(a)
    >     show()

    > I tried adding a "print s,m" statement after line 272: m =
    > self._zerorgx.match(s), where s is the original text. Here
    > is the output:

    >     2.0e+005 <_sre.SRE_Match object at 0x01249608>
    >     4.0e+005 <_sre.SRE_Match object at 0x01249608>
    >     6.0e+005 <_sre.SRE_Match object at 0x01249608>
    >     8.0e+005 <_sre.SRE_Match object at 0x01249608>
    >     1.0e+006 <_sre.SRE_Match object at 0x01249608>
    >     1.2e+006 None
    >     1.4e+006 None
    >     1.6e+006 None
    >     1.8e+006 None
    >     2.0e+006 <_sre.SRE_Match object at 0x01249608>

    > _zerorgx = re.compile('^(.*?)\.?0+(e[+-]\d+)?$'), which is
    > greek to me (and I don't have a good reference on
    > regexp's).
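A quick way to see what the pattern does is to run it against the printed labels. This is only a sketch: the pattern is copied from the quoted message, while the variable names and the label list are chosen here for illustration. The pattern requires at least one '0' immediately before the optional exponent, so a mantissa such as 1.2 cannot match, which would explain the None entries.

```python
import re

# Pattern as quoted in the message (written here as a raw string).
zerorgx = re.compile(r'^(.*?)\.?0+(e[+-]\d+)?$')

# A few labels taken from the debug output in the message.
labels = ['2.0e+005', '1.0e+006', '1.2e+006', '1.4e+006']

# Map each label to its captured groups, or None when the pattern fails.
results = {}
for s in labels:
    m = zerorgx.match(s)
    results[s] = m.groups() if m else None
    print(s, results[s])
```

Labels whose mantissa ends in a zero yield two groups (stripped mantissa and exponent); the others yield None.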

There's a reason Jamie Zawinski said

  Some people, when confronted with a problem, think "I know, I'll use
  regular expressions." Now they have two problems.

Here's a re-implementation using string methods. See if it passes all
your tests. More importantly, do you agree that removing the zeros
etc which aren't necessary to visually parse the number is a good
idea?

class ScalarFormatter(Formatter):
    """
    Tick location is a plain old number. If viewInterval is set, the
    formatter will use %d, %1.#f or %1.ef as appropriate. If it is
    not set, the formatter will do str conversion
    """

    def __call__(self, x, pos):
        'Return the format for tick val x at position pos'
        self.verify_intervals()
        d = abs(self.viewInterval.span())

        return self.pprint_val(x,d)

    def pprint_val(self, x, d):
        #if the number is not too big and it's an int, format it as an
        #int
        if abs(x)<1e4 and x==int(x): return '%d' % x

        if d < 1e-2: fmt = '%1.3e'
        elif d < 1e-1: fmt = '%1.3f'
        elif d > 1e5: fmt = '%1.1e'
        elif d > 10 : fmt = '%1.1f'
        elif d > 1 : fmt = '%1.2f'
        else: fmt = '%1.3f'
        s = fmt % x

        tup = s.split('e')
        if len(tup)==2:
            mantissa = tup[0].rstrip('0').rstrip('.')
            exponent = tup[1].replace('+', '').lstrip('0')
            s = '%se%s' %(mantissa, exponent)
        return s

JDH

John Hunter wrote:

    > Here's a re-implementation using string methods. See if it passes all
    > your tests. More importantly, do you agree that removing the zeros
    > etc which aren't necessary to visually parse the number is a good
    > idea?

I strongly support stripping all unnecessary zeros from the labels. It seems a clear choice to me, but I would be interested to know if others are against it.

The third to last line needs changing, prepare for hackishness:

    # keep the sign, strip the exponent's leading zeros, then drop any '+'
    temp = tup[1][0] + tup[1][1:].lstrip('0')
    exponent = temp.replace('+', '')
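The difference only shows up with negative exponents. A small sketch of the two approaches side by side follows; the function names are made up here for illustration, and the `or '0'` guard for an all-zero exponent is an addition that is not in either message.

```python
def strip_exponent_original(exponent):
    # Line from the posted pprint_val: a leading '-' blocks lstrip('0').
    return exponent.replace('+', '').lstrip('0')


def strip_exponent_signed(exponent):
    # Assumes the exponent starts with '+' or '-', as '%e' formatting produces.
    sign, digits = exponent[0], exponent[1:]
    # Keep a single '0' if the exponent was all zeros (an addition made here).
    digits = digits.lstrip('0') or '0'
    # Drop a '+' sign but keep a '-' sign.
    return (sign + digits).replace('+', '')


for exp in ('+006', '-005'):
    print(exp, strip_exponent_original(exp), strip_exponent_signed(exp))
```

With '-005' the first version leaves the zeros in place, while the second returns '-5'.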

The regexpr was more elegant, and I don't mind spending an evening learning it so I can continue to work with it. On the other hand, string methods are more accessible. Elegant or accessible? What would "upper management" pick?

Darren Dale wrote:

    > I strongly support stripping all unnecessary zeros from the labels. It
    > seems a clear choice to me, but I would be interested to know if others
    > are against it.

I agree that stripping unnecessary zeros from labels is a good
idea, but I also prefer preserving significant trailing zeros, and
do not like axes reading (for example):
   0.997 0.998 0.999 1 1.001 1.002

In my opinion, that '1' should be '1.000': the precision here is
significant. I also prefer more uniform series of '0.0 0.5 1.0'
to the jagged appearance of '0 0.5 1'. Maybe that's just my
interpretation of 'necessary'.

Currently, ScalarFormatter prefers '1' to '1.000'. I also see
that it uses the axis "span" to determine the formatting. I'm not
sure how easy it would be to change, but it might be better to use
the difference between axis ticks so that the formatting
guarantees unique labels with "just enough precision".
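One way to read "just enough precision" is to derive the number of decimals from the smallest spacing between ticks and apply it to every label. The following is a rough sketch under that assumption; the helper names are hypothetical and this is not how ScalarFormatter works.

```python
import math


def decimals_for_ticks(ticks):
    """Decimal places needed to resolve the smallest tick spacing."""
    steps = [abs(b - a) for a, b in zip(ticks, ticks[1:]) if b != a]
    if not steps:
        return 0
    step = min(steps)
    # The small offset guards against float noise such as 0.00099999999.
    return max(0, math.ceil(-math.log10(step) - 1e-9))


def format_ticks(ticks):
    """Format all ticks with the same number of decimals."""
    n = decimals_for_ticks(ticks)
    return ['%.*f' % (n, t) for t in ticks]


print(format_ticks([0.997, 0.998, 0.999, 1.0, 1.001, 1.002]))
print(format_ticks([0, 0.5, 1]))
```

This only looks at the order of magnitude of the spacing, so a step like 0.25 would still be shown with a single decimal.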

Fortunately, matplotlib makes it easy enough to tweak the axis
formatting for those of us who obsess about such things. I've been
using FuncFormatter and the custom formatter below in a WX
plotting frame I'm working on (linear 2D plotting only so far).
I send it the list of ticks returned from the axis "major locator"
to determine the precision. The fallback is to use 0.1*span:
this is usually OK, but can be overly precise sometimes. There may
be a more efficient way of doing this, but I got a little lost in
ticker.py and axis.py....

    > The regexpr was more elegant, and I don't mind spending an
    > evening learning it so I can continue to work with it. On the
    > other hand, string methods are more accessible. Elegant or
    > accessible? What would "upper management" pick?

I think the main issue would be whether using regular expressions would
be noticeably faster. In this case, I'd guess not.

--Matt Newville

    self.axes.xaxis.set_major_formatter(FuncFormatter(self.format_x))
    self.axes.yaxis.set_major_formatter(FuncFormatter(self.format_y))

    def format_x(self,x,pos):
        " x-axis formatter "
        xticks = self.axes.xaxis.get_major_locator()()
        span = self.axes.xaxis.get_view_interval().span()
        return tick_formatter(x,span,ticks=xticks,pos=pos)
    
    def format_y(self,y,pos):
        " y-axis formatter "
        yticks = self.axes.yaxis.get_major_locator()()
        span = self.axes.yaxis.get_view_interval().span()
        return tick_formatter(y,span,ticks=yticks,pos=pos)

def tick_formatter(x, span, ticks=None,pos=0):
    """ home built tick formatter to use with FuncFormatter():
    x value to be formatted
    span span of axis, as from viewLim.span() -- required!!!!
    ticks optional list of ticks to format
    """
    fmt = '%1.6e'
    d = 0.1 * span
    try:
        d = abs(ticks[pos+1] - ticks[pos])
    except (TypeError, IndexError):
        # no tick list, no position, or last tick: keep the 0.1*span fallback
        pass
    if d > 99999: fmt = '%1.6e'
    elif d > 0.99: fmt = '%1.0f'
    elif d > 0.099: fmt = '%1.1f'
    elif d > 0.0099: fmt = '%1.2f'
    elif d > 0.00099: fmt = '%1.3f'
    elif d > 0.000099: fmt = '%1.4f'
    elif d > 0.0000099: fmt = '%1.5f'

    s = fmt % x
    s = s.strip()
    s = s.replace('+', '')
    while s.find('e0')>0: s = s.replace('e0','e')
    while s.find('-0')>0: s = s.replace('-0','-')
    return s

Matt Newville wrote:

    > Currently, ScalarFormatter prefers '1' to '1.000'. I also see
    > that it uses the axis "span" to determine the formatting. I'm not
    > sure how easy it would be to change, but it might be better to use
    > the difference between axis ticks so that the formatting
    > guarantees unique labels with "just enough precision".

I have been mulling this over myself. Each label is formatted independently of the rest, so guessing the precision to display would be difficult. It would be easier if all labels were formatted together with one call to scalarFormatter, but I don't think that is a possibility. Alternatively, it would be possible to write a custom formatter that accepts an integer giving the precision to display.
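A formatter that takes the precision as an integer could look roughly like this. It is a minimal sketch: `make_precision_formatter` is a hypothetical name, and the returned function simply follows the (x, pos) call signature used by the format_x and format_y callbacks earlier in the thread.

```python
def make_precision_formatter(precision):
    """Return a func(x, pos) that always shows `precision` decimals."""
    def format_tick(x, pos=None):
        # pos is accepted for compatibility with the callback signature
        # but is not needed to format a single value.
        return '%.*f' % (precision, x)
    return format_tick


fmt3 = make_precision_formatter(3)
labels = [fmt3(x) for x in (0.999, 1.0, 1.001)]
print(labels)
```

The returned function could presumably be wrapped in FuncFormatter in the same way as format_x and format_y, with the caller choosing the precision.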

Matlab will truncate precision at 4 places past the decimal point. At that point the labels are already looking a bit long. Matlab also represents 1.0000 as 1, regardless of the other labels. I am planning a scalarFormatterMathtext, which will replace 1e6 with the more desirable $1*10^6$, and will consider precision.
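As a rough idea of what such a mathtext label could look like, the sketch below builds the string by hand. The function name and the `precision` argument are hypothetical, and `\times` is used here in place of the `*` written above; none of this is the planned scalarFormatterMathtext itself.

```python
def to_mathtext(x, precision=1):
    """Format x as a mathtext string like $1.2 \\times 10^{6}$."""
    mantissa, exponent = ('%.*e' % (precision, x)).split('e')
    # Strip trailing zeros (and a dangling '.') from the mantissa.
    mantissa = mantissa.rstrip('0').rstrip('.')
    # int() removes the '+' sign and any zero padding from the exponent.
    return r'$%s \times 10^{%d}$' % (mantissa, int(exponent))


print(to_mathtext(1e6))
print(to_mathtext(1.2e-5))
```

Converting the exponent with int() sidesteps the sign and leading-zero handling discussed earlier in the thread.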

Darren