BrokenBarHCollection with pandas timeseries

Folks,

I'm trying to use BrokenBarHCollection with a pandas time series object.

Here's a minimal example: (python 3.3, pandas 0.15.1, matplotlib 1.4.2)

···

#-----------------------------------------------------
import pandas as pd
import numpy as np
from datetime import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.collections as collections
span_where = collections.BrokenBarHCollection.span_where

# init the dataframe
time = pd.date_range(pd.datetime(1950,1,1), periods=5, freq='MS')
df = pd.DataFrame(np.arange(5), index=time, columns=['data'])
df['cond'] = df['data'] == 3

# Make the plot
fig = plt.figure()
ax = fig.add_subplot(111)
df['data'].plot(ax=ax, c='black')
c = span_where(df.index, ymin=0, ymax=4, where=df['cond'], facecolor='green', alpha=0.5)
#-----------------------------------------------------

I get the error:
"TypeError: float() argument must be a string or a number"

Basically, span_where() is not happy with my x values, which are a pandas time series. I tried several things (df.index.to_*), but there is something I still don't get about the internal representation of dates in matplotlib.

Any hint? Thanks a lot!

Fabien

Please provide the full traceback. Could you also show df.info()? In any case, I suspect that the problem is that pandas recently started using datetime64 for their timeseries, and matplotlib hasn’t implemented the unit converter for it. There was a post recently showing how to add pandas’s converter to matplotlib’s unit framework, but I can’t find it right now…

Cheers!

Ben Root
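
On the question of how matplotlib represents dates internally: it stores them as floating-point day counts, and matplotlib.dates.date2num does the conversion. A minimal sketch follows; the variable names are only illustrative, and because the epoch the count starts from depends on the matplotlib version, only differences between dates are shown.

```python
from datetime import datetime

import matplotlib.dates as mdates

# Convert two plain datetime objects to matplotlib's float representation.
jan = mdates.date2num(datetime(1950, 1, 1))
feb = mdates.date2num(datetime(1950, 2, 1))

# One unit is one day, so the two dates are 31 units apart.
print(feb - jan)  # 31.0

# num2date goes the other way and returns a timezone-aware datetime (UTC).
round_trip = mdates.num2date(jan)
print(round_trip.date())
```

A vertex array built from datetime objects cannot be cast to float directly, which is consistent with the TypeError raised inside Path.__init__ in this thread.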

···

On Tue, Dec 2, 2014 at 9:24 AM, Fabien <fabien.maussion@…287…> wrote:





Matplotlib-users mailing list

Matplotlib-users@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/matplotlib-users

Sure, I pasted the traceback below. Here is the pandas info:

In [17]: df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 5 entries, 1950-01-01 00:00:00 to 1950-05-01 00:00:00
Freq: MS
Data columns (total 2 columns):
data 5 non-null int64
cond 5 non-null bool
dtypes: bool(1), int64(1)
memory usage: 85.0 bytes

In [18]: df.index
Out[18]:
<class 'pandas.tseries.index.DatetimeIndex'>
[1950-01-01, ..., 1950-05-01]
Length: 5, Freq: MS, Timezone: None

In [19]: df.index.values
Out[19]:
array(['1950-01-01T00:00:00.000000000Z', '1950-02-01T00:00:00.000000000Z',
        '1950-03-01T00:00:00.000000000Z', '1950-04-01T00:00:00.000000000Z',
        '1950-05-01T00:00:00.000000000Z'], dtype='datetime64[ns]')

Traceback:

In [16]: c = span_where(df.index, ymin=0, ymax=4, where=df['cond'], color='green')

···

On 02.12.2014 16:34, Benjamin Root wrote:

Please provide the full traceback

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-16-d033044a6db2> in <module>()
----> 1 c = span_where(df.index, ymin=0, ymax=4, where=df['cond'], color='green')

/home/mowglie/.pyvirtualenvs/py3.3/lib/python3.3/site-packages/matplotlib/collections.py in span_where(x, ymin, ymax, where, **kwargs)
     871
     872 collection = BrokenBarHCollection(
--> 873 xranges, [ymin, ymax - ymin], **kwargs)
     874 return collection
     875

/home/mowglie/.pyvirtualenvs/py3.3/lib/python3.3/site-packages/matplotlib/collections.py in __init__(self, xranges, yrange, **kwargs)
     851 (xmin + xwidth, ymin),
     852 (xmin, ymin)] for xmin, xwidth in xranges]
--> 853 PolyCollection.__init__(self, verts, **kwargs)
     854
     855 @staticmethod

/home/mowglie/.pyvirtualenvs/py3.3/lib/python3.3/site-packages/matplotlib/collections.py in __init__(self, verts, sizes, closed, **kwargs)
     799 Collection.__init__(self, **kwargs)
     800 self.set_sizes(sizes)
--> 801 self.set_verts(verts, closed)
     802
     803 def set_verts(self, verts, closed=True):

/home/mowglie/.pyvirtualenvs/py3.3/lib/python3.3/site-packages/matplotlib/collections.py in set_verts(self, verts, closed)
     819 codes[0] = mpath.Path.MOVETO
     820 codes[-1] = mpath.Path.CLOSEPOLY
--> 821 self._paths.append(mpath.Path(xy, codes))
     822 else:
     823 self._paths.append(mpath.Path(xy))

/home/mowglie/.pyvirtualenvs/py3.3/lib/python3.3/site-packages/matplotlib/path.py in __init__(self, vertices, codes, _interpolation_steps, closed, readonly)
     135 vertices = vertices.astype(np.float_).filled(np.nan)
     136 else:
--> 137 vertices = np.asarray(vertices, np.float_)
     138
     139 if codes is not None:

/home/mowglie/.pyvirtualenvs/py3.3/lib/python3.3/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
     460
     461 """
--> 462 return array(a, dtype, copy=False, order=order)
     463
     464 def asanyarray(a, dtype=None, order=None):

TypeError: float() argument must be a string or a number

Does the workaround posted here fix things for you?
https://github.com/matplotlib/matplotlib/issues/3727#issuecomment-60899590

···

On Tue, Dec 2, 2014 at 10:43 AM, Fabien <fabien.maussion@…83…287…> wrote:




Sorry, it doesn't.

I updated the test case below (including the workaround; I hope I got it right). The strange thing is that fill_between() works fine, but span_where() is the problem.

Thanks!

···

On 02.12.2014 16:59, Benjamin Root wrote:

Does the workaround posted here fix things for you?
https://github.com/matplotlib/matplotlib/issues/3727#issuecomment-60899590

#-------------------------------------------
import pandas as pd
import numpy as np
from datetime import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.collections as collections
span_where = collections.BrokenBarHCollection.span_where
import matplotlib.units as units

units.registry[np.datetime64] = pd.tseries.converter.DatetimeConverter()

# init the dataframe
time = pd.date_range(pd.datetime(1950,1,1), periods=5, freq='MS')
df = pd.DataFrame(np.arange(5), index=time, columns=['data'])
df['cond'] = df['data'] >= 3

# This works (but it's not what I want)
x = np.arange(5)
fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(x, df['data'], 'k')
c = span_where(x, ymin=0, ymax=4, where=df['cond'], color='green')
ax.add_collection(c)
plt.show()

# This does not work
x = df.index.values
fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(x, df['data'], 'k')
c = span_where(x, ymin=0, ymax=4, where=df['cond'], color='green')
ax.add_collection(c)
plt.show()

#This is producing an error
x = df.index
fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(x, df['data'], 'k')
c = span_where(x, ymin=0, ymax=4, where=df['cond'], color='green')
ax.add_collection(c)
plt.show()
#-------------------------------------------

Ok, then this looks like a legitimate bug in span_where(). It probably isn’t applying units, somehow. This isn’t really a problem with pandas, it is an issue where we aren’t being consistent in applying units for all plotting functions. Could you file a bug report, please?

Cheers!
Ben Root
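
Until span_where() applies units itself, one possible workaround is to do the conversion by hand: turn the dates into floats with matplotlib.dates.date2num, compute the spans, and draw them with Axes.broken_barh. This is only a sketch under assumptions. The helper true_runs is hypothetical (not part of matplotlib) and mimics what span_where() appears to do, namely one (start, width) pair per contiguous run of True values. It has not been tested against the versions used in this thread.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def true_runs(x, where):
    """Hypothetical helper: (start, width) for each contiguous run of True."""
    where = np.asarray(where, dtype=bool)
    # Pad with False so every run has a rising and a falling edge.
    padded = np.concatenate(([False], where, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, stops = edges[::2], edges[1::2]  # stop indices are exclusive
    return [(x[i], x[j - 1] - x[i]) for i, j in zip(starts, stops)]


# Same data as the updated test case in this thread.
time = pd.date_range("1950-01-01", periods=5, freq="MS")
df = pd.DataFrame({"data": np.arange(5)}, index=time)
df["cond"] = df["data"] >= 3

# Apply the date units by hand: datetimes -> float day counts.
x = mdates.date2num(df.index.to_pydatetime())
xranges = true_runs(x, df["cond"].values)

fig, ax = plt.subplots()
ax.plot(x, df["data"].values, "k")
# yrange is (ymin, height), matching ymin=0, ymax=4 in the thread.
bars = ax.broken_barh(xranges, (0, 4), facecolors="green", alpha=0.5)
ax.xaxis_date()  # tell the axis that the float x values are dates
fig.autofmt_xdate()
```

Note that a run containing a single True sample (as with df['data'] == 3 in the first example) has zero width, so nothing visible is drawn for it.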

···

On Tue, Dec 2, 2014 at 11:15 AM, Fabien <fabien.maussion@…287…> wrote:




OK, I just filed a bug report:

My first bug report ever!

···

On 02.12.2014 17:15, Fabien wrote:
