Print statistics for time series data whenever changing zoom level

I love to use matplotlib's zoom button when I analyze time series data from a pandas DataFrame.
Is there a way to print statistics like dataframe.describe() for only the data currently displayed in the matplotlib chart, whenever I change the zoom level?

This Stack Overflow answer, edited by @Ernest, describes the underlying mechanism: on every update of the x-axis (presuming that's where you're plotting time), reindex into the dataframe and print describe(). Something like:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Some toy data
df = pd.DataFrame(index=pd.date_range(start='1/1/2019', end='12/11/2019'))
df['data'] = np.sin(np.linspace(-10*np.pi, 10*np.pi, len(df.index)))

# Line plot (df.plot draws lines by default)
fig, ax = plt.subplots(1, 1)
df.plot(ax=ax)

# Declare and register the callback
def on_xlims_change(axes):
    # Matplotlib passes in the Axes whose x-limits just changed
    start, end = axes.get_xlim()
    axes.set_title(f"{start}-{end}")


# Only the x-limits matter here; no y-limit callback is defined or needed
ax.callbacks.connect('xlim_changed', on_xlims_change)

The new limits are then used to select the matching rows of the dataframe, and describe() is printed for just that subset.
Another example of doing this sort of thing is https://matplotlib.org/3.1.1/gallery/event_handling/resample.html?highlight=xlim_changed
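
To make the "select the matching rows" step concrete, here is a minimal sketch. The helper name describe_visible is made up for illustration, and it assumes the x-limits are pandas period ordinals for a daily frequency, which is what I'd expect when df.plot() draws a regularly spaced DatetimeIndex; check your own axis values before relying on that.

```python
import numpy as np
import pandas as pd


def describe_visible(df, start, end, freq="D"):
    """Describe the rows of df whose index lies inside [start, end].

    Assumption: start and end are pandas period ordinals for `freq`
    (hypothetical helper, not part of pandas or matplotlib).
    """
    # Express every index entry in the same units as the axis limits
    ordinals = np.array([p.ordinal for p in df.index.to_period(freq)])
    visible = df[(ordinals >= start) & (ordinals <= end)]
    return visible.describe()


# Same toy data as above
df = pd.DataFrame(index=pd.date_range(start="1/1/2019", end="12/11/2019"))
df["data"] = np.sin(np.linspace(-10 * np.pi, 10 * np.pi, len(df.index)))

# Pretend the plot was zoomed to February 2019
start = pd.Period("2019-02-01", freq="D").ordinal
end = pd.Period("2019-02-28", freq="D").ordinal
print(describe_visible(df, start, end))
```

Inside on_xlims_change you could then call print(describe_visible(df, *axes.get_xlim())).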


Made a fully worked out example:

%matplotlib widget

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import pandas.plotting._matplotlib.converter as pdtc
import numpy as np
import itertools

df = pd.DataFrame(index=pd.date_range(start='1/1/2019', end='12/11/2019', freq='D'))
df['inverseD'] = [pdtc.get_datevalue(date, 'D') for date in df.index]
x = np.linspace(-10*np.pi, 10*np.pi, len(df.index))
df['sin'] = np.sin(x)
df['cos'] = np.cos(x)

# line plot
fig, (ax) = plt.subplots(constrained_layout=True)
df[['sin', 'cos']].plot(ax=ax)
stats = df.describe().T
nrows, ncols = stats.shape

# Round to 2 decimal places (a negative value would round to the nearest hundred)
table = ax.table(np.around(stats.values, 2), colLabels=stats.columns.to_list(),
          rowLabels=stats.index.to_list(), loc='top', fontsize=12)

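# Declare and register callbacks
def on_xlims_change(axes):
    start, end = axes.get_xlim()
    # Keep only the rows whose axis value lies inside the visible x-range
    sub = df[(start <= df['inverseD']) & (df['inverseD'] <= end)]
    stats = sub.describe().T
    # Table row 0 holds the column labels, so data row r lives in table row r+1;
    # data columns start at 0 (the row labels sit in column -1)
    for r, c in itertools.product(range(nrows), range(ncols)):
        table[r+1, c].get_text().set_text(f'{stats.values[r, c]:.2f}')

ax.callbacks.connect('xlim_changed', on_xlims_change)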

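The inverseD column is there to do the reverse lookup on zoom: the x-limits come back in the converted axis units, not as dates, because plotting dates goes through the unit conversion machinery. You can find out which converter is used for your time series data by looking at ax.xaxis.converter (h/t @jklymak).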
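
If you'd rather not import from a private pandas module, the same round trip can be sketched with the public pd.Period API. This assumes the axis units are period ordinals at daily frequency, which is my understanding of what get_datevalue(date, 'D') returns; the two function names are made up for this example.

```python
import pandas as pd


def date_to_axis_value(date, freq="D"):
    """Date -> period ordinal (the assumed axis unit for this kind of plot)."""
    return pd.Period(date, freq=freq).ordinal


def axis_value_to_date(value, freq="D"):
    """Axis value -> timestamp at the start of the nearest period."""
    return pd.Period(ordinal=int(round(value)), freq=freq).to_timestamp()


value = date_to_axis_value("2019-03-15")
print(value, axis_value_to_date(value))
```

With this you can turn the output of ax.get_xlim() back into dates and slice the dataframe by its index, instead of keeping a separate lookup column.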

Also, an interactive version that may work: https://mybinder.org/v2/gh/story645/scraps/master


Wow you are so kind. It is wonderful to print all the stats in the chart. Thank you so much @story645.

I also made my own solution. It prints the stats to the console, so I get a history, and it only prints when I press the button. This way, I can collect exactly the stats I want to see.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.widgets import Button


df = pd.DataFrame()

fig, ax = plt.subplots()
plt.subplots_adjust(bottom=0.2)
df['time'] = np.arange(0.0, 1.0, 0.001)
df['data'] = np.sin(2*np.pi*df['time'])
df.plot.scatter(x='time', y='data', s=1, ax=ax)


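def print_stats(event):
    # Read the currently visible x-range and keep only the rows inside it
    start, end = ax.get_xlim()
    cond1 = df['time'] > start
    cond2 = df['time'] < end
    df_tmp = df[cond1 & cond2]
    print(df_tmp.describe())

# Place the button in its own small axes below the plot
b_loc_size = fig.add_axes([0.81, 0.05, 0.1, 0.075])
# Keep a reference to the button so it stays responsive
b = Button(b_loc_size, 'Stats', hovercolor='green')
b.on_clicked(print_stats)
plt.show()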
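
If you want the history as data rather than console output, one option is to append each result to a list. This is only a sketch: history and record_stats are names invented here, and the filtering is the same strict comparison as in print_stats.

```python
import pandas as pd

history = []  # hypothetical container, one entry per button press


def record_stats(df, start, end, column="time"):
    """Describe the rows with start < df[column] < end and remember the result."""
    visible = df[(df[column] > start) & (df[column] < end)]
    stats = visible.describe()
    history.append({"start": start, "end": end, "stats": stats})
    return stats


# Tiny stand-in for the real data
toy = pd.DataFrame({"time": [0.0, 1.0, 2.0, 3.0, 4.0],
                    "data": [10.0, 11.0, 12.0, 13.0, 14.0]})
print(record_stats(toy, 0.5, 3.5))
```

In the button callback this would be called as record_stats(df, *ax.get_xlim()), and the collected entries can be compared after the session.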

No problem, I thought it was a really smart idea. I think your approach is also super sensible; love the button save. :smile:
