How should I plot clustered data?

David_Frey · November 3, 2010, 5:18am

I am trying to use matplotlib (for the first time) to graph the address space
usage of an application against time. The data is written to a log file by
trace statements throughout the source code of the application. The trace
statements contain the current address space usage as well as a timer value
with millisecond granularity.

My data in the y-axis (address space usage) is fairly uniform (0-2000 MB
values), but my data in the x-axis (the time at which the trace statements
were executed) is highly clustered. For example, I have approximately 150
data points over a 5 minute run, but some of the data points are only 10ms
apart.

I would like to annotate each point on the graph with the line number in the
log file so that the user can look up what was happening at that point. I have
succeeded, but the graph isn't readable because the points overlap so much.
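One way to keep per-point labels readable without giving up the real time axis is to stagger the label offsets whenever neighbouring points are close together. This is a minimal sketch, not a standard recipe. It assumes `times` holds the timer values converted to seconds and sorted, `usage` the matching address-space values in MB, and `line_numbers` the log-file line of each trace statement. `stagger_levels`, `annotate_points`, `min_gap` and `step` are hypothetical names chosen for illustration; the only matplotlib calls used are `Axes.plot` and `Axes.annotate`.

```python
def stagger_levels(times, min_gap, n_levels=4):
    """Assign a vertical "level" to each label so close neighbours don't collide.

    times must be sorted. A point closer than min_gap to the previous point
    moves up one level (wrapping after n_levels); otherwise it drops back to 0.
    """
    levels = []
    level = 0
    prev = None
    for t in times:
        if prev is not None and t - prev < min_gap:
            level = (level + 1) % n_levels
        else:
            level = 0
        levels.append(level)
        prev = t
    return levels


def annotate_points(ax, times, usage, line_numbers, min_gap=0.5, step=10):
    """Plot usage vs. time on an existing matplotlib Axes and label each point
    with its log-file line number, stacking the labels of clustered points."""
    ax.plot(times, usage, "o-", markersize=3)
    levels = stagger_levels(times, min_gap)
    for t, y, line_no, level in zip(times, usage, line_numbers, levels):
        ax.annotate(
            str(line_no),
            (t, y),                        # the data point being labelled
            xytext=(0, 5 + level * step),  # offset in points above the marker
            textcoords="offset points",
            ha="center",
            fontsize=7,
            arrowprops=dict(arrowstyle="-", lw=0.5),  # thin leader line back to the point
        )
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Address space (MB)")
```

For example, `fig, ax = plt.subplots()` followed by `annotate_points(ax, times, usage, line_numbers)`. Points 10 ms apart still share almost the same x position, so this helps most when each cluster holds only a handful of points; denser clusters still need the axis itself to be stretched or split.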

Is there a standard way that people display data like this? I don't really
like the idea of equally spacing all of the points along the x-axis because
you lose the understanding of the timing. One idea I had was to have some
sort of vertical break in the graph at areas where there was a long gap
without a data point, but I have no idea whether it's possible to implement
something like that in matplotlib.
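The vertical-break idea can be approximated in matplotlib by splitting the time axis into side-by-side subplots that share the y axis, one per cluster, and hiding the spines where the panels meet. This is along the lines of the "Broken Axis" example in the matplotlib gallery, applied to the x axis instead of the y axis. A minimal sketch, assuming sorted `times` in seconds and matching `usage` values in MB; `find_clusters`, `plot_broken_x` and the `max_gap` threshold are hypothetical names, and picking a good `max_gap` for your logs is left to you.

```python
def find_clusters(times, max_gap):
    """Split sorted times into clusters wherever consecutive points are more
    than max_gap apart. Returns inclusive (first_index, last_index) pairs."""
    if len(times) == 0:
        return []
    clusters = []
    start = 0
    for i in range(1, len(times)):
        if times[i] - times[i - 1] > max_gap:
            clusters.append((start, i - 1))
            start = i
    clusters.append((start, len(times) - 1))
    return clusters


def plot_broken_x(times, usage, max_gap):
    """Draw one panel per cluster, with panel widths roughly proportional to
    each cluster's duration, so long idle gaps are cut out of the x axis."""
    # Imported here so find_clusters() has no plotting dependency.
    import matplotlib.pyplot as plt

    clusters = find_clusters(times, max_gap)
    spans = [times[j] - times[i] for i, j in clusters]
    longest = max(spans) or 1.0
    # Give every panel a minimum width, even for a single-point cluster.
    widths = [max(s, 0.1 * longest) for s in spans]

    fig, axes = plt.subplots(
        1, len(clusters), sharey=True, squeeze=False,
        gridspec_kw={"width_ratios": widths, "wspace": 0.05},
    )
    axes = axes[0]
    last = len(axes) - 1
    for k, (ax, (i, j)) in enumerate(zip(axes, clusters)):
        ax.plot(times[i:j + 1], usage[i:j + 1], "o-", markersize=3)
        pad = 0.05 * widths[k]  # keeps single-point panels from having zero width
        ax.set_xlim(times[i] - pad, times[j] + pad)
        ax.tick_params(axis="x", labelrotation=45)
        # Hide the touching spines so adjacent panels read as one broken axis.
        if k > 0:
            ax.spines["left"].set_visible(False)
            ax.tick_params(axis="y", left=False)
        if k < last:
            ax.spines["right"].set_visible(False)
    axes[0].set_ylabel("Address space (MB)")
    fig.text(0.5, 0.01, "Time (s)", ha="center")
    return fig
```

For example, `plot_broken_x(times, usage, max_gap=5.0).savefig("usage.png")`. The diagonal break marks from the gallery example are left out to keep the sketch short. Each panel also gets its own x scale, so the tick values, not the horizontal distance between panels, carry the timing information.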

The output format hasn't been strictly specified, so if you have any ideas of
how I can produce a useful graph, I would be happy to hear them.

Thanks,
Dave

_John_Washakie · November 3, 2010, 9:41pm

log Y axis?

Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net

--
Configuration
``````````````````````````
Plone 2.5.3-final,
CMF-1.6.4,
Zope (Zope 2.9.7-final, python 2.4.4, linux2),
Python 2.6
PIL 1.1.6
Mailman 2.1.9
Postfix 2.4.5
Procmail v3.22 2001/09/10
Basemap: 1.0
Matplotlib: 1.0.0

Justin_McCann · November 4, 2010, 5:59pm

You might want to create multiple subplots, with some of the
subplots/axes zoomed in on the main axes. See this example:
http://matplotlib.sourceforge.net/examples/pylab_examples/axes_zoom_effect.html

It looks like the image isn't on the website. You can run the example
on your local machine by saving it from the [source code] link at the
top of the page.

That seems to work well if you know in advance how many zoom areas you
want, or are working with it interactively.

If you want to auto-generate the whole figure, you might want to try
something like this:
  - figure out how many zoom regions you need (e.g., by figuring out
how many clusters you have)
  - use figure.add_subplot() or axes_grid1
(http://matplotlib.sourceforge.net/mpl_toolkits/axes_grid/users/overview.html)
to place all of your separate axes
  - plot the main figure and all of the zoom regions
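The three steps above might look roughly like this when automated. It is a sketch under several assumptions: clusters are found with a simple gap threshold (`max_gap`, in the same units as `times`), each zoom window is widened by a hypothetical `pad`, and instead of the connector patches drawn in the axes_zoom_effect example it just shades each zoomed range on the main axes with `axvspan`. `cluster_windows` and `plot_with_zoom_panels` are illustrative names, and the layout relies on `Figure.add_gridspec`, so it needs a recent matplotlib (3.x).

```python
def cluster_windows(times, max_gap, pad=0.05):
    """Return one (start, end) time window per cluster of sorted times.

    A new cluster starts wherever consecutive times are more than max_gap
    apart; each window is widened by pad on both sides (pad > 0 avoids
    zero-width windows for single-point clusters).
    """
    windows = []
    if len(times) == 0:
        return windows
    start = prev = times[0]
    for t in times[1:]:
        if t - prev > max_gap:
            windows.append((start - pad, prev + pad))
            start = t
        prev = t
    windows.append((start - pad, prev + pad))
    return windows


def plot_with_zoom_panels(times, usage, max_gap, pad=0.05):
    """Full run on top, one zoomed panel per cluster underneath."""
    import matplotlib.pyplot as plt  # kept local so cluster_windows() stays dependency-free

    windows = cluster_windows(times, max_gap, pad)
    fig = plt.figure(figsize=(max(6.0, 2.5 * len(windows)), 6.0))
    gs = fig.add_gridspec(2, len(windows))

    main = fig.add_subplot(gs[0, :])  # top row spans every column
    main.plot(times, usage, "o-", markersize=3)
    main.set_xlabel("Time (s)")
    main.set_ylabel("Address space (MB)")

    for k, (lo, hi) in enumerate(windows):
        main.axvspan(lo, hi, color="tab:orange", alpha=0.3)  # mark the zoomed range
        zoom = fig.add_subplot(gs[1, k], sharey=main)
        zoom.plot(times, usage, "o-", markersize=3)
        zoom.set_xlim(lo, hi)
        zoom.set_title(f"{lo:.2f} to {hi:.2f} s", fontsize=8)
    fig.tight_layout()
    return fig
```

If a run produces too many clusters for one figure, raising `max_gap` merges nearby clusters and keeps the number of zoom panels manageable.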

BTW, it would be nice if the "axes_zoom_effect" image could be added to the
gallery; it's the example I was thinking of in the "Vlines across multiple
subplots" thread:
http://permalink.gmane.org/gmane.comp.python.matplotlib.general/24999

Hope that helps,
Justin
