example data in example code

John Hunter <jdh2358@...149...> writes:

    # TODO: how to handle stale data in the cache that has been
    # updated from svn -- is there a clean http way to get the current
    # revision number that will not leave us at the mercy of html
    # changes at sf?

The mod_dav_svn server sends an ETag header that happens to contain the
revision number where the file was last modified, and a Last-Modified
header that contains the date of that revision. The clean http way to
make use of these is to make a conditional request - I hacked up a
processor class for urllib2 that does this, and checked it in.
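
For readers following along, a conditional request of this kind can be sketched roughly as below. This is not the class that was checked in: it is a minimal illustration written against Python 3's urllib.request (the successor of urllib2), and the name ConditionalRequestProcessor and the in-memory validators dict are assumptions made for the example.

```python
import urllib.request


class ConditionalRequestProcessor(urllib.request.BaseHandler):
    """Illustrative sketch only; names and storage are assumptions."""

    def __init__(self):
        # url -> (etag, last_modified), remembered from earlier 200 responses
        self.validators = {}

    def http_request(self, req):
        # On the way out: attach the validators we remember for this URL, if any.
        etag, last_modified = self.validators.get(req.get_full_url(), (None, None))
        if etag:
            req.add_header("If-None-Match", etag)
        if last_modified:
            req.add_header("If-Modified-Since", last_modified)
        return req

    def http_response(self, req, response):
        # On the way in: remember the validators of a full (200) response.
        if response.status == 200:
            self.validators[req.get_full_url()] = (
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
        return response

    def http_error_304(self, req, fp, code, msg, hdrs):
        # Hand the 304 response back instead of letting urllib raise HTTPError,
        # so the caller can see "not modified" and reuse its cached copy.
        return fp

    https_request = http_request
    https_response = http_response
```

It would be installed with urllib.request.build_opener(ConditionalRequestProcessor()); a response whose status is 304 then means the cached file is still current.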

···

--
Jouni K. Seppänen
http://www.iki.fi/jks

Wow, that is really clever and cool. Nicely done. I added
mpl_data/testdata.csv, which is easier to modify than lena.png, to test
the revision control, and it worked beautifully
(examples/misc/mpl_data_test.py).

I didn't understand this part of the code:

        fn = rightmost
        while os.path.exists(self.in_cache_dir(fn)):
            fn = rightmost + '.' + str(random.randint(0,9999999))

When would there be a name clash that would require the randint to be appended?

Also, how hard would it be to add support for a directory structure?
I see you are getting the filename from the url as the last thing past
the '/'. Is there any way to generalize this so a relative path could
be supported in the svn repo and local cache dir?
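
To make the two questions above concrete, here is a small sketch. It is a guess at one way a clash could arise, not something the thread confirms: if only the last URL component is used as the cache name, two files in different repo directories map to the same name, whereas keeping the path relative to the repo root keeps them apart. The helper names rightmost_name and relative_name and the example.invalid URLs are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def rightmost_name(url):
    """Cache name taken from the last path component only."""
    return posixpath.basename(urlsplit(url).path)


def relative_name(url, baseurl):
    """Cache name that keeps the path relative to baseurl."""
    base = urlsplit(baseurl).path.rstrip("/") + "/"
    path = urlsplit(url).path
    if not path.startswith(base):
        raise ValueError("url is not below baseurl")
    return path[len(base):]


base = "https://example.invalid/repo/sample_data"
a = base + "/testdata.csv"
b = base + "/testdir/subdir/testdata.csv"

# Both URLs collapse to the same flat name ...
print(rightmost_name(a), rightmost_name(b))
# ... but stay distinct when the relative path is kept.
print(relative_name(a, base), relative_name(b, base))
```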

JDH

···

On Tue, Aug 4, 2009 at 2:45 PM, Jouni K. Seppänen<jks@...278...> wrote:

The mod_dav_svn server sends an ETag header that happens to contain the
revision number where the file was last modified, and a Last-Modified
header that contains the date of that revision. The clean http way to
make use of these is to make a conditional request - I hacked up a
processor class for urllib2 that does this, and checked it in.

Also, it would be preferable for the returned file object to support
the "seek" method. This is what cbook.to_filehandle checks for, and
what mlab.csv2rec uses to rewind the file after an introspection pass
over the data to determine the data types. E.g.,

import matplotlib.mlab as mlab
import matplotlib.cbook as cbook
r = mlab.csv2rec( cbook.get_mpl_data('testdata.csv') )

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jdhunter/dev/lib/python2.6/site-packages/matplotlib/mlab.py",
line 2108, in csv2rec
    fh = cbook.to_filehandle(fname)
  File "/Users/jdhunter/dev/lib/python2.6/site-packages/matplotlib/cbook.py",
line 339, in to_filehandle
    raise ValueError('fname must be a string or file handle')
ValueError: fname must be a string or file handle

Perhaps we could return a plain file handle pointing to the cached data?

JDH

···

Another option is to use StringIO to create a new file-like object after read()-ing in all the data.
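
A minimal sketch of that idea, assuming Python 3, where io.BytesIO plays the role StringIO played for byte data in Python 2. The helper name to_seekable is made up for the example.

```python
import io


def to_seekable(response):
    """Read everything from a read()-only object into a seekable buffer."""
    return io.BytesIO(response.read())


class ReadOnlySource:
    """Stand-in for a network response that has read() but no seek()."""

    def __init__(self, data):
        self._data = data

    def read(self):
        data, self._data = self._data, b""
        return data


fh = to_seekable(ReadOnlySource(b"a,b\n1,2\n"))
first_pass = fh.read()   # e.g. an introspection pass over the data
fh.seek(0)               # rewind, as csv2rec needs to
second_pass = fh.read()
```

The trade-off is that the whole file is held in memory, which a plain file handle on the cached copy avoids.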

Ryan

Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

OK, I've made a few changes to the code, so Jouni, you will probably
want to review them:

* I renamed the svn repo and function to be "sample_data" rather than
"mpl_data" to avoid confusion with lib/matplotlib/mpl-data. The svn
repo, the examples and the cbook function have all been renamed. The
repo is ::

    svn co https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/sample_data

  and the examples are::

   johnh@...749...:mpl> ls examples/misc/sam*.py
   examples/misc/sample_data_demo.py examples/misc/sample_data_test.py

* I added support for nested subdirs, so you can now do, as in
examples/misc/sample_data_test.py::

    datafile = 'testdir/subdir/testsub.csv'
    fh = cbook.get_sample_data(datafile)

* I commented out the random number appending, because I do not see
the use case, but we can re-add it when you enlighten me :-)

* I always return a file handle to the cached file, so seek works, and
is exercised in examples/misc/sample_data_test.py

It is probably worth doing a little more work to make the processor
plus the "get_sample_data" function all part of one class, so other
people can reuse it with other repos and other dirs. Eg, something
like the following in cbook::

  myserver = ViewVCCacheServer(mycachedir, myurlbase)
  get_sample_data = myserver.get_sample_data
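
For illustration, such a class might look roughly like the sketch below. Only the names ViewVCCacheServer, get_sample_data and in_cache_dir come from this thread; the body, the fetch parameter (a hook so the download step can be swapped out) and the Python 3 urllib.request call are assumptions, and the revalidation against the server discussed earlier is left out for brevity.

```python
import os
import urllib.request


class ViewVCCacheServer:
    """Sketch: cache files from baseurl under cache_dir, keeping subdirs."""

    def __init__(self, cache_dir, baseurl, fetch=None):
        self.cache_dir = cache_dir
        self.baseurl = baseurl.rstrip("/") + "/"
        # fetch(url) -> bytes; hypothetical hook, defaults to a plain download
        self._fetch = fetch or self._fetch_url

    @staticmethod
    def _fetch_url(url):
        with urllib.request.urlopen(url) as response:
            return response.read()

    def in_cache_dir(self, relpath):
        # 'testdir/subdir/testsub.csv' -> <cache_dir>/testdir/subdir/testsub.csv
        return os.path.join(self.cache_dir, *relpath.split("/"))

    def get_sample_data(self, relpath):
        cached = self.in_cache_dir(relpath)
        if not os.path.exists(cached):
            # Create nested cache directories on demand, then store the data.
            os.makedirs(os.path.dirname(cached), exist_ok=True)
            data = self._fetch(self.baseurl + relpath)
            with open(cached, "wb") as out:
                out.write(data)
        # Always return a plain file handle on the cached copy, so seek() works.
        return open(cached, "rb")
```

Because the caller always gets an ordinary file object opened on the cached file, rewinding with seek(0) works regardless of how the data was downloaded.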

···

On Wed, Aug 5, 2009 at 7:11 AM, John Hunter<jdh2358@...149...> wrote:

Perhaps we could return a plain file handle pointing to the cached data?