Experiments in removing/replacing PyCXX

but some of that complexity could be reduced by using Numpy arrays in place of the
image buffer types that each of them contain

OR Cython arrays and/or memoryviews -- this is indeed a real strength of Cython.

Sure, but when we return to Python, they should be Numpy arrays which
have more methods etc. -- or am I missing something?

Cython makes it really easy to switch between ndarrays and
memoryviews, etc -- it's a question of what you want to work with in
your code, so you can write a function that takes numpy arrays and
returns numpy arrays, but uses a memoryview internally (and passes to
C code that way). But I'm not an expert on this; I've found that I'm
either doing simple stuff where using numpy arrays directly works fine,
or passing the pointer to the data array off to C:

def a_function_to_call_C( cnp.ndarray[double, ndim=2, mode="c"] in_array ):
    """
    calls the_c_function, altering the array in-place
    """
    cdef int m, n
    m = in_array.shape[0]
    n = in_array.shape[1]
    the_c_function( &in_array[0, 0], m, n )
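The "accept the high-level object, work through a view internally" pattern can also be sketched in pure Python, using the builtin memoryview in place of a Cython typed memoryview (the function name and the doubling operation are just illustrative):

```python
def double_inplace(buf: bytearray) -> bytearray:
    # Take the high-level object, but do the element-wise work through
    # a memoryview -- the same shape as a Cython function that accepts
    # an ndarray and uses a typed memoryview internally before handing
    # a pointer off to C.
    view = memoryview(buf)
    for i in range(len(view)):
        view[i] = (view[i] * 2) % 256
    return buf
```

Like the Cython version above, the caller's buffer is altered in place.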

It does support the C99 fixed-width integer types:
from libc.stdint cimport int16_t, int32_t

The problem is that Cython can't actually read the C header,

yeah, this is a pity. There has been some work on auto-generating
Cython from C headers, though nothing mature. For my work, I've been
considering writing some simple pxd-generating code, just to make sure
my data types are in line with the C++ as it may change.

so there
are types in libpng, for example, that we don't actually know the size
of. They are different on different platforms. In C, you just include
the header. In Cython, I'd have to determine the size of the types in a
pre-compilation step, or manually determine their sizes and hard code
them for the platforms we care about.

yeah -- this is a tricky problem, however, I think you can follow what
you'd do in C -- i.e. presumably the headers define their own data
types: png_short or whatever. The actual definition is filled in by
the pre-processor. So I wonder if you can declare those types in
Cython, then have it write C code that uses those types, and it all
gets cleared up at compile time -- maybe. The key is that when you
declare stuff in Cython, that declaration is used to determine how to
write the C code; I don't think the declarations themselves are
translated.

It would at least make this a more fair comparison to have the Cython
code as Cythonic as possible. However, I couldn't find any ways around
using these particular APIs -- other than the Numpy stuff which probably
does have a more elegant solution in the form of Cython arrays and
memory views.

yup -- that's what I noticed right away -- I'm not sure if there is
easier handling of file handles.

True. We do have two categories of stuff using PyCXX in matplotlib:
things that (primarily) wrap third-party C/C++ libraries, and things
that are actually doing algorithmic heavy lifting. It's quite possible
we don't want the same solution for all.

And I'm not sure the wrappers all need to be written the same way, either.

-Chris

···

On Mon, Dec 3, 2012 at 11:59 AM, Michael Droettboom <mdroe@...31...> wrote:
--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@...236...

so there
are types in libpng, for example, that we don't actually know the size
of. They are different on different platforms. In C, you just include
the header. In Cython, I'd have to determine the size of the types in a
pre-compilation step, or manually determine their sizes and hard code
them for the platforms we care about.

yeah -- this is a tricky problem, however, I think you can follow what
you'd do in C -- i.e. presumably the headers define their own data
types: png_short or whatever. The actual definition is filled in by
the pre-processor. So I wonder if you can declare those types in
Cython, then have it write C code that uses those types, and it all
gets cleared up at compile time -- maybe. The key is that when you
declare stuff in Cython, that declaration is used to determine how to
write the C code; I don't think the declarations themselves are
translated.

Yeah, this isn't an issue in Cython, it's a totally standard thing
(though perhaps not well documented). When you write

  cdef extern from "png.h":
      ctypedef int png_short

or whatever, what you are saying is "the C compiler knows about a type
called png_short, which acts in an int-like fashion, so Cython, please
use your int rules when dealing with it". So this means that Cython
will know that if you return a png_short from a python function, it
should insert a call to PyInt_FromLong (or maybe PyInt_FromSsize_t? --
cython worries about these things so I don't have to). But Cython only
takes care of the Python<->C interface. It will leave the C compiler
to actually allocate the appropriate memory for png_shorts, perform C
arithmetic, coerce a png_short into a 'long' when necessary, etc.

It's kind of mind-bending to wrap your head around, and it definitely
does help to spend some time reading the C code that Cython spits out
to understand how the mapping works (it's both more and less magic
than it looks -- Python stuff gets carefully expanded, C stuff goes
through almost verbatim), but the end result works amazingly well.

It would at least make this a more fair comparison to have the Cython
code as Cythonic as possible. However, I couldn't find any ways around
using these particular APIs -- other than the Numpy stuff which probably
does have a more elegant solution in the form of Cython arrays and
memory views.

yup -- that's what I noticed right away -- I'm not sure if there is
easier handling of file handles.

For the file handle, I would just write

  cdef FILE *fp = fdopen(file_obj.fileno(), "w")

and be done with it. This will work with any version of Python etc.
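One caveat worth noting with that approach, sketched here in pure Python with os.fdopen standing in for the C-level fdopen(): the Python-side buffers should be flushed before handing the descriptor to stdio, and duplicating the descriptor keeps the stdio stream from closing the caller's file object out from under it.

```python
import os
import tempfile

def write_via_fd(file_obj, data: bytes) -> None:
    # Flush Python-level buffers first, or the C side's writes may
    # land before buffered Python data and interleave badly.
    file_obj.flush()
    # Duplicate the descriptor so closing the stdio-like stream does
    # not close the caller's Python file object.
    fd = os.dup(file_obj.fileno())
    with os.fdopen(fd, "wb") as fp:  # stands in for C fdopen()
        fp.write(data)

with tempfile.TemporaryFile() as f:
    f.write(b"header:")          # buffered at the Python level
    write_via_fd(f, b"payload")  # appended at the descriptor level
    f.seek(0)
    result = f.read()            # b"header:payload"
```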

-n

···

On Mon, Dec 3, 2012 at 8:24 PM, Chris Barker - NOAA Federal <chris.barker@...236...> wrote:

On Mon, Dec 3, 2012 at 11:59 AM, Michael Droettboom <mdroe@...31...> wrote:

OK -- so I poked at it, and this is my (very untested) version of
write_png (I left out the py3 stuff, though it does look like it may
be required for file handling...)

Letting Cython unpack the numpy array is the real win. Maybe having it
this simple won't work for MPL, but this is what my code tends to look
like.

def write_png(cnp.ndarray[cnp.uint32_t, ndim=2, mode="c"] buff not None,
              file_obj,
              double dpi=0.0):

    cdef png_uint_32 width = buff.shape[0]
    cdef png_uint_32 height = buff.shape[1]
    cdef FILE *fp

    if PyFile_CheckExact(file_obj):
        fp = PyFile_AsFile(file_obj)
        write_png_c(&buff[0, 0], width, height, fp,
                    NULL, NULL, NULL, dpi)
        return
    else:
        raise TypeError("write_png only works with real PyFileObject")

NOTE: that could be:

cnp.ndarray[cnp.uint8_t, ndim=3, mode="c"]

I'm not sure how MPL stores image buffers.

or you could accept any object, then call:

np.view()

-Chris

···

On Mon, Dec 3, 2012 at 12:24 PM, Chris Barker - NOAA Federal <chris.barker@...236...> wrote:

but some of that complexity could be reduced by using Numpy arrays in place

It would at least make this a more fair comparison to have the Cython
code as Cythonic as possible. However, I couldn't find any ways around
using these particular APIs -- other than the Numpy stuff which probably
does have a more elegant solution in the form of Cython arrays and
memory views.

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@...236...

Yeah, this is a general problem with the Python file API, trying to
hook it up to stdio is not at all an easy thing. A better version of
this code would skip that altogether like:

cdef void write_to_pyfile(png_structp s, png_bytep data, png_size_t count):
    fobj = <object>png_get_io_ptr(s)
    pydata = PyString_FromStringAndSize(data, count)
    fobj.write(pydata)

cdef void flush_pyfile(png_structp s):
    # Not sure if this is even needed
    fobj = <object>png_get_io_ptr(s)
    fobj.flush()

# in write_png:
write_png_c(<png_byte*>pix_buffer, width, height,
  NULL, <void*>file_obj, write_to_pyfile, flush_pyfile, dpi)

But this is a separate issue :-) (and needs further fiddling to make
exception handling work).

Or if you're only going to work on real OS-level file objects anyway,
you might as well just accept a filename as a string and fopen() it
locally. Having Python do the fopen just makes your life harder for no
reason.

-n

···

On Mon, Dec 3, 2012 at 11:50 PM, Chris Barker - NOAA Federal <chris.barker@...236...> wrote:

On Mon, Dec 3, 2012 at 2:21 PM, Nathaniel Smith <njs@...503...> wrote:

For the file handle, I would just write

  cdef FILE *fp = fdopen(file_obj.fileno(), "w")

and be done with it. This will work with any version of Python etc.

yeah, that makes sense -- though what if you want to be able to
read_to/write_from a file that is already open, and in the middle of
the file somewhere -- would that work?

I just posted a question to the Cython list, and indeed, it looks like
there is no easy answer to the file issue.

Good point -- not at all Cython-specific, but do you need libpng (or
whatever) to write to the file? can you just get a buffer with the
encoded data and write it on the Python side? Particularly if the user
wants to pass in an open file object. This might be a better API for
folks that might want to stream an image right through a web app, too.

As a lot of Python APIs take either a file name or a file-like object,
perhaps it would make sense to push that distinction down to the
Cython level:
  -- if it's a filename, open it with raw C
  -- if it's a file-like object, have libpng write to a buffer (bytes
object) , and pass that to the file-like object in Python
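That filename-or-file-like split can be sketched in plain Python (the function name is hypothetical, and the `encoded` bytes stand in for whatever libpng would produce):

```python
import io

def write_png(output, encoded: bytes) -> None:
    # Hypothetical dispatcher for the two cases discussed above.
    if isinstance(output, str):
        # filename: open it ourselves and take the fast path
        # (in the real code this would be the raw C route)
        with open(output, "wb") as fp:
            fp.write(encoded)
    elif hasattr(output, "write"):
        # file-like object: hand the encoded buffer to its write()
        output.write(encoded)
    else:
        raise TypeError("need a filename or a file-like object")
```

A BytesIO, an open file, or a path string would all work through the same entry point, which matches the API shape of many stdlib functions.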

anyway, not really a Cython issue, but that second option sure would
be easy in Cython....

-Chris

···

On Mon, Dec 3, 2012 at 4:16 PM, Nathaniel Smith <njs@...503...> wrote:

Yeah, this is a general problem with the Python file API, trying to
hook it up to stdio is not at all an easy thing. A better version of
this code would skip that altogether like:

cdef void write_to_pyfile(png_structp s, png_bytep data, png_size_t count):
    fobj = <object>png_get_io_ptr(s)
    pydata = PyString_FromStringAndSize(data, count)
    fobj.write(pydata)

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@...236...

Not entirely relevant to the PyCXX discussion, but to avoid misleading others reading this discussion, I must strongly disagree with your assertion about Cython's usefulness for wrapping C libraries or small chunks of C. I think this has always been a primary function of Cython and Pyrex, as far back as I have been aware of them.

I wrote the raw interface to our contouring code, and I have written cython interfaces to various chunks of C outside of mpl; and cython makes it much easier for a non-professional programmer such as myself.

So I am not arguing that Cython should be the choice for removing PyCXX, but for non-wizards, it can work very well as glue. It is much more approachable than any alternative of which I am aware. For Fortran, of course, f2py plays this glue code generation role.

Eric

···

On 2012/12/03 4:54 AM, Michael Droettboom wrote:

I think Cython is well suited to writing new algorithmic code to speed
up hot spots in Python code. I don't think it's as well suited as glue
between C and Python -- that was not a main goal of the original Pyrex
project, IIRC. It feels kind of tacked on and not a very good fit to
the problem.

The buffer comes in both ways, so the latter solution seems like the thing to do.

Thanks for working this through. This sort of thing is very helpful.

We can also, of course, maintain the existing code that allows writing to an arbitrary file-like object, but this fast path (where it is a "real" file) is very important. It's significantly faster than calling methods on Python objects.

Mike

···

On 12/03/2012 07:00 PM, Chris Barker - NOAA Federal wrote:

On Mon, Dec 3, 2012 at 12:24 PM, Chris Barker - NOAA Federal > <chris.barker@...236...> wrote:

but some of that complexity could be reduced by using Numpy arrays in place

It would at least make this a more fair comparison to have the Cython
code as Cythonic as possible. However, I couldn't find any ways around
using these particular APIs -- other than the Numpy stuff which probably
does have a more elegant solution in the form of Cython arrays and
memory views.

OK -- so I poked at it, and this is my (very untested) version of
write_png (I left out the py3 stuff, though it does look like it may
be required for file handling...)

Letting Cython unpack the numpy array is the real win. Maybe having it
this simple won't work for MPL, but this is what my code tends to look
like.

def write_png(cnp.ndarray[cnp.uint32_t, ndim=2, mode="c"] buff not None,
              file_obj,
              double dpi=0.0):

    cdef png_uint_32 width = buff.shape[0]
    cdef png_uint_32 height = buff.shape[1]
    cdef FILE *fp

    if PyFile_CheckExact(file_obj):
        fp = PyFile_AsFile(file_obj)
        write_png_c(&buff[0, 0], width, height, fp,
                    NULL, NULL, NULL, dpi)
        return
    else:
        raise TypeError("write_png only works with real PyFileObject")

NOTE: that could be:

cnp.ndarray[cnp.uint8_t, ndim=3, mode="c"]

I'm not sure how MPL stores image buffers.

or you could accept any object, then call:

np.view()

For the file handle, I would just write

   cdef FILE *fp = fdopen(file_obj.fileno(), "w")

and be done with it. This will work with any version of Python etc.

yeah, that makes sense -- though what if you want to be able to
read_to/write_from a file that is already open, and in the middle of
the file somewhere -- would that work?

I just posted a question to the Cython list, and indeed, it looks like
there is no easy answer to the file issue.

Yeah, this is a general problem with the Python file API, trying to
hook it up to stdio is not at all an easy thing. A better version of
this code would skip that altogether like:

cdef void write_to_pyfile(png_structp s, png_bytep data, png_size_t count):
     fobj = <object>png_get_io_ptr(s)
     pydata = PyString_FromStringAndSize(data, count)
     fobj.write(pydata)

cdef void flush_pyfile(png_structp s):
     # Not sure if this is even needed
     fobj = <object>png_get_io_ptr(s)
     fobj.flush()

# in write_png:
write_png_c(<png_byte*>pix_buffer, width, height,
   NULL, <void*>file_obj, write_to_pyfile, flush_pyfile, dpi)

This is what my original version already does in the event that the file_obj is not a "real" file. In practice, you need to support both methods -- the callback approach is many times slower than writing directly to a regular old FILE object, because there is overhead both at the libpng and Python level, and there's no way to select a good buffer size.

But this is a separate issue :-) (and needs further fiddling to make
exception handling work).

Or if you're only going to work on real OS-level file objects anyway,
you might as well just accept a filename as a string and fopen() it
locally. Having Python do the fopen just makes your life harder for no
reason.

There's actually a very good reason. It is difficult to deal with Unicode in file paths from C in a portable way. On Windows, for example, if the user's name contains non-ascii characters, you can't write to the home directory using fopen, etc. It's doable with some care by using platform-specific C APIs etc., but CPython has already done all of the hard work for us, so it's easiest just to leverage that by opening the file from Python.
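A small illustration of that point (the filename here is just hypothetical): Python's open() deals with non-ASCII paths portably, and the resulting descriptor can then be handed down to C, whereas the plain C fopen() route would need platform-specific wide-char APIs on Windows.

```python
import os
import tempfile

# Let Python open the non-ASCII path; the OS-level descriptor could
# then be passed to C code, sidestepping fopen()'s encoding problems.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "m\u00fcnchen.png")
with open(path, "wb") as f:
    f.write(b"\x89PNG\r\n\x1a\n")  # 8-byte PNG signature
written = os.path.getsize(path)
```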

Mike

···

On 12/03/2012 07:16 PM, Nathaniel Smith wrote:

On Mon, Dec 3, 2012 at 11:50 PM, Chris Barker - NOAA Federal > <chris.barker@...236...> wrote:

On Mon, Dec 3, 2012 at 2:21 PM, Nathaniel Smith <njs@...503...> wrote:

Yeah, this is a general problem with the Python file API, trying to
hook it up to stdio is not at all an easy thing. A better version of
this code would skip that altogether like:

cdef void write_to_pyfile(png_structp s, png_bytep data, png_size_t count):
     fobj = <object>png_get_io_ptr(s)
     pydata = PyString_FromStringAndSize(data, count)
     fobj.write(pydata)

Good point -- not at all Cython-specific, but do you need libpng (or
whatever) to write to the file? can you just get a buffer with the
encoded data and write it on the Python side? Particularly if the user
wants to pass in an open file object. This might be a better API for
folks that might want to stream an image right through a web app, too.

You need to support both: raw C FILE objects for speed, and writing to a Python file-like object for flexibility. The code in master already does this (albeit with PyCXX), and the code on my "No CXX" branch does this as well with Cython.

As a lot of Python APIs take either a file name or a file-like object,
perhaps it would make sense to push that distinction down to the
Cython level:
   -- if it's a filename, open it with raw C

Unfortunately, as stated in detail in my last e-mail, that doesn't work with Unicode paths.

   -- if it's a file-like object, have libpng write to a buffer (bytes
object) , and pass that to the file-like object in Python

libpng does one better and allows us to stream directly to a callback which can then write to a Python object. This prevents double allocation of memory.

anyway, not really a Cython issue, but that second option sure would
be easy in Cython....

Yeah -- once I figured out how to make a real C callback function from Cython, the contents of the callback function itself is pretty easy to write.

Mike

···

On 12/03/2012 08:01 PM, Chris Barker - NOAA Federal wrote:

On Mon, Dec 3, 2012 at 4:16 PM, Nathaniel Smith <njs@...503...> wrote:

Also -- this feedback is really helpful when writing some comments in the wrappers as to why certain things are the way they are... I'll make sure to include rationales for raw file fast path and the need to open the files on the Python side.

Mike

···

On 12/04/2012 08:45 AM, Michael Droettboom wrote:

On 12/03/2012 08:01 PM, Chris Barker - NOAA Federal wrote:

On Mon, Dec 3, 2012 at 4:16 PM, Nathaniel Smith <njs@...503...> wrote:

Yeah, this is a general problem with the Python file API, trying to
hook it up to stdio is not at all an easy thing. A better version of
this code would skip that altogether like:

cdef void write_to_pyfile(png_structp s, png_bytep data, png_size_t count):
      fobj = <object>png_get_io_ptr(s)
      pydata = PyString_FromStringAndSize(data, count)
      fobj.write(pydata)

Good point -- not at all Cython-specific, but do you need libpng (or
whatever) to write to the file? can you just get a buffer with the
encoded data and write it on the Python side? Particularly if the user
wants to pass in an open file object. This might be a better API for
folks that might want stream an image right through a web app, too.

You need to support both: raw C FILE objects for speed, and writing to a
Python file-like object for flexibility. The code in master already
does this (albeit with PyCXX), and the code on my "No CXX" branch does
this as well with Cython.

As a lot of Python APIs take either a file name or a file-like object,
perhaps it would make sense to push that distinction down to the
Cython level:
    -- if it's a filename, open it with raw C

Unfortunately, as stated in detail in my last e-mail, that doesn't work
with Unicode paths.

    -- if it's a file-like object, have libpng write to a buffer (bytes
object) , and pass that to the file-like object in Python

libpng does one better and allows us to stream directly to a callback
which can then write to a Python object. This prevents double
allocation of memory.

anyway, not really a Cython issue, but that second option sure would
be easy in Cython....

Yeah -- once I figured out how to make a real C callback function from
Cython, the contents of the callback function itself is pretty easy to
write.

Mike

_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
matplotlib-devel List Signup and Options

As far as I'm concerned, this is an argument against Cython.

I've had to touch the C/C++/ObjC codebase. It was not automatically
generated by Cython and it's not that hard to read. There's almost
certainly a C/C++/ObjC expert around to help out. There's almost
certainly Cython experts to help out, too. There is almost certainly
*not* an expert in Cython-generated C code that is hard to read.

I vote raw Python/C API. Managing reference counters is not the
mundane task pythonistas make it out to be, in my opinion. If you know
ObjC, you've had to do your own reference counting. If you know C,
you've had to do your own memory management. If you know C++, you've
had to do your own new/delete (or destructor) management. I agree not
having to worry about reference counting is a nice positive, but I don't
think it outweighs the negatives.

It seems to me that Cython is a 'middle-man' tool, with the added
downside of hard-to-maintain under-code.

···

On Mon, Dec 3, 2012 at 12:12 PM, Chris Barker - NOAA Federal <chris.barker@...236...> wrote:

generated code is ugly and hard to maintain, it is not designed to be
human-readable, and we wouldn't get the advantages of bug-fixes and
further development in Cython.

--
Damon McDougall
http://www.damon-is-a-geek.com
Institute for Computational Engineering Sciences
201 E. 24th St.
Stop C0200
The University of Texas at Austin
Austin, TX 78712-1229

You've had to touch the C/C++/ObjC because that's the only source that
exists; in this case the C *is* the implementation of the wrapper.

If we go Cython, the cython source is all that is maintained. It may be
useful to glance at generated code, but no-one should be tweaking it by
hand--the Cython source, and only the Cython source, represents the
implementation of the wrapper.

Ryan

···

On Tue, Dec 4, 2012 at 4:07 PM, Damon McDougall <damon.mcdougall@...149...>wrote:

On Mon, Dec 3, 2012 at 12:12 PM, Chris Barker - NOAA Federal > <chris.barker@...236...> wrote:
> generated code is ugly and hard to maintain, it is not designed to be
> human-readable, and we wouldn't get the advantages of bug-fixes and
> further development in Cython.

As far as I'm concerned, this is an argument against Cython.

I've had to touch the C/C++/ObjC codebase. It was not automatically
generated by Cython and it's not that hard to read. There's almost
certainly a C/C++/ObjC expert around to help out. There's almost
certainly Cython experts to help out, too. There is almost certainly
*not* an expert in Cython-generated C code that is hard to read.

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

generated code is ugly and hard to maintain, it is not designed to be
human-readable, and we wouldn't get the advantages of bug-fixes and
further development in Cython.

As far as I'm concerned, this is an argument against Cython.

Nonsense. It is an argument against the idea of maintaining the generated code directly, rather than maintaining the cython source code and regenerating the C code as needed. That idea never made any sense in the first place. I doubt that anyone follows it. Chris already pointed this out. Would you maintain the assembly code generated by your C++ compiler? Do you consider the fact that this is unreadable and unmaintainable a reason to avoid using that compiler, and instead to code directly in assembly?

I've had to touch the C/C++/ObjC codebase. It was not automatically
generated by Cython and it's not that hard to read. There's almost
certainly a C/C++/ObjC expert around to help out. There's almost
certainly Cython experts to help out, too. There is almost certainly
*not* an expert in Cython-generated C code that is hard to read.

There doesn't need to be.

I vote raw Python/C API. Managing reference counters is not the
mundane task pythonistas make it out to be, in my opinion. If you know
ObjC, you've had to do your own reference counting. If you know C,
you've had to do your own memory management. If you know C++, you've
had to do your own new/delete (or destructor) management. I agree not
having to worry about reference counting is a nice positive, but I don't
think it outweighs the negatives.

You have completely misrepresented the negatives.

It seems to me that Cython is a 'middle-man' tool, with the added
downside of hard-to-maintain under-code.

Please, if you don't use Cython yourself, and therefore don't know it well, refrain from these sorts of criticisms. In normal cython use, one *never* modifies the code it generates. In developing with cython, one *might* read this code to find out what is going on, and especially to find out whether one inadvertently triggered a call to the python API by forgetting to declare a variable, for example. This is pretty easy, because the comments in the generated code show exactly which source line has generated each chunk of generated code. Context is included. It is very nicely done.

Eric

···

On 2012/12/04 12:07 PM, Damon McDougall wrote:

On Mon, Dec 3, 2012 at 12:12 PM, Chris Barker - NOAA Federal > <chris.barker@...236...> wrote:

I think this has been a very helpful and useful discussion.

I'm going to attempt to summarize this discussion and propose some ways forward here.

The impetus for this discussion is that PyCXX seems to be not adequately maintained. It is difficult to build matplotlib with "vanilla" PyCXX in certain configurations. (This history sort of predates this thread).

So we have some options:

1) One way forward is to offer to take ownership of the PyCXX project. (I'm not using the "f" word here... I'd much prefer to just become more involved upstream somehow). I don't think this would be considerable additional work, as most of that work has been done in matplotlib for some time anyway. To the extent that it needs new features, it would be killer to add support for Numpy so Numpy no longer required manual reference counting. I had initially dismissed this approach, as I seem to be in the minority in liking PyCXX -- I happen to think it's fundamentally an extremely good approach to the problem: it helps with reference counting errors, but otherwise mostly stays out of the way. But I'd like to remove any one person as a bottleneck by choosing something that's more preferred all around.

2) Move to a different wrapping mechanism of some sort. While Cython is the clear choice for a third-party Python/C wrapping tool, it seems to be polarizing. (I won't attempt to repeat or summarize, but I think good points have been made on either side of the argument). I think it's ok to allow Cython to be used in matplotlib, given that we include both the Cython source and the generated C in the source repository such that matplotlib can be built without Cython installed. There are many other projects doing this that can provide best practices for us. I don't think, however, that we can or should require that all wrapping is done with Cython. I think we should allow raw Python/C API where it is most appropriate (and that is mainly in the case of wrapping third-party libraries, such as the png module and the macosx module which is already raw Python/C API). What I wouldn't want to see is the use of more than one wrapping tool, if only for reasons of proliferation of dependencies. (I count the Python/C API as "free" since it's always available anyway). I haven't seen in this discussion anyone really pushing for any of the alternatives (SWIG, Boost.Python, etc.) in any event.
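The "include both the Cython source and the generated C" arrangement in option 2 usually comes down to a small piece of source-selection logic in setup.py; a sketch (file names are illustrative, not matplotlib's actual module layout):

```python
import os

def ext_sources(pyx_path, have_cython):
    # Build from the .pyx when Cython is installed; otherwise fall
    # back to the pre-generated .c checked into the repository, so
    # matplotlib can be built without Cython as a dependency.
    base, _ = os.path.splitext(pyx_path)
    return [pyx_path] if have_cython else [base + ".c"]
```

In a real setup.py the returned list would feed an Extension(), with cythonize() applied only when Cython is present.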

Note also, the goal is to deal with the PyCXX "problem", not rewrite large chunks of our existing and well-tested C/C++ code base in Cython, unless someone sees a real clear benefit to doing that for a particular module and is highly motivated to do the work. This is primarily about refactoring the code so that the interface layer between Python and C is separated and then replaced with either Cython or raw Python/C API using the most appropriate tool for the job.

3) Any other options...?

Cheers,
Mike

···

On 12/04/2012 05:33 PM, Eric Firing wrote:

On 2012/12/04 12:07 PM, Damon McDougall wrote:

On Mon, Dec 3, 2012 at 12:12 PM, Chris Barker - NOAA Federal >> <chris.barker@...236...> wrote:

generated code is ugly and hard to maintain, it is not designed to be
human-readable, and we wouldn't get the advantages of bug-fixes and
further development in Cython.

As far as I'm concerned, this is an argument against Cython.

Nonsense. It is an argument against the idea of maintaining the
generated code directly, rather than maintaining the cython source code
and regenerating the C code as needed. That idea never made any sense
in the first place. I doubt that anyone follows it. Chris already
pointed this out. Would you maintain the assembly code generated by
your C++ compiler? Do you consider the fact that this is unreadable and
unmaintainable a reason to avoid using that compiler, and instead to
code directly in assembly?

I've had to touch the C/C++/ObjC codebase. It was not automatically
generated by Cython and it's not that hard to read. There's almost
certainly a C/C++/ObjC expert around to help out. There's almost
certainly Cython experts to help out, too. There is almost certainly
*not* an expert in Cython-generated C code that is hard to read.

There doesn't need to be.

I vote raw Python/C API. Managing reference counters is not the
mundane task pythonistas make it out to be, in my opinion. If you know
ObjC, you've had to do your own reference counting. If you know C,
you've had to do your own memory management. If you know C++, you've
had to do your own new/delete (or destructor) management. I agree not
having to worry about reference counting is a nice positive, but I don't
think it outweighs the negatives.

You have completely misrepresented the negatives.

It seems to me that Cython is a 'middle-man' tool, with the added
downside of hard-to-maintain under-code.

Please, if you don't use Cython yourself, and therefore don't know it
well, refrain from these sorts of criticisms. In normal cython use, one
*never* modifies the code it generates. In developing with cython, one
*might* read this code to find out what is going on, and especially to
find out whether one inadvertently triggered a call to the python API by
forgetting to declare a variable, for example. This is pretty easy,
because the comments in the generated code show exactly which source
line has generated each chunk of generated code. Context is included.
It is very nicely done.

Eric


I think this has been a very helpful and useful discussion.

I'm going to attempt to summarize this discussion and propose some ways forward here.

The impetus for this discussion is that PyCXX seems to be not adequately maintained. It is difficult to build matplotlib with "vanilla" PyCXX in certain configurations. (This history sort of predates this thread).

So we have some options:

1) One way forward is to offer to take ownership of the PyCXX project. (I'm not using the "f" word here... I'd much prefer to just become more involved upstream somehow). I don't think this would require considerable additional work, as most of that work has been done in matplotlib for some time anyway. To the extent that it needs new features, it would be killer to add support for Numpy so that Numpy would no longer require manual reference counting. I had initially dismissed this approach, as I seem to be in the minority in liking PyCXX -- I happen to think it's fundamentally an extremely good approach to the problem: it helps with reference counting errors, but otherwise mostly stays out of the way. But I'd like to remove any one person as a bottleneck by choosing something that's preferred all around.

2) Move to a different wrapping mechanism of some sort. While Cython is the clear choice for a third-party Python/C wrapping tool, it seems to be polarizing. (I won't attempt to repeat or summarize, but I think good points have been made on either side of the argument). I think it's ok to allow Cython to be used in matplotlib, given that we include both the Cython source and the generated C in the source repository such that matplotlib can be built without Cython installed. There are many other projects doing this that can provide best practices for us. I don't think, however, that we can or should require that all wrapping is done with Cython. I think we should allow raw Python/C API where it is most appropriate (and that is mainly in the case of wrapping third-party libraries, such as the png module and the macosx module which is already raw Python/C API). What I wouldn't want to see is the use of more than one wrapping tool, if only for reasons of proliferation of dependencies. (I count the Python/C API as "free" since it's always available anyway). I haven't seen in this discussion anyone really pushing for any of the alternatives (SWIG, Boost.Python, etc.) in any event.

Note also, the goal is to deal with the PyCXX "problem", not rewrite large chunks of our existing and well-tested C/C++ code base in Cython, unless someone sees a real clear benefit to doing that for a particular module and is highly motivated to do the work. This is primarily about refactoring the code so that the interface layer between Python and C is separated and then replaced with either Cython or raw Python/C API using the most appropriate tool for the job.

3) Any other options...?
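[Editor's note on the mechanics of option 2: the usual pattern for shipping the generated C alongside the Cython source is a small switch in setup.py. A minimal sketch follows -- the helper name and file paths are hypothetical illustrations, not matplotlib's actual build code:]

```python
# Hypothetical setup.py helper: build from .pyx when Cython is
# installed, otherwise fall back to the pre-generated .c files that
# are committed to the repository alongside them.
import os


def pick_sources(pyx_sources, have_cython):
    """Return the source list to hand to Extension()."""
    if have_cython:
        return list(pyx_sources)
    # Swap each foo.pyx for the committed foo.c next to it.
    return [os.path.splitext(src)[0] + ".c" for src in pyx_sources]


try:
    from Cython.Build import cythonize  # noqa: F401
    HAVE_CYTHON = True
except ImportError:
    HAVE_CYTHON = False

sources = pick_sources(["src/_png.pyx"], HAVE_CYTHON)
```

[With this arrangement a release tarball builds with no Cython installed, while developers who edit the .pyx regenerate the .c via cythonize().]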

Cheers,
Mike

···

On 12/04/2012 05:33 PM, Eric Firing wrote:

On 2012/12/04 12:07 PM, Damon McDougall wrote:

On Mon, Dec 3, 2012 at 12:12 PM, Chris Barker - NOAA Federal <chris.barker@...236...> wrote:

generated code is ugly and hard to maintain, it is not designed to be
human-readable, and we wouldn't get the advantages of bug-fixes and
further development in Cython.

As far as I'm concerned, this is an argument against Cython.

Nonsense. It is an argument against the idea of maintaining the
generated code directly, rather than maintaining the cython source code
and regenerating the C code as needed. That idea never made any sense
in the first place. I doubt that anyone follows it. Chris already
pointed this out. Would you maintain the assembly code generated by
your C++ compiler? Do you consider the fact that this is unreadable and
unmaintainable a reason to avoid using that compiler, and instead to
code directly in assembly?

I've had to touch the C/C++/ObjC codebase. It was not automatically
generated by Cython and it's not that hard to read. There's almost
certainly a C/C++/ObjC expert around to help out. There's almost
certainly Cython experts to help out, too. There is almost certainly
*not* an expert in Cython-generated C code that is hard to read.

There doesn't need to be.

I vote raw Python/C API. Managing reference counters is not the
mundane task pythonistas make it out to be, in my opinion. If you know
ObjC, you've had to do your own reference counting. If you know C,
you've had to do your own memory management. If you know C++, you've
had to do your own new/delete (or destructor) management. I agree not
having to worry about reference counting is a nice positive, but I don't
think it outweighs the negatives.

You have completely misrepresented the negatives.

It seems to me that Cython is a 'middle-man' tool, with the added
downside of hard-to-maintain under-code.

Please, if you don't use Cython yourself, and therefore don't know it
well, refrain from these sorts of criticisms. In normal Cython use, one
*never* modifies the code it generates. In developing with Cython, one
*might* read this code to find out what is going on, and especially to
find out whether one inadvertently triggered a call to the Python API by
forgetting to declare a variable, for example. This is pretty easy,
because the comments in the generated code show exactly which source
line produced each chunk of generated code. Context is included.
It is very nicely done.

Eric

------------------------------------------------------------------------------
LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial
Remotely access PCs and mobile devices and provide instant support
Improve your efficiency, and focus on delivering more value-add services
Discover what IT Professionals Know. Rescue delivers
http://p.sf.net/sfu/logmein_12329d2d
_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
matplotlib-devel List Signup and Options

Just for completeness, there is also ctypes. I wrapped the freetype library (hosted on Google Code) using it and it is quite easy (and boring). But this only works for C (not C++).
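[Editor's note: for a flavor of the ctypes approach -- a generic sketch against the C math library, not the freetype wrapper itself; the library lookup assumes a platform where find_library("m") succeeds:]

```python
# Minimal ctypes sketch: load the C math library and declare the
# signature of sqrt() so ctypes converts arguments and results
# correctly (without argtypes/restype it would assume int).
import ctypes
import ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m"))

libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(2.0))
```

[No compilation step is involved, which is what makes ctypes "easy (and boring)": the binding is written entirely in Python against an already-built shared library.]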

Nicolas

···

On Dec 6, 2012, at 18:06 , Michael Droettboom wrote:

[Mike's summary snipped]

Mike,

That is an excellent summary. The options actually are not mutually exclusive; perhaps what we are considering is more in the line of nudging evolution in one direction or the other, not pushing for rapid extinction of a species.

Regarding PyCXX, I respect your opinion that it is a good match for what it does. To the limited extent that I can work with C++ at all, I don't have any problem with its use in mpl. I do share the concern about depending heavily on it, given that problems with it have cropped up, and you have been the only one willing and able to deal with those problems. Since PyCXX is a pure C++ construct, perhaps other C++ gurus--and it seems that we now have more than previously--would be willing to take a closer look at it, and reconsider whether they can relieve the single-person-bottleneck problem.

There is always a tradeoff between going to a higher-level language or library versus sticking with lower levels. Personally, I like C over C++ because the former is simple enough that I can generally figure out what is going on; but I like Cython over raw Python/C API because its internal complexity allows an external simplicity, hiding all sorts of things I really don't want to have to think about. Going to higher levels always brings the risk of dependency on a complex system, whether it be Cython or PyCXX or Agg, or even the C/C++ compiler itself. The cost/benefit ratios of such tradeoffs vary greatly with the situation, and from person to person, depending on training, experience, and personal quirks. So we just have to keep looking for the balance that is appropriate to the task, the times, and the people at hand. Your summary nicely facilitates that balancing act.

Eric

···

On 2012/12/06 7:16 AM, Michael Droettboom wrote:

[Mike's summary snipped]

Yes, an excellent summary and neatly bringing us back to the crux of the matter.

For completeness I should say that I wouldn’t use SWIG. I used it about 5 years ago to wrap some C++ for Python and other languages. Initially it was very useful, but eventually the default mapping between C++ and Python types and the default handling of object lifetimes weren’t quite what I wanted, and I found myself writing a lot of extra configuration code to coax it back into line. In the end I went back to the Python/C API. Perhaps its aim of targeting many different languages means it isn’t suited to our language of interest. It doesn’t support Numpy arrays anyway.

I would like to be able to recommend Boost.Python, but I have never used it. I have used some Boost modules and found them all to be well-designed and actively maintained. However, it currently doesn’t support Numpy arrays (although this is an active area of work), and it appears that there are difficulties building it with anything other than the default build tool BJam, leading to concerns over portability.

Although my preference, in the absence of PyCXX, for wrapping our larger C/C++ modules is to use the Python/C API rather than Cython, I have been persuaded that there is a place for Cython in matplotlib. The ability to improve the performance of small sections of Python code in a simple and localised manner seems very useful, and would be a good starting point for speeding up areas of code that a number of users are frustrated by. Given the number of people who would like to see it used in matplotlib, I think it is inevitable that we eventually will.

I hadn’t really considered the option of adding Numpy arrays to PyCXX. I’ve taken a quick look at the existing code, and whilst I don’t think it is a trivial task, it looks well worth investigating -- the existing code is well organised and quite well documented. If two or more of us were prepared to make the Numpy additions to PyCXX and provide ongoing maintenance, we would address the deficiencies of the current solution and remove the single-person bottleneck. But I am not sure where this leaves us, as I am now advocating use of Cython to some extent, and hence we would have two wrapping tools. Should we reject Cython + improved PyCXX on these grounds and revert to Cython + Python/C API?

Ian