Experiments in removing/replacing PyCXX

Given the slow pace of development on PyCXX, I know it has been the desire of some here to remove our dependency on it.

I thought a helpful starting point to evaluate the alternatives would be to restructure one of our extensions to not use PyCXX anymore. I've taken the PNG extension, which is reasonably straightforward in that it doesn't define any custom types, but does have some low level C-wrapping challenges, and separated out the Python-specific parts from the libpng-specific parts. The Python-specific parts are now written using the "raw" Python C/API. The other part still uses C++ (not C) and does throw exceptions, but doesn't use classes or templates or anything else that can be difficult to wrap. All of this is on my "no_cxx" branch.

Now here's the challenge: can we do better than this using any of the available wrapping tools? Cython, SWIG, Boost.Python etc.? I've not had much luck with Cython for this kind of thing in the past, but I know it is popular. Perhaps someone with more Cython experience would want to take a crack at this and then we could have something concrete to compare...

Cheers,
Mike

Hi,

The Mac OS X backend is entirely written in C (with some Objective-C elements where necessary). AFAICT, this is the largest C/C++ code in matplotlib. This backend was written from scratch without using Cython, SWIG, or Boost.Python. From my experience, I would prefer to write such extensions in C directly rather than relying on Cython, SWIG, or Boost.Python, because those approaches would lead to another dependency (for developers at least), and requires developers to learn how to code in them. Which may not be very hard, but we may as well avoid that if possible.

I'd be happy to help out with the conversion of the other extensions from CXX to C. I would need some help though to use github appropriately.

Best,
-Michiel.

···

--- On Thu, 11/29/12, Michael Droettboom <mdroe@...31...> wrote:

From: Michael Droettboom <mdroe@...31...>
Subject: [matplotlib-devel] Experiments in removing/replacing PyCXX
To: "matplotlib-devel@lists.sourceforge.net" <matplotlib-devel@...1114...ceforge.net>
Date: Thursday, November 29, 2012, 11:59 AM

------------------------------------------------------------------------------
Keep yourself connected to Go Parallel:
VERIFY Test and improve your parallel project with help from
experts
and peers. http://goparallel.sourceforge.net
_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
matplotlib-devel List Signup and Options

Thanks, Michiel.

If you read between the lines of what I was saying, that is basically where I fall as well. There seems to be a lot of desire to use Cython to make the code more accessible, however, and I'm willing to consider it if it can be shown to be superior to the raw C/API for this task -- I'm not sure it is -- I always seem to end up with things that are more lines of code with more obscure workarounds than just coding in C directly.

Cheers,
Mike

···

On 11/29/2012 08:47 PM, Michiel de Hoon wrote:

I'm curious about what problems you've run into and how long ago it was. In the past, Cython hasn't supported C++ very well, but the situation has greatly improved recently. See "Using C++ in Cython" in the Cython documentation for some details.

Thanks,

Jason

···

On 11/29/12 10:59 AM, Michael Droettboom wrote:

I've not had
much luck with Cython for this kind of thing in the past, but I know it
is popular.

> If you read between the lines of what I was saying, that is basically
> where I fall as well. There seems to be a lot of desire to use Cython
> to make the code more accessible,

I'll add a beat to that drum -- I'm a big Cython fan.

> however, and I'm willing to consider
> it if it can be shown to be superior to the raw C/API for this task --

I think there is NO QUESTION that Cython is superior to the C/API --
why would you want to deal with the reference counting, etc. yourself?
Cython can handle the boilerplate code for you very cleanly and
elegantly.
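
To make the reference-counting point concrete, the bookkeeping that hand-written C/API code has to get right can be observed from Python itself. The sketch below is only an illustration (it is neither Cython nor code from the no_cxx branch): it uses `ctypes.pythonapi` to call the C/API functions `Py_IncRef` and `Py_DecRef` by hand. The helper name `refcount_trace` is made up for this example, and the exact counts reported by `sys.getrefcount` are a CPython implementation detail.

```python
import ctypes
import sys

# C/API entry points, reached through ctypes purely for illustration.
incref = ctypes.pythonapi.Py_IncRef
decref = ctypes.pythonapi.Py_DecRef
incref.argtypes = [ctypes.py_object]
decref.argtypes = [ctypes.py_object]
incref.restype = None
decref.restype = None


def refcount_trace():
    """Return the refcount of a fresh list before, during and after a
    manual incref/decref pair (hypothetical helper for this example)."""
    obj = []
    before = sys.getrefcount(obj)
    incref(obj)   # what C code does when it stores a new reference
    during = sys.getrefcount(obj)
    decref(obj)   # forgetting this on any code path leaks the object
    after = sys.getrefcount(obj)
    return before, during, after


if __name__ == "__main__":
    print(refcount_trace())
```

In hand-written glue code every early return and error branch needs the matching decrement; a wrapper generator emits those pairs for you.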

Something to keep in mind about Cython:

It can be used in multiple ways:

1) Add static typing to what is essentially Python code to get better
performance -- this may be what you mean by the "more accessible" part.
A great use, but maybe not the best fit for the core bits of
MPL.

2) Calling C/C++ code -- Cython is a great way to call C/C++ code --
it can handle the packing and unpacking of Python types, reference
counting, etc. for you, so it is much like using the C API, but with a
lot less tricky boilerplate code to write.

(2) is the use case that I'm arguing is NO QUESTION a better option
than the C API.

Compared to SWIG, SIP (and I assume CXX), the downside is that there
is no auto-generation of wrappers (at least nothing mature). However,
for the MPL case, we're not trying to wrap a large existing library,
but rather particular code that is often written for MPL specifically,
so hand-writing the Cython is a fine option.

So why not ctypes, or something else? I think the real strength of Cython in
wrapping C code is that you can write a "thick" wrapper in an
almost-Python language. So if you want to vectorize a C function, for
instance, you can write that bit in Cython very easily (and Cython's
built-in understanding of numpy arrays is very helpful here). When you
use ctypes, you need to write that in pure Python -- easy enough, but
probably not very performant.
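
For comparison, here is roughly what the ctypes route looks like. This is a minimal sketch, assuming a POSIX-like platform where the C math library can be loaded; `cos_all` is a made-up helper name. The call into C is cheap to set up, but the "vectorizing" loop runs in the Python interpreter, which is the performance concern raised above.

```python
import ctypes
import ctypes.util

# Assumption: POSIX-like system. If find_library() returns None,
# CDLL(None) falls back to the symbols already loaded into the process.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double cos(double)
c_cos = libm.cos
c_cos.argtypes = [ctypes.c_double]
c_cos.restype = ctypes.c_double


def cos_all(values):
    """Apply the C cos() to every element. The loop is pure Python, so
    each element pays for a Python-to-C call and a float conversion."""
    return [c_cos(v) for v in values]


if __name__ == "__main__":
    print(cos_all([0.0, 1.0, 2.0]))
```

In Cython the same loop could be typed and compiled, so the per-element overhead of the interpreter goes away while the wrapper still reads almost like Python.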

With SWIG, etc., you end up writing a fair bit of C (or SWIG) code to
handle the thicker bits of the wrapper -- so you're dealing with the
raw CPython API, and, well, C. Cython really is an easier option.

I've found that for stuff that is anything more than very small (i.e.
more than one or two loops through an array), writing the core code in
native C or C++ can be easier: you know for sure you're not accidentally
making expensive Python calls, etc. But using Cython to call it is still
very helpful.

> I'm not sure it is -- I always seem to end up with things that are more
> lines of code with more obscure workarounds than just coding in C directly.

Exactly -- but I don't think that applies to the CPython-API bits, but
rather the core code -- so keep that in C.

In summary, I guess what I think is the power of Cython is the
flexibility in where you draw the line between Python, Cython, and C
-- you can pass pure Python through Cython, or you can do almost
nothing with it but call a C function, and everything in between.

> From my experience, I would prefer to write such extensions in C directly rather
> than relying on Cython, SWIG, or Boost.Python, because those approaches would
> lead to another dependency (for developers at least),

The dependency is pretty easy to deal with compared to the many others in MPL.

> and requires developers to
> learn how to code in them. Which may not be very hard, but we may as well avoid
> that if possible.

Here's where I disagree -- if we go pure C and C-API, developers need
to know the Python C-API -- that is actually a pretty big deal, and
hard to get right. Knowing enough Cython to call some C code is a
smaller lift for sure.

Anyway, I say give it a shot -- I suspect you'll like it.

-Chris

···

On Fri, Nov 30, 2012 at 6:06 AM, Michael Droettboom <mdroe@...31...> wrote:

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@...236...

One package (Pysam) that I use a lot relies on Cython, and requires users to install Cython before they can install Pysam itself. With Cython, is that always the case? Will all users need to install Cython? Or is it sufficient if only matplotlib developers install Cython?

Best,
-Michiel.

···

--- On Fri, 11/30/12, Chris Barker - NOAA Federal <chris.barker@...236...> wrote:

From: Chris Barker - NOAA Federal <chris.barker@...236...>
Subject: Re: [matplotlib-devel] Experiments in removing/replacing PyCXX
To: "Michael Droettboom" <mdroe@...31...>
Cc: "Michiel de Hoon" <mjldehoon@...42...>, "matplotlib-devel@...958...eforge.net" <matplotlib-devel@lists.sourceforge.net>
Date: Friday, November 30, 2012, 12:32 PM

You can set things up so that end-users don't have to install cython.
You just convert the .pyx files to regular .c files before
distributing your package. Numpy itself uses cython, but end-users
don't notice or care. (It's something more of a hassle for developers
to do things this way, and cython is very easy to install, so I don't
know if it's worth it. But it's certainly possible.)
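
A minimal sketch of that arrangement, assuming the generated `.c` file is shipped next to its `.pyx` source. The helper names and the module path are hypothetical and are not taken from matplotlib or NumPy; only the selection logic is shown.

```python
import importlib.util


def cython_available():
    """True when the Cython package can be imported in this environment."""
    return importlib.util.find_spec("Cython") is not None


def extension_source(stem, have_cython):
    """Pick the file to hand to the build: the .pyx when Cython can
    regenerate the C, otherwise the pre-generated .c from the tarball."""
    return stem + (".pyx" if have_cython else ".c")


# Hypothetical module path, used only for illustration.
PNG_STEM = "src/_png_wrapper"

if __name__ == "__main__":
    print(extension_source(PNG_STEM, cython_available()))
```

In a real `setup.py` the chosen path would go into the `sources` list of a `setuptools.Extension`, and would be passed through `Cython.Build.cythonize` only when Cython is present. End users building from a release tarball then need nothing beyond a C compiler.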

-n

···

On Fri, Nov 30, 2012 at 11:40 PM, Michiel de Hoon <mjldehoon@...42...> wrote:


Since when has numpy used Cython? I specifically remember a rather involved discussion thread on numpy-discussion about the pros-and-cons of including cython. Now, SciPy on the other hand, does utilize Cython in some spots IIRC, but does it in a way that it isn’t even required for the developers to have cython installed to build from source.

I would not be against such an approach. Much of the C/C++ stuff is rarely touched. If we have some source cython that is used to generate C/C++ source code that is packaged in the same way as the current code is, I would have no problem with that. Given that matplotlib is such a fundamental tool in the ecosystem, I want to make sure that the decisions we make are ones that improve our packaging situation.

Cheers!
Ben Root

···

On Fri, Nov 30, 2012 at 6:44 PM, Nathaniel Smith <njs@…503…> wrote:

Benjamin Root wrote:

> Since when has numpy used Cython? I specifically remember a rather involved
> discussion thread on numpy-discussion about the pros-and-cons of including
> cython. Now, SciPy on the other hand, does utilize Cython in some spots
> IIRC, but does it in a way that it isn't even required for the developers to
> have cython installed to build from source.

You just ship the C/C++ code for the developers as well as for the
end users. This is what we do with scikit-learn. It requires the
developers to make sure to compile the Cython code, and commit both
files. It is also quite annoying for reviews to have the generated C++
code, so the Cython code needs to be compiled after the reviews.

The reason the scikit's developers chose to use Cython instead of
something else is to decrease the maintenance burden: more
contributors understand Cython code than C/C++ code (or more
precisely, understand C++ code written by someone else). Hence, this
increases the bus number.

If you should choose Cython, please don't follow SciPy too closely.
Up until rather recent git head they did not ship the Cython sources in
their source tarballs, which occasionally led to inconsistent generated
files (e.g. interpnd.pyx in 0.10.1) and caused trouble for distributors
(see e.g. Debian bug 589731).

A better example to follow would be e.g. pyzmq, which ships both the
Cython and generated sources and has an easy-to-use cython target in
setup.py to recythonize.
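
One way to guard against the inconsistency described above is to record a digest of each `.pyx` file whenever its C file is regenerated, and to check that digest before building a release. The sketch below is a hypothetical scheme, not what pyzmq or SciPy actually do; the function names and the `.sha256` sidecar file are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def pyx_digest(pyx_path):
    """SHA-256 of the Cython source, as a hex string."""
    return hashlib.sha256(Path(pyx_path).read_bytes()).hexdigest()


def record_digest(pyx_path):
    """Call right after regenerating the C file: store the digest of
    the source it was generated from in a sidecar file."""
    stamp = Path(str(pyx_path) + ".sha256")
    stamp.write_text(pyx_digest(pyx_path))
    return stamp


def generated_is_current(pyx_path):
    """True when the stored digest matches the current .pyx contents."""
    stamp = Path(str(pyx_path) + ".sha256")
    return stamp.exists() and stamp.read_text().strip() == pyx_digest(pyx_path)
```

A content hash is used rather than file modification times because timestamps are not preserved reliably across git checkouts and tarball extraction.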

···

On 12/01/2012 02:32 AM, Benjamin Root wrote:

Since when has numpy used Cython? I specifically remember a rather
involved discussion thread on numpy-discussion about the pros-and-cons
of including cython. Now, SciPy on the other hand, does utilize Cython
in some spots IIRC, but does it in a way that it isn't even required for
the developers to have cython installed to build from source.

In my experience, Benjamin is right that the C code is rarely touched. This is even more true for the Python/C glue code, at least from my experience with the Mac OS X backend. Since the Python/C glue code is modified only very rarely, there may not be a need for regenerating the Python/C glue code by developers or users from a Cython source code.

In addition, it is much easier to maintain the Python/C glue code than to write it from scratch. Once you have the Python/C glue code, it’s relatively straightforward to modify it by looking at the existing Python/C glue code.

This argues against making the Cython source code a part of the matplotlib codebase.

At the same time, to minimize errors, we could use Cython to create the initial Python/C glue code, and then add the generated code to the matplotlib codebase. Then neither users nor developers have to install Cython, we don’t have to worry about inconsistencies (if any) between different Cython versions, we don’t have to worry about keeping the Cython source code and the generated code in sync, and we will still get a high-quality Cython-generated Python/C glue code.

By the way, how many modules in matplotlib make use of CXX, and would have to be converted?

Best,
-Michiel.

···

— On Fri, 11/30/12, Benjamin Root <ben.root@…553…> wrote:

From: Benjamin Root <ben.root@…553…>
Subject: Re: [matplotlib-devel] Experiments in removing/replacing PyCXX
To: “Nathaniel Smith” <njs@…503…>
Cc: “Michiel de Hoon” <mjldehoon@…552…42…>, “matplotlib-devel@lists.sourceforge.net” <matplotlib-devel@…898…sts.sourceforge.net>, “Chris Barker - NOAA Federal”
<chris.barker@…236…>
Date: Friday, November 30, 2012, 8:32 PM

I’m +1 on Cython. I think its prevalence in the community gives us a larger potential contributor pool than CXX or hand-coded python C-API. I know using Cython would open up that part of the code base for me.

Ryan

···

On Dec 1, 2012, at 8:44, Michiel de Hoon <mjldehoon@…42…> wrote:

For the PNG extension specifically, it was creating callbacks that can be called from C and the setjmp magic that libpng requires. I think it's possible to do it, but I was surprised at how non-obvious those pieces of Cython were. I was really hoping by creating this experiment that a Cython expert would step up and show the way ;)

The Agg backend has more C++-specific challenges, particularly instantiating very complex template expressions -- but I haven't really followed that through.

Mike

···

On 11/30/2012 09:13 AM, Jason Grout wrote:

Including the Cython-generated C in the tarballs and optionally the git repository as well can certainly be considered to reduce the need for Cython for developers and users alike. However, the Cython source should also be included in the repository for the inevitable times when it does need to be updated -- it shouldn't be off somewhere else.

The png, path, ft2font, backend_agg, gtkagg, tkagg, tri, and image modules all use CXX. The backend_agg, image and ft2font ones are particularly complex, but some of that complexity could be reduced by using Numpy arrays in place of the image buffer types that each of them contain (that code predates matplotlib's numpy requirement, so it's not terribly surprising that a more complex approach was taken).

Mike

···

On 12/01/2012 09:44 AM, Michiel de Hoon wrote:

In my experience, Benjamin is right that the C code is rarely touched. This is even more true for the Python/C glue code, at least from my experience with the Mac OS X backend. Since the Python/C glue code is modified only very rarely, there may not be a need for regenerating the Python/C glue code by developers or users from a Cython source code.

In addition, it is much easier to maintain the Python/C glue code than to write it from scratch. Once you have the Python/C glue code, it's relatively straightforward to modify it by looking at the existing Python/C glue code.

This argues against making the Cython source code a part of the matplotlib codebase.

At the same time, to minimize errors, we could use Cython to create the initial Python/C glue code, and then add the generated code to the matplotlib codebase. Then neither users nor developers have to install Cython, we don't have to worry about inconsistencies (if any) between different Cython versions, we don't have to worry about keeping the Cython source code and the generated code in sync, and we will still get a high-quality Cython-generated Python/C glue code.

By the way, how many modules in matplotlib make use of CXX, and would have to be converted?

Best,
-Michiel.

--- On Fri, 11/30/12, Benjamin Root <ben.root@...553...> wrote:

    From: Benjamin Root <ben.root@...553...>
    Subject: Re: [matplotlib-devel] Experiments in removing/replacing
    PyCXX
    To: "Nathaniel Smith" <njs@...503...>
    Cc: "Michiel de Hoon" <mjldehoon@...42...>,
    "matplotlib-devel@lists.sourceforge.net"
    <matplotlib-devel@lists.sourceforge.net>, "Chris Barker - NOAA
    Federal" <chris.barker@...236...>
    Date: Friday, November 30, 2012, 8:32 PM

    On Fri, Nov 30, 2012 at 6:44 PM, Nathaniel Smith <njs@...503...> wrote:

        On Fri, Nov 30, 2012 at 11:40 PM, Michiel de Hoon <mjldehoon@...42...> wrote:
        > One package (Pysam) that I use a lot relies on Cython, and
        requires users to install Cython before they can install Pysam
        itself. With Cython, is that always the case? Will all users
        need to install Cython? Or is it sufficient if only matplotlib
        developers install Cython?

        You can set things up so that end-users don't have to install
        cython. You just convert the .pyx files to regular .c files before
        distributing your package. Numpy itself uses cython, but end-users
        don't notice or care. (It's somewhat more of a hassle for
        developers to do things this way, and cython is very easy to
        install, so I don't know if it's worth it. But it's certainly
        possible.)

    Since when has numpy used Cython? I specifically remember a
    rather involved discussion thread on numpy-discussion about the
    pros-and-cons of including cython. Now, SciPy on the other hand,
    does utilize Cython in some spots IIRC, but does it in a way that
    it isn't even required for the developers to have cython installed
    to build from source.

    I would not be against such an approach. Much of the C/C++ stuff
    is rarely touched. If we have some source cython that is used to
    generate C/C++ source code that is packaged in the same way as the
    current code is, I would have no problem with that. Given that
    matplotlib is such a fundamental tool in the ecosystem, I want to
    make sure that the decisions we make are ones that improve our
    packaging situation.

    Cheers!
    Ben Root


There are a lot more Cython experts on the Cython mailing list ;).

Thanks,

Jason

···

On 12/1/12 12:36 PM, Michael Droettboom wrote:

I was really hoping by creating this experiment
that a Cython expert would step up and show the way :wink:

For point of comparison, my branch now has a Cython and C++ version of the same thing.

Here's the Cython version:

https://github.com/mdboom/matplotlib/blob/no_cxx/src/_png.pyx

Here's the C++ version:

https://github.com/mdboom/matplotlib/blob/no_cxx/src/_png_wrap.cpp

Some interesting things to note:

The Cython version isn't that much shorter than the C++ version. It mostly consists of declarations. These declarations aren't exact matches to what one would find in the header file(s) because Cython doesn't support exact-width data types etc. The Cython documentation says "not to worry", but I do wonder how well this will work across different architectures etc. I'm not sure why some of the Python/C API calls I needed were not defined in Cython's include wrappers.

The Cython extension only builds with "-fpermissive" because I can't seem to get the casts and const coercions to work. Maybe there's a simple solution...

The exception handling in the png_core.cpp file will need to be updated because Cython only supports handling built-in C++ exceptions (and subclasses), and the Cython custom exception handler doesn't provide a way to get at the exception object that was thrown.

It seems that a lot of things pass through the Cython compiler, but then fail in the C compiler -- you then have to wade through the generated C code to figure out what's going wrong. This reminds me of the bad old days of C++ when the error messages generated would be dozens of lines long and rather inscrutable.

Once things compiled, due to my own mistake, calling the function segfaulted. Debugging that segfault in gdb required, again, wading through the generated code. Using gdb on hand-written code is *much* nicer.

So, it's probably clear that I'm not much of a fan of this approach, but I am trying to find something that the whole community around matplotlib finds easier and more accessible so that the C/C++ experts among us are not exclusively burdened to maintain this part of the code base. I would be interested to see what others think now that we have an apples-to-apples comparison.

Cheers,
Mike

···

On 12/01/2012 01:36 PM, Michael Droettboom wrote:

For the PNG extension specifically, it was creating callbacks that can
be called from C and the setjmp magic that libpng requires. I think
it's possible to do it, but I was surprised at how non-obvious those
pieces of Cython were. I was really hoping by creating this experiment
that a Cython expert would step up and show the way :wink:
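
For readers unfamiliar with "callbacks that can be called from C", here is a minimal sketch of the general idea. It uses the standard-library `ctypes` module rather than Cython or the Python/C API, a swapped-in technique chosen only because it runs without a compiler. It hands a Python function to the C library's `qsort`; libpng's callbacks are likewise C code calling back through a function pointer. The use of `ctypes.CDLL(None)` assumes a POSIX system where the C library's symbols are visible in the running process.

```python
import ctypes

# Assumption: POSIX; CDLL(None) exposes symbols already loaded into the
# process, which includes the C library's qsort.
libc = ctypes.CDLL(None)

# C signature of the comparison callback: int (*)(const void *, const void *)
CMPFUNC = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int))

libc.qsort.argtypes = [
    ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None


def compare_ints(a, b):
    """Called from C for each comparison; a and b point at two elements."""
    return (a[0] > b[0]) - (a[0] < b[0])


def sort_with_c(values):
    """Sort a list of Python ints using C qsort and a Python callback."""
    array = (ctypes.c_int * len(values))(*values)
    # Keep a reference to the wrapped callback for as long as C may call it.
    callback = CMPFUNC(compare_ints)
    libc.qsort(array, len(array), ctypes.sizeof(ctypes.c_int), callback)
    return list(array)
```

The part that tends to be non-obvious in any wrapper is the same here: the callback's C signature has to be declared exactly, and the callback object has to stay alive while C can still call it.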

The Agg backend has more C++-specific challenges, particularly
instantiating very complex template expressions -- but I haven't really
followed that on through.

Mike

On 11/30/2012 09:13 AM, Jason Grout wrote:

On 11/29/12 10:59 AM, Michael Droettboom wrote:

I've not had
much luck with Cython for this kind of thing in the past, but I know it
is popular.

I'm curious about what problems you've run into and how long ago it was. In
the past, Cython hasn't supported C++ very well, but the situation has
greatly improved recently. See the "Using C++ in Cython" page of the
Cython documentation for some details.

Thanks,

Jason


I vote for using the raw Python/C API. I’ve written a couple of PyCXX extensions and whilst it is mostly convenient, PyCXX doesn’t support the use of numpy arrays so for them you have to use the Python/C API. This means dealing with the reference counting yourself for numpy arrays; extending this to do the reference counting for all python objects is not onerous. Dealing with object lifetimes is bread-and-butter work for C/C++ developers.
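
To make the reference-counting burden concrete, the sketch below uses `sys.getrefcount` from plain Python to show the counts that hand-written Python/C API code has to keep balanced itself with `Py_INCREF` and `Py_DECREF`. It assumes CPython; the absolute numbers vary between interpreter versions, so only the differences are meaningful.

```python
import sys


def refcount_delta_for_new_reference():
    """Return how the refcount changes while one extra reference exists."""
    obj = object()
    before = sys.getrefcount(obj)
    holder = [obj]          # the list now owns one extra reference to obj
    during = sys.getrefcount(obj)
    del holder              # dropping the list releases that reference
    after = sys.getrefcount(obj)
    return during - before, after - before
```

In C, forgetting the release step leaks the object, and releasing once too often leads to a crash; this is the bookkeeping being discussed.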

I have never used Cython, but to me the code looks like an inelegant combination of Python, C/C++ and some Cython-specific stuff. I can see the advantage of this approach for small sections of code, but I have strong reservations about using it for complicated modules that have extensive use of templated code and/or Standard Template Library collections (mpl has examples of both of these).

I agree that Cython opens us up to a larger body of contributors, but I don’t think that this is necessarily a good thing. I think this really means it opens us up to a larger body of Python/Cython contributors; that is a view expressed from the Python side of the fence, and it has the wrong emphasis. I am primarily a C++ developer in a sea of Python developers, and rather than encourage other Python contributors to dip their toes into C/C++ via Cython I think we should be encouraging C/C++ contributors to do what they do best. We only need a few C/C++ developers if we allow them to use their skills in their preferred way, and they are used to interfacing to legacy APIs and dealing with object lifetimes.

OK, cards on the table. If we wanted to switch all of our PyCXX modules to use the raw Python/C API, I would happily take on some of the burden for making the changes and ongoing maintenance of such modules. Particularly if, in return, I get some help with my sometimes substandard Python! If we go down the Cython route I couldn’t make this offer; would our many Cython advocates take on the responsibility of changing and maintaining my C++ code in this scenario?

Ian Thomas

That matches my experience quite well.

Even for C libraries like libpng, which requires use of C function callbacks for some things, Cython is more convoluted, particularly when things go wrong and require debugging. (Running gdb over generated Cython code is not fun!) And in my view, writing code like that requires a pretty deep understanding of the Python/C API, C itself, and the rather complex transformations that Cython performs. Writing directly to the Python/C API only requires knowledge of the first two. And there’s a large body of books/tutorials/debuggers/tools for C that don’t really have equivalents for Cython.

I think Cython is well suited to writing new algorithmic code to speed up hot spots in Python code. I don’t think it’s as well suited as glue between C and Python -- that was not a main goal of the original Pyrex project, IIRC. It feels kind of tacked on and not a very good fit to the problem. Most of the work to remove PyCXX use in matplotlib is either wrapping third-party libraries (where Cython doesn’t really shine), or wrapping C/C++ code in our own tree that’s already well-tested and vetted, and I wouldn’t propose rewriting that in Cython. I’m only really considering rewriting the Python-to-C interface layer.

That’s a good way to look at this. I was definitely hoping that moving to Cython might open us up to more developers, but at the end of the day, the chosen tool should be the one preferred by those doing the work. Maybe rather than asking “if we switched to using Cython, would more participate”, I should be asking “among those that can participate in removing the PyCXX dependency, what is the preferred approach?”

Cheers,
Mike

···

On 12/03/2012 04:07 AM, Ian Thomas
wrote:

I vote for using the raw Python/C API. I've written a couple of PyCXX extensions and whilst it is mostly convenient, PyCXX doesn’t support the use of numpy arrays so for them you have to use the Python/C API. This means dealing with the reference counting yourself for numpy arrays; extending this to do the reference counting for all python objects is not onerous. Dealing with object lifetimes is bread-and-butter work for C/C++ developers.

I have never used Cython, but to me the code looks like an inelegant combination of Python, C/C++ and some Cython-specific stuff. I can see the advantage of this approach for small sections of code, but I have strong reservations about using it for complicated modules that have extensive use of templated code and/or Standard Template Library collections (mpl has examples of both of these).

I agree that Cython opens us up to a larger body of contributors, but I don’t think that this is necessarily a good thing. I think this really means it opens us up to a larger body of Python/Cython contributors; that is a view expressed from the Python side of the fence, and it has the wrong emphasis. I am primarily a C++ developer in a sea of Python developers, and rather than encourage other Python contributors to dip their toes into C/C++ via Cython I think we should be encouraging C/C++ contributors to do what they do best. We only need a few C/C++ developers if we allow them to use their skills in their preferred way, and they are used to interfacing to legacy APIs and dealing with object lifetimes.

OK, cards on the table. If we wanted to switch all of our PyCXX modules to use the raw Python/C API, I would happily take on some of the burden for making the changes and ongoing maintenance of such modules. Particularly if, in return, I get some help with my sometimes substandard Python! If we go down the Cython route I couldn’t make this offer; would our many Cython advocates take on the responsibility of changing and maintaining my C++ code in this scenario?

On Sat, Dec 1, 2012 at 6:44 AM, Michiel de Hoon wrote:

Since the Python/C glue code is modified only very rarely, there may not be a need for regenerating the Python/C glue code by developers or users from a Cython source code.

True.

In addition, it is much easier to maintain the Python/C glue code than to write it from scratch. Once you have the Python/C glue code, it's relatively straightforward to modify it by looking at the existing Python/C glue code.

not so true -- getting reference counting right, etc is difficult -- I
suppose once the glue code is robust, and all you are changing is a
bit of API to the C, maybe....

This argues against making the Cython source code a part of the matplotlib codebase.

huh? are you suggesting that we use Cython to generate the glue, then
hand-maintain that glue? I think that is a really, really bad idea --
generated code is ugly and hard to maintain, it is not designed to be
human-readable, and we wouldn't get the advantages of bug-fixes and
further development in Cython.

So -- if you use Cython, you want to keep using it, and that means the
Cython source IS the source. I agree that it's a good idea to ship the
generated code as well, so that no one that is not touching the Cython
has to generate. Other than the slight mess from generated files
showing up in diffs, etc, this really works just fine.

Any reason MPL couldn't continue with EXACTLY the same approach now
used with C_XX -- it generates code as well, yes?

Michael Droettboom wrote:

For the PNG extension specifically, it was creating callbacks that can
be called from C and the setjmp magic that libpng requires. I think
it's possible to do it, but I was surprised at how non-obvious those
pieces of Cython were. I was really hoping by creating this experiment
that a Cython expert would step up and show the way :wink:

Did you not get the support you expected from the cython list? Anyway,
there's no reason you can't keep stuff in C that's easier in C (or did
C_XX make this easy?). I think making basic callbacks is actually
pretty straightforward, but I don't know about the setjmp magic (I
have no idea what that means!).

The Agg backend has more C++-specific challenges, particularly
instantiating very complex template expressions --

I'm guessing you'd do the complex template stuff in C++ -- and let
Cython see a more traditional static API.

but some of that complexity could be reduced by using Numpy arrays in place of the
image buffer types that each of them contain

OR Cython arrays and/or memoryviews -- this is indeed a real strength of Cython.

The Cython version isn't that much shorter than the C++ version.

I think some things make sense to keep in C++, though I do see a fair
bit of calls (in the C++) to the python API -- I'm surprised there
isn't much code advantage, but anyway, the goal is more robust/easier
to maintain, which may correlate with code-size, but not completely.

These declarations aren't exact matches to what one would find in the header file(s) because Cython doesn't support exact-width data types etc.

It does support the C99 fixed-width integer types:

from libc.stdint cimport int16_t, int32_t

Or are you talking about something else?

I'm not sure why some of the Python/C API calls I needed were not defined in Cython's include wrappers.

I suspect that's an oversight -- for the most part, stuff has been
added as it's needed.

One other note -- from a quick glance at your Cython code, it looks
like you did almost everything in Cython-that-will-compile-to-pure-C
-- i.e. a lot of calls to the CPython API. But the whole point of
Cython is that it makes those calls for you. So you can do type
checking, and switching on types, and calling np.asarray(), etc, etc,
etc, in Python, without calling the CPython api yourself. I know
nothing of the PNG API, and am pretty weak on the CPython API (and C
for that matter), but it's likely that the Cython code you've
written could be much simplified.
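
As an illustration of that point, the kind of argument checking meant here can be written as ordinary Python, which Cython compiles as-is and translates into the corresponding C API calls. The function below is a hypothetical example (not taken from the `_png.pyx` in the branch) that accepts either a file name or an already-open binary file object.

```python
import os


def open_binary_source(source):
    """Return (file_object, should_close) for a path or a readable object.

    Hypothetical helper: the isinstance/hasattr checks stand in for
    hand-written calls such as PyUnicode_Check or PyObject_HasAttrString.
    """
    if isinstance(source, (str, bytes, os.PathLike)):
        return open(source, "rb"), True
    if hasattr(source, "read"):
        return source, False
    raise TypeError("expected a file name or an object with a read() method")
```

The `should_close` flag is a design choice for the sketch: the caller closes only the files this helper opened, and leaves objects passed in by the user alone.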

Once things compiled, due to my own mistake, calling the function segfaulted. Debugging
that segfault in gdb required, again, wading through the generated code. Using gdb on
hand-written code is *much* nicer.

for sure -- there is a plug-in/add-on/something for using gdb on
Cython code -- I haven't used it but I imagine it would help.

Ian Thomas wrote:

I have never used Cython, but to me the code looks like an inelegant combination of
Python,C/C++ and some Cython-specific stuff.

well, yes, it is that!

I can see the advantage of this approach for small sections of code, but I have strong reservations about using it for complicated modules that have extensive use of
templated code and/or Standard Template Library collections (mpl has examples of
both of these).

So far, I've found that Cython is good for:
- The simple stuff -- basic loops through numpy arrays, etc.
- wrapping/calling more complex C or C++
    -- essentially handling the reference counting and python type
packing/unpacking of python types.

So we find we do write some shim code in C++ to make the access to the
core libraries Cython-friendly. We haven't dealt with complex
templating, etc, but I'd guess if we did I'd keep that in C++. And
since the resulting actual glue code is pretty simple, it makes the
debugging easier.

Maybe rather than asking "if we switched to using Cython, would more participate", I
should be asking "among those that can participate in removing the PyCXX
dependency, what is the preferred approach?"

I don't know that we need a one-size-fits-all approach -- perhaps
some bits make the most sense to move to plain old C/C++, and some to
Cython, either because of the nature of the code itself, or because of
the experience/preference of the person that takes ownership of a
particular problem.

-Chris

···

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@...236...

This argues against making the Cython source code a part of the matplotlib codebase.

huh? are you suggesting that we use Cython to generate the glue, then
hand-maintain that glue? I think that is a really, really bad idea --
generated code is ugly and hard to maintain, it is not designed to be
human-readable, and we wouldn't get the advantages of bug-fixes and
further development in Cython.

So -- if you use Cython, you want to keep using it, and that means the
Cython source IS the source. I agree that it's a good idea to ship the
generated code as well, so that no one that is not touching the Cython
has to generate. Other than the slight mess from generated files
showing up in diffs, etc, this really works just fine.

I agree with this approach.

Any reason MPL couldn't continue with EXACTLY the same approach now
used with C_XX -- it generates code as well, yes?

No -- PyCXX is just C++. Its killer feature is that it provides a fairly thin layer around the Python C/API that does implicit reference counting through the use of C++ constructors and destructors. I actually think it's a really elegant approach to the problem. The downside we're running into is that it's barely maintained, so using vanilla upstream as provided by packagers is not viable. An alternative to all of this discussion is to fork PyCXX and release as needed. The maintenance required is primarily when new versions of Python are released, so it wouldn't necessarily be a huge undertaking. However, I know some are reluctant to use a relatively unused tool.

Michael Droettboom wrote:

For the PNG extension specifically, it was creating callbacks that can
be called from C and the setjmp magic that libpng requires. I think
it's possible to do it, but I was surprised at how non-obvious those
pieces of Cython were. I was really hoping by creating this experiment
that a Cython expert would step up and show the way :wink:

Did you not get the support you expected from the cython list? Anyway,
there's no reason you can't keep stuff in C that's easier in C (or did
C_XX make this easy?).

The support has been adequate, but the solutions aren't always an improvement over raw Python/C API (not just in terms of lines of code but in terms of the number of layers of abstraction and "magic" between the coder and what actually happens).

  I think making basic callbacks is actually
pretty straightforward, but I don't know about the setjmp magic (I
have no idea what that means!).

It turned out to be not terrible once I figured out the correct incantation.

The Agg backend has more C++-specific challenges, particularly
instantiating very complex template expressions --

I'm guessing you'd do the complex template stuff in C++ -- and let
Cython see a more traditional static API.

Agreed -- I'm really only considering replacing the glue code provided by PyCXX, not the whole thing. matplotlib's C/C++ code has been around for a while and has been fairly vetted at this point, so I don't think a wholesale rewrite makes sense.

but some of that complexity could be reduced by using Numpy arrays in place of the
image buffer types that each of them contain

OR Cython arrays and/or memoryviews -- this is indeed a real strength of Cython.

Sure, but when we return to Python, they should be Numpy arrays which have more methods etc. -- or am I missing something?
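
For context on this exchange: Cython's typed memoryviews are built on Python's buffer protocol, which is also what lets NumPy wrap another object's memory without copying (for example via `np.asarray`). The sketch below shows the protocol with only the standard library, using the built-in `memoryview` and `array` types as stand-ins. It uses neither Cython nor NumPy, and the function name is made up for illustration.

```python
from array import array


def fill_rgba_row(buffer, rgba):
    """Write one repeated RGBA pixel into any writable byte buffer, in place."""
    view = memoryview(buffer).cast("B")   # zero-copy view of the raw bytes
    pixel = bytes(rgba)                   # four values in the range 0..255
    for offset in range(0, len(view) - 3, 4):
        view[offset:offset + 4] = pixel
    return view
```

Because the view shares memory with the original object, whichever array type is handed back to Python can wrap the same bytes rather than copying them.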

The Cython version isn't that much shorter than the C++ version.

I think some things make sense to keep in C++, though I do see a fair
bit of calls (in the C++) to the python API -- I'm surprised there
isn't much code advantage, but anyway, the goal is more robust/easier
to maintain, which may correlate with code-size, but not completely.

These declarations aren't exact matches to what one would find in the header file(s) because Cython doesn't support exact-width data types etc.

It does support the C99 fixed-width integer types:

from libc.stdint cimport int16_t, int32_t

Or are you talking about something else?

The problem is that Cython can't actually read the C header, so there are types in libpng, for example, that we don't actually know the size of. They are different on different platforms. In C, you just include the header. In Cython, I'd have to determine the size of the types in a pre-compilation step, or manually determine their sizes and hard code them for the platforms we care about.
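
The platform dependence described here is easy to observe from Python with the standard-library `ctypes` module, used below purely as an illustration (it is not how Cython or libpng determine sizes). Fixed-width types have the same size everywhere, while types such as `long` and `size_t` follow the platform's C ABI, which is why a size hard-coded on one machine may be wrong on another.

```python
import ctypes


def c_type_sizes():
    """Return the size in bytes of a few C types on the current platform."""
    return {
        "int16_t": ctypes.sizeof(ctypes.c_int16),
        "int32_t": ctypes.sizeof(ctypes.c_int32),
        "long": ctypes.sizeof(ctypes.c_long),      # 4 or 8 depending on ABI
        "size_t": ctypes.sizeof(ctypes.c_size_t),  # follows pointer width
        "void*": ctypes.sizeof(ctypes.c_void_p),
    }
```

Running `c_type_sizes()` on 64-bit Linux and on 64-bit Windows would typically give different values for `long`, while the fixed-width entries stay the same.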

I'm not sure why some of the Python/C API calls I needed were not defined in Cython's include wrappers.

I suspect that's an oversight -- for the most part, stuff has been
added as it's needed.

One other note -- from a quick glance at your Cython code, it looks
like you did almost everything in Cython-that-will-compile-to-pure-C
-- i.e. a lot of calls to the CPython API. But the whole point of
Cython is that it makes those calls for you. So you can do type
checking, and switching on types, and calling np.asarray(), etc, etc,
etc, in Python, without calling the CPython api yourself. I know
nothing of the PNG API, and am pretty weak on the CPython API (and C
for that matter), but it's likely that the Cython code you've
written could be much simplified.

It would at least make this a more fair comparison to have the Cython code as Cythonic as possible. However, I couldn't find any ways around using these particular APIs -- other than the Numpy stuff which probably does have a more elegant solution in the form of Cython arrays and memory views.

Once things compiled, due to my own mistake, calling the function segfaulted. Debugging
that segfault in gdb required, again, wading through the generated code. Using gdb on
hand-written code is *much* nicer.

for sure -- there is a plug-in/add-on/something for using gdb on
Cython code -- I haven't used it but I imagine it would help.

Ah. I wasn't aware of that. Thanks for pointing that out. I have the CPython plug-in for gdb and it's great.

Ian Thomas wrote:

I have never used Cython, but to me the code looks like an inelegant combination of
Python,C/C++ and some Cython-specific stuff.

well, yes, it is that!

I can see the advantage of this approach for small sections of code, but I have strong reservations about using it for complicated modules that have extensive use of
templated code and/or Standard Template Library collections (mpl has examples of
both of these).

So far, I've found that Cython is good for:
  - The simple stuff -- basic loops through numpy arrays, etc.
  - wrapping/calling more complex C or C++
     -- essentially handling the reference counting and python type
packing/unpacking of python types.

So we find we do write some shim code in C++ to make the access to the
core libraries Cython-friendly. We haven't dealt with complex
templating, etc, but I'd guess if we did I'd keep that in C++. And
since the resulting actual glue code is pretty simple, it makes the
debugging easier.

Maybe rather than asking "if we switched to using Cython, would more participate", I
should be asking "among those that can participate in removing the PyCXX
dependency, what is the preferred approach?"

I don't know that we need a one-size-fits-all approach -- perhaps
some bits make the most sense to move to plain old C/C++, and some to
Cython, either because of the nature of the code itself, or because of
the experience/preference of the person that takes ownership of a
particular problem.

True. We do have two categories of stuff using PyCXX in matplotlib: things that (primarily) wrap third-party C/C++ libraries, and things that are actually doing algorithmic heavy lifting. It's quite possible we don't want the same solution for all.

Cheers,
Mike

···

On 12/03/2012 01:12 PM, Chris Barker - NOAA Federal wrote: