cProfiling cartopy's vs GeoPandas's interface with Matplotlib: puzzling results

gregorhd · August 15, 2021, 10:54am

Hi there,

I’m currently comparing several geospatial mpl interfaces, including cartopy and GeoPandas, for a thesis. One part compares the total CPU runtime each library needs to generate the same map product from the same data. The performance benchmarking is done with cProfile, and the results are a bit puzzling:

[Figure: 2021-08-14 dd comparison]

By this measure, cartopy allegedly outperforms every other library even including datashader. The cartopy and GeoPandas implementations are almost identical except for the former using add_geometries() and the latter GeoDataFrame.plot() to interface with mpl. So what do you think could be happening here?

The basic setup is this: I’m wrapping the entire figure and axes definitions, including the central add_geometries() calls for cartopy or .plot() for GeoPandas, in a function called renderFigure() which is then wrapped with a decorator that basically does this:

import cProfile

p = cProfile.Profile()
p.enable()  # start cProfiling

value = func(*args, **kwargs)  # execute renderFigure()

p.disable()  # stop cProfiling
p.dump_stats("renderFigure.prof")  # write cProfile to file (placeholder filename)
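
For readers who want to reproduce the setup, here is a minimal sketch of how such a decorator could be written in full. The decorator name `profiled` and its `path` argument are hypothetical and not taken from the thesis code; only `cProfile.Profile`, `enable()`, `disable()` and `dump_stats()` come from the snippet above.

```python
import cProfile
import functools


def profiled(path):
    """Profile every call of the decorated function and dump the stats to `path`.

    `profiled` and `path` are illustrative names, not part of the original code.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            p = cProfile.Profile()
            p.enable()  # start profiling
            try:
                return func(*args, **kwargs)  # e.g. renderFigure()
            finally:
                p.disable()         # stop profiling, even if func raised
                p.dump_stats(path)  # write the profile to disk
        return wrapper
    return decorator
```

Applied as `@profiled("renderFigure.prof")` above the function definition, this only measures what happens before the function returns; anything the interpreter or backend does afterwards falls outside the profile.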

Strangely, for the cartopy interface to mpl, the cProfiles are created almost immediately (a print statement confirms the dump to .prof), even though the actual figures are not rendered, and do not show up in the interpreter, until maybe a minute later (we’re talking about 144,000 polygons, some of them very elaborate).

For GeoPandas and geoplot, the cProfiles look more realistic. If anything, cartopy should hardly be faster than datashader.

I presume this has to do with how cartopy interfaces with the mpl renderer and with when cProfile thinks the function is ‘done’, even though the interpreter or mpl renderer still has work to do behind the scenes. Is there any way to also capture, as part of the cProfile, the ‘actual’ rendering that happens up to the point where the figure appears in the interpreter? And why does this seem to be captured for GeoPandas and geoplot but not for cartopy? Or could somebody explain what’s happening behind the scenes with either mpl or cartopy?

QuLogic · August 19, 2021, 6:42am

I see from recent changes that you added a draw, which would significantly change how long it takes to run (saving to a file would be equivalent).
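
As a rough sketch of what including the draw in the measurement can look like: the helper below forces the canvas to draw before the profiler is stopped. The helper name `profile_with_draw`, its arguments, and the assumption that the render function returns the Matplotlib figure are all hypothetical; `fig.canvas.draw()` is the standard Matplotlib call for forcing a render.

```python
import cProfile
import pstats


def profile_with_draw(render, path):
    """Profile `render()` plus the forced draw of the figure it returns.

    Hypothetical helper: `render` is assumed to return a Matplotlib Figure.
    Returns the total time in seconds recorded by the profiler.
    """
    p = cProfile.Profile()
    p.enable()
    fig = render()      # build the figure and add the artists
    fig.canvas.draw()   # force the render while the profiler is still running
    p.disable()
    p.dump_stats(path)  # write the profile to disk
    return pstats.Stats(p).total_tt
```

Calling `fig.savefig(...)` at the same point instead of `fig.canvas.draw()` should have a comparable effect, at the cost of also timing the file write.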

But I’m not sure if you also accounted for downloads? Some shapefiles are bundled in Cartopy, while Natural Earth shapefiles (especially higher resolution ones) are downloaded as needed, but then they are cached.

gregorhd · August 19, 2021, 7:10am

Hi there, thanks for your suggestion. The data acquisition portion, which in my case pulls the GDFs from a PostGIS database, is not included in the renderFigure() function which is wrapped with the cProfiling decorator. Data acquisition is part of the sql2gdf() function executed prior, so cProfile doesn’t look at that bit at all.

It now seems that adding the draw brings the total roughly up to where GeoPandas is (53 vs 61 seconds).