Sometime in January, we are going to spend some time fixing a few minor MPL bugs we've hit and probably work on a few enhancements (I'll send you a list in Jan before we start anything - it's nothing major). We're also going to work on writing a set of tests that try various plots w/ units. I was thinking this would be a good time to introduce a standard test harness into the MPL CM tree.
Hey Ted -- Sorry I haven't gotten back to you yet. These proposals
sound good. I have only very limited experience with unit testing and
you have tons, so I don't have a lot to add to what you've already
written, but I have a few inline comments below.
I think we should:
1) Select a standard test harness. The two big hitters seem to be unittest and nose. unittest has the advantage that it's shipped w/ Python. nose seems to do better with automatic discovery of test cases.
I prefer nose. I've used both a bit and find nose much more intuitive
and easy to use. The fact that ipython, numpy, and scipy are all
using nose makes the choice fairly compelling, especially if some of
your image specific tests could be ported w/o too much headache.
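Just to make the difference concrete, here is the kind of thing I mean
(a rough, untested sketch; the file and function names are made up).
nose will happily collect a bare test function, while unittest wants
the TestCase boilerplate:

import unittest
from matplotlib.figure import Figure

# nose style: a plain function with a plain assert is enough, and nose
# will find it by its name.
def test_line_count():
    fig = Figure()
    ax = fig.add_subplot(111)
    ax.plot([1, 2, 3], [4, 5, 6])
    assert len(ax.lines) == 1

# unittest style: the same check needs a TestCase class and an explicit
# runner (or a discovery script of our own).
class TestLineCount(unittest.TestCase):
    def test_line_count(self):
        fig = Figure()
        ax = fig.add_subplot(111)
        ax.plot([1, 2, 3], [4, 5, 6])
        self.assertEqual(len(ax.lines), 1)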
2) Establish a set of testing requirements. Naming conventions, usage conventions, etc. Things like: tests should never print anything to the screen (i.e. correct behavior is encoded in the test case) or rely on a GUI unless that's what is being tested (this allows tests to be run w/o an X-server). Basically, write some documentation for the test system that includes how to use it and what's required of people when they add tests.
3) Write a test 'template' for people to use. This would define a test case and put TODO statements or something like them in place for people to fill in. More than one might be good for various classes of tests (maybe an image comparison template for testing agg drawing and a non-plot template for testing basic computations like transforms?).
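Something along these lines, maybe (just a sketch; the names are placeholders, and an image comparison template would add a call to whatever common comparison helper ends up in the shared test utilities):

import unittest

class TestTransformsTemplate(unittest.TestCase):
    """TODO: one or two sentences on what this case covers."""

    def setUp(self):
        # TODO: build any fixtures the tests below need.
        pass

    def test_something(self):
        # TODO: replace with a real check.  Correctness must be
        # asserted here, never printed or eyeballed on screen.
        self.assertEqual(1 + 1, 2)

    def tearDown(self):
        # TODO: remove any files the test created.
        pass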
Some things we do on my project for our Python test systems:
We put all unit tests in a 'test' directory inside the python package being tested. The disadvantage of this is that potentially large tests are inside the code to be delivered (though a nice delivery script can easily strip them out). The advantage of this is that it makes coverage checking easier. You can run the test case for a package and then check the coverage in the module w/o trying to figure out which things should be coverage checked or not. If you put the test cases in a different directory tree, then it's much harder to identify coverage sources. Though in our case we have 100's of python modules - in MPL's case, there is really just MPL, projections, backends, and numerix so maybe that's not too much of a problem.
Automatic coverage isn't a must-have, but it is really nice. I've found that it actually causes developers to write more tests because they can run the coverage and get a "score" that other people will see. It's also a good way to check a new submission to see if the developer has done basic testing of the code.
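If we go that route, hooking coverage into the run script is only a few lines, assuming Ned Batchelder's coverage.py package is available (I'm writing the API from memory, so treat the exact names as approximate):

import coverage

def run_with_coverage(run_all_tests):
    # run_all_tests is whatever callable the run script exposes;
    # restrict measurement to the matplotlib package itself.
    cov = coverage.Coverage(source=["matplotlib"])
    cov.start()
    try:
        run_all_tests()
    finally:
        cov.stop()
    cov.report()  # prints the per-module percentage "score"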
All of the above sounds reasonable and I don't have strong opinions on
any of it, so I will defer to those who write the initial framework.
For our tests, we require that the test never print anything to the screen, clean up any of its output files (i.e. leave the directory in the same state it was in before), and only report that the test passed or failed, with an error message if it failed. The key thing is that the conditions for correctness are encoded into the test itself. We have a command line option that gets passed to the test cases to say "don't clean up" so that you can examine the output from a failing test case w/o modifying the test code. This option is really useful when an image comparison fails.
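In code, that convention looks roughly like this (sketch only; the KEEP_OUTPUT flag is a made-up name that the run script's "don't clean up" option would really set):

import os
import shutil
import tempfile
import unittest

KEEP_OUTPUT = False  # the run script would flip this from its command line

class ImageOutputCase(unittest.TestCase):
    def setUp(self):
        # every output file goes into a scratch directory the test owns
        self.outdir = tempfile.mkdtemp(prefix="mpltest_")

    def tearDown(self):
        # leave the directory behind only when debugging a failure
        if not KEEP_OUTPUT:
            shutil.rmtree(self.outdir)

    def test_example(self):
        path = os.path.join(self.outdir, "output.png")
        # ... generate the image at `path` and compare it to a baseline;
        # the result is asserted, never printed ...
        self.assertTrue(path.endswith(".png"))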
We've wrapped the basic python unittest package. It's pretty simple and reasonably powerful. I doubt there is anything MPL would be doing that it can't handle. The auto-discovery of nose is nice but unnecessary in my opinion. As long as people follow a standard way of doing things, auto-discovery is fairly easy. Of course if you prefer nose and don't mind the additional tool requirement, that's fine too. Some things that are probably needed:
- command line executable that runs the tests.
- support flags for running only some tests
- support flags for running only tests that don't need a GUI backend
(require Agg?). This allows automated testing and visual testing to be
combined. GUI tests could be placed in identified directories and then
only run when requested since by their nature they require specific backends
and user interaction.
- nice report on test pass/fail status
- hooks to add coverage checking and reporting in the future
- test utilities
- image comparison tools
- ??? basically anything that helps w/ testing and could be common across tests
As a first cut, I would suggest something like this:
The run script would execute all or some of the tests. Any common test code would be put in the mplTest directory. Any directory named 'test_XXX' is for test cases, where 'XXX' is some category name that can be used in the run script to run a subset of cases. Inside each test_XXX directory, there would be one unittest class per file. The run script would find the .py files in the test_XXX directories, import them, find all the unittest classes, and run them. The run script also sets up sys.path so that the mplTest package is available.
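A bare-bones sketch of the run script (all names hypothetical, error handling omitted):

import glob
import os
import sys
import unittest

def collect_suite(root, categories=None):
    # make the shared mplTest package importable
    sys.path.insert(0, root)
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for dirpath in sorted(glob.glob(os.path.join(root, "test_*"))):
        if not os.path.isdir(dirpath):
            continue
        category = os.path.basename(dirpath)[len("test_"):]
        if categories and category not in categories:
            continue
        sys.path.insert(0, dirpath)
        for pyfile in glob.glob(os.path.join(dirpath, "*.py")):
            modname = os.path.splitext(os.path.basename(pyfile))[0]
            module = __import__(modname)
            # pick up every unittest class defined in the file
            suite.addTests(loader.loadTestsFromModule(module))
    return suite

if __name__ == "__main__":
    # e.g. "python run.py transforms" runs only the test_transforms cases
    cats = sys.argv[1:] or None
    unittest.TextTestRunner(verbosity=2).run(collect_suite(os.curdir, cats))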
ps: looking at the current unit directory, it looks like at least one test (nose_tests) is using nose even though it's not supplied w/ MPL. Most of the tests do something and show a plot but the correct behavior is never written into the test.
My fault -- I wrote some tests to make sure all the different kwargs
variants were processed properly, but since we did not have a
"correctness of output" framework in place, punted on that part. I
think having coverage of the myriad ways of setting properties is of
value.
On the issue of units (not unit testing but unit support which is
motivating your writing of unit tests), I think we may need a new
approach. The current approach is to put unitized data into the
artists, and update the converted data at the artist layer. I don't
know that this is the proper design. For this approach to work, every
scalar and array quantity must support units at the artist layer, and
all the calculations that are done at the plotting layer (eg error
bar) to set up these artists must be careful to preserve unitized data
throughout. So it is burdensome on the artist layer and on the
plotting function layer.
The problem is compounded because most of the other developers are not
really aware of how to use the units interface. I take responsibility
for that: they have often asked for a design document, which I have
yet to provide because I am unhappy with the design. So new code tends
to break functions that once had unit support, which is why we need
unit tests ....
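For reference, the interface lives in matplotlib/units.py; registering a converter for a custom type looks roughly like this (simplified, and the signatures here are approximate, so check the source rather than trusting my memory):

import matplotlib.units as munits

class Feet(object):
    """A made-up unitized scalar, just for illustration."""
    def __init__(self, value):
        self.value = value

class FeetConverter(munits.ConversionInterface):
    @staticmethod
    def convert(value, unit, axis):
        # hand plain floats to the axis/artists
        if isinstance(value, Feet):
            return value.value
        return [v.value for v in value]

    @staticmethod
    def default_units(x, axis):
        return "feet"

    @staticmethod
    def axisinfo(unit, axis):
        return munits.AxisInfo(label=unit)

# once registered, plotting Feet instances routes through the converter
munits.registry[Feet] = FeetConverter()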
I think everything might be easier if mpl had an intermediate class
layer PlotItem for plot types, eg XYPlot, BarChart, ErrorBar as we
already do for Legend. The plotting functions would instantiate these
objects with the input arguments and track unit data through the
reference to the axis. These plot objects would contain all the
artist primitives which would store their data in native floating
point, which would remove the burden on the artists from handling
units and put it all in the plot creation/update logic. The objects
would store references to all of the original inputs, and would update
the primitive artists on unit changes. The basic problem is that the
unitized data must live somewhere, and I am not sure that the low
level primitive artists are the best place for that -- it may be a
better idea to keep this data at the level of a PlotItem and let the
primitive artists handle already converted floating point data. This
is analogous to the current approach of passing transformed data to
the backends to make it easier to write new backends. I need to chew
on this some more.
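To make that a bit more concrete, here is the kind of thing I am imagining, purely as a speculative sketch; none of this exists, and every name in it (including the conversion hook) is a placeholder:

class ErrorBarItem(object):
    """Speculative PlotItem-style container: it owns the original
    (possibly unitized) inputs and the primitive artists, and the
    artists only ever see plain floats."""

    def __init__(self, axes, x, y, yerr):
        self.axes = axes
        # keep references to the unitized inputs
        self.x, self.y, self.yerr = x, y, yerr
        self._primitives = []
        self._rebuild()

    def _rebuild(self):
        # convert once, at plot creation/update time; convert_units here
        # is a stand-in for whatever the real axis conversion hook is
        xf = self.axes.xaxis.convert_units(self.x)
        yf = self.axes.yaxis.convert_units(self.y)
        # ... create or refresh the Line2D/LineCollection primitives
        # from the plain floating point xf, yf, yerr ...

    def units_changed(self):
        # called when the axis units change; reconvert and update the
        # primitives without the artists ever knowing about units
        self._rebuild()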
But this question aside, by all means fire away on creating the unit tests.
On Mon, Dec 22, 2008 at 11:45 AM, Drain, Theodore R <theodore.r.drain@...179...> wrote: