Michael Droettboom wrote:
Interesting result. I pulled all of the new "actual" files from the 21
failing tests on the buildbots to my local machine and all of those
tests now pass for me. Good. Interestingly, there are still two tests
failing on my machine which did not fail on the buildbots, so I can't
grab the buildbots' new output.
Well, if they're not failing on the buildbots, that means the baseline
in svn can't be too different from what they generate. But it's a good
point that we want the actual output of the buildbots regardless of
whether the test failed.
Could this just be a thresholding issue
for the tolerance value? I'm a little wary of "polluting" the baseline
images with images from my machine, which doesn't have our "standard"
version of Freetype, so I'll leave those out of SVN for now, but will go
ahead and commit the new baseline images from the buildbots.
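For what it's worth, the thresholding mentioned above can be sketched as an RMS pixel difference checked against a tolerance cutoff. This is only a minimal illustration of the idea, not matplotlib's actual comparison code; the function name and signature are made up here:

```python
import numpy as np

def images_close(expected, actual, tol=10.0):
    """Return True when the RMS pixel difference is within `tol`.

    `expected` and `actual` are image arrays (e.g. as loaded by an
    image reader); a size mismatch always counts as a failure.
    """
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    if expected.shape != actual.shape:
        return False
    rms = np.sqrt(np.mean((expected - actual) ** 2))
    return rms <= tol
```

Raising `tol` would make small hinting-related pixel differences pass without committing machine-specific baselines.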
Looking at the 2 images failing on the buildbots, I'm reasonably sure
they were generated by James Evans when he created the first test
infrastructure. So I say go ahead and check in the actual images
generated by the buildbots. (Or did you recently re-upload those images?)
If these two mystery failures are resolved by pulling new images from the
buildbots, I think this experiment with turning off hinting is a success.
Yes, I think so, too. I was going to suggest getting on the freetype
email list to ask them about their opinion on what we're doing.
As an aside, is there an easy way to update the baselines I'm missing?
At the moment, I'm copying each result file to the correct folder under
tests/baseline_images, but it takes me a while because I don't know the
hierarchy by heart and there are 22 failures. I was expecting to just
manually verify everything was ok and then "cp *.png" from my scratch
tests folder to baseline_images and let SVN take care of which files had
changed.
Unfortunately, there's no easy baseline update yet. John wrote one for
the old test infrastructure, but I ended up dropping that in the
switchover to the simplified infrastructure. The reason was that the
image comparison mechanism, and the directories to which results were
saved, changed, so his script would have required re-working.
Given that I don't consider the current mechanism for this particularly
good, I was hesitant to invest the effort to port that script over.
(The trouble with the current actual/baseline/diff result gathering
mechanism is that it uses the filesystem as a means for communication
within the nose test-running process in addition to communication with
the buildbot process through hard-coded assumptions about paths and
filenames. If the only concern was within nose, we could presumably
re-work some of the old MplNoseTester plugin to handle the new case, but
given the buildbot consideration it gets more difficult to get these
frameworks talking through supported API calls. Thus, although the
hardcoded path and filename stuff is a hack, it will require some
serious nose and buildbot learning to figure out how to do it the
"right" way. So I'm all for sticking with the hack right now, and making
it a bit nicer by doing things like having a better directory hierarchy
layout for the actual result images.)
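As a sketch of that "bit nicer" direction, one option is to centralize the path layout in a single helper, so the nose plugin and the buildbot scripts at least share one definition of where results live instead of duplicating hard-coded paths. All names below are illustrative assumptions, not existing matplotlib API:

```python
import os

def result_paths(output_root, test_name, image_name):
    """Return (actual, expected, diff) image paths under a single,
    per-test directory scheme.  Anything that needs to find a result
    image goes through this one function."""
    base = os.path.join(output_root, test_name)
    return (os.path.join(base, image_name + '.png'),
            os.path.join(base, image_name + '-expected.png'),
            os.path.join(base, image_name + '-failed-diff.png'))
```

The paths are still a convention rather than a supported API, but a convention defined in one place is much easier to change later than the same strings scattered through two codebases.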
This is just the naive feedback of a new set of eyes: what you've put
together here is extremely useful and powerful.
Thanks for the feedback.
The goal is that Joe Dev would think it's easy and useful and thus start
using it. Tests should be simple to write and run so that we actually do
that. Like I wrote earlier, by keeping the tests themselves simple and
clean, I hope we can improve the testing infrastructure mostly
independently of changes to the tests themselves.