Abstract
Matplotlib is the de facto standard library atop which the bulk of the Python visualization ecosystem is built, including such popular packages as cartopy, pandas, seaborn, and many others.
Over the years, Matplotlib has evolved to contain a stunning array of features; however, discoverability remains a central problem for our documentation.
There is a vast existing store of matplotlib documentation that can be broken
up largely into tutorials, API documentation, and examples (ignoring “meta” documentation, e.g. installation how-tos, contribution guides, etc.).
However, while anything not covered in detail with the API documentation is likely to have a tutorial or example written about it, it seems that even amongst core developers the best way to figure out the “latest and greatest” way to accomplish a task is often to ask the community.
In order to remedy this problem, I propose to integrate the API documentation with a large swath of existing tutorials and examples by attempting to systematically document what I will call Matplotlib’s “implicit types”.
These are a set of core concepts which, due to their “duck type”-heavy nature, have not yet merited their own proper class, but nonetheless are commonly accepted as inputs to a variety of Matplotlib routines internally, and whose documentation is currently scattered throughout docstrings, tutorials, and sometimes even just in examples.
The goal is to create a path to discoverability of Matplotlib’s core features for one of our largest user groups: the so-called “copy/paste/edit” developer (see the User research section below).
We will leverage our existing Sphinx/Readthedocs infrastructure by simply creating a new reStructuredText role :type: which can be used in place of the usual numpydoc type syntax when appropriate.
For example, in Line2D:
"""
color : :type:`Color`, default: :rc:`lines.color`
"""
would link to a comprehensive “Color” tutorial (currently links to Line2D.set_color).
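As a rough illustration of what the proposed role would do, the sketch below scans a numpydoc-style parameter line for the :type: syntax and resolves each name to a documentation target. The TYPE_PAGES mapping and the page paths in it are hypothetical placeholders rather than existing Matplotlib pages, and a real implementation would register a role with Sphinx instead of using a regular expression.

```python
import re

# Hypothetical mapping from implicit type names to their central doc pages.
# These paths are placeholders for illustration only.
TYPE_PAGES = {
    "Color": "tutorials/types/color",
    "Linestyle": "tutorials/types/linestyle",
}

# Matches the proposed role syntax, e.g. :type:`Color`
TYPE_ROLE = re.compile(r":type:`(?P<name>[^`]+)`")


def resolve_type_links(param_line):
    """Return (name, target) pairs for every :type: role in a docstring line.

    Unknown names resolve to None so that a doc build could warn about them.
    """
    return [
        (m.group("name"), TYPE_PAGES.get(m.group("name")))
        for m in TYPE_ROLE.finditer(param_line)
    ]


line = "linestyles: :type:`Linestyle` or list thereof, default: :rc:`lines.linestyle`"
links = resolve_type_links(line)
```

Here links holds a single pair for Linestyle; the :rc: role on the same line is left alone because only the :type: prefix is matched.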
The goals of this change are to
- Make it easy to find the “common”/“easiest” option for a parameter (preferably in zero clicks).
- Make it easy to “scroll down” to see more advanced options (preferably with one click, max).
- Provide a centralized strategy for linking top-level “API” docs to the relevant “tutorials”.
- Avoid API-doc-explosion, where scanning through the many possible options for each parameter makes individual docstrings unwieldy.
User research
Because matplotlib users are such a diverse group, creating documentation that
works for everyone—from those for whom matplotlib is their introduction to
Python to seasoned data scientists and library writers hoping to wrap
matplotlib into their own packages—is a herculean task. In order to identify
what unmet needs are most relevant for different kinds of users, I watched
several matplotlib users undertake assigned tasks using the library, where the
users were split into three different categories of experience with matplotlib
and Python:
1. Experienced with matplotlib and Python
2. Experienced with Python, but not plotting or matplotlib specifically
3. Experienced with design, but with neither Python nor matplotlib
TODO: Fill in this section with full write-up.
The main takeaway was that both types 1 and 2 are “copy/paste/edit” developers
who do the majority of their editing with the docs pulled up, and only discover
features by googling in plain English when they have a specific task they want
to accomplish that they don’t immediately find in the API docs (or in the docs
for the example they are copying from).
Detailed proposal
Historically, matplotlib’s API has relied heavily on string-as-enum
“implicit types”. Besides mimicking MATLAB’s API, these parameter-strings allow the
user to pass semantically-rich values as arguments to matplotlib functions
without having to explicitly import or verbosely prefix an actual enum value
just to pass basic plot options (e.g. plt.plot(x, y, linestyle='solid') is
easier to type and less redundant than plt.plot(x, y, linestyle=mpl.LineStyle.solid)).
Many of these string-as-enum implicit types have since evolved more sophisticated
features. For example, a linestyle can now be either a string or a 2-tuple
of an offset and an on-off dash sequence, and a MarkerStyle can now be either a
string or a path. While this is true of many implicit types, MarkerStyle is the
only one (to my knowledge) that has the status of being a proper Python type.
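To make the two input forms concrete, here is a minimal sketch of normalizing a linestyle given either as a string or as a tuple. It assumes the tuple form is (offset, on-off sequence). The helper name normalize_linestyle and the dash lengths in NAMED_DASHES are illustrative choices, not Matplotlib's internals or defaults.

```python
# Illustrative dash patterns; the numbers are placeholders, not Matplotlib's
# actual defaults.
NAMED_DASHES = {
    "solid": (0, ()),
    "dashed": (0, (4, 2)),
    "dotted": (0, (1, 2)),
}
SHORTHAND = {"-": "solid", "--": "dashed", ":": "dotted"}


def normalize_linestyle(ls):
    """Return an (offset, on_off_sequence) pair for a string or tuple input."""
    if isinstance(ls, str):
        # Accept both the shorthand ('--') and the long name ('dashed').
        name = SHORTHAND.get(ls, ls)
        if name not in NAMED_DASHES:
            raise ValueError(f"{ls!r} is not a known linestyle name")
        return NAMED_DASHES[name]
    # Otherwise treat the input as an explicit (offset, on-off sequence) pair.
    offset, on_off = ls
    on_off = tuple(on_off)
    if len(on_off) % 2 != 0:
        raise ValueError("on-off sequence must have an even number of entries")
    return (offset, on_off)
```

Both normalize_linestyle('--') and normalize_linestyle((0, [4, 2])) end up in the same canonical form, which is the kind of equivalence a central linestyle page would need to spell out.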
Because these implicit types are not classes in their own right, Matplotlib has
historically had to roll its own solutions for centralizing documentation and
validation of these implicit types (e.g. the docstring.interpd.update docstring
interpolation pattern and the cbook._check_in_list validator pattern,
respectively) instead of using the standard toolchains.
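For readers unfamiliar with the validator pattern, the following is a hypothetical stand-in written for illustration. It is not Matplotlib's actual cbook._check_in_list, only a sketch of the general idea: every function that accepts a string-as-enum value repeats a membership check against a list of allowed strings.

```python
def check_in_list(allowed, **kwargs):
    """Raise ValueError if any keyword value is not among the allowed values."""
    for name, value in kwargs.items():
        if value not in allowed:
            options = ", ".join(repr(a) for a in allowed)
            raise ValueError(
                f"{value!r} is not a valid value for {name}; "
                f"supported values are {options}"
            )


def set_capstyle(capstyle):
    # Each function that accepts the implicit type repeats this check,
    # along with its own copy of the list of allowed values.
    check_in_list(["butt", "round", "projecting"], capstyle=capstyle)
    return capstyle
```

The check works, but the list of allowed values lives at each call site, which is why the documentation for those values tends to be duplicated as well.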
While these solutions have worked well for us, the lack of an explicit location
to document each implicit type means that the documentation is often difficult to
find, large tables of allowed values are repeated throughout the documentation,
and often an explicit statement of the scope of an implicit type is completely
missing from the docs. Take the plt.plot docs, for example. In the “Notes”,
a description of the MATLAB-like format-string styling method mentions
linestyle, color, and markers options. There are many more ways to
pass these three values than are hinted at, but, for many users, this is their
only source of understanding about what values are possible for those options
until they stumble on one of the relevant tutorials. In the table of Line2D
attributes, the linestyle entry does a good job of linking to
Line2D.set_linestyle, where those options are described, but the color
and markers entries do not. color simply links to Line2D.set_color,
which does nothing in the way of offering intuition on what kinds of inputs are
allowed.
… It can be argued that plt.plot is a good candidate to be explicitly
exempted from any documentation best practices we try to codify, and I’ve
chosen it intentionally to elicit the strongest opinions from everyone.
It could be argued that this is something that can be fixed by simply tidying
up the individual docstrings that are causing problems, but the issue is
unfortunately much more systemic than that. Without a centralized place to find
the documentation, this will simply lead to us having more and more copies of
increasingly verbose documentation repeated everywhere each of these implicit
types is used. The alternative, of scattering the information throughout the
documentation, will instead lead to the users having to slowly piece together
their mental model of each implicit type through wiki-diving style traversal
throughout our documentation, or piecemeal from StackOverflow examples.
Ideally, a mention of linestyle in the LineCollection docs should
instead link to the same place as it does in the plt.plot docs. By
organizing these linestyle-specific docs in order from most-common to
most-complex input types, we can maintain a “single-click-to-discover” property
for our advanced plotting options, while also making sure that we don’t hurt
usability for users that simply want to know the simplest way to accomplish a
common task.
Practically speaking, the actual information that we want to have in the
LineCollection docs is just:
- A link to complete docs for allowable inputs (like those found in Line2D.set_linestyle).
- A plain words description of what the parameter is meant to accomplish. To matplotlib power users, this is evident from the parameter’s name, but for new users this need not be the case (e.g. linestyle: a description of whether the stroke used to draw each line in the collection is dashed, dotted or solid).
- A link to any tutorials that visually depict the possible options (currently found only after already clicking through to the Line2D.set_linestyle docs).
In order to make this information available for all implicit types, helping the
continued improvement of the consistency and readability of the docs, we propose
the following best practices for handling implicit types:
1. Implicit type documentation should be centralized at a dedicated page, where the easiest/most common/simplest options are plainly documented in a separate section at the top of the page, and more advanced options can be found by simply scrolling down.
2. Functions that accept implicit types as parameters should link to the appropriate :type: docs.
3. If an implicit type is a “string-as-enum”, it should simply be made an Enum, and each possible value should have a Sphinx-parseable documentation string.
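A minimal sketch of the string-as-enum practice, using join styles as the example. The class name JoinStyle, the helper set_joinstyle, and the wording of the per-member descriptions are assumptions made for illustration. The "#:" comments are the form that Sphinx autodoc can pick up as attribute documentation.

```python
from enum import Enum


class JoinStyle(str, Enum):
    """How two connected line segments are joined (hypothetical sketch)."""

    #: Segments are extended to meet at a sharp corner.
    miter = "miter"
    #: The corner is rounded off with a circular arc.
    round = "round"
    #: The corner is cut off with a straight edge.
    bevel = "bevel"


def set_joinstyle(js):
    # Enum lookup by value both validates and canonicalizes the input,
    # so plain strings keep working for existing callers.
    return JoinStyle(js)
```

Because the enum also derives from str, set_joinstyle('bevel') returns a member that still compares equal to the plain string 'bevel', so existing string-based code would not need to change.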
In particular, notice that (1) would replace large copies of tables of possible
linestyles, markerstyles, etc., with links to the complete documentation for
each. Without all the visual noise from these tables of valid options, the
relevant functions would be free to visibly link to tutorials where these
options are visually demonstrated.
The way this would look in the actual docs is just
"""
linestyles: :type:`Linestyle` or list thereof, default: :rc:`lines.linestyle`
"""
This would link to a comprehensive explanation of which “linestyles” are allowable,
similar to what is currently found at
:doc:`/gallery/lines_bars_and_markers/linestyles.html` (the parameter currently does not link to anything at all!).
Some benefits of this approach include:
- Less likely for docs to become stale, due to centralization.
- Increased discoverability of advanced options. If the simple linestyle option '-' is documented alongside more complex on-off dash specifications, users are more likely to scroll down than they are to stumble across an unlinked-to tutorial that describes a feature they need.
- Canonicalization of many of matplotlib’s “implicit standards” (like what is a “bounds” versus an “extents”) that currently have to be learned by reading the code.
- The process would likely highlight issues with API consistency in a way that could be more easily tracked via Issues, helping with the process of improving our API (see below for discussion).
- Becoming more compatible with potentially adding typing to the library.
- Faster doc build times, due to significant decreases in the amount of text needing to be parsed.
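As an example of such an implicit standard, the sketch below assumes the convention used by Matplotlib's Bbox, where bounds means (x0, y0, width, height) and extents means (x0, y0, x1, y1). The helper names are made up for illustration.

```python
def bounds_to_extents(bounds):
    """Convert (x0, y0, width, height) to (x0, y0, x1, y1)."""
    x0, y0, width, height = bounds
    return (x0, y0, x0 + width, y0 + height)


def extents_to_bounds(extents):
    """Convert (x0, y0, x1, y1) to (x0, y0, width, height)."""
    x0, y0, x1, y1 = extents
    return (x0, y0, x1 - x0, y1 - y0)
```

Both forms are plain 4-tuples, so nothing in the value itself says which one a function expects. A dedicated page per implicit type would give that distinction one canonical home.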
Implementation
This proposal would create one centralized “tutorial” page per implicit type.
For types with complex construction requirements, we would produce and use
classmethods for explicit construction from a known type, but __init__ would
continue to hold the logic required to deduce how to construct the type
from the type of the input.
All functions that accept this implicit type as a parameter would have their
docstrings changed to simply use the numpydoc “input type” syntax to link to
this new class. All functions which use this implicit type (i.e. would raise on
an invalid input) would construct an explicit object instance using the general
__init__, allowing the new class to handle validation.
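The construction scheme described above might look roughly like the following. This is a sketch under stated assumptions: the Color class, its from_hex and from_rgb classmethods, and the set_color caller are all hypothetical, and only two of the many color formats Matplotlib accepts are handled.

```python
class Color:
    """Hypothetical explicit class for the "color" implicit type."""

    def __init__(self, value):
        # __init__ keeps the "deduce from the type of the input" logic.
        if isinstance(value, Color):
            self.rgb = value.rgb
        elif isinstance(value, str):
            self.rgb = Color.from_hex(value).rgb
        else:
            self.rgb = Color.from_rgb(value).rgb

    @classmethod
    def from_hex(cls, text):
        """Explicit construction from a '#rrggbb' string."""
        if len(text) != 7 or not text.startswith("#"):
            raise ValueError(f"{text!r} is not a '#rrggbb' color string")
        # int(..., 16) raises ValueError on non-hex digits.
        channels = tuple(int(text[i:i + 2], 16) / 255 for i in (1, 3, 5))
        return cls._from_validated(channels)

    @classmethod
    def from_rgb(cls, rgb):
        """Explicit construction from three floats in the range 0 to 1."""
        rgb = tuple(float(c) for c in rgb)
        if len(rgb) != 3 or not all(0.0 <= c <= 1.0 for c in rgb):
            raise ValueError(f"{rgb!r} is not an RGB triple of floats in [0, 1]")
        return cls._from_validated(rgb)

    @classmethod
    def _from_validated(cls, rgb):
        # Bypass __init__ because the value has already been checked.
        obj = cls.__new__(cls)
        obj.rgb = rgb
        return obj


def set_color(color):
    # Callers construct an instance and let the class handle validation.
    return Color(color)
```

A caller that already knows its input format can use the classmethods directly, while functions that accept anything color-like go through __init__ and get validation for free.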
The implicit types that I propose require better-organized tutorials are:
- capstyle
- joinstyle
- bounds
- extents
- linestyle
- colors
- colornorm/colormap
- ticks
- Probably others…
Related issues
Some common discoverability issues that this proposal does not address involve
parameters whose allowable types depend on other parameters (for example,
x and y in plt.plot depending on data).
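The kind of dependency meant here can be sketched without any plotting at all. In the toy function below, which is a hypothetical stand-in and not Matplotlib code, a string passed as x or y is only meaningful as a lookup key when data is supplied.

```python
def resolve(arg, data=None):
    """Look up string arguments in data when it is given (hypothetical helper)."""
    if data is not None and isinstance(arg, str):
        return data[arg]
    return arg


def plot(x, y, data=None):
    # Stand-in for a plotting call: it only returns the resolved values.
    return resolve(x, data), resolve(y, data)


table = {"t": [0, 1, 2], "v": [5, 6, 7]}
direct = plot([0, 1, 2], [5, 6, 7])
by_name = plot("t", "v", data=table)
```

Both calls resolve to the same pair of lists, but the allowable type of x and y differs between them, which a single per-parameter type link cannot express.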
Alternatives
I submitted a similar proposal as MEP30, which, instead of simply adding
documentation, proposes to make each of these concepts into a new
style class, so that much of what is currently tutorial documentation
effectively becomes API documentation, which can be linked using the
standard numpydoc conventions for parameter types.
Timeline
//TODO