Google Season of Docs 2020

We’re applying to Google Season of Docs this year!

We welcome proposals for all things documentation, including but not at all limited to the project listed here. We strongly believe you’ve probably got a better handle on how to fix our docs than we do. Please start a new topic with your proposal and tag it gsod.

The current Matplotlib documentation entry paths as defined by the website are:

Project name: Develop Matplotlib Entry Paths
Matplotlib has a tremendous amount of documentation, including examples and tutorials that live inside the main source repository and longer tutorials that live in other repositories in the Matplotlib organization. This can lead to users having a hard time discovering the documentation most appropriate for their needs. The aim of this project would be to define and organize paths through the docs for users, potentially consolidating or reorganizing content as needed and updating the web page to improve discoverability of paths.

Project name: Matplotlib Usage Guide Overhaul
Mentors: @story645, @tacaswell
Description: Matplotlib was primarily developed by scientists for scientific visualization, and modeled on scientific computation languages such as Matlab, but one of the fastest growing audiences for Matplotlib is data science practitioners. To address the needs of this audience, we propose overhauling the usage guide to be more welcoming to newcomers to scientific visualization.

The following topics might be added to or more broadly explained in the guide:

  • the visualization pipeline: data->transformation->chart [1]
  • Matplotlib’s role in the pipeline, i.e. that it’s a low-level library [7,2,3]
  • assumptions matplotlib makes about the structure of the data, user, and task
  • Matplotlib’s imperative architecture and how that differs from declarative libraries like ggplot [5]
  • the relationship between matplotlib and libraries built on top, such as Pandas and Seaborn [5, 8, 9]
  • terminology that can be confusing or is used inconsistently (e.g. norm and colormap on images, tickers/locators) [5]
  • how to customize the plots created by libraries like Pandas and Seaborn

These topics, and the scope, are expected to change as the guide develops and we welcome suggestions by the technical writer on approaches for addressing this audience.
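
To illustrate the last topic in the list, here is a minimal sketch of customizing a plot created by Pandas. It relies on `DataFrame.plot` returning a regular Matplotlib `Axes` for a single plot; the data, column names, and labels are hypothetical and made up purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, invented for this example only.
df = pd.DataFrame({"year": [2017, 2018, 2019, 2020], "users": [10, 14, 21, 30]})

# Pandas builds the chart, but hands back a regular Matplotlib Axes.
ax = df.plot(x="year", y="users", legend=False)

# From here on, every call is plain Matplotlib.
ax.set_title("Users per year")
ax.set_ylabel("users (thousands)")
ax.set_xticks(list(df["year"]))   # one tick per data point
ax.lines[0].set_linewidth(2.5)    # restyle the line Pandas drew

fig = ax.get_figure()  # the parent Figure, e.g. for a later fig.savefig(...)
```

The same pattern should carry over to Seaborn's axes-level functions, which also return an `Axes`.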
Paper: (email for copy)
J. D. Hunter, “Matplotlib: A 2D Graphics Environment,” in Computing in Science & Engineering, vol. 9, no. 3, pp. 90-95, May-June 2007.
doi: 10.1109/MCSE.2007.55
URL: https://ieeexplore.ieee.org/document/4160265

Related material:


Hello, I am Dhiraj Sharma, currently in the final year of my Bachelor of Engineering in Electronics & Telecommunication.
I have been contributing to open source for the last 2 years and was selected as a Student Developer in the Summer of Code in Space program organised by the European Space Agency. I was among 17 selected developers, and I worked on “docker containers for esajpip server”. I also used Docker containers and Sphinx documentation to write up the technical aspects of the project.

I am a Core Developer in coala, where I used Python for linting tools, used ANTLR, and worked on a lexer and parser combination for Python-based libraries. Moreover, I reviewed more than 25 PRs.
I was appointed as a Google Code-in 2018 mentor in this organisation.

I am also part of the Public Lab organisation’s mentor and review team, where I worked on frontend and backend.
Here I was appointed as a Google Summer of Code 2019 mentor.

I have done internships at Wipro, on Machine Learning test automation using NLP, and at Bharat Electronics Limited, on JDBC and SQL injection.

You can check my portfolio here at https://dhiraj240.github.io/

I am interested in “Matplotlib Usage Guide Overhaul”. Can you provide me with the repository link and issues to contribute to, and should I start with the visualization pipeline too?


Hi @Dhiraj_Sharma,
Thanks for your interest! Since a big part of both projects is clarifying the onboarding path for new users, you are strongly encouraged to document your path through our documentation and offer feedback on how it can be improved - informally by starting a discourse discussion thread and formally by opening an issue with actionable tasks that can be taken to improve the documentation.

For example, what is the barrier to you finding the repository link and issues on your own? This is very much not to pick on you, but because a big part of the GSoD work is to find and address discoverability issues; therefore that’s a skill we will be very much looking for in the candidates.


Hi Team,

I am a graduate student at UCLA currently pursuing my Masters in ECE with a focus on computer vision and machine learning. I have been an avid open-source contributor to organisations like Scilab, Gensim and TensorFlow.

In 2017, as a part of Google Summer of Code, I created a Machine Learning toolbox for Scilab through a Jupyter kernel integration in their CLI. The project was successful and I was invited to mentor its development in 2018. We further integrated it with AWS and GCP to create a remote ML learning toolbox. Working on similarity learning research at Gensim as a mentor for GSoC also helped me bolster my understanding of open-source mentoring and development.

I worked as an ML Engineer at Citibank from 2017 to 2019, where we extensively used Python libraries like numpy, pandas and TensorFlow. While continuing my full-time employment, I worked as a GSoC mentor for TensorFlow in 2019 and will be mentoring projects for library migration to TF2.0 in summer 2020 as well.

I understand how important documentation is for open source projects, especially projects like Matplotlib with its massive scale of usage and support across various industries. I am a fluent English speaker and have published multiple blog posts with illustrations explaining data science pipelines. You can read more about my technical writing here: https://medium.com/@razzormandar and https://mandroid6.github.io/archive/.

I am specifically interested in Develop Matplotlib Entry Paths, as I can personally relate to the pain developers go through to find the perfect tutorial for a task. Being a developer and technical writer myself, I see this as a huge opportunity for me to grow in the role and contribute to the exciting work Matplotlib is doing in scientific computing.

Looking forward to discussing with you!

Understood, I will start working now.
I need a copy of http://ieeexplore.ieee.org.ccny-proxy1.libr.ccny.cuny.edu/stamp/stamp.jsp?tp=&arnumber=4160265&isnumber=4160244

Can you send it to dhiraj.8.sharma@gmail.com?

Hi @Dhiraj_Sharma that link doesn’t work and this is not a site to request papers, especially without an explanation of how the request is related to Matplotlib.

Hi @Mandar_Deshpande, do you by any chance have any longer or more comprehensive examples of technical writing - like manuals or reports? These examples are cool but the project requires organizing a ton of somewhat disparate pieces of information so it would be nice to see some more examples of that.

One potential place to start is to offer your input on how you’d tackle Developing Matplotlib Entry Paths for GSoD'20: Suggestions

Greetings to all,
I am Sarthak Khandelwal, a student at Medicaps University, India. I have come up with the following suggestions, which aim to make the onboarding paths easier for new users. There are also some changes that aim to improve the current documentation.
These changes are:

  1. It should include dropdowns for all non-atomic data, to wrap it up in units and make only the relevant portion of the data visible to developers.

  2. Mostly data science practitioners and researchers access the docs, and they keep browsing
    through them to find material of interest. A section for them should therefore be added.
    In addition, a discussion forum should be added to Matplotlib.

  3. Most data scientists and researchers use pandas for data mining and wrangling. For effective data visualization they sometimes plot the data with pandas and use Matplotlib on top of it. A section named “Matplotlib with pandas” should therefore be added, so that beginners and people who don’t know how to integrate the two packages can get help from it.

  4. Matplotlib includes a number of tutorials, and they are grouped together to some extent. Naming sections Introductory, Intermediate, Advanced, Colors, Text, and Toolkit is helpful, but what if someone is unaware of how and where to apply these visualizations? We should also include parts such as Pre-Visualization and Post-Visualization under the Tutorials section, to attract machine learning researchers and practitioners. Some business analysts will also be attracted: in data science, when working with data such as time series, people often find that statistical models give satisfactory results without building a machine learning model. In such cases they also need to pre-visualize the data before building the model and post-visualize it afterwards, to show what impact the model has on the data. Hence these parts should also be added.

  5. Next, I want to draw your attention to the Documentation section. Every organization reflects its ease of service to its audience through its documentation section, and we should keep this page as short as possible while still covering all non-atomic domains of the organization.

  6. Next, we can move to the Index section, where all the classes and methods are shown. Keeping in mind the goal of shortening the section without losing any details, we can make this page shorter by placing all the methods of a class under the class name in a dropdown menu. Here I have entered front-end territory; since I am a backend developer, my suggestions for front-end ideas can probably be improved further. Moreover, other listings on the Index page, such as modules, should come after the classes part. They are referenced in the Modules API, so we don’t need to describe them here; however, if a user still clicks on one, it should act as a link to its description rather than as a label. I have made a short overview to demonstrate my suggestion.

I hope these will prove to be useful. :slightly_smiling_face:


Can you spin this off into its own topic with a gsod tag? It’ll be much easier for folks to give you feedback than if it’s buried here.

Hi @Sarthak_Khandelwal really great points. Loved the 3rd point and the 6th point. Having an index section would actually help organize a lot of the methods in MPL. This is something that I intend to work on in the near future :slight_smile:

the homepage links out to the index but the link is somewhat buried.

I strongly encourage all proposals to show a deep understanding of the existing documentation and to incorporate it as much as possible, both because the work has already been done and because it’ll be easier to get community buy-in for the ideas. We are also unlikely to move away from Sphinx too much (unless someone makes a very good argument for doing so), so please factor that into your proposals.


Hey @r0cketr1kky, thank you for reading my suggestion and also for complimenting it. :smiley:


Do you mean adding my suggestion under a topic heading like
Topic: Suggested changes in Developer Entry paths?

Yes @Sarthak_Khandelwal
You can have a look at my post here
Feel free to drop in any more suggestions :slight_smile:

Sorry for not reading your request more carefully! @brunobeltran pointed out that you’re just asking for the Matplotlib paper :woman_facepalming:. I’ll send it your way.

Alright Ma’am, should I start by implementing this paper in the matplotlib repository, opening relevant issues that would be assigned to me to work on as a contribution before proposal submission?
I need an invitation from the organisation to assign them to myself.
My GitHub is https://github.com/Dhiraj240

That paper describes the existing matplotlib architecture so it’s already implemented. We suggest you open issues on the repository and then you can claim them by putting in the pull requests.

Ma’am, I will soon be submitting multiple PRs for documentation on the opened issues. For now I have come up with a few approaches for the Virtualization etc., which I will share by opening a new discussion with my draft proposal.
May I know whether the candidate has already been selected, or will you communicate it later?

We will make a selection among the candidates who have officially submitted via Google and don’t expect to start that process until the application window closes in July.
