It was noted on gitter that PR #20951 ("[ENH]: data kwarg support for mplot3d #20912" by jayjoshi112711) included a commit that added most of a virtual environment. This was corrected in the following two commits. However, the commits were not squashed, so the files (~40MB worth, including several compiled files) are still in the history (hat tip to @anntzer.lee for noticing this). On the down side, we have 250+ commits on `master` since then. On the bright side, these commits came in after we branched 3.5.x, so we do not have any tags that include the bad commits.
Given the number of commits, I think it is too late to just force-push the commits out of existence (as we did in "Removed commits from master branch"). Instead, I propose that we look at this as an opportunity to rename our default branch from `master` → `main`. My proposal is:
1. we use `bfg` (or an equivalent git filter, but `bfg` is simpler to use) to remove the files we do not want. Fortunately the file and folder names are sufficiently unique in our history that we can trivially remove them.
2. we push the cleaned branch to GitHub as `main` and switch the default branch to be `main`. If I understand the GitHub tools correctly, this will re-target all open PRs. Anything opened before #20951 should “just work”, as those PRs were never aware of the bad commits. Anything that was opened (and not merged) in between will need to be rebased or cherry-picked to remove the extra commits, because those PRs will suddenly show ~250 additional commits due to the rewriting.
3. we remove the `master` branch from GitHub.
There will need to be some coordination to make sure that no one merges anything to `master` between steps 1 and 3. Our merge rate is low enough, and we can check our work carefully enough, to make sure that this in fact did not happen (and to fix it if it did).
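The three steps can be rehearsed locally before touching GitHub. This is a hypothetical sketch. A local bare repository stands in for matplotlib/matplotlib, and a branch named `cleaned` stands in for the bfg output. The default-branch switch is done by pointing the bare repo's `HEAD` at `main`, which is only the local analogue of changing the default branch in GitHub's repository settings. The sketch also records the `master` tip before step 1 and checks that it is unchanged before step 3, which is the "nobody merged in between" check.

```shell
#!/usr/bin/env bash
# Hypothetical local rehearsal of steps 1-3; a bare repo stands in for GitHub.
# Requires git >= 2.28 (for `git init -b`).
set -euo pipefail
tmp=$(mktemp -d)
export HOME="$tmp"  # keep the user's global git config out of the rehearsal
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

origin="$tmp/origin.git"
git init -q --bare -b master "$origin"
git init -q -b master "$tmp/work"
cd "$tmp/work"
git remote add origin "$origin"
git commit -q --allow-empty -m "history on master"
git push -q origin master

# Remember where master was when the cleanup started.
frozen=$(git ls-remote origin refs/heads/master | cut -f1)

# Step 1: push the cleaned history as main. `cleaned` is only a stand-in
# for the branch bfg would produce.
git checkout -q -b cleaned
git commit -q --allow-empty -m "stand-in for rewritten history"
git push -q origin cleaned:main

# Step 2: make main the default branch. On GitHub this is a repository
# setting; for a local bare repo the equivalent is where HEAD points.
git --git-dir="$origin" symbolic-ref HEAD refs/heads/main

# Before step 3, confirm nobody merged to master in the meantime.
now=$(git ls-remote origin refs/heads/master | cut -f1)
if [ "$now" != "$frozen" ]; then
  echo "master moved from $frozen to $now; re-run the cleanup first" >&2
  exit 1
fi

# Step 3: delete master from the remote.
git push -q origin --delete master
```

Switching the default before deleting `master` matters because a remote's default branch generally should not be deleted out from under it.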
The added work to move from `master` → `main` is:
- find-and-replace in the code base to update both the docs and any place where the branch name is hard-coded into CI, etc.
- document how to check out and create a `main` branch for users who already have a clone
- document how to fix up a feature branch that forked after the PR was merged
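For the second item, here is one possible recipe for contributors who already have a clone. It assumes the remote for the main repository is named `origin`. In some development setups that remote is called `upstream` instead, so substitute accordingly. The sketch first builds a throwaway remote whose `master` is replaced by a rewritten `main`, then updates an existing clone of it. All paths and branch contents are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: moving an existing clone from the old master to main.
# Requires git >= 2.28 (for `git init -b`).
set -euo pipefail
tmp=$(mktemp -d)
export HOME="$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# --- Setup: a remote whose master is later replaced by a rewritten main ---
git init -q --bare -b master "$tmp/origin.git"
git init -q -b master "$tmp/maint"
git -C "$tmp/maint" commit -q --allow-empty -m "old history"
git -C "$tmp/maint" push -q "$tmp/origin.git" master
git clone -q "$tmp/origin.git" "$tmp/clone"   # the contributor's existing clone
git -C "$tmp/maint" checkout -q --orphan rewritten
git -C "$tmp/maint" commit -q --allow-empty -m "rewritten history"
git -C "$tmp/maint" push -q "$tmp/origin.git" rewritten:main
git --git-dir="$tmp/origin.git" symbolic-ref HEAD refs/heads/main
git -C "$tmp/maint" push -q "$tmp/origin.git" --delete master

# --- What the contributor runs in their clone ---
cd "$tmp/clone"
git fetch -q --prune origin                     # get main, drop origin/master
git remote set-head origin --auto >/dev/null    # origin/HEAD -> origin/main
# Create main fresh from the remote instead of renaming the local master,
# which still carries the old (pre-cleanup) history.
git checkout -q -b main --track origin/main
git branch -q -D master
```

Renaming the local `master` with `git branch -m` would keep the old commits, so the sketch starts `main` from `origin/main` and then deletes the stale branch.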
From some quick experimentation, just rebasing a branch is not enough (git will very cleverly preserve the commits we do not want!). There are (as of now) 28 open PRs that were created after those commits went into `master` (though some of them were probably branched before the problematic commits), so I think the options are:
1. document how to use cherry-picking or interactive rebasing to get rid of the commits (I think `git rebase main; git rebase -i` would do the trick, or `git cherry-pick SOME...RANGE`)
2. ask everyone to run `bfg` and force-push
3. one of us (me) does 2 and force-pushes on behalf of everyone with an open PR
I suspect we should document all 3 and ask people which one they want to use for their PR (from eyeballing it, 2/3 of the PRs are from core developers).
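For option 1, one way around the rebase problem described above is `git rebase --onto`, which replays only the commits that are on the feature branch but not on the old `master`. The sketch below builds a hypothetical toy repository:

- `A` predates #20951.
- `B`, `C` and `D` stand in for the three problem commits.
- `main` stands in for the bfg-rewritten history: same trees, with `B` to `D` now empty and carrying new SHAs.
- `feature` is a contributor branch forked from the old `master`.

The sketch contrasts a plain `git rebase main`, which carries the venv commits along, with `--onto`.

```shell
#!/usr/bin/env bash
# Hypothetical toy repository; branch and file names are illustrative.
# Requires git >= 2.28 (for `git init -b`).
set -euo pipefail
tmp=$(mktemp -d)
export HOME="$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q -b master "$tmp/repo"
cd "$tmp/repo"
echo base > README; git add README; git commit -q -m "A: before #20951"
# Stand-ins for the three problem commits on the old master.
mkdir bin; echo binary > bin/python; echo home=/tmp > pyvenv.cfg
git add bin pyvenv.cfg; git commit -q -m "B: adds a virtualenv"
git rm -q -r bin; git commit -q -m "C: removes most of it"
git rm -q pyvenv.cfg; git commit -q -m "D: removes the rest"

# A contributor branch forked from the old master.
git checkout -q -b feature
echo change >> README; git commit -q -am "F: feature work"

# Stand-in for the cleaned main: same trees as A-D, but B-D are now empty.
git checkout -q -b main master~3
for c in B C D; do git commit -q --allow-empty -m "$c (cleaned)"; done

# Plain rebase: the merge base with main is A, so B, C and D are replayed
# too and the venv files come back into the branch's history.
git branch naive feature
git rebase -q main naive

# Rebase --onto: replay only the commits after the old-master fork point.
# If the local master ref is already gone, use the old-master SHA the
# branch forked from instead.
old_base=$(git merge-base feature master)
git rebase -q --onto main "$old_base" feature
# Equivalent with cherry-pick:
#   git checkout -b feature-fixed main && git cherry-pick "$old_base"..feature
```

Here `git rebase --onto main <old-base> feature` reads as "take the commits in `<old-base>..feature` and replay them on `main`". That range excludes everything the branch inherited from the old `master`.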
The invocation to clean the repo is (following the GitHub Docs page "Removing sensitive data from a repository"; see the links there for install instructions):
bfg --delete-folders '{share,bin,python3.9}' --delete-files pyvenv.cfg
I have run this and am pushing the results to GitHub at tacaswell/matplotlib (`main` branch).
It is my understanding that this operation is deterministic, so anyone should be able to re-run it and verify my work.
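The post notes that an equivalent git filter would also work and that the rewrite should be reproducible. This sketch does not assume bfg is installed. Instead it swaps in git's built-in `git filter-branch` with an index filter, run on a hypothetical toy repository. It runs the filter on two independent clones and compares the resulting tips.

It assumes the unwanted paths sit at the top level of the tree. bfg's `--delete-folders`/`--delete-files` match those names at any depth, so this is a simplification. Without `--prune-empty`, `filter-branch` keeps the now-empty commits. It also reuses each original commit's author and committer information, so identical input yields identical SHAs.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: git filter-branch as a stand-in for the bfg invocation.
# Requires git >= 2.28 (for `git init -b`).
set -euo pipefail
tmp=$(mktemp -d)
export HOME="$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning and delay

# Toy history: A, then B-D add and remove a virtualenv, then E is later work.
git init -q -b master "$tmp/src"
cd "$tmp/src"
echo base > README; git add README; git commit -q -m "A"
mkdir -p bin share python3.9
echo x > bin/python; echo x > share/doc; echo x > python3.9/site.py
echo home=/tmp > pyvenv.cfg
git add .; git commit -q -m "B: adds a virtualenv"
git rm -q -r bin share python3.9; git commit -q -m "C: removes most of it"
git rm -q pyvenv.cfg; git commit -q -m "D: removes the rest"
echo more >> README; git commit -q -am "E: later work"

clean() {
  # Fresh clone, then drop the venv paths from every commit on master.
  # Top-level paths only (assumption); bfg matches these names at any depth.
  git clone -q "$tmp/src" "$1"
  git -C "$1" filter-branch --index-filter \
    'git rm -r -q --cached --ignore-unmatch bin share python3.9 pyvenv.cfg' \
    -- master >/dev/null
}
clean "$tmp/run1"
clean "$tmp/run2"

# Both independent runs should land on the same rewritten tip.
git -C "$tmp/run1" rev-parse master
git -C "$tmp/run2" rev-parse master
```

The two printed SHAs should match. That match is the property that lets anyone re-run the cleanup and compare their result against the pushed branch.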
I suspect that there might be a way to fully drop those three commits. I’m happy to just make them empty, but if someone wants to sort out how to drop them and advocate for that I would not be opposed.
Commits to be effectively removed (links may break once we drop the `master` branch and GitHub cleans up its history):
- ed117f4d6a22d68d0846dc304047156716e41385 (brings in the file)
- 93d67769bf649578122f9b9734ede58a31d2aabd (removes most of the files)
- 39ae6304e52736b4017fbb723198787d2f3764e8 (removes the last of the files)
After running `bfg` we get (note these link to my fork):
- 78727a8c304e4d715bc63d2694fbd1a7dd521876
- a50e2dfa485e5fb01cf0be1d440445be4071debc
- cfb2ea968cbc104f5c66ff5d3f2a0a4979781032
which are notably all empty (showing that this worked)
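The emptiness claim can be checked mechanically. For an ordinary (non-root, non-merge) commit, `git diff-tree -r --name-only` lists the paths the commit changes relative to its parent, so empty output means an empty commit. The sketch below defines a small hypothetical helper, `is_empty_commit`, and exercises it on a throwaway repository. Against a clone of the cleaned branch, you would pass it the three SHAs listed above instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper; is_empty_commit is not part of git.
set -euo pipefail

is_empty_commit() {
  # diff-tree prints the paths a commit changes relative to its parent.
  # Caveat: it prints nothing for root commits (without --root) or merges
  # (without -m/-c), so only use this on ordinary commits.
  [ -z "$(git diff-tree --no-commit-id -r --name-only "$1")" ]
}

tmp=$(mktemp -d)
export HOME="$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo"
cd "$tmp/repo"
echo a > file; git add file; git commit -q -m "root"
git commit -q --allow-empty -m "empty, like the rewritten commits"
echo b >> file; git commit -q -am "changes file"

# In a clone of the cleaned branch this list would be the three SHAs above.
for sha in $(git rev-parse HEAD~1 HEAD); do
  if is_empty_commit "$sha"; then echo "$sha empty"; else echo "$sha NOT empty"; fi
done
```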