git-svn matplotlib mirror

Thu, 27 Jan 2011 12:39:48 -0500, Darren Dale wrote:
[clip]

Me too. I just posted the latest version of the repository to
github.com/darrendale/matplotlib.git . It's ~42 MB, but it has a bunch of
unreachable objects. As soon as we figure out how to git rid of them, I
think we will be ready to freeze the svn repo and wrap this up.

Unreachable from where? How do you know there are unreachable
objects?

Note that the snippet

    git fsck --unreachable HEAD $(git for-each-ref --format="%(objectname)" refs/heads)

only checks for objects unreachable from branches (by definition,
stuff under refs/heads). However, there's also other stuff under refs/:
tags and hidden branches. In particular, the postprocess.sh script hides
some branches.

To see all that is there, check the output from

    git for-each-ref
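
To make the difference concrete, here is a rough sketch (not taken from postprocess.sh). When git fsck is given no explicit heads, it starts from every ref plus the index, so tags and hidden branches count as reachable. The function names and the --no-reflogs flag are additions for illustration only.

```shell
# Objects unreachable from ANY ref (branches, tags, hidden refs).
# With no explicit heads, git fsck walks from all refs and the index;
# --no-reflogs (an addition here) keeps reflog entries from counting.
unreachable_from_all_refs() {
    git fsck --unreachable --no-reflogs 2>/dev/null
}

# Objects unreachable from HEAD and the branches only, as in the
# snippet above. Anything held solely by a tag or a hidden ref is
# reported as unreachable here even though git still keeps it.
unreachable_from_branches() {
    git fsck --unreachable --no-reflogs HEAD \
        $(git for-each-ref --format='%(objectname)' refs/heads) 2>/dev/null
}
```

Objects that appear only in the second listing are kept alive by something outside refs/heads.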

--
Pauli Virtanen

Oh, I didn't understand what I was doing with the git fsck command.

Still, even after removing the largest blob in the repo with

    run git filter-branch --index-filter \
        'git rm --cached --ignore-unmatch release/osx/matplotlib-0.98.5.tar.gz' \
        -- 750059aa09340^..

the blob still exists, but is not associated with a commit according to

    git log --pretty=oneline -- release/osx/matplotlib-0.98.5.tar.gz

That blob accounts for 1/4 of the total size of the repo. It would be
nice to get rid of it, if possible.
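
For reference, one way to find the largest blobs is sketched below. It assumes a git recent enough that git cat-file --batch-check accepts a format string; the function name is made up for this example.

```shell
# Print the N largest blobs reachable from any ref, biggest first.
# Output columns: size in bytes, object name, path.
largest_blobs() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        grep '^blob ' |
        cut -d' ' -f2- |
        sort -rn |
        head -n "${1:-10}"
}
```

Running largest_blobs 5 inside the repository lists the five biggest files ever committed on any ref, which is how a tarball like the one above would show up.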

Darren


Thu, 2011-01-27 at 13:44 -0500, Darren Dale wrote:
[clip]

Still, even after removing the largest blob in the repo with

    run git filter-branch --index-filter \
        'git rm --cached --ignore-unmatch release/osx/matplotlib-0.98.5.tar.gz' \
        -- 750059aa09340^..

the blob still exists, but is not associated with a commit according to

    git log --pretty=oneline -- release/osx/matplotlib-0.98.5.tar.gz

That blob accounts for 1/4 of the total size of the repo. It would be
nice to get rid of it, if possible.

I think "git log" will show you only the current branch by default. Do

        git log --pretty=oneline --all -- release/osx/matplotlib-0.98.5.tar.gz

to get all branches, and do

        for branch in $(git for-each-ref --format='%(refname)'); do
            S=$(git log --pretty=oneline "$branch" -- release/osx/matplotlib-0.98.5.tar.gz)
            if test -n "$S"; then echo "$branch"; echo "$S"; fi
        done

to see which refs have the commits containing it.

Similarly, git-filter-branch rewrites only the current branch unless
told otherwise. To filter everything, it's best to do

        git filter-branch --index-filter \
          'git rm --cached --ignore-unmatch release/osx/matplotlib-0.98.5.tar.gz' \
          -- `git for-each-ref --format="750059aa09340^..%(refname)"`

Note that all branches and tags should be filtered in the same way:
since rewriting changes the hashes of all following commits, you end up
with incompatible histories otherwise.

After that, I get down to 34 MB.
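
The size only drops once the old objects are actually discarded. git filter-branch keeps backup refs under refs/original/, and the reflogs still point at the pre-rewrite commits. The sketch below follows the cleanup described in the git-filter-branch documentation; the function names are made up here, and this is not necessarily the exact sequence that produced the 34 MB figure.

```shell
# Succeeds if any ref (including refs/original/) has a commit that
# touches the given path.
path_in_any_history() {
    test -n "$(git log --all --pretty=oneline -- "$1")"
}

# Drop the backup refs left by git filter-branch, expire all reflog
# entries, and prune objects that nothing references any more.
# Destructive: run it only on a copy you can afford to lose.
purge_rewritten_history() {
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git update-ref -d "$ref"
        done
    git reflog expire --expire=now --all
    git gc --quiet --prune=now
}
```

Until purge_rewritten_history has run, path_in_any_history release/osx/matplotlib-0.98.5.tar.gz can still succeed because of the refs/original/ backups; afterwards it should fail if every branch and tag was filtered.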

--
Pauli Virtanen

You are brilliant. If you send me your address off-list, I'll send you
a bottle of scotch, or tequila, or a doughnut, or whatever you want.
