There is still an outstanding issue that must be taken care of before
we migrate. The conversion routines create a basemap repository out of
trunk/toolkits/basemap, and a matplotlib repository out of
trunk/matplotlib. Still, the matplotlib repo (at
github.com/darrendale/matplotlib) is over 200 MB. One can search the
objects in the large packfile, and find that there are still
references to basemap data in the matplotlib repo. I don't know how it
got in there, nor how to remove it.
I went through the exercise of identifying the largest blob, as
described near the end of http://progit.org/book/ch9-7.html :
$ git verify-pack -v objects/pack/pack-fa44ca56d7ec3964e562494f2fe08203143074bd.idx | sort -k 3 -n | tail -3
3b8b6c010f8ce59afac1e811b1bbc3efc21b770a blob 9154481 9089827 62749144
6328b70e665b58ed7f5aa1e110418cbb3facc07a blob 9331200 94297 156884507
f784efc1518b10dff33673ad9a7a1ac3a7d107d5 blob 51399604 14333430 162328624
$ git rev-list --objects --all | grep f784efc1518b10dff
f784efc1518b10dff33673ad9a7a1ac3a7d107d5 toolkits/basemap/data/gshhs_h.txt
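The two steps above (find the largest objects in the pack, then map an id to a path) can be combined into one helper. This is only a sketch: largest_blobs is a made-up function name, and it assumes it is run inside a repository that already has at least one packfile.

```shell
#!/bin/sh
# Sketch: print the N largest blobs found in the packfiles, with a path when
# one can be found from any ref. largest_blobs is a hypothetical helper name.
largest_blobs() {
    n="${1:-3}"
    git_dir=$(git rev-parse --git-dir) || return 1
    # verify-pack -v columns: id, type, size, size-in-pack, offset
    git verify-pack -v "$git_dir"/objects/pack/pack-*.idx |
        awk '$2 == "blob" { print $3, $1 }' |
        sort -n |
        tail -n "$n" |
        while read -r size sha; do
            # rev-list --objects prints "<id> <path>" for every blob it can reach
            path=$(git rev-list --objects --all | grep "^$sha " | head -n 1 | cut -d' ' -f2-)
            echo "$size $sha ${path:-(no path found from any ref)}"
        done
}
```

Calling `largest_blobs 3` prints one line per blob, smallest first, in the form "size id path".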
This shell script is supposed to identify which commits have that blob
in their tree (http://stackoverflow.com/questions/223678/git-which-commit-has-this-blob):
---
#!/bin/sh
obj_name="$1"
shift
git log "$@" --pretty=format:'%T %h %s' \
| while read tree commit subject ; do
if git ls-tree -r $tree | grep -q "$obj_name" ; then
echo $commit "$subject"
fi
done
---
but it comes up empty, so now I'm stuck. Any ideas would be greatly appreciated.
First of all, I must clarify that I'm not a git expert by any means.
I suspected this could be some dangling objects within the repository,
which could be side effects of svn2git. After some googling, I found
this command:
$ git fsck --unreachable HEAD $(git for-each-ref --format="%(objectname)" refs/heads)
This gave me 2774 objects, including the blob for
"toolkits/basemap/data/gshhs_h.txt".
Since they are unreachable, I suppose that they can be simply removed.
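Before deleting anything, it may help to see how much space the unreachable blobs actually account for. A rough sketch, using the same fsck invocation as above; unreachable_blobs is a hypothetical helper name, and LC_ALL=C is set only so the "unreachable" message is not translated:

```shell
#!/bin/sh
# Sketch: list blobs that are unreachable from HEAD and the local branches,
# as "<size in bytes> <id>", sorted by size.
# unreachable_blobs is a hypothetical helper name.
unreachable_blobs() {
    # The unquoted $(...) is intentional: each branch tip becomes one argument.
    LC_ALL=C git fsck --unreachable HEAD \
        $(git for-each-ref --format="%(objectname)" refs/heads) 2>/dev/null |
        awk '$1 == "unreachable" && $2 == "blob" { print $3 }' |
        while read -r sha; do
            echo "$(git cat-file -s "$sha") $sha"
        done |
        sort -n
}
```

Note that objects reachable only from tags or remote refs also count as unreachable here, because only HEAD and refs/heads are passed to fsck.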
I spent an hour figuring out how to delete these unreachable
objects, but the answer turned out to be simple:
$ git repack -ad
Now no unreachable objects are reported, and the total size is down
to ~140 MB.
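To check the effect of the repack in numbers, the pack size can be read from "git count-objects -v" before and after. A minimal sketch, with pack_kib and repack_and_report as made-up helper names:

```shell
#!/bin/sh
# Sketch: report the total packfile size (in KiB) before and after a repack.
# pack_kib and repack_and_report are hypothetical helper names.
pack_kib() {
    # count-objects -v prints a "size-pack: <KiB>" line
    git count-objects -v | awk '$1 == "size-pack:" { print $2 }'
}

repack_and_report() {
    before=$(pack_kib)
    git repack -a -d -q
    after=$(pack_kib)
    echo "size-pack before: ${before} KiB, after: ${after} KiB"
}
```

As far as I understand, the repack only drops packed objects that no ref or reflog entry points to, so the saving may be smaller in a repository whose reflogs still reference the old history.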
Now the biggest blob is "release/osx/matplotlib-0.98.5.tar.gz", and
$ git log -r -- release/osx/matplotlib-0.98.5.tar.gz
works as expected.
And some of the biggest blobs are associated with svg and pdf files.
It seems possible to remove these files (if needed) from the
repository using "git-filter-branch" to further reduce the size, but
I'm not sure if we need that.
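If we do decide to drop such a file, something along these lines should work. It is only a sketch based on the index-filter example in the git-filter-branch documentation; purge_path is a hypothetical helper name, and it assumes a clean working tree and a path that contains no single quotes:

```shell
#!/bin/sh
# Sketch: remove one path from every commit on every ref.
# purge_path is a hypothetical helper name. Run from the top of the work tree.
purge_path() {
    path="$1"
    # --ignore-unmatch keeps the filter from failing on commits without the file
    git filter-branch --index-filter \
        "git rm --cached --ignore-unmatch -q -- '$path'" \
        -- --all
}
```

For example, `purge_path release/osx/matplotlib-0.98.5.tar.gz`. This rewrites history, so every affected commit id changes. The old objects also stay on disk until the refs/original backups are deleted, the reflogs are expired, and the repository is repacked.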
IHTH,
-JJ
On Wed, Jan 26, 2011 at 8:38 AM, Darren Dale <dsdale24@...149...> wrote:
On Tue, Jan 25, 2011 at 6:12 PM, Darren Dale <dsdale24@...149...> wrote:
_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel