git-svn matplotlib mirror

Tue, 25 Jan 2011 12:19:37 -0500, Darren Dale wrote:

There is a potential problem converting the entire basemap history to
git. In svn commit 4418, trunk/toolkits had basemap and basemap-testing
directories. In commit 4419, basemap was renamed basemap-0.9.6.1, so
there was only basemap-0.9.6.1 and basemap-testing. In commit 4420,
basemap-testing was renamed basemap. The git history only goes back as
far as svn4420; it looks like the conversion routines get confused by
the temporary absence of the basemap directory.

I'm trying to find a workaround, but if I can't... ?

You can maybe do it like this:

1) Write matplotlib.rules so that all of the directories where basemap
stuff has been end up in the basemap repository. (I'm assuming this does
not error out...)

2) This will create a number of separate heads in the basemap repo that
do not share common history.

3) Add graft rules in matplotlib.grafts to stitch the disconnected
history graphs together.

This also happened with Numpy: part of the old history had this sort of
rename, so part of the history was not connected to the main graph.
So I just stitched the graphs together manually.
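Step 3 relies on a matplotlib.grafts file whose format this thread does not show. The sketch below illustrates the same idea in plain git instead. It is an assumption for illustration, not the mpl2git/svn2git mechanism. `git replace --graft` (Git 2.1 or newer) gives a parentless root commit a parent after the fact. The demo builds two unrelated histories in a throwaway repository. The branch names `main` and `new` are hypothetical.

```shell
#!/bin/sh
# Sketch: stitch two disconnected histories together with `git replace --graft`.
# Runs in a throwaway repository; branch names are hypothetical.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

# Old line of history (stands in for the pre-rename directory).
echo old > file.txt
git add file.txt
git commit -q -m "old history"
old_tip=$(git rev-parse HEAD)

# New line of history with no parent, like a head cut off from its past.
git checkout -q --orphan new
echo new > file.txt
git add file.txt
git commit -q -m "new history root"
new_root=$(git rev-parse HEAD)
echo newer >> file.txt
git commit -q -am "new history, second commit"

before=$(git rev-list --count new)

# Record a replacement commit: the new root now has the old tip as its parent.
git replace --graft "$new_root" "$old_tip"

after=$(git rev-list --count new)
echo "commits reachable from 'new': before=$before after=$after"
```

Replacements live under `refs/replace/` rather than in the commits themselves. The git documentation describes following up with `git filter-branch` as one way to make such a graft permanent before publishing.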

···

--
Pauli Virtanen

Tue, 25 Jan 2011 12:19:37 -0500, Darren Dale wrote:

There is a potential problem converting the entire basemap history to
git. In svn commit 4418, trunk/toolkits had basemap and basemap-testing
directories. In commit 4419, basemap was renamed basemap-0.9.6.1, so
there was only basemap-0.9.6.1 and basemap-testing. In commit 4420,
basemap-testing was renamed basemap. The git history only goes back as
far as svn4420; it looks like the conversion routines get confused by
the temporary absence of the basemap directory.

I'm trying to find a workaround, but if I can't... ?

You can maybe do it like this:

1) Write matplotlib.rules so that all of the directories where basemap
stuff has been end up in the basemap repository. (I'm assuming this does
not error out...)

Aha! I thought I had tried that. Thanks.

2) This will create a number of separate heads in the basemap repo that
do not share common history.

3) Add graft rules in matplotlib.grafts to stitch the disconnected
history graphs together.

Mercifully, the latest checkout of svn2git seems to take care of that.
I've developed a wicked headache.

Jeff, the repository is temporarily available at
https://github.com/darrendale/basemap . It would be really helpful if
you would have a look at the network graph at
https://github.com/darrendale/basemap/network to make sure there are
no surprises, maybe clone the repository and check that the working
directory is identical to your svn checkout.

Darren

···

On Tue, Jan 25, 2011 at 1:31 PM, Pauli Virtanen <pav@...278...> wrote:

There is still an outstanding issue that must be taken care of before
we migrate. The conversion routines create a basemap repository out of
trunk/toolkits/basemap, and matplotlib repository out of
trunk/matplotlib. Still, the matplotlib repo (at
github.com/darrendale/matplotlib) is over 200 MB. One can search the
objects in the large packfile, and find that there are still
references to basemap data in the matplotlib repo. I don't know how it
got in there, nor how to remove it.

Jeff: was there ever any basemap data committed directly to trunk/matplotlib?

Pauli: could I trouble you to have a look at my rules file, maybe you
will notice something I overlooked?
(https://github.com/darrendale/mpl2git/blob/master/matplotlib.rules)
Any other ideas?

Thanks
Darren

···

On Tue, Jan 25, 2011 at 3:06 PM, Darren Dale <dsdale24@...149...> wrote:

On Tue, Jan 25, 2011 at 1:31 PM, Pauli Virtanen <pav@...278...> wrote:

Tue, 25 Jan 2011 12:19:37 -0500, Darren Dale wrote:

There is a potential problem converting the entire basemap history to
git. In svn commit 4418, trunk/toolkits had basemap and basemap-testing
directories. In commit 4419, basemap was renamed basemap-0.9.6.1, so
there was only basemap-0.9.6.1 and basemap-testing. In commit 4420,
basemap-testing was renamed basemap. The git history only goes back as
far as svn4420; it looks like the conversion routines get confused by
the temporary absence of the basemap directory.

I'm trying to find a workaround, but if I can't... ?

You can maybe do it like this:

1) Write matplotlib.rules so that all of the directories where basemap
stuff has been end up in the basemap repository. (I'm assuming this does
not error out...)

Aha! I thought I had tried that. Thanks.

2) This will create a number of separate heads in the basemap repo that
do not share common history.

3) Add graft rules in matplotlib.grafts to stitch the disconnected
history graphs together.

Mercifully, the latest checkout of svn2git seems to take care of that.
I've developed a wicked headache.

Jeff, the repository is temporarily available at
https://github.com/darrendale/basemap . It would be really helpful if
you would have a look at the network graph at
https://github.com/darrendale/basemap/network to make sure there are
no surprises, maybe clone the repository and check that the working
directory is identical to your svn checkout.

I went through the exercise of identifying the largest blob, as
described near the end of http://progit.org/book/ch9-7.html :

$ git verify-pack -v objects/pack/pack-fa44ca56d7ec3964e562494f2fe08203143074bd.idx | sort -k 3 -n | tail -3
3b8b6c010f8ce59afac1e811b1bbc3efc21b770a blob 9154481 9089827 62749144
6328b70e665b58ed7f5aa1e110418cbb3facc07a blob 9331200 94297 156884507
f784efc1518b10dff33673ad9a7a1ac3a7d107d5 blob 51399604 14333430 162328624

$ git rev-list --objects --all | grep f784efc1518b10dff
f784efc1518b10dff33673ad9a7a1ac3a7d107d5 toolkits/basemap/data/gshhs_h.txt
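The verify-pack listing shows sizes but not paths. It also covers every object in the pack, whether reachable or not. A one-pass alternative is to pipe `git rev-list --objects --all` into `git cat-file --batch-check`, which reports each reachable blob's size next to a path it appears under. This is a sketch; the function name `largest_blobs` and the demo files are hypothetical. A blob that is large in verify-pack but absent from this list is probably not reachable from any ref.

```shell
#!/bin/sh
# Sketch: list the largest blobs reachable from any ref, with a path for each.
set -eu

# largest_blobs [N] (hypothetical helper): print "size sha path" for the N
# largest blobs reachable from any ref (default 3).
largest_blobs() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        sed -n 's/^blob //p' |
        sort -k1,1 -n -r |
        head -n "${1:-3}"
}

# Demo in a throwaway repository with one small and one 100000-byte file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false
echo "small" > small.txt
dd if=/dev/zero of=big.bin bs=1000 count=100 2>/dev/null
git add small.txt big.bin
git commit -q -m "add files"

largest_blobs 2
```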

This shell script is supposed to identify which commits have that blob
in their tree (from the Stack Overflow question "Which commit has this blob?"):

···

On Tue, Jan 25, 2011 at 6:12 PM, Darren Dale <dsdale24@...149...> wrote:

There is still an outstanding issue that must be taken care of before
we migrate. The conversion routines create a basemap repository out of
trunk/toolkits/basemap, and matplotlib repository out of
trunk/matplotlib. Still, the matplotlib repo (at
github.com/darrendale/matplotlib) is over 200 MB. One can search the
objects in the large packfile, and find that there are still
references to basemap data in the matplotlib repo. I don't know how it
got in there, nor how to remove it.

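---
#!/bin/sh
# Usage: find-blob.sh <blob-sha> [git log options, e.g. --all]
obj_name="$1"
shift
# Without extra arguments, git log only walks the history of HEAD.
git log "$@" --format='%T %h %s' |
while read -r tree commit subject ; do
    # List every path in the commit's tree and look for the blob's SHA.
    if git ls-tree -r "$tree" | grep -q "$obj_name" ; then
        echo "$commit $subject"
    fi
done
---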

but it comes up empty, so now I'm stuck. Any ideas would be greatly appreciated.
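One possible reason for the empty result is a guess that the thread does not confirm. With no extra arguments, the script's `git log` only walks the history of HEAD, while the `git rev-list --objects --all` command above searched every ref. The sketch below wraps the same loop in a hypothetical `find_blob` function. It puts a blob only on a side branch, so the HEAD-only search misses it and `--all` finds it.

```shell
#!/bin/sh
# Sketch: "which commit has this blob?", searching HEAD only vs. all refs.
set -eu

# find_blob SHA [git log args...] (hypothetical helper): print "commit subject"
# for each commit whose tree contains the blob SHA.
find_blob() {
    obj_name=$1
    shift
    git log "$@" --format='%T %h %s' |
        while read -r tree commit subject; do
            if git ls-tree -r "$tree" | grep -q "$obj_name"; then
                echo "$commit $subject"
            fi
        done
}

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "code" > code.txt
git add code.txt
git commit -q -m "add code"

# The data file only ever exists on a side branch (hypothetical name).
git checkout -q -b side
echo "coastline data" > data.txt
git add data.txt
git commit -q -m "add data on side"
blob=$(git rev-parse side:data.txt)
git checkout -q main

echo "HEAD only:"; find_blob "$blob"
echo "all refs:";  find_blob "$blob" --all
```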

First of all, I must clarify that I'm not a git expert by any means.

I suspected these could be dangling objects left in the repository as
a side effect of svn2git. After some googling, I found this command:

$ git fsck --unreachable HEAD $(git for-each-ref --format="%(objectname)" refs/heads)

This gave me 2774 objects, which include the blob of
"toolkits/basemap/data/gshhs_h.txt".
Since they are unreachable, I suppose they can simply be removed.

I spent an hour figuring out how to delete these unreachable
objects, but the answer turned out to be simple:

$ git repack -ad

Now no unreachable objects are reported, and the total size seems to
drop to ~140 MB.
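Below is a small reproduction of the effect described above, run in a throwaway repository. The branch name `scratch` and the file contents are hypothetical. A blob is packed while it is still reachable. Its branch is then deleted, and `git repack -a -d` writes a new pack without it and removes the old one. The `git reflog expire` step is needed in this demo because reflog entries would otherwise keep the deleted branch's commit alive.

```shell
#!/bin/sh
# Sketch: an unreachable blob stored in a pack is dropped by `git repack -a -d`.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "keep" > keep.txt
git add keep.txt
git commit -q -m "keep"

# Commit a data file on a throwaway branch (hypothetical), then pack everything.
git checkout -q -b scratch
echo "large coastline data" > data.txt
git add data.txt
git commit -q -m "add data"
blob=$(git rev-parse HEAD:data.txt)
git checkout -q main
git repack -q -a -d

# Make the blob unreachable: delete the branch and the reflog entries
# that still point at its commit.
git branch -q -D scratch
git reflog expire --expire=now --all

# fsck now reports the blob as unreachable, but it is still in the pack.
unreachable=$(git fsck --unreachable 2>/dev/null | grep -c "$blob" || true)
before=$(git cat-file -e "$blob" 2>/dev/null && echo present || echo gone)

# Repack only reachable objects and delete the old pack.
git repack -q -a -d
after=$(git cat-file -e "$blob" 2>/dev/null && echo present || echo gone)

echo "fsck matches: $unreachable; before repack: $before; after repack: $after"
```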

Now the biggest blob is "release/osx/matplotlib-0.98.5.tar.gz", and

$ git log -r -- release/osx/matplotlib-0.98.5.tar.gz

works as expected.

And some of the biggest blobs are associated with svg and pdf files.
It seems possible to remove these files (if needed) from the
repository using "git-filter-branch" to further reduce the size, but
I'm not sure if we need that.
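Here is a sketch of the `git-filter-branch` route mentioned above, run in a throwaway repository. `figure.pdf` is a hypothetical stand-in for one of the large generated files. The `--index-filter` removes that path from every commit on every ref, and `--prune-empty` drops commits that end up changing nothing. The old objects stay reachable through `refs/original/` and the reflogs until those are cleared and the repository is repacked, so this step alone does not shrink the repository.

```shell
#!/bin/sh
# Sketch: strip one file from all history with git filter-branch.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "print('hi')" > code.py
git add code.py
git commit -q -m "add code"
dd if=/dev/zero of=figure.pdf bs=1000 count=50 2>/dev/null   # hypothetical large file
git add figure.pdf
git commit -q -m "add figure"
echo "print('bye')" >> code.py
git commit -q -am "edit code"

before=$(git rev-list --count main)

# Newer Git prints a warning and pauses unless this variable is set.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
    --index-filter 'git rm -q --cached --ignore-unmatch figure.pdf' \
    --prune-empty -- --all >/dev/null

after=$(git rev-list --count main)
echo "commits on main: before=$before after=$after"
git log --oneline main -- figure.pdf   # no output: the file is gone from history
```

Git's own documentation now recommends the separate git filter-repo tool for this kind of rewrite.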

IHTH,

-JJ

···

On Wed, Jan 26, 2011 at 8:38 AM, Darren Dale <dsdale24@...149...> wrote:

On Tue, Jan 25, 2011 at 6:12 PM, Darren Dale <dsdale24@...149...> wrote:

------------------------------------------------------------------------------
Special Offer-- Download ArcSight Logger for FREE (a $49 USD value)!
Finally, a world-class log management solution at an even better price-free!
Download using promo code Free_Logger_4_Dev2Dev. Offer expires
February 28th, so secure your free ArcSight Logger TODAY!
http://p.sf.net/sfu/arcsight-sfd2d
_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
matplotlib-devel List Signup and Options

There is still an outstanding issue that must be taken care of before
we migrate. The conversion routines create a basemap repository out of
trunk/toolkits/basemap, and matplotlib repository out of
trunk/matplotlib. Still, the matplotlib repo (at
github.com/darrendale/matplotlib) is over 200 MB. One can search the
objects in the large packfile, and find that there are still
references to basemap data in the matplotlib repo. I don't know how it
got in there, nor how to remove it.

Darren,

It looks like at least some of the problem is the origin/unit_support:

efiring@...340...:~/test/matplotlib.git.ddale$ git checkout origin/unit_support
Checking out files: 100% (4514/4514), done.
Note: checking out 'origin/unit_support'.
[...]
HEAD is now at 8d705be... refactoring, moved units conversion to units.UnitsManager
efiring@...340...:~/test/matplotlib.git.ddale$ ls
course CVSROOT htdocs matplotlib scipy06 toolkits users_guide

It appears to have branched trunk, not trunk/matplotlib. Junk it!

I think that losing some bits of history from the git repo is entirely acceptable. The history is still in the svn repo, if anyone really needs to dig back into the earliest recorded origins of unit_support.
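Eric's diagnosis came from checking out the branch and listing its top level. The sketch below runs the same check across every ref at once. It assumes that a top-level `toolkits` directory reliably shows a branch was cut from trunk rather than trunk/matplotlib. The function name and the demo branches are hypothetical.

```shell
#!/bin/sh
# Sketch: list refs whose root tree contains a given top-level entry.
set -eu

# refs_with_toplevel NAME (hypothetical helper): print every branch,
# remote-tracking branch or tag whose top-level tree has an entry called NAME.
refs_with_toplevel() {
    name=$1
    git for-each-ref --format='%(refname)' refs/heads refs/remotes refs/tags |
        while read -r ref; do
            if git ls-tree --name-only "$ref" | grep -qx "$name"; then
                echo "$ref"
            fi
        done
}

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

mkdir lib
echo "x = 1" > lib/module.py
git add lib
git commit -q -m "package layout"

# A hypothetical branch that accidentally carries the whole svn trunk.
git checkout -q -b bad_branch
mkdir -p toolkits/basemap
echo "data" > toolkits/basemap/data.txt
git add toolkits
git commit -q -m "branched from the wrong directory"
git checkout -q main

refs_with_toplevel toolkits
```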

Eric

···

On 01/25/2011 01:38 PM, Darren Dale wrote:

On Tue, Jan 25, 2011 at 6:12 PM, Darren Dale<dsdale24@...149...> wrote:

I went through the exercise of identifying the largest blob, as
described near the end of http://progit.org/book/ch9-7.html :

$ git verify-pack -v objects/pack/pack-fa44ca56d7ec3964e562494f2fe08203143074bd.idx | sort -k 3 -n | tail -3
3b8b6c010f8ce59afac1e811b1bbc3efc21b770a blob 9154481 9089827 62749144
6328b70e665b58ed7f5aa1e110418cbb3facc07a blob 9331200 94297 156884507
f784efc1518b10dff33673ad9a7a1ac3a7d107d5 blob 51399604 14333430 162328624

$ git rev-list --objects --all | grep f784efc1518b10dff
f784efc1518b10dff33673ad9a7a1ac3a7d107d5 toolkits/basemap/data/gshhs_h.txt

This shell script is supposed to identify which commits have that blob
in their tree (from the Stack Overflow question "Which commit has this blob?"):

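---
#!/bin/sh
# Usage: find-blob.sh <blob-sha> [git log options, e.g. --all]
obj_name="$1"
shift
# Without extra arguments, git log only walks the history of HEAD.
git log "$@" --format='%T %h %s' |
while read -r tree commit subject ; do
    # List every path in the commit's tree and look for the blob's SHA.
    if git ls-tree -r "$tree" | grep -q "$obj_name" ; then
        echo "$commit $subject"
    fi
done
---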

but it comes up empty, so now I'm stuck. Any ideas would be greatly appreciated.

[...]

Darren,

It looks like at least some of the problem is the origin/unit_support:

efiring@...340...:~/test/matplotlib.git.ddale$ git checkout origin/unit_support
Checking out files: 100% (4514/4514), done.
Note: checking out 'origin/unit_support'.
[...]
HEAD is now at 8d705be... refactoring, moved units conversion to units.UnitsManager
efiring@...340...:~/test/matplotlib.git.ddale$ ls
course CVSROOT htdocs matplotlib scipy06 toolkits users_guide

It appears to have branched trunk, not trunk/matplotlib. Junk it!

I think that losing some bits of history from the git repo is entirely
acceptable. The history is still in the svn repo, if anyone really
needs to dig back into the earliest recorded origins of unit_support.

Eric

Or, maybe a rule like this will work?

match /branches/unit_support/matplotlib/
  repository matplotlib
  branch unit_support
end match

(I don't know how significant the parentheses are; I am guessing they are for regular expressions, which are not needed here.)

Eric

···

On 01/25/2011 06:53 PM, Eric Firing wrote:

Some of this appears to be the result of some svn tags referencing the
contents of trunk/htdocs. It looks like I can cut 25 MB off the size
of the repo.

···

On Tue, Jan 25, 2011 at 11:37 PM, Jae-Joon Lee <lee.j.joon@...149...> wrote:

And some of the biggest blobs are associated with svg and pdf files.
It seems possible to remove these files (if needed) from the
repository using "git-filter-branch" to further reduce the size, but
I'm not sure if we need that.

That feels pretty good to me, considering that the compressed release
tarball is 13MB. If we can get the whole history for twice that, then
you must be getting pretty close to the limit.

JDH

···

On Thu, Jan 27, 2011 at 10:26 AM, Darren Dale <dsdale24@...149...> wrote:

On Tue, Jan 25, 2011 at 11:37 PM, Jae-Joon Lee <lee.j.joon@...149...> wrote:

And some of the biggest blobs are associated with svg and pdf files.
It seems possible to remove these files (if needed) from the
repository using "git-filter-branch" to further reduce the size, but
I'm not sure if we need that.

Some of this appears to be the result of some svn tags referencing the
contents of trunk/htdocs. It looks like I can cut 25 MB off the size
of the repo.

Me too. I just posted the latest version of the repository to
github.com/darrendale/matplotlib.git . It's ~42 MB, but it has a bunch
of unreachable objects. As soon as we figure out how to get rid of
them, I think we will be ready to freeze the svn repo and wrap this
up.

···

On Thu, Jan 27, 2011 at 12:00 PM, John Hunter <jdh2358@...149...> wrote:

On Thu, Jan 27, 2011 at 10:26 AM, Darren Dale <dsdale24@...149...> wrote:

On Tue, Jan 25, 2011 at 11:37 PM, Jae-Joon Lee <lee.j.joon@...149...> wrote:

And some of the biggest blobs are associated with svg and pdf files.
It seems possible to remove these files (if needed) from the
repository using "git-filter-branch" to further reduce the size, but
I'm not sure if we need that.

Some of this appears to be the result of some svn tags referencing the
contents of trunk/htdocs. It looks like I can cut 25 MB off the size
of the repo.

That feels pretty good to me, considering that the compressed release
tarball is 13MB. If we can get the whole history for twice that, then
you must be getting pretty close to the limit.

Darren:

https://github.com/darrendale/basemap looks fine, thanks!

-Jeff

···

On 1/25/11 1:06 PM, Darren Dale wrote:

On Tue, Jan 25, 2011 at 1:31 PM, Pauli Virtanen<pav@...278...> wrote:

Tue, 25 Jan 2011 12:19:37 -0500, Darren Dale wrote:

There is a potential problem converting the entire basemap history to
git. In svn commit 4418, trunk/toolkits had basemap and basemap-testing
directories. In commit 4419, basemap was renamed basemap-0.9.6.1, so
there was only basemap-0.9.6.1 and basemap-testing. In commit 4420,
basemap-testing was renamed basemap. The git history only goes back as
far as svn4420; it looks like the conversion routines get confused by
the temporary absence of the basemap directory.

I'm trying to find a workaround, but if I can't... ?

You can maybe do it like this:

1) Write matplotlib.rules so that all of the directories where basemap
stuff has been end up in the basemap repository. (I'm assuming this does
not error out...)

Aha! I thought I had tried that. Thanks.

2) This will create a number of separate heads in the basemap repo that
do not share common history.

3) Add graft rules in matplotlib.grafts to stitch the disconnected
history graphs together.

Mercifully, the latest checkout of svn2git seems to take care of that.
I've developed a wicked headache.

Jeff, the repository is temporarily available at
https://github.com/darrendale/basemap . It would be really helpful if
you would have a look at the network graph at
https://github.com/darrendale/basemap/network to make sure there are
no surprises, maybe clone the repository and check that the working
directory is identical to your svn checkout.

Darren