git migration

Eric_Firing2 · March 2, 2010, 10:21pm

All,

I think the git migration deserves its own thread on the devel list, so here is a start.

The full svn repo includes much more than just matplotlib: also course, htdocs, py4science, sample_data, sampledoc_tut, scipy06, toolkits, and users_guide. Before moving matplotlib, I think we should have a clear plan as to how these other parts are going to be handled. Will some or all remain as the active parts of the svn repo, with matplotlib somehow marked as invalid? Will some or all get their own github repos? My primary interest here is toolkits/basemap, but I am sure other good stuff is in there.

Before the transition, it would be good to have pointers to the simplest possible docs illustrating typical workflows after the transition; maybe one for present developers with svn access, and another for occasional contributors.

Does it make sense to retain the entire history in the new github repo, or would it be just as well to start from a later point so as to reduce the size? The entire history could still be available in a separate read-only repo, or fossilized in svn on sourceforge, or in my hg mirror. (Andrew's repo, at just under 200MB, is not prohibitively large by any means, but it is a bit hefty.)

Eric

Andrew_Straw5 · March 3, 2010, 3:17am

Eric Firing wrote:

All,

I think the git migration deserves its own thread on the devel list, so
here is a start.

To the uninitiated - a decision is being made that MPL is moving to git
and github. We hope that this move will foster greater contributions
from the community and a blurring of the line between MPL committers and
users.

The decision process happened off-list to keep the flames and
bike-shedding minimal. Several of the core developers were consulted and
we all agreed that a move to a DVCS was desirable and inevitable. We did
not unanimously agree that git was best, but it was preferred by most
developers over mercurial/bitbucket, the other serious contender, and
neither camp voiced strong objections to the other system.

The full svn repo includes much more than just matplotlib: also course,
htdocs, py4science, sample_data, sampledoc_tut, scipy06, toolkits, and
users_guide. Before moving matplotlib, I think we should have a clear
plan as to how these other parts are going to be handled. Will some or
all remain as the active parts of the svn repo, with matplotlib somehow
marked as invalid? Will some or all get their own github repos? My
primary interest here is toolkits/basemap, but I am sure other good
stuff is in there.

This is a good point. My preferred option is that we jettison all the
stuff that is not going to be shipped with MPL 1.0 from the git repo.
(More correctly - we build a git repo without that stuff ever going in.)
We can keep the old svn tree around and migrate the other projects to
git as desired. I think this is what's present in
http://github.com/astraw/matplotlib . Or am I missing something?

Another issue is whether to use github's Issues system over
SourceForge's tracker. Personally, I'm in favor of moving the issue
tracking to github, but I think we should take stock of how we use the
tracker and see if github's features will support that.

Before the transition, it would be good to have pointers to the
simplest possible docs illustrating typical workflows after the
transition; maybe one for present developers with svn access, and
another for occasional contributors.

I agree. I think the best learning material is from github. See
http://help.github.com/ and http://learn.github.com/ , for example. To
get to the "a ha" feeling, I highly recommend "Git from the bottom up"
by John Wiegley, available from
http://ftp.newartisans.com/pub/git.from.bottom.up.pdf . This latter is
what it took for me to come to a real understanding of git. Git was
designed from the data structures and plumbing up, and the rest
("porcelain" in git parlance) came later and was less the focus of
initial development. Hence, historically, git had a rougher UI from
the start, while other DVCSs had nicer UIs but less stable and slower
repository formats. (Understanding the git model of the universe was key
to me becoming really fluent in git, but according to my office mate,
it's absolutely not necessary in order to use git for daily tasks.)

Does it make sense to retain the entire history in the new github repo,
or would it be just as well to start from a later point so as to reduce
the size? The entire history could still be available in a separate
read-only repo, or fossilized in svn on sourceforge, or in my hg mirror.
(Andrew's repo, at just under 200MB, is not prohibitively large by any
means, but it is a bit hefty.)

I can see advantages either way, but I'm in favor of keeping it. Tons of
MPL is undercommented, and seeing the history is extremely useful when
spelunking.

-Andrew

Gokhan_SEVER · March 3, 2010, 4:46am

Apart from being inflammatory, has anyone considered code.google.com (GC) as a solution? To me, among all the code hosting sites (launchpad, sourceforge, bitbucket, github), GC provides the simplest and most effective interface. GC also has practically no learning curve compared to the alternatives, which is a great advantage for newcomers to the project. For instance, SF has all the useful code management functionality, but its interface is really not inviting, at least to my eyes. It also takes a while before the site content is indexed by crawlers.

On the negative side, GC doesn’t offer git. However the source could be externally linked like in the sympy project.

What do you think? Does simplicity really count in the decision, or does functionality beat simplicity?

–
Gökhan

_Matthew_Brett · March 3, 2010, 5:03am

Hi,

Apart from being inflammatory, has anyone considered code.google.com (GC) as
a solution?

- speaking as someone with no right to offer an opinion - please,
no. Google blocks Cuba from google code completely, for no obvious
reason, and a) that seems to me quite wrong and outside the spirit of
free software and b) I work there fairly often and it's hard for me to
persuade the excellent scientists there to use Python if they are
being specifically blocked for political reasons.

See you,

Matthew

Gokhan_SEVER · March 3, 2010, 5:39am

I didn’t really know that Google was embargoing countries on their code hosting site. I was mostly inspired by watching this talk: Google I/O 2008 - Project Hosting on Google Code

It is very interesting that a company that does great things for OSS also blocks code access in certain countries. Thanks for pointing this out. It is indeed an important point to consider.

This is not the first time today my Google integration idea has been rejected. During our school’s tech forum I asked about the possibility of integrating Google Apps into the university network. The lower cost was a reasonable answer, but it is beyond me why they would plan to integrate something that is not even up yet (live.edu).

–
Gökhan

Eric_Firing2 · March 3, 2010, 5:41am

Andrew Straw wrote:
[...]

This is a good point. My preferred option is that we jettison all the
stuff that is not going to be shipped with MPL 1.0 from the git repo.
(More correctly - we build a git repo without that stuff ever going in.)
We can keep the old svn tree around and migrate the other projects to
git as desired. I think this is what's present in
http://github.com/astraw/matplotlib . Or am I missing something?

No, that is what you have, and I agree that this strategy makes sense. I just wanted to make sure everyone understood, and make the plan explicit.

Eric

Eric_Firing2 · March 3, 2010, 5:55am

Eric Firing wrote:

All,

I think the git migration deserves its own thread on the devel list, so here is a start.

Explanation: the last bit of discussion was actually off-list, but because it was tacked onto a matplotlib-users list thread, and appeared there in my mailer, I failed to notice that matplotlib-users was not in the address list. So I jumped to the conclusion that it was already on a list, but was merely misplaced and should be shifted to matplotlib-devel. I apologize for the error. To minimize potentially unproductive thrashing, I request that everyone restrain their urges to comment on the choice of git and github, to suggest alternatives, to raise objections, etc.

Eric

_william_ratcliff · March 3, 2010, 6:17am

I think there’s a legal reason for the embargo–sourceforge apparently also has such a policy:

http://sourceforge.net/blog/clarifying-sourceforgenets-denial-of-site-access-for-certain-persons-in-accordance-with-us-law/

So, as a US company, they may not have a choice…

Matplotlib-devel mailing list

Matplotlib-devel@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/matplotlib-devel

_Matthew_Brett · March 3, 2010, 7:45am

Hi,

···

On Tue, Mar 2, 2010 at 10:17 PM, william ratcliff <william.ratcliff@...149...> wrote:

I think there's a legal reason for the embargo--sourceforge apparently also
has such a policy:
Clarifying SourceForge.net's denial of site access for certain persons in accordance with US law - SourceForge Community Blog
So, as a US company, they may not have a choice...

In my experience Google is the worst in this respect by a considerable
margin, and has become more so in the last year.

See you,

Matthew

_John_Hunter · March 3, 2010, 2:29pm

Andrew Straw wrote:
[...]

This is a good point. My preferred option is that we jettison all the
stuff that is not going to be shipped with MPL 1.0 from the git repo.
(More correctly - we build a git repo without that stuff ever going in.)
We can keep the old svn tree around and migrate the other projects to
git as desired. I think this is what's present in
http://github.com/astraw/matplotlib . Or am I missing something?

No, that is what you have, and I agree that this strategy makes sense.
I just wanted to make sure everyone understood, and make the plan explicit.

It looks like Andrew has trunk/matplotlib. There is other stuff in
trunk that definitely should not be migrated, and some stuff that
needs consideration.

* trunk/py4science, which is a project Fernando and I have been
working on for several years but is not specific to mpl (it only uses
mpl). We will eventually migrate this into its own repo, but this
is not an mpl project and should not be migrated.

* trunk/course - looks like a very old and no longer used py4science
dir. Should probably be simply deleted and not frozen.

* trunk/htdocs - the old mpl site docs. Should live somewhere for
archival purposes in case there is a useful code snippet in there, but
certainly does not need to be in git or the new repo. It could live
frozen in the sf repo.

* trunk/sampledata - this is important. The mpl trunk examples use
this to pull example data. We will need to migrate this -- we could
leave it in sf svn, but it might be preferable to have one version
control system. Whatever we do here, we will need to update
matplotlib.cbook.get_sample_data to work with the new system.
Definitely an argument for getting all this migration sorted out
before a trunk release.

* trunk/sampledoc_tut - this is the source code for the sampledoc
tutorial ("sampledoc 1.0 documentation"), which shows how to build
mpl-like sites using sphinx and associated extensions. Related to mpl
in that it uses the plot directive etc, but is by no means integral.
I can eventually port this to a new repo if there is any reason to.

* trunk/scipy06 should probably be deleted

* trunk/toolkits - should probably be migrated (Andrew, you have not
migrated this, right?). One nice thing about having the toolkits in
the same svn repo as the main codebase was for revision tagging, so
basemap svn commits are synched with a trunk/matplotlib state. How
should we proceed with the toolkits repo? Jeff?

* trunk/users_guide - the old latex source for the mpl user's guide.
Deprecated but should not be deleted. Same treatment as trunk/htdocs
above.
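
The sample-data point in the list above (updating matplotlib.cbook.get_sample_data once the data moves) could look roughly like the sketch below. This is not matplotlib's actual implementation: the helper name `sample_data_url` and the base URL are hypothetical placeholders, since the thread had not yet decided where the data would live. The idea is only that the host location sits in one configurable place, so a later move changes a single constant.

```python
from urllib.parse import quote, urljoin

# Hypothetical location; the real host was still undecided in this thread.
SAMPLE_DATA_BASE_URL = "https://example.org/mpl/sample_data/"


def sample_data_url(fname, base_url=SAMPLE_DATA_BASE_URL):
    """Return the URL of a sample-data file under ``base_url``.

    Illustrative helper only. ``fname`` is a path relative to the
    sample-data directory, e.g. "aapl.npy".
    """
    if not base_url.endswith("/"):
        # urljoin would otherwise replace the last path component.
        base_url += "/"
    # Percent-encode spaces and other unsafe characters, keeping "/".
    return urljoin(base_url, quote(fname.lstrip("/")))


if __name__ == "__main__":
    print(sample_data_url("aapl.npy"))
```

Switching from the svn host to a git host would then mean editing `SAMPLE_DATA_BASE_URL` (or passing a different `base_url`), with no change to the example scripts that ask for a file by name.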

If we end up migrating the toolkits to git/github (pending Jeff's
comments) we may want to branch the stuff in trunk we want to keep for
archival purposes (htdocs, users_guide) and clean as much stuff out of
trunk as possible to avoid confusion for people browsing the trunk
(and put a README in there explaining what and where stuff is).

I think the plan is to keep trunk/matplotlib as a tracking repo, so
that commits to the git master are pushed to the svn repo, so casual
users who are running from svn HEAD will not be affected by the
migration. Is this your understanding, Andrew?

Does it make sense to retain the entire history in the new github repo,
or would it be just as well to start from a later point so as to reduce
the size? The entire history could still be available in a separate
read-only repo, or fossilized in svn on sourceforge, or in my hg mirror.
(Andrew's repo, at just under 200MB, is not prohibitively large by any
means, but it is a bit hefty.)

I can see advantages either way, but I'm in favor of keeping it. Tons of
MPL is undercommented, and seeing the history is extremely useful when
spelunking.

I am strongly in favor of keeping the entire commit history of
trunk/matplotlib. While the repo is large now, most of the size comes
from data and regression test images, and the early history is largely
code so will not add much incremental size. I suppose one of the
downsides of git is that, since you have to get the *entire* history on one
checkout, you end up with a bunch of stuff you are unlikely to ever
need, like data that was once in the repo but has now been removed (eg
the stuff we migrated to sampledata). Not sure if there is an easy
solution here.

JDH

_william_ratcliff · March 3, 2010, 2:29pm

I don’t want to get into a flame war over this, but if Sourceforge was pressured into this and is having complaints and google has the same problem, how does Github get around it? Are they incorporated in the US or outside? If this is likely to become a problem, is there another service that can be used with git besides github that would not eventually be subject to such constraints? Sorry, I’m just ignorant about such matters.

William

_John_Hunter · March 3, 2010, 2:55pm

github has its offices in the US, so they may change their policy
on this in the future if they feel the heat from the long arm of
US law. Currently they do not appear to enforce export restrictions.
Here is a helpful summary of different open source hosting facilities
and their features and policies:

http://en.wikipedia.org/wiki/Comparison_of_open_source_software_hosting_facilities

On Jan 25th, 2010, SF implemented a ban enforcing US export restrictions.

Clarifying SourceForge.net's denial of site access for certain persons in accordance with US law - SourceForge Community Blog

But on Feb 7th, 2010, they lifted the blanket ban and now project
admins can impose the restriction if they are distributing restricted
technologies, which seems like a good compromise.

Clarifying SourceForge.net's denial of site access for certain persons in accordance with US law - SourceForge Community Blog

Looks like the wikipedia site I linked above is out of date w/ respect
to sourceforge.

As far as I know, mpl is not distributing any restricted technologies
-- we do make extensive use of message digest functions like md5 for
caching, but these do not appear to be covered (eg, see
MD5: Command Line Message Digest Utility). So it would be preferable to be on a
host that does not implement blanket restrictions. github does not
currently, and if they change their policy going forward we may elect
to move. Given that sourceforge has found a way to distribute
compliant code to restricted countries, and github currently does not
impose restrictions, I'm cautiously optimistic that a subsequent move
will not be necessary.
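
For readers unfamiliar with the kind of usage meant here, the following is a minimal sketch of a message digest used purely as a cache key. The function name `cache_key` and the example strings are hypothetical and this is not matplotlib's actual code; it only assumes Python's standard `hashlib` module. Nothing is encrypted: the digest just gives a short, stable name for a piece of content.

```python
import hashlib


def cache_key(text, extra=""):
    """Return a hex md5 digest usable as a cache file name.

    Illustrative helper only: identical inputs always map to the same
    32-character key, so a cached result can be looked up by content.
    """
    payload = (text + extra).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


if __name__ == "__main__":
    # Same content gives the same key; changing a setting changes the key.
    print(cache_key(r"$\alpha + \beta$", extra="fontsize=12"))
    print(cache_key(r"$\alpha + \beta$", extra="fontsize=14"))
```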

JDH

Jeff_Whitaker1 · March 3, 2010, 3:29pm

John Hunter wrote:

Andrew Straw wrote:
[...]


This is a good point. My preferred option is that we jettison all the
stuff that is not going to be shipped with MPL 1.0 from the git repo.
(More correctly - we build a git repo without that stuff ever going in.)
We can keep the old svn tree around and migrate the other projects to
git as desired. I think this is what's present in
http://github.com/astraw/matplotlib . Or am I missing something?

No, that is what you have, and I agree that this strategy makes sense.
I just wanted to make sure everyone understood, and make the plan explicit.

It looks like Andrew has trunk/matplotlib. There is other stuff in
trunk that definitely should not be migrated, and some stuff that
needs consideration.

* trunk/py4science, which is a project Fernando and I have been
working on for several years but is not specific to mpl (it only uses
mpl). We will eventually migrate this into its own repo, but this
is not an mpl project and should not be migrated.

* trunk/course - looks like a very old and no longer used py4science
dir. Should probably be simply deleted and not frozen.

* trunk/htdocs - the old mpl site docs. Should live somewhere for
archival purposes in case there is a useful code snippet in there, but
certainly does not need to be in git or the new repo. It could live
frozen in the sf repo.

* trunk/sampledata - this is important. The mpl trunk examples use
this to pull example data. We will need to migrate this -- we could
leave it in sf svn, but it might be preferable to have one version
control system. Whatever we do here, we will need to update
matplotlib.cbook.get_sample_data to work with the new system.
Definitely an argument for getting all this migration sorted out
before a trunk release.

* trunk/sampledoc_tut - this is the source code for the sampledoc
tutorial ("sampledoc 1.0 documentation"), which shows how to build
mpl-like sites using sphinx and associated extensions. Related to mpl
in that it uses the plot directive etc, but is by no means integral.
I can eventually port this to a new repo if there is any reason to.

* trunk/scipy06 should probably be deleted

* trunk/toolkits - should probably be migrated (Andrew, you have not
migrated this, right?). One nice thing about having the toolkits in
the same svn repo as the main codebase was for revision tagging, so
basemap svn commits are synched with a trunk/matplotlib state. How
should we proceed with the toolkits repo? Jeff?

John, Eric, Andrew: I am OK with this. Don't know much about DVCS systems, but I guess this will be my excuse to learn.

-Jeff

--
Jeffrey S. Whitaker Phone : (303)497-6313
Meteorologist FAX : (303)497-6449
NOAA/OAR/PSD R/PSD1 Email : Jeffrey.S.Whitaker@...236...
325 Broadway Office : Skaggs Research Cntr 1D-113
Boulder, CO, USA 80303-3328 Web : Jeffrey S. Whitaker: NOAA Physical Sciences Laboratory

Jonathan_Taylor3 · March 3, 2010, 7:31pm

I am strongly in favor of keeping the entire commit history of
trunk/matplotlib. While the repo is large now, most of the size comes
from data and regression test images, and the early history is largely
code so will not add much incremental size. I suppose one of the
downsides of git is since you have to get the *entire* history on one
checkout, you end up with a bunch of stuff you are unlikely to ever
need, like data that was once in the repo but has now been removed (eg
the stuff we migrated to sampledata). Not sure if there is an easy
solution here.

I think you should be able to use git clone --depth=x to get a shallow
copy of the repository. The limitation is that you cannot push from
or pull from your new repository. You can pull to it and create
patches though, which is enough for most people I think.
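
The shallow clone described above can be sketched in Python as follows. This is only an illustration: the helper names `shallow_clone_command` and `shallow_clone` are made up here, the destination directory is arbitrary, and the URL is Andrew's repo mentioned earlier in the thread. It assumes `git` is on the PATH; `--depth` is a standard `git clone` option that takes the number of most recent commits to keep.

```python
import subprocess


def shallow_clone_command(url, dest, depth=1):
    """Build the argument list for a shallow ``git clone``.

    Hypothetical helper for illustration; ``depth`` is the number of
    most recent commits to keep in the clone's history.
    """
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    return ["git", "clone", "--depth", str(depth), url, dest]


def shallow_clone(url, dest, depth=1):
    """Run the shallow clone; raises CalledProcessError if git fails."""
    subprocess.run(shallow_clone_command(url, dest, depth), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so nothing is downloaded.
    cmd = shallow_clone_command(
        "http://github.com/astraw/matplotlib", "mpl-shallow", depth=10)
    print(" ".join(cmd))
```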

Best,
Jon.

_John_Hunter · March 3, 2010, 8:11pm

Tried a few options from Andrew's repo:

jdhunter@...687...:~> du -hs mpl.git*
191M mpl.git # no --depth
191M mpl.git0 # --depth=0
147M mpl.git1 # --depth=1
147M mpl.git10 # --depth=10

This compares with 87M for a clean svn checkout. So it doesn't look
like a huge deal to get the whole thing compared to svn, and it looks
like --depth saves very little currently. Didn't notice too much
in terms of checkout time either...
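
A comparison like the one above can also be scripted. The sketch below is an assumption-laden stand-in for `du -hs`, not an exact equivalent: it sums the apparent file sizes under each directory, whereas `du` reports disk blocks, so the numbers may differ slightly. The function names are hypothetical.

```python
import os
import sys


def tree_size_bytes(root):
    """Sum the apparent sizes of all regular files under ``root``."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so link targets are not counted.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def format_mib(num_bytes):
    """Format a byte count in whole mebibytes, e.g. '191M'."""
    return f"{round(num_bytes / 2**20)}M"


if __name__ == "__main__":
    # Usage: python tree_size.py mpl.git mpl.git1 mpl.git10
    for root in sys.argv[1:]:
        print(format_mib(tree_size_bytes(root)), root)
```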

Thanks for the suggestion though!
JDH
