svn ancient history broken

I was trying to spot check the git repo by checking out the first
commit that we have a history for in the log

git checkout 48111d043ec52f9afb511ac447438877b236e7f3

and noticed that the main code directory 'matplotlib' was missing. I
then tried to compare with a svn checkout of the same revision

svn co -r7 https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/matplotlib mpl7

and it had the same problem. I went forward several commits, and the
log messages clearly indicate that many of the commits apply to
matplotlib proper, but the code is missing.

The first good svn version is apparently 541; the prior commit 540 had
the log message "reorganizes py code".
This was when we moved "matplotlib" to "lib/matplotlib" which I
thought svn would handle gracefully. Any gurus have any idea if that
early history is hidden somewhere in the bowels of svn?

JDH

That was probably back when matplotlib was still using CVS, right?
Does the CVS repository still exist?

···

On Fri, Jan 28, 2011 at 3:00 PM, John Hunter <jdh2358@...149...> wrote:

I was trying to spot check the git repo by checking out the first
commit that we have a history for in the log

git checkout 48111d043ec52f9afb511ac447438877b236e7f3

and noticed that the main code directory 'matplotlib' was missing. I
then tried to compare with a svn checkout of the same revision

svn co -r7 https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/matplotlib mpl7

and it had the same problem. I went forward several commits, and the
log messages clearly indicate that many of the commits apply to
matplotlib proper, but the code is missing.

The first good svn version is apparently 541; the prior commit 540 had
the log message "reorganizes py code".
This was when we moved "matplotlib" to "lib/matplotlib" which I
thought svn would handle gracefully. Any gurus have any idea if that
early history is hidden somewhere in the bowels of svn?

I found a mailing list thread from Feb 2006 debating the switch from
CVS to SVN, so yes, apparently I did the re-org while we were still on
cvs which is why the history was lost. It may not be worth it, but I
wonder if the matplotlib history from before the move to lib/ could be
stitched back in.

···

On Fri, Jan 28, 2011 at 2:09 PM, Darren Dale <dsdale24@...149...> wrote:

The first good svn version is apparently 541; the prior commit 540 had
the log message "reorganizes py code".
This was when we moved "matplotlib" to "lib/matplotlib" which I
thought svn would handle gracefully. Any gurus have any idea if that
early history is hidden somewhere in the bowels of svn?

That was probably back when matplotlib was still using CVS, right?
Does the CVS repository still exist?

Stitched in from where? The jdhunter branch appears to only contain
one commit, so it only contains the contents of matplotlib/ for rev4.

···

On Fri, Jan 28, 2011 at 4:14 PM, John Hunter <jdh2358@...149...> wrote:

On Fri, Jan 28, 2011 at 2:09 PM, Darren Dale <dsdale24@...149...> wrote:

The first good svn version is apparently 541; the prior commit 540 had
the log message "reorganizes py code".
This was when we moved "matplotlib" to "lib/matplotlib" which I
thought svn would handle gracefully. Any gurus have any idea if that
early history is hidden somewhere in the bowels of svn?

That was probably back when matplotlib was still using CVS, right?
Does the CVS repository still exist?

I found a mailing list thread from Feb 2006 debating the switch from
CVS to SVN, so yes, apparently I did the re-org while we were still on
cvs which is why the history was lost. It may not be worth it, but I
wonder if the matplotlib history from before the move to lib/ could be
stitched back in.

It's not a completely fleshed out thought, but if we got the cvs repo
before the directory move, did cvs to svn on that repo, and then
converted that to git, we might be able to stitch the two git
histories together, one from before the move and one after.

JDH

···

On Fri, Jan 28, 2011 at 4:41 PM, Darren Dale <dsdale24@...149...> wrote:

Stitched in from where? The jdhunter branch appears to only contain
one commit, so it only contains the contents of matplotlib/ for rev4.

That might be possible. Do you have access to the cvs repo?

···

On Fri, Jan 28, 2011 at 6:56 PM, John Hunter <jdh2358@...149...> wrote:

On Fri, Jan 28, 2011 at 4:41 PM, Darren Dale <dsdale24@...149...> wrote:

Stitched in from where? The jdhunter branch appears to only contain
one commit, so it only contains the contents of matplotlib/ for rev4.

It's not a completely fleshed out thought, but if we got the cvs repo
before the directory move, did cvs to svn on that repo, and then
converted that to git, we might be able to stitch the two git
histories together, one from before the move and one after.

That might be possible. Do you have access to the cvs repo?

Apparently not

cvs -z3 -d:pserver:anonymous@...158...:/cvsroot/matplotlib co -P matplotlib

cvs [checkout aborted]: connect to
cvs.sourceforge.net(216.34.181.96):2401 failed: Connection refused

Amazing how fragile digital data is! Well, there isn't much real use
for history that old, except it's sometimes fun to see how small mpl
used to be :) While I was poking around in git though, it was
certainly nice how fast you could switch the current directory to
different revisions.

JDH

···

On Fri, Jan 28, 2011 at 5:58 PM, Darren Dale <dsdale24@...149...> wrote:

SF may simply have turned off CVS for now: http://sourceforge.net/blog/sourceforge-net-attack/

···

On 29-Jan-11 01:08, John Hunter wrote:

cvs -z3 -d:pserver:anonymous@...158...:/cvsroot/matplotlib co -P matplotlib

cvs [checkout aborted]: connect to
cvs.sourceforge.net(216.34.181.96):2401 failed: Connection refused

Amazing how fragile digital data is!

Thanks Andrew.

As much as I would like to push the git repos to github today, I think
it is worth waiting. When SF CVS comes back up, I can attempt to
convert the CVS repository to SVN, verify that the data has been
preserved, and convert r1:540 to git. Then I can convert the master
svn repo starting at r541, and graft the result onto the older
history. When the resulting repo is postprocessed to clean it up and
reduce the size, the graft would be made permanent (that is, actually
incorporated into the history, as opposed to being a reference in
.git/info/grafts).
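
The graft step described above might look roughly like the following
sketch. It assumes both converted histories have already been fetched
into one repository. graft_line, FIRST_NEW_COMMIT and LAST_OLD_COMMIT
are hypothetical names used only for illustration; they are not taken
from the actual migration.

```shell
# Hypothetical helper: print one line in the .git/info/grafts format,
# "<commit> <parent>", which tells git to treat <parent> as the parent
# of <commit>.
graft_line() {
    child=$1
    parent=$2
    if [ -z "$child" ] || [ -z "$parent" ]; then
        echo "usage: graft_line <child-sha> <parent-sha>" >&2
        return 1
    fi
    printf '%s %s\n' "$child" "$parent"
}

# Assumed usage, run inside the repository converted from svn r541 onward,
# after fetching the converted r1:540 history into the same repository:
#   graft_line "$FIRST_NEW_COMMIT" "$LAST_OLD_COMMIT" >> .git/info/grafts
#   git filter-branch -- --all   # rewrites history so the graft becomes permanent
```

Here FIRST_NEW_COMMIT would be the commit corresponding to r541 and
LAST_OLD_COMMIT the tip of the converted older history.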

Darren

···

On Sat, Jan 29, 2011 at 3:35 AM, Andrew Straw <strawman@...36...> wrote:

On 29-Jan-11 01:08, John Hunter wrote:

cvs -z3 -d:pserver:anonymous@...158...:/cvsroot/matplotlib co -P matplotlib

cvs [checkout aborted]: connect to
cvs.sourceforge.net(216.34.181.96):2401 failed: Connection refused

Amazing how fragile digital data is!

SF may simply have turned off CVS for now:
http://sourceforge.net/blog/sourceforge-net-attack/

Sourceforge just enabled enough access to get a copy of the cvs
repository by doing:

rsync -av matplotlib.cvs.sourceforge.net::cvsroot/matplotlib/* .

So, if I have the cvs repo in a local directory called "mpl.cvs", then I can do:

cvs2svn --encoding=utf_8 -s mpl.svn mpl.cvs

Unfortunately, I am getting exactly the same results: the matplotlib/
directory is missing in the earliest history. I've tried adding
--use-cvs and --keep-trivial-imports, to no avail. I've tried checking
out a working copy of the cvs repo (setting CVSROOT to point to the
directory I created using rsync), and I *thought* the right way to
inspect the r7 working directory is to do "cvs update -R -r 7", but
that's not right. So I'm currently having trouble determining whether
the history even exists in CVS. Anybody have a longer memory than I
do? How can I get cvs to perform this basic operation?

Darren

···

On Sat, Jan 29, 2011 at 9:00 AM, Darren Dale <dsdale24@...149...> wrote:

On Sat, Jan 29, 2011 at 3:35 AM, Andrew Straw <strawman@...36...> wrote:

On 29-Jan-11 01:08, John Hunter wrote:

cvs -z3 -d:pserver:anonymous@...158...:/cvsroot/matplotlib co -P matplotlib

cvs [checkout aborted]: connect to
cvs.sourceforge.net(216.34.181.96):2401 failed: Connection refused

Amazing how fragile digital data is!

SF may simply have turned off CVS for now:
http://sourceforge.net/blog/sourceforge-net-attack/

Thanks Andrew.

As much as I would like to push the git repos to github today, I think
it is worth waiting. When SF CVS comes back up, I can attempt to
convert the CVS repository to SVN, verify that the data has been
preserved, and convert r1:540 to git. Then I can convert the master
svn repo starting at r541, and graft the result onto the older
history. When the resulting repo is postprocessed to clean it up and
reduce the size, the graft would be made permanent (that is, actually
incorporated into the history, as opposed to being a reference in
.git/info/grafts).

[clip]

Unfortunately, I am getting exactly the same results: the matplotlib/
directory is missing in the earliest history. I've tried adding
--use-cvs and --keep-trivial-imports, to no avail. I've tried checking
out a working copy of the cvs repo (setting CVSROOT to point to the
directory I created using rsync), and I *thought* the right way to
inspect the r7 working directory is to do "cvs update -R -r 7", but
that's not right. So I'm currently having trouble determining whether the
history even exists in CVS. Anybody have a longer memory than I do? How
can I get cvs to perform this basic operation?

Maybe you can try skipping SVN altogether (needs "git-cvs" package on
Ubuntu):

export CVSROOT=/rsynced/directory
test -d "$CVSROOT/CVSROOT" || echo "Wrong cvsroot..."
mkdir imported
cd imported
git cvsimport matplotlib

This at least shows some files in the first revisions. You can probably
then just graft the two histories together at a suitable point.

Apparently, it also needs some use of "git filter-branch" to get rid of
the top-level matplotlib/ directory.
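
For that last step, one option is the --subdirectory-filter mode of
git filter-branch. This is a rough sketch: strip_top_dir is a
hypothetical helper name, and it assumes the command is run inside a
scratch copy of the imported repository, since every ref gets
rewritten.

```shell
# Hypothetical helper: rewrite every ref so that the contents of the
# given subdirectory become the root of the repository.
strip_top_dir() {
    subdir=$1
    if [ -z "$subdir" ]; then
        echo "usage: strip_top_dir <subdirectory>" >&2
        return 1
    fi
    # Recent git versions print a warning and pause before running
    # filter-branch; this variable suppresses that. Older versions
    # ignore it.
    FILTER_BRANCH_SQUELCH_WARNING=1 \
        git filter-branch --subdirectory-filter "$subdir" -- --all
}

# Assumed usage, inside the directory created by the import:
#   strip_top_dir matplotlib
```

Commits that never touched the subdirectory are dropped by this filter,
so it is worth comparing the log before and after.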

···

On Thu, 10 Feb 2011 17:34:32 -0500, Darren Dale wrote:

--
Pauli Virtanen

On further inspection, the direct cvs to git conversion *also* yields
a repository lacking the matplotlib package directory. It looks like
the history leading up to revision 540 may have been lost from the CVS
repository itself, not during the cvs2svn conversion.

John, do you want some time to continue looking into the cvs repo
yourself? Or should we go ahead with the git migration? If the latter,
should we start the git repo at revision 540, or include all available
history, even though some of it is missing the matplotlib package
directory? If we want to go ahead with the git migration, I can
probably work on it this weekend.

Darren

···

On Thu, Feb 10, 2011 at 5:54 PM, Pauli Virtanen <pav@...278...> wrote:

On Thu, 10 Feb 2011 17:34:32 -0500, Darren Dale wrote:
[clip]

Unfortunately, I am getting exactly the same results: the matplotlib/
directory is missing in the earliest history. I've tried adding
--use-cvs and --keep-trivial-imports, to no avail. I've tried checking
out a working copy of the cvs repo (setting CVSROOT to point to the
directory I created using rsync), and I *thought* the right way to
inspect the r7 working directory is to do "cvs update -R -r 7", but
that's not right. So I'm currently having trouble determining whether the
history even exists in CVS. Anybody have a longer memory than I do? How
can I get cvs to perform this basic operation?

Maybe you can try skipping SVN altogether (needs "git-cvs" package on
Ubuntu):

export CVSROOT=/rsynced/directory
test -d "$CVSROOT/CVSROOT" || echo "Wrong cvsroot..."
mkdir imported
cd imported
git cvsimport matplotlib

This at least shows some files in the first revisions. You can probably
then just graft the two histories together at a suitable point.

Apparently, it also needs some use of "git filter-branch" to get rid of
the top-level matplotlib/ directory.