How to convert a Subversion repository to a Git repository

I needed to convert several Subversion repositories to Git. Most of my process for this comes from this post. The following are some condensed steps and additional notes.

  1. Work in a new no-metadata folder; this helps keep your working environment isolated, and will let you re-use the users.txt file if you need to do this to multiple repositories. Create the users.txt file and ensure all authors in the repo are listed. To get started, do something like this, where SVN_URL is the URL of the repo:

    svn log -q SVN_URL | awk -F " \\\\| " '/^r/ {print $2" = "$2" <>"}' | sort -u > users.txt
    

    Note that this will preserve spaces in user names, if present (some other snippets you might find are broken and will not). Now edit users.txt and fill in details. You might need to add a (no author) = No Author <no@author.invalid> line, too, if there are any commits which have no author, such as the empty ones created by some techniques that manipulate dump files to purge files.

  2. Use git-svn like this (replace SVN_URL with the correct URL) to clone the repo without metadata; add --stdlayout if using standard branch folders, or as many -T, -t [...]
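Before cloning, it can help to confirm that no entry in users.txt from step 1 was left blank. The following is a minimal sketch, not part of the original process: `unfilled_authors` is a hypothetical helper name, and it assumes the file still uses the `name = name <>` format produced by the awk command above.

```shell
# Hypothetical helper: list the SVN user names whose details are still blank.
# Assumes the "name = name <>" line format generated in step 1.
unfilled_authors() {
    # $1: path to the authors file (users.txt in the steps above)
    # Print the left-hand side of every line that still ends in an empty "<>".
    sed -n 's/^\(.*\) = .* <>$/\1/p' "$1"
}

# Example use (prints nothing once every author has been filled in):
#   unfilled_authors users.txt
```

Any name it prints, including ones that contain spaces, still needs a real name and e-mail address.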


How to add branches to a non-standard Subversion repository in a way that is compatible with git-svn

I found myself in the unusual situation of having a Subversion repository which was created initially without the usual trunk, branches, and tags folders, but where the team found later that, you know, this newfangled “branches” idea might just come in useful after all. If that's all there was to it, I could have just added those folders in a regular commit and continued on. The tricky part was that we also wanted git-svn compatibility, and for it to recognize branches, it needs the folder structure to be in place from the first commit onward.

Unfortunately, there ain't no easy way to achieve this which also preserves the revision numbers. But it is possible. What I had to do was rewrite the first commit manually:

  1. Check out the original repo at revision 1.
  2. Copy all the files to a new repo, but this time place them under a trunk top-level folder, ensuring everything is otherwise identical inside.
  3. Also add tags and branches top-level folders.
  4. Commit all those using the same commit message as the original.
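Steps 2 and 3 can be sketched as a small shell function. This is only an illustration under stated assumptions: `restructure_into_trunk` is a hypothetical name, both arguments are assumed to be existing working copies (the old repo at revision 1 and the new, empty repo), and `tar` is a swapped-in way of copying the files while keeping permissions and timestamps.

```shell
# Hypothetical helper: lay out the files of a revision-1 checkout under trunk/.
# $1: working copy of the original repo, checked out at revision 1
#     (e.g. via: svn checkout -r 1 SVN_URL old-wc)
# $2: working copy of the new, empty repository
restructure_into_trunk() {
    # Create the three standard top-level folders.
    mkdir -p "$2/trunk" "$2/branches" "$2/tags" || return 1
    # Copy everything into trunk/, leaving out Subversion's own .svn metadata.
    (cd "$1" && tar -cf - --exclude=.svn .) | (cd "$2/trunk" && tar -xf -)
}

# Afterwards, inside the new working copy, something like:
#   svn add trunk branches tags
#   svn commit -m "same message as the original first commit"
```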

Now, this commit still doesn't have the same metadata, so I had to then take an svndump of the new repo, and edit that dump so [...]


Preserving merge trees when using git-svn

If you're like me, you might be using git-svn to connect to centralized SVN repositories (e.g., at the workplace) while keeping many of the powerful features of git. Sometimes you might be working in a local feature branch that is linked to a remote branch, and do a git svn rebase to grab updates from the SVN server. This command only fetches from the corresponding server branch. But, if an SVN merge, say from trunk, had happened on that branch at the server, then git-svn needs the corresponding local master branch to be up-to-date to notice that it was the parent of that merge. If you hadn't fetched from master recently, you lose the merge tree. Oops.

Before the rebase it would've been better to use git svn fetch --all (or it ought to have been, assuming git-svn fetches revisions in order once for all branches). But you might forget to do that. If that's where you find yourself, it is possible to recover. In the feature branch, use git svn reset -r{REV} -p, replacing {REV} with the revision number of the SVN merge which git-svn failed to assign a parent. Then do git svn fetch --all before [...]
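One way to avoid forgetting the fetch is to wrap the two commands together. The following is a minimal sketch of that preventive habit only, not of the recovery procedure; `svn_rebase_all` is a hypothetical name, and it relies on the same assumption stated above, namely that `git svn fetch --all` brings every branch up to date before the rebase runs.

```shell
# Hypothetical wrapper: fetch all SVN branches first, then rebase the
# current branch, so git-svn has already seen the parent of any merge.
svn_rebase_all() {
    # The rebase only runs if the fetch succeeded.
    git svn fetch --all && git svn rebase
}

# Example use, from inside the local feature branch:
#   svn_rebase_all
```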


How to make private git repositories open for public access

Some of my personal projects are tracked using private git repositories, hosted on this server. I can access these via ssh, but for a while I've had in mind to make at least a couple of them publicly accessible... somehow. After finally getting around to looking into it, this turns out to be surprisingly simple using git-daemon (instructions for Debian-based distros):

  1. Install the git-daemon-sysvinit package.
  2. Enable the daemon by editing /etc/default/git-daemon. Reboot, or start the git-daemon service by hand.
  3. Add a symlink to each git repo you want to make public under /var/lib/git. These will then be accessible via git://<hostname>/git/<linkname>.
  4. Ensure the git protocol port (9418) forwards to the server.

Simple! The git protocol is faster than serving over http(s), and the standard configuration ensures that anonymous clients can pull, but not push, which is exactly what I was after.
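The symlink in step 3 can be sketched as a small shell function. `publish_repo` is a hypothetical name, the optional third argument is an assumption added for illustration, and the printed URL is only meaningful when the default /var/lib/git directory is used; `<hostname>` is left as a placeholder.

```shell
# Hypothetical helper for step 3: expose a repository through git-daemon.
# $1: absolute path to the repository to publish
# $2: link name that will appear in the public URL
# $3: export directory (optional; defaults to /var/lib/git as in step 3)
publish_repo() {
    export_dir=${3:-/var/lib/git}
    # Create the symlink; fail if the name is already taken.
    ln -s "$1" "$export_dir/$2" || return 1
    # Show the URL pattern from step 3 for this link name.
    echo "git://<hostname>/git/$2"
}

# Example use (needs write access to /var/lib/git):
#   publish_repo /path/to/project.git project.git
```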

Over the next little while I'll introduce the couple of projects that I'm opening up for public access.


How to find, and obliterate, large files in the history of a Subversion repository

Sometimes, as I have, you'll find yourself working with colleagues who, through no fault of their own, are either not acquainted with the etiquette of Subversion repository use, or simply have an accident. What you may then end up with is a repository that contains one or more giant blobs of useless data that, really, should never have been added in the first place. Even if the culprit, with the best of intentions, removes these giant blobs in subsequent revisions, you're still left with a huge chunk of nothing-much wasting space on your server's hard drive.

Though a long-standing item on Subversion's wishlist, there is no command that will simply obliterate files from the repository's history. Nevertheless, there is a way to achieve this. Here's how.

The first step of the process is to determine which files need to go. (Some snippets in the following are derived from StackOverflow and Christosoft blog.) First, find the size of each revision in the repository. Replacing REPO appropriately, run this on your server:

REPO=[/absolute/path/to/repo]
for r in `svn log -q file://$REPO | grep ^r | cut -d ' ' -f 1 | tr -d r`
do
    echo "revision $r is " `svn diff -c $r file://$REPO [...]


