Simple, straight-line git to BitKeeper importer written in Little

Get bk-7.2.1ce or pull the tree on bkbits and install it. Then you can run:

bk L `bk bin`/contrib/git2bk.l --help

to get help (or for those so inclined you can read the source, it’s about 1K lines).

To import a tree you do

bk L `bk bin`/contrib/git2bk.l [options] path/to/git-repo

and it’s sort of funky: it leaves the .bk directory right next to the .git directory, so the first
thing I’d do is

bk clone path/to/git-repo path/to/bk-repo

The defaults are set up for large imports (where large is ~30K files, ~500K changesets).
For small trees try --all to get more history.

This works for a lot of repos, but there are complicated rename scenarios and other places where it can fail. It’s not our be-all and end-all importer; it’s just something so you can get some of your history into BK and play.

Enjoy, and let us know how it goes on IRC (or here).

We discovered a bug on some platforms with how bk L is invoked. If it fails with an error message about not being able to find libl.tcl, run it like this:

TCL_LIBRARY=`bk bin`/gui/lib/tcl8.6 bk L `bk bin`/contrib/git2bk.l --help

So here are a couple of observations:

  • submodules aren’t supported, so I tested on the arcanist and jekyll repos with --all
  • full jekyll import went fine and everything looks OK
  • arcanist failed on a rename:
sccsmv: destination src/lint/linter/__tests__/SCCS/s.ArcanistPhpcsLinterTestCase.php exists
bk mv 'src/lint/linter/__tests__/ArcanistPHPCSLinterTestCase.php' 'src/lint/linter/__tests__/ArcanistPhpcsLinterTestCase.php' = 1
Caller: renames

I moved the discussion of this particular importer to its own topic.

@vhbit hit a rename problem caused by changing only the case of a filename on a case-insensitive file system. I assume this was running on a Mac. This is somewhat hard to fix.
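One common way around case-only renames on a case-insensitive file system is to go through a temporary name, so that source and destination never collide. The Python sketch below only plans the moves; it is not what git2bk.l does, the `.git2bk-tmp` suffix and the `plan_rename` helper are made up for illustration, and whether two `bk mv` calls in a row would actually avoid the sccsmv error above is untested.

```python
from typing import List, Tuple


def plan_rename(src: str, dst: str, case_insensitive: bool = True) -> List[Tuple[str, str]]:
    """Return the (from, to) moves needed to rename src to dst.

    On a case-insensitive file system a rename that only changes case
    looks like "destination exists", so it is split into two moves
    through a temporary name (hypothetical suffix, for illustration).
    """
    if src == dst:
        return []  # nothing to do
    if case_insensitive and src.casefold() == dst.casefold():
        tmp = src + ".git2bk-tmp"  # assumed not to clash with a real file
        return [(src, tmp), (tmp, dst)]
    return [(src, dst)]


if __name__ == "__main__":
    moves = plan_rename(
        "src/lint/linter/__tests__/ArcanistPHPCSLinterTestCase.php",
        "src/lint/linter/__tests__/ArcanistPhpcsLinterTestCase.php",
    )
    for old, new in moves:
        print(old, "->", new)
```

Each planned move would then be handed to whatever performs the rename; on a case-sensitive file system, passing `case_insensitive=False` keeps it a single move.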

Here are some other gotchas from this importer (from an internal review):

  • The code totally ignores mode changes in git. So if someone sets the executable bit on a file after it was originally created we don’t notice.

  • The incremental import code doesn’t notice or care if the current bk tree is unrelated to the git repository. If we have a repository, then the top cset must have a GIT marker and that rev must be in the git tree. It doesn’t look like the code tests for this case, and the user sees:

    fatal: bad object 44eb0e39a23e2cb669328c9eaa5279457f7d07bf

  • The code requires running in a repository in-place, but it also requires that the repository be totally unedited with no extra files. And it makes assumptions about a normal .git layout, which aren’t always true. Better would be to take a git URL on the command line and do a fresh clone as part of the import.

  • The rename handling code cannot handle circular renames or other oddities like:
    usr/doc/as/as -> usr/doc/assembler
    The new names.c code in dev can do this nicely. The list of required renames can be fed on stdin and then all the renames are executed at once (or everything is reverted).

  • The commit code doesn’t use --ci

  • In general the code could be made faster by avoiding sfiles entirely. We already run ‘git diff-tree’, and the entire cset can be generated from that output:

    • git diff-tree -M FROM TO

      • remember all the operations
    • deletes + renames | bk names - (frompath|topath on stdin)
      (requires new names.c)

    • write c.files for all renames and deletes

    • create all rename deltas
      deletes + renames | bk -?_BK_MV_OK=1 ci -c -

    • do any mode changes

    • git checkout NEWREV

    • all files | bk commit -ci -

  • The current code runs sfiles multiple times and has to sccs_init all the files. It also does changes like renames one file at a time.

  • The validate code is a bit overkill and doesn’t attempt to be fast.
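The diff-tree-driven flow in the list above starts by remembering all the operations, and it would also be the natural place to catch the mode changes that the current code ignores. Here is a rough Python sketch of that first step, just to show the shape of it (the importer itself is written in Little). It assumes the default raw output of `git diff-tree -r -M FROM TO`, where each line is `:oldmode newmode oldsha newsha status<TAB>path[<TAB>newpath]`; the `-r` is my addition so that subdirectories are included. The function names and the sample data are made up, and the `frompath|topath` lines just follow the notation in the list, not a confirmed `bk names` input format.

```python
from typing import Dict, List


def classify_diff_tree(raw: str) -> Dict[str, list]:
    """Bucket raw `git diff-tree -r -M` lines into deletes, renames,
    mode changes and everything else (adds, edits, type changes)."""
    ops: Dict[str, list] = {"deletes": [], "renames": [], "mode_changes": [], "others": []}
    for line in raw.splitlines():
        if not line.startswith(":"):
            continue  # e.g. the commit id header printed for a single commit
        meta, _, paths = line.partition("\t")
        old_mode, new_mode, _old_sha, _new_sha, status = meta[1:].split()
        path_list = paths.split("\t")
        kind = status[0]  # "R100" -> "R"
        if kind == "D":
            ops["deletes"].append(path_list[0])
        elif kind == "R":
            ops["renames"].append((path_list[0], path_list[1]))
        else:
            ops["others"].append(path_list[0])
        # A file present on both sides whose mode differs, e.g. chmod +x.
        if kind in ("M", "R") and old_mode != new_mode:
            ops["mode_changes"].append((path_list[-1], new_mode))
    return ops


def rename_lines(ops: Dict[str, list]) -> List[str]:
    """Format renames as frompath|topath, one per line (notation from the list above)."""
    return [f"{old}|{new}" for old, new in ops["renames"]]


# Made-up sample with abbreviated hashes, for illustration only.
SAMPLE = (
    ":100644 100755 1111111 2222222 M\tbin/run.sh\n"
    ":100644 000000 3333333 0000000 D\told.txt\n"
    ":100644 100644 4444444 4444444 R100\tsrc/a.c\tsrc/b.c\n"
    ":000000 100644 0000000 5555555 A\tnew.txt\n"
)

if __name__ == "__main__":
    ops = classify_diff_tree(SAMPLE)
    print(ops)
    print("\n".join(rename_lines(ops)))
```

Paths containing tabs, newlines or other unusual characters are quoted by git in this format; a real implementation would probably want `-z` and NUL-separated parsing instead.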

On submodules: whack the importer to have a --product option that calls bk setup with -P so the result is a product.
Then import each of the submodules and attach them.

That’s not perfect (nothing in this importer is perfect) in that you won’t get csets that span the submodules.
But you get something.

Yep, that was OS X.

But I also got a different rename problem on importing buck on 7c79582:

sccsmv: not an SCCS file: third-party/java/guava/guava-17.0-sources.jar
bk mv 'third-party/java/guava/guava-17.0-sources.jar' 'third-party/java/guava/guava-18.0-sources.jar' = 1
Caller: renames

which looks more like a false rename detection. Or maybe that file, being binary, is supposed to be handled by BAM?

Edit: just in case, here is the part of the git revision that relates to those files.