Turning an imported repository into a nested product

thoughtpolice · June 26, 2016, 8:10pm

Hi,

I’ve been testing BitKeeper on several git mirrors using fast-import, and it’s working quite well! I’ve also been taking this opportunity to try out Nested repositories, and I have a question.

The LLVM project is really a collection of nested repositories that must be arranged in a specific layout to build. It looks like this (where each entry is currently a separate repository):

./llvm                      # top-level LLVM repository
./llvm/projects/openmp      # OpenMP library
./llvm/projects/compiler-rt # Runtime library
./llvm/tools/clang          # Clang compiler
./llvm/tools/polly          # Polly optimizer
./llvm/tools/lldb           # Debugger
... etc ...

So this seems like a perfect opportunity for nested repositories. I used fast-import on some git mirrors (https://github.com/llvm-mirror, a mirror of their SVN, i.e. no octo-merges the importer fails on, etc) and have a collection of non-nested repositories now, but there’s a minor snag:

How do I turn the top level llvm repository into a product? The documentation says this needs to be done with bk setup -P to specifically mark the repository as a product, but the importer doesn’t allow you to do this on a repository it creates.

One trick I thought up was to use bk partition with a components file that was:

1. Empty (i.e. implicitly collecting all paths into the product).
2. Only had the entry “.” (no quotes), i.e. the explicit collection of all paths into the product.
3. Only had the entry “/” (no quotes), i.e. the explicit collection of all paths into the product (part 2, electric boogaloo).

All three of these failed in various ways; I don’t have the errors on hand, but I’ll report back with them shortly.

Naturally, as an obvious workaround I can do this (assuming fresh imports are under /srv/repos/bk/llvm/*):

$ bk setup -P llvm.nested
$ cd llvm.nested
$ bk attach /srv/repos/bk/llvm/llvm && cd llvm/projects
$ bk attach /srv/repos/bk/llvm/compiler-rt
$ bk attach /srv/repos/bk/llvm/openmp && cd ../tools
$ bk attach /srv/repos/bk/llvm/clang
$ bk attach /srv/repos/bk/llvm/lldb && cd ..
$ ... etc etc ...

And this works great, although it means I have to cd into the top level repository under llvm first.
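The manual sequence above could be scripted roughly as follows. This is only a sketch: `attach_all`, `COMPONENTS`, and `DRY_RUN` are hypothetical names introduced here, the paths are the ones from the example, and it assumes `bk attach` behaves as in the commands above when run from the target directory.

```shell
#!/bin/sh
# Sketch only: print (or run) the attach commands for a list of components.
# attach_all, COMPONENTS and DRY_RUN are hypothetical names for illustration.
SRC=${SRC:-/srv/repos/bk/llvm}
DRY_RUN=${DRY_RUN:-1}

# Each entry is "<directory inside llvm.nested>:<imported repository name>".
COMPONENTS="
llvm/projects:compiler-rt
llvm/projects:openmp
llvm/tools:clang
llvm/tools:lldb
"

attach_all() {
    for entry in $COMPONENTS; do
        dir=${entry%%:*}    # part before the first colon
        name=${entry#*:}    # part after the first colon
        if [ "$DRY_RUN" = 1 ]; then
            # Dry run: only show what would be executed.
            echo "(cd $dir && bk attach $SRC/$name)"
        else
            # Run each attach in a subshell so the working directory is restored.
            (cd "$dir" && bk attach "$SRC/$name") || return 1
        fi
    done
}
```

With `DRY_RUN=1` (the default here), calling `attach_all` only prints the commands; calling it from inside llvm.nested with `DRY_RUN=0` would execute them, stopping at the first failure.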

In the meantime: is there any way to work around this?

And: as a convenient feature request, it would be great if fast-import could mark a repository as a product. (If this is easy, I could try to write a patch myself.)

wscott · June 26, 2016, 9:11pm

Cool, lots of fun things to mess with here. I will do more with this tomorrow morning.

The product marking isn't complicated, you can create a `BitKeeper/log/PRODUCT` file and then run check and I think this will fix things up.

And as a hack, the import of the top-level git repository could skip the submodule updates when they occur, if you modify the code.
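The marker-file suggestion could be wrapped up roughly like this. It is an untested sketch: `mark_product`, `run`, and `DRY_RUN` are hypothetical names introduced here, the marker file and `bk check` come from the suggestion above, and the final `bk portal .` step is taken from the follow-up later in this thread.

```shell
#!/bin/sh
# Sketch only: mark an already-imported repository as a nested product.
# mark_product, run and DRY_RUN are hypothetical names for illustration.
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

mark_product() {
    repo=$1
    # Refuse to touch anything that does not look like a BitKeeper repository.
    if [ ! -d "$repo/BitKeeper/log" ]; then
        echo "not a bk repository: $repo" >&2
        return 1
    fi
    (
        cd "$repo" || exit 1
        run touch BitKeeper/log/PRODUCT &&   # marker file for a product
        run bk check &&                      # let bk verify and fix up state
        run bk portal .                      # needed before attaching components
    )
}
```

Called as `mark_product /path/to/imported/repo`, this prints the three commands by default; setting `DRY_RUN=0` would run them in order and stop at the first failure.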

thoughtpolice · June 27, 2016, 2:43pm

Thanks, that worked great. I had to run:

$ touch BitKeeper/log/PRODUCT
$ bk check && echo $?
0
$ bk portal .
$ cd tools && bk attach ...

to get it working, but it does seem to work fine now. And it clones fast! I can clone LLVM + 8 extra nested repositories in about 47s on my machine (dual-socket Xeon), from a repo on a hosted, wimpy 2-core server across the Atlantic:

austin@server-09:~/tmp$ time bk clone bk://area51.cia.nsa.blacksite.gov.mil/llvm/llvm.nested llvm
Clone bk://area51.cia.nsa.blacksite.gov.mil/llvm/llvm.nested
   -> file:///home/austin/tmp/llvm
.                                  100% |==============================| OK
1/8 projects/compiler-rt           100% |==============================| OK
2/8 projects/openmp                100% |==============================| OK
3/8 tools/clang                    100% |==============================| OK
4/8 .../clang-tools-extra          100% |==============================| OK
5/8 tools/lld                      100% |==============================| OK
6/8 tools/lldb                     100% |==============================| OK
7/8 tools/polly                    100% |==============================| OK
8/8 .                              100% |==============================| OK

real    0m47.076s
user    0m7.288s
sys     0m9.036s

vs just LLVM from GitHub (only one repository!):

austin@server-09:~/tmp$ time git clone git://github.com/llvm-mirror/llvm.git llvm.git
Cloning into 'llvm.git'...
remote: Counting objects: 1197875, done.
remote: Compressing objects: 100% (159/159), done.
remote: Total 1197875 (delta 77), reused 0 (delta 0), pack-reused 1197716
Receiving objects: 100% (1197875/1197875), 515.85 MiB | 39.86 MiB/s, done.
Resolving deltas: 100% (975650/975650), done.
Checking connectivity... done.

real    0m41.226s
user    1m15.716s
sys     0m5.784s

Not exactly fair – my (no-load) server vs GitHub’s fleet – but considering I’m going across the pond and my server is a stupid VM, that’s still pretty good! (FWIW, in all my tests except one so far, bk has always been faster than git, generally in the 2x range, even on OpenBSD/NetBSD source tree imports).

However, I’m afraid that after reading the BK Nested documentation and the man pages, I’m not much clearer on exactly what a “portal” and a “gate” are, why I would want them, who would manage them, etc. I mean, I gather that you have to be a portal to attach things, and that means you have a full copy so you can go back in time. But a “gate” can’t. But I’m still not clear on: what exactly else can I do in a portal that I can’t in a gate? How would I reverse everything synchronously to have fully atomic history? It’s also still unclear exactly how pushing works: in a nested repo, can I push to individual components and not the parent, and then resynchronize the parent? What if I’m working in the parent and I make a commit across several components at once? I’m still figuring some stuff out, but as an aside, explaining these concepts in more depth would be very useful.

Basically, I want more examples I guess.

Really, it would be nice if there was just a single, complete BitKeeper manual that people could contribute to, that addressed all of this in one place, rather than the separate “Getting Started” tests with the single bk_demo repository, and the “BK/Nested” and “Merge” tutorials all being separate.

Anyway, better documentation is probably best left for another thread; perhaps I can submit that with some suggestions too…

wscott · June 27, 2016, 5:09pm

You are making me happy.

Agreed about the documentation. We are working on it.

The idea about portals is that we don’t have a resolver to deal with the possible corner cases when different users attach components in parallel. So we add the portal concept to try to serialize top-level configuration changes of your repository. The idea is that you should only ever have one PORTAL for a given repository, and that portal is the only place where attaching a component is allowed. If you follow that restriction, the problematic conflicts can’t occur.

GATE is a different concept. You can have multiple GATEs. A gate repository is considered an integration tree. Csets in a GATE are assumed to be final and will never be collapsed (think rebase) and will not be thrown away. When you start using partially populated nested repositories, the code that keeps track of the missing components wants to know that the components that are not currently populated live in a gate somewhere. This is a safety check to make sure you are not building on a baseline that can’t be reproduced.

wscott · June 27, 2016, 11:32pm

BTW, I edited your original post to change “parent” to “product”.
The parent is set with bk parent and that is the default URL used for operations like pull or changes. The top-level repository in a nested collection is called the “product” (looks like we need to add that to ‘bk help terms’). And so ‘bk setup -P’ is used to create a new “product repository”.