Key-based repository store

wscott · April 18, 2016, 3:01pm

Overview

The idea here is to create a pastebin-like repository for changesets.

This is an extension to the existing bkd protocol that lets the bkd maintain a collection of repositories accessed by cset tipkey. Any number of keys can be saved; internally the data will be stored as a collection of repositories, or perhaps as some baseline repositories plus saved bk patches. The bkd can be accessed by any one of these keys and will redirect to whichever repository contains that data, possibly masking out the data past that key.
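
As a rough sketch of the lookup side, assume the keystore keeps a plain text index with one "MD5KEY repo-path" pair per line. The index file, its format, and the function name keystore_lookup are all made up for illustration; nothing here is an existing bk interface.

```shell
# Hypothetical: find which stored repository holds a given cset key.
# Assumed index format: "<md5key> <repo-path>" on each line,
# with no whitespace inside the repo path.
keystore_lookup() {
    index=$1
    key=$2
    # Print the repo path from the first line whose first field equals
    # the key. Exit status is 0 when the key is found, 1 otherwise.
    awk -v k="$key" '
        $1 == k { print $2; found = 1; exit }
        END     { exit !found }
    ' "$index"
}
```

A bkd built this way would call something like keystore_lookup on the key from the URL and then serve the request from the repository it names.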

Why?

In many cases, we just want the ability to host a collection of repositories in order to reproduce a given cset. For example, we could change the RTI to automatically save the csets under operation rather than require the user to deal with all the hassle of managing the repository. This would allow the user to get a simple URL for any cset they want to submit to a publicly facing RTI without requiring us to provide generic hosting to anyone.

Usage

The URL bk://host/MD5KEY will be expanded internally as if the user had included a -r argument and referred to the correct backing store.
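
To make the expansion concrete, here is a minimal sketch of splitting such a URL into the host and the key that would feed the implied -r argument. The function name split_key_url is hypothetical, and the sketch does not try to tell a key apart from an ordinary repository path like bk/dev; a real bkd would need some rule for that.

```shell
# Hypothetical helper: split bk://host/MD5KEY into "host key".
# Returns non-zero when the URL does not have that shape.
split_key_url() {
    url=$1
    case $url in
        bk://*/*) ;;            # need a scheme, a host, and something after it
        *) return 1 ;;
    esac
    rest=${url#bk://}           # host/MD5KEY
    host=${rest%%/*}            # text before the first slash
    key=${rest#*/}              # text after the first slash
    [ -n "$host" ] && [ -n "$key" ] || return 1
    printf '%s %s\n' "$host" "$key"
}
```

With the example key used later in this thread, split_key_url bk://public.bitkeeper.com/5aa66be1MaS_1t5lQkNCflPexCwd2w prints the host and the key separated by a space.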

Push operations do a keysync over all repositories and potentially create a new tip.

Ideally an rclone will be converted to a push, and only if the server has never seen that repository rootkey before would we need to do a real clone.
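
The rclone decision could look something like the sketch below. It assumes the server keeps a file listing every repository rootkey it has stored, one per line; that file and the name choose_transfer are illustrative only and not part of bk.

```shell
# Hypothetical: decide how the server should treat an incoming rclone.
# Assumed: $1 is a file with one known repository rootkey per line.
choose_transfer() {
    known=$1
    rootkey=$2
    # -F fixed string (rootkeys may contain characters like '|'),
    # -x match the whole line, -q no output, -e protects odd keys.
    if grep -Fxq -e "$rootkey" "$known"; then
        echo push     # rootkey already stored: only new csets need to move
    else
        echo clone    # rootkey never seen before: a real clone is needed
    fi
}
```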

Nested

Each component of a nested collection will be saved independently under the syncroot repository id. The product will be saved as if no components are populated.

Operations

Clone
  We really want a clonemod when the user provides a baseline.
  Not sure what to do with the ‘parent’ link. Using the key link will never return anything for a pull and is kinda funny for populate.
Pull
Push
Rclone
  Ideally this will query to see if this repository id is already present and switch to a push.
Changes -LR
  I think for nested, ‘bk changes -R’ of a product will act as if only the product is populated remotely.
Changes URL
Populate
  The populate code needs to understand that it can just request a desired tipkey and it will be returned if it can be found.

wscott · March 15, 2018, 1:00pm

I kinda wish I was never discouraged from pursuing this idea because it is exactly what we need right now. I think I still have a source tree with the skeleton of this idea.

Rereading the post above I see I didn’t do a good job of explaining the idea.

Basically, you start a bkd running in a directory that has been marked in some fashion to indicate that it is one of these keystores. Then we have an operation on the client that basically says: Make sure the local repository is stored in that remote bkd. That operation is either a new bk command or we overload clone or push.

So then when someone makes a change in bk they want to share on this forum they would do something like this:

$ cd myrepo
$ bk push bk://public.bitkeeper.com
$ bk changes -r+ -d:MD5KEY:
5aa66be1MaS_1t5lQkNCflPexCwd2w

This would efficiently transfer just the unique parts of the local repository to the bkd running on our servers.

And in the post, they would refer to this cset as bk://public.bitkeeper.com/5aa66be1MaS_1t5lQkNCflPexCwd2w

So then the maintainer would review this cset like this:

# clone cset using a local repo for the bulk of the data
$ bk clone -@bk-dev bk://public.bitkeeper.com/5aa66be1MaS_1t5lQkNCflPexCwd2w bk-proposed
$ cd bk-proposed
$ bk changes -Lvv bk://bkbits.net/bk/dev    # review the change

# if the changes look good
$ bk push RELEASE_URL

And the maintainer could make a fixup cset on top and push that to the public bkd to continue the discussion. Csets can go back and forth until the final version gets collapsed (aka rebase).

rsmith · March 25, 2018, 3:59pm

I like the idea of a changeset repository. You say you have some skeleton code?

bk clone -@bk://bkbits.net/bk/dev bk://public.bitkeeper.com/5aa66be1MaS_1t5lQkNCflPexCwd2w bk-proposed

I’m not clear on why you would have a remote baseline. It seems like remote baselines (gates?) would be built in to the changeset repo, such that a bk repocheck on the changeset repo would verify that all stored csets can find their baseline repos. On the other hand, I can see that a dev would want a local baseline repo, for all the clonemod goodness.

Possibly related, something I’ve wanted is to be able to store nested patches containing unified diffs as a $csetname.bkpatch.gz and have an easy way to reconstitute that like bk clone -@$repo $csetname.bkpatch.gz dev-$csetname. That way, my local wine cellar of changes wouldn’t be sitting around in full repos, and would have diffs in them so I can refresh what they were about without needing a repo. I like the idea that all these patches could be housed in some container that could be checked to confirm they can all be rebuilt from some gate or collection of gates.

wscott · March 25, 2018, 5:46pm

Yeah. That was a typo. I fixed the original post.

Yes, we need the ability to save nested patches and apply them. That is a feature of bk we lost with nested, and its absence makes working with outside contributors more difficult. Restoring it would be another way to address this problem.

That is an interesting command line. I would have expected bk receive or bk takepatch, possibly with some option to discard any existing csets that were not included in the baseline for this patch.

rsmith · March 25, 2018, 6:14pm

Well, bk receive or bk takepatch should work as well, but then I would need to manually clone a base repo. I was inspired by your clone working on a cset key, and thought how nice it would be to have clone also work on a patch file, using a list of known gates and/or a local -@repo. But that’s just syntactic sugar. Easy enough to build in a helper script, especially with:

Yeah, I would like that.