Replicated BAM servers?

harlan · October 20, 2020, 1:57am

I’m getting ready to start using the BAM feature, and I have some questions.

First, the BAM server directory area is effectively “normal user filesystem” stuff, so I can use normal tools to back it up and recover it, right? If not, what is the recommended way to do backup and recovery on the BAM server directory area?

Next, is there a way to have the BAM server be distributed/replicated over multiple sites?

wscott · October 20, 2020, 2:20pm

Yes, the BAM files live in BitKeeper/BAM in your repository, and every revision of every file is a separate file under this tree.

The idea is that BAM data is not replicated. When you clone a repository with BAM data, you transfer only the BAM files needed to check out the tip revision. If you then look at older data, the files are fetched on demand.

However, each repository can have a different BAM server. So if I clone a repo and override the BAM server to a geographically closer BAM server, then it will transfer all data used by that repository to my new BAM server. If someone else does the same clone they won’t need to transfer any data because it is already present locally.
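To make the clone behaviour above concrete, here is a minimal toy model in Python. It is not BitKeeper code: `BamServer`, `Clone`, and the `(file, revision)` keys are hypothetical names invented for illustration, and it assumes that a file fetched on demand is then kept in the local repository.

```python
# Toy model only: every name here is hypothetical, not a BitKeeper internal.
class BamServer:
    def __init__(self, blobs):
        self.blobs = dict(blobs)   # (file, revision) -> contents
        self.fetches = 0           # how many files were transferred

    def fetch(self, key):
        self.fetches += 1
        return self.blobs[key]


class Clone:
    def __init__(self, server, tip_keys):
        self.server = server
        # At clone time only the BAM files needed for the tip are transferred.
        self.local = {key: server.fetch(key) for key in tip_keys}

    def get(self, key):
        # Older revisions are fetched on demand.
        # Assumption: once fetched, the file stays in the local repository.
        if key not in self.local:
            self.local[key] = self.server.fetch(key)
        return self.local[key]


server = BamServer({
    ("logo.png", "1.1"): b"old",
    ("logo.png", "1.2"): b"new",
})
clone = Clone(server, tip_keys=[("logo.png", "1.2")])
tip_transfers = server.fetches          # transfers done by the clone itself
clone.get(("logo.png", "1.1"))          # looking at history triggers a fetch
clone.get(("logo.png", "1.1"))          # second look is served locally
total_transfers = server.fetches
```

In this sketch the clone transfers one file and the first look at revision 1.1 transfers one more; the second look transfers nothing.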

harlan · October 21, 2020, 10:39am

Thanks, Wayne, and I almost understand.

So what maintains consistency between the various BAM servers?

If bk.site1 is my “master” repo and I have it set up to push updates to bk.site2 and bk.site3, what is the process by which the BAM files on bk.site1 would get pushed/updated on bk.site2 and bk.site3?

I’m hoping that I would be able to ‘bk clone bk://bk/SomeRepo’ and that operation would be done against the bk repo of the site I’m working at, and the result would be SomeRepo that contained the latest regular and BAM files for that repo.

wscott · October 21, 2020, 1:05pm

Every repository has a BAM server associated with it. It may be the repository itself, but usually it is another repository. Your BAM server always has all the data needed for all revisions in your local repository if those csets have ever gone somewhere else. Local changes are stored locally and only get sent to the BAM server when they get pushed.

If you pull in new csets from another repository that uses a different BAM server, then any BAM data required by those new csets that is not already stored in the local BAM server will be sent along with the new csets and stored locally.

When csets go out from a repository, either from the client pushing or from someone cloning or pulling from a bkd talking to this repository, any BAM data associated with the csets in transit is pushed to that repository’s BAM server. It is also sent to the remote repository if that repository doesn’t share the same BAM server, but we check first so data that already exists is not sent again.

The idea is that if you use bk to move csets around then BAM data will get migrated as well. So you can have a master repository at two different sites that each have the BAM server set to “.”. Anyone who clones one of those masters will use that master as a parent and as a BAM server. But if you push csets between sites the data will migrate.
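The rules above can be sketched as a small Python toy model. This is not how bk is implemented: `Repo`, `send`, and the key strings are hypothetical, and one `send` call stands in for a push, pull, or clone of a single cset. Passing no `bam_server` plays the role of a BAM server set to “.”.

```python
# Toy model of the migration rules; every name here is hypothetical.
class Repo:
    def __init__(self, name, bam_server=None):
        self.name = name
        # bam_server=None models a BAM server set to "." (the repo itself).
        self.bam_server = bam_server or self
        self.csets = {}     # cset id -> set of BAM keys it needs
        self.bam = set()    # BAM keys stored in this repository

    def commit(self, cset, bam_keys):
        # Local changes are stored locally until they are pushed.
        self.csets[cset] = set(bam_keys)
        self.bam |= set(bam_keys)


def send(src, dst, cset):
    """Move one cset from src to dst; return the BAM keys sent to dst."""
    keys = src.csets[cset]
    # Data for csets leaving src goes to src's BAM server.
    src.bam_server.bam |= keys
    sent = set()
    # If dst uses a different BAM server, send only what it is missing.
    if dst.bam_server is not src.bam_server:
        sent = keys - dst.bam_server.bam - dst.bam
        dst.bam |= sent
    dst.csets[cset] = set(keys)
    return sent


site1 = Repo("bk.site1")                 # master, BAM server "."
site2 = Repo("bk.site2")                 # master at another site, BAM server "."
dev = Repo("dev", bam_server=site1)      # clone that uses site1 as BAM server

dev.commit("cset-1", {"video.mp4@1.1"})
stored_before_push = "video.mp4@1.1" in site1.bam
send(dev, site1, "cset-1")               # push to the parent and BAM server
stored_after_push = "video.mp4@1.1" in site1.bam
first_cross_site = send(site1, site2, "cset-1")    # data migrates with the cset
second_cross_site = send(site1, site2, "cset-1")   # already present, nothing sent
```

In this sketch the data reaches bk.site1 only when dev pushes, migrates to bk.site2 with the first cross-site push, and is not sent again on the second.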

harlan · October 22, 2020, 6:54am

Thanks a bunch, Wayne! I think I have enough information to forge ahead without too much thrashing around now.

martindorey · December 1, 2021, 12:12am

I don’t see that in bk help bam or bk help Howto-BAM. Like Harlan wrote from what sounds like the same position, it was just what I needed today, thanks, to feel that I could maybe start deploying the feature too… before concluding that I wanted to post “Push after cp: Failed to locate BAM data for the following deltas” rather than try to work around it.