Bkd and http/bk protocols

harlan · August 18, 2018, 7:42am

Where can I go to learn the differences between accessing a bkd instance
via the ‘bk’ v. the ‘http’ protocols?

I’ve got to implement accessing bkd instances thru haproxy gateways, and
I’d like to be able to have the haproxy stuff serve both http and bkd
ports. It’s not yet clear to me if that will just work, or if I have to
do anything extra for certain cases.

wscott · August 18, 2018, 11:35am

Hey @harlan

It works like this. The bkd program listens on a TCP port and responds to requests from a bk process. If a request looks like an HTTP request, starting with GET or POST, then the bkd enables http-mode and wraps the response in a proper HTTP 1.0 response. Otherwise, it uses the built-in “bk protocol”.
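To make that dispatch concrete, here is a rough Python sketch of the idea. It is not bkd's actual code: the function names, and the "pull" line standing in for a native bk request, are made up for illustration.

```python
def detect_mode(first_bytes: bytes) -> str:
    """Return 'http' if the request starts like an HTTP GET/POST, else 'bk'."""
    if first_bytes.startswith((b"GET ", b"POST ")):
        return "http"
    return "bk"


def reply(first_bytes: bytes, payload: bytes) -> bytes:
    """Wrap the payload in an HTTP/1.0 response only when in http mode."""
    if detect_mode(first_bytes) == "http":
        header = "HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n" % len(payload)
        return header.encode("ascii") + payload
    # Native mode: the payload goes out as-is (hypothetical simplification).
    return payload


if __name__ == "__main__":
    print(detect_mode(b"POST /repo HTTP/1.0\r\n"))  # http-style request
    print(detect_mode(b"pull\n"))                   # hypothetical native request
```

The only point is that the same listening port can serve both kinds of client, because the mode is chosen per request from the first bytes received.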

On the bk client side, the URL you use (see bk help url) controls how you talk to a bkd:

machine 1: bk bkd -d
machine 2: bk pull bk://machine1/repo

or

machine 3: bk bkd -p80 -d
machine 4: bk pull http://machine3/repo

but these also work:

machine 2: bk pull bk://machine3:80/repo      # assuming no proxies
machine 4: bk pull http://machine1:14690/repo

So the above is the answer to your question. If you talk to the bkd via an http:// URL then the connection will be wrapped in HTTP, and putting a proxy in the middle will work just fine. What follows are some thoughts about how it works and why HTTP (as it is currently implemented) is a bit slower than bk:// connections.

My preference would have been that bk just used HTTP for all network connections, but we never did that part and had to maintain the two different methods. Part of the reason is that bk sticks to a small subset of HTTP 1.0, which has a strict one request/response per connection, and because of the Content-Length header the bkd needs to know the exact size of the response before sending.

So consider running a ‘bk pull’. The HTTP version of this makes 2 network connections:

pull part1 (in http-mode)
- a TCP connection is opened to the bkd
- the bk says it is going to pull and sends details about the local repo
- the bkd either errors out or responds with a partial list of cset keys in its repository, including the tip
- now the TCP connection is dropped
pull part2
- a new TCP connection is opened to the bkd
- the bk client has looked at the list of landmark csets sent from the remote side and now sends a list of all local cset keys that are not implied by one we know the remote already has
- the bkd looks at this keylist and can now compute exactly which csets are missing on the client side and can create a bk patch to send
- the bk patch is created and written to a temp file
- the response with the bk patch for the pull is sent back to the bk client
- the connection is dropped.
if BAM is involved an optional 3rd round might be made here

The point is that HTTP is a little slower than a bk connection for two reasons. First, it makes multiple TCP connections. More importantly, the requests and replies on those connections are buffered as they are computed and sent only at the end, so that their size is known before the first byte goes out. For some large requests, this serializes some processing that would normally overlap.
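A small Python sketch of that buffering effect, assuming a producer that computes the reply in pieces. The event log and function names are invented for the illustration; this is not how bk is written internally.

```python
from typing import Iterable, Iterator, List


def produce(parts: Iterable[bytes], log: List[str]) -> Iterator[bytes]:
    """Stand-in for the code that computes a reply piece by piece."""
    for part in parts:
        log.append(f"compute {part!r}")
        yield part


def send_buffered(parts: Iterable[bytes], log: List[str]) -> None:
    """HTTP/1.0 style: everything is computed before anything is sent."""
    body = b"".join(produce(parts, log))  # must finish to learn the size
    log.append(f"send header Content-Length: {len(body)}")
    log.append(f"send body ({len(body)} bytes)")


def send_streamed(parts: Iterable[bytes], log: List[str]) -> None:
    """Streaming style: each piece is sent as soon as it is computed."""
    for part in produce(parts, log):
        log.append(f"send {part!r}")


if __name__ == "__main__":
    for sender in (send_buffered, send_streamed):
        events: List[str] = []
        sender([b"a", b"bc"], events)
        print(sender.__name__, events)
```

In the buffered case every "compute" event comes before the first "send", which is the serialization described above. In the streamed case the two interleave, so the receiver can start working while the sender is still computing.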

If we had updated bk to use features from HTTP/1.1 (released 1999) like persistent connections and chunked responses then bk could have just used HTTP everywhere without any downsides.
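For reference, this is roughly what HTTP/1.1 chunked transfer encoding looks like on the wire: each piece is prefixed with its own length in hex, so the sender never needs the total size up front. The sketch below only shows the body framing (no headers, no trailers), and encode_chunked is just a name chosen for the example.

```python
from typing import Iterable, Iterator


def encode_chunked(parts: Iterable[bytes]) -> Iterator[bytes]:
    """Frame each piece as an HTTP/1.1 chunk: hex size, CRLF, data, CRLF."""
    for part in parts:
        if not part:
            continue  # a zero-length chunk would terminate the body early
        yield f"{len(part):x}\r\n".encode("ascii") + part + b"\r\n"
    yield b"0\r\n\r\n"  # last-chunk marker followed by the final CRLF


if __name__ == "__main__":
    for frame in encode_chunked([b"hello", b"0123456789abcdef"]):
        print(frame)
```

A response sent this way carries "Transfer-Encoding: chunked" instead of Content-Length, so pieces can go out as soon as they are computed.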

harlan · August 18, 2018, 11:32pm

Awesome - thanks a bunch, Wayne!