A plan for packaging

wscott · June 28, 2016, 2:28pm

Ok. Fair question. Originally tomcrypt was added for the commercial licensing code, which has mostly been ripped out. So let's see what is left:

MD5 hash
We use MD5 hashes for bk's :MD5KEY: and as an index for BAM files
SHA1 hash
The fast-import code needs to compute SHA-1 hashes to talk with git
base64
We use tomcrypt’s base64_encode() and base64_decode() functions
prng code
To generate good random numbers, we seed with /dev/random and then use tomcrypt's prng code to generate a series of random numbers
HMAC hash
Left over from the commercial world, we still have one case where a file uses an HMAC for validation, where we wanted to be in the loop when some error checks were suppressed.

There is still a 'bk crypto' command that can be used to do symmetric and asymmetric encryption, hashing, and HMACs, but there isn't really a need to keep that.

calhariz · June 29, 2016, 2:27pm

I have the first patch, which prevents the compilation and linking of the bundled zlib. It isn't of the quality to be applied upstream, but it can be used by other packagers.

04-remove-bundled-zlib.txt (1.1 KB)

calhariz · June 29, 2016, 6:37pm

I have a cumulative patch this time for libpcre.

05-disable-bundled-pcre.txt (4.9 KB)

thoughtpolice · June 29, 2016, 10:45pm

Yes, so that should be pretty easy to replace.

When you say 'index' BAM files, I'm guessing that you mean: BitKeeper hashes the contents of files synced into BAM, and uses that hash as an index to look up, store, and download BAM files later. Is that idea correct?

If it is, would you say MD5 is actually speed-sensitive in this case? As in, you may need to hash large (multi-GB) files. If so, it's worth spending a bit of extra time working in optimized implementations. But that's still preferable, I think.

(Alternatively if that’s true, you could choose a faster, stronger, and much better hash function, too, like BLAKE2, which could also double as a MAC if you wanted to keep your HMAC-based verification code below. However, if my guess is correct, I’m also guessing that’s a backwards incompatible change, so it’s off the table.)

That’s easy enough to replace and I imagine fast-import probably isn’t bottlenecked very much on this.

Ditto.

Do these numbers require high entropy and secure generation, e.g. for key material? If so, I'll spare you a whole bunch of hand-wringing and future complaints from the peanut gallery: this is very easy to replicate on both platforms. On Windows, you want to use RtlGenRandom, while on Unix, you want to use /dev/urandom. These are the same methods that libraries like libsodium generally use, and people seem to be somewhat standardizing on them.

I bring up hand-wringing and peanut galleries because, as with many, many security things, people are very dramatic and stern about how to generate random values. But it's widely accepted to be a good idea to use /dev/urandom and stick with it on almost every modern Unix. So my suggestion is to just call read() on an fd to get the bytes you need, and get word values out of that with some twiddling.

On top of that, on systems like OpenBSD and Linux 3.17+, you can use direct syscalls (see getrandom(2) on Linux and getentropy(2) on OpenBSD) to avoid even having to open a file descriptor.

And anyway, even if you don’t need cryptographically secure entropy, /dev/urandom and RtlGenRandom are easily available, good quality, and require no external dependencies. I doubt this is a bottleneck. This is all a very small amount of code, traditionally, which I’ve written before (for my own libraries in Haskell, where I needed to access the system entropy source, ~50 lines of extra C).

HMAC is easy enough to replicate, but just to make sure I'm on the same page: should this be removed or kept? If it's sort of a workaround for the error suppression, it's never too late to rip that bandaid off and drop HMAC. BK does emphasize strong data consistency checks, though, so maybe you want to keep it!

OK, so it could just be nuked in that case?

As you can probably tell, I'd be willing to write a few patches for this if the above sounds like an amenable way to drop the libtom dependencies and make the codebase leaner, and you all agree. I probably won't get to it for a little while, but it sounds like an easy enough way to start contributing.

wscott · June 30, 2016, 7:23pm

With BAM, MD5 keys are used as an integrity check. It is too small for a strict lookup by hash. But it is part of the network protocol, so it would be hard to change. And yet it would need to be reasonably fast.

For the most part, the random numbers are used for a 64-bit field that is part of the unique identification of new files. It is used in the rootkey of a file:

$ bk log -nd:KEY: -r1.0 src/zone.c
lm@indy.bitmover.com|src/zone.c|19990425222324|58584|757510058031f1d0

That last field is a random number. It doesn't need to be secure, i.e. I can't think of an attack if a user can predict the next number, but it does need to be reasonably random. When we first started in 1999, it wasn't uncommon for /dev/random to not exist on machines and for us to not have enough seed information. Right now bk has a Rube Goldberg system of passing random number seeds down to sub-processes. That could probably all get replaced now.

I think it can be removed. I see one use currently in check.c:ignorepoly(). It can be replaced with a simple hash and a big block comment that if you use this mechanism we are not going to put your foot back on for you.

Today @calhariz demonstrated that bk won't compile with Debian's tomcrypt library because it is incompatible with our src/libc/stdio library, so removing this dependency seems like a good idea.

wscott · June 30, 2016, 7:35pm

BTW the current tree for the next release is at bk://bkbits.net/u/bk/dev
It has the code to remove diff and patch already integrated.

wscott · July 1, 2016, 7:25pm

I am incorporating these as I can. I think I have zlib pretty close. I still need to include it for building our Windows installer at the moment.

For pcre, is it OK if the Makefile tests whether -lpcre works and then automatically decides whether the bundled copy should be built?

calhariz · July 2, 2016, 8:55am

That is the proper approach. Just document somewhere which libraries you need for building BitKeeper, which libraries are optional, and which libraries you bundle a version of. When building a package, I have to list the build dependencies somewhere.

wscott · July 9, 2016, 12:28pm

@calhariz The functionality from your patches has been included in bk-7.3. See the release candidate announcement:

wscott · July 14, 2016, 3:15pm

OK, with the bk-7.3 release, a lot of what has been discussed in this thread has been accomplished:

diff & patch are gone
pcre, zlib, lz4, tomcrypt & tommath can use the already installed versions
well, Tcl still statically compiles in tommath, but the normal Tcl does the same thing
we respect DESTDIR
the Makefiles no longer assume 'bk' is already installed

The tomcrypt requirement is a bit of a problem on MacOS. The released 0.17 version of tomcrypt doesn't compile on Macs because of a bad inline asm macro. On GitHub, a hacky fix is buried in the unreleased dev branch. I included what is, IMHO, a better fix in our local copy of tomcrypt.

The MacPorts tomcrypt library works, but Homebrew removed tomcrypt because it "has unfixable issues" (referring to the above bug).

Bummer. I forgot to fix these for bk-7.3.

So what needs to be done next to help people with packaging bk?

terinjokes · August 22, 2016, 5:56am

I’ve been working on getting the “bitkeeper” package in pkgsrc’s wip repository updated and compiling across the major supported systems, and hopefully get it moved into the main package tree.

The 7.3 release now appears to use the system (aka pkgsrc) versions of LZ4, PCRE, and tomcrypt. On OS X, the custom Tcl/Tk fails to build:

/private/tmp/pkgsrc/work/wip/bitkeeper/work/bitkeeper-bk-7.3ce/src/gui/tcltk/tcl/unix/tclUnixChan.c:14:10: fatal error:
      'tclInt.h' file not found
#include "tclInt.h"     /* Internal definitions for Tcl. */
         ^

Running make with V=1 shows that the unix and generic directories are included, so I'm not sure why the file can't be found.

gcc -c -Os -pipe  -arch x86_64   -Wall -fno-common -DBUILD_tcl -I"." -I/private/tmp/pkgsrc/work/wip/bitkeeper/work/bitkeeper-bk-7.3ce/src/gui/tcltk/tcl/unix -I/private/tmp/pkgsrc/work/wip/bitkeeper/work/bitkeeper-bk-7.3ce/src/gui/tcltk/tcl/generic […]

I’m out of ideas, do any of you have any?

wscott · August 22, 2016, 11:46am

With the information you gave above, I don’t see it. If you mail your work in progress to dev@bitkeeper.com I could try it out and see if I could figure out what is happening.

Does the Linux build work?

wscott · August 22, 2016, 4:23pm

OK, I took a pass at updating the pkgsrc-wip bitkeeper package originally done by Thomas Klausner.

I updated it to bitkeeper-7.3 and fixed the dependencies for tomcrypt, tommath, pcre, lz4 and others. I also fixed the rest of the code so bk builds fully.

wscott · August 22, 2016, 8:13pm

An update in case someone else wants to try and debug @terinjokes’ failure.

He tried switching to my newer wip bitkeeper package, and it failed in tclUnixChan.c with the same "tclInt.h not found" error on OS X.

I installed pkgsrc and that wip tree on OSX 10.9 and it built correctly, so I don’t know what is happening.