A plan for packaging

Hi,

I asked my sponsor about good practices and he pointed me to this page:

https://wiki.debian.org/UpstreamGuide

I did a cursory read and it is full of good advice and links to more information.

I have a trivial question: what name should the bitkeeper package get, so that it stays compatible with the name of the paid version? I think the CE version should be called bitkeeper-ce and the other version bitkeeper. What do people think, especially people from other distributions?

I agree with your choices: community is -ce but I’m not a distro manager.

I intend to call it “bitkeeper” for Fedora. If there’s going to be some kind of “extended” version from BitMover, then it’ll be up to them to name the package appropriately. It’s less confusing for users and sensible in general as a distribution isn’t going to have two editions of BitKeeper in its repositories.

There’s a similar precedent in Debian. For example, Debian has GitLab CE packaged by the name “gitlab”. Again, it’s less confusing for users when the package is sensibly named.

There are more examples of that package naming convention in Debian, and if possible I want the bitkeeper CE package to have the same name across distributions. I will go with you, Conan_Kudo, and name my package simply bitkeeper.

Hi,

I’m a developer and also a NixOS maintainer, looking to incorporate BitKeeper into NixOS upstream. I’m also one of the maintainers of Glasgow Haskell Compiler, and have a bit of familiarity working with distro upstreams and organizing larger projects for many systems (including Windows). So I just wanted to chime in:

  • I also agree with everyone else that any upstream packages are best left as ‘bitkeeper’, and if BitMover wants to offer their own native system packages later on for Enterprisey Features, it’s probably best to call it ‘bitkeeper-ee’ (Enterprise Edition) or something like that. In NixOS, I was merely going to call the package ‘bitkeeper’, and it’d be nice if we could all go with that.

  • The packaging plan looks fine to me. A minor suggestion: if BitKeeper itself has a limited subset of needed cryptographic functionality - why bother with libtomcrypt at all? From what you’ve said, you really only need it for some MD5 utilities - but MD5 is only a few hundred lines of code, and a custom utility that can work on both Linux and Windows is not much more. It’s also extremely unlikely that MD5 will be any performance bottleneck. I’d suggest just writing a simple, native MD5 utility in C that works on Windows and Unix systems and use that. This introduces a minor amount of complexity, but has other gains:

  • It completely avoids a dependency on the libtom libraries. In practice all distros deviate between packages at some level; when you have dependencies and want to work on many Linux systems, the effective complexity increases a lot, because you have to worry about the interactions between all the components, among all the different versions your users might have. libtom is very stable, so this is less of a problem, but also…

  • Because you avoid this dependency, it makes users and developers happier, IME. Users don’t have to build it (the difference between compiling an MD5 utility in two .c files and compiling all of libtom is quite big) or have it installed on their systems, and the overall amount of trusted code goes down. For example, I just compiled BitKeeper on an ARM machine. On a machine like this, which takes probably 30-40 minutes to compile, making the code smaller with fewer dependencies would have been very desirable - why compile and depend on a whole library when you only need a few hundred lines of code? I estimate eliminating libtom from the build (ignoring all the other suggested improvements) would have decreased build time. (IME, doing this also helps make track dependencies more accurately: it’s much better to expose as much parallelism as possible in the dependency graph, and if you do something like depend on an .a file output by your build rules, you effectively serialize the critical path for anything that depends on that .a file. Having to work an entire foreign library into your own build, with accurate dependencies, also sometimes sucks. Just getting rid of it avoids all this.)

  • If you want to make life easier on Windows, I strongly suggest you work towards a model of replacing the shell scripts you have with individual C programs, where viable. Or with Little (which it seems you’ve already invested much in), I suppose. These are often larger and more complex, but this significantly reduces build complexity on ‘foreign’ systems, means you don’t have to do things like ship complex environments to end users to support shell, and improves performance. git itself was originally a mash of Perl, shell, and C, but over time, the vast majority of git and all the user-facing utilities were rewritten in C and designed with few dependencies in mind. Not only are these usually faster, they can also avoid particular inefficiencies in the underlying system that a script and Unix compat layer can’t: where Linux has very fast exec, CreateProcess on Windows is horribly slow and inefficient. All of this has a big effect on systems like Windows and on developing for them, and generally makes porting easier as well.

  • My understanding is that Windows wasn’t always a first class citizen of BitKeeper. As previously noted, this kind of shows in some ways (parts of bk use shell script, invoke custom-built GNU tools that work on Windows, etc). If you’d like to help reduce the pain and make life easier for users and people trying to develop with bitkeeper on Windows, I strongly and heartily recommend an environment like MSYS2. MSYS2 is a very good environment for building Unix applications on Windows in a palatable way; we use it for GHC, so that developers can get a recent copy of make, autoconf, etc all on Windows and use it. These are stable ports that work well and are updated frequently - MSYS2 even uses the pacman package manager for managing it all. MSYS2 allows you to build and install packages in an “MSYS environment” (which you can think of as kind of like Cygwin - executables end up linking to msys-2.0.dll, which provides some features), or using MinGW, which will produce ‘native’ Windows applications (only using things like ntdll.dll, kernel32.dll, etc and nothing else), so you can ship executables to users. It’s worth investigating and trying out; the net effect is that you can build native Windows programs, with Unix-like workflows/tools and a clean package manager, very quickly.

  • Regarding external packages: I have been meaning to set up a BitKeeper PPA for Ubuntu, but I haven’t done so yet. Another thing worth exploring is the openSUSE Build Service (OBS). OBS is free to open source developers, and offers a build system service with many supported package formats (rpm, deb) and systems (Ubuntu, Debian, RHEL, CentOS; ARMv7, PowerPC, x86/amd64, etc). It might be worth exploring for building lots of packages for end users; it’s quite a great service for free.

  • Finally, on the note of a build system: if you remove all of these 3rd party components, etc, I would imagine the complexity of the build system would drop dramatically, so from a pure complexity standpoint, it might be more tolerable. That said, something like CMake sounds like a good choice to look at for BitKeeper, long term. CMake has a bit shit syntax unfortunately (IT IS VERY VERBOSE LIKE PASCAL AND HAS LOTS OF CAPS), but it is quite good for portable applications that end-users will use, and offers some nice benefits. For example, using cmake to generate .ninja files (instead of Makefiles) for use with ninja is great - ninja is unbelievably fast, especially at rebuilds, and will make your life better. You can build millions of lines of code (like LLVM for example) and if nothing has changed, ninja has a rebuild time of almost 0 seconds, and if it has, it will rebuild and calculate that efficiently.

This is all a bit long - but having played with BitKeeper, it’s clearly a very interesting product I’d like to see continue to develop and evolve. Hopefully some of these suggestions can help bring it to a wider audience more easily.

Ok. Fair question. Originally tomcrypt was added for the commercial licensing code, which has mostly been ripped out. So let’s see what is left:

  • MD5 hash
    We use MD5 hashes for bk’s :MD5KEY: and as an index for BAM files
  • SHA1 hash
    The fast-import code needs to compute SHA1 hashes to talk with git
  • base64
    We use tomcrypt’s base64_encode() and base64_decode() functions
  • prng code
    To generate good random numbers we seed with /dev/random and then use tomcrypt’s prng code to generate a series of random numbers
  • HMAC hash
    Leftover from the commercial world, we still have one case where a file uses an HMAC for validation, where we wanted to be in the loop when some error checks were suppressed. :wink:

There is still a ‘bk crypto’ command that can be used to do symmetric and asymmetric encryption, hashing, and HMACs but there isn’t really a need to keep that.

I have the first patch that prevents the compilation and linking of the bundled zlib. It doesn’t have the quality to be applied upstream, but it can be used by other packagers.

04-remove-bundled-zlib.txt (1.1 KB)

I have a cumulative patch this time for libpcre.

05-disable-bundled-pcre.txt (4.9 KB)

Yes, so that should be pretty easy to replace.

When you say ‘index’ BAM files, I’m guessing that you mean: BitKeeper hashes the contents of files synced into BAM, and uses that hash as an index to look up, store, and download BAM files later. Is that idea correct?

If it is, then would you say MD5 is actually speed sensitive, in this case? As in, you may need to hash large (multi-GB) files. If that’s so, it’s worth spending a bit of extra time working in optimized implementations. But that’s still preferable, I think.

(Alternatively if that’s true, you could choose a faster, stronger, and much better hash function, too, like BLAKE2, which could also double as a MAC if you wanted to keep your HMAC-based verification code below. However, if my guess is correct, I’m also guessing that’s a backwards incompatible change, so it’s off the table.)

That’s easy enough to replace and I imagine fast-import probably isn’t bottlenecked very much on this.

Ditto.
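To give a feel for how little code a standalone base64 helper needs, here is a minimal sketch of an encoder using the standard RFC 4648 alphabet. The names b64_encode and b64_encode_str are made up for illustration; this is not tomcrypt’s base64_encode() interface and not anything from the bk tree, and a real replacement would also need the matching decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not tomcrypt's or BitKeeper's API).
 * Encodes len bytes from in as NUL-terminated base64 (RFC 4648 alphabet).
 * out must have room for 4 * ((len + 2) / 3) + 1 bytes.
 * Returns the number of characters written, not counting the NUL.
 */
size_t b64_encode(const unsigned char *in, size_t len, char *out)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i, o = 0;

    assert(in != NULL || len == 0);
    assert(out != NULL);

    /* Full 3-byte groups become 4 output characters each. */
    for (i = 0; i + 2 < len; i += 3) {
        unsigned long v = ((unsigned long)in[i] << 16) |
                          ((unsigned long)in[i + 1] << 8) |
                          (unsigned long)in[i + 2];
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
        out[o++] = tbl[v & 63];
    }

    /* One or two leftover bytes are padded with '='. */
    if (i < len) {
        unsigned long v = (unsigned long)in[i] << 16;
        if (i + 1 < len)
            v |= (unsigned long)in[i + 1] << 8;
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = (i + 1 < len) ? tbl[(v >> 6) & 63] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
    return o;
}

/* Convenience wrapper for C strings (also hypothetical). */
size_t b64_encode_str(const char *s, char *out)
{
    return b64_encode((const unsigned char *)s, strlen(s), out);
}
```

The caller owns the output buffer, so there is no allocation and nothing platform specific; the same file should compile unchanged on Windows and Unix.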

Do these numbers require high entropy and secure generation for e.g. key material or something? If so, I’ll spare you a whole bunch of hand wringing and future complaints from the peanut gallery: this is very easy to replicate on both platforms. On Windows, you want to use RtlGenRandom, while on Unix, you want to use /dev/urandom. These are the same methods that libraries like libsodium generally use, and people seem to be somewhat standardizing on them.

I bring up hand wringing and peanut galleries because, as with many, many security things, people are very dramatic and stern about how to generate random values. But it’s widely accepted to be a good idea to use /dev/urandom and stick with it on almost every modern Unix. So, my suggestion is to just call read on an fd to generate the bytes you need, and get word-values out of that with some twiddling.

On top of that, on systems like OpenBSD and Linux 3.17+, you can use direct syscalls (see getentropy(2) on OpenBSD and getrandom(2) on Linux for more) to avoid even having to open a file descriptor.

And anyway, even if you don’t need cryptographically secure entropy, /dev/urandom and RtlGenRandom are easily available, good quality, and require no external dependencies. I doubt this is a bottleneck. This is all a very small amount of code, traditionally, which I’ve written before (for my own libraries in Haskell, where I needed to access the system entropy source, ~50 lines of extra C).
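For the Unix side, the “read on an fd and twiddle” approach could look roughly like the sketch below. The function names sys_random_bytes and sys_random_u64 are made up for illustration and are not from the bk tree; the sketch only covers /dev/urandom, so a Windows build would need an RtlGenRandom branch instead, and it does not use getrandom(2).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill buf with len bytes read from /dev/urandom.
 * Returns 0 on success, -1 on any failure. Unix only.
 */
int sys_random_bytes(void *buf, size_t len)
{
    unsigned char *p = buf;
    int fd;

    assert(buf != NULL);
    fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        if (n <= 0) {           /* real error or unexpected EOF */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    close(fd);
    return 0;
}

/*
 * Hypothetical helper: build one 64-bit word out of 8 random bytes.
 * Returns 0 on success, -1 on failure (*out is left untouched).
 */
int sys_random_u64(uint64_t *out)
{
    unsigned char b[8];
    uint64_t v = 0;
    int i;

    assert(out != NULL);
    if (sys_random_bytes(b, sizeof b) != 0)
        return -1;
    for (i = 0; i < 8; i++)
        v = (v << 8) | b[i];
    *out = v;
    return 0;
}
```

Opening the device on every call keeps the sketch short; code that needs many values would more likely read a larger buffer once and hand out words from it.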

HMAC is easy enough to replicate, but just to make sure I’m on the same page - should this be removed or kept? If it’s sort of a workaround for the error suppression, it’s never too late to rip that bandaid off and then drop HMAC. :slight_smile: BK does advise strong data consistency/checks though, so maybe you want to keep it!

OK, so it could just be nuked in that case?


As you may be able to tell - I’d be willing to write a few patches for this, if the above sounds like an amenable way to drop the libtom dependencies and make the codebase leaner, and you all agree. I probably won’t get to it for a short while, but it sounds like an easy enough way to start contributing.

With BAM, MD5 keys are used as an integrity check. MD5 is too small for a strict lookup by hash. But it is part of the network protocol, so it would be hard to change. And yet it would need to be reasonably fast.

For the most part, the random numbers are used for a 64-bit field that is part of the unique identification of new files. It is used in the rootkey of a file:

$ bk log -nd:KEY: -r1.0 src/zone.c
lm@indy.bitmover.com|src/zone.c|19990425222324|58584|757510058031f1d0

That last field is a random number. It doesn’t need to be secure, i.e. I can’t think of an attack if a user can predict the next number, but it does need to be reasonably random. When we first started in 1999, it wasn’t uncommon for /dev/random to not exist on machines and for us to not have enough seed information. Right now bk has a Rube Goldberg system of passing random number seeds down to sub-processes. That could probably all get replaced now.
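To make the layout concrete, here is a small sketch that pulls the random field out of a key like the one above. It only relies on what the example shows: fields are separated by ‘|’ and the last one is the random number, printed here as 16 hex digits. The helper names key_random_field and is_hex64 are invented for illustration, and whether every key prints that field as exactly 16 digits is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the text after the last '|'
 * in key, or NULL if key contains no '|' at all.
 */
const char *key_random_field(const char *key)
{
    const char *sep;

    assert(key != NULL);
    sep = strrchr(key, '|');
    return sep ? sep + 1 : NULL;
}

/*
 * Hypothetical helper: return 1 if s is exactly 16 hex digits
 * (a 64-bit value written in hex), 0 otherwise.
 */
int is_hex64(const char *s)
{
    size_t i;

    if (s == NULL || strlen(s) != 16)
        return 0;
    for (i = 0; i < 16; i++) {
        if (!isxdigit((unsigned char)s[i]))
            return 0;
    }
    return 1;
}
```

With the key from the bk log output above, key_random_field() points at 757510058031f1d0 and is_hex64() accepts it.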

I think it can be removed. I see one use currently in check.c:ignorepoly(). It can be replaced with a simple hash and a big block comment that if you use this mechanism we are not going to put your foot back on for you.

Today @calhariz demonstrated that bk won’t compile with Debian’s tomcrypt library because it is incompatible with our src/libc/stdio library, so removing this dependency seems like a good idea.

BTW the current tree for the next release is at bk://bkbits.net/u/bk/dev
It has the code to remove diff and patch already integrated.

I am incorporating these as I can. I think I have the zlib change pretty close. I still need to include the bundled copy for building our Windows installer at the moment.

For pcre, is it OK if the Makefile tests if -lpcre works and then automatically decides if it should be built?

That is the proper thing to do. Just document somewhere which libraries are required for building bitkeeper, which are optional, and which you bundle a version of. When building a package, I have to list the build dependencies somewhere.

@calhariz The functionality from your patches has been included in bk-7.3. See the release candidate announcement:

OK, with the bk-7.3 release a lot of what has been discussed in this thread has been accomplished.

  • diff & patch are gone
  • pcre, zlib, lz4, tomcrypt & tommath can use the already installed versions
  • well, Tcl still statically compiles in tommath, but the normal Tcl does the same thing
  • we respect DESTDIR
  • the Makefiles no longer try to assume ‘bk’ is already installed

The tomcrypt requirement is a bit of a problem on MacOS. The released 0.17 version of tomcrypt doesn’t compile on Macs because of a bad inline asm macro. On GitHub a hacky fix was included, buried in the non-released dev branch. I included a better (IMHO) fix in our local copy of tomcrypt.

The MacPorts tomcrypt library works, but Homebrew removed tomcrypt because it “has unfixable issues” (talking about the above bug).


Bummer. I forgot to fix these for bk-7.3.


So what needs to be done next to help people with packaging bk?

I’ve been working on getting the “bitkeeper” package in pkgsrc’s wip repository updated and compiling across the major supported systems, and hopefully on getting it moved into the main package tree.

The 7.3 release looks to now use the system (aka, pkgsrc) versions of LZ4, PCRE, and tomcrypt. On OS X the custom tcl/tk fails to build:

/private/tmp/pkgsrc/work/wip/bitkeeper/work/bitkeeper-bk-7.3ce/src/gui/tcltk/tcl/unix/tclUnixChan.c:14:10: fatal error:
      'tclInt.h' file not found
#include "tclInt.h"     /* Internal definitions for Tcl. */
         ^

Running make with V=1 set shows that the unix and general directories are included, so I’m not sure why the file can’t be found.

gcc -c -Os -pipe  -arch x86_64   -Wall -fno-common -DBUILD_tcl -I"." -I/private/tmp/pkgsrc/work/wip/bitkeeper/work/bitkeeper-bk-7.3ce/src/gui/tcltk/tcl/unix -I/private/tmp/pkgsrc/work/wip/bitkeeper/work/bitkeeper-bk-7.3ce/src/gui/tcltk/tcl/generic […]

I’m out of ideas, do any of you have any?

With the information you gave above, I don’t see it. If you mail your work in progress to dev@bitkeeper.com I could try it out and see if I could figure out what is happening.

Does the Linux build work?

OK, I took a pass at updating the pkgsrc-wip bitkeeper package originally done by Thomas Klausner.

I updated it to bitkeeper-7.3 and fixed the dependencies for tomcrypt, tommath, pcre, lz4 and others. I also fixed the rest of the code so bk builds fully.

An update in case someone else wants to try and debug @terinjokes’ failure.

He tried switching to my newer wip bitkeeper package and it failed in tclUnixChan.c with the same “tclInt.h not found” error on macOS.

I installed pkgsrc and that wip tree on OSX 10.9 and it built correctly, so I don’t know what is happening.