Occasional regression failure: upgrade running in deleted dir

We are seeing an elusive bug in regressions where bk upgrade occasionally dumps core.

The failure looks something like this:

Simple interface test (no repo) val .........................OK
Simple interface test (no repo) version .....................OK
Simple interface test (no repo) what ........................OK
Simple interface test (no repo) which .......................OK
Simple interface test (no repo) xflags ......................OK
Simple interface test (no repo) zone ........................OK
-rw------- 1 wscott uucp 598016 Sep 19 09:26 /build/.regression wscott/core
/build/.regression wscott/core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'bk upgrade --update-latest'
/build/.dev-oss-stage.wscott: line 66: 156 - : syntax error: operand expected (error token is "- ")

This was a failure in t.simple-interface.no-repo, though it happens in other places too. That regression runs every bk command in an empty directory with a couple of generic command line arguments. It is trying to show that commands won’t fail because we didn’t handle bad command line arguments.

The backtrace is as follows:

(gdb) bt
#0  0x00007fcc290d1107 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007fcc290d24e8 in __GI_abort () at abort.c:89
#2  0x00007fcc290ca226 in __assert_fail_base (
    fmt=0x7fcc29200ce8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=assertion@entry=0x5a2d9a "0", 
    file=file@entry=0x5b5472 "utils.c", line=line@entry=1730, 
    function=function@entry=0x5ac4e0 <__PRETTY_FUNCTION__.12723> "rmdir_findprocs") at assert.c:92
#3  0x00007fcc290ca2d2 in __GI___assert_fail (
    assertion=assertion@entry=0x5a2d9a "0", 
    file=file@entry=0x5b5472 "utils.c", line=line@entry=1730, 
    function=function@entry=0x5ac4e0 <__PRETTY_FUNCTION__.12723> "rmdir_findprocs") at assert.c:101
#4  0x000000000051f28a in rmdir_findprocs () at utils.c:1730
#5  0x0000000000522656 in rmdir_findprocs () at utils.c:1757
#6  0x0000000000406530 in bk_cleanup (ret=ret@entry=2) at bk.c:1117
#7  0x0000000000404b31 in bk_cleanup (ret=2) at bk.c:775
#8  main (ac=2, av=0x7ffc88a26658, env=0x7ffc88a26678) at bk.c:747

An assert(0) fires during the cleanup routines that run as bk exits, in this case in rmdir_findprocs() in utils.c. This routine ONLY runs in regressions, so the assert can’t happen in production. It reads /proc to see if any bk processes are running in a deleted directory. Windows strictly forbids that, so we make sure bk never does it on Linux either, which keeps us from breaking Windows compatibility.

In this case, the process finds that it is running in a deleted directory.
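
The kind of /proc check described above can be sketched as follows. This is not the code in utils.c; cwd_is_deleted() and demo_deleted_cwd() are hypothetical names made up for illustration. The sketch relies on one Linux behavior: when a process's current directory has been removed, the text of the /proc/<pid>/cwd symlink ends in " (deleted)".

```c
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not bk's rmdir_findprocs(): return 1 if the
 * current directory of process `pid` has been deleted, 0 if it still
 * exists, and -1 if /proc/<pid>/cwd cannot be read.
 */
int
cwd_is_deleted(pid_t pid)
{
	static const char suffix[] = " (deleted)";
	char	link[64];
	char	target[PATH_MAX + sizeof(suffix)];
	ssize_t	len;
	size_t	slen = sizeof(suffix) - 1;

	snprintf(link, sizeof(link), "/proc/%ld/cwd", (long)pid);
	len = readlink(link, target, sizeof(target) - 1);
	if (len < 0) return (-1);
	target[len] = 0;	/* readlink() does not NUL-terminate */

	/* Linux appends " (deleted)" to the link text of a removed dir */
	if ((size_t)len < slen) return (0);
	return (strcmp(target + len - slen, suffix) == 0);
}

/*
 * Reproduce the situation: make a scratch directory, chdir into it,
 * remove it, then ask whether we are now running in a deleted
 * directory. Note that the caller's cwd stays deleted afterwards.
 */
int
demo_deleted_cwd(void)
{
	char	tmpl[] = "/tmp/cwd-demo-XXXXXX";

	if (!mkdtemp(tmpl)) return (-1);
	if (chdir(tmpl) || rmdir(tmpl)) return (-1);
	return (cwd_is_deleted(getpid()));
}
```

A directory whose real name happens to end in " (deleted)" would be misreported by this sketch, which is acceptable for a regression-only sanity check but worth knowing about.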

What appears to be happening is that ‘bk upgrade’ spawns a background process that talks to bk’s server to find the latest release and writes the result to $HOME/.bk/latest-bkver. That background process occasionally takes long enough that the regression has already finished and deleted its directories by the time the process exits, and its cleanup check then trips the assert.

We need to tweak something to avoid this regression failure. One or more of the following:

  • Don’t spawn a background process in regressions
  • Don’t freak out if this command is the one running in a deleted directory
  • Have that upgrade process chdir($HOME) before running (Unfortunately $HOME is fake in regressions and that is the directory being deleted.)
  • Clear $BK_REGRESSION in that background process so we skip this test

Trying the last option like this:

===== src/upgrade.c 1.63 vs edited =====
--- 1.63/src/upgrade.c	2016-09-17 11:13:50 -04:00
+++ edited/src/upgrade.c	2017-09-19 14:21:19 -04:00
@@ -446,6 +446,9 @@
 	if (time(0) - mtime(buf) > DAY) {
 		/* latest-bkver file is old, start update in background */
 		av = addLine(av, "bk");
+
+		/* avoid atexit() tests that can hang in regressions */
+		av = addLine(av, "-?BK_REGRESSION=");
 		av = addLine(av, "upgrade");
 		av = addLine(av, "--update-latest");
 		av = addLine(av, 0);

(Undocumented option: "bk '-?VAR=val&VAR2=val2' cmd" sets variables in the environment of a bk command. Yes, "env VAR=val bk cmd" would do the same thing.)
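
For readers unfamiliar with the "env VAR= cmd" idiom, here is a minimal sketch of the same idea in plain C: set a variable to the empty string in the child only, then exec the command. run_with_cleared_env() is a hypothetical helper, not anything in bk, and unlike the background upgrade it waits for the child so that the exit status can be inspected.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] with environment variable `name`
 * set to the empty string, like "env NAME= cmd ...". The parent's
 * environment is left untouched. Returns the child's exit status,
 * or -1 on error or abnormal termination.
 */
int
run_with_cleared_env(const char *name, char *const argv[])
{
	pid_t	pid;
	int	status;

	pid = fork();
	if (pid < 0) return (-1);
	if (pid == 0) {
		/* child: only this process sees the cleared value */
		setenv(name, "", 1);
		execvp(argv[0], argv);
		_exit(127);	/* exec failed */
	}
	if (waitpid(pid, &status, 0) < 0) return (-1);
	return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}
```

Because the change is made after fork(), the regression harness itself keeps its own $BK_REGRESSION setting; only the spawned command runs with it cleared.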