Issues with things like bk clone, after upgrade

I upgraded one host to version 7. My understanding is that it acts as the central location for most, if not all, of our repos. We’re seeing issues that appear to have started with that update.

It seems that a number of the hosts that interact with those repos are running bk from nfsroot partitions, and the version of bk on those partitions is still 6. Version 7 is backward compatible, so I’m not sure why we’re having problems. I know that we’ll need to upgrade the v6 instances, but should the version differences be causing these issues?

Here’s a synopsis of one of the error situations that has come up:

I am having issues when doing “BkPull” on my iCore 2.0 repo.

I am on my host “nancy”, under

/home/lzhu/dev/product-reveal-2-0

Below is the screen information…

nancy:(15:13:13):BkPull                                                       
PULLING ssh://bkd-eng-dev@bk0/product-reveal-2-0
Pull ssh://bkd-eng-dev@bk0/product-reveal-2-0
  -> file:///home/lzhu/dev/product-reveal-2-0
makepatch: serial 2192 doesn't match 'SYMBOLS(s, d)' at slib.c:3318
Patch aborted, SCCS/s.ChangeSet has errors
Run ``bk -r check -a'' for more information.
adler32 aborting
cmd_pull_part2: makepatch failed; status = 1
---------------------------------------------------------------------------
takepatch: saved entire patch in PENDING/2017-11-09.01
---------------------------------------------------------------------------
 
=================================== ERROR ====================================
takepatch: patch checksum is invalid.
The patch was probably corrupted in transit, sometimes mailers do this.
Please get a new copy and try again.
takepatch: other patches left in PENDING
==============================================================================
378 transferred
Pull failed: takepatch exited 1.
bk pull failed: 71 at /home/lzhu/system/bin/BkPull line 55.

Please do what the error message suggests.

Run 'bk -r check -a' and send the output. That message says that some of the metadata in your repository fails some new consistency checks that were added in bk-7. Another customer had a similar problem, but it was with a different check.

The user who reported the error says that he ran the command in one of his repos, but no output was generated at all. It just took one or two minutes to run through.

No output indicates no problems.

That suggests the problem might be with this repository ssh://bkd-eng-dev@bk0/product-reveal-2-0. What happens when you clone that?
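The "no output means no problems" convention can be sketched as a small helper that runs the check in several trees, for example a local clone and a fresh clone of the bk0 repository. This is only a sketch: the function name `check_repos` and the `BK` override variable are made up for illustration, and it assumes that a failing check either prints something or exits non-zero.

```shell
#!/bin/sh
# BK names the bk binary to use; overriding it is only a convenience for testing.
BK=${BK:-bk}

# Run the consistency check in each repository root given as an argument.
# Prints "clean" when the check exits 0 with no output, "FAILED" otherwise.
check_repos() {
    status=0
    for repo in "$@"; do
        out=$(cd "$repo" && "$BK" -r check -a 2>&1)
        rc=$?
        if [ "$rc" -eq 0 ] && [ -z "$out" ]; then
            printf '%s: clean\n' "$repo"
        else
            printf '%s: FAILED\n%s\n' "$repo" "$out"
            status=1
        fi
    done
    return "$status"
}

# Example (path taken from the transcript above):
#   check_repos /home/lzhu/dev/product-reveal-2-0
```

Running it against a fresh clone of the central repository as well as the local tree would show which side carries the bad metadata.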

I’m waiting for a response from one of the users on that last question. On another note, regarding the hosts that run bk from an nfsroot mount: the host “jackson” is one that makes that filesystem available. On jackson:

jackson:(15:36:43):which bk ~
/usr/bin/X11/bk
jackson:(15:38:08):bk version ~
BitKeeper version is bk-7.3.2 for x86_64-glibc213-linux
Built by: wscott@debian70-64.bitkeeper.com in /build/bk-7.3.2-wscott/src
Built on: Sat Sep 23 2017 06:26:45 EDT (7 weeks ago)
Running on: x86_64-glibc213-linux,3.2.0-4-amd64

Yet, in the tree that it exports out, via nfsroot:

jackson:(15:36:11):/home/nfsroot/wheezy/usr/bin/bk version ~
BitKeeper version is bk-6.1.3 20150310145524 for x86_64-glibc213-linux
Options: Pro,BAM,ADM,ET
Customer ID: 3eb86a9f0001
Built by: wscott@debian70-64.bitkeeper.com in /build/bk-6.1.3-wscott/src
Built on: Tue Mar 10 2015 10:58:43 EDT (33 months ago)
Running on: x86_64-glibc213-linux,3.2.0-4-amd64
Latest version: bk-7.3.2 (released 7 weeks ago)
jackson:(15:36:29):which bk ~
/usr/bin/X11/bk

Would an upgrade of that older version of the binary just call for a run of “bk-7.3.2-x86_64-glibc213-linux.bin /usr/bin/X11/”?

You just demonstrated that /usr/bin/X11 is the new version of bk so that isn’t the one to upgrade.

Run '/home/nfsroot/wheezy/usr/bin/bk bin' to find where that version of bk is installed and then upgrade that location. ‘bk’ is just a symlink to a bk installation. And the installer just unpacks that directory and creates the symlink. You could just copy the files you want if that is easier.
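To see which release and which installation directory each bk binary resolves to, something like the following sketch could be run on jackson. It relies only on `bk version` and `bk bin` as used in this thread; the helper names `bk_release` and `report_bk` are illustrative, and the parsing assumes the first line of `bk version` has the form shown in the transcripts above.

```shell
#!/bin/sh
# Extract the release string (e.g. "bk-7.3.2") from `bk version` output on stdin.
# Assumes the first line looks like "BitKeeper version is bk-X.Y.Z ...".
bk_release() {
    sed -n 's/^BitKeeper version is \(bk-[0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Report the release and install directory for each bk binary given as an argument.
report_bk() {
    for bin in "$@"; do
        printf '%s: %s installed in %s\n' \
            "$bin" "$("$bin" version | bk_release)" "$("$bin" bin)"
    done
}

# Example (paths taken from the transcript above):
#   report_bk /usr/bin/X11/bk /home/nfsroot/wheezy/usr/bin/bk
```

Any binary that still reports a bk-6 release points at an installation directory that has not been upgraded yet.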

Yeah, I meant to reference “/home/nfsroot/wheezy/usr/bin/” in that comment, not “/usr/bin/X11”.

I believe I got each instance of bk on that host upgraded to 7. Have you ever seen the following issue, though?

The core issue seems to be with the attempt to “su” to the bkd-eng-dev user on the host bka/letz (whether this is done local to bka or from remote, over ssh).

When I try, for instance, “su - awilson”, I’m switched to that account on letz, and I get the command-prompt back, afterward. “su - bkd-eng-dev” doesn’t return a prompt, though. The script/command that she’s running is likely having a problem with this.

Entries written to letz:/var/log/auth.log look about the same for both attempts:

letz:~# grep awilson /var/log/auth.log | grep agenerette
Nov 15 10:30:24 letz su[16759]: pam_unix(su:session): session opened for user awilson by agenerette(uid=0)

letz:~# grep bkd-eng-dev /var/log/auth.log | grep agenerette
Nov 15 10:00:31 letz su[12885]: pam_unix(su:session): session opened for user bkd-eng-dev by agenerette(uid=0)

And there does appear to be a user there for bkd-eng-dev:

letz:~# id awilson
uid=1228(awilson) gid=53(eng) groups=53(eng),100(users),113(revcvs),1228(awilson),116(review),111(engdoc)

letz:~# id bkd-eng-dev
uid=998(bkd-eng-dev) gid=999(bkd) groups=999(bkd)

So the question is why “su - bkd-eng-dev” isn’t returning a prompt, and what we can do to fix it. Nothing that I’ve found while researching the problem has helped so far.

Text of error:

The commit is aborted.
Error running 'bk ci -a -y"ChangeLog for v1.2600.0" doc/ChangeLog && bk commit -y"ChangeLog for v1.2600.0"'.
Could not commit entry to the changelog!
Failed to execute command: ChangeLog --Set 1.2600.0 --File doc/ChangeLog --MilestoneComment Branch 1-2600 --User nobody --Reviewer nobody --DesignReviewer nobody --IgnoreDependencies at /common/system/bin/MaintainRepositories line 430.
Failed to execute command: ssh -q -t bk0 sudo -u bkd-eng-dev /common/system/bin/MaintainRepositories --Level 402650 --Version 1-2600 --Repository libproduct-common --Clone libproduct-common-1-26 at /common/system/bin/MaintainRepositories line 430.

Your bkd-eng-dev user is configured according to the “BKD LOGIN SHELL” description on the 'bk help bkd' man page. It is a special user account that runs a bkd as a login shell. It is set up this way to control access to a repository using ssh.
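One way to confirm this on letz is to look at the account’s login shell in the passwd database. The sketch below uses only standard tools; the helper name `login_shell` is made up for illustration, and what the field actually contains on your host is not something this thread shows.

```shell
#!/bin/sh
# Print the login shell (7th colon-separated field) of passwd-format lines on stdin.
login_shell() {
    cut -d: -f7
}

# Example on letz (account name from this thread):
#   getent passwd bkd-eng-dev | login_shell
#
# If this prints something other than an interactive shell such as /bin/sh,
# then `su - bkd-eng-dev` starts that program instead of giving a prompt.
# As root, su can be asked for an explicit shell when debugging:
#   su -s /bin/sh - bkd-eng-dev
```

If the login shell turns out to be a bkd wrapper, the missing prompt is expected behaviour rather than a fault in the account.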

Your other problems I am not going to try debugging for you. Those are layers upon layers of scripts written by your company that happen to run bk commands. You need to extract the actual bk commands and try running them directly to understand what is happening.
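As a rough sketch of what running the commands directly could look like, the helper below echoes each command, runs it, and stops at the first failure. The `step` function is hypothetical; the two bk commands in the example are copied from the error text earlier in the thread and should be run in a scratch clone, since they really do commit.

```shell
#!/bin/sh
# Echo a command, run it, and report its exit status if it fails, so the
# failing bk step is visible without any wrapper scripts in the way.
step() {
    printf '+ %s\n' "$*"
    "$@" || { rc=$?; printf 'step failed with status %s\n' "$rc" >&2; return "$rc"; }
}

# Example, from the root of a scratch clone:
#   step bk ci -a -y"ChangeLog for v1.2600.0" doc/ChangeLog &&
#   step bk commit -y"ChangeLog for v1.2600.0"
```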

The original failure in the first post of this thread is interesting because it had an internal assertion from inside bk (SYMBOLS).

Hi Wayne & Anthony - this is Larry Morris, I’m a senior developer here at SciGames. I take your point that we have a lot of custom script wrappers, but this looks like an issue on any commit to a repo cloned with the new software. Getting this on a citool checkin:
Committing changes…
Checking in files…Committing in product…

bk commit failed with error 1:
check: serial 2192 doesn't match 'SYMBOLS(s, d)' at slib.c:3318
check: serial 2192 doesn't match 'SYMBOLS(s, d)' at slib.c:3318
The commit is aborted.

Correct the problem and then rescan for changes,
or you can Quit citool and try again later.

This is a brand-new repo clone, cloned without issues, and a simple 1-line change to a single text file

But look back: @agenerette has not provided any example like that. His original report was a pull that failed, while the check on the local tree came back clean. That is why I reiterated the request that he run a check on the original repository.

Please run 'bk -r check -a' on your repository and let me know if it gets the same failure. Is this a repository that I could obtain to do testing on directly?

:bk -r check -a ~/dev/product-reveal-1-27
check: serial 2192 doesn't match 'SYMBOLS(s, d)' at slib.c:3318
check: serial 2192 doesn't match 'SYMBOLS(s, d)' at slib.c:3318

Looks to be the same error.
Everything is, of course, behind our company firewall. What do you need, just access to the bk0 server to perform a clone? I’d have to get approval for that, but it might be possible. If there’s anything else I can do locally to dig in, let me know.

We are essentially having the same issue as what is posted here: Consistency check failure when moving from BK4.1 to BK7.3.1CE

doc/ChangeLog 1.1 -> 1.2: 7 lines
Wrote v1.2600.0 data to doc/ChangeLog
Running: bk ci -a -y"ChangeLog for v1.2600.0" doc/ChangeLog && bk commit  -y"ChangeLog for v1.2600.0".
doc/ChangeLog revision 1.2: +7 -0 = 14
doc/ChangeLog 1.2: 14 lines
ChangeSet revision 1.2571: +2
check: serial 2192 doesn't match 'SYMBOLS(s, d)' at slib.c:3318
check: serial 2192 doesn't match 'SYMBOLS(s, d)' at slib.c:3318
The commit is aborted.
Error running 'bk ci -a -y"ChangeLog for v1.2600.0" doc/ChangeLog && bk commit  -y"ChangeLog for v1.2600.0"'.
Could not commit entry to the changelog!

No, I am pretty sure this one is different; it just happened to fail in a new area.

In bk-7 I added a whole new set of internal consistency checks because of problems we have had in the past with the tag graph. These errors are in these new checks.

I did track down one set of problems for a company that hired me independently and found a problem where a very old version of bk created a structure in the tag graph that isn’t possible in the current code. But your problem appears to be different.

Well here is one thing that might help.

Try running 'bk _heapdump ChangeSet', saving the output to a file, and emailing it to me. You can strip or obscure some of the data if that is helpful. It might be useful to edit src/heapdump.c to make a version that doesn’t include comments, for example.

Anyway, with that information I may have a better idea of what is going on.

You might also comment out that line slib.c:3318 and rerun a 'bk check ChangeSet' and see if any of the other checks in that function are tripped.
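Capturing the dump for sending could look roughly like this. Only `bk _heapdump ChangeSet` comes from this thread; the function name `save_heapdump`, the `BK` override variable, and the default output file name are assumptions made for the sketch.

```shell
#!/bin/sh
# BK names the bk binary to use; overriding it is only a convenience for testing.
BK=${BK:-bk}

# Save the ChangeSet heap dump to a file and compress it for sending.
# Run this from the root of the affected repository.
save_heapdump() {
    out=${1:-changeset-heapdump.txt}
    "$BK" _heapdump ChangeSet > "$out" || return 1
    gzip -f "$out" && printf 'wrote %s.gz\n' "$out"
}

# Example:
#   cd ~/dev/product-reveal-1-27 && save_heapdump
```

Any stripping or obscuring of sensitive data would need to happen on the text file before it is compressed.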

Wayne, can I get an email address to send to? It’s about 3.5Mb zipped.
Looks like only two of our repos in our repo tree have this issue - our PHP repo seems fine.

Anthony, if you’re around and have access to the bk code, can you identify this slib.c file and make the suggested change? I don’t have access to the codebase.
I’m gonna introduce Robert Jewett into the conversation, since I’ll be away in FL for a few days. Robert’s quite familiar with our repo tree and mechanisms, and may be able to dig further.

Sent via direct message on this site.

I’ll check, now. To be clear: I’ll be modifying that file on hutz/bk0.

Anthony, do the Alpharetta VMs run the code from hutz/bk0? Do we have our own client install?

Judging from the messages that I’ve received, so far, each person is running the actual bk from their respective hosts: Larry, from \quiggly, Karmela, from \serak, etc. Everyone is just looking to \hutz as a central store for their repos.

@wscott, where is this slib.c file typically stored?