Issues with things like bk clone, after upgrade

I believe I upgraded every instance of bk on that host to 7. Have you ever seen the issue detailed in this thread, though?

The core issue seems to be with the attempt to “su” to the bkd-eng-dev user on the host bka/letz (whether this is done local to bka or from remote, over ssh).

When I try, for instance, “su - awilson”, I’m switched to that account on letz, and I get the command-prompt back, afterward. “su - bkd-eng-dev” doesn’t return a prompt, though. The script/command that she’s running is likely having a problem with this.

Entries written to letz:/var/log/auth.log look about the same for both attempts:

letz:~# grep awilson /var/log/auth.log | grep agenerette
Nov 15 10:30:24 letz su[16759]: pam_unix(su:session): session opened for user awilson by agenerette(uid=0)

letz:~# grep bkd-eng-dev /var/log/auth.log | grep agenerette
Nov 15 10:00:31 letz su[12885]: pam_unix(su:session): session opened for user bkd-eng-dev by agenerette(uid=0)

And there does appear to be a user there for bkd-eng-dev:

letz:~# id awilson
uid=1228(awilson) gid=53(eng) groups=53(eng),100(users),113(revcvs),1228(awilson),116(review),111(engdoc)

letz:~# id bkd-eng-dev
uid=998(bkd-eng-dev) gid=999(bkd) groups=999(bkd)

So, the question is: why isn’t “su - bkd-eng-dev” returning a prompt, and what can we do to fix it? Nothing that I’ve found while researching the problem has helped so far.

Text of error:

The commit is aborted.
Error running ‘bk ci -a -y"ChangeLog for v1.2600.0" doc/ChangeLog && bk commit -y"ChangeLog for v1.2600.0"’.
Could not commit entry to the changelog!
Failed to execute command: ChangeLog --Set 1.2600.0 --File doc/ChangeLog --MilestoneComment Branch 1-2600 --User nobody --Reviewer nobody --DesignReviewer nobody --IgnoreDependencies at /common/system/bin/MaintainRepositories line 430.
Failed to execute command: ssh -q -t bk0 sudo -u bkd-eng-dev /common/system/bin/MaintainRepositories --Level 402650 --Version 1-2600 --Repository libproduct-common --Clone libproduct-common-1-26 at /common/system/bin/MaintainRepositories line 430.

Your bkd-eng-dev user is configured according to the “BKD LOGIN SHELL” description on the 'bk help bkd' man page. It is a special user account that runs a bkd as a login shell. It is set up this way to control access to a repository using ssh.
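To confirm this on your own host, here is a minimal sketch that prints the login shell recorded for an account. The function name `login_shell_of` is made up for illustration, and it assumes the account lives in a local passwd-format file (accounts served from LDAP or NIS would need `getent passwd` instead). If the seventh field is a bkd wrapper rather than an interactive shell, that would explain why “su - bkd-eng-dev” never shows a prompt.

```shell
# Print the login shell (7th colon-separated field) for a user.
# Usage: login_shell_of USER [PASSWD_FILE]
# PASSWD_FILE defaults to /etc/passwd; the function name is hypothetical.
login_shell_of() {
    user=$1
    passwd_file=${2:-/etc/passwd}
    awk -F: -v u="$user" '$1 == u { print $7 }' "$passwd_file"
}

# Example: compare the two accounts from this thread.
# login_shell_of awilson
# login_shell_of bkd-eng-dev
```

On most Linux systems root can also ask for a different shell for a single session with something like `su -s /bin/sh - bkd-eng-dev`, though whether that is appropriate for this account is a local policy question.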

Your other problems I am not going to try debugging for you. Those are layers upon layers of scripts written by your company that happen to run bk commands. You need to extract the actual bk commands and try running them directly to understand what is happening.

The original failure in the first post of this thread is interesting because it had an internal assertion from inside bk (SYMBOLS).

Hi Wayne & Anthony - this is Larry Morris, I’m a senior developer here at SciGames. I take your point that we have a lot of custom script wrappers, but this looks like an issue on any commit to a repo cloned with the new software. Getting this on a citool checkin:
Committing changes…
Checking in files…Committing in product…

bk commit failed with error 1:
check: serial 2192 doesn’t match ‘SYMBOLS(s, d)’ at slib.c:3318
check: serial 2192 doesn’t match ‘SYMBOLS(s, d)’ at slib.c:3318
The commit is aborted.

Correct the problem and then rescan for changes,
or you can Quit citool and try again later.

This is a brand-new repo clone, cloned without issues, and a simple 1-line change to a single text file.

Yet look back. @agenerette has not provided any example like that. His original was a pull that failed and the local tree worked fine. That is why I reiterated the request that he run a check on the original repository.

Please run ‘bk -r check -a’ on your repository and let me know if it gets the same failure. Is this a repository that I could obtain to do testing on directly?

:bk -r check -a ~/dev/product-reveal-1-27
check: serial 2192 doesn’t match ‘SYMBOLS(s, d)’ at slib.c:3318
check: serial 2192 doesn’t match ‘SYMBOLS(s, d)’ at slib.c:3318

Looks to be the same error.
Everything is, of course, behind our company firewall. What do you need, just access to the bk0 server to perform a clone? I’d have to get approval for that, but it might be possible. If there’s anything else I can do locally to dig in, let me know.

We are essentially having the same issue as what is posted here: Consistency check failure when moving from BK4.1 to BK7.3.1CE

doc/ChangeLog 1.1 -> 1.2: 7 lines
Wrote v1.2600.0 data to doc/ChangeLog
Running: bk ci -a -y"ChangeLog for v1.2600.0" doc/ChangeLog && bk commit  -y"ChangeLog for v1.2600.0".
doc/ChangeLog revision 1.2: +7 -0 = 14
doc/ChangeLog 1.2: 14 lines
ChangeSet revision 1.2571: +2
check: serial 2192 doesn't match 'SYMBOLS(s, d)' at slib.c:3318
check: serial 2192 doesn't match 'SYMBOLS(s, d)' at slib.c:3318
The commit is aborted.
Error running 'bk ci -a -y"ChangeLog for v1.2600.0" doc/ChangeLog && bk commit  -y"ChangeLog for v1.2600.0"'.
Could not commit entry to the changelog!

No, I am pretty sure this one is different, it just happened to fail in a new area.

In bk-7 I added a whole new set of internal consistency checks because of problems we have had in the past with the tag graph. These errors are in these new checks.

I did track down one set of problems for a company that hired me independently and found a problem where a very old version of bk created a structure in the tag graph that isn’t possible in the current code. But your problem appears to be different.

Well here is one thing that might help.

Try running bk _heapdump ChangeSet and saving the output to a file and emailing it to me. You can strip or obscure some of the data if that is helpful. It might be useful to edit src/heapdump.c to make a version that doesn’t include comments for example.

Anyway, it may be with that information I would have a better idea of what is going on.

You might also comment out that line slib.c:3318 and rerun a 'bk check ChangeSet' and see if any of the other checks in that function are tripped.

Wayne, can I get an email address to send to? It’s about 3.5Mb zipped.
Looks like only two of our repos in our repo tree have this issue - our PHP repo seems fine.

Anthony, if you’re around and have access to the bk code, can you identify this slib.c file and make the suggested change? I don’t have access to the codebase.
I’m gonna introduce Robert Jewett into the conversation, since I’ll be away in FL for a few days. Robert’s quite familiar with our repo tree and mechanisms, and may be able to dig further.

Sent via direct message on this site.

I’ll check, now. To be clear: I’ll be modifying that file on hutz/bk0.

Anthony, do the Alpharetta VMs run the code from hutz/bk0? Do we have our own client install?

Judging from the messages that I’ve received, so far, each person is running the actual bk from their respective hosts: Larry, from \quiggly, Karmela, from \serak, etc. Everyone is just looking to \hutz as a central store for their repos.

@wscott, where is this slib.c file typically stored?

Oh, and I’m guessing that the file is on the hosts that are actually running bk, so I’ll check a few of those, next.

I am referring to debugging the problem yourself by building bk from source and changing the assertion that is causing the problem. You would need an experienced software developer.

OK. I looked at the problem cset. It looks like this:

serial: 2192 (1.390.5.5)
PARENT: 2191 (1.390.5.4)
PTAG: 2190 (1.390.6.2)
date: 1064501421 (03/09/25 10:50:21)
flags: 1201 (SYMGRAPH,CSET)
USERHOST: peter@smithers.revahertz
PATHNAME: ChangeSet
comments: ChangeLog:
  Changelog Entry for 0.2.324

This is a cset that is also on the tag graph. PARENT is the parent of this cset. No MERGE means it is a normal cset and not a merge. PTAG is the parent on the tag graph. Since this has both PARENT and PTAG it means it is part of both graphs and is created with 'bk commit --tag=TAG' which people often do when releasing a new version. The comments kind of match that. However, the ‘flags’ field is missing SYMBOLS so there isn’t actually a tag on this node. So this is a noop on the tag graph and should not have happened. That is why the assertion happened.
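To look for other nodes shaped like this one, a rough filter over the dump output might look like the following. It is only a sketch: it assumes every record starts with a `serial:` line and carries a `flags:` line in the format shown above, the function name is invented here, and it is not the actual check at slib.c:3318.

```shell
# Print the serial of every record whose flags include SYMGRAPH and CSET
# but not SYMBOLS, i.e. a cset on the tag graph with no tag attached.
# Assumes the "serial:" line precedes the "flags:" line in each record.
# Usage: find_untagged_symgraph_csets DUMP_FILE...
find_untagged_symgraph_csets() {
    awk '
        /^serial:/ { serial = $2 }
        /^flags:/ {
            if ($0 ~ /SYMGRAPH/ && $0 ~ /CSET/ && $0 !~ /SYMBOLS/)
                print serial
        }
    ' "$@"
}
```

Run against a saved dump, it would list candidate serials to inspect by hand; anything it prints still needs a human look before deciding it is real damage.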

I am wondering if the tag was lost at some point or something. This damage probably happened long ago, but wasn’t discovered until the new consistency checks for the tag graph.

Commenting out that assertion and using bk would probably work in a pinch. Or doing surgery on the tree to add back the missing tag. From looking at the other tags it was clearly v0.2.324.

We won’t be building from source. Do you have instructions on “…doing surgery on the tree to add…”? Is there no switch that will bypass the check? Also, can you comment on the following, i.e. does this behavior alter your opinion of the failure?

We have two repos, 1.26 and a 1.27 that was very recently cloned from 1.26. If I clone 1.26, it fails after running the check on the first-level repos (check: serial 2192 doesn’t match ‘SYMBOLS(s, d)’ at slib.c:3318). If I clone the 1.27 repo it works fine, but if I run bk -r check -a after completion I get the error above. Others can clone 1.26 but fail to commit to the clone; I can’t clone that one. Any reason for the different behavior?

Sorry for the delay. All day field trip with the kids. One of the advantages of being unemployed.

You had a lot of questions here, let me address some of them one by one.

That’s going to make this harder. Is there a reason for that?

No

Usually, it looks something like converting the ChangeSet file to the ASCII format, editing the file to add the missing symbol, recomputing the now-incorrect checksums, and putting the modified file back in place. It takes time and effort to put together a set of instructions and test them to be sure they work. And the result will fix one repository and any clones of that repository. The fix wouldn’t propagate on pull.

In what way? My summary is the same.

  • the problem cset is from 2003
  • some old commercial version of BitKeeper lost a tag that used to be on that cset
  • this problem was later fixed and doesn’t happen in current trees.
  • the bk-7.x series introduced new consistency checks that exposed this problem. This release was mid-2016.
  • the problem could probably be ignored while you are transitioning to git.

For performance reasons, bk does not do full checks after all operations. If a tree has passed a consistency check recently, then a clone of that tree that doesn’t need to change anything can skip this check. But as I understand you, an explicit check of both trees fails the same way.
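If it helps to compare the trees side by side, here is a small sketch that runs the explicit check in each repository and reports which ones fail. The function name `check_repos` and the `BK` override variable are assumptions made for this example; the only bk command it runs is the ‘bk -r check -a’ already used in this thread.

```shell
# Run an explicit consistency check in each repository given as an argument.
# Prints "OK <path>" or "FAIL <path>" and returns non-zero if any check fails.
# BK may be set to an alternate bk binary; it defaults to plain "bk".
check_repos() {
    rc=0
    for repo in "$@"; do
        # Subshell so the cd does not leak into the caller's working directory.
        if (cd "$repo" && "${BK:-bk}" -r check -a); then
            echo "OK $repo"
        else
            echo "FAIL $repo"
            rc=1
        fi
    done
    return $rc
}

# Example (paths are placeholders):
# check_repos path/to/1.26-clone path/to/1.27-clone
```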


In summary, you could comment out the problem line and move on.

We rebuilt from source and fixed the problems.

Wayne -
Hey this is Larry again. Any chance you might be able to feed us just another wee bit of help? We’re way further along than before, managed to get several repos baselined and imported into git with history, etc. But I’m having trouble with the incremental fast-export.
I baselined, imported, and pushed the “product-reveal-1-26” repo. Complete with history, all good.
Now I pulled from bk our incremental branch “product-reveal-1-2600” - which should have several additional csets - and I want to make that a corresponding branch on git. So I did:

bk fast-export --branch=1-2600 --incremental=../domestic/product-reveal-1-26 > ./archive/pr2600.bak   # export only the csets added since 1-26
rsync -avz --exclude 'SCCS' product-reveal-1-2600 domestic   # copy the tree without the SCCS files
cd ../domestic/product-reveal-1-2600   # move to the clean branch
git init   # set up the empty git repo
cat ../../product-reveal-1-2600/archive/pr2600.bak | git fast-import   # import the new csets

and what I get is:

progress Analyzing baseline repo ../domestic/product-reveal-1-26
progress 26434 csets already imported
progress Processing files
progress Processing changes
fatal: Not a valid commit: 63180d9d1614b02ae1fb3953a9a36f356f2451ff
fast-import: dumping crash report to .git/fast_import_crash_18546

I can certainly get you the crash report. But what’s interesting is that if I do a non-incremental fast-export (that is, treat this 1-2600 repo just like the 1-26 baseline) all the csets fast-import just fine. So it’s not really a “bad” commit, right, or I’d get the same error on a full export/import?

I might be misunderstanding something about the incremental export, or just missing a step. If you have any time to think about this and see if I’m just doing something stupid, I’d appreciate any hints!

So, never mind - I think I figured out how I was being stupid.
I didn’t have the prior git history in my “clean” repo - if I just did a cp -r …/product-reveal-1-26/.git .git it brought in all the old history. After that, the git fast-import worked just fine.
As is often the case, just writing out my plea for help clarified things enough. But thanks for your help retroactively!
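For anyone who hits the same “Not a valid commit” error, a quick pre-flight check along these lines might save some time. It is a sketch under my own assumptions: `has_commit` is a made-up helper name, and it simply asks git whether the commit the incremental stream refers to is already present in the target repository before running git fast-import.

```shell
# Succeed only if SHA names an existing commit in the repository at REPO.
# Usage: has_commit REPO SHA
has_commit() {
    repo=$1
    sha=$2
    git -C "$repo" cat-file -e "${sha}^{commit}" 2>/dev/null
}

# Example: check for the commit named in the error above before importing.
# if ! has_commit . 63180d9d1614b02ae1fb3953a9a36f356f2451ff; then
#     echo "baseline history is missing; bring in the baseline .git first"
# fi
```

In a freshly initialized repository this check fails, which matches what happened here: the incremental export assumes the baseline commits are already in place.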