Friday, March 16, 2007

Moves between Servers - Not 100% foolproof

I moved over 500 accounts from an old BES server to a new BES server today. I set the expectation with our regional managers and BB admins that there would be no impact to end users during this migration. How could I be so naive?

First, a small number of users could not send (or could neither send nor receive) after the move. Getting them working again required pushing the service books, reactivating the device, or completely removing the account and re-adding/reactivating it. This is not so bad: it was a small number (probably fewer than 20) out of 500 people.

The bigger problem was this: the source (old) BES server's "BlackBerry Synchronization Service" would sometimes freak out during bulk moves. I moved maybe 50 people at a time at most, but apparently even that was too much for the sync service to handle, so it would just quit and I would have to restart it. I also noticed that the initial moves went really fast, a few per minute, but then they bogged down until a single account took 5 minutes to move.

That was no big deal until I looked through the logs and realized what was happening: the sync service contains the device backup service, and it was deleting the backup data not only for the accounts being moved, but FOR EVERY USER ON THE SERVER (names removed to protect the innocent):

[46036] (03/14 10:15:41):{0xC50} [SYNC-Gate] Start removing user. [W
[46036] (03/14 10:15:41):{0xC50} [SYNC-Gate] Start removing user. [R
[46036] (03/14 10:15:42):{0xC50} [SYNC-Gate] Start removing user. [Z
[46036] (03/14 10:15:42):{0xC50} [SYNC-Gate] Start removing user. [O
[46036] (03/14 10:15:43):{0xC50} [SYNC-Gate] Start removing user. [S
[46036] (03/14 10:15:43):{0xC50} [SYNC-Gate] Start removing user. [H
[46036] (03/14 10:15:44):{0xC50} [SYNC-Gate] Start removing user. [K
[46036] (03/14 10:15:44):{0xC50} [SYNC-Gate] Start removing user. [D
[46036] (03/14 10:15:45):{0xC50} [SYNC-Gate] Start removing user. [G
[46036] (03/14 10:15:45):{0xC50} [SYNC-Gate] Start removing user. [B
[46036] (03/14 10:15:46):{0xC50} [SYNC-Gate] Start removing user. [D
[46036] (03/14 10:15:46):{0xC50} [SYNC-Gate] Start removing user. [O
[46036] (03/14 10:15:47):{0xC50} [SYNC-Gate] Start removing user. [S
[46036] (03/14 10:15:48):{0xC50} [SYNC-Gate] Start removing user. [R
[46036] (03/14 10:15:48):{0xC50} [SYNC-Gate] Start removing user. [A
[46036] (03/14 10:15:48):{0xC50} [SYNC-Gate] Start removing user. [C
[46036] (03/14 10:15:49):{0xC50} [SYNC-Gate] Start removing user. [F
[46036] (03/14 10:15:49):{0xC50} [SYNC-Gate] Start removing user. [N
[46036] (03/14 10:15:50):{0xC50} [SYNC-Gate] Start removing user. [R
[46036] (03/14 10:15:50):{0xC50} [SYNC-Gate] Start removing user. [W
[46036] (03/14 10:15:51):{0xC50} [SYNC-Gate] Start removing user. [B
[46036] (03/14 10:15:51):{0xC50} [SYNC-Gate] Start removing user. [G
[46036] (03/14 10:15:52):{0xC50} [SYNC-Gate] Start removing user. [A
[46036] (03/14 10:15:53):{0xC50} [SYNC-Gate] Start removing user. [P
[46036] (03/14 10:15:53):{0xC50} [SYNC-Gate] Start removing user. [Z
[46036] (03/14 10:15:54):{0xC50} [SYNC-Gate] Start removing user. [B
[46036] (03/14 10:15:55):{0xC50} [SYNC-Gate] Start removing user. [L
[46036] (03/14 10:15:58):{0xC50} [SYNC-Gate] Start removing user. [M
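To catch this kind of runaway cleanup earlier, you could count the "Start removing user" entries per minute in the sync service log. This is a minimal sketch, assuming every removal line follows exactly the format shown above; the function names and the threshold are hypothetical and not part of BES.

```python
import re
from collections import Counter

# Matches lines like:
# [46036] (03/14 10:15:41):{0xC50} [SYNC-Gate] Start removing user. [W...
# Assumption: every SYNC-Gate removal line in the log uses this format.
REMOVAL_RE = re.compile(
    r"^\[\d+\] \((\d\d/\d\d) (\d\d:\d\d):\d\d\):\{0x[0-9A-Fa-f]+\} "
    r"\[SYNC-Gate\] Start removing user\."
)


def removals_per_minute(lines):
    """Count 'Start removing user' events, keyed by 'MM/DD HH:MM'."""
    counts = Counter()
    for line in lines:
        match = REMOVAL_RE.match(line)
        if match:
            date, minute = match.groups()
            counts[f"{date} {minute}"] += 1
    return counts


def flag_bursts(counts, threshold):
    """Return the minutes whose removal count exceeds threshold, in time order."""
    return sorted(minute for minute, n in counts.items() if n > threshold)
```

You would feed it an open log file, for example `removals_per_minute(open(path, errors="replace"))`, and then call `flag_bursts` with a threshold well above the number of accounts you are actually moving. Any minute it flags means the service is removing far more users than you asked it to.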

Once this happened, it kicked off an OTA device backup that synced everything back from each device all over again, which slowed down EVERYTHING.

Now here is the worst part: many of the devices (with accounts that were not even scheduled to be moved that day) received one of the following:

1) An activation complete - OK prompt
2) A continuing activation process
3) A leftover activation icon on their ribbon (home screen).

I started getting calls from admins in other cities whose users weren't even scheduled to be moved that day. Bad.

Lesson: Don't move more than 10-20 accounts at a time, and watch the SYNC service.
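The batching lesson can be sketched roughly like this: split the move list into groups of at most 20, and before starting the next group, wait until the sync service looks healthy. This does not call any real BES API. `move_account` and `sync_service_ok` are hypothetical placeholders for however you actually move a user and check the service, and the polling values are arbitrary.

```python
import time
from typing import Callable, List, Sequence


def batches(items: Sequence[str], size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most `size` entries."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def move_in_batches(
    accounts: Sequence[str],
    move_account: Callable[[str], None],   # hypothetical: moves one account
    sync_service_ok: Callable[[], bool],   # hypothetical: checks the sync service
    batch_size: int = 20,
    poll_seconds: float = 60.0,
    max_polls: int = 30,
) -> int:
    """Move accounts batch by batch, pausing until the sync service looks healthy.

    Returns the number of accounts moved. Raises RuntimeError if the sync
    service is not healthy within max_polls checks after a batch.
    """
    moved = 0
    for batch in batches(accounts, batch_size):
        for account in batch:
            move_account(account)
            moved += 1
        # Do not start the next batch until the sync service reports healthy.
        for _ in range(max_polls):
            if sync_service_ok():
                break
            time.sleep(poll_seconds)
        else:
            raise RuntimeError(f"sync service unhealthy after {moved} moves; stopping")
    return moved
```

Stopping outright, rather than pressing on, is deliberate. Given what happened above, a sync service that will not recover is a signal to restart it and check the logs before moving anyone else.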

Oh, and by the way - the brand new server hardware I migrated to? It crashed this morning. :(

2 comments:

todonotes said...

Mark, I'm curious whether I would use this same process to merge two BlackBerry servers into a single BlackBerry server. We purchased a new company, and we're going to consolidate their BB users onto our BB server.

Mark said...

Hi todonotes, this method cannot be used to merge two separate orgs. In that situation you would need to wipe and reactivate one company's BBs to consolidate. I have heard rumblings of solutions that can merge different orgs without a reactivation, but I have not investigated them yet; perhaps in a future post...