Tuesday, January 9, 2007

Knife-Edge Cutover for Domino BES

Here is a compilation of the steps required to do a knife-edge cutover (also called a forklift upgrade) to move a BES server over to new hardware. Disclaimer: I have not done this yet; I am compiling these steps for a future migration, so there may be steps missing or just wrong. But it may be useful as a starting point.

1. Build new server with new Windows server name and IP address.
2. Stop Domino service on original server
3. Copy the following files to the new server:
- d:\lotus\domino\data\server.id
- d:\lotus\domino\data\mdsservi.id
- d:\lotus\domino\data\names.nsf
- d:\lotus\domino\data\certlog.nsf
- d:\lotus\domino\data\BES\*.*
4. Install Domino with same release and fixpack as original server (click Yes to any registry prompt)
5. Copy the following file from the original server: c:\lotus\domino\notes.ini
6. Remove the BES task from startup in Notes.ini
7. Set Lotus Domino service startup to Manual if not already
8. Shut down original server completely and unplug from network
9. Delete original server from Windows AD
10. Change IP address on new server to old IP address
11. Change name on new server to old server name
12. Reboot server to have name change take effect
13. Disconnect network cable
14. From console, start Domino to ensure it operates properly
15. Shut down Domino
16. Re-connect to network
17. Set service to automatic and add BES to startup tasks
18. Start server and check for errors


Note: If using a new IP address, check the server doc Net Address field
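Before moving past the file copy step, it may be worth confirming that everything on the copy list actually landed on the new box. Here is a minimal Python sketch of that check. It is only an illustration: the function name missing_items and the idea of pointing it at the new server's data directory are my own assumptions, and it checks nothing beyond the presence of the files listed above.

```python
from pathlib import Path

# File names taken from the copy list in the steps above.
REQUIRED_FILES = ["server.id", "mdsservi.id", "names.nsf", "certlog.nsf"]
# Folders whose contents are copied wholesale (the BES\*.* entry).
REQUIRED_DIRS = ["BES"]


def missing_items(data_dir):
    """Return the copy-list entries not yet present under data_dir.

    data_dir is assumed to be the Domino data directory on the new server.
    """
    data_dir = Path(data_dir)
    missing = [name for name in REQUIRED_FILES
               if not (data_dir / name).is_file()]
    for name in REQUIRED_DIRS:
        folder = data_dir / name
        # A folder counts as copied only if it exists and is not empty.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name + "\\*.*")
    return missing
```

An empty list back from missing_items means every item on the copy list is in place. Anything returned still needs to be copied before the original server is shut down and unplugged.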

BES Stability - A History

Well, the stability of our BES servers has been up and down over the last few years. Let me detail what has happened and where we are now.

2004 Implement BES in production. Ramp up to over 1,000 users within 1 year.

2005 BES 2.2 begins having stability issues with different types of faults, about once or twice a week. Requires a restart each time, which takes a while to initialize given >1,000 users on the box.

10/2005 After submitting endless tickets and logfiles and dumpfiles to RIM, decided - even though monitoring said it was not an issue - to upgrade the RAM on our BES box from 2GB to 4GB. Immediately notice vast improvement - crashes virtually end overnight. Huzzah!

2006 Now have a new 4.0 server (w/ 4GB) that approaches and surpasses 1,000 users without issue.

Fall 2006 At about 1,300 users, 4.0 starts getting crashes now in either NBES or NSERVER process. Back down below 1,100 users and the server stops crashing (regularly)!

Lessons learned: 1,000 is just about the technical limit for a BES Domino box, even with 4GB of memory.

There are reasons why you would want to go below this limit for recoverability as well:

1) Restart times are shorter with fewer users
2) Log file size (and associated searching, zipping up for RIM, etc.) is smaller
3) Less time to do sourceless moves to another available server in a DR situation
4) Fewer users out of service when a server goes down
5) Maintain spare capacity to move users from other servers in case they go down

For all these reasons, I like to use 500 as an upper limit for number of BB accounts per BES 4.0 server. Yes this is much less than the theoretical 2,000 per server, but then I notice I end up sleeping better these days vs. when I didn't know how the server would act the next day...
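The arithmetic behind a per-server cap is simple enough to sketch. The Python below is just an illustration of that sizing rule. The function name servers_needed and the spare_servers parameter are my own inventions for the example, not anything from RIM's tooling.

```python
def servers_needed(total_users, per_server_limit, spare_servers=0):
    """Number of BES servers needed to keep every box at or under the limit.

    spare_servers adds whole standby servers on top of the minimum, for
    example to absorb users moved off a failed box.
    """
    if per_server_limit <= 0:
        raise ValueError("per_server_limit must be positive")
    if total_users < 0 or spare_servers < 0:
        raise ValueError("total_users and spare_servers must not be negative")
    # Ceiling division using integers only.
    minimum = -(-total_users // per_server_limit)
    return minimum + spare_servers


print(servers_needed(1800, 500))                    # 4
print(servers_needed(1800, 500, spare_servers=1))   # 5
print(servers_needed(1800, 2000))                   # 1
```

With a cap of 500, 1,800 devices need four servers. At the theoretical 2,000 per server they would fit on one, which is exactly the kind of box I would not want to be restarting.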

Beware Sourceless Moves!

A few months back we had one of our servers go down; fortunately it "only" had about 350 users on it. Well, I thought, what a great opportunity to test out the sourceless move option to our hot standby server in our colocation site.

I moved a few users at a time to be sure it worked, and it only seemed to take a minute or so per user. So I got some confidence and started moving 10-20 people at a time. (I didn't get confident enough to bulk move more than that) All this time I was watching the logs to see when the accounts completed the move process to gauge how fast it was going. Pretty nice, actually, the move automatically updates the service books on the device with the new SRP ID, so no user intervention required, in fact it truly is seamless. Nice.

Well, after a couple of hours all the accounts were moved over and I focused on rebuilding the original server. Then after a while I noticed something odd - there were some error messages in the log I was not familiar with:

[NoteID] was not found in any of the 1 redirected folders

Well actually I have seen these before but not so many of them. After a LOT of investigation, I discovered that Sent Items were not being redirected to *any* of the devices that had been moved to the new server.

Although I had replicas of all the users' State DBs on the Colo BES server, I had assumed that I did not need a backup of the BlackBerryProfiles.nsf database. Upon each move, the new BES would pull info out of SQL and create a new Profile entry in this empty database. "Great", I thought, "that creates a fresh clean profile entry, and besides, we are SQL all the way now, so these profile entries aren't really used."

Ummm.... not quite.

Creating a fresh profile DB entry for each user misses some critical information that, apparently - even in this age of the SQL DB as the master and core of our BES environment - exists only in the profile DB entries.

I will name names here. In particular, the following fields:

1) DeviceCapabilities
2) RedirectSelectedFolders

First of all, DeviceCapabilities is set to null upon a sourceless move, and RedirectSelectedFolders doesn't even exist!

What are the ramifications of this? Well for DeviceCapabilities, the BES server will note that it is set to nothing and assume that it is a pre-4.0 device, and thus will not reconcile the Sent Items from Notes onto the device, and will throw the error I noted above.

For RedirectSelectedFolders, this would be created and set to "1" if someone went into their subfolders and marked selected ones for redirection to the device. If this field doesn't exist, then the BES assumes naturally that they don't have any subfolders to redirect.

So the "seamless" sourceless moves result in all devices not getting Sent Items or any custom redirected subfolders. Not good.

Of course, reactivating the device fixes this stuff. But was I about to call 350 users and reactivate them all? Nope.

In order to fix this from the server side (which is where I prefer to be), I had to get a backup copy of the BlackBerryProfiles.nsf DB from the downed server, and script an agent to populate the fields in the new Profiles DB. Although partially scripted, this was not automated, so I ran the script for each account, copying and pasting in the long string from the DeviceCapabilities field, which the script then set in the new Profiles DB entry.

When these fields matched what was in the previous Profiles db, then magically people began getting their Sent Items and subfolders redirected as normal. Not a fun day for me though.

Of course I called RIM to ask about this "bug" and was told that they knew about this behavior but it really wasn't planned on being fixed.

The upshot is.... never allow the BlackBerryProfiles.nsf database to be recreated, either by deleting it yourself or doing a sourceless move to a new server. You will lose valuable info in each document. Instead, in addition to the backup replicas you have of the state DBs, have a backup replica of this DB as well, so that in an emergency you can copy and paste the profiles into the database on the backup server. That way they don't get autocreated, and they keep all the fields with valuable goodies in them.

One final note: this only applies to sourceless moves. If you have the original BES server up and available during an account move, the destination server will pull all the information from the original profile document into its database cleanly.
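To make the comparison concrete, here is a rough Python sketch of the check I ended up doing by hand: for each user, compare the two fields in the backup profile against the freshly created one and list what needs to be put back. This is not the agent I ran. It assumes the profile documents have already been exported into plain dictionaries keyed by user name (a hypothetical export step), and the function name fields_to_restore is made up for the example. Only the two field names come from the real profile documents.

```python
# The two fields that went missing after the sourceless moves.
CRITICAL_FIELDS = ("DeviceCapabilities", "RedirectSelectedFolders")


def fields_to_restore(backup_profiles, new_profiles):
    """Return {user: {field: backup_value}} for fields lost in the move.

    Both arguments are assumed to be dicts of user name -> profile fields,
    exported from the backup and the recreated profiles databases.
    """
    repairs = {}
    for user, backup_doc in backup_profiles.items():
        new_doc = new_profiles.get(user, {})
        fixes = {}
        for field in CRITICAL_FIELDS:
            old_value = backup_doc.get(field)
            if old_value in (None, ""):
                continue  # nothing worth restoring from the backup
            if new_doc.get(field) != old_value:
                fixes[field] = old_value
        if fixes:
            repairs[user] = fixes
    return repairs
```

Users whose new profile already matches the backup are left out of the result, so the output is effectively the to-do list for the repair script.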

BlackBerry Lookup - The Details

The BlackBerry lookup feature allows one to do a lookup against the Domino address book from the device. Some common questions I see regarding this feature are:

1) What fields does it pull from?

and...

2) Can I customize the fields it uses?

Well, the answer to #2 is no, not as of September 2007. The code is fixed and will probably not be customizable by us BES admins for a while, if ever. The answer to the first question, however, is a little more straightforward.

Here is a chart of the fields that, if populated in the Domino address book Person documents, are delivered as results to the device:


Here is a more graphical view of the actual Person document form and matching results fields, for those visual thinkers out there:


The Purpose of this Blog

This blog was created to share information related to the BlackBerry Enterprise Server for Domino software product. I am currently an administrator for 4 BES servers serving over 1,800 devices worldwide, so I have a lot of day-to-day experience with this product, which you just can't get from reading the manuals or attending the tradeshows. I hope to provide useful information related to the administration of BES servers, and to gain some valuable feedback in return.

Thanks for browsing!