Tuesday, January 9, 2007

BES Stability - A History

Well, the stability of our BES servers has been up and down over the last few years. Let me detail what has happened and where we are now.

2004 Implement BES in production. Ramp up to over 1,000 users within 1 year.

2005 BES 2.2 begins having stability issues with different types of faults, about once or twice a week. Each one requires a restart, which takes a while to initialize given >1,000 users on the box.

10/2005 After submitting endless tickets, logfiles and dumpfiles to RIM, we decided - even though monitoring said memory was not an issue - to upgrade the RAM on our BES box from 2GB to 4GB. Immediately notice vast improvement - crashes virtually end overnight. Huzzah!

2006 Now have a new 4.0 server (w/ 4GB) that approaches and surpasses 1,000 users without issue.

Fall 2006 At about 1,300 users, the 4.0 server starts crashing in either the NBES or the NSERVER process. Once back down below 1,100 users, the server stops crashing (regularly)!

Lessons learned: in our experience, about 1,000 users is the practical limit for a BES Domino box, even with 4GB of memory.

There are also recoverability reasons to stay well below this limit:

1) Restart times are shorter with fewer users
2) Log file size (and the associated searching, zipping up for RIM, etc.) is smaller
3) Less time to do sourceless moves to another available server in a DR situation
4) Fewer users out of service when a server goes down
5) Spare capacity is available to take users from other servers in case they go down

For all these reasons, I like to use 500 as the upper limit for the number of BB accounts per BES 4.0 server. Yes, this is much less than the theoretical 2,000 per server, but then I notice I end up sleeping better these days vs. when I didn't know how the server would act the next day...
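
To put rough numbers on that rule of thumb, here is a minimal sketch of the sizing arithmetic in Python. The function names, the even-spread assumption and the 1,000-user hard limit parameter are my own illustration based on the figures above, not anything from RIM's tooling or documentation.

```python
import math


def servers_needed(total_users, per_server_cap=500):
    """Number of BES servers needed to keep every box at or under the cap.

    per_server_cap defaults to the 500-user rule of thumb from this post.
    """
    if total_users < 0 or per_server_cap <= 0:
        raise ValueError("total_users must be >= 0 and per_server_cap > 0")
    return math.ceil(total_users / per_server_cap)


def survives_one_failure(total_users, server_count, hard_limit=1000):
    """True if the remaining servers can absorb everyone when one box dies.

    Assumes users are spread evenly over the surviving servers and that
    hard_limit (about 1,000 here) is where a box starts to become unstable.
    """
    if server_count < 2:
        return False  # nowhere to move users to
    users_per_survivor = math.ceil(total_users / (server_count - 1))
    return users_per_survivor <= hard_limit


if __name__ == "__main__":
    users = 1300  # roughly where our 4.0 box started crashing
    for cap in (500, 1000):
        count = servers_needed(users, per_server_cap=cap)
        ok = survives_one_failure(users, count)
        print(f"cap={cap}: {count} servers, survives one failure: {ok}")
```

With 1,300 users, a 500-user cap works out to 3 servers, and losing one leaves 650 users on each survivor, which is still under the point where we saw crashes. Sizing to 1,000 per box gives 2 servers, and losing one puts all 1,300 users on a single box.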
