Thursday, November 1, 2007

Malformed Message Crashes BES 4.1 SP4

A malformed message crashing a BES is nothing new; service packs have fixed such issues many times in the past. I apparently found another one: one of my servers crashed twice just after midnight last night.

Fortunately, Domino restarted itself and was back up and operational within minutes (thanks, transaction logging!). After the second crash, however, the crashes stopped. Usually the BES keeps trying to re-read the malformed message and crashes over and over until you track down the message and delete it from the user's mail file, but not this time.

The logs show the two read attempts that ended in crashes:

[40000] (11/01 00:00:44.858):{0x19B0} {User} [Mailfile], ModifiedByName detected change
[40000] (11/01 00:00:44.905):{0x19B0} {User} [Mailfile], fetching modified documents since 11/01/2007 12:00:43 AM

..[CRASH HERE!]..

[40000] (11/01 00:05:02.452):{0x1830} {User} [Mailfile], ModifiedByName detected change
[40000] (11/01 00:05:02.452):{0x1830} {User} [Mailfile], fetching modified documents since 11/01/2007 12:00:43 AM

..[CRASH HERE!]..


But on the third attempt I see this:

[40000] (11/01 00:08:10.515):{0x1640} {User} [Mailfile], fetching modified documents since 11/01/2007 12:00:43 AM
[20039] (11/01 00:08:10.530):{0x1640} {User} Already attempted to open NID=3DAF2 for user User: Message has been quarantined, skipping now


Nice job RIM! This quarantining feature allowed me to stay peacefully asleep instead of having to get up and hunt down the offending message.
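
For anyone who wants to check their own logs for the same thing, here is a minimal Python sketch that pulls out quarantine entries. The pattern is based only on the log lines quoted above, so treat the format as an assumption; other BES builds may write these lines differently. The function name `find_quarantined` is my own, not anything from RIM.

```python
import re

# Assumed line layout, taken from the excerpt in this post:
# [code] (MM/DD HH:MM:SS.mmm):{thread} rest-of-line
LINE_RE = re.compile(
    r"^\[(?P<code>\d+)\] "
    r"\((?P<stamp>\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\):"
    r"\{(?P<thread>0x[0-9A-Fa-f]+)\} (?P<rest>.*)$"
)
NID_RE = re.compile(r"NID=(?P<nid>[0-9A-Fa-f]+)")
QUARANTINE_TEXT = "Message has been quarantined"


def find_quarantined(lines):
    """Return (timestamp, thread, note_id) for each quarantine entry."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or QUARANTINE_TEXT not in match.group("rest"):
            continue
        nid = NID_RE.search(match.group("rest"))
        hits.append(
            (match.group("stamp"), match.group("thread"),
             nid.group("nid") if nid else None)
        )
    return hits


SAMPLE_LOG = [
    "[40000] (11/01 00:08:10.515):{0x1640} {User} [Mailfile], "
    "fetching modified documents since 11/01/2007 12:00:43 AM",
    "[20039] (11/01 00:08:10.530):{0x1640} {User} Already attempted to open "
    "NID=3DAF2 for user User: Message has been quarantined, skipping now",
]

for stamp, thread, nid in find_quarantined(SAMPLE_LOG):
    print(f"{stamp} thread {thread} quarantined NID {nid}")
```

The note ID it reports is what you would use to find the offending document in the user's mail file.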

BTW, the message in question was a digest from a mailing list, which included a BinHex-encoded MIME part in the body of the message:

--B_3276667131_13573
Content-type: application/mac-binhex40; name="[Filename].doc"
Content-disposition: attachment;
filename="[Filename].doc"

(This file must be converted with BinHex 4.0)
:(8K[G#p1Eh3J9A"NBA4P)#dJ5R9XH5!R-$FZC'pM!&Fi3Nj08eG%!*!%UJ#3"GN
Jd-m4i+'a'Z%!N"!q!!-!r[m*!!B!N!X"!*!$8!#3#"!!!&)!N!-"!*!$r[q3!`#
3"%m!N!2rN2rrN,(XTF%!Kf%*"!!!q"+r!*!&!4%!!3!"!!B!!28L!!!1!'TLDQ+
`Zl#l!*!5#33@!#3d!!$Df3%!fYN"!28F!*!Hrrm2!*!*rrm2!*!*rrm2!*!4L!#


I am pretty sure this is the part that utterly confused the BES, since:

a) it was embedded in the body of the message

and

b) it was a BinHex part, a type I have seen cause trouble in prior BES versions even when it was properly encoded.
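
As a rough illustration of those two conditions, the sketch below uses Python's standard `email` package to flag BinHex content either as a real MIME part or as text sitting inline in a message body. It is only a guess at how one might screen messages ahead of time, not a description of what the BES does internally; the function name `binhex_findings` and the sample message are made up for the example.

```python
from email import message_from_string

BINHEX_TYPE = "application/mac-binhex40"
BINHEX_BANNER = "(This file must be converted with BinHex 4.0)"


def binhex_findings(raw_message):
    """List where BinHex data shows up: as a MIME part or inline in text."""
    findings = []
    for part in message_from_string(raw_message).walk():
        if part.get_content_type() == BINHEX_TYPE:
            # Properly declared attachment part.
            findings.append(("mime-part", part.get_filename()))
        elif part.get_content_maintype() == "text":
            payload = part.get_payload()
            # A digest can carry another message's MIME part as plain text.
            if isinstance(payload, str) and BINHEX_BANNER in payload:
                findings.append(("inline-in-body", None))
    return findings


# Hypothetical digest: the BinHex part is pasted into a plain-text body.
INLINE_SAMPLE = (
    "From: list@example.com\n"
    "Subject: Digest\n"
    "\n"
    "Digest item 1\n"
    "--B_1\n"
    'Content-type: application/mac-binhex40; name="example.doc"\n'
    "\n"
    "(This file must be converted with BinHex 4.0)\n"
    ":data\n"
)

print(binhex_findings(INLINE_SAMPLE))
```

A message flagged as "inline-in-body" matches the situation described here, where the part boundary and BinHex data appear inside the body rather than as a separate attachment.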

Looking through the release notes for 4.1 SP4 MR2, I see the following:

*SDR 135729 In BlackBerry Enterprise Server Version 4.1 SP4, if a message contained truncated or incorrectly encoded data, the BlackBerry Enterprise Server might have stopped responding. In BlackBerry Enterprise Server Version 4.1 SP4 MR2 and later, this issue is resolved.

I love to see this: it means I don't have to report the issue to RIM, because someone else already has! I can also tell my manager, when he asks about the crashes, that the issue is already fixed in the next maintenance release, which we will now plan on deploying.
