Monday, September 24, 2007

WCF reliability, load balancing and timeouts

This entry is incomplete and contains only rough notes.

Which protocol to pick for reliable messaging?

The following note in the MSDN documentation seems to rule out the WS-* bindings for reliable messaging behind a load balancer:

Both the WSHttpBinding and the WSDualHttpBinding can be load balanced using HTTP load balancing techniques ...:

  • Turn off Security Context Establishment ...

  • Do not use reliable sessions. This feature is off by default.

Given that basicHttpBinding does not support reliable messaging, the above is enough for me to stick with TCP (NetTcpBinding) wherever possible.

Configure NetTcpBinding to use reliable messaging.

TODO: provide a link

The pitfall I fell into was the security mode. It is quite important to realize the following:

"What is the security mode on your NetTcpBinding set to? If you are using the default binding settings then it is set to Transport Security.
One thing you should know is that the reliable session's transport connection re-establishment feature works when SecurityMode is set to Message or when the SecurityMode is set to None.  If the SecurityMode is set to Transport, this feature is not available and that would explain why the transport connection was not re-established." (Source: http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1482094&SiteID=1).
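
As a minimal sketch of what the quote implies, a NetTcpBinding with reliable sessions turned on and the security mode switched from the default Transport to Message could look like the fragment below. The binding name "reliableTcp" is made up for illustration, and this is only one way to set it up, not a recommended production configuration.

```xml
<system.serviceModel>
  <bindings>
    <netTcpBinding>
      <!-- "reliableTcp" is a hypothetical name; reference it from your
           endpoint via bindingConfiguration="reliableTcp". -->
      <binding name="reliableTcp">
        <!-- Reliable sessions are off by default on NetTcpBinding. -->
        <reliableSession enabled="true" ordered="true" />
        <!-- The default mode is Transport. According to the forum answer
             quoted above, connection re-establishment only works with
             Message or None. -->
        <security mode="Message" />
      </binding>
    </netTcpBinding>
  </bindings>
</system.serviceModel>
```

The same binding configuration has to be used on both the client and the service side, otherwise the channel stacks will not match.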

Configure timeouts.

Whether or not you target reliable messaging, timeouts play a vital role in your solution's reliability and availability.

openTimeout, closeTimeout, receiveTimeout, session inactivityTimeout

It is easy to get confused about what each timeout means; as the following links show, the naming is not exactly consistent:
http://blogs.msdn.com/madhuponduru/archive/2007/02/24/how-to-change-wcf-session-time-out-value.aspx
http://blogs.conchango.com/pauloreichert/archive/2007/02/22/WCF-Reliable-Sessions-Puzzle.aspx
The best explanation of the inactivity and receive timeouts I have seen so far is from Nicholas Allen:
http://blogs.msdn.com/drnick/archive/2007/06/26/session-lifetime-on-the-server.aspx
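
To show where these settings live, here is a rough sketch of a netTcpBinding configuration that sets all four. The binding name and the values are arbitrary examples, and the comments reflect my reading of the posts linked above rather than an authoritative definition.

```xml
<netTcpBinding>
  <!-- "tunedTimeouts" is a hypothetical name; the values are illustrative. -->
  <binding name="tunedTimeouts"
           openTimeout="00:00:30"
           closeTimeout="00:00:30"
           receiveTimeout="00:20:00">
    <!-- openTimeout / closeTimeout: how long opening or closing
         a channel may take.
         receiveTimeout: as I understand it, how long a session may go
         without an application message before it is timed out. -->

    <!-- inactivityTimeout: as I understand it, how long the reliable
         session may go without any message at all, including
         infrastructure messages, before it is faulted. -->
    <reliableSession enabled="true" inactivityTimeout="00:20:00" />
  </binding>
</netTcpBinding>
```

Since either timeout can end the session, it seems safest to raise both together if you need long-lived idle sessions.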

ChannelInitializationTimeout.

One timeout setting that gave us a hard time was ChannelInitializationTimeout. By default it is set to 5 seconds. It is hard to see how a channel could take longer than that to initialize, although the following information may suggest a reason:

Important to know:
There's now a quota setting for how long a connection can take to authenticate. A client needs to send a few hundred bytes of data before the server has enough information to perform authentication. Previously, the client had the entire receive timeout in which to send this data. The header is often much smaller than a normal sized message and clients are anonymous until this information is processed. The ChannelInitializationTimeout on the binding element limits the amount of time a malicious client can tie up one of the connections for your server. (Source: Nicholas Allen's blog)

It follows from the quote above that if a client somehow fails to authenticate itself within 5 seconds (the default), the server drops the connection. That was exactly what we saw in the client and server WCF logs. One possible reason is that authentication against the Kerberos server takes longer than that.
Unfortunately, you cannot set this value directly on NetTcpBinding; it can only be configured via a CustomBinding. It may be a good idea to use a CustomBinding anyway, because it generally gives you more flexibility.
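
A minimal sketch of such a CustomBinding is shown below. It assumes a stack that roughly mirrors a default NetTcpBinding (binary encoding, Windows stream security, TCP transport); the binding name and the 30 second value are arbitrary choices for illustration. Add or remove elements (for example reliableSession) to match what your NetTcpBinding was actually doing.

```xml
<system.serviceModel>
  <bindings>
    <customBinding>
      <!-- "tcpLongerInit" is a hypothetical name. -->
      <binding name="tcpLongerInit">
        <binaryMessageEncoding />
        <!-- Transport-level Windows authentication, similar to the
             default Transport security of NetTcpBinding. -->
        <windowsStreamSecurity />
        <!-- The transport element must come last. The timeout here is
             raised to an illustrative 30 seconds. -->
        <tcpTransport channelInitializationTimeout="00:00:30" />
      </binding>
    </customBinding>
  </bindings>
</system.serviceModel>
```

The timeout is enforced by the listening side, so the service configuration is the one that matters here.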

Configure quotas.

Pay special attention to how reliability affects the number of connections you need, since reliable messaging may double the communication requirements.

Transport quotas (with default values):
http://msdn2.microsoft.com/en-us/library/ms731078.aspx
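
As a sketch of where the connection-related quotas are set on a NetTcpBinding, the fragment below raises two of them. The binding name and the numbers are made up for illustration and are not recommendations; check the linked page for the actual defaults before changing anything.

```xml
<netTcpBinding>
  <!-- "moreConnections" is a hypothetical name; values are examples only. -->
  <binding name="moreConnections"
           maxConnections="20"
           listenBacklog="20">
    <reliableSession enabled="true" />
  </binding>
</netTcpBinding>
```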
