It is no secret that automatic fail-over in distributed environments is no picnic to implement. Here are some useful pointers if you ever decide to do it on your own:
- Make sure to implement some sort of heartbeat protocol. A heartbeat is a message that every node emits to tell others that it's alive. It is usually implemented with IP multicast; however, the actual communication protocol is not important here. Other nodes consider a node failed after it misses a pre-configured number of heartbeats.
- Account for delays in node discovery. There is always a time window between an actual node crash and the moment other nodes find out about it, so messages may still be sent to a node that is already down.
- Store all messages on the sender node until they get processed. This way you can fail them over to other nodes in case the processing node fails.
- Account for the possibility of receiving multiple notification events for the same node failure - you don't want to process the same fail-over event more than once.
- Make sure that your message does not get failed over forever, i.e. does not keep jumping between grid nodes indefinitely. After a certain number of fail-over attempts, let the whole processing of the message fail.
- Make sure that your message does not get failed over to the same node it initially failed on - always give preference to other grid nodes.
- Make sure that message failure is not limited to node crashes. For example, you may want to fail over a message if it threw an exception on the remote node or returned a bad result.
- Avoid sending any messages within synchronization blocks - this is a sure way to introduce deadlocks into your code.
- Make sure that fail-over happens automatically at the infrastructure level and is transparent to your application logic.
- Provide a good interface for your fail-over module and make it pluggable - fail-over logic, such as selecting a new node, may differ based on your application policy, so it is essential to be able to easily switch the underlying implementation.
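
Several of the pointers above (keeping messages on the sender, ignoring duplicate failure events, capping fail-over attempts, preferring other nodes, and a pluggable policy) can be sketched together in Java. This is a minimal, single-threaded illustration under stated assumptions, not a definitive implementation: `FailoverPolicy`, `PreferUntriedNodes`, `FailoverManager` and all of their methods are hypothetical names invented for this example, and node discovery, heartbeats and the actual transport are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical pluggable policy: picks the next node for a message, or empty if none is available. */
interface FailoverPolicy {
    Optional<String> selectNode(List<String> aliveNodes, Set<String> alreadyTried);
}

/** Example policy: prefer a node the message has not been sent to yet, else fall back to any alive node. */
class PreferUntriedNodes implements FailoverPolicy {
    @Override
    public Optional<String> selectNode(List<String> aliveNodes, Set<String> alreadyTried) {
        for (String node : aliveNodes) {
            if (!alreadyTried.contains(node)) {
                return Optional.of(node);
            }
        }
        return aliveNodes.isEmpty() ? Optional.empty() : Optional.of(aliveNodes.get(0));
    }
}

/** Hypothetical sender-side bookkeeping. Not thread-safe; a real system needs proper synchronization. */
class FailoverManager {
    private final FailoverPolicy policy;
    private final int maxFailovers;
    private final List<String> aliveNodes;
    // Messages are kept on the sender until acknowledged: message id -> nodes it was sent to, in order.
    private final Map<String, List<String>> pending = new HashMap<>();
    // Nodes whose failure was already handled, so duplicate events are ignored.
    private final Set<String> handledFailures = new HashSet<>();

    FailoverManager(FailoverPolicy policy, int maxFailovers, List<String> nodes) {
        this.policy = policy;
        this.maxFailovers = maxFailovers;
        this.aliveNodes = new ArrayList<>(nodes);
    }

    /** Records that a message is in flight on the given node. */
    void send(String messageId, String node) {
        pending.computeIfAbsent(messageId, k -> new ArrayList<>()).add(node);
    }

    /** Marks a message as processed; the sender can now drop its copy. */
    void acknowledge(String messageId) {
        pending.remove(messageId);
    }

    /** Handles a node-failure event and returns message id -> new node for every re-routed message. */
    Map<String, String> onNodeFailed(String node) {
        Map<String, String> rerouted = new HashMap<>();
        if (!handledFailures.add(node)) {
            return rerouted; // duplicate notification for the same failure
        }
        aliveNodes.remove(node);
        for (String id : new ArrayList<>(pending.keySet())) {
            List<String> history = pending.get(id);
            if (history.get(history.size() - 1).equals(node)) {
                failOver(id).ifPresent(target -> rerouted.put(id, target));
            }
        }
        return rerouted;
    }

    /** Fails a single message over (also usable after a remote exception); empty means the message is given up. */
    Optional<String> failOver(String messageId) {
        List<String> history = pending.get(messageId);
        if (history == null) {
            return Optional.empty(); // unknown or already acknowledged
        }
        Optional<String> target = (history.size() - 1 >= maxFailovers)
                ? Optional.<String>empty() // attempt limit reached
                : policy.selectNode(aliveNodes, new HashSet<>(history));
        if (target.isPresent()) {
            history.add(target.get());
        } else {
            pending.remove(messageId); // let the whole processing of the message fail
        }
        return target;
    }
}
```

Because `onNodeFailed` and `failOver` only return routing decisions, the caller can perform the actual sends after releasing any locks, in line with the advice on synchronization blocks. Swapping `PreferUntriedNodes` for another `FailoverPolicy` changes node selection without touching the rest of the logic.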


1 comment:
I am implementing fail-over clustering using ActiveMQ 4.1.1. Because I am new to this, can anybody tell me how to implement it?