I found a bad bug where the loss of a reply caused a dead lock of
of primary and secondary services during sync up. It happens very
rarely, but i can happen.
I was wondering of people could share their thoughts in designing
robust client-server protocols.
Depends on what you mean by "robust". One definition is "any process that continues to run will halt with an output value in a fixed number of steps, regardless of delays or failures by other processes". This definition is discussed in:
which proves that for a large set of practical problems, this kind of robustness is an impossibility (mathematically).
Another paper:
discusses the different models of asynchronous concurrent computing, and builds a hierarchy of models that can be implemented on top of other models, losing a degree of robustness at each stage.
However, there is also a positive paper that applies the same methods:
- M.P. Herlihy, V. Luchangco, M. Moir and W.M. Scherer. Software Transactional Memory for Dynamic-sized Data Structures, Proceedings of the Twenty-Second Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing (PODC), July 2003 (http://www.cs.brown.edu/people/mph/HerlihyLM03b/main.pdf)
It doesn't promise the same degree of robustness (called "wait-free") that has been proven impossible, but does promise a weaker form of robustness, called here "obstruction-free".
Is there something more concrete programmers can use in client-server messaging protocols. It's not clear how to turn those papers into systems.
SORRY! I had the wrong reference for the third one (corrected now). The right reference
is directly applicable in programming!